Text Mining Belval Campus

Text Mining and Topic Modelling using a Structural Topic Model

In 2001, a project was launched to transform the former steel production site in Belval, Luxembourg, into the Cité des Sciences. The new campus opened in 2015.

In this analysis, I explore how this topic is represented in the news.

To do so, I first collected the top 58 news articles from Google News. The texts were translated into English with DeepL and stored in a .txt file with the following structure:

Title: Title_Name
DATE: dd.mm.yyyy
.
. Text
.
Title: Title_Name
DATE: dd.mm.yyyy
.
. Text
.
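As an illustration, a file in this layout could be read back into structured records roughly as follows. This is a minimal sketch, not the code used for the analysis: the function name `parse_articles` and the dictionary keys are hypothetical, and it assumes that every article starts with a `Title:` line followed directly by a `DATE:` line.

```python
import re
from datetime import datetime


def parse_articles(raw):
    """Split the raw corpus text into a list of article dictionaries."""
    articles = []
    # Split in front of every line that starts with "Title:" (zero-width match).
    for block in re.split(r"^(?=Title:)", raw, flags=re.MULTILINE):
        if not block.startswith("Title:"):
            continue  # skip empty chunks or text before the first article
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue  # malformed entry without a DATE line
        title = lines[0].split(":", 1)[1].strip()
        date_str = lines[1].split(":", 1)[1].strip()
        published = datetime.strptime(date_str, "%d.%m.%Y").date()
        # Everything after the two header lines is the article body.
        text = " ".join(line.strip() for line in lines[2:] if line.strip())
        articles.append({"title": title, "date": published, "text": text})
    return articles
```

Keeping the date as a proper `date` object makes it easy to group the articles by year in the later steps.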

Wordcloud

Next, a word cloud showing the word frequencies of the whole corpus was created. It already shows that the new campus is of high importance. The look into the future (“2022”, “future”) and the acknowledgement of the site's history (“furnace”, “steel”, “industrial”…) are also frequent themes.
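The counts behind such a word cloud can be sketched in a few lines. The example below is only an illustration under simple assumptions: the tokenizer, the very short stop word list, and the two example sentences are made up for demonstration and are not taken from the corpus.

```python
import re
from collections import Counter

# Tiny illustrative stop word list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "for", "with"}


def word_frequencies(texts):
    """Count how often each non-stop-word token occurs across all texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


# Made-up example sentences, for demonstration only.
freqs = word_frequencies([
    "The new campus in Belval opens.",
    "The blast furnace recalls the steel history of Belval.",
])
print(freqs.most_common(3))
```

A word cloud then simply scales each word by its count.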

Term Frequency-Inverse Document Frequency

At this point, the corpus contains only a few articles from 2014 and 2015, so these articles were combined for the further analysis. The next figure shows the number of articles per year:

The word cloud showed the word frequencies of the whole corpus. To identify the keywords of each year, a term frequency-inverse document frequency (TF-IDF) analysis was conducted:
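To make the idea concrete, the following sketch computes TF-IDF scores with all articles of a year treated as one document, and with 2014 and 2015 merged into a single group. It is an assumption-laden illustration rather than the original implementation: the function name `tfidf_by_year`, the simple tokenizer, and the use of the natural logarithm for the inverse document frequency are my own choices.

```python
import math
import re
from collections import Counter, defaultdict


def tfidf_by_year(articles):
    """articles: iterable of (year, text). Returns {year_label: {term: score}}."""
    counts = defaultdict(Counter)
    for year, text in articles:
        # Merge the sparse years 2014 and 2015 into one group.
        label = "2014-2015" if year in (2014, 2015) else str(year)
        counts[label].update(re.findall(r"[a-z]+", text.lower()))

    n_groups = len(counts)
    # Number of yearly groups in which each term occurs at least once.
    doc_freq = Counter(term for c in counts.values() for term in c)

    scores = {}
    for label, c in counts.items():
        total = sum(c.values())
        scores[label] = {
            # term frequency within the year * inverse document frequency
            term: (n / total) * math.log(n_groups / doc_freq[term])
            for term, n in c.items()
        }
    return scores
```

A term that appears in every year gets a score of zero, so only words that are characteristic of particular years stand out as keywords.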

Structural Topic Model

Next, a Structural Topic Model (STM) was applied to the data set. The previous steps had already given an impression of how complex the news coverage of Belval is, and suggested that it does not span many different topics. After an iterative process of setting the parameter K (the number of topics) and interpreting the results, I set K = 6. These six topics were labeled manually.

To evaluate the importance of these topics and their change over time, the topic of each article in the corpus was predicted. The result is shown in the final figure:

The most frequent articles are those that deal with the university campus in Belval itself. The second most frequent type deals with structural change, i.e. the transformation from a former steel industry location to the Cité des Sciences, the city of science. It should be emphasized that articles on this topic predominantly highlight that the structural change has been implemented successfully. These are followed by articles on culture, where reporting has clearly increased since 2019. This is because in that year Belval was admitted to the Capital of Culture 2022.

This analysis was complemented by a qualitative interview with a partner from the University of Luxembourg about the Belval project.
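The step of predicting one topic per article and tracking it over time can be sketched as follows. This is not the STM fitting itself. It assumes that a fitted model already provides a matrix of document-topic proportions (called `theta` here), and it takes the most probable topic of each article as its predicted topic. The function name and arguments are hypothetical.

```python
from collections import Counter


def dominant_topics_per_year(years, theta, labels):
    """Count, per year, how many articles have each topic as their most probable one.

    years:  publication year of each article
    theta:  per-article topic proportions (one list of K values per article)
    labels: the K manually assigned topic labels
    """
    counts = Counter()
    for year, proportions in zip(years, theta):
        # Index of the topic with the highest proportion for this article.
        best = max(range(len(proportions)), key=proportions.__getitem__)
        counts[(year, labels[best])] += 1
    return counts
```

Plotting these counts per year gives a view of how the importance of each topic changes over time.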


The code is available on GitHub.

Sebastian T. Brinkmann
BSc Student in Physical Geography

My research interests include spatial analysis, remote sensing, data science, urban forestry and epidemiology.
