
What's on the agenda? Topic modelling parliamentary debates before and during the COVID-19 pandemic

Goals and Objectives  

The main goal of this tutorial is to introduce basic text mining concepts to digital humanities beginners by applying Latent Dirichlet Allocation (LDA) topic modelling to a specific use case: parliamentary debates before and during the COVID-19 pandemic.
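The tutorial performs topic modelling in Orange, so no programming is required. For readers curious about what LDA does under the hood, here is a minimal from-scratch sketch that fits LDA with collapsed Gibbs sampling, one common inference method. It is not necessarily the method Orange uses internally. The function names (`lda_gibbs`, `top_words`), the hyperparameter defaults and the toy documents are illustrative assumptions, not part of the tutorial.

```python
import random
from typing import List, Sequence, Tuple


def lda_gibbs(docs: Sequence[Sequence[str]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iterations: int = 200,
              seed: int = 0) -> Tuple[List[List[float]], List[List[float]], List[str]]:
    """Hypothetical helper: fit LDA by collapsed Gibbs sampling.

    Returns (theta, phi, vocab): theta[d][k] is the share of topic k in
    document d, and phi[k][v] is the probability of word vocab[v] in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics
    topics = list(range(K))
    n_dk = [[0] * K for _ in docs]       # topic counts per document
    n_kw = [[0] * V for _ in topics]     # word counts per topic
    n_k = [0] * K                        # total tokens assigned to each topic
    z = []                               # current topic of every token
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)         # random initial assignment
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Take the token out of the counts ...
                n_dk[d][k] -= 1
                n_kw[k][wid] -= 1
                n_k[k] -= 1
                # ... and resample its topic from the full conditional.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][wid] + beta)
                           / (n_k[t] + V * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wid] += 1
                n_k[k] += 1
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in topics]
    return theta, phi, vocab


def top_words(phi: List[List[float]], vocab: List[str], n: int = 3) -> List[List[str]]:
    """Return the n most probable words of each topic."""
    result = []
    for row in phi:
        best = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)[:n]
        result.append([vocab[v] for v in best])
    return result


# Invented toy "speeches", already reduced to content-word lemmas.
docs = [
    ["vaccine", "hospital", "lockdown", "vaccine"],
    ["lockdown", "hospital", "vaccine"],
    ["brexit", "trade", "deal", "brexit"],
    ["trade", "deal", "brexit"],
]
theta, phi, vocab = lda_gibbs(docs, n_topics=2, iterations=100)
print(top_words(phi, vocab))
```

Each row of `theta` and `phi` is a probability distribution, which is why topic models describe a speech as a mixture of topics rather than assigning it a single label. With real debates you would use far more documents, tune the number of topics, and inspect the top words by hand, as the tutorial does.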

Learning Outcomes

By following this tutorial, students will learn to:
  • independently perform topic modelling on new data, typically a comparable corpus of parliamentary debates;
  • understand the pitfalls of topic modelling and know when to apply the method and when not to.

Author(s)

Ajda Pretnar Žagar

Researcher

Institute of Contemporary History

Privoz 11, 1000 Ljubljana, Slovenia
 

Other contributors

Kristina Pahor de Maiti: result interpretation, theoretical part on parliamentary debates, testing

Darja Fišer: conceptual design, testing

Description of the Training Materials

(Sub)discipline & language(s)

Topics: Digital Humanities | Language: English

Keywords

Topic modelling, LDA, parliamentary debates, text mining

Project URL
The tutorial is available at https://sidih.github.io/agenda/index.html.
 
The files can be downloaded from https://sidih.si/20.500.12325/2178.

CLARIN resources

ParlaMint-GB annotated corpus

Target audience

Beginners in digital humanities, specifically anyone who is interested in parliamentary corpora or topic modelling

Facilities required

No specific requirements other than a laptop with admin rights. Students will need to install Orange, an open-source data mining toolkit, and have about 8 GB of RAM for the analysis to run smoothly. For students with less powerful computers, we provide additional materials (preprocessed corpora and subsets).

Format

PDF and online (XML)

Licence and (re)use

CC-BY-SA

Creation date

12.04.2022

Last modification date

30.05.2022
 

Experience with Using CLARIN Resources in Teaching 

Using the resources was fairly straightforward. The ParlaMint corpora are well annotated and available in the standard CoNLL-U format. The data was easy to find in the repository, from which it was downloaded and used for the analysis. The rich speaker metadata supports detailed analyses.
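For readers who want to prepare ParlaMint data outside Orange, the following optional sketch reads CoNLL-U lines and keeps the lemmas of content words, a typical input for topic modelling. It assumes only the standard 10-column CoNLL-U layout (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The function name `content_lemmas`, the set of kept part-of-speech tags and the per-sentence grouping are illustrative choices of ours, not part of the tutorial; in practice you would probably group sentences into speeches using the corpus metadata.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical choice: Universal POS tags treated as content words.
CONTENT_UPOS = frozenset({"NOUN", "PROPN", "VERB", "ADJ"})


def content_lemmas(lines: Iterable[str],
                   keep_upos: FrozenSet[str] = CONTENT_UPOS) -> List[List[str]]:
    """Return one list of lowercased content-word lemmas per sentence.

    Sentences in which no token has a kept POS tag are dropped.
    """
    sentences: List[List[str]] = []
    current: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # comment lines such as sent_id or text
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        token_id, lemma, upos = cols[0], cols[2], cols[3]
        if "-" in token_id or "." in token_id:
            continue  # skip multiword-token ranges and empty nodes
        if upos in keep_upos and lemma != "_":
            current.append(lemma.lower())
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences


# Invented two-sentence sample in CoNLL-U layout.
sample = [
    "# sent_id = demo-1",
    "1\tThe\tthe\tDET\tDT\t_\t2\tdet\t_\t_",
    "2\tvaccine\tvaccine\tNOUN\tNN\t_\t3\tnsubj\t_\t_",
    "3\tworks\twork\tVERB\tVBZ\t_\t0\troot\t_\t_",
    "",
    "# sent_id = demo-2",
    "1\tLockdowns\tlockdown\tNOUN\tNNS\t_\t2\tnsubj\t_\t_",
    "2\tend\tend\tVERB\tVBP\t_\t0\troot\t_\t_",
]
print(content_lemmas(sample))  # [['vaccine', 'work'], ['lockdown', 'end']]
```

Because the function accepts any iterable of lines, an open file handle works as well, for example `content_lemmas(open(path, encoding="utf-8"))`, where `path` points to a downloaded `.conllu` file. Working with lemmas rather than word forms keeps inflected variants such as "lockdown" and "lockdowns" in the same topic.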
 

Reusability Notes 

The materials include links for independent work (workflows, data, software references) and could easily be reused in two ways:

  • Topic modelling on a different data set, for example on a ParlaMint corpus from a different country
  • Expanding on the techniques used in the tutorial, for example, semantic analysis of the corpus, longitudinal comparison, or applying a different topic model

All the procedures used in the tutorial are language-agnostic, so no additional changes need to be made for non-English corpora.

Cite this Work

Pretnar Žagar, Ajda, Kristina Pahor de Maiti, and Darja Fišer. 2022. What's on the agenda? Topic modelling parliamentary debates before and during the COVID-19 pandemic.
 

Contact Information

Teachers who reuse and adapt this training material are invited to share their feedback via training [at] clarin.eu.