Helsinki Digital Humanities Hackathon 2021: Parliamentary Debates in COVID Times

CLARIN Impact stories


Helsinki Digital Humanities Hackathon DHH21: ‘Parliamentary Debates in COVID Times’

Contributors: Isabella Calabretta, Courtney Dalton, Richard Griscom, Marta Kołczyńska, Matej Klemen, Kristina Pahor de Maiti, Ajda Pretnar Žagar, Ruben Ros



Organised by the University of Helsinki, the online hackathon ‘Parliamentary Debates in COVID Times’ was a short, intense project that took place from 19 to 28 May 2021. Inspired by the recently completed ParlaMint dataset, this multilingual, interdisciplinary project brought together a team of social scientists, computational anthropologists, digital historians, linguists and computer scientists. The project focused on parliamentary transcripts from the period of the COVID-19 pandemic in four European countries: Italy, Poland, Slovenia and the UK. The team analysed the data to determine how parliamentary debates during the pandemic differed from those of the pre-COVID period, and to identify the differences and similarities between the four countries.

‘Compiling a corpus is already a big project, so being able to skip this step was a huge privilege. Also, knowing that the corpus was granted permission to be included in the CLARIN repository already gives you some idea of its quality.’
Kristina Pahor de Maiti





t-SNE plot with perplexity 20 and exaggeration


As their main data source, the team used the ParlaMint 2.1 dataset, a multilingual set of uniformly annotated corpora of parliamentary proceedings.

For keyword analysis and collocations, the team used the NoSketch Engine tool. With the help of the ‘word list’ function, the team compiled a list of the top fifty keywords for each language. The keywords – those words more likely to appear in the COVID subcorpus than in the reference subcorpus – were determined by calculating the keyness score.
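The keyness calculation can be illustrated with a minimal Python sketch. It assumes the ‘simple maths’ formula documented for Sketch Engine, in which the smoothed frequency per million in the focus corpus is divided by the smoothed frequency per million in the reference corpus. The source does not state which smoothing value the team used, so the default of 1 is an assumption, and the function names and toy tokens are hypothetical.

```python
from collections import Counter


def keyness_scores(focus_tokens, reference_tokens, smoothing=1.0):
    """Score each focus-corpus word with the 'simple maths' keyness ratio."""
    focus_counts = Counter(focus_tokens)
    reference_counts = Counter(reference_tokens)
    focus_size = len(focus_tokens)
    reference_size = len(reference_tokens)

    scores = {}
    for word, count in focus_counts.items():
        # Normalise raw counts to frequencies per million tokens.
        focus_fpm = count / focus_size * 1_000_000
        reference_fpm = reference_counts[word] / reference_size * 1_000_000
        # The smoothing constant avoids division by zero for words that
        # are absent from the reference corpus (assumed value: 1).
        scores[word] = (focus_fpm + smoothing) / (reference_fpm + smoothing)
    return scores


def top_keywords(focus_tokens, reference_tokens, n=50, smoothing=1.0):
    """Return the n words with the highest keyness score."""
    scores = keyness_scores(focus_tokens, reference_tokens, smoothing)
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical toy data, not taken from ParlaMint.
covid_tokens = ["virus", "virus", "mask", "budget"]
reference_tokens = ["budget", "budget", "tax", "virus"]
print(top_keywords(covid_tokens, reference_tokens, n=3))
```

A word that never occurs in the reference subcorpus, such as ‘mask’ in the toy data, receives the highest score, which matches the idea of keywords as words more typical of the COVID subcorpus.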

The ‘collocations’ function was used to create lists of collocations, which were then sorted by the logDice score, a measure of how strongly two words are associated. However, to obtain more meaningful results that reflected the specific terms used in the parliamentary debates, the team built collocation networks for specific time periods, based on the seed term ‘virus’.
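For readers unfamiliar with logDice, the following Python sketch shows how the score is commonly defined: 14 plus the base-2 logarithm of the Dice coefficient of the seed term and its collocate. The frequencies and helper names below are hypothetical and serve only to illustrate the ranking step; they are not figures from the hackathon.

```python
import math


def log_dice(pair_freq, node_freq, collocate_freq):
    """logDice = 14 + log2(2 * f(x, y) / (f(x) + f(y)))."""
    if pair_freq <= 0:
        raise ValueError("pair_freq must be positive")
    return 14 + math.log2(2 * pair_freq / (node_freq + collocate_freq))


def rank_collocates(node_freq, candidates):
    """Sort collocates by logDice, strongest association first.

    candidates maps each collocate to (pair_freq, collocate_freq).
    """
    scored = {
        collocate: log_dice(pair_freq, node_freq, collocate_freq)
        for collocate, (pair_freq, collocate_freq) in candidates.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts for the seed term 'virus'.
virus_freq = 1000
candidates = {
    "spread": (200, 1000),
    "new": (300, 20000),
    "variant": (150, 200),
}
for collocate, score in rank_collocates(virus_freq, candidates):
    print(f"{collocate}: {score:.2f}")
```

Because the score depends only on the frequencies of the two words and their co-occurrence, not on corpus size, it is convenient for comparing collocates across subcorpora of different lengths, such as the time periods used for the collocation networks.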

In order to identify which keywords occurred across all four countries, and which were country-specific, the hackathon team then manually selected the top twenty COVID-related keywords from each parliament and translated them into English. The fastText embedding model and t-SNE visualisations from Orange were used to retrieve and map word vectors.
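Once the keywords have been translated into English, separating shared from country-specific terms is a simple set comparison. The Python sketch below illustrates only that comparison step, not the fastText and t-SNE mapping done in Orange. The country codes and keyword lists are hypothetical examples, not the team's actual selections.

```python
from collections import Counter


def split_keywords(keywords_by_country):
    """Return (shared, specific): keywords found in every country,
    and keywords found in exactly one country."""
    country_sets = {
        country: set(words) for country, words in keywords_by_country.items()
    }
    # Count in how many countries each keyword appears.
    counts = Counter(
        word for words in country_sets.values() for word in words
    )
    shared = {
        word for word, n in counts.items() if n == len(country_sets)
    }
    specific = {
        country: {word for word in words if counts[word] == 1}
        for country, words in country_sets.items()
    }
    return shared, specific


# Hypothetical translated keyword lists, for illustration only.
keywords = {
    "IT": ["virus", "lockdown", "vaccine", "red zone"],
    "PL": ["virus", "vaccine", "shield"],
    "SI": ["virus", "vaccine", "curfew"],
    "GB": ["virus", "vaccine", "lockdown", "furlough"],
}
shared, specific = split_keywords(keywords)
print(sorted(shared))
print({country: sorted(words) for country, words in specific.items()})
```

Keywords that appear in two or three countries, such as ‘lockdown’ in the toy data, fall into neither group, so a fuller analysis would probably report them separately.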

Using ggplot2, the team then plotted timelines of relative word frequencies and added a curve indicating the number of COVID cases, thus illustrating the relation between the parliamentary debates and the epidemiological situation in each country.
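The relative frequencies behind such a timeline can be computed as in the Python sketch below. The team plotted with ggplot2 in R, so this covers only the data preparation step. It assumes monthly aggregation and frequencies per million tokens, neither of which is specified in the source, and the input format and sample speeches are hypothetical.

```python
from collections import defaultdict


def relative_frequency_by_month(speeches, term):
    """Return {YYYY-MM: occurrences of term per million tokens}.

    speeches is an iterable of (iso_date, tokens) pairs, where iso_date
    is a string such as '2020-03-05' and tokens is a list of words.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for iso_date, tokens in speeches:
        month = iso_date[:7]  # keep the 'YYYY-MM' prefix
        totals[month] += len(tokens)
        hits[month] += sum(1 for token in tokens if token.lower() == term)
    return {
        month: hits[month] / totals[month] * 1_000_000
        for month in sorted(totals)
        if totals[month] > 0
    }


# Hypothetical toy speeches, not taken from ParlaMint.
speeches = [
    ("2020-03-05", ["the", "virus", "spreads"]),
    ("2020-03-20", ["Virus", "virus", "budget"]),
    ("2020-04-01", ["budget", "debate"]),
]
timeline = relative_frequency_by_month(speeches, "virus")
print(timeline)
```

Normalising by the number of tokens per month keeps busy and quiet parliamentary periods comparable, which matters when the resulting series is set against a curve of COVID case counts.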