The Project
Organised by the University of Helsinki, the online hackathon ‘Parliamentary Debates in COVID Times’ was a short, intense project that took place from 19 to 28 May 2021. Inspired by the recently completed ParlaMint dataset, this multilingual, interdisciplinary project brought together a team of social scientists, computational anthropologists, digital historians, linguists and computer scientists. The main focus of the project was the parliamentary transcripts from the period of the COVID-19 pandemic from four European countries: Italy, Poland, Slovenia and the UK. The team analysed the data to determine how the parliamentary debates during the pandemic differed from those of the pre-COVID period, and to identify the differences and similarities between the four countries.
Methodology
As their main data source, the team used the ParlaMint 2.1 dataset, a multilingual set of uniformly annotated corpora of parliamentary proceedings.
For keyword analysis and collocations, the team used the NoSketch Engine tool. With the help of the ‘word list’ function, the team compiled a list of the top fifty keywords for each language. The keywords – those words more likely to appear in the COVID subcorpus than in the reference subcorpus – were determined by calculating the keyness score.
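To illustrate how a keyness score can rank words, the following is a minimal sketch in Python. It assumes the ‘simple maths’ formula documented for Sketch Engine, in which frequencies per million words in the focus and reference corpora are compared after adding a smoothing constant; the text above does not state which settings the team used, and the function names and toy counts are purely illustrative.

```python
def per_million(count, corpus_size):
    """Normalise a raw frequency to occurrences per million tokens."""
    return count * 1_000_000 / corpus_size


def keyness(focus_count, focus_size, ref_count, ref_size, smoothing=1.0):
    """Assumed 'simple maths' score: (fpm_focus + n) / (fpm_ref + n)."""
    fpm_focus = per_million(focus_count, focus_size)
    fpm_ref = per_million(ref_count, ref_size)
    return (fpm_focus + smoothing) / (fpm_ref + smoothing)


def top_keywords(focus_counts, ref_counts, n=50, smoothing=1.0):
    """Rank words of the focus (COVID) subcorpus by keyness, highest first."""
    focus_size = sum(focus_counts.values())
    ref_size = sum(ref_counts.values())
    scored = {
        word: keyness(count, focus_size, ref_counts.get(word, 0), ref_size, smoothing)
        for word, count in focus_counts.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)[:n]


# Toy counts, invented for illustration only (not ParlaMint figures).
covid_counts = {"virus": 50, "the": 950}
reference_counts = {"virus": 1, "the": 999}
print(top_keywords(covid_counts, reference_counts, n=2))
```

A score above 1 means the word is relatively more frequent in the COVID subcorpus than in the reference subcorpus; a larger smoothing constant shifts the ranking towards more common words.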
The ‘collocations’ functionality was used to create lists of collocations, which were then sorted by the logDice score, a measure of how strongly the two words in a collocation are associated. However, to obtain more meaningful results that reflected the specific terms used in the parliamentary debates, the team built collocation networks for specific time periods, based on the seed term ‘virus’.
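The logDice score mentioned above is commonly defined as 14 + log2(2·f_xy / (f_x + f_y)), where f_xy is the co-occurrence frequency of the node word and the collocate, and f_x and f_y are their individual frequencies. The sketch below applies that standard definition to rank candidate collocates; the function names and the frequencies are hypothetical, and it is not a reproduction of the tool's internal implementation.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """Standard logDice: 14 + log2(2 * f_xy / (f_x + f_y))."""
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("all frequencies must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def rank_collocates(node_freq, candidates):
    """Sort collocates of a node word by logDice, strongest first.

    candidates maps each collocate to a pair
    (co-occurrence frequency with the node, frequency of the collocate).
    """
    scored = [
        (word, log_dice(f_xy, node_freq, f_y))
        for word, (f_xy, f_y) in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented frequencies for the seed term 'virus' (illustration only).
candidates = {"outbreak": (40, 120), "time": (45, 9000)}
for word, score in rank_collocates(node_freq=500, candidates=candidates):
    print(f"{word}\t{score:.2f}")
```

Because the score depends only on the three frequencies and not on corpus size, values can be compared across corpora of different sizes, which is convenient when setting several languages side by side. The theoretical maximum is 14, reached when the two words only ever occur together.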
Outcome
The results showed that the majority of the top fifty keywords for all countries were related to the pandemic. In addition, there was a strong overlap among the manually selected top twenty COVID-related keywords across the four countries, with keywords falling into two broad semantic clusters: the pandemic itself (for example, virus, pandemic, infection) and the reaction to the pandemic (quarantine, ventilator, mask).
In terms of the most prominent collocations, there were also clear parallels – both for the pandemic itself (such as outbreak, crisis, death, cause, time, infection, emergency, global, impact) and also for the measures that were taken (against, response, preparedness, handle, recovery, fund, reform, stability, guideline, reopen).
The collocation networks offered useful insight into the relationship between key terms in the parliamentary discussions, especially when viewed against a timeline. Although some words and clusters referred to country-specific debates, overall the four countries exhibited similarities in terms of the themes that emerged. At first, in March 2020, debates in all countries focused on crisis response, but in subsequent months the discussions increasingly centred on the measures needed to contain the virus, such as lockdowns and quarantine. Other themes that emerged included the polarisation of public opinion and vaccines.
When comparing the timelines of word frequencies against the epidemiological situation, the team noted that during the first wave of the pandemic in the spring of 2020, the increase in COVID-related parliamentary discussions mirrored the rise in the number of cases. However, during the second wave, this was not the case – the increase in COVID-related debates was less pronounced than the actual increase in infections.
CLARIN Tools and Resources
The project was based on the recently published ParlaMint 2.1 dataset. This resource includes transcripts of parliamentary sessions for seventeen parliaments in sixteen languages, with around 500 million words in total. The corpora contain extensive metadata, such as the speaker’s name, gender and party affiliation, as well as linguistic annotations of the transcripts, such as named entities and lemmas. The sessions in the corpora are marked as either belonging to the COVID-19 period (after 1 November 2019), or as ‘reference’ (before that date).
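The date-based division into a COVID-19 subcorpus and a reference subcorpus can be sketched as follows. This is a minimal illustration that uses only the cut-off date given above; it assumes that a session held on 1 November 2019 itself counts as part of the COVID-19 period, which the description leaves open, and it does not read the labels stored in the ParlaMint files. The function and label names are hypothetical.

```python
from datetime import date

# Cut-off date taken from the description of the dataset.
COVID_START = date(2019, 11, 1)


def subcorpus_for(session_date):
    """Return 'covid' or 'reference' for a session date.

    Assumption: the cut-off date itself belongs to the COVID-19 period.
    """
    return "covid" if session_date >= COVID_START else "reference"


def split_sessions(session_dates):
    """Group session dates by subcorpus label."""
    groups = {"covid": [], "reference": []}
    for session_date in session_dates:
        groups[subcorpus_for(session_date)].append(session_date)
    return groups


sessions = [date(2019, 10, 31), date(2019, 11, 1), date(2020, 3, 12)]
print(split_sessions(sessions))
```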
Views on CLARIN
Isabella Calabretta, Digital Product Manager at Cambridge University Press & Assessment
Courtney Dalton, MLIS student at Simmons University, Boston, Massachusetts
Richard Griscom, PhD, Postdoc, Centre for Linguistics, Leiden University
Marta Kołczyńska, PhD, Assistant Professor, Institute of Political Studies, Polish Academy of Sciences
Matej Klemen, Young Researcher, Faculty of Computer and Information Science, University of Ljubljana
Kristina Pahor de Maiti, Research Assistant, Faculty of Arts, University of Ljubljana
Ajda Pretnar Žagar, PhD, Researcher, Faculty of Computer and Information Science, University of Ljubljana, and Institute of Contemporary History