ParlaCLARIN IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora

Monday, 20 May 2024, all day

General information

Date: 20 May, 2024, at LREC-COLING 2024

Time: 09:00 - 18:00 CEST

Location: Lingotto Conference Centre, conference room Roma - Torino (Italy)

Twitter hashtag: #ParlaCLARIN

ParlaCLARIN IV Workshop

Parliamentary data is an important source of scholarly and socially relevant content, serving as a verified communication channel between elected political representatives and members of society. The development of accessible, comprehensive and well-annotated parliamentary corpora is therefore crucial for the information society, as such corpora help scientists and investigative journalists to ascertain the accuracy of socio-politically relevant information and to inform citizens about trends and insights drawn from such data explorations. Research-wise, parliamentary corpora are a quintessential resource for a number of disciplines in digital humanities and social sciences, such as political science, sociology, history, and (socio)linguistics.

The distinguishing characteristic of parliamentary data is that it is spoken language produced in controlled circumstances. Such data has traditionally been transcribed in a formal way but is now also increasingly transcribed with speech-to-text software as well as released in the original audio and video formats, which encourages resource and software development and provides research opportunities related to structuring, synchronisation, visualisation, querying and analysis of parliamentary corpora. Therefore, a harmonised approach to data curation practices for this type of data can support the advancement of the field significantly. One of the ways in which the research community is supported in this line of work is through the conversion of existing corpora and further development of new cross-national parliamentary corpora into a highly comparable, harmonised set of multilingual resources. These allow researchers to share comparative perspectives and to perform multidisciplinary research on parliamentary data. We envision that the ParlaCLARIN IV workshop, as a venue for knowledge and experience exchange on the topic, will contribute to the development and growth of the field of digital parliamentary science.

Objective

This fourth ParlaCLARIN workshop is a continuation of the 2018, 2020 and 2022 editions held at the respective LREC conferences, see references below. On the one hand, it continues to bring together developers, curators and researchers of regional, national and international parliamentary debates from across diverse disciplines in the Humanities and Social Sciences. On the other hand, we envisage the appearance of new discussion threads, tasks, and challenges that are partially inspired by or related to the new data releases such as ParlaMint and data formats such as ParlaCLARIN.

We have invited unpublished original work focusing on (but not limited to):

Compilation, annotation, visualisation and utilisation of historical or contemporary parliamentary written or audio records
Harmonisation of existing multilingual parliamentary resources, containing either synchronic or diachronic data or both
Linking or comparing of parliamentary records with other datasets of political discourse such as party manifestos, political speeches, political campaign debates, and social media posts, and to other sources of structured knowledge, such as formal ontologies and LOD datasets (in particular for the description of speakers, political parties, etc.)

Special themes for this year’s workshop are:

Enrichment of parliamentary proceedings (with e.g. sentiment annotation, political profiling of speakers etc.) and research using such data
Machine translation of parliamentary proceedings and research using such data
Argument mining of parliamentary debates

Apart from the dissemination of the results, the workshop also aims to address the identified obstacles, discuss open issues and coordinate future efforts in this increasingly trans-national and cross-disciplinary community.

Keynote

Ines Rehbein, University of Mannheim

Resources and Methods for Analysing Political Rhetoric and Framing in Parliamentary Debates
Recent work in political science has made extensive use of methods to produce evidential support for a variety of analyses, for example, inferring an actor’s ideological positions from textual data or identifying the polarisation of the political discourse over the last decades. Most work has employed variations of lexical features extracted from text or has learned latent representations in a mostly unsupervised manner. While such approaches have the potential to enable political analyses at scale, they are often limited by their lack of interpretability. In the talk, I will instead look at semantic and pragmatic representations of political rhetoric and ideological framing and present several case studies that showcase how linguistic annotation and the use of NLP methods can help to investigate different framing strategies in parliamentary debates. The first part of the talk investigates populist framing strategies, specifically, the use of pronouns to create in- and out-groups and the identification of people-centric messages. The second part of the presentation focusses on framing strategies on the pragmatic level.

Proceedings

The proceedings are available here.

Programme

9:00 - 9:10	Welcome and Introduction
9:10 - 10:30	ParlaMint
Parliamentary Discourse Research in Political Science: Literature Review. Jure Skubic and Darja Fišer
Compiling and Exploring a Portuguese Parliamentary Corpus: ParlaMint-PT. José Aires, Aida Cardoso, Rui Pereira and Amalia Mendes
Gender, Speech, and Representation in the Galician Parliament: An Analysis Based on the ParlaMint-ES-GA Dataset. Adina I. Vladu, Elisa Fernández Rei, Carmen Magariños and Noelia García Díaz
Bulgarian ParlaMint 4.0 corpus as a testset for Part-of-speech tagging and Named Entity Recognition. Petya Osenova and Kiril Simov
10:30 - 11:00	Coffee break
11:00 - 12:00	Keynote: Ines Rehbein
Resources and Methods for Analysing Political Rhetoric and Framing in Parliamentary Debates
12:00 - 12:40	Creation of Parliamentary Language Resources
PTPARL-V: Portuguese Parliamentary Debates for Voting Behaviour Study. Afonso Sousa and Henrique Lopes Cardoso
Polish Round Table Corpus. Maciej Ogrodniczuk, Ryszard Tuora and Beata Wójtowicz
12:40 - 14:00	Lunch
14:00 - 15:00	Analysis of Parliamentary Discourse
Investigating Multilinguality in the Plenary Sessions of the Parliament of Finland with Automatic Language Identification. Tommi Jauhiainen, Jussi Piitulainen, Erik Axelson, Ute Dieckmann, Mietta Lennes, Jyrki Niemi, Jack Rueter and Krister Lindén
Exploring Word Formation Trends in Written, Spoken, Translated and Interpreted European Parliament Data – A Case Study on Initialisms in English and German. Katrin Menzel
Quantitative Analysis of Editing in Transcription Process in Japanese and European Parliaments and its Diachronic Changes. Tatsuya Kawahara
15:00 - 15:40	Language Technology for Parliamentary Discourse
Automated Emotion Annotation of Finnish Parliamentary Speeches Using GPT-4. Otto Tarkka, Jaakko Koljonen, Markus Korhonen, Juuso Laine, Kristian Martiskainen, Kimmo Elo and Veronika Laippala
Making Parliamentary Debates More Accessible: Aligning Video Recordings with Text Proceedings in Open Parliament TV. Olivier Aubert and Joscha Jäger
15:40 - 16:00	Poster pitches
16:00 - 16:30	Coffee break
16:30 - 17:45	Poster session
Russia and Ukraine through the Eyes of ParlaMint 4.0: A Collocational CADS Profile of Spanish and British Parliamentary Discourses. Maria Calzada Perez
Multilingual Power and Ideology identification in the Parliament: a reference dataset and simple baselines. Çağrı Çöltekin, Matyáš Kopp, Katja Meden, Vaidas Morkevicius, Nikola Ljubešić and Tomaž Erjavec
IMPAQTS: a multimodal corpus of parliamentary and other political speeches in Italy (1946-2023), annotated with implicit strategies. Federica Cominetti, Lorenzo Gregori, Edoardo Lombardi Vallauri and Alessandro Panunzi
ParlaMint Ngram viewer: Multilingual Comparative Diachronic Search Across 26 Parliaments. Asher de Jong, Taja Kuzman, Maik Larooij and Maarten Marx
Investigating Political Ideologies through the Greek ParlaMint corpus. Maria Gavriilidou, Dimitris Gkoumas, Stelios Piperidis and Prokopis Prokopidis
ParlaMint in TEITOK. Maarten Janssen and Matyáš Kopp
Historical Parliamentary Corpora Viewer. Alenka Kavčič, Martin Stojanoski and Matija Marolt
The dbpedia R Package: An Integrated Workflow for Entity Linking (for ParlaMint Corpora). Christoph Leonhardt and Andreas Blaette
Video Retrieval System Using Automatic Speech Recognition for the Japanese Diet. Mikitaka Masuyama, Tatsuya Kawahara and Kenjiro Matsuda
One Year of Continuous and Automatic Data Gathering from Parliaments of European Union Member States. Ota Mikušek
Government and Opposition in Danish Parliamentary Debates. Costanza Navarretta and Dorte Haltrup Hansen
A new Resource and Baselines for Opinion Role Labelling in German Parliamentary Debates. Ines Rehbein and Simone Paolo Ponzetto
ParlaMint Widened: a European Dataset of Freedom of Information Act Documents (Position Paper). Gerda Viira, Maarten Marx and Maik Larooij
17:45 - 18:00	Closing remarks

Organising Committee

Programme Committee

Kaspar Beelen, The Alan Turing Institute, GB
Siddharth Bhargava, Fondazione Bruno Kessler, IT
Andreas Blaette, University of Duisburg-Essen, DE
Hajo Boomgaarden, University of Vienna, AT
Robert Borges, Uppsala University, SE
Çağrı Çöltekin, University of Tübingen, DE
Tomaž Erjavec, Dept. of Knowledge Technologies, Jožef Stefan Institute, SI
Francesca Frontini, Istituto di Linguistica Computazionale "A. Zampolli" - ILC Consiglio Nazionale delle Ricerche - CNR, IT
Maria Gavriilidou, ILSP / Athena RC, GR
Turo Hiltunen, University of Helsinki, FI
Pasi Ihalainen, University of Jyväskylä, FI
Tatsuya Kawahara, Kyoto University, JP
Haidee Kotze, Utrecht University, NL
Anna Kryvenko, NISS (Ukraine); INZ (Slovenia), UA
Cristina Lastres-López, University of Seville, ES
Bente Maegaard, University of Copenhagen, DK
Christian Mair, University of Freiburg, DE
Maarten Marx, University of Amsterdam, NL
Monica Monachini, Institute of Computational Linguistics "A. Zampolli" - CNR, IT
Jan Odijk, Utrecht University, NL
Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences, PL
Petya Osenova, Sofia University "St. Kl. Ohridski" and IICT-BAS, BG
Stelios Piperidis, Athena RC/ILSP, GR
Maria Pontiki, Institute for Language and Speech Processing (ILSP), Athena R.C., GR
Simone Paolo Ponzetto, University of Mannheim, DE
Valeria Quochi, Consiglio Nazionale delle Ricerche. Istituto di Linguistica Computazionale "A. Zampolli", IT
Hugo Sanjurjo-González, University of Deusto, ES
Sara Tonelli, FBK, IT
Turo Vartiainen, University of Helsinki, FI
Tanja Wissik, Austrian Academy of Sciences, AT

Invited speaker

Ines Rehbein, University of Mannheim Data and Web Science Group, DE

The workshop is supported by the CLARIN ERIC research infrastructure.

To contact the organisers, please email parlaclarin [at] clarin.eu (Subject: [ParlaCLARIN@LREC2024]).