CLARIN at LREC-COLING 2024

General Information

Date 

  • 22-23-24 May 2024: Main Conference
  • 20-21-25 May 2024: Workshops & Tutorials
Location

About

LREC is the major event on Language Resources (LRs) and Evaluation for Human Language Technologies (HLT). The conference provides an overview of the state of the art regarding LRs and their applications. Participants can exchange information and discuss methodologies, industrial use cases, and requirements coming from e-science and e-society, with respect to scientific and technological issues as well as policy and organisational ones.

CLARIN-related activities at LREC-COLING 2024


Contributions to the Main Conference


Workshops

ParlaCLARIN IV Workshop  – organised by CLARIN ERIC

Monday 20 May, from 9:00 to 18:00

The ParlaCLARIN IV workshop at LREC-COLING 2024 will focus on the topic of ‘Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora’. Parliamentary (language) data serves as a communication channel between elected political representatives and members of society, thus reflecting socio-politically relevant information. The development of accessible, comprehensive and well-annotated parliamentary corpora is crucial for a number of disciplines, such as political science, sociology, history, and (socio)linguistics. The workshop will bring together developers, curators and researchers of regional, national and international parliamentary debates from across diverse disciplines in the humanities and social sciences.

Holocaust Testimonies as Language Resources – co-organised by Isuri Anuradha, Ingo Frommholz, Francesca Frontini, Martin Wynne, Ruslan Mitkov, Paul Rayson, Alistair Plum

Holocaust testimonies serve as a bridge between survivors and history’s darkest chapters, providing a connection to the profound experiences of the past. Testimonies stand as the primary source of information that describes the Holocaust, offering first-hand accounts and personal narratives of those who experienced it. The majority of testimonies are captured in an oral format, as survivors vividly explain and share their personal experiences and observations from that time period. Transforming Holocaust testimonies into a machine-processable digital format can be a difficult task owing to the unstructured nature of the text. The creation of accessible, comprehensive, and well-annotated Holocaust testimony collections is of paramount importance to our society. These collections empower researchers and historians to validate the accuracy of socially and historically significant information, enabling them to share critical insights and trends derived from these data. This workshop will investigate a number of ways in which techniques and tools from natural language processing and corpus linguistics can contribute to the exploration, analysis, dissemination and preservation of Holocaust testimonies.

2nd International Workshop Towards Digital Language Equality (TDLE): Focusing on Sustainability – co-organised by Federico Gaspari, Joss Moorkens, Itziar Aldabe, Begoña Altuna, Aritz Farwell, Stelios Piperidis, Georg Rehm and German Rigau

The 2nd International Workshop Towards Digital Language Equality (TDLE): Focusing on Sustainability will be held as a half-day hybrid event co-located with LREC-COLING 2024 at the Lingotto Conference Centre in Turin, Italy, on the afternoon of Saturday 25th May 2024. At least one author of each accepted paper is expected to attend the workshop in person at the conference venue to give the presentation face-to-face. Regular participants who are not presenting papers may attend online. Please note that it is possible to attend the 2nd International TDLE Workshop without attending or registering for the main conference.

The Fifth Workshop on Resources for African Indigenous Languages – RAIL – co-organised by Rooweither Mabuya, Muzi Matfunjwa, Mmasibidi Setaka, Menno van Zaanen

The fifth Resources for African Indigenous Languages (RAIL) workshop will be co-located with LREC-COLING 2024 at the Lingotto Conference Centre, Torino, Italy, on 25 May 2024. The RAIL workshop is an interdisciplinary platform for researchers working on resources (data collections, tools, etc.) specifically targeted towards African indigenous languages. In particular, it aims to create the conditions for the emergence of a scientific community of practice that focuses on data, as well as computational linguistic tools specifically designed for or applied to indigenous languages found in Africa.
 

Oral and Poster Presentations

Day 1, Wednesday 22 May
 
11:00-12:40, Session D1-S2-P1: Speech Resources and Processing I. Chair: Sara Tonelli. Room: Poster Area I
Samrómur Milljón: An ASR Corpus of One Million Verified Read Prompts in Icelandic (Carlos Daniel Hernandez Mena, Þorsteinn Daði Gunnarsson and Jon Gudnason)
11:00-12:40, Session D1-S2-P1: Discourse and Pragmatics. Chair: Sara Tonelli. Room: Poster Area I

European Language Grid: One year after (Georg Rehm, Stelios Piperidis, Dimitris Galanis, Penny Labropoulou, Maria Giagkou, Miltos Deligiannis, Leon Voukoutis, Martin Courtois, Julian Moreno-Schneider and Katrin Marheinecke)

Developing a Rhetorical Structure Theory Treebank for Czech (Lucie Polakova, Jiří Mírovský, Šárka Zikánová and Eva Hajicova)

Announcing the Prague Discourse Treebank 3.0 (Pavlína Synková, Jiří Mírovský, Lucie Poláková and Magdaléna Rysová)

Cost-Effective Discourse Annotation in the Prague Czech–English Dependency Treebank (Jiří Mírovský, Pavlína Synková, Lucie Polakova and Marie Paclíková)

Universal Anaphora: The First Three Years (Massimo Poesio, Maciej Ogrodniczuk, Vincent Ng, Sameer Pradhan, Juntao Yu, Nafise Sadat Moosavi, Silviu Paun, Amir Zeldes, Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský and Daniel Zeman)

DiscoGeM 2.0: A Parallel Corpus of English, German, French and Czech Implicit Discourse Relations (Frances Yung, Merel Scholman, Sarka Zikanova and Vera Demberg)

 

Session D1-S2-P1: CL and Linguistic Theories, Cognitive Modeling and Psycholinguistics I. Chair: Sara Tonelli. Room: Poster Area I (Pavilion 1 - Lingotto Fiere)
Fine-grained Classification of Circumstantial Meanings within the Prague Dependency Treebank Annotation Scheme (Marie Mikulova)
11:00-12:40, Session D1-S2-P1: Policy issues, Ethics, Legal Issues, Bias Analysis. Chair: Sara Tonelli. Room: Poster Area I
 
Common European Language Data Space (Gkoumas, Annika Grützner-Zahn, Athanasia Kolovou, Penny Labropoulou, Andis Lagzdiņš, Elena Leitner, Valérie Mapelli, Hélène Mazo, Simon Ostermann, Stefania Racioppa, Mickaël Rigault and Leon Voukoutis)
17:30-19:10, Session D1-S3-P3: Document Classification, Information Retrieval and Cross-lingual Retrieval. Chair: François Yvon. Room: Poster Area I
CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation (Nikola Ljubešić and Taja Kuzman)
 
17:30-19:10, Session D1-S3-P3: Inference, Reasoning, Question Answering II. Chair: François Yvon. Room: Poster Area I
Jargon: A Suite of Language Models and Evaluation Tasks for French Specialized Domains (Vincent Segonne, Aidan Mannion, Laura Cristina Alonzo Canul, Alexandre Daniel Audibert, Xingyu Liu, Cécile Macaire, Adrien Pupier, Yongxin Zhou, Mathilde Aguiar, Felix E. Herron, Magali Norré, Massih R Amini, Pierrette Bouillon, Iris Eshkol-Taravella, Emmanuelle Esperança-Rodier, Thomas François, Lorraine Goeuriot, Jérôme Goulian, Mathieu Lafourcade, Benjamin Lecouteux, François Portet, Fabien Ringeval, Vincent Vandeghinste, Maximin Coavoux, Marco Dinarelli and Didier Schwab)
15:50-17:10, Session: D1-S3-P2: Speech Resources and Processing II. Chair: Eva Maria Vecchi. Room: Poster Area II

Gos 2: A New Reference Corpus of Spoken Slovenian (Darinka Verdonik, Kaja Dobrovoljc, Tomaž Erjavec and Nikola Ljubešić)

Ensembles of Hybrid and End-to-End Speech Recognition (Aditya Kamlesh Parikh, Louis ten Bosch and Henk van den Heuvel)

15:50-17:10, Session: D1-S3-P2 - Opinion & Argument Mining, Sentiment Analysis, Emotion Recognition/Generation I. Chair: Eva Maria Vecchi. Room: Poster Area II
The ParlaSent Multilingual Training Dataset for Sentiment Identification in Parliamentary Proceedings (Michal Mochtak, Peter Rupnik and Nikola Ljubešić)
17:30 - 17:50, Session 10D1-S4-R4: Speech Resources and Processing II. Chair: Jan Odijk. Room: Istanbul
Corpus Creation and Automatic Alignment of Historical Dutch Dialect Speech (Martijn Bentum, Eric Sanders, Antal P.J. van den Bosch, Douwe Zeldenrust and Henk van den Heuvel)
17:50 - 18:10, Session D1-S4-R6: Integrated Systems and Applications. Chair: Chris Biemann. Room: Berlino
MedMT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain (Iker García-Ferrero, Rodrigo Agerri, Aitziber Atutxa Salazar, Elena Cabrio, Iker de la Iglesia, Alberto Lavelli, Bernardo Magnini, Benjamin Molinet, Johana Ramirez-Romero, German Rigau, Jose Maria Villa-Gonzalez, Serena Villata and Andrea Zaninello)
Day 2, Thursday 23 May
 
09:00-10:40, A Virtual Patient Dialogue System Based on Question-Answering on Clinical Records (Janire Arana, Mikel Idoyaga, Maitane Urruela, Elisa Espina, Aitziber Atutxa Salazar and Koldo Gojenola)
09:00-10:40, Session D2-S1-P4: Lexicon and Semantics I. Chair: Atul Kr. Ojha. Room: Poster Area II
MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations (Dagmar Gromann, Hugo Goncalo Oliveira, Lucia Pitarch, Elena-Simona Apostol, Jordi Bernad, Eliot Bytyçi, Chiara Cantone, Sara Carvalho, Francesca Frontini, Radovan Garabik, Jorge Gracia, Letizia Granata, Fahad Khan, Timotej Knez, Penny Labropoulou, Chaya Liebeskind, Maria Pia di Buono, Ana Ostroški Anić, Sigita Rackevičienė, Ricardo Rodrigues, Gilles Sérasset, Linas Selmistraitis, Mahammadou Sidibé, Purificação Silvano, Blerina Spahiu, Enriketa Sogutlu, Ranka Stanković, Ciprian-Octavian Truică, Giedrė Valūnaitė Oleškevičienė, Slavko Zitnik and Katerina Zdravkova)
09:00-10:40, Session D2-S1-P4: Offensive and Harmful Language Detection and Analysis. Chair: Atul Kr. Ojha. Room: Poster Area II
Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation (Jaione Bengoetxea, Yi-Ling Chung, Marco Guerini and Rodrigo Agerri)
11:00-12:40, Session D2-S2-P5: Parsing, Tagging, Chunking, Grammar, Syntax, Morphosyntax, Morphology. Chair: Enrica Troiano. Room: Poster Area I

Evaluating Shortest Edit Script Methods for Contextual Lemmatization (Olia Toporkov and Rodrigo Agerri)

A Computational Model of Latvian Morphology (Peteris Paikens, Lauma Pretkalniņa and Laura Rituma)

OOVs in the Spotlight: How to Inflect them? (Tomáš Sourada, Jana Straková and Rudolf Rosa)

15:30-17:10, Session D2-S3-P6: Corpora and Annotation III. Chair: Maja Buljan. Room: Poster Area II

FAIRification of LeiLanD (Eric Sanders, Sara Petrollino, Gilles R. Scheifer, Henk van den Heuvel and Christopher Handy)

M2SA: Multimodal and Multilingual Model for Sentiment Analysis of Tweets (Gaurish Thakkar, Sherzod Hakimov and Marko Tadić)

16:50 - 17:10, Session D2-S3-R5: Parsing, Tagging, Chunking, Grammar, Syntax, Morphosyntax, Morphology I. Chair: Kaja Dobrovoljc. Room: Madrid

 

UDMorph: Morphosyntactically Tagged UD Corpora (Maarten Janssen)

17:30 - 19:10, D2-S4-P7: Less Resourced/Endangered/Less-studied Languages II. Chair: Frederic Bechet. Room: Poster Area I
BalsuTalka.lv - Boosting the Common Voice Corpus for Low-Resource Languages (Roberts Dargis, Arturs Znotins, Ilze Auzina, Baiba Saulite, Sanita Reinsone, Raivis Dejus, Antra Klavinska and Normunds Gruzitis)
D2-S3-P6 - Lexicon and Semantics II (Chair: Maja Buljan) Room: Poster Area II 

Textual Coverage of Eventive Entries in Lexical Semantic Resources (Eva Fučíková, Cristina Fernández Alcaina, Jan Hajič and Zdeňka Urešová)

Exploring Interpretability of Independent Components of Word Embeddings with Automated Word Intruder Test (Tomáš Musil and David Mareček)

18:30 - 18:50, Session D2-S4-R2: Information Extraction, Knowledge Extraction, and Text Mining II. Chair: Elena Cabrio. Room: 500
Event Extraction in Basque: Typologically motivated Cross-Lingual Transfer-Learning Analysis (Mikel Zubillaga, Oscar Sainz, Ainara Estarrona, Oier Lopez de Lacalle and Eneko Agirre)
Day 3, Friday 24 May
 
09:00 - 10:40, Session D3-S1-P8: Corpora and Annotation VI. Chair: Andreas Witt. Room: Poster Area II
SUK 1.0: A New Training Corpus for Linguistic Annotation of Modern Standard Slovene (Špela Arhar Holdt, Jaka Čibej, Kaja Dobrovoljc, Tomaž Erjavec, Polona Gantar, Simon Krek, Tina Munda, Nejc Robida, Luka Terčon and Slavko Zitnik)
Session D3-S2-P9: Information Extraction, Knowledge Extraction, and Text Mining III. Chair: Samia Touileb. Room: Poster Area I
Automatic Extraction of Language-Specific Biomarkers of Healthy Aging In Icelandic (Elena Callegari, Iris Edda Nowenstein, Ingunn Jóhanna Kristjánsdóttir and Anton Karl Ingason)
11:00 - 12:40, D3-S2-P9: Multilinguality, Machine Translation, and Translation Aids I. Chair: Samia Touileb. Room: Poster Area I

Evaluating Word Expansion for Multilingual Sentiment Analysis of Parliamentary Speech (Yana Nikolova and Costanza Navarretta)

Charles Translator: A Machine Translation System between Ukrainian and Czech (Martin Popel, Lucie Polakova, Michal Novák, Jindřich Helcl, Jindřich Libovický, Pavel Straňák, Tomas Krabac, Jaroslava Hlavacova, Mariia Anisimova and Tereza Chlanova)

Session D3-S2-R6: Parsing, Tagging, Chunking, Grammar, Syntax, Morphosyntax, Morphology II. Chair: Daniel Zeman. Room: Berlino
PaReNT (Parent Retrieval Neural Tool): A Deep Dive into Word Formation Across Languages (Emil Svoboda and Magda Sevcikova)
12:00 - 12:20, Session D3-S2-R1: Corpora and Annotation VII. Chair: Rémi Cardon. Room: Auditorium G. Agnelli
A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset (Giulia Pensa, Begoña Altuna and Itziar Gonzalez-Dios)
15:50 - 17:10, Session D3-S3-P10: Corpora and Annotation V. Chair: Valentin Barriere. Room: Poster Area II
Towards an Ideal Tool for Learner Error Annotation (Špela Arhar Holdt, Tomaž Erjavec, Iztok Kosem and Elena Volodina)
18:30 - 18:50, Session D2-S4-R6: Policy issues, Ethics, Legal Issues, Bias Analysis. Chair: Penny Labropoulou. Room: Berlino
Gendered Grammar or Ingrained Bias? Exploring Gender Bias in Icelandic Language Models (Steinunn Rut Friðriksdóttir and Hafsteinn Einarsson)
12:40 - 13:20, Session D3-S2-RE14: Less-Resourced/Endangered/Less-studied Languages II. Chair: *TBD*. Zoom: Link14 - Virtual Room2
The IgboAPI Dataset: Empowering Igbo Language Technologies through Multi-dialectal Enrichment (Chris Chinenye Emezue, Ifeoma Okoh, Chinedu Emmanuel Mbonu, Chiamaka Chukwuneke, Daisy Monika Lal, Ignatius Ezeani, Paul Rayson, Ijemma Onwuzulike, Chukwuma Onyebuchi Okeke, Gerald Okey Nweya, Bright Ikechukwu Ogbonna, Chukwuebuka Uchenna Oraegbunam, Esther Chidinma Awo-Ndubuisi and Akudo Amarachukwu Osuagwu)
Session D3-S3-P10: Evaluation and Validation Methodologies III. Chair: Valentin Barriere. Room: Poster Area II (Pavilion 1)
How Gender Interacts with Political Values: A Case Study on Czech BERT Models (Adnan Al Ali and Jindřich Libovický)
15:50 - 17:10, Session: D3-S3-P10 - Corpora and Annotation V. Chair: Valentin Barriere. Room: Poster Area II
Building an Infrastructure for Uniform Meaning Representations

Contributions to Co-located Events


Oral and Poster Presentations at Co-located Workshops

Monday 20 May
  • Attitudes in Diplomatic Speeches: Introducing the CoDipA UNSC 1.0 (Mariia Anisimova and Šárka Zikánová; Accepted at Interoperable Semantic Annotation (ISA))

  • 9:10 - 10:30 Parliamentary Discourse Research in Political Science: Literature Review (Jure Skubic and Darja Fišer; Accepted at Interoperable Semantic Annotation (ISA))
  • 9:10 - 10:30 Bulgarian ParlaMint 4.0 corpus as a testset for Part-of-speech tagging and Named Entity Recognition (Petya Osenova and Kiril Simov; Accepted at ParlaCLARIN IV)
  • 9:10 - 10:30 Gender, Speech, and Representation in the Galician Parliament: An Analysis Based on the ParlaMint-ES- Dataset (Adina Vladu, Elisa Fernández Rei, Carmen Magariños and Noelia García Díaz; Accepted at ParlaCLARIN IV)
  • 14:00 - 14:20 Investigating Multilinguality in the Plenary Sessions of the Parliament of Finland with Automatic Language Identification (Tommi Jauhiainen, Jussi Piitulainen, Erik Axelson, Ute Dieckmann, Mietta Lennes, Jyrki Niemi, Jack Rueter and Krister Lindén; Accepted at ParlaCLARIN IV)
  • 16:30 - 17:45 Multilingual Power and Ideology identification in the Parliament: a reference dataset and simple baselines (Çağrı Çöltekin, Matyáš Kopp, Katja Meden, Vaidas Morkevicius, Nikola Ljubešić and Tomaž Erjavec; Accepted at ParlaCLARIN IV)
  • 16:30 - 17:45 ParlaMint Ngram viewer: Multilingual Comparative Diachronic Search Across 26 Parliaments (Asher de Jong, Taja Kuzman, Maik Larooij and Maarten Marx; Accepted at ParlaCLARIN IV)
  • ParlaMint in TEITOK (Maarten Janssen and Matyáš Kopp; Accepted at ParlaCLARIN IV)

  • 16:30 - 17:45 Government and Opposition in Danish Parliamentary Debates (Costanza Navarretta and Dorte H. Hansen; Accepted at ParlaCLARIN IV)
  • 16:30 - 17:45 Investigating Political Ideologies through the Greek ParlaMint corpus (Maria Gavriilidou, Dimitris Gkoumas, Stelios Piperidis and Prokopis Prokopidis)
  • 11:20 - 11:40 Improving Language Coverage on HeLI-OTS (Tommi Jauhiainen and Krister Lindén; Accepted at SIGUL 2024 workshop)
  • 11:30 - 13:00 Less is Enough: Less-Resourced Multilingual AMR Parsing (Bram Vanroy; Accepted at Interoperable Semantic Annotation (ISA))
  • 15:30-15:45 Exploring aspect-based sentiment analysis methodologies for literary-historical research purposes (Tess Dejaeghere, Els Lefever, Pranaydeep Singh and Julie Birkholz; Accepted at LT4HALA workshop)
  • 15:45-16:00 Early Modern Dutch Comedies and Farces in the Spotlight: Introducing EmDComF and its Emotion Framework (Florian Debaene, Kornee van der Haven and Veronique Hos; Accepted at LT4HALA workshop)
  • 16:50 - 17:10 At the Crossroad of Cuneiform and NLP: Challenges for Fine-grained Part-of-speech Tagging (Gustav Ryberg Smidt, Els Lefever and Katrien De Graef; Accepted at DH and Cultural Heritage)
Tuesday 21 May
 
  • Quality and Quantity of Machine Translation References for Automatic Metrics (Vilém Zouhar and Ondřej Bojar; Accepted at Workshop on Human Evaluation of NLP Systems)
  • 11:40-12:00 Uncovering Social Changes of the Basque Speaking Twitter Community During COVID-19 Pandemic (Joseba Fernandez de Landa, Iker García-Ferrero, Ander Salaberria and Jon Ander Campos; Accepted at SIGUL 2024 workshop)
  • Evaluating Summarization Models: investigating the impact of education and language proficiency on reproducibility (Mateusz Lango, Patricia Schmidtova, Simone Balloccu and Ondrej Dusek; Accepted at Workshop on Human Evaluation of NLP Systems)
  • 17:00-17:15 The Simplification of the Language of Public Administration: The Case of Ombudsman Institutions (Gabriel Gonzalez-Delgado and Borja Navarro-Colorado; Accepted at DeTermIt)
  • The 5th International Workshop on Designing Meaning Representations

Keynote Speaker

"What Is Not in the Archives": Early Holocaust Testimony as Research Data (Michal Frankl; Invited at Holocaust Testimonies as Language Resources)

Saturday 25 May

Keynote Speaker


CLARIN Booth at LREC-COLING 2024

CLARIN will be present with a booth on 22, 23 and 24 May. Visit us to get to know CLARIN better, talk to people from the CLARIN network, or browse through our latest publications.
 
Booth Attendance Schedule
 
  Wednesday 22 Thursday 23  Friday 24

10:40 – 11:00

Morning coffee break

Paul Rayson
CLARIN Ambassador

Costanza Navarretta
National Coordinator

Dedicated to the paper Evaluating Word Expansion for Multilingual Sentiment Analysis of Parliamentary Speech (Yana Nikolova and Costanza Navarretta)

 

Petya Osenova

Dedicated to the paper Bulgarian ParlaMint 4.0 corpus as a testset for Part-of-speech tagging and Named Entity Recognition (Petya Osenova and Kiril Simov)

Henk van den Heuvel
Member of the Board of Directors
13:20 - 14:40 
 
Lunch
   
 

17:10 – 17:30

Afternoon coffee break

Maria Gavriilidou

Dedicated to the poster 'Investigating Political Ideologies through the Greek ParlaMint corpus' (Maria Gavriilidou, Dimitris Gkoumas, Stelios Piperidis and Prokopis Prokopidis)

 

Henk van den Heuvel
Member of the Board of Directors