
Programme CLARIN Annual Conference 2024


CLARIN Annual Conference 2024


Date: Tuesday, 15 October 2024 - Thursday, 17 October 2024 (all times are CEST)
Location: Hotel Barceló Sants, Barcelona, Spain
Hashtag: #CLARIN2024


Conference Programme Outline


Pre-Conference (invite-only)
Monday 14 October 2024
All day K-centre Workshop MR.07
Day One (Tuesday 15 October 2024)
09:00 - 10:30

Technical Centres Assessment Committee (TCAC)

CLARIN National Coordinators' Forum (NCF) Part 1



10:30 - 11:00 Coffee Break  
11:00 - 13:00

CLARIN National Coordinators' Forum (NCF) Part 2

Technical Centres Committee (TCC)

User Involvement Committee (UIC)




13:00 - 14:00 Lunch Break (only for participants of pre-conference meetings)  
13:30 - 15:30 Photographer (for committee members) MR.15
14:00 - 15:30

CLARIN Standards and Interoperability Committee (SIC)

Knowledge Infrastructure Committee (KIC)

CLARIN Legal Issues Committee (CLIC)




  Workshop: CLARIN101 MR.07
  Building bridges with industry MR.01
15:30 - 16:00 Coffee Break  
16:00 - 16:15

Conference Opening Session

Steven Krauwer Award

16:15 - 17:00 Keynote by Maite Melero
17:00 - 18:00 Papers (Poster Format) MR.10+11
17:30 - 18:30 PhD Meet-Up lobby
19:30 - 22:00 Conference dinner Barceló Sants
Day Two (Wednesday 16 October 2024)
09:00 - 09:10 Presentation by Programme Committee Chair MR.09
09:10 - 09:15 Presentation by Local National Coordinator MR.09
09:15 - 10:00 Pitches by CLARIN Committees MR.09
10:00 - 10:30 State of the Technical Infrastructure MR.09
10:30 - 11:00 Coffee Break  
11:00 - 13:00 Abstract presentations: Resources and usage MR.09
13:00 - 13:45 Lunch  
13:45 - 14:45 PhD Poster Session MR.10+11
14:45 - 15:45 Abstract presentations: Education MR.09
15:45 - 16:15 Coffee Break  
16:15 - 17:35 Abstract presentations: Core CLARIN infrastructure MR.09
17:00 - 19:00 Photographer (for committee members) MR.15
17:35 - 19:00
19:30 - 22:30 Conference dinner Barceló Sants
Day Three (Thursday 17 October 2024)
09:00 - 10:20 Abstract presentations: Metadata MR.09
10:20 - 11:00 Group Photo and Coffee Break  
11:00 - 11:45 Keynote by Steven Bird MR.09
11:45 - 12:45 Abstract presentations: Centres and Resource Families
12:45 - 13:00 Closing Remarks and Best PhD Poster Award
13:00 - 14:00 Lunch  
14:00 - 17:00 SAB Meeting MR.03
14:00 - 17:00 Post-Conference Workshop: Comparable and Interoperable Corpora MR.08
14:00 - 17:00 Post-Conference Workshop: SSH Open Marketplace MR.07



Maite Melero

Barcelona Supercomputing Center

Tuesday 15 October, 16:15 - 17:00

The Future of Language (and Cultural) Diversity in the Age of AI

Steven Bird

Charles Darwin University

Thursday 17 October, 11:00 - 11:45

Making it Meaningful

Conference Programme Details


Pre-Conference (invite-only)

Time Monday 14 October 2024 Room
all day K-centre Workshop MR.07

Day One

Time Tuesday 15 October 2024 Room
09:00 - 10:30

Technical Centres Assessment Committee (TCAC)

CLARIN National Coordinators' Forum (NCF) Part 1



10:30 - 11:00 Coffee Break
11:00 - 13:00

CLARIN National Coordinators' Forum (NCF) Part 2

Technical Centres Committee (TCC)

User Involvement Committee (UIC)




13:00 - 14:00 Lunch Break  
13:30 - 15:30 Photographer (for committee members) MR.15
14:00 - 15:30

CLARIN Standards and Interoperability Committee (SIC)

Knowledge Infrastructure Committee (KIC)

CLARIN Legal Issues Committee (CLIC)





Workshop: CLARIN101 

The aim of this session is to introduce the research infrastructure to conference participants who are new to CLARIN or not yet familiar with it. After a brief introduction by one of its founders, Steven Krauwer, and an overview of the main services, participants will have the opportunity to ask questions and try out some of the services themselves. The workshop is organised by Steven Krauwer, Vincent Vandeghinste and Iulianna van der Lek.

Building bridges with industry

This session, the first industry track at a CLARIN Annual Conference, brings together academics, research infrastructure experts and representatives of Spanish industry specialising in AI and language technologies, as well as healthcare, customer support and telecommunications. Its objective is to build bridges between industrial R&D, academic research and existing resource infrastructures such as CLARIN ERIC. The industry experts will share their perspectives on the value of collaborating with academic institutions and research infrastructures to foster more effective transfer of knowledge, data and technology, in order to accelerate technological advancement and create broader societal benefits.
15:30 - 16:00 Coffee Break  
  Start of the Conference  
16:00 - 16:15

Conference Opening Session

Steven Krauwer Award

16:15 - 17:00

Keynote by Maite Melero

Chair: Vincent Vandeghinste

The Future of Language (and Cultural) Diversity in the Age of AI

In the rapidly evolving digital landscape, language stands at the intersection of technology and cultural identity. This talk explores the complex relationship between AI technologies and the world’s linguistic and cultural landscapes, emphasizing the critical role of machine translation and generative AI in shaping the future of language diversity. We will discuss the impact of globalization on language use and the effects that English as a lingua franca may have on the preservation of linguistic diversity.
As AI-driven tools like machine translation and language processing become more sophisticated, they offer unprecedented opportunities for minority languages, potentially enabling speakers to communicate globally without sacrificing their native tongues. However, as we will explore, these technologies also pose significant challenges, including the risk of reinforcing the dominance of major languages and, crucially, introducing cultural biases in AI models. These biases, which often go unnoticed, are rooted in the unequal representation of cultures in the training data and can contribute to the silent extinction of cultural diversity in the digital world.
We will also examine the implications of generative AI in the language marketplace, the evolving role of human translators, and the ethical considerations in developing inclusive and culturally aware AI systems. The talk will conclude by reflecting on the role of policymakers, researchers, and language communities in preserving our global linguistic and cultural heritage.
17:00 - 18:00

Papers (Poster Format)

Chair: Andreas Witt

Anda Baklāne - Text collections as data at the National Library of Latvia

Alessandro Tommasi, Cesare Zavattari and Valeria Quochi - REST services for Corpus management Annotation and SearcH

Elena Montiel-Ponsoda, Paula Diez-Ibarbia and Patricia Martín-Chozas - The Spanish INESData and TeresIA Projects as Potential Contributors to CLARIN

Tess Dejaeghere, Pranaydeep Singh, Els Lefever and Julie Birkholz - On the creation of multilingual NER and ASBA workflows for literary-historical texts with chat-based LLMs

Kaja Dobrovoljc - Can’t See the Forest for the Trees: Infrastructure for Investigating Slovene Dependency Treebanks

Magnus Ahltorp and Maria Skeppstedt - Word Rain as a Service

Maria Skeppstedt and Magnus Ahltorp - Using Topics2Themes and Word Rain to visualise topics in Swedish news on climate change

Kiril Simov and Petya Osenova - Modeling events in Bulgarian: a Case Study

Elena Callegari, Agnes Sólmundsdóttir and Anton Karl Ingason - Preserving Privacy in Small Communities: Tailored Anonymization Techniques for Icelandic Conversational Data

Jon Alkorta, Aritz Farwell, Joseba Fernandez de Landa, Begoña Altuna, Ainara Estarrona, Mikel Iruskieta, Xabier Arregi, Xabier Goenaga and Jose Mari Arriola - CLARIAH-EUS: A Strategic Network Helping Basque Country Researchers to Participate in European Research Infrastructures

Costanza Navarretta, Dorte Haltrup Hansen and Bart Jongejan - Enriching the ParlaMint-DK corpus with Policy Domains

Bojana Mikelenić, Antoni Oliver and Marko Tadić - Expansion of the RomCro corpus with texts in Catalan

Dragoș Alexandru Bălan, Khiet Truong, Henk van den Heuvel and Roeland Ordelman - Benchmarking and Research Infrastructures: Evaluating Dutch Automatic Speech Recognition

Anna Szczepaniak-Kozak and Magdalena Jaszczyk-Grzyb - Automatic analysis of covert hate speech: A case study with a focus on sentiment analysis

Sylvie De Cock, Gaëtanelle Gilquin, Sylviane Granger, Pauline Jadoulle and Magali Paquot - Different registers, same learners: Towards a multi-register corpus of learner English

Roberta Bianca Luzietti, Riccardo Del Gratta, Valeria Quochi, Roberta Ottaviani, Daniele Carpita and Monica Monachini - CLARIN in the Italian Open Science Cloud: Landscaping and Community Engagement

Kerim Meijer and Menzo Windhouwer - The CLARIAH-NL FAIR Vocabulary Registry

Eliza Margaretha Illig, Nils Diewald, Paweł Kamocki and Marc Kupietz - Managing Access to Language Resources in a Corpus Analysis Platform

Joel C. Wallenberg, Anton Karl Ingason, Einar Freyr Sigurðsson and Eiríkur Rögnvaldsson - IcePaHC 2024.03 -- A Significant Treebank Upgrade

Gustaf Gren - Towards a Swedish Sign Language Dataset With Pose Estimation Information: Process and Challenges

Xabier Goenaga, Aritz Farwell, Joseba Fernandez de Landa and Xabier Arregi - Constructing the CLARIAH-EUS CLARIN B-Centre: First Steps

Bence Sárossy and Noémi Ligeti-Nagy - How to talk the talk? A comparative overview of keyword usage in Hungarian and Slovenian parliamentary corpora

Angel Daza and Antske Fokkens - Choosing the Right Tool for You: Informed Evaluation of Text Analysis Tools

Jennifer Ecker, Stefan Fischer, Pia Schwarz, Thorsten Trippel, Antonina Werthmann and Rebecca Wilm - Unlocking the Corpus: Enriching Metadata with State-of-the-Art NLP Methodology and Linked Data

Ingunn Jóhanna Kristjánsdóttir, Hafsteinn Einarsson and Anton Karl Ingason - Improving Phrase Structure Parsing for Icelandic

17:30 - 18:30

PhD Meet-Up

An informal session for PhD students to meet and get to know each other. Information will be shared via email. 

19:30 - 22:00 Conference dinner Barceló Sants

Day Two

Time Wednesday 16 October 2024 Room
09:00 - 09:10 Presentation by Programme Committee Chair MR.09
09:10 - 09:15 Presentation by Local National Coordinator MR.09
09:15 - 10:00 Pitches by CLARIN Committees MR.09
10:00 - 10:30 State of the Technical Infrastructure MR.09
10:30 - 11:00 Coffee Break  
11:00 - 13:00

Abstract presentations: Resources and usage

Chair: Starkaður Barkarson

11:00 - 11:20 Tanja Wissik - An Infrastructural Approach to Terminology Work: The Case of Research Infrastructures  
11:20 - 11:40 Lilja Björk Stefánsdóttir and Anton Karl Ingason - Using the Icelandic Gigaword Corpus to Explain Lifespan Change  
11:40 - 12:00 Steinþór Steingrímsson, Einar Freyr Sigurðsson and Björn Halldórsson - Evaluating Capabilities of MT Systems in Translating Idiomatic Expressions Using a Specialized Dataset  
12:00 - 12:20 Begoña Altuna and Iker García-Ferrero - Prepare to be Amazed: NoticIA, the Spanish Clickbait Dataset Transforming the Way We Read News  
12:20 - 12:40 Tess Dejaeghere, Pranaydeep Singh, Els Lefever and Julie Birkholz - On the creation of multilingual NER and ASBA workflows for literary-historical texts with chat-based LLMs  
12:40 - 13:00 Magnus Ahltorp and Maria Skeppstedt - Word Rain as a Service  
13:00 - 13:45 Lunch
13:45 - 14:45

PhD Poster Session

Chair: Martin Wynne

14:45 - 15:45

Abstract presentations: Education

Chair: Tanja Wissik

14:45 - 15:05 Inguna Skadiņa, Jana Kuzmina, Sergejs Kruks, Marina Platonova, Tatjana Smirnova and Ilze Auzina - Language Technology Initiative - Bridging the Gap between Research and Education  
15:05 - 15:25 Nikola Ljubešić, Taja Kuzman, Ivana Filipović Petrović, Jelena Parizoska and Petya Osenova - CLASSLA-Express: a Train of CLARIN.SI Workshops on Language Resources and Tools with Easily Expanding Route  
15:25 - 15:45 Giulia Pedonese, Francesca Frontini, Dario Del Fante and Eleonora Federici - Adapting UPSKILLS learning modules to the university curricula: best practices and lesson learnt from the H2IOSC training experience at the University of Ferrara  
15:45 - 16:15 Coffee Break
16:15 - 17:35

Abstract presentations: Core CLARIN Infrastructure

Chair: Monica Monachini

16:15 - 16:35 Erik Körner, Thomas Eckart, Felix Helfer and Uwe Kretschmer - Federated Content Search: Advancing the Common Search Infrastructure  
16:35 - 16:55 Jakob Lenardič and Kristina Pahor de Maiti Tekavčič - The citation of language resource technologies in CLARIN  
16:55 - 17:15 Paweł Kamocki, Aleksei Kelli, Costanza Navarretta, Andrius Puksas, Mateja Jemec Tomazin, Benito Trollip and Silvia Calamai - New laws, new opportunities – the effect of the Digital Services Act and the Data Act on access to language data for research purposes  
17:15 - 17:35 Xabier Goenaga, Aritz Farwell, Joseba Fernandez de Landa and Xabier Arregi - Constructing the CLARIAH-EUS CLARIN B-Centre: First Steps  
17:00 - 19:00 Photographer (for committee members) MR.15
17:35 - 19:00


Chair: Costanza Navarretta

19:30 - 22:30 Conference dinner Barceló Sants

Day Three

Time Thursday 17 October 2024 Room
09:00 - 10:20

Abstract presentations: Metadata

Chair: Gunn Inger Lyse

09:00 - 09:20 Maarten Van Gompel - FAIR Tool Discovery: an automated software metadata harvesting pipeline for CLARIAH  
09:20 - 09:40 Nannan Liu and Mariachiara Russo - A core metadata schema for interpreting corpora: Implementation on the Unified Interpreting Corpus (UNIC) platform  
09:40 - 10:00 Daan Broeder and Jan Odijk - Vocabularies in CLARIN: Problems and suggested solutions  
10:00 - 10:20 Claus Zinn and Thorsten Trippel - On the Successful Migration of Research Data  
10:20 - 11:00 Group Photo and Coffee Break
11:00 - 11:45

Keynote by Steven Bird

Chair: Lars Borin

Making it Meaningful

Despite their manifold benefits, language technologies are contributing to several unfolding crises. Small screens deliver mainstream content across the world and entice children of minoritised communities away from their ancestral languages. Data centres power large language models and depend on the mining of ever more rare earth metals and the emission of ever more carbon. Malicious actors flood social media with fake news, provoking extremism, division, and even war. Common to these crises is content, i.e. language content, increasingly generated and accessed using language technologies. These crises – the language crisis, the environmental crisis, and the meaning crisis – compound each other in what is now being referred to as the metacrisis. How are we to respond, then, as a community of practice who is continuing to create language resources that enable the development of still more language technologies?

I believe that a good first step is to bring our awareness to the matter and to rethink what we are doing. We must be suspicious of purely technological solutions which may only exacerbate problems that were created by our use of technology. Instead, I argue that we should approach the problem as social and cultural, and reexamine our initiatives to develop "research infrastructure for language as social and cultural data". In this presentation I will share stories from a small and highly multilingual indigenous society who understands language not as sequence data but as social practice, and who understands language resources not as annotated text and speech but as the stories and knowledge practices carried by the country and by elders. I will explore ramifications for our work in the space of language resources and technologies, and suggest some ways forward that avoid extractive processes and centre speech communities, all the while making it meaningful.

11:45 - 12:45

Abstract presentations: Centres and Resource Families

Chair: Cristina Grisot

11:45 - 12:05 Tomaž Erjavec, Nikola Ljubešić, Katja Meden, Taja Kuzman, Cyprian Laskowski, Jan Jona Javoršek, Simon Krek, Mateja Jemec Tomazin and Jakob Lenardič - CLARIN.SI, the Slovenian node of CLARIN: ten years on  
12:05 - 12:25 Henk Van den Heuvel, Nicola Bessell, Katarzyna Klessa, Alice Lee, Satu Saalasti and Eric Sanders - A CLARIN Resource Family for Corpora of Communication Disorders  
12:25 - 12:45 Steven Coats - A Development Outlook for CLARIN’s Northernmost Center  
12:45 - 13:00 Best PhD Poster Award
Closing Remarks
13:00 - 14:00 Lunch
14:00 - 17:00 SAB Meeting MR.03
14:00 - 17:00 Post-Conference Workshop: Comparable and Interoperable Corpora
14:00 - 17:00 Post-Conference Workshop: SSH Open Marketplace

Time Friday 18 October 2024 Room
09:00 - 17:00

FCS Hackathon

The FCS Hackathon is a workshop that aims to give an in-depth introduction to Federated Content Search and to help participants develop their own FCS endpoints.

The CLARIN Federated Content Search (FCS) is a search engine and infrastructure that connects heterogeneous data collections and search engines hosted locally at CLARIN centres, and provides users with a uniform interface for discovering and searching language resources.

The FCS Hackathon will begin with a lecture on Federated Content Search, covering its design and goals, technologies, software ecosystem and infrastructure, and including a brief guide to endpoint development. An active hacking session follows, in which participants start developing their own FCS endpoints; experienced FCS developers will be on hand to help, advise and answer questions. The aim is for each participant to develop a new endpoint, gain the knowledge needed for further customization and development, and become part of the CLARIN FCS community.

Registration can be done via this form. Places are limited and will be allocated on a first-come, first-served basis.
