CLARIN Annual Conference 2024
Conference Programme Outline
All day | K-centre Workshop | MR.07 |
9:00 - 10:30 | Technical Centres Assessment Committee (TCAC); CLARIN National Coordinators' Forum (NCF) Part 1 | MR.06; MR.05 |
10:30 - 11:00 | Coffee Break | |
11:00 - 13:00 | CLARIN National Coordinators' Forum (NCF) Part 2; Technical Centres Committee (TCC); User Involvement Committee (UIC) | MR.05; MR.06; MR.08 |
13:00 - 14:00 | Lunch Break (only for participants of pre-conference meetings) | |
13:30 - 15:30 | Photographer (for committee members) | MR.15 |
14:00 - 15:30 | CLARIN Standards and Interoperability Committee (SIC); Knowledge Infrastructure Committee (KIC); CLARIN Legal Issues Committee (CLIC) | MR.06; MR.05; MR.08 |
Workshop: CLARIN101 | MR.07 | |
Building bridges with industry | MR.01 | |
15:30 - 16:00 | Coffee Break | |
16:00 - 16:15 | Conference Opening Session; Steven Krauwer Award | MR.09 |
16:15 - 17:00 | Keynote by Maite Melero | MR.09 |
17:00 - 18:00 | Papers (Poster Format) | MR.10+11 |
17:30 - 18:30 | PhD Meet-Up | lobby |
19:30 - 22:00 | Conference dinner | Barceló Sants |
09:00 - 09:10 | Presentation by Programme Committee Chair | MR.09 |
09:10 - 09:15 | Presentation by Local National Coordinator | MR.09 |
09:15 - 10:00 | Pitches by CLARIN Committees | MR.09 |
10:00 - 10:30 | State of the Technical Infrastructure | MR.09 |
10:30 - 11:00 | Coffee Break | |
11:00 - 13:00 | Abstract presentations: Resources and usage | MR.09 |
13:00 - 13:45 | Lunch | |
13:45 - 14:45 | PhD Poster Session | MR.10+11 |
14:45 - 15:45 | Abstract presentations: Education | MR.09 |
15:45 - 16:15 | Coffee Break | |
16:15 - 17:35 | Abstract presentations: Core CLARIN infrastructure | MR.09 |
17:00 - 19:00 | Photographer (for committee members) | MR.15 |
17:35 - 19:00 | Bazaar | MR.10+11 |
19:30 - 22:30 | Conference dinner | Barceló Sants |
09:00 - 10:20 | Abstract presentations: Metadata | MR.09 |
10:20 - 11:00 | Group Photo and Coffee Break | |
11:00 - 11:45 | Keynote by Steven Bird | MR.09 |
11:45 - 12:45 | Abstract presentations: Centres and Resource Families | MR.09 |
12:45 - 13:00 | Closing Remarks; Award Best PhD Poster | MR.09 |
13:00 - 14:00 | Lunch | |
14:00 - 17:00 | SAB Meeting | MR.03 |
14:00 - 17:00 | Post-Conference Workshop: Comparable and Interoperable Corpora; Post-Conference Workshop: SSH Open Marketplace | MR.08; MR.07 |
Keynotes
Maite Melero | Barcelona Supercomputing Center | Tuesday 15 October, 16:15 - 17:00 | The Future of Language (and Cultural) Diversity in the Age of AI |
Steven Bird | Charles Darwin University | Thursday 17 October, 11:00 - 11:45 | Making it Meaningful |
Conference Programme Details
Pre-Conference (invite-only)
Time | Monday 14 October 2024 | Room |
all day | K-centre Workshop | MR.07 |
Day One
Time | Tuesday 15 October 2024 | Room |
9:00 - 10:30 | Technical Centres Assessment Committee (TCAC); CLARIN National Coordinators' Forum (NCF) Part 1 | MR.06; MR.05 |
10:30 - 11:00 | Coffee Break | |
11:00 - 13:00 | CLARIN National Coordinators' Forum (NCF) Part 2; Technical Centres Committee (TCC); User Involvement Committee (UIC) | MR.05; MR.06; MR.08 |
13:00 - 14:00 | Lunch Break | |
13:30 - 15:30 | Photographer (for committee members) | MR.15 |
14:00 - 15:30 | CLARIN Standards and Interoperability Committee (SIC); Knowledge Infrastructure Committee (KIC); CLARIN Legal Issues Committee (CLIC) | MR.06; MR.05; MR.08 |
Workshop: CLARIN101
This session introduces the research infrastructure to conference participants who are new to CLARIN or not yet familiar with it. After a brief introduction by one of its founders, Steven Krauwer, and an overview of the main services, participants will have the opportunity to ask questions and try some of the services themselves. The workshop is organised by Steven Krauwer, Vincent Vandeghinste and Iulianna van der Lek.
|
MR.07 | |
Building bridges with industry
This first industry track session at a CLARIN Annual Conference brings together academics, research infrastructure experts and industry representatives from the Spanish industry landscape specialising in AI and language technologies, as well as healthcare, customer support and telecommunications. The objective of the session is to build bridges between industrial R&D, academic research and existing resource infrastructures such as CLARIN ERIC. The industry experts will give their perspectives on the value of collaborating with academic institutions and research infrastructures to foster more effective knowledge, data and technology transfer, in order to accelerate technological advancement and create broader societal benefits.
|
MR.01 | |
15:30 - 16:00 | Coffee Break | |
Start of the Conference | ||
16:00 - 16:15 | Conference Opening Session; Steven Krauwer Award | MR.09 |
16:15 - 17:00 |
Keynote by Maite Melero
Chair: Vincent Vandeghinste
The Future of Language (and Cultural) Diversity in the Age of AI
In the rapidly evolving digital landscape, language stands at the intersection of technology and cultural identity. This talk explores the complex relationship between AI technologies and the world’s linguistic and cultural landscapes, emphasizing the critical role of Machine Translation and generative AI in shaping the future of language diversity. We will discuss the impact of globalization on language use and the effects that English as a lingua franca may have on the preservation of linguistic diversity.
As AI-driven tools like machine translation and language processing become more sophisticated, they offer unprecedented opportunities for minority languages, potentially enabling speakers to communicate globally without sacrificing their native tongues. However, as we will explore, these technologies also pose significant challenges, including the risk of reinforcing the dominance of major languages and, crucially, introducing cultural biases in AI models. These biases, which often go unnoticed, are rooted in the unequal representation of cultures in the training data and can contribute to the silent extinction of cultural diversity in the digital world. We will also examine the implications of generative AI in the language marketplace, the evolving role of human translators, and the ethical considerations in developing inclusive and culturally aware AI systems. The talk will conclude by reflecting on the role of policymakers, researchers, and language communities in preserving our global linguistic and cultural heritage. |
MR.09 |
17:00 - 18:00 |
Papers (Poster Format)
Chair: Andreas Witt
Anda Baklāne - Text collections as data at the National Library of Latvia
Alessandro Tommasi, Cesare Zavattari and Valeria Quochi - REST services for Corpus management Annotation and SearcH
Elena Montiel-Ponsoda, Paula Diez-Ibarbia and Patricia Martín-Chozas - The Spanish INESData and TeresIA Projects as Potential Contributors to CLARIN |
MR.10+11 |
17:30 - 18:30 |
PhD Meet-Up
An informal session for PhD students to meet and get to know each other. Information will be shared via email. |
lobby |
19:30 - 22:00 | Conference dinner | Barceló Sants |
Day Two
Time | Wednesday 16 October 2024 | Room |
09:00 - 09:10 | Presentation by Programme Committee Chair | MR.09 |
09:10 - 09:15 | Presentation by Local National Coordinator | MR.09 |
09:15 - 10:00 | Pitches by CLARIN Committees | MR.09 |
10:00 - 10:30 | State of the Technical Infrastructure | MR.09 |
10:30 - 11:00 | Coffee Break | |
11:00 - 13:00 | Abstract presentations: Resources and usage. Chair: Starkaður Barkarson | MR.09 |
11:00 - 11:20 | Tanja Wissik - An Infrastructural Approach to Terminology Work: The Case of Research Infrastructures | |
11:20 - 11:40 | Lilja Björk Stefánsdóttir and Anton Karl Ingason - Using the Icelandic Gigaword Corpus to Explain Lifespan Change | |
11:40 - 12:00 | Steinþór Steingrímsson, Einar Freyr Sigurðsson and Björn Halldórsson - Evaluating Capabilities of MT Systems in Translating Idiomatic Expressions Using a Specialized Dataset | |
12:00 - 12:20 | Begoña Altuna and Iker García-Ferrero - Prepare to be Amazed: NoticIA, the Spanish Clickbait Dataset Transforming the Way We Read News | |
12:20 - 12:40 | Tess Dejaeghere, Pranaydeep Singh, Els Lefever and Julie Birkholz - On the creation of multilingual NER and ABSA workflows for literary-historical texts with chat-based LLMs | |
12:40 - 13:00 | Magnus Ahltorp and Maria Skeppstedt - Word Rain as a Service | |
13:00 - 13:45 | Lunch | |
13:45 - 14:45 | PhD Poster Session. Chair: Martin Wynne | MR.10+11 |
14:45 - 15:45 | Abstract presentations: Education. Chair: Tanja Wissik | MR.09 |
14:45 - 15:05 | Inguna Skadiņa, Jana Kuzmina, Sergejs Kruks, Marina Platonova, Tatjana Smirnova and Ilze Auzina - Language Technology Initiative - Bridging the Gap between Research and Education | |
15:05 - 15:25 | Nikola Ljubešić, Taja Kuzman, Ivana Filipović Petrović, Jelena Parizoska and Petya Osenova - CLASSLA-Express: a Train of CLARIN.SI Workshops on Language Resources and Tools with Easily Expanding Route | |
15:25 - 15:45 | Giulia Pedonese, Francesca Frontini, Dario Del Fante and Eleonora Federici - Adapting UPSKILLS learning modules to the university curricula: best practices and lesson learnt from the H2IOSC training experience at the University of Ferrara | |
15:45 - 16:15 | Coffee Break | |
16:15 - 17:35 | Abstract presentations: Core CLARIN Infrastructure. Chair: Monica Monachini | MR.09 |
16:15 - 16:35 | Erik Körner, Thomas Eckart, Felix Helfer and Uwe Kretschmer - Federated Content Search: Advancing the Common Search Infrastructure | |
16:35 - 16:55 | Jakob Lenardič and Kristina Pahor de Maiti Tekavčič - The citation of language resource technologies in CLARIN | |
16:55 - 17:15 | Pawel Kamocki, Aleksei Kelli, Costanza Navarretta, Andrius Puksas, Mateja Jemec Tomazin, Benito Trollip and Silvia Calamai - New laws, new opportunities – the effect of the Digital Services Act and the Data Act on access to language data for research purposes | |
17:15 - 17:35 | Xabier Goenaga, Aritz Farwell, Joseba Fernandez de Landa and Xabier Arregi - Constructing the CLARIAH-EUS CLARIN B-Centre: First Steps | |
17:00 - 19:00 | Photographer (for committee members) | MR.15 |
17:35 - 19:00 | Bazaar. Chair: Costanza Navarretta | MR.10+11 |
19:30 - 22:30 | Conference dinner | Barceló Sants |
Day Three
Time | Thursday 17 October 2024 | Room |
09:00 - 10:20 | Abstract presentations: Metadata. Chair: Gunn Inger Lyse | MR.09 |
09:00 - 09:20 | Maarten Van Gompel - FAIR Tool Discovery: an automated software metadata harvesting pipeline for CLARIAH | |
09:20 - 09:40 | Nannan Liu and Mariachiara Russo - A core metadata schema for interpreting corpora: Implementation on the Unified Interpreting Corpus (UNIC) platform | |
09:40 - 10:00 | Daan Broeder and Jan Odijk - Vocabularies in CLARIN : Problems and suggested solutions | |
10:00 - 10:20 | Claus Zinn and Thorsten Trippel - On the Successful Migration of Research Data | |
10:20 - 11:00 | Group Photo and Coffee Break | |
11:00 - 11:45 |
Keynote by Steven Bird
Chair: Lars Borin
Making it Meaningful
Despite their manifold benefits, language technologies are contributing to several unfolding crises. Small screens deliver mainstream content across the world and entice children of minoritised communities away from their ancestral languages. Data centres power large language models and depend on the mining of ever more rare earth metals and the emission of ever more carbon. Malicious actors flood social media with fake news, provoking extremism, division, and even war. Common to these crises is content, i.e. language content, increasingly generated and accessed using language technologies. These crises – the language crisis, the environmental crisis, and the meaning crisis – compound each other in what is now being referred to as the metacrisis.
How are we to respond, then, as a community of practice who is continuing to create language resources that enable the development of still more language technologies? I believe that a good first step is to bring our awareness to the matter and to rethink what we are doing. We must be suspicious of purely technological solutions which may only exacerbate problems that were created by our use of technology. Instead, I argue that we should approach the problem as social and cultural, and reexamine our initiatives to develop "research infrastructure for language as social and cultural data". In this presentation I will share stories from a small and highly multilingual indigenous society who understands language not as sequence data but as social practice, and who understands language resources not as annotated text and speech but as the stories and knowledge practices carried by the country and by elders. I will explore ramifications for our work in the space of language resources and technologies, and suggest some ways forward that avoid extractive processes and centre speech communities, all the while making it meaningful. |
MR.09 |
11:45 - 12:45 | Abstract presentations: Centres and Resource Families. Chair: Cristina Grisot | MR.09 |
11:45 - 12:05 | Tomaž Erjavec, Nikola Ljubešić, Katja Meden, Taja Kuzman, Cyprian Laskowski, Jan Jona Javoršek, Simon Krek, Mateja Jemec Tomazin and Jakob Lenardič - CLARIN.SI, the Slovenian node of CLARIN: ten years on | |
12:05 - 12:25 | Henk Van den Heuvel, Nicola Bessell, Katarzyna Klessa, Alice Lee, Satu Saalasti and Eric Sanders - A CLARIN Resource Family for Corpora of Communication Disorders | |
12:25 - 12:45 | Steven Coats - A Development Outlook for CLARIN’s Northernmost Center | |
12:45 - 13:00 | Best PhD Poster Award; Closing Remarks | MR.09 |
13:00 - 14:00 | Lunch | |
14:00 - 17:00 | SAB Meeting | MR.03 |
14:00 - 17:00 | Post-Conference Workshop: Comparable and Interoperable Corpora; Post-Conference Workshop: SSH Open Marketplace | MR.08; MR.07 |
Time | Friday 18 October 2024 | Room |
09:00 - 17:00 |
FCS Hackathon
The FCS Hackathon is a workshop that gives an in-depth introduction to the Federated Content Search and helps participants develop their own FCS endpoints. The CLARIN Federated Content Search (FCS) is a search engine and infrastructure that connects heterogeneous data collections and search engines hosted locally at CLARIN centres and gives users a uniform interface for discovering and searching language resources of interest. |
MR.03 |