
Programme CLARIN Annual Conference 2023

Event name: CLARIN Annual Conference 2023
Date: Monday, 16 October 2023 - Wednesday, 18 October 2023 (all times are CEST)
Location: Irish College Leuven, Leuven, Belgium
Twitter Hashtag: #CLARIN2023
 

CLARIN 2023 | Proceedings

Conference Programme Outline

9:00 – 10:30
  • Centre Assessment Committee (CAC) (cancelled)
  • CLARIN National Coordinators' Forum (NCF) Part 1
  • CR3
  • CR2
10:30 - 11:00 Coffee break  
11:00 - 13:00
  • CLARIN National Coordinators' Forum (NCF) Part 2
  • Standing Committee on CLARIN Technical Centres (SCCTC)
  • User Involvement Committee
  • CR2
  • CR3
  • Green room
13:00 - 14:00 Lunch break  
14:00 - 15:30
  • CLARIN Standards Committee (CSC)
  • Knowledge Infrastructure Committee (KIC)
  • CLARIN Legal Issues Committee (CLIC)
  • CR3
  • Green room
  • CR2
15:30 - 16:00 Coffee break  
16:00 - 16:15
  • Conference Opening Session
  • Steven Krauwer Award
Aula
16:15 - 17:00 Keynote Jörg Tiedemann
Aula
17:00 - 18:00 Papers (Poster Format) Dining room wing
18:15 - 18:30 Walk to Leuven Town Hall
18:30 - 19:20 Welcome Reception
Historic Town Hall 
Grote Markt 9
19:30 - 22:00 Welcome Dinner
Domus
Tiensestraat 8
09:00 - 09:10 Presentation by Programme Committee Chair Aula
09:10 - 09:15 Presentation by Local National Coordinator Aula
09:15 - 10:00 Pitches by CLARIN Committees Aula
10:00 - 10:30 State of the Technical Infrastructure Aula
10:30 - 11:00 Coffee Break  
11:00 - 13:00 Abstract Presentations (Infrastructure) Aula
11:00 - 13:00 Teachers' Workshop: Using CLARIN in Training and Education CR2
13:00 - 13:45 Lunch  
13:45 - 14:30 PhD Poster Session Dining room wing
14:30 - 15:30 Abstract Presentations (ParlaMint) Aula
15:30 - 16:00 Coffee Break  
16:00 - 17:20 Abstract Presentations (Tools) Aula
17:30 - 19:00 Bazaar Poster Session Dining room wing
 
For an overview of all posters, please consult the Bazaar page
 
19:30 - 22:30 Conference Dinner
Faculty Club
Groot Begijnhof 14
09:00 - 10:20 Abstract Presentations (Corpora) Aula
10:20 - 11:00 Group Photo and Coffee Break  
11:00 - 11:45 Keynote by Laurence Devillers Aula
11:45 - 12:45
Abstract Presentations (Metadata and Annotations)
Aula
12:45 - 13:00
Closing Remarks
Aula
13:00 - 14:00 Lunch  
14:00 - 16:00 SAB Meeting Board room
14:00 - 17:00
K-centre Workshop (Part I) (Invite-only)
SSH Marketplace Workshop
EuReCo Workshop (Invite-only)
  • CR2
  • CR7
  • Aula
 
09:00 - 13:00 K-centre Workshop (Part II) (Invite-only) Irish College

 


Keynotes

 

Lost in Meaning - Found in Translation

Jörg Tiedemann

University of Helsinki

Monday 16 October, 16:15 - 17:00

Ethical Issues of Generative AI

Laurence Devillers

University Paris-Sorbonne IV/LIMSI CNRS

Wednesday, 18 October, 11:00 - 11:45


 

 

Conference Programme Details

Day One

Time Monday 16 October 2023 Room
9:00 – 10:30
  • Centre Assessment Committee (CAC) (cancelled)
  • CLARIN National Coordinators' Forum (NCF) Part 1
  • CR3
  • CR2
10:30 - 11:00 Coffee break
 
11:00 - 13:00
  • CLARIN National Coordinators' Forum (NCF) Part 2
  • Standing Committee on CLARIN Technical Centres (SCCTC)
  • User Involvement Committee
  • CR2
  • CR3
  • Green room
13:00 - 14:00 Lunch break  
14:00 - 15:30
  • CLARIN Standards Committee (CSC)
  • Knowledge Infrastructure Committee (KIC)
  • CLARIN Legal Issues Committee (CLIC)
  • CR3
  • Green room
  • CR2
15:30 - 16:00 Coffee break  
  Start of the Conference  
16:00 - 16:15
  • Conference Opening Session
  • Steven Krauwer Award
Aula
16:15 - 17:00

Keynote by Jörg Tiedemann

Lost in Meaning - Found in Translation: Natural Language Understanding with Multilingual Data (slides)

The task of translation involves language understanding and generation and, in this way, naturally combines the two essential challenges in computational linguistics and language technology. In the FoTran project, we are interested in the ability of neural translation models to pick up linguistic properties and to generalise to meaningful representations when trained on large amounts of multilingual data. Our focus is on the effect of linguistic diversity on abstraction and generalisation. In order to study this, we need to create the necessary resources and infrastructure. In this talk, I will first introduce the OPUS ecosystem that fuels our research. In the second part, I will concentrate on the experiments, studies and developments that this ecosystem enables within and outside of FoTran. I also welcome discussions on further directions that can be taken with the multilingual infrastructure we build, looking forward to your input.
Aula
17:00 - 18:00

Papers (Poster Format)

Linguistic Resources and Tools for Ukrainian: Grounds for Creating a K-Centre
Olha Kanishcheva and Maria Shvedova
 
The Making of the CLARIN Resource Family for Oral History: Lessons Learned from ‘Voices from Ravensbrück’ (poster)
Stefania Scagliola, Silvia Calamai, Henk Van Den Heuvel and Christoph Draxler
 
Libraries as Data Infrastructures
Martin Wynne, Andreas Witt, Peter Leinen and Sally Chambers
 
A Continuous Integration (CI) Workflow for Quality Assurance Checks for Corpora of Multimodal Interaction (poster)
Anne Ferger, André Frank Krause and Karola Pitsch
 
The LiRI Corpus Platform (poster)
Jonathan Schaber, Johannes Graën, Daniel McDonald, Igor Mustač, Nikolina Rajović, Gerold Schneider and Noah Bubenhofer
 
DBBErt: Part-of-Speech Tagging of Pre-Modern Greek Text
Colin Swaelens, Els Lefever and Ilse De Vos
 
A Multilingual Database for Icelandic L2 Flashcards
Xindan Xu, Þórunn Arnardóttir and Anton Karl Ingason
 
Korpusnik: A Corpus Summarizing Tool for Slovene
Iztok Kosem, Jaka Cibej, Kaja Dobrovoljc and Simon Krek
 
Topics in Swedish News on Climate Change: A Timeline 2016 - 2023
Maria Skeppstedt
 
Sharing the Finnish Dark Web Marketplace Corpus (FINDarC) (poster)
Krister Lindén, Teemu Ruokolainen, Lasse Hämäläinen and Tuomas Harviainen
 
Swissdox@LiRI – A Large Database of Media Articles Made Accessible to Researchers (poster)
Johannes Graën, Igor Mustač, Nikolina Rajović, Jonathan Schaber, Gerold Schneider and Noah Bubenhofer
 
Analyses of Information Security Standards on Data Crawled from Company Web Sites Using SweClarin Resources
Arne Jönsson, Subhomoy Bandyopadhyay, Svjetlana Pantic Dragisic and Andrea Fried
 
Building and Consolidating a FAIR-Compliant Ecosystem of Infrastructures
Cristina Grisot, Noah Bubenhofer, Andrea Malits, Stefanie Strebel, Johannes Graën and Stefan Buerli
 
Dynamically Chaining APIs: from Dracor to TEITOK
Maarten Janssen
 
The ACoDe Project: Creating a Dementia Corpus for Icelandic
Elena Callegari, Anton Karl Ingason and Agnes Sólmundsdóttir
 
Emotion and Abstractness in Austrian Parliamentary Discourse
Tanja Wissik and Klaus Hofmann
 
Developing Manually-Annotated Corpora for Teaching and Learning Purposes of Brazilian Portuguese, Dutch, Estonian, and Slovene (the CrowLL Project)
Tanara Zingano Kuhn, Carole Tiberius, Špela Arhar Holdt, Kristina Koppel, Iztok Kosem, Rina Zviel Girshin and Ana R. Luís

Dining room wing
18:15 - 18:30 Walk to Leuven Town Hall
18:30 - 19:30 Welcome Reception
Historic Town Hall
Grote Markt 9
19:30 - 22:00  Welcome Dinner
Domus
Tiensestraat 8

Day Two

 
Time Tuesday 17 October 2023 Room
09:00 - 09:10 Presentation by Programme Committee Chair (slides) Aula
09:10 - 09:15 Presentation by Local National Coordinator
Aula
09:15 - 10:00 Pitches by CLARIN Committees  (slides)
Aula
10:00 - 10:30 State of the Technical Infrastructure (slides) Aula
10:30 - 11:00 Coffee Break  
11:00 - 13:00

Thematic Session: Infrastructure

Chair: Jurgita Vaičenonienė

Aula
11:00 - 11:20
Standards Information System for CLARIN Centres and Beyond (slides)
 
Piotr Banski and Eliza Margaretha Illig
 
11:20 - 11:40

The CLARIN:EL Infrastructure (slides)

Maria Gavriilidou, Stelios Piperidis, Dimitrios Galanis, Juli Bakagianni, Penny Labropoulou, Athanasia Kolovou, Dimitris Gkoumas, Miltos Deligiannis, Kanella Pouli, Iro Tsiouli, Leon Voukoutis and Katerina Gkirtzou
 
11:40 - 12:00
NB DH-LAB: A Corpus Infrastructure for Social Sciences and Humanities (slides)
 
Magnus Breder Birkenes, Lars G. Johnsen and Andre Kåsen
 
12:00 - 12:20 
CORLI CLARIN K-Centre: Development and Perspectives (slides)
 
Christophe Parisse and Céline Poudat
 
12:20 - 12:40
The SSH Open Marketplace and CLARIN (slides)
 
Alexander König, Laure Barbot, Cristina Grisot, Michael Kurzmeier and Edward J. Gray
 
12:40 - 13:00
CLARIN-IT: Texts, Documents and New Contexts (slides)
 
Federico Boschetti, Angelo Mario Del Grosso, Riccardo Del Gratta, Francesca Frontini and Monica Monachini
 
11:00 - 13:00
Teachers' workshop: Using CLARIN in Training and Education (slides)
For more information about the abstracts, please visit the workshop programme page.
 

11:00 - 12:00  Presentations of Accepted Abstracts 

11:00 - 11:10  Welcome and Introduction 
 
Francesca Frontini
 
11:10 - 11:20 Privacy by Design in Linguistic Research
 
Henk van den Heuvel
 

11:20 - 11:30 Teaching Syntax with CLARIN Corpora and Resources 

 
Antonio Balvet
 

11:30 - 11:40 Learning Programming in Python for Linguistics and Language Studies

 
Koenraad De Smedt
 

11:40 - 11:50 NLP Annotation for Digital Scholars 

 
Maarten Janssen and Silvie Cinková 
 
11:50 - 12:00 DH-Course Registry: A Bridge Between Infrastructures, DH Masters Degrees and Industry? 
 
Amelia Sanz, Vicky Garnett, Tom Gheldof, Adeline Joffres, Iulianna van der Lek, Edward Gray,

12:00 - 12:10 Discussion

12:10 - 13:00 Demo of the CLARIN Learning Content in the UPSKILLS project 

12:10 - 12:20 Introduction to the UPSKILLS Project
 
Stavros Assimakopoulos 
 
12:20 - 12:35 Introduction to Language Data: Standards and Repositories
 
Iulianna van der Lek 
 
12:35 - 12:50 Automatic Speech Recognition and Forced Alignment
 
Louis ten Bosch 
 

12:50 - 13:00 Discussion & Wrap-Up

CR2
13:00 - 13:45 Lunch
 
13:30 - 14:30 PhD Poster Session Dining room wing
14:30 - 15:30

Thematic Session: ParlaMint

Chair: Maciej Piasecki

Aula
14:30 - 14:50

The ParlaMint Project: Ever-Growing Family of Comparable and Interoperable Parliamentary Corpora (slides)

Maciej Ogrodniczuk, Petya Osenova, Tomaž Erjavec, Darja Fišer, Nikola Ljubešić, Çagrı Çöltekin, Matyáš Kopp, Katja Meden and Taja Kuzman

 
14:50 - 15:10

Workflow and Metadata Challenges in the ParlaMint Project: Insights from Building the ParlaMint-UA Corpus (slides)

Anna Kryvenko and Matyáš Kopp
 
15:10 - 15:30

Adding Political Orientation Metadata to ParlaMint Corpora (slides)

Tomaž Erjavec, Katja Meden and Jure Skubic
 
15:30 - 16:00 Coffee Break
 
16:00 - 17:20

Thematic Session: Tools

Chair: Vincent Vandeghinste

Aula
16:00 - 16:20

MATEO: Machine Translation Evaluation for Users and Developers (slides)

Bram Vanroy
 
16:20 - 16:40
Domain-Specific Languages for Epigraphy: The Case of ItAnt (slides)
 
Luca Rigobianco, Federico Boschetti and Valeria Quochi
 
16:40 - 17:00

Finding Dutch Multiword Expressions (slides)

Jan Odijk, Martin Kroon, Tijmen Baarda, Ben Bonfil and Sheean Spoel
 
17:00 - 17:20
Automatic Anonymisation of Human Faces in Images of Authentic Social Interaction: A Web Application (slides)
 
André Frank Krause, Anne Ferger and Karola Pitsch
 
17:30 - 19:00 Bazaar Poster Session Dining room wing
19:30 - 22:30 Conference Dinner
Faculty Club
Groot Begijnhof 14

Day Three

Time Wednesday 18 October 2023 Room
09:00 - 10:20

Thematic Session: Corpora

Chair: Tomaž Erjavec

Aula
09:00 - 09:20
A Spoken Academic Belgian Dutch Corpus (slides)
 
Vincent Vandeghinste, Jolien Mathysen, Patrick Wambacq and Elke Peters
 
09:20 - 09:40
NGT-HoReCo and GoSt-ParC-Sign: Two New Sign Language - Spoken Language Parallel Corpora (slides)
 
Mirella De Sisto, Dimitar Shterionov, Lien Soetemans, Vincent Vandeghinste and Caro Brosens
 
09:40 - 10:00
Teaching Syntax with CLARIN Corpora and Resources (slides)
 
Antonio Balvet
 
10:00 - 10:20
A New CLARIN Resource Family for Lexical Semantic Change Research (slides)
 
Paola Marongiu, Fahad Khan and Barbara McGillivray
 
10:20 - 11:00 Group Photo and Coffee Break
 
11:00 - 11:45

Keynote by Laurence Devillers

Ethical Issues of Generative AI (slides)
 

In this keynote, I offer studies and reflections on the ethical issues of generative artificial intelligence (AI). What distinguishes generative AI systems is that they are based on generative models that can produce many kinds of output: text or images for purposes such as translation, the production of computer code, chatbots and decision support. These models, pre-trained on large datasets, can be optimised for a new application using little additional data specific to that task. The social and economic impact of generative AI systems is likely to be major in many potential uses, for example, in the environment or in healthcare. However, these systems raise many ethical, epistemological, anthropological, psychological, economic, social, political and cultural questions. Some of these issues will continue to arise as the technologies are put to new uses, and it is not yet possible to predict all the effects they will have on individuals and society. Since the end of 2022, economic and political actors in several countries have been discussing the impact of language models built with these generative AI systems. Some of these models have an impressive number of parameters. The race for the largest model is ongoing, but it is not certain that larger models will deliver higher performance. I was a co-author of Opinion No. 7 of the CNPEN (National Pilot Committee for Digital Ethics) on the ethical issues of generative artificial intelligence. In this opinion, the CNPEN focuses on the most important ethical issues in light of current experience with generative AI systems, mainly language models.
Aula
11:45 - 12:45

Thematic Session: Metadata and Annotations

Chair: Andreas Witt
Aula
11:45 - 12:05

 Documenting Corpus Annotation in CMDI: State of Affairs (slides)

 
Jakob Lenardič
 
12:05 - 12:25
Do Chatbots Dream of Copyright? Copyright in AI-generated Language Data (slides)
 
Pawel Kamocki, Toby Bond, Krister Lindén and Thomas Margoni
 
12:25 - 12:45
Between Lexicon and Grammar: Towards Integrated Valencies for Bulgarian (slides)
 
Petya Osenova and Kiril Simov
 
12:45 - 13:00
  • Best PhD Poster Award
  • Closing Remarks (slides)
Aula
13:00 - 14:00 Lunch
 
14:00 - 16:00  SAB Meeting
Board room
 
14:00 - 17:00
K-Centre Workshop (Part I) (Invite-only)

Annual workshop for K-centre representatives, see the event page.
 
SSH Open Marketplace Workshop  (cancelled)

This workshop aims at supporting researchers interested in creating a workflow in the SSH Open Marketplace. Following a brief presentation of what the SSH Open Marketplace is and how it works, participants will be supported by members of the Editorial Board of this discovery portal to write and document their research scenarios, based on the use of CLARIN tools, services and data - for example the CLARIN Resource Families or tools from the Language Resource Switchboard. Workflows are an ideal way to share one’s research resources, and harness the power of the SSH Open Marketplace to contextualise tools and services with publications, datasets, and training resources, thus presenting a research activity from A to Z in an easy to follow and reproducible way.
 
EuReCo Workshop (Invite-only)

The EuReCo workshop brings together representatives of National Corpora from CLARIN countries. Its aim is to explore the possibilities of launching an initiative toward a large multilingual and distributed reference corpus for European languages that would connect these existing resources. Such an initiative could potentially develop into a new CLARIN flagship project. It would enable linguists to explore corpora of different languages, especially annotated ones, by means of the CLARIN infrastructure. Eventually, this project could lead to the creation of a large comparable corpus of European languages accessible through a single access point. For more details, including the agenda, please refer to this link.

  • CR2
  • CR7
  • Aula

Day Four

Time Thursday 19 October 2023 Room
09:00 - 13:00 K-Centre Workshop (Part II) (Invite-only) CR2