Programme CLARIN Annual Conference 2023

Event name: CLARIN Annual Conference 2023

Date: Monday, 16 October 2023 - Wednesday, 18 October 2023 (all times are CEST)

Location: Irish College Leuven, Leuven, Belgium

Twitter Hashtag: #CLARIN2023

Conference Programme Outline

Monday, 16 October 2023

9:00 – 10:30	~~Centre Assessment Committee (CAC)~~ (cancelled) CLARIN National Coordinators' Forum (NCF) Part 1	~~CR3~~ CR2
10:30 - 11:00	Coffee break
11:00 - 13:00	CLARIN National Coordinators' Forum (NCF) Part 2 Standing Committee on CLARIN Technical Centres (SCCTC) User Involvement Committee	CR2 CR3 Green room
13:00 - 14:00	Lunch break
14:00 - 15:30	CLARIN Standards Committee (CSC) Knowledge Infrastructure Committee (KIC) CLARIN Legal Issues Committee (CLIC)	CR3 Green room CR2
15:30 - 16:00	Coffee break
16:00 - 16:15	Conference Opening Session Steven Krauwer Award	Aula
16:15 - 17:00	Keynote Jörg Tiedemann	Aula
17:00 - 18:00	Papers (Poster Format)	Dining room wing
18:15 - 18:30	Walk to Leuven Town Hall
18:30 - 19:20	Welcome Reception	Historic Town Hall Grote Markt 9
19:30 - 22:00	Welcome Dinner	Domus Tiensestraat 8

Tuesday, 17 October 2023

09:00 - 09:10	Presentation by Programme Committee Chair	Aula
09:10 - 09:15	Presentation by Local National Coordinator	Aula
09:15 - 10:00	Pitches by CLARIN Committees	Aula
10:00 - 10:30	State of the Technical Infrastructure	Aula
10:30 - 11:00	Coffee Break
11:00 - 13:00	Abstract Presentations (Infrastructure)	Aula
11:00 - 13:00	Teachers' Workshop: Using CLARIN in Training and Education	CR2
13:00 - 13:45	Lunch
13:45 - 14:30	PhD Poster Session	Dining room wing
14:30 - 15:30	Abstract Presentations (ParlaMint)	Aula
15:30 - 16:00	Coffee Break
16:00 - 17:20	Abstract Presentations (Tools)	Aula
17:30 - 19:00	Bazaar Poster Session For an overview of all posters, please consult the Bazaar page
19:30 - 22:30	Conference Dinner	Faculty Club Groot Begijnhof 14

Wednesday, 18 October 2023

09:00 - 10:20	Abstract Presentations (Corpora)	Aula
10:20 - 11:00	Group Photo and Coffee Break
11:00 - 11:45	Keynote by Laurence Devillers	Aula
11:45 - 12:45	Abstract Presentations (Metadata and Annotations)	Aula
12:45 - 13:00	Closing Remarks Award Best PhD Poster	Aula
13:00 - 14:00	Lunch
14:00 - 16:00	SAB Meeting	Board room
14:00 - 17:00	K-centre Workshop (Part I) (Invite-only) SSH Marketplace Workshop EuReCo Workshop (Invite-only)	CR2 CR7 Aula

Thursday, 19 October 2023

09:00 - 13:00	K-centre Workshop (Part II) (Invite-only)	Irish College

Keynotes

Lost in Meaning - Found in Translation

Jörg Tiedemann

University of Helsinki

Monday, 16 October, 16:15 - 17:00

Ethical Issues of Generative AI

Laurence Devillers

University Paris-Sorbonne IV/LIMSI CNRS

Wednesday, 18 October, 11:00 - 11:45

Conference Programme Details

Day One

Time	Monday 16 October 2023	Room
9:00 – 10:30	~~Centre Assessment Committee (CAC)~~ (cancelled) CLARIN National Coordinators' Forum (NCF) Part 1	~~CR3~~ CR2
10:30 - 11:00	Coffee break
11:00 - 13:00	CLARIN National Coordinators' Forum (NCF) Part 2 Standing Committee on CLARIN Technical Centres (SCCTC) User Involvement Committee	CR2 CR3 Green room
13:00 - 14:00	Lunch break
14:00 - 15:30	CLARIN Standards Committee (CSC) Knowledge Infrastructure Committee (KIC) CLARIN Legal Issues Committee (CLIC)	CR3 Green room CR2
15:30 - 16:00	Coffee break
	Start of the Conference
16:00 - 16:15	Conference Opening Session Steven Krauwer Award	Aula
16:15 - 17:00	Keynote by Jörg Tiedemann Lost in Meaning - Found in Translation: Natural Language Understanding with Multilingual Data (slides) Abstract The task of translation involves language understanding and generation and, in this way, naturally combines the two essential challenges in computational linguistics and language technology. In the FoTran project, we are interested in the ability of neural translation models to pick up linguistic properties and to generalise to meaningful representations when trained on large amounts of multilingual data. Our focus is on the effect of linguistic diversity on abstraction and generalisation. In order to study this, we need to create the necessary resources and infrastructure. In this talk, I will first introduce the OPUS ecosystem that fuels our research. In the second part, I will concentrate on the experiments, studies and developments that this ecosystem enables within and outside of FoTran. I also welcome discussions on further directions that can be taken with the multilingual infrastructure we build, looking forward to your input.	Aula
17:00 - 18:00	Papers (Poster Format)	Dining room wing
	Linguistic Resources and Tools for Ukrainian: Grounds for Creating a K-Centre Olha Kanishcheva and Maria Shvedova
	The Making of the CLARIN Resource Family for Oral History: Lessons Learned from ‘Voices from Ravensbrück’ (poster) Stefania Scagliola, Silvia Calamai, Henk Van Den Heuvel and Christoph Draxler
	Libraries as Data Infrastructures Martin Wynne, Andreas Witt, Peter Leinen and Sally Chambers
	A Continuous Integration (CI) Workflow for Quality Assurance Checks for Corpora of Multimodal Interaction (poster) Anne Ferger, André Frank Krause and Karola Pitsch
	The LiRI Corpus Platform (poster) Jonathan Schaber, Johannes Graën, Daniel McDonald, Igor Mustač, Nikolina Rajović, Gerold Schneider and Noah Bubenhofer
	DBBErt: Part-of-Speech Tagging of Pre-Modern Greek Text Colin Swaelens, Els Lefever and Ilse De Vos
	A Multilingual Database for Icelandic L2 Flashcards Xindan Xu, Þórunn Arnardóttir and Anton Karl Ingason
	Korpusnik: A Corpus Summarizing Tool for Slovene Iztok Kosem, Jaka Cibej, Kaja Dobrovoljc and Simon Krek
	Topics in Swedish News on Climate Change: A Timeline 2016 - 2023 Maria Skeppstedt
	Sharing the Finnish Dark Web Marketplace Corpus (FINDarC) (poster) Krister Lindén, Teemu Ruokolainen, Lasse Hämäläinen and Tuomas Harvianen
	Swissdox@LiRI – A Large Database of Media Articles Made Accessible to Researchers (poster) Johannes Graën, Igor Mustač, Nikolina Rajović, Jonathan Schaber, Gerold Schneider and Noah Bubenhofer
	Analyses of Information Security Standards on Data Crawled from Company Web Sites Using SweClarin Resources Arne Jönsson, Subhomoy Bandyopadhyay, Svjetlana Pantic Dragisic and Andrea Fried
	Building and Consolidating a FAIR-Compliant Ecosystem of Infrastructures Cristina Grisot, Noah Bubenhofer, Andrea Malits, Stefanie Strebel, Johannes Graën and Stefan Buerli
	Dynamically Chaining APIs: from Dracor to TEITOK Maarten Janssen
	The ACoDe Project: Creating a Dementia Corpus for Icelandic Elena Callegari, Anton Karl Ingason and Agnes Sólmundsdóttir
	Emotion and Abstractness in Austrian Parliamentary Discourse Tanja Wissik and Klaus Hofmann
	Developing Manually-Annotated Corpora for Teaching and Learning Purposes of Brazilian Portuguese, Dutch, Estonian, and Slovene (the CrowLL Project) Tanara Zingano Kuhn, Carole Tiberius, Špela Arhar Holdt, Kristina Koppel, Iztok Kosem, Rina Zviel Girshin and Ana R. Luís
18:15 - 18:30	Walk to Leuven Town Hall
18:30 - 19:30	Welcome Reception	Historic Town Hall Grote Markt 9
19:30 - 22:00	Welcome Dinner	Domus Tiensestraat 8

Day Two

Time	Tuesday 17 October 2023	Room
09:00 - 09:10	Presentation by Programme Committee Chair (slides)	Aula
09:10 - 09:15	Presentation by Local National Coordinator	Aula
09:15 - 10:00	Pitches by CLARIN Committees (slides)	Aula
10:00 - 10:30	State of the Technical Infrastructure (slides)	Aula
10:30 - 11:00	Coffee Break
11:00 - 13:00	Thematic Session: Infrastructure Chair: Jurgita Vaičenonienė	Aula
11:00 - 11:20	Standards Information System for CLARIN Centres and Beyond (slides) Piotr Banski and Eliza Margaretha Illig
11:20 - 11:40	The CLARIN:EL Infrastructure (slides) Maria Gavriilidou, Stelios Piperidis, Dimitrios Galanis, Juli Bakagianni, Penny Labropoulou, Athanasia Kolovou, Dimitris Gkoumas, Miltos Deligiannis, Kanella Pouli, Iro Tsiouli, Leon Voukoutis and Katerina Gkirtzou
11:40 - 12:00	NB DH-LAB: A Corpus Infrastructure for Social Sciences and Humanities (slides) Magnus Breder Birkenes, Lars G. Johnsen and Andre Kåsen
12:00 - 12:20	CORLI CLARIN K-Centre: Development and Perspectives (slides) Christophe Parisse and Céline Poudat
12:20 - 12:40	The SSH Open Marketplace and CLARIN (slides) Alexander König, Laure Barbot, Cristina Grisot, Michael Kurzmeier and Edward J. Gray
12:40 - 13:00	CLARIN-IT: Texts, Documents and New Contexts (slides) Federico Boschetti, Angelo Mario Del Grosso, Riccardo Del Gratta, Francesca Frontini and Monica Monachini
11:00 - 13:00	Teachers' Workshop: Using CLARIN in Training and Education (slides) For more information about the abstracts, please visit the workshop programme page.	CR2
11:00 - 12:00	Presentations of Accepted Abstracts
11:00 - 11:10	Welcome and Introduction Francesca Frontini
11:10 - 11:20	Privacy by Design in Linguistic Research Henk van den Heuvel
11:20 - 11:30	Teaching Syntax with CLARIN Corpora and Resources Antonio Balvet
11:30 - 11:40	Learning Programming in Python for Linguistics and Language Studies Koenraad De Smedt
11:40 - 11:50	NLP Annotation for Digital Scholars Maarten Janssen and Silvie Cinková
11:50 - 12:00	DH-Course Registry: A Bridge Between Infrastructures, DH Masters Degrees and Industry? Amelia Sanz, Vicky Garnett, Tom Gheldof, Adeline Joffres, Iulianna van der Lek, Edward Gray
12:00 - 12:10	Discussion
12:10 - 13:00	Demo of the CLARIN Learning Content in the UPSKILLS Project
12:10 - 12:20	Introduction to the UPSKILLS Project Stavros Assimakopoulos
12:20 - 12:35	Introduction to Language Data: Standards and Repositories Iulianna van der Lek
12:35 - 12:50	Automatic Speech Recognition and Force Alignment Louis ten Bosch
12:50 - 13:00	Discussion & Wrap-Up
13:00 - 13:45	Lunch
13:30 - 14:30	PhD Poster Session	Dining room wing
14:30 - 15:30	Thematic Session: ParlaMint Chair: Maciej Piasecki	Aula
14:30 - 14:50	The ParlaMint Project: Ever-Growing Family of Comparable and Interoperable Parliamentary Corpora (slides) Maciej Ogrodniczuk, Petya Osenova, Tomaž Erjavec, Darja Fišer, Nikola Ljubešić, Çagrı Çöltekin, Matyáš Kopp, Katja Meden and Taja Kuzman
14:50 - 15:10	Workflow and Metadata Challenges in the ParlaMint Project: Insights from Building the ParlaMint-UA Corpus (slides) Anna Kryvenko and Matyáš Kopp
15:10 - 15:30	Adding Political Orientation Metadata to ParlaMint Corpora (slides) Tomaž Erjavec, Katja Meden and Jure Skubic
15:30 - 16:00	Coffee Break
16:00 - 17:20	Thematic Session: Tools Chair: Vincent Vandeghinste	Aula
16:00 - 16:20	MATEO: Machine Translation Evaluation for Users and Developers (slides) Bram Vanroy
16:20 - 16:40	Domain-Specific Languages for Epigraphy: The Case of ItAnt (slides) Luca Rigobianco, Federico Boschetti and Valeria Quochi
16:40 - 17:00	Finding Dutch Multiword Expressions (slides) Jan Odijk, Martin Kroon, Tijmen Baarda, Ben Bonfil and Sheean Spoel
17:00 - 17:20	Automatic Anonymisation of Human Faces in Images of Authentic Social Interaction: A Web Application (slides) André Frank Krause, Anne Ferger and Karola Pitsch
17:30 - 19:00	Bazaar Poster Session	Dining room wing
19:30 - 22:30	Conference Dinner	Faculty Club Groot Begijnhof 14

Day Three

Time	Wednesday 18 October 2023	Room
09:00 - 10:20	Thematic Session: Corpora Chair: Tomaž Erjavec	Aula
09:00 - 09:20	A Spoken Academic Belgian Dutch Corpus (slides) Vincent Vandeghinste, Jolien Mathysen, Patrick Wambacq and Elke Peters
09:20 - 09:40	NGT-HoReCo and GoSt-ParC-Sign: Two New Sign Language - Spoken Language Parallel Corpora (slides) Mirella De Sisto, Dimitar Shterionov, Lien Soetemans, Vincent Vandeghinste and Caro Brosens
09:40 - 10:00	Teaching Syntax with CLARIN Corpora and Resources (slides) Antonio Balvet
10:00 - 10:20	A New CLARIN Resource Family for Lexical Semantic Change Research (slides) Paola Marongiu, Fahad Khan and Barbara McGillivray
10:20 - 11:00	Group Photo and Coffee Break
11:00 - 11:45	Keynote by Laurence Devillers Ethical Issues of Generative AI (slides) Abstract In this keynote, I offer studies and reflections on the ethical issues of generative artificial intelligence (AI). The special feature of generative artificial intelligence systems is that they are based on generative models that can produce multiple outputs: generation of text or images for various purposes such as translation, production of computer code, chatbots, decision support and so on. These models, pre-trained on large datasets, can be optimised to produce a new application using little additional data specific to that task. The social and economic impact of generative AI systems is likely to be major in many potential uses, for example, in the environment or in healthcare. However, these generative AI systems raise many ethical, epistemological, anthropological, psychological, economic, social, political and cultural questions. Some of these issues will continue to occur as these technologies are put to new uses, and it is not yet possible to predict all the effects they will have on individuals and society. Since the end of 2022, economic and political actors in several countries have been discussing the impact of language models built with these generative AI systems. Some of these models have an impressive number of parameters. The race for the largest model is ongoing, but it is not certain that larger models would deliver higher performance. I was involved as a co-writer of the opinion n°7 of the ethical issues of generative artificial intelligence in the CNPEN (National Pilot Committee for Digital Ethics). In this opinion, CNPEN focuses on the most important ethical issues in light of current experience with generative AI systems, mainly on language models.	Aula
11:45 - 12:45	Thematic Session: Metadata and Annotations Chair: Andreas Witt	Aula
11:45 - 12:05	Documenting Corpus Annotation in CMDI: State of Affairs (slides) Jakob Lenardič
12:05 - 12:25	Do Chatbots Dream of Copyright? Copyright in AI-generated Language Data (slides) Pawel Kamocki, Toby Bond, Krister Lindén and Thomas Margoni
12:25 - 12:45	Between Lexicon and Grammar: Towards Integrated Valencies for Bulgarian (slides) Petya Osenova and Kiril Simov
12:45 - 13:00	Best PhD Poster Award Closing Remarks (slides)	Aula
13:00 - 14:00	Lunch
14:00 - 16:00	SAB Meeting	Board room
14:00 - 17:00	K-Centre Workshop (Part I) (Invite-only) Annual workshop for K-centre representatives, see the event page.	CR2
14:00 - 17:00	~~SSH Open Marketplace Workshop~~ (cancelled) This workshop aims at supporting researchers interested in creating a workflow in the SSH Open Marketplace. Following a brief presentation of what the SSH Open Marketplace is and how it works, participants will be supported by members of the Editorial Board of this discovery portal to write and document their research scenarios, based on the use of CLARIN tools, services and data - for example the CLARIN Resource Families or tools from the Language Resource Switchboard. Workflows are an ideal way to share one’s research resources, and harness the power of the SSH Open Marketplace to contextualise tools and services with publications, datasets, and training resources, thus presenting a research activity from A to Z in an easy to follow and reproducible way.	~~CR7~~
14:00 - 17:00	EuReCo Workshop (Invite-only) The EuReCo workshop brings together representatives of National Corpora from CLARIN countries. Its aim is to explore the possibilities of launching an initiative toward a large multilingual and distributed reference corpus for European languages that would connect these existing resources. Such an initiative could potentially develop into a new CLARIN flagship project. It would enable linguists to explore corpora of different languages, especially annotated ones, by means of the CLARIN infrastructure. Eventually, this project could lead to the creation of a large comparable corpus of European languages accessible through a single access point. For more details, including the agenda, please refer to this link.	Aula

Day Four

Time	Thursday 19 October 2023	Room
09:00 - 13:00	K-Centre Workshop (Part II) (Invite-only)	CR2