CLARIN Café on Bilingual and Multilingual Corpora

Friday, 29 April 2022, 14:00 - 16:15

General Information

This edition of the CLARIN Café is organised by: Eva Soroli, CLARIN Ambassador, Associate Professor, University of Lille, France; Thomas Gaillat, Associate Professor, University of Rennes, France; Franck Cinato, CNRS researcher, University of Paris, France. The CLARIN host will be Eva Soroli.

Date: 29 April 2022

Time: 14:00-16:15 (CEST)

Venue: CLARIN virtual Zoom meeting

Twitter hashtag: #CLARINcafe

This CLARIN Café is supported by the CLARIN K-Centre CORLI and HumaNum.

A full overview of the scheduled café sessions can be found on the CLARIN Café page.

About

A linguistic corpus is a collection of language productions (text/oral/multimodal data) selected and brought together in order to reveal something about human behaviour. Bilingual and multilingual corpora are very common in language studies and are relevant to researchers working in, among other domains, historical linguistics, language acquisition, variation, dialectology and typology.

The objective of this Café is to focus on the features of bi-/multilingual parallel, comparable and dialectal corpora (new or already published), and to demonstrate how to collect/build, annotate, explore, analyse and archive them in an interoperable way.

Each demo will include hands-on presentations and best practice recommendations for:

Bi-/multilingual corpus building/collection and metadata
Corpus exploitation (principles and tools of transcription, annotation)
Data exploration, cleaning, output reorganisation and analysis
FAIR issues and perspectives for knowledge sharing

How to Join

You can register at this link. You will receive the Zoom meeting link the day before the event.

Programme

14.00-14.15

The European Infrastructure CLARIN and its Knowledge Centres

Eva SOROLI, University of Lille, France

14.15-14.30

CORLI (Corpus, Language and Interactions): a CLARIN Knowledge-Centre

Christophe PARISSE, University of Nanterre & Céline POUDAT, Université Côte d'Azur, France

14.30-14.50

The multidialectal corpus of the Crescent dialects: collection, exploitation and analysis

Maximilien GUERIN, University of Paris & CNRS - HTL (UMR 7597)

14.50-15.00 Questions & Discussion

15.00-15.20

Building CIEP+, the parallel Corpus of Indo-European Prose Plus

Annemarie VERKERK & Luigi TALAMO, Universität des Saarlandes, Germany

15.20-15.30 Questions & Discussion

15.30-15.50

A dynamic architecture to structure and analyse comparable learner corpora: the case of the French and English Corpus InterLangue (CIL).

Thomas GAILLAT, University of Rennes, LIDILE, France

15.50-16.00 Questions & Discussion

16.00-16.15 Wrap-up Session: Franck CINATO
