
CLARIN Café on Bilingual and Multilingual Corpora


General Information

This edition of the CLARIN Café is organised by Eva Soroli, CLARIN Ambassador, Associate Professor, University of Lille, France; Thomas Gaillat, Associate Professor, University of Rennes, France; and Franck Cinato, CNRS researcher, University of Paris, France. The CLARIN host will be Eva Soroli.

Date: 29 April 2022

Time: 14:00-16:15 (CEST)

Venue: CLARIN virtual Zoom meeting

Twitter hashtag: #CLARINcafe 

This CLARIN Café is supported by the CLARIN K-Centre CORLI and HumaNum.
A full overview of the scheduled café sessions can be found on the CLARIN Café page.


A linguistic corpus is a collection of language productions (written, oral or multimodal data) selected and brought together in order to reveal something about human behaviour. Bilingual and multilingual corpora are very common in language studies and are relevant to researchers working in, among other domains, historical linguistics, language acquisition, variation studies, dialectology and typology.

The objective of this Café is to focus on the features of bi-/multilingual parallel, comparable and dialectal corpora (new or already published), and to provide demonstrations of how to collect/build, annotate, explore, analyse and archive them in an interoperable way.

Each demo will include hands-on presentations and best practice recommendations for:

  • Bi-/multilingual corpus building/collection and metadata
  • Corpus exploitation (principles and tools of transcription, annotation)
  • Data exploration, cleaning, output reorganisation and analysis
  • FAIR issues and perspectives for knowledge sharing.

How to Join

You can register at this link; you will receive the Zoom meeting link the day before the event.



The European Infrastructure CLARIN and its Knowledge Centres

Eva SOROLI, University of Lille, France



CORLI (Corpus, Language and Interactions): a CLARIN Knowledge-Centre

Christophe PARISSE, University of Nanterre & Céline POUDAT, Université Côte d'Azur, France



The multidialectal corpus of the Crescent dialects: collection, exploitation and analysis

Maximilien GUERIN, University of Paris & CNRS - HTL (UMR 7597)

14.50-15.00 Questions & Discussion



Building CIEP+, the parallel Corpus of Indo-European Prose Plus

Annemarie VERKERK & Luigi TALAMO, Universität des Saarlandes, Germany

15.20-15.30 Questions & Discussion



A dynamic architecture to structure and analyse comparable learner corpora: the case of the French and English Corpus InterLangue (CIL)

Thomas GAILLAT, University of Rennes, LIDILE, France

15.50-16.00 Questions & Discussion 


16.00-16.15 Wrap-up Session: Franck CINATO

Recordings, Slides and Blog