France has been an observer of CLARIN since 2017. The national coordinator of CLARIN-FR is Nicolas Larousse, and the consortium is coordinated by Huma-Num. It involves several national partners:
- Analyse et Traitement Informatique de la Langue Française (ATILF, “The Analysis and Computer Processing of the French Language”)
- Bases, Corpus, Langage (BCL, “Databases, Corpora, and Language”)
- Cognition, Langues, Langage, Ergonomie (CLLE, “Cognition, Languages, Language, Ergonomics”)
- Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus-Atelier de Recherche sur la Parole (CLILLAC-ARP, “Center for Interlanguage Linguistics, Lexicology, English Linguistics and Corpus-Speech Research”)
- Centre de Recherche en EthnoMusicologie (CREM, “Ethno-Musicology Research Centre”)
- Dynamique du Langage (DDL, “Language Dynamics”)
- FOrmes et REprésentations en Linguistique, Littérature et dans les arts de l’Image et de la Scène (FORELLIS, “Forms and Representations in Linguistics, Literature and in the Fine and Performing Arts”)
- Histoire des Théories Linguistiques (HTL, “History of Linguistic Theories”)
- Interactions, Corpus, Apprentissages, Représentations (ICAR, “Interactions, Corpora, Learning, Representations”)
- Langage, Langues et Cultures d’Afrique (LLACAN, “The Language(s) and Cultures of Africa”)
- LAngues et CIvilisations à Tradition Orale (LACITO, “Languages and Civilisations with Oral Traditions”)
- Langues, Textes, Traitements Informatiques, Cognition (LATTICE, “Languages, Texts, Computer Processing, Cognition”)
- LInguistique et DIdactique des Langues Etrangères et Maternelles (LIDILEM, “Linguistics and Didactics of Foreign and Native Languages”)
- LInguistique, Langues, PArole (LILPA, “Linguistics, Language, and Communication”)
- Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI, “Computer Science Laboratory for Mechanics and Engineering Sciences”)
- Laboratoire de LINGuistique de Nantes (LLING, “Nantes Linguistics Laboratory”)
- Laboratoire de Linguistique Formelle (LLF, “Formal Linguistics Laboratory”)
- Laboratoire Ligérien de Linguistique (LLL, “Ligérien Linguistics Laboratory”)
- MOdèles, DYnamiques, COrpus (MODYCO, “Models, Dynamics, Corpora”)
- PRAXIs et LINGuistique (PRAXILING, “Praxis and Linguistics”)
- Structures Formelles du Langage (SFL, “Formal Structures of Language”)
- Savoirs, Textes, Langage (STL, “Knowledge, Texts, Language”)
The French consortium is mainly composed of French linguists involved through the CORLI expert group, which promotes the use of good practices for corpus creation, focusing on how to maximize corpus reuse and disseminate the corpus data. CORLI is also active in promoting the application of FAIR principles. Moreover, French researchers are involved via the ATALA association and the group GDR TAL. Additionally, CLARIN-FR has recently started to establish contacts with the French cognitive sciences community, in particular with Institut Carnot pour la Cognition for its “Cognition & Langage” research project.
CLARIN-FR has so far established three C-Centres:
- The COCOON Centre provides a data repository with access to oral resources (with a focus on dialectal texts) and an interactive web portal that offers a chain of navigational and analytical tools to the French digital research community. A unique feature of COCOON is that all the oral resources are tagged with precise geolocational metadata so they can be searched geographically on a state-of-the-art interactive worldwide map offered on the web portal.
- The ORTOLANG Centre provides a general-purpose repository for secure long-term storage of language data, mostly pertaining to the languages spoken in France. Resources from other origins are accepted as well, especially when the data come from countries where no public repositories like ORTOLANG are available. For example, COMERE, one of the most visible corpora in ORTOLANG, consists of computer-mediated language resources such as tweets and text messages.
- The MMSH's Sound Archives Center (Phonothèque) preserves, archives and disseminates recordings of the sound heritage relating to the ethnology, languages, history, music and literature of the Mediterranean area. For example, the Phonothèque makes accessible, in compliance with ethical and legal rules, recordings of different dialects of Occitan and variants of colloquial Arabic (Syria, Lebanon, Sudan, Algeria, Yemen).
In 2020, CLARIN-FR established the French K-Centre for Corpora, Languages and Interaction. The K-Centre focuses on providing information, tools, and continuing education to help PhD students and professional linguists work on corpus linguistics. It is run by a panel of corpus linguists who provide their expertise to the community. CLARIN-FR has also successfully added two endpoints to the Federated Content Search, from the ORTOLANG and COCOON C-Centres.
Following the Work Plan for the 2019 renewal of its status as CLARIN observer, and after establishing the K-Centre, CLARIN-FR now aims to obtain CoreTrustSeal certification for the ORTOLANG repository so that the latter can become a CLARIN B-Centre. After the observership period ends in 2021, CLARIN-FR aims to continue discussions with the French ministry of research, the French National Centre for Scientific Research (CNRS) and national communities about the opportunity for France to become a full CLARIN member.