The CLARIN Resource Families provide a user-friendly overview per data type of the available language resources in the CLARIN infrastructure for researchers from the digital humanities, social sciences and human language technologies. The overviews are meant to facilitate comparative research and the listings are sorted by language.
The listings for each family include brief descriptions and the most important metadata, such as resource size, text sources, time periods, annotations and licences, along with links to download pages and concordancers. In addition to the resources found in the CLARIN infrastructure, the listings give an overview of other valuable language resources that have not yet been integrated into the infrastructure.
The listings also provide hyperlinks to other relevant materials, such as CLARIN workshops and tutorials, video lectures, and key publications.
Corpora
- Computer-Mediated Communication Corpora
- Corpora of Academic Texts
- Corpora of Disordered Speech
- Historical Corpora
- L2 Learner Corpora
- Legal Corpora
- Literary Corpora
- Manually Annotated Corpora
- Multimodal Corpora
- Newspaper Corpora
- Oral History Corpora
- Parallel Corpora
- Parliamentary Corpora
- Reference Corpora
- Sign Language Resources
- Spoken Corpora
Lexical Resources
The overviews were prepared by Darja Fišer and Jakob Lenardič, with funding from the European Union's Horizon 2020 research and innovation programme through the projects CLARIN-PLUS, PARTHENOS and SSHOC. We would like to thank the User Involvement Coordinators, National Coordinators, workshop participants and other individuals who participated in the survey and provided information about the resources.
Comments and suggestions to improve this page are welcome. Please send us an email at resource-families [at] clarin.eu.