CLARIN Resource Families: Computer-mediated Communication (CMC) Corpora

Submitted by Linda Stokman on 20 May 2020

The CLARIN Resource Families initiative provides a user-friendly overview of the available language resources in the CLARIN infrastructure for researchers from digital humanities, social sciences and human language technologies. 

This month CLARIN highlights the computer-mediated communication (CMC) corpora. CMC covers public and private online communication, such as posts on blogs and forums, comments on online news sites, messages on social media and networking sites such as Twitter and Facebook, messages in mobile phone applications such as WhatsApp, and e-mail. These corpora are of interest to a wide range of research fields, including language variation, pragmatics, and media and communication studies.

The CLARIN infrastructure offers 13 CMC corpora. Most are for Slovenian, but there are also corpora for Czech, Dutch, Estonian, Finnish, French, German and Lithuanian. Most of the corpora are richly tagged and available under public licences.
