CLARIN adheres to the following principles:
- Open standards are preferred over proprietary standards
- Formats and protocols should be proven, that is, already used in practice
- Text-based formats are preferred over binary formats where possible
- When an analogue signal is digitised, either no compression or lossless compression is recommended
Further information:
- Ongoing work by the CLARIN Standards Committee
- FAQ about recommended formats and standards
- Document: Standards for LRT
Several CLARIN centres have published information on what formats they recommend for language research data depositions:
- TALAR (EKUT)
- TLA (MPI-PL)
The CLARIN Standards Information System (provided by IDS Mannheim) offers information on standards in general and on the standards used by particular centres. Starting in spring 2020, the system is to be modified to reflect the centres' recommendations on formats for data depositions.