Repositories

Storing language resources and related datasets requires sound organisation and attention to digital sustainability.

One of CLARIN's aims is to ensure that digital language resources are made available to a broad community on a long-term basis. This is achieved by establishing data repositories at the centres, which host digital resources and the associated metadata. These repositories also assign persistent identifiers to the resources, so that a specific dataset can be easily cited in a paper, for example.

Users can inspect the data in such a repository through its local interface. In addition, the metadata is shared with the rest of the CLARIN community by means of metadata harvesting.
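To illustrate what metadata harvesting can look like in practice, here is a minimal sketch. The text above does not name a protocol; the sketch assumes OAI-PMH, a widely used harvesting protocol, and a `cmdi` metadata prefix. The base URL, the function names and the sample response are all hypothetical and serve only as illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="cmdi"):
    """Build a ListRecords request URL for a (hypothetical) repository endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_identifiers(xml_text):
    """Return the record identifiers found in an OAI-PMH ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Hand-written sample response (not real data), so the sketch runs offline.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repo.example.org:corpus-1</identifier></header></record>
    <record><header><identifier>oai:repo.example.org:corpus-2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_list_records_url("https://repo.example.org/oai"))
    print(extract_identifiers(SAMPLE_RESPONSE))
```

A real harvester would fetch the URL over HTTP and follow resumption tokens for large result sets; both are left out here to keep the sketch short.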

CLARIN is a strong advocate of open access, yet in some cases resources have to be password-protected to respect legal, privacy and ethical constraints. Federated login simplifies requesting access to protected collections, and logging in once access has been granted. The decision to grant access stays with the resource owner, as authorisation is distributed in the CLARIN infrastructure.

Repository Assessment

The quality of the CLARIN repositories, together with their organisational and technical background, is subject to an assessment procedure. Repositories that have successfully undergone this procedure are granted the CLARIN B-centre label.

Technical Details

For more information on the technical details, see the Language Resources section.

Learn More

Repository tutorial