
Metadata

 


 

To ensure that digital language resources are made available to a broad community on a long-term basis, data repositories at the CLARIN centres host digital resources and the associated metadata. Users can inspect the data in such a repository through its local interface. The metadata is also shared with the rest of the CLARIN community by means of metadata harvesting. For instance, the information shown in the Virtual Language Observatory (VLO) is retrieved from CLARIN centres (and other sources) in this way.


Repository Software and Configuration

Within CLARIN, metadata is distributed using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The Centre Registry contains an 'OAI-PMH' page listing the metadata endpoints exposed by CLARIN centres. These endpoints can be queried simply by clicking on the hyperlinks or, in a more technical fashion, by sending OAI-PMH requests, for example to find out which OAI-PMH software a centre operates. To learn more about practical experiences with particular OAI-PMH server software, please contact the endpoint operators (listed in the Centre Registry as the centre's technical contact). CLARIN centres choose freely which system they use, as long as it complies with CLARIN's centre requirements (which include support for metadata harvesting, component metadata, persistent identifiers and federated login, among others). Popular options are Fedora Commons and DSpace. Some centres provide documentation on how to set up a CLARIN-compliant repository – see our page with information on repositories.
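As an illustration of the more technical route, the following is a minimal sketch of an OAI-PMH Identify request in Python, using only the standard library. The function names are hypothetical, and any base URL passed to them is a placeholder to be replaced by an endpoint taken from the Centre Registry. Whether an endpoint reveals its software depends on the optional description containers it chooses to include, so the result may well be empty for some centres.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request_url(base_url, verb, **arguments):
    """Return the GET URL for an OAI-PMH request with the given verb."""
    query = urllib.parse.urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


def parse_identify(xml_text):
    """Extract basic repository facts from an Identify response."""
    root = ET.fromstring(xml_text)
    identify = root.find("oai:Identify", OAI_NS)
    if identify is None:
        raise ValueError("response does not contain an Identify element")
    # Optional <description> containers may carry software details;
    # record the local name of each container's payload element.
    descriptions = [
        child.tag.rsplit("}", 1)[-1]
        for container in identify.findall("oai:description", OAI_NS)
        for child in container
    ]
    return {
        "repositoryName": identify.findtext(
            "oai:repositoryName", default="", namespaces=OAI_NS),
        "protocolVersion": identify.findtext(
            "oai:protocolVersion", default="", namespaces=OAI_NS),
        "descriptions": descriptions,
    }


def fetch_identify(base_url, timeout=30):
    """Fetch and parse the Identify response of a live endpoint."""
    url = build_request_url(base_url, "Identify")
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_identify(response.read())
```

Calling fetch_identify with an endpoint's base URL returns the repository name, the protocol version and the names of any description containers. A container named "toolkit", where present, is the conventional place for an implementation to name itself.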

CLARIN also offers a solution for the modelling, authoring and exploitation of metadata. This solution is implemented as a set of standards, registries, definitions and workflows which together form the Component Metadata Infrastructure (CMDI).


Supported Metadata Formats

 

CMDI

See the general information about the Component Metadata Infrastructure for an introduction, and the CMDI landing page for in-depth technical details.

 

Other Supported Metadata Formats

Several other metadata formats have limited support in the CLARIN infrastructure. They can be converted to CMDI with a stylesheet provided by CLARIN, and some of them are supported by the CLARIN OAI-PMH metadata harvester (see the Applications section below).


Applications

CLARIN offers the following applications for metadata harvesting, profile management and compliance:

  • Metadata Harvester: An application to harvest the metadata descriptions from centres using OAI-PMH. The Metadata Harvester manages the regular harvests of CMD records from endpoints provided by the CLARIN centres and additionally harvests OLAC and DC records from various other endpoints.
  • Component Registry: An application to manage and create CLARIN-compliant metadata profiles. The Component Registry stores CMDI metadata components and profiles and offers a REST service to list, retrieve, store and modify them.
  • Concept Registry: An application to manage and create CLARIN-compliant concept definitions. The CLARIN Concept Registry (CCR) forms the basis of the semantic interoperability layer of CLARIN, especially in the context of CMDI metadata. It does so by offering a collection of concepts, identifiable by their persistent identifiers, relevant for the domain of language resources. The CCR is based on SKOSMOS, an open source web-based SKOS browser and publishing tool.
  • Curation module & Link Checker: An application to detect and correct errors in CLARIN-compliant metadata descriptions.
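To make the harvesting step more concrete, here is a minimal sketch of how an OAI-PMH client pages through ListRecords responses using resumption tokens. It is not the CLARIN Metadata Harvester's implementation. The function name, the injected fetch callable and the "cmdi" metadata prefix in the usage note are assumptions for illustration; the prefixes an endpoint actually offers can be found with a ListMetadataFormats request.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Clark-notation prefix for elements in the OAI-PMH 2.0 namespace.
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(fetch, base_url, metadata_prefix):
    """Yield the identifier of every record an endpoint returns.

    `fetch` is any callable that takes a URL and returns the response
    body; it is passed in so that retries, caching or tests can be
    handled outside this function.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch(url))

        error = root.find(OAI + "error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # an empty result set, not a failure
            raise RuntimeError(
                f"OAI-PMH error {error.get('code')}: {error.text}")

        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")

        # A non-empty resumptionToken means more pages follow.
        token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            return
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords",
                  "resumptionToken": token.text.strip()}
```

A caller could pass a fetch function built on urllib.request.urlopen and iterate over harvest_identifiers(fetch, base_url, "cmdi"). Keeping the network access outside the generator is a design choice that makes the paging logic easy to test against canned responses.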

 

Additional software components and other related tools and services are listed in the 'awesome-cmdi' repository.


FAQs and Troubleshooting

Metadata-related FAQs on the CLARIN website