Summary
The Component Metadata Infrastructure (CMDI) provides a standard for metadata within CLARIN.
CMDI is highly flexible in terms of supported metadata models, and does not prescribe any specific structure or terminology. Within CMDI, metadata models can be defined, reused and recombined by means of so-called Components and Profiles. Semantic interoperability across these different models is obtained through references to shared concepts.
Metadata creators can choose from already existing and published definitions, and in many cases will not have to specify a custom model themselves.
CLARIN uses CMDI to make resources discoverable, to automate information extraction and aggregation, and as a general mechanism to encode and distribute information about language resources.
Multiple versions of CMDI exist, and although new CMDI users are advised to use the latest stable version, older versions are still supported by the infrastructure. Currently, CLARIN recommends using CMDI 1.2 for the creation of new metadata.
What is CMDI?
CLARIN offers a solution for the modelling, authoring and exploitation of metadata. This solution is implemented as a set of standards, registries, definitions and workflows which together form the Component Metadata Infrastructure (CMDI).
CMDI was designed to offer the advantages of shared conventions, common semantics and predefined structures while maintaining the flexibility required to serve the needs of CLARIN's core communities.
CMDI offers a solution for modelling metadata, by which we mean predefining the structure and semantics for a metadata use case. In contrast to metadata standards that are based on a single predefined schema or a small set of such schemas, Component Metadata (CMD) builds on an open-ended set of definitions called Components that can be created, used and reused by anyone and ultimately combined into Profiles that fulfil the metadata needs of specific use cases. At various levels within these definitions, links to entries from the CLARIN Concept Registry are included, providing semantic robustness through common identifiers and definitions for shared concepts.
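The relationship between Components, Profiles and shared concepts can be illustrated with a minimal sketch. This is not the CMDI component specification format: the class names, the component and element names, and the `https://example.org/concepts/...` URIs are all hypothetical placeholders chosen for illustration. It only shows the idea that one Component can be reused in several Profiles, and that concept links let differently named fields be recognised as related.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    name: str
    # URI of a shared concept; the values used below are placeholders,
    # not real CLARIN Concept Registry entries.
    concept_link: Optional[str] = None


@dataclass
class Component:
    name: str
    elements: list[Element] = field(default_factory=list)
    components: list["Component"] = field(default_factory=list)

    def concept_links(self) -> set[str]:
        """Collect all concept links in this component and its children."""
        links = {e.concept_link for e in self.elements if e.concept_link}
        for child in self.components:
            links |= child.concept_links()
        return links


# A reusable component (hypothetical).
contact = Component(
    "Contact",
    elements=[
        Element("Name", "https://example.org/concepts/personName"),
        Element("Email"),
    ],
)

# Two profiles (modelled here simply as top-level components) reuse it.
corpus_profile = Component(
    "TextCorpusProfile",
    elements=[Element("Title", "https://example.org/concepts/title")],
    components=[contact],
)
tool_profile = Component(
    "ToolProfile",
    elements=[Element("ToolName", "https://example.org/concepts/title")],
    components=[contact],
)

# Fields with different names ("Title", "ToolName") align via the same concept.
shared = corpus_profile.concept_links() & tool_profile.concept_links()
```

In this sketch, `shared` contains both placeholder concept URIs: one because the `Contact` component is reused, the other because two differently named elements point to the same concept.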
A CMDI metadata Record typically describes a single resource or a specific set of resources. Each Record must be based on a single Profile, which dictates the structure of the Record and specifies the scope and semantics of the values that it may contain. A Record is created as a single XML file. All CMDI records share the same overall structure: a mandatory header and a resource references section at the top, followed by a section that follows the blueprint defined by the Record's Profile and that may therefore differ between CMDI records. There are various ways of creating Records, some of which are discussed in further detail below.
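The three-part layout of a Record can be sketched with a short Python example that reads the header, the resource references and the profile-specific section. The sample record is heavily simplified and not schema-valid. The element names and the `http://www.clarin.eu/cmd/1` envelope namespace are assumed here to follow CMDI 1.2 and should be checked against the specification; the profile identifier, the payload namespace and the resource URL are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Assumed CMDI 1.2 envelope namespace; verify against the specification.
CMD_NS = "http://www.clarin.eu/cmd/1"

# Simplified, illustrative record. The profile ID, the payload namespace
# and the resource URL are placeholders, not real identifiers.
SAMPLE_RECORD = """\
<cmd:CMD xmlns:cmd="http://www.clarin.eu/cmd/1" CMDVersion="1.2">
  <cmd:Header>
    <cmd:MdProfile>clarin.eu:cr1:p_0000000000000</cmd:MdProfile>
  </cmd:Header>
  <cmd:Resources>
    <cmd:ResourceProxyList>
      <cmd:ResourceProxy id="r1">
        <cmd:ResourceType>Resource</cmd:ResourceType>
        <cmd:ResourceRef>https://example.org/data/corpus.zip</cmd:ResourceRef>
      </cmd:ResourceProxy>
    </cmd:ResourceProxyList>
  </cmd:Resources>
  <cmd:Components>
    <ExampleProfile xmlns="https://example.org/hypothetical-profile">
      <Title>Example corpus</Title>
    </ExampleProfile>
  </cmd:Components>
</cmd:CMD>
"""


def summarize(xml_text):
    """Return the profile ID, resource references and payload root tag."""
    ns = {"cmd": CMD_NS}
    root = ET.fromstring(xml_text)

    # Header: which Profile this record is based on.
    profile = root.findtext("cmd:Header/cmd:MdProfile", namespaces=ns)

    # Resource references: links to the described resources.
    refs = [
        el.text
        for el in root.iterfind(
            "cmd:Resources/cmd:ResourceProxyList/cmd:ResourceProxy/cmd:ResourceRef",
            ns,
        )
    ]

    # Profile-specific section: its first child is the payload root.
    components = root.find("cmd:Components", ns)
    payload = None
    if components is not None and len(components) > 0:
        payload = components[0].tag

    return {"profile": profile, "resource_refs": refs, "payload_root": payload}


summary = summarize(SAMPLE_RECORD)
```

Only the last part (the content of `Components`) depends on the Profile; the header and resource references can be read the same way for any record, which is what makes uniform processing possible.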
In CLARIN's central infrastructure, metadata records are regularly aggregated from multiple sources such as CLARIN centres, after which they are semantically aligned and finally indexed to make the described resources discoverable in a uniform way. Many other possible ways of using CMDI metadata exist for a variety of purposes. Some use cases for CMDI metadata exploitation are ingestion, aggregation, conversion, search and discovery.
The CMDI specification and infrastructure are stable but evolving. They are actively maintained and developed further by CLARIN, and as a result a number of distinct versions exist. Currently, all CMDI records in the CLARIN ecosystem are compliant with either CMDI 1.1 or CMDI 1.2. More versions may exist in the future. Automatic conversion of records, profiles and components based on older versions of CMDI is provided as part of the CMD toolkit. The same toolkit also offers means of checking metadata against formal requirements and best practices.
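Because records in both versions coexist, a processing script may first want to find out which version a record declares. The sketch below is one possible approach, not part of the CMD toolkit: it assumes that the root element is named `CMD` and carries a `CMDVersion` attribute, and that the namespaces in the two sample snippets match CMDI 1.1 and 1.2 respectively. The function names are hypothetical. Actual conversion and validation should be left to the CMD toolkit.

```python
import xml.etree.ElementTree as ET


def cmdi_version(xml_text):
    """Return the version declared on the root element, or None if absent."""
    root = ET.fromstring(xml_text)
    # Strip the "{namespace}" prefix that ElementTree puts in front of tags.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name != "CMD":
        raise ValueError(f"not a CMDI record: root element is {local_name!r}")
    return root.get("CMDVersion")  # assumed attribute name


def needs_upgrade(xml_text, recommended="1.2"):
    """True if the declared version is older than the recommended one."""
    version = cmdi_version(xml_text)
    if version is None:
        raise ValueError("record does not declare a CMDVersion")

    def as_tuple(v):
        return tuple(int(part) for part in v.split("."))

    return as_tuple(version) < as_tuple(recommended)


# Minimal root elements only; real records contain much more.
OLD_RECORD = '<CMD xmlns="http://www.clarin.eu/cmd/" CMDVersion="1.1"/>'
NEW_RECORD = '<cmd:CMD xmlns:cmd="http://www.clarin.eu/cmd/1" CMDVersion="1.2"/>'

old_needs_upgrade = needs_upgrade(OLD_RECORD)
new_needs_upgrade = needs_upgrade(NEW_RECORD)
```

Comparing versions as integer tuples rather than strings keeps the check correct should a later version such as a hypothetical 1.10 ever appear.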
Should I use CMDI?
Yes, you should use CMDI if:
- Your metadata is hosted and/or distributed by a CLARIN B-centre or by a centre that aims to receive that status, as providing CMDI metadata is one of the CLARIN B-centre requirements.
- You would like to have your metadata integrated into the CLARIN infrastructure, and your metadata requirements are not met by one of the other supported metadata formats. There may already be a profile that is suitable for your use case; otherwise, you can become a modeller and define your own Profile.
No, there is no need to use CMDI if:
- You already have metadata in one of the supported formats and no CLARIN B-centre is involved. However, if you need optimal interoperability within the CLARIN infrastructure or require a high level of control over how your metadata is processed and presented, you could still consider (also) offering your metadata as CMDI.
How can I use CMDI?
What it means to 'use CMDI' depends on what kind of user you are. Most users can get familiar with the services and tools in the CLARIN infrastructure, and use them productively without having to deal with CMDI directly at all.
For users who do need to use CMDI in one way or another, we offer some hints, grouped by type of user. The links below lead to pages that aim to answer the question 'how can I use CMDI' for the different types of users:
- 'Using CMDI' for metadata modellers – creators of metadata blueprints
- 'Using CMDI' for metadata authors – creators of metadata records
- 'Using CMDI' for repository managers – operators of systems that distribute metadata
Learn more
Use the following resources to learn more about CMDI and how to use it:
- Awesome CMDI – A curated list of services, tools and documentation for CLARIN's Component Metadata Infrastructure
- Book chapter "Component Metadata Infrastructure" (Windhouwer, M., & Goosen, T. (2022). Component metadata infrastructure. CLARIN: The infrastructure for language resources, 191-222. https://doi.org/10.1515/9783110767377-008)
This page has been updated. For reference, the previous Component Metadata page can be found here.