What it means to switch to CMDI 1.2, and whether you should, depends on your role within CLARIN. The following pages provide answers to this and other questions for various groups:
Frequently Asked Questions - Basics
What is CMDI 1.2 and how does it affect me?
CMDI 1.2 is the successor to the CMDI 1.1 metadata framework and is one of the two currently supported versions of CMDI. More information about this specific version can be found at the CMDI 1.2 page. How the introduction of CMDI 1.2 affects you depends on your role within CLARIN. Click one of the following links to find detailed information about the transition to CMDI 1.2 that is relevant to you:
What version of CMDI should I use?
What versions of CMDI are there and which should I use?
There are currently two supported versions of CLARIN's component metadata framework: CMDI 1.1 and CMDI 1.2. The former has been in active use for many years and is widely supported within the CLARIN infrastructure. CMDI 1.2 was introduced in 2016 and provides a number of new features and improvements compared to its predecessor. However, its support throughout the infrastructure is still limited (at the time of writing this FAQ, July 2016).
Therefore, to decide which version of CMDI to use, first determine which tools need to process your metadata. More details about CMDI 1.2, including current information with respect to its support throughout the infrastructure, can be found at the CMDI 1.2 page.
I found a profile that almost matches my needs. Can I add some fields?
The fields of a profile are fixed, so you will need to use a different profile. Don't worry, you can create your own. Since you found a profile that seems to almost match your needs, the most logical thing to do is to create a new profile based on that one.
You can do this yourself, as long as you have a way to log in to the Component Registry. Click the 'login' link and select your home institute or another provider you have an account with. If none is in the list, create an account with CLARIN (more info).
When logged in, select the base profile and click the 'Edit as new' button. Save it in your private workspace (under a different name and/or group). The profile consists of links to a number of components (some of which in turn consist partially of links to components), so you will have to identify the components that you need to change. Edit these 'as new' as well, and make the required changes. You may have to do this recursively for deeper hierarchies. Then, in your profile, replace the references to the original components with references to your new versions of these components. Save the profile, and test it in an editor (e.g. oXygen or Arbil) before publishing. You can get the XSD link by selecting the profile in the component browser and choosing 'Show Info' from the drop-down menu on the far right. You can open this link in an XML editor or validator; in Arbil you can add it via the 'Profiles and templates' settings.
How can I add a link to the original repository where a resource is hosted? (landing page)
If you want to add a link to the original context of the metadata file, e.g. to the repository where it is hosted (example), add a ResourceProxy of the type LandingPage, e.g.:
<ResourceProxy id="lp">
  <ResourceType>LandingPage</ResourceType>
  <ResourceRef>http://hdl.handle.net/11858/00-097C-0000-0008-E130-A</ResourceRef>
</ResourceProxy>
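If you generate CMDI files from a script, a ResourceProxy like the one above can be built programmatically. The following is a minimal sketch using Python's standard library; the helper name make_resource_proxy is hypothetical, and the element is created without an XML namespace, as in the fragments on this page (real CMDI files declare one).

```python
import xml.etree.ElementTree as ET


def make_resource_proxy(proxy_id, resource_type, resource_ref, mimetype=None):
    """Build a ResourceProxy element (hypothetical helper, not part of any CMDI tool)."""
    proxy = ET.Element("ResourceProxy", {"id": proxy_id})
    rtype = ET.SubElement(proxy, "ResourceType")
    rtype.text = resource_type
    if mimetype is not None:
        # The mimetype attribute is optional.
        rtype.set("mimetype", mimetype)
    ref = ET.SubElement(proxy, "ResourceRef")
    ref.text = resource_ref
    return proxy


landing_page = make_resource_proxy(
    "lp",
    "LandingPage",
    "http://hdl.handle.net/11858/00-097C-0000-0008-E130-A",
)
print(ET.tostring(landing_page, encoding="unicode"))
```

The same helper covers the other resource types by passing a different resource_type and, where relevant, a mimetype.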
How can I indicate that the resources described with a CMDI file are also searchable via a specialised web application?
This can be done with a ResourceProxy where:
- ResourceType = SearchPage
<ResourceProxy id="d55">
  <ResourceType>SearchPage</ResourceType>
  <ResourceRef>http://corpus1.mpi.nl/ds/trova/search.jsp?nodeid=MPI86949%23</ResourceRef>
</ResourceProxy>
For a complete example file see: http://catalog.clarin.eu/metadata/cmdi/collections/collection-cgn.cmdi
How can I create an XSD (XML schema) from my CMDI profile?
- In the Component Registry: open the drop-down menu in the far right column of the profile's row and select Download XSD (CMDI 1.1) or Download XSD (CMDI 1.2), depending on the support and requirements of your tools, repository, etc. (more information).
- Or use the web service directly and download the XSD from one of the following URLs (if you know the profile ID):
- For CMDI 1.1: http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/1.1/profiles/clarin.eu:cr1:p_1288172614017/xsd - where clarin.eu:cr1:p_1288172614017 is the unique profile ID
- For CMDI 1.2: http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/1.2/profiles/clarin.eu:cr1:p_1288172614017/xsd - where clarin.eu:cr1:p_1288172614017 is the unique profile ID
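The URL pattern above can be assembled in a few lines. This is only a sketch that mirrors the two example URLs; the function name xsd_url is hypothetical, and the code does not contact the Component Registry.

```python
REGISTRY_BASE = "http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry"


def xsd_url(profile_id, cmdi_version="1.2"):
    """Return the XSD download URL for a profile ID, following the pattern shown above."""
    if cmdi_version not in ("1.1", "1.2"):
        raise ValueError("expected CMDI version '1.1' or '1.2'")
    return f"{REGISTRY_BASE}/{cmdi_version}/profiles/{profile_id}/xsd"


print(xsd_url("clarin.eu:cr1:p_1288172614017", "1.1"))
print(xsd_url("clarin.eu:cr1:p_1288172614017", "1.2"))
```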
How can I indicate that the resources described with a CMDI file are also searchable via SRU/CQL?
This can be done with a ResourceProxy where:
- ResourceType = SearchService
- mimetype = application/sru+xml
<ResourceProxy id="d55">
  <ResourceType mimetype="application/sru+xml">SearchService</ResourceType>
  <ResourceRef>http://cqlservlet.mpi.nl/</ResourceRef>
</ResourceProxy>
For a complete example file see: http://www.clarin.eu/cmd/example/example-cgn-sru.cmdi
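A consumer of such a file could look for SRU/CQL endpoints along the following lines. This is a sketch, not how any CLARIN tool is known to do it: the function name sru_endpoints is hypothetical, and the sample is un-namespaced like the fragments on this page, so a real CMDI file would need namespace-aware lookups.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Resources>
  <ResourceProxyList>
    <ResourceProxy id="d55">
      <ResourceType mimetype="application/sru+xml">SearchService</ResourceType>
      <ResourceRef>http://cqlservlet.mpi.nl/</ResourceRef>
    </ResourceProxy>
    <ResourceProxy id="d56">
      <ResourceType>SearchPage</ResourceType>
      <ResourceRef>http://corpus1.mpi.nl/ds/trova/search.jsp?nodeid=MPI86949%23</ResourceRef>
    </ResourceProxy>
  </ResourceProxyList>
</Resources>"""


def sru_endpoints(root):
    """Collect ResourceRef values of SearchService proxies with the SRU mimetype."""
    endpoints = []
    for proxy in root.iter("ResourceProxy"):
        rtype = proxy.find("ResourceType")
        if rtype is None:
            continue
        is_sru = (
            rtype.text == "SearchService"
            and rtype.get("mimetype") == "application/sru+xml"
        )
        if is_sru:
            endpoints.append(proxy.findtext("ResourceRef"))
    return endpoints


print(sru_endpoints(ET.fromstring(SAMPLE)))
```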
Is there a list of recommended components and profiles?
As a starting point, see the list below. We are working to extend it.
- For languages:
- (with fallback to string): cmdi-language
- (short list): iso-language-639-1
- (long list): iso-language-639-3
- For language families: iso-languagefamiliy-639-5
- For countries: iso-country
- For continents: iso-continent
- For mimetypes: cmdi-mimetype
- For license types: License
- Description: cmdi-description
- Free tags: tags
- Collection descriptions: collection
- TEI headers: teiHeader
- OLAC: OLAC-DcmiTerms
- IMDI session: imdi-session, IMDI corpus: imdi-corpus
- Phonetic collections/corpora: media-session-profile (recording session) and media-corpus-profile (collection level)
- Lexical resources: LexicalResourceProfile
- Interview: OralHistoryInterview
Can I use multiple languages in my metadata description?
Yes. If you tick the checkmark next to Multilingual for an element in the Component Registry, it will result in a multilingual field. With the xml:lang attribute you can then indicate the language in which an element has been described; see e.g. the following fragment in this example CMDI file:
<!-- Note the support for multilingual fields, using the xml:lang attribute -->
<title xml:lang="eng">mister</title>
<title xml:lang="fra">monsieur</title>
<title xml:lang="nld">mijnheer</title>
To indicate the language, we strongly advise using the ISO-639-3 language code.
Please note that enabling Multilingual will make the element repeatable, even if the Maximum number of occurrences is set to 1.
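Reading such a multilingual field back could look like the sketch below. The wrapper element example and the helper values_by_language are hypothetical; the only fixed detail is that Python's ElementTree exposes xml:lang under the standard XML namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<example>
  <title xml:lang="eng">mister</title>
  <title xml:lang="fra">monsieur</title>
  <title xml:lang="nld">mijnheer</title>
</example>"""


def values_by_language(parent, element_name):
    """Map each xml:lang code to the text of the matching child element."""
    return {el.get(XML_LANG): el.text for el in parent.findall(element_name)}


titles = values_by_language(ET.fromstring(SAMPLE), "title")
print(titles["fra"])  # monsieur
```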
What is the difference between a component and a profile?
Technically there is no real difference. A profile is a component that can be converted into an XSD file. A normal component can only be used within other components or profiles and can never be transformed into an XSD.
The isProfile="true" attribute indicates that a CMD_ComponentSpec defines a profile and not just a component.
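Checking this attribute in a component specification takes only a few lines. The sketch below assumes an un-namespaced CMD_ComponentSpec root as named above; the sample specifications and the function name is_profile are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened specifications for illustration only.
PROFILE_SPEC = '<CMD_ComponentSpec isProfile="true"><Header/></CMD_ComponentSpec>'
COMPONENT_SPEC = '<CMD_ComponentSpec isProfile="false"><Header/></CMD_ComponentSpec>'


def is_profile(spec_xml):
    """Return True if the specification's root element is marked isProfile="true"."""
    root = ET.fromstring(spec_xml)
    return root.tag == "CMD_ComponentSpec" and root.get("isProfile") == "true"


print(is_profile(PROFILE_SPEC), is_profile(COMPONENT_SPEC))
```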
How do I know on which profile a CMDI file is based?
The MdProfile element (in the Header section) contains a unique profile code (e.g.: clarin.eu:cr1:p_1290431694484). Alternatively, you can also find the profile identifier as part of the schema location, for example (CMDI 1.1):
<CMD ... xsi:schemaLocation="http://www.clarin.eu/cmd/ http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/1.1/profiles/clarin.eu:cr1:p_1290431694484/xsd">
or (CMDI 1.2):
<cmd:CMD ... xsi:schemaLocation="http://www.clarin.eu/cmd/1 https://infra.clarin.eu/CMDI/1.x/xsd/cmd-envelop.xsd http://www.clarin.eu/cmd/1/profiles/clarin.eu:cr1:p_1381926654508 https://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/1.x/profiles/clarin.eu:cr1:p_1290431694484/xsd">
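The profile identifier can also be pulled out of a schema location string programmatically. The sketch below assumes the XSD URL always has the form .../profiles/<profile ID>/xsd, as in both examples above; the function name profile_id_from_schema_location is hypothetical.

```python
import re

# Matches the profile ID in a URL of the form .../profiles/<profile ID>/xsd
XSD_PROFILE = re.compile(r"/profiles/([^/\s]+)/xsd")


def profile_id_from_schema_location(schema_location):
    """Return the profile ID found in the XSD URL, or None if there is none."""
    match = XSD_PROFILE.search(schema_location)
    return match.group(1) if match else None


cmdi_1_1 = (
    "http://www.clarin.eu/cmd/ "
    "http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/1.1/"
    "profiles/clarin.eu:cr1:p_1290431694484/xsd"
)
print(profile_id_from_schema_location(cmdi_1_1))
```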
You can find the profile in the component registry with the following URL:
How can I specify additional details about a ResourceProxy?
The information that a ResourceProxy can contain (a URL and mimetype) is kept very minimal on purpose. However, you can use any component to add more details about such a ResourceProxy, using the id attribute.
E.g. in the example CMDI file we can add a textual description of the photo. First the relevant ResourceProxy gets the id "a_photo":
<ResourceProxy id="a_photo">
  <ResourceType mimetype="image/jpeg">Resource</ResourceType>
  <!-- note that both a normal URL and a handle Persistent Identifier can be used for the ResourceRef -->
  <ResourceRef>hdl:1839/00-0000-0000-0009-3C7E-F</ResourceRef>
</ResourceProxy>
Then, later on in the same CMDI file, we have an explanatory component example-component-photo with a description element:
<example-component-photo ref="a_photo">
  <description>a suitable textual description of this photo</description>
</example-component-photo>
Thanks to the reference from this component to the ResourceProxy with the ref attribute we know that the description relates to the photo.
Note that the id attribute should be unique for each ResourceProxy.
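The link between a component and its ResourceProxy can be followed in code as sketched below. The assumptions: an un-namespaced sample assembled from the fragments above, a ref attribute that holds a single id, and a hypothetical helper named resolve_refs. The helper also rejects duplicate ids, in line with the uniqueness note.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<CMD>
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="a_photo">
        <ResourceType mimetype="image/jpeg">Resource</ResourceType>
        <ResourceRef>hdl:1839/00-0000-0000-0009-3C7E-F</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
  <Components>
    <example-component-photo ref="a_photo">
      <description>a suitable textual description of this photo</description>
    </example-component-photo>
  </Components>
</CMD>"""


def resolve_refs(root):
    """Map each element carrying a ref attribute to the ResourceRef it points at."""
    proxies = {}
    for proxy in root.iter("ResourceProxy"):
        proxy_id = proxy.get("id")
        if proxy_id in proxies:
            raise ValueError(f"duplicate ResourceProxy id: {proxy_id}")
        proxies[proxy_id] = proxy.findtext("ResourceRef")
    resolved = {}
    for element in root.iter():
        ref = element.get("ref")
        if ref is not None:
            # None signals a ref that matches no ResourceProxy id.
            resolved[element.tag] = proxies.get(ref)
    return resolved


print(resolve_refs(ET.fromstring(SAMPLE)))
```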
How do I point to the files I'm describing with CMDI? How does the Resources section work?
OK, so how can you refer to an external file from a metadata description? That is what the Resources section is for.
In the example CMDI file, the resources section looks like:
<Resources>
  <!-- List of external resource files and (CMDI) metadata files -->
  <ResourceProxyList>
    <ResourceProxy id="a_photo">
      <ResourceType mimetype="image/jpeg">Resource</ResourceType>
      <!-- note that both a normal URL and a handle Persistent Identifier can be used for the ResourceRef -->
      <ResourceRef>hdl:1839/00-0000-0000-0009-3C7E-F</ResourceRef>
    </ResourceProxy>
    <ResourceProxy id="a_text">
      <ResourceType mimetype="text/plain">Resource</ResourceType>
      <ResourceRef>http://www.clarin.eu/sometext.txt</ResourceRef>
    </ResourceProxy>
    ...
As you can see, for each link to an external resource a ResourceProxy (= file) is added to the ResourceProxyList (= file list). For each ResourceProxy you need to specify the ResourceType:
- Resource, the default, for a link to a web-accessible file (e.g. text file, MPEG video, file)
- Metadata in case you want to build a hierarchy of CMDI files
- SearchPage, to link to a specialised website where the described resource can be queried (more details...)
- LandingPage, to link to the "original context", e.g. the URL of a repository system displaying the digital object that is described (more details...)
- SearchService, to link to a specialised webservice where the described resource can be queried (more details...)
With an optional (but very useful) mimetype attribute you can (surprise!) indicate the file's mime type. The ResourceRef contains either a normal URL or a handle PID.
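A quick sanity check on the ResourceType values listed above might look like the following sketch. It assumes an un-namespaced Resources section like the fragment on this page; the helper name unknown_resource_types and the invalid "Video" entry are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The five ResourceType values listed above.
KNOWN_TYPES = {"Resource", "Metadata", "SearchPage", "LandingPage", "SearchService"}

SAMPLE = """<Resources>
  <ResourceProxyList>
    <ResourceProxy id="a_photo">
      <ResourceType mimetype="image/jpeg">Resource</ResourceType>
      <ResourceRef>hdl:1839/00-0000-0000-0009-3C7E-F</ResourceRef>
    </ResourceProxy>
    <ResourceProxy id="a_video">
      <ResourceType>Video</ResourceType>
      <ResourceRef>http://www.clarin.eu/somevideo.mpg</ResourceRef>
    </ResourceProxy>
  </ResourceProxyList>
</Resources>"""


def unknown_resource_types(root):
    """Return (proxy id, ResourceType) pairs whose type is not one of the known values."""
    problems = []
    for proxy in root.iter("ResourceProxy"):
        resource_type = proxy.findtext("ResourceType")
        if resource_type not in KNOWN_TYPES:
            problems.append((proxy.get("id"), resource_type))
    return problems


print(unknown_resource_types(ET.fromstring(SAMPLE)))
```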
What parts does a CMDI metadata file have?
Each file consists of 3 parts:
- a (fixed) Header, containing administrative information:
- MdCreator: the author of the file
- e.g. "Eric Carlson"
- MdCreationDate: the creation date of this file
- e.g. "2016-12-31"
- MdSelfLink: the URL or PID of this file
- e.g. "https://www.institute.org/metadata/record.cmdi" or "hdl:1234/00-1234-5678-9ABC" (alternatively "https://hdl.handle.net/1234/00-1234-5678-9ABC")
- MdProfile: the unique identifier of a CMDI profile, as generated by the component registry
- e.g. "clarin.eu:cr1:p_1290431694484"
- MdCollectionDisplayName: an (optional but recommended) plain text indication of the collection to which this file belongs. Used for the Collection facet in the Virtual Language Observatory
- a (fixed) Resources section, containing links to:
- external files (e.g. an annotation file or a sound recording)
- and/or other CMDI metadata files (to build hierarchies)
- a (flexible) Components section, where the actual components that this profile contains will appear
This example CMDI file illustrates the use of the 3 parts.
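Whether a file contains all 3 parts can be checked with a short script. The sketch below compares local element names so that it works regardless of the namespace in use; the sample record is a hypothetical, incomplete file, and missing_parts is a made-up helper name.

```python
import xml.etree.ElementTree as ET

REQUIRED_PARTS = ("Header", "Resources", "Components")

# Hypothetical record that lacks a Components section.
SAMPLE = """<CMD xmlns="http://www.clarin.eu/cmd/">
  <Header>
    <MdProfile>clarin.eu:cr1:p_1290431694484</MdProfile>
  </Header>
  <Resources>
    <ResourceProxyList/>
  </Resources>
</CMD>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def missing_parts(root):
    """List the top-level parts that are absent from the record."""
    present = {local_name(child.tag) for child in root}
    return [part for part in REQUIRED_PARTS if part not in present]


print(missing_parts(ET.fromstring(SAMPLE)))
```

A proper check would validate the file against the profile's XSD; this only confirms that the top-level parts exist.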
Is there a CMDI profile I can use to describe web services?
There are multiple suitable profiles, as described in the CMDI core model for web services (and extended documentation).
See also the following paper:
Windhouwer, M., Broeder, D., & Van Uytvanck, D. (2012). A CMD core model for CLARIN web services. In Proceedings of the workshop on Describing Language Resources with Metadata: Towards Flexibility and Interoperability in the Documentation of Language Resources at LREC 2012 (pp. 41-48).
How can I create a hierarchical collection with CMDI?
Link from the parent .cmdi file to the child .cmdi file with a ResourceProxy that has the ResourceType Metadata.
E.g. http://infra.clarin.eu/cmd/example/collection/collection_root.cmdi has 2 child collections:
- http://infra.clarin.eu/cmd/example/collection/collection_lrt_inventory.cmdi, this file contains in turn links to files like:
The recommended profile to use for collection description is clarin.eu:cr1:p_1345561703620 ("Collection").
All files of this example collection can be accessed and explored via http://infra.clarin.eu/cmd/example/collection/
Note that this example is based on CMDI 1.1. The same principles apply to CMDI 1.2, in which hierarchies can be constructed in the same manner.
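Such a hierarchy can be traversed by following the Metadata-type ResourceProxy links from file to file. The sketch below works on an in-memory dictionary instead of fetching anything; the file names root.cmdi, child.cmdi and leaf.cmdi, the un-namespaced records, and the helper names are all hypothetical.

```python
import xml.etree.ElementTree as ET


def record(*child_refs):
    """Build a tiny un-namespaced record linking to the given child files."""
    proxies = "".join(
        f'<ResourceProxy id="c{i}"><ResourceType>Metadata</ResourceType>'
        f"<ResourceRef>{ref}</ResourceRef></ResourceProxy>"
        for i, ref in enumerate(child_refs)
    )
    return f"<CMD><Resources><ResourceProxyList>{proxies}</ResourceProxyList></Resources></CMD>"


# Hypothetical collection: root.cmdi -> child.cmdi -> leaf.cmdi
DOCS = {
    "root.cmdi": record("child.cmdi"),
    "child.cmdi": record("leaf.cmdi"),
    "leaf.cmdi": record(),
}


def child_metadata_refs(root):
    """Return the ResourceRef of every ResourceProxy of type Metadata."""
    return [
        proxy.findtext("ResourceRef")
        for proxy in root.iter("ResourceProxy")
        if proxy.findtext("ResourceType") == "Metadata"
    ]


def walk(name, load, depth=0, seen=None):
    """Return an indented listing of the hierarchy, visiting each file once."""
    seen = set() if seen is None else seen
    if name in seen:
        return []  # guard against circular links
    seen.add(name)
    lines = ["  " * depth + name]
    for child in child_metadata_refs(ET.fromstring(load(name))):
        lines.extend(walk(child, load, depth + 1, seen))
    return lines


print("\n".join(walk("root.cmdi", DOCS.__getitem__)))
```

Passing the loader as a parameter keeps the traversal independent of how files are retrieved, so the dictionary lookup could be swapped for a function that reads from disk or from a repository.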
Below is a graphical representation (as shown by Arbil) of the CMDI file hierarchy used above as an example.
OK, my CMDI or OLAC metadata (describing linguistic resources) is ready, how to proceed now?
The next step is getting your metadata published in the Virtual Language Observatory. See How can I publish my metadata to the Virtual Language Observatory for detailed information.
Where can I find all details on, references to and the background of this component-based metadata concept?
Check out this specification document on metadata.
PLEASE NOTE: The information in this document might be partially outdated. The information on http://www.clarin.eu/cmdi (including these FAQs) is certainly more up to date and should be considered authoritative.
So where do I find more information about creating components, profiles and using profiles created by others?
If there is no single metadata scheme, how should I describe my resources in order for them to be compatible with the CLARIN infrastructure?
CLARIN proposes a component-based approach: you can combine several metadata components (sets of metadata elements) into a self-defined scheme that suits your particular needs. Of course you can share your profile with others (in fact we strongly advise that). If sharing the full profile is not an option, you still can use common components, e.g. a component to describe a sound recording. In case that still does not address your needs, it is even possible to create components yourself.
So what metadata scheme is used within CLARIN?
What metadata schemes are there for the description of linguistic resources?
Quite a few. Examples are: Dublin Core, OLAC (which is an enriched version of Dublin Core), IMDI, and the TEI header.