
Frequently Asked Questions - Persistent Identifiers (PIDs)

As described on the EPIC website, it is sufficient to send an e-mail to handle /at/ gwdg.de with a request stating your motivation.

A handle consists of two parts:

  • a prefix (e.g. 1839)
  • a suffix (e.g. 00-0000-0000-0009-3C7E-F)

The official way of referring to a handle is:

hdl: + prefix + / + suffix

e.g.:

hdl:1839/00-0000-0000-0009-3C7E-F
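A minimal Python sketch of this notation, assuming (as in the Handle System) that the prefix is everything before the first "/" and the suffix is everything after it. The function names are hypothetical and only illustrate the split:

```python
def parse_handle(hdl: str) -> tuple[str, str]:
    """Split 'hdl:prefix/suffix' (or a bare 'prefix/suffix') into (prefix, suffix)."""
    if hdl.startswith("hdl:"):
        hdl = hdl[len("hdl:"):]
    # The prefix ends at the first slash; everything after it is the suffix.
    prefix, sep, suffix = hdl.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a valid handle: {hdl!r}")
    return prefix, suffix


def format_handle(prefix: str, suffix: str) -> str:
    """Build the official notation: 'hdl:' + prefix + '/' + suffix."""
    return f"hdl:{prefix}/{suffix}"


prefix, suffix = parse_handle("hdl:1839/00-0000-0000-0009-3C7E-F")
print(prefix, suffix)                  # 1839 00-0000-0000-0009-3C7E-F
print(format_handle(prefix, suffix))   # hdl:1839/00-0000-0000-0009-3C7E-F
```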

To resolve such a handle (i.e. make it a clickable link that redirects to the resource itself), use the following formula:

http://hdl.handle.net/prefix/suffix

e.g.: http://hdl.handle.net/1839/00-0000-0000-0009-3C7E-F
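The same formula as a short Python sketch (the function name is hypothetical). It assumes the suffix contains only URL-safe characters, as in the example; a handle containing other characters would need to be percent-encoded first:

```python
HANDLE_PROXY = "http://hdl.handle.net/"


def resolver_url(handle: str) -> str:
    """Turn 'hdl:prefix/suffix' (or a bare 'prefix/suffix') into a proxy link."""
    # Drop the optional 'hdl:' label; the proxy URL only takes prefix/suffix.
    if handle.startswith("hdl:"):
        handle = handle[len("hdl:"):]
    return HANDLE_PROXY + handle


print(resolver_url("hdl:1839/00-0000-0000-0009-3C7E-F"))
# http://hdl.handle.net/1839/00-0000-0000-0009-3C7E-F
```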

 

 

The rewriting behaviour for part identifiers can be configured per handle prefix (in principle it can also be configured per individual handle, but this is not supported at this point). For EPIC (version 1, i.e. prefix 11858) the choice was made to rewrite a part identifier appended as @[suffix] to ?[suffix].

So suppose that 11858/1234 resolves to http://clarin.eu; then

11858/1234@test=a will be resolved to http://clarin.eu?test=a
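The rewrite rule can be sketched in Python as follows. This illustrates the rule only and is not the handle server's code: the lookup table is a hypothetical stand-in for the handle records, and the sketch assumes the part identifier starts at the first "@" and is appended to the target URL unchanged:

```python
# Hypothetical stand-in for the handle records: handle -> target URL.
HANDLE_TARGETS = {"11858/1234": "http://clarin.eu"}


def resolve_with_part(handle: str, targets: dict[str, str]) -> str:
    """Resolve a handle, rewriting an '@' part identifier into a '?' query string."""
    # Split at the first '@'; sep is empty when no part identifier is given.
    base, sep, part = handle.partition("@")
    target = targets[base]  # raises KeyError for an unknown handle
    return f"{target}?{part}" if sep else target


print(resolve_with_part("11858/1234@test=a", HANDLE_TARGETS))  # http://clarin.eu?test=a
print(resolve_with_part("11858/1234", HANDLE_TARGETS))         # http://clarin.eu
```

Because the rule simply prefixes a "?", a target URL that already carries a query string would end up with two "?" characters; the sketch mirrors the rule as stated and does not handle that case.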

Please note that when you offer PIDs with part identifiers, you are also responsible for maintaining the part identification fragment. Remember that users will use it to link to your resources and that the resulting end point should always be available.

(Answer taken from the ISO citer draft, p. 11)

This International Standard supports different levels of granularity. The following recommendations are designed to encourage efficiency and promote interoperability with other naming schemes:

1) If there is an existing identifier scheme for a type of resource, for instance ISBN, this level of granularity should be retained; that is, no new PIDs should be issued (for instance, for chapters) without very good reasons. Chapters would preferably be addressed using part identifiers in conjunction with the identifier of the book.

2) If the resource is associated with the complete content of a digital file, an individual PID should probably be assigned for this resource.

3) If the resource is autonomous and exists outside a larger context, an individual PID should probably be assigned for this resource.

4) If a resource should be citable apart from any containing resource, an individual PID should probably be assigned for this resource.

These recommendations are, however, subject to the needs of resource creators with respect to the level of granularity they deem suitable to the specific resource environment.

As said, PIDs are unique and persistent identifiers of objects that are made available by proper repositories. For many resources there is additional information, such as the locations of multiple copies kept for preservation reasons, a checksum (such as an MD5 hash) that can be used to check authenticity, simple metadata for citation purposes, a reference to the access permission record, etc. A proper system should offer such information immediately when resolving a PID. PURLs cannot offer this functionality, and for URNs we are not aware of a well-proven and robust resolver, although the big libraries have agreed on using URNs for their publications.

CLARIN has an arrangement with the EPIC consortium that allows CLARIN members to register PIDs and, of course, resolve them. This consortium groups a number of reliable European service providers that want to participate in providing a redundant service for the research world; we are speaking about millions of PIDs and a service at very low cost. The service is based on the Handle System, which according to our investigations is the only robust system meeting all requirements. No one is obliged to register Handles, but CLARIN centres will of course need to demonstrate that their PIDs can be resolved in a robust manner and offer the required functionality.

If the PIDs cannot be resolved at a certain moment, one simply cannot access a resource. Think of a situation where hundreds of users are waiting on the resolution of a PID and nothing happens: a nightmare for any cyberinfrastructure scenario! Since this would not be acceptable, we need to make sure that the PID service is based (a) on very robust and reliable software offering sufficient functionality and (b) on a proper service run by redundant centres with high availability and persistence guarantees.

Handling PIDs is very simple. First you need to register a PID for a resource or service. You do this by providing the required information to the PID service, in particular the path to access the resource, such as a URL, and you will receive back a PID, which you can enter into the metadata description, for example, so that everyone can use it for referencing. When users find such a PID in a resource, they can click on the reference and the service will resolve the PID and give access to (one of the copies of) the resource. As a user you normally don't see the intermediate transactions.
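To make the register/resolve cycle concrete, here is a toy in-memory PID service in Python. It is a hypothetical sketch, not the EPIC or Handle System API: the class, its methods and the prefix 12345 are invented for illustration. It also stores the kind of extra information mentioned earlier (copies, an MD5 checksum, citation metadata) and shows that moving a resource changes only the record, never the PID:

```python
import hashlib
import itertools
from dataclasses import dataclass, field


@dataclass
class PidRecord:
    # Hypothetical record layout: locations of copies, a checksum, citation metadata.
    urls: list[str]
    md5: str
    metadata: dict[str, str] = field(default_factory=dict)


class ToyPidService:
    """In-memory stand-in for a PID service; NOT the EPIC or Handle API."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._records: dict[str, PidRecord] = {}
        self._counter = itertools.count(1)

    def register(self, urls: list[str], content: bytes, **metadata: str) -> str:
        # Mint a new suffix and store where the resource lives plus its checksum.
        pid = f"{self.prefix}/{next(self._counter)}"
        self._records[pid] = PidRecord(list(urls), hashlib.md5(content).hexdigest(), dict(metadata))
        return pid

    def update_urls(self, pid: str, urls: list[str]) -> None:
        # The resource moved: only the record changes, the PID stays the same.
        self._records[pid].urls = list(urls)

    def resolve(self, pid: str) -> str:
        # Return the first registered copy; a real resolver might choose any copy.
        return self._records[pid].urls[0]

    def record(self, pid: str) -> PidRecord:
        return self._records[pid]


service = ToyPidService("12345")
data = b"example corpus"
pid = service.register(["http://repo-a.example.org/corpus"], data, title="Example corpus")
print(pid)                   # 12345/1
print(service.resolve(pid))  # http://repo-a.example.org/corpus
# The repository moves the resource; references using the PID keep working.
service.update_urls(pid, ["http://repo-b.example.org/corpus", "http://mirror.example.org/corpus"])
print(service.resolve(pid))  # http://repo-b.example.org/corpus
```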

In the emerging cyberinfrastructure we are creating more and more references between resources, resource fragments and services. Creating these references is very costly, and they are often essential for the interpretation of a resource. Therefore we need proper mechanisms to ensure that these references survive all the changes that happen, for example, in repositories. It is known that URLs are not appropriate: they are not persistent, even when we believe that they are proper URIs. This is where PIDs come in: they identify an object and are maintained by reliable institutions.

Persistent identifiers are increasingly seen as a core component for the many references we are creating at various levels, ranging from references between metadata descriptions and their resources up to references between semantic assertions made using RDF (Resource Description Framework). For more information please read the requirements specification document or the short guide.