Rendering endangered language lexicons interoperable through standards harmonization.
Workshop on Lexicon Tools and Lexicon Standards
In many linguistic fields, lexica constitute important resources. In field linguistics, a lexicon may be the only record of a dying or extinct language, making it a unique source of linguistic and cultural information and an essential part of any description of the language system. In NLP, lexica are used in many applications to provide the information needed to process words, idioms, etc. In all cases, the optimization, maintenance and extension of electronic lexical resources are crucial, and ever-growing attention is being paid to interoperability, which allows lexica created elsewhere to be shared and reused. At the MPI, we needed a common framework to model a large variety of lexical structures and content, because it was not possible to maintain all these different lexica with the tools used to create them, and most of them even lacked an explicit structure specification. Other institutes in field linguistics and NLP face the same problems.
Therefore, a number of initiatives and projects treat interoperability as a core issue in their work plans and have recognized that only wide acceptance and adoption of standards will
• lead to an interoperable domain of lexical resources
• allow us to maintain long-term accessibility to digital lexica
• and allow us to maintain tools capable of accessing different lexica.
For printed lexica, the (hopefully) long-lasting paper and the human eye, together with a few conventions, solved this problem. For digital lexica, we now all understand that this is not sufficient. Much has happened since a first meeting between members of the two groups (field linguists and NLP experts) in Munich a few years ago; the EMELD and ISO TC37/SC4 initiatives in particular deserve mention. Recently, the RELISH project was started to bring these tracks together, mainly among experts involved in field linguistics. At a recent meeting in Berlin, several major initiatives that also include NLP experts (ELRA, TEI, ISO, CLARIN, ACL, T4ME) met and formulated a joint roadmap for standards. This workshop is meant to bring together field linguists and NLP experts to discuss the approaches, standards and tools for lexical resources and their interoperability. The aim is to understand the requirements and, where possible, to design concrete steps towards further harmonization. CLARIN, which must cover resources from all linguistic sub-domains, necessarily has to bridge these different requirements.