Workshop: CLARIN, Standards and the Text Encoding Initiative

CLARIN is a pan-European initiative which aims to build a research infrastructure for language resources, integrating numerous tools and resources in a distributed architecture, and which will respond to the needs of researchers across the humanities and social sciences. CLARIN is being built on open standards, but also with a recognition that standards and guidelines are only one part of a complex jigsaw which needs to be assembled to create reliable, durable and high-quality services. The Text Encoding Initiative is a long-standing community which develops guidelines for the encoding of scholarly texts in XML, and works with associated technologies. This workshop brings together those involved in these two sets of activities to share experiences and knowledge, and to find ways to work together productively in the next generation of infrastructure services.

Attendance at the workshop is a no-cost option when you register for the conference via the website: http://digilab2.let.uniroma1.it/teiconf2013/

Timetable

09:30 Registration

10:00 Keynote address: TEI for written historical corpora: why and how? - Alexander Geyken (Berlin-Brandenburg Academy of Sciences) abstract presentation (pdf)

11:00 break

11:30 Presentation 1: The new corpus query engine KorAP: connections with CLARIN and the TEI - Andreas Witt & Piotr Bański (Institut für Deutsche Sprache) presentation (pdf)

12:00 Presentation 2: Poio API: a CLARIN-D curation project for language documentation and language typology - Peter Bouda (Centro Interdisciplinar de Documentação Linguística e Social, Minde) abstract (pdf) presentation (pdf)

12:30 Presentation 3: TEI, ALTO and : why we need all of them - Günter Mühlberger (University of Innsbruck) abstract (pdf) presentation (pdf)

13:00 lunch (not provided by conference organisers - see conference website for local restaurants)

14:30 Presentation 4: TEI and the Component Metadata Framework - Matej Durco and Karlheinz Mörth (Austrian Academy of Sciences) abstract (pdf) presentation (pdf)

15:00 Presentation 5: WebLicht's Text Corpus Format: susTEInability of CLARIN-D web services? - Jens Stegmann (University of Stuttgart)

15:30 Panel discussion: Responses: problems and opportunities - Arianna Ciula, Karlheinz Mörth and Laurent Romary

16:00 break

16:30 Panel discussion part 2: Next steps

17:00 End

Background and further information

The organizing committee of this workshop invited proposals for presentations on topics which link together CLARIN and the TEI, including:

  • the role of the TEI in developing standards for CLARIN services,
  • technical issues in the integration of TEI-conformant resources or TEI-aware tools in CLARIN services,
  • barriers and problems with the deployment and linking of CLARIN and TEI technologies,
  • training, awareness and advocacy activities.

Presenters are asked not simply to present an overview of their work, but to focus on precisely how and why (or why not) TEI formats, guidelines and technologies are being deployed, going into technical detail where necessary.

It is hoped that this will be only the start of promoting dialogue and collaboration between CLARIN and the TEI at many levels. One result would be an improved dialogue about the use of the TEI in higher-level initiatives to develop standards for the CLARIN architecture, but another would be enhanced engagement directly with the TEI community of developers and researchers in the many centres and institutions related to CLARIN.

Abstracts

TEI for written historical corpora: why and how?

Dr Alexander Geyken, Berlin-Brandenburg Academy of Sciences

In the first part of the talk I will report on our experiences at the Deutsches Textarchiv (German Text Archive, DTA) with the integration of texts from 15 external corpus projects (some of which used the TEI from the outset, some not), including the pros and cons of using the TEI. The second part will explain the motivation behind the DTA-Base format, a strict subset of TEI-P5 that is intended to allow rich structural expressiveness while being as precise as possible in order to allow the interoperability of the different corpora.

Organizing Committee

Martin Wynne (Chair)

Oxford e-Research Centre

University of Oxford

martin.wynne@it.ox.ac.uk

Karlheinz Moerth

Institute for Corpus Linguistics and Text Technology

Austrian Academy of Sciences

Karlheinz.Moerth@oeaw.ac.at

Ineke Schuurman

KU Leuven / U.Utrecht

Belgium / the Netherlands

ineke@ccl.kuleuven.be

Andreas Witt

Institut für Deutsche Sprache

witt@ids-mannheim.de

Xavier Gomez Guinovart

Seminario de Linguistica Informatica

Universidade de Vigo

xgg@uvigo.es

Address

Sapienza Università di Roma
Piazzale Aldo Moro 5
Rome RM
Italy