This tutorial gives a basic introduction to research infrastructures and linguistic research data management, focusing on the central services provided by the CLARIN research infrastructure. The presentations are grouped into three units:
- Unit 1. Introduction to Language Resources and Research Data Repositories
- Unit 2. How to Use CLARIN for Linguistic Research
- Unit 3. Sharing and Archiving Language Resources
- References
The course is useful to anyone who is new to CLARIN and wants to learn how to use its central services for language research and for sharing and archiving language resources. Teachers and educators can use the presentations to introduce their students to the CLARIN research infrastructure and to make them aware of practical aspects of linguistic research data management, equipping them with the skills and competences needed in the evolving landscape of open science and data-driven research.
The learning content was originally designed and published in 2023 by Iulianna van der Lek and Darja Fišer in the Introduction to Language Data: Standards and Repositories learning block of the UPSKILLS project. Please note that the presentations have been exported from the UPSKILLS Moodle platform in the form in which they were published at the end of the project in August 2023, and few or no updates have been made to the content since then. Trainers and teachers interested in adapting the presentations can download them either directly from this page or from Moodle and upload them to any learning management system (e.g. Moodle, Brightspace or Blackboard) or content management system (e.g. Drupal) that supports the H5P format. The learning content is also available in plain text format upon request. For questions and support, please email training [at] clarin.eu.
Unit 1. Introduction to Language Resources and Research Data Repositories
What are Language Resources?
By the end of this presentation, you will be able to explain what language resources are and why they are useful.
The Role of Research Infrastructure for Science and Research
By the end of this presentation, you will be able to explain what a research infrastructure is and define key concepts such as Open Science, the FAIR principles and research data management.
Unit 2. How to Use CLARIN for Linguistic Research
What is CLARIN, and How Can You Access It?
This presentation briefly explains what CLARIN is and how you can access the central services and password-protected resources.
Using CLARIN for Language and Linguistic Research
By the end of this presentation, you will be able to identify the CLARIN central services you can use to discover, analyse, process, share and archive language research data.
Finding Tools in CLARIN to Process Digital Text Collections
By the end of this presentation, you will be able to use the Language Resource Switchboard to find a matching tool in the CLARIN infrastructure and perform basic annotation tasks on raw digital text collections.
Citing Data in Language and Linguistics
By the end of this presentation, you will be able to identify existing guidelines and practices for citing research data in linguistics.
Collecting, Citing and Processing Language Resources from Data Catalogues
By the end of this presentation, you will be able to collect, share and cite language resources from different research data repositories and catalogues.
Unit 3. Sharing and Archiving Language Resources
How are Language Resources Created, Managed and Shared?
This presentation aims to provide a general overview of the principles of research data management and how they are applied to managing linguistic resources.
Creating a Research Data Management Plan for Linguistic Research
By the end of this presentation, you will be familiar with best practices in linguistic research data management and be able to create your first data management plan.
Sharing and Archiving Language Resources
By the end of this presentation, you will be able to identify the benefits and challenges of sharing and archiving language resources, distinguish between different data publication services, and be familiar with the basic steps of uploading a language resource to a research data repository.
Metadata Standards for Language Resources
By the end of this presentation, you will be able to explain what metadata is, recognise different types of metadata and schemas used to describe language resources, and perform metadata searches to discover language resources via the Virtual Language Observatory.
Overview of Common Standards and Formats for Language Resources
This presentation provides an overview of available standards and formats for creating and archiving language resources and introduces the CLARIN Standards Information System.
How Repositories Help Make Your Language Data FAIR
By the end of this presentation, you will be able to explain how research data repositories apply the FAIR data principles to help researchers make their data findable, accessible, interoperable, and reusable.
Learn More
For more tutorials, assignments and learning activities, you can access and download the full learning content on the UPSKILLS Moodle platform. The learning content is accompanied by a guide, Integrating Research Infrastructures into Teaching, which shows trainers and teachers how to use the CLARIN research infrastructure to help students improve their skills in data collection, processing, analysis and archiving.
How to cite this tutorial:
van der Lek, Iulianna; Fišer, Darja. (2023). Introduction to Language Data: Standards and Repositories. In UPSKILLS Learning Content. https://upskillsproject.eu/project/standards_repositories/. CC BY 4.0.