Tour de CLARIN: the LUND University Humanities Lab Knowledge Centre

Submitted by Jakob Lenardič on 11 August 2020

Blog post written by Johan Frid, edited by Darja Fišer and Jakob Lenardič

Lund University Humanities Lab is a department for research infrastructure, interdisciplinary research and training. Since 2017, the Lab has been a certified CLARIN Knowledge Centre with a special focus on multimodal and sensor-based methods. As of 2020 we are also a CLARIN C-centre, meaning that our datasets are integrated with CLARIN's Virtual Language Observatory (VLO). The Lab is a member of the Swedish national consortium for language resources and technology, Swe-Clarin.

We provide access to sensor-based technologies, methodological know-how, data management, and archiving expertise. Our mission is to facilitate and help diversify research around the issues of cognition, communication, and culture – traditional domains for the Humanities. That said, many projects (see the Lab’s Annual Report for an overview) undertaken at the Lab are interdisciplinary and conducted in collaboration with the Social Sciences, Medicine, the Natural Sciences, Engineering, and e-Science. The Lab enables researchers to combine traditional and novel methods, and to interact with other disciplines.

We have a wide range of facilities for measurements and recordings: articulography, electrophysiology, EEG, eye-tracking, professional audio and video recording, motion capture and virtual reality. The Lab also offers support and consultancy on statistics, machine learning-related research on language data, and keystroke logging for the study of the writing process.

As a node in the Swedish national infrastructure Swe-Clarin, the Humanities Lab provides speech and language technological support to a wide range of projects and contributes to the development of resources for Swedish language technology. For instance, the Swe-Clarin consortium has formed a thematic working group to develop a resource for benchmarking Swedish Named-Entity Recognition and Classification (NERC) systems. The NERC group links Swe-Clarin nodes at Lund, Gothenburg and Linköping. The aim is to develop a tool for finding and replacing Swedish names in written materials in order to anonymise or pseudonymise them. All the resources developed in this working group will be made available from The Language Bank of Sweden.

The Lab also provides tools and expertise related to language archiving, corpus and (meta)data management, with a continued emphasis on multimodal corpora. Many of these contain Swedish resources, while others cover other (often endangered) languages or are multilingual or learner corpora. A primary service is the Lund University Humanities Lab corpus server, which contains a varied set of multimodal language corpora with standardised metadata and linked layers of annotations and other resources.

The corpus server hosts two sets of corpora: the Lund Corpora and the Repository and Workspace for Austroasiatic Intangible Heritage (RWAAI) corpora. The facility contains a wide variety of data types including audio, video, text, images, and eye-movement data. The Lund Corpora offer data from major world languages and lesser-described minority languages, including longitudinal child language studies, adult language acquisition data, dialect surveys, and corpora with linked eye-tracking data. The RWAAI corpora constitute a unique digital resource preserving multidisciplinary research collections documenting the languages and cultures of communities from the Austroasiatic language family of Mainland Southeast Asia and India. The collections span over half a century of research in fields such as linguistics, anthropology, botany, ethnomusicology, and human ecology. More than 50 predominantly endangered minority languages are currently represented in the collection. Metadata is provided in a standardised format and is harvested by CLARIN's VLO.

The Lund corpus server