Blog post written by Darja Fišer and Jakob Lenardič
CLARIAH-NL is a Dutch project that is setting up a distributed research infrastructure providing humanities researchers with access to large collections of digital data and user-friendly processing tools. Because the Netherlands is a member of both CLARIN ERIC and DARIAH ERIC, CLARIAH-NL contributes to both. It covers not only humanities disciplines that work with natural language (the defining characteristic of CLARIN) but also disciplines that work with structured quantitative data. Although CLARIAH aims to cover the humanities as a whole in the long run, it currently focusses on three core disciplines: linguistics, social and economic history, and media studies.
CLARIAH-NL is a partnership of around 50 partners from universities, knowledge institutions, cultural heritage organizations and several companies, the full list of which can be found here. Currently, the data and applications of CLARIAH-NL are managed and sustained at eight centres in the Netherlands: Huygens ING, the Meertens Institute, DANS, the International Institute for Social History, the Max Planck Institute for Psycholinguistics, the Netherlands Institute for Sound and Vision, the National Library of the Netherlands, and the Dutch Language Institute. Huygens ING, the Meertens Institute, the Max Planck Institute for Psycholinguistics, and the Dutch Language Institute are certified CLARIN Type B centres. The consortium is led by an eight-member board; its director, who is also the national coordinator for CLARIN ERIC, is Jan Odijk.
The research, development and outreach activities at CLARIAH-NL are distributed among five work packages. Dissemination and Education (WP1) deals with User Involvement and Technology (WP2) with the technical design and construction of the infrastructure, whereas the remaining three work packages each focus on a selected research area: Linguistics (WP3), Social and Economic History (WP4) and Media Studies (WP5).
Dissemination and Education work package
In the Dissemination and Education package, which focuses on User Involvement, CLARIAH-NL aims to facilitate knowledge sharing among Digital Humanities and Social Sciences scholars and to provide services that cater to their research needs. To this end, CLARIAH-NL has organized a variety of User Involvement activities, such as the CLARIAH Linked Data Workshop, which took place in June 2017 and introduced Linked Data to both novice and advanced researchers.
Linguistics work package
MIMORE
In the Linguistics work package, CLARIAH-NL focusses on developing and improving applications for enriching corpora and searching through them. One such tool is MIMORE (Microcomparative Morphosyntactic Research tool), which enables researchers to investigate morphosyntactic variation in Dutch dialects by searching three related databases with a common online search engine. The search results can be visualised on geographic maps and exported for statistical analysis. The three databases involved are DynaSAND, DiDDD and GTRP. For more information on MIMORE, see this webpage, this movie and this educational package.
SoNaR
An important data set in this context is the Dutch reference corpus SoNaR, which was created in earlier projects for developing software but has been opened up for research by humanities scholars through the OpenSoNaR web application. State-of-the-art tools for the enrichment of textual corpora are also developed at the consortium. An example of such software is Frog, an NLP suite containing a tokeniser, PoS-tagger, lemmatiser, morphological analyser, named entity recogniser and dependency parser for Dutch.
Social and Economic History work package
In the Social and Economic History package, structured databases of social and economic history are being integrated into the Linked Data paradigm. Linked Data, with its uniform structure and explicit semantics, ensures that relations and connections can be searched across different databases. This is of crucial importance for historical analysis and allows researchers to easily test hypotheses that could not be investigated before.
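To make the idea of searching across databases more concrete, here is a minimal sketch in plain Python. It is not CLARIAH code: the two tiny "databases", the identifiers such as person:1, and the predicate names bornIn and hasOccupation are all hypothetical, invented purely for illustration. The point is only that once every source uses the same (subject, predicate, object) shape and shared identifiers, the sources can be merged and queried as one.

```python
# Hypothetical illustration of cross-database search with Linked Data-style triples.
# All identifiers and predicate names below are made up for this example.

# Two independent sources, both expressed as (subject, predicate, object) triples.
census = {
    ("person:1", "bornIn", "place:Leiden"),
    ("person:2", "bornIn", "place:Utrecht"),
}
occupations = {
    ("person:1", "hasOccupation", "weaver"),
    ("person:2", "hasOccupation", "baker"),
}


def objects(graph, subject, predicate):
    """Return all objects linked to `subject` via `predicate` in `graph`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def occupations_of_people_born_in(place, *graphs):
    """Merge the given graphs and list occupations of everyone born in `place`."""
    # Because every source shares the same triple structure, merging is a set union.
    merged = set().union(*graphs)
    people = {s for s, p, o in merged if p == "bornIn" and o == place}
    return {person: objects(merged, person, "hasOccupation") for person in people}


result = occupations_of_people_born_in("place:Leiden", census, occupations)
print(result)
```

In practice such data would typically be stored as RDF and queried with a language such as SPARQL; the sketch only mirrors the principle that shared identifiers and explicit predicates let one question span several sources.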
Media Studies work package
Finally, the Media Studies package focuses on providing special tools for viewing, browsing and searching through large collections of audio-visual data such as films, radio broadcasts, and vlogs. It aims to provide a Media Suite, the first version of which can be found here, that gives access to relevant audio-visual collections by integrating tools developed in earlier projects. One such tool is AVResearcherXL, which supports the exploration of radio and television programme descriptions, television subtitles and general newspaper articles through a user-friendly graphical interface.
The board of CLARIAH
Click here to read more about Tour de CLARIN