Corpora

New Impact Story: Discovering Slovenian Language Structure Using Corpora

19 September 2022

Jakob Lenardič’s PhD thesis was recently awarded best of the year 2021/2022 at the Faculty of Arts, University of Ljubljana. Read how he combined theoretical and corpus...

Discovering Slovenian Language Structure Using Corpora

14 September 2022

The Project, In recognition of its outstanding quality, Jakob Lenardič’s PhD thesis was recently awarded best of the year 2021/2022 at the Faculty of Arts, Univers

Lenardič’s thesis is deeply rooted in theoretical linguistics, and the details of his work are difficult to simplify while also doing the theoretical

Apart from its method, Lenardič’s thesis stands out for another reason: It provides an explicit compositional semantics for Slovenian grammatical stru

'Science is inherently good.' Jakob Lenardič on the importance of asking questions, regardless of the wider impact

Background, Lenardič holds a BA and MA in English literature and linguistics from the University of Ljubljana. When he started his PhD in 2016, he had little expe

To explore his research questions, Lenardič used the tools developed at CLARIN.SI, such as the noSketch Engine concordancer, on corpora relevant to hi

Future directions - CLARIN and DH, Though his own work has undoubtedly benefited from the CLARIN infrastructure, Lenardič also sees outreach and collaboration as important. CLARIN’s ben

'Get them while they're young.' Jakob Lenardič on motivating students to engage with digital humanities, so the benefits of CLARIN are n

Lenardič says: ‘In Slovenia, there is a sizable research community which does not seem to be aware of our national consortium and the services and wea

Dr Jakob Lenardič, Faculty of Arts, University of Ljubljana

Digital Humanities at the University of Ljubljana

Tour de CLARIN: The CLARIN PL-B-Centre

18 December 2021

Written by Krzysztof Hwaszcz and Jan Wieczorek

The Polish consortium CLARIN-PL, a founding member of CLARIN operating since 2012, was already presented in Tour...

Using a Monitor Newspaper Corpus to Trace Changing Language as a Result of COVID-19

9 December 2021

The Project, This project shows that a monitor newspaper corpus can be used to trace, almost in real time, changes in language in response to a crisis.

As a response to the dramatic developments that took place in early 2020, a sudden and dramatic increase in vocabulary took place in a very short peri

‘The pandemic provided an exceptional opportunity to demonstrate the use of this CLARIN monitor newspaper corpus.’ Koenraad De Smedt

Methodology, This study used the Norwegian Newspaper Corpus as its data source. All occurrences of words starting with corona/korona in the period from 9 January 2

Figure: Cumulative increase of the corona compound vocabulary.
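The kind of count behind a cumulative vocabulary curve like this one can be illustrated with a minimal Python sketch. It is not the study's actual pipeline, which relied on the Norwegian Newspaper Corpus and its own tooling. The function name `cumulative_new_types`, the input shape (pairs of date and token) and the toy dates are assumptions made purely for illustration. Only the corona/korona prefixes and the example words koronatelt and koronautsettelsene come from the text.

```python
from datetime import date

# Prefixes named in the methodology; everything else below is illustrative.
PREFIXES = ("corona", "korona")


def cumulative_new_types(dated_tokens, prefixes=PREFIXES):
    """Return [(day, n)] where n is the number of distinct word types
    starting with one of the prefixes that have been seen up to and
    including that day. Matching is case-insensitive."""
    seen = set()
    growth = {}
    for day, token in sorted(dated_tokens):
        word = token.lower()
        if word.startswith(prefixes):  # str.startswith accepts a tuple
            seen.add(word)
        growth[day] = len(seen)  # last assignment per day is the day's total
    return sorted(growth.items())


# Toy input with invented dates (hypothetical, not corpus data).
toy_tokens = [
    (date(2020, 1, 9), "koronavirus"),
    (date(2020, 1, 9), "avis"),
    (date(2020, 3, 12), "Koronavirus"),
    (date(2020, 3, 12), "koronatelt"),
    (date(2020, 3, 20), "koronautsettelsene"),
    (date(2020, 3, 20), "koronatelt"),
]

for day, count in cumulative_new_types(toy_tokens):
    print(day.isoformat(), count)
```

Plotting the resulting counts against the dates would give a growth curve of the same general kind as the figure. A real analysis would also need tokenisation and handling of inflected forms, which this sketch leaves out.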

Outcome, Not only was the occurrence of new compounds with the stem corona/korona in the studied timeframe very high, but the speed of vocabulary growth and th

Many of the new compounds are heavily context-dependent: for instance, koronatelt (corona tent), koronautsettelsene (corona postponements), coronalov

‘This is the first study to demonstrate the effect of such a spelling change in various Norwegian media sources.’ Koenraad De Smedt

CLARIN Tools and Resources, This study used the Norwegian Newspaper Corpus as its data source. The corpus is part of the CLARIN Resource Family ‘Newspaper Corpora’. It is updated

Access Corpuscle via the CLARINO Centre Bergen.

Browse newspaper corpora in CLARIN's Resource Families.

Views on CLARIN, 'Newspaper monitor corpora, which incorporate new materials on a regular basis, are particularly useful for tracking linguistic changes spurred by cur

Koenraad De Smedt, Professor of Computational Linguistics, Department of Linguistic, Literary and Aesthetic Studies, University of Bergen, Norway

See here for more information on how CLARIN has responded to COVID-19.

ParlaCLARIN II Goes Virtual

18 May 2020

The organisers of the second ParlaCLARIN workshop on creating, using and linking parliamentary corpora with other types of political discourse share their experience with the successful virtual edition of this LREC2020 workshop that was originally envisaged to take place in Marseille.

Search engine demonstration:

Country: Germany

CLARIN Centre: BBAW

Description

The Deutsches Textarchiv (German Text Archive) provides access to a comprehensive range of German texts from around 1600 to 1900. The texts were selected on the basis of scholarly bibliographies of the period, yielding a balanced corpus of more than 1,300 works and almost 100 million words (as of the start of 2014).