
Corpora

Discovering Slovenian Language Structure Using Corpora

The Project

In recognition of its outstanding quality, Jakob Lenardič’s PhD thesis was recently awarded best of the year 2021/2022 at the Faculty of Arts, Univers
Lenardič’s thesis is deeply rooted in theoretical linguistics, and the details of his work are difficult to simplify while also doing the theoretical
Apart from its method, Lenardič’s thesis stands out for another reason: It provides an explicit compositional semantics for Slovenian grammatical stru
'Science is inherently good.' Jakob Lenardič on the importance of asking questions, regardless of the wider impact.
Background

Lenardič holds a BA and MA in English literature and linguistics from the University of Ljubljana. When he started his PhD in 2016, he had little expe
To explore his research questions, Lenardič used the tools developed at CLARIN.SI, such as the noSketch Engine concordancer, on corpora relevant to hi
Future directions: CLARIN and DH

Though his own work has undoubtedly benefited from the CLARIN infrastructure, Lenardič also sees outreach and collaboration as important. CLARIN’s ben
'Get them while they're young.' Jakob Lenardič on motivating students to engage with digital humanities, so the benefits of CLARIN are n
Lenardič says: ‘In Slovenia, there is a sizable research community which does not seem to be aware of our national consortium and the services and wea
Dr Jakob Lenardič, Faculty of Arts, University of Ljubljana    
Digital Humanities at the University of Ljubljana

Using a Monitor Newspaper Corpus to Trace Changing Language as a Result of COVID-19

The Project

This project illustrates how a monitor newspaper corpus can be used to trace, almost in real time, changes in language in response to a crisis.
In response to the dramatic developments of early 2020, a sudden, sharp increase in vocabulary took place in a very short peri
'The pandemic provided an exceptional opportunity to demonstrate the use of this CLARIN monitor newspaper corpus.' Koenraad De Smedt
 
Methodology

This study used the Norwegian Newspaper Corpus as its data source. All occurrences of words starting with corona/korona in the period from 9 January 2
Figure: Cumulative increase of the corona compound vocabulary.
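The counting step behind such a cumulative curve can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the study's actual pipeline. It assumes the corpus hits are available as (date, word form) pairs. It also treats each distinct lowercased word form as one type, with no lemmatisation, so inflected forms count separately. Finally, it excludes the bare words corona/korona so that only compounds are counted. The function name `cumulative_new_types` and the toy data are invented for this example.

```python
from collections.abc import Iterable
from datetime import date

# Hypothetical sketch: input is assumed to be (publication date, word form) pairs
# extracted from corpus search hits; this is not the study's actual code.
PREFIXES = ("corona", "korona")


def cumulative_new_types(
    occurrences: Iterable[tuple[date, str]],
) -> list[tuple[date, int]]:
    """Return (day, number of distinct corona/korona compounds seen up to that day)."""
    per_day: dict[date, set[str]] = {}
    for day, word in occurrences:
        form = word.lower()
        # Assumption: the bare stems themselves are not counted as compounds.
        if form.startswith(PREFIXES) and form not in PREFIXES:
            per_day.setdefault(day, set()).add(form)

    seen: set[str] = set()
    curve: list[tuple[date, int]] = []
    # Walk the days in chronological order, accumulating the vocabulary.
    # Days without any matching hit produce no point on the curve.
    for day in sorted(per_day):
        seen |= per_day[day]
        curve.append((day, len(seen)))
    return curve


if __name__ == "__main__":
    # Toy data, not taken from the corpus.
    hits = [
        (date(2020, 3, 1), "Koronaviruset"),
        (date(2020, 3, 1), "korona"),
        (date(2020, 3, 2), "koronaviruset"),
        (date(2020, 3, 2), "koronatelt"),
        (date(2020, 3, 3), "koronautsettelsene"),
    ]
    for day, total in cumulative_new_types(hits):
        print(day.isoformat(), total)
```

On the toy data this prints one line per day with running totals of 1, 2 and 3. Collecting forms in a set means that repeated hits of the same compound do not inflate the count. Only the first appearance of a form raises the curve, which is what a vocabulary-growth plot measures.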
 
Outcome

Not only was the occurrence of new compounds with the stem corona/korona in the studied timeframe very high, but the speed of vocabulary growth and th
Many of the new compounds are heavily context-dependent: for instance, koronatelt (corona tent), koronautsettelsene (corona postponements), coronalov
‘This is the first study to demonstrate the effect of such a spelling change in various Norwegian media sources.’ Koenraad De Smedt
 
CLARIN Tools and Resources

This study used the Norwegian Newspaper Corpus as its data source. The corpus is part of the CLARIN Resource Family ‘Newspaper Corpora’. It is updated
Access Corpuscle via the CLARINO Centre Bergen
Browse newspaper corpora in CLARIN's Resource Families
 
Views on CLARIN

'Newspaper monitor corpora, which incorporate new materials on a regular basis, are particularly useful for tracking linguistic changes spurred by cur
 
Koenraad De Smedt, Professor of Computational Linguistics, Department of Linguistic, Literary and Aesthetic Studies, University of Bergen, Norway
See here for more information on how CLARIN has responded to COVID-19.

ParlaCLARIN II Goes Virtual

The organisers of the second ParlaCLARIN workshop on creating, using and linking parliamentary corpora with other types of political discourse share their experience of the successful virtual edition of this LREC 2020 workshop, which was originally planned to take place in Marseille.



 

Deutsches Textarchiv

Country: Germany
CLARIN Centre: BBAW
Description

Das Deutsche Textarchiv (German Text Archive) provides access to a comprehensive range of German texts from around 1600 to 1900. The selection of texts is based on scholarly bibliographies of the period, resulting in a balanced corpus containing more than 1,300 works and almost 100 million words (as of the start of 2014).