Xenophobia on Greek Twitter during and after the Financial Crisis

The Project: The project presents a replication of a data-driven and linguistically inspired verbal aggression analysis framework that was designed to examine verb…
‘This study is an example of how a language technology-based method can be used as a complementary research instrument in order to address broader soc…
Methodology: The methodology that was initially designed and applied to 2013-2016 Twitter data as part of the XENO@GR project was reapplied to 2019 Twitter data in…
‘This information is useful for researchers, such as political and social scientists, journalists, and, given the high correlation between physical an…
Outcome: During the first study (2013-2016), the most discussed groups in the Twitter collections were refugees and Germans, reflecting the ongoing refugee cri…
Publications and Future Plans: The project team is currently working on extending the framework to other targets and domains through two case studies in the context of the SS…
Views on CLARIN: ‘The natural language processing tools and workflows that you can build are extremely useful for several semantic annotation analysis tasks. And in ge…
Maria Pontiki, PhD, Scientific Associate at the Institute for Language and Speech Processing, Athena Research Center, Athens, Greece; Maria Gavriili…
Access the ILSP suite of NLP tools for Greek via CLARIN:EL.

Using a Monitor Newspaper Corpus to Trace Changing Language as a Result of COVID-19

The Project: This project illustrates the possibility of tracing, almost in real time, changes in language in response to a crisis by using a monitor newspaper corpus.
In response to the dramatic developments of early 2020, a sudden increase in vocabulary occurred within a very short peri…
‘The pandemic provided an exceptional opportunity to demonstrate the use of this CLARIN monitor newspaper corpus.’ Koenraad De Smedt
 
Methodology: This study used the Norwegian Newspaper Corpus as its data source. All occurrences of words starting with corona/korona in the period from 9 January 2…
Figure: Cumulative increase of the corona compound vocabulary.
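The counting behind a cumulative vocabulary curve can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual pipeline: it assumes the corpus hits are available as hypothetical (date, word form) pairs, for example exported from a concordance search, and it treats a "compound" as any word that begins with corona/korona and continues after the stem. Each distinct form is counted only on the first day it appears, so the running total measures vocabulary growth rather than frequency.

```python
from collections import Counter
from datetime import date
from itertools import accumulate

# The study matched words beginning with corona/korona; the two spellings are kept apart here.
PREFIXES = ("corona", "korona")


def is_compound(word: str) -> bool:
    """Assumption: a compound is a word that starts with a stem and continues after it."""
    w = word.lower()
    return any(w.startswith(p) and len(w) > len(p) for p in PREFIXES)


def cumulative_new_compounds(hits):
    """Return (days, cumulative): cumulative[i] counts distinct compounds first seen on or before days[i]."""
    first_seen = {}
    for day, word in hits:
        w = word.lower()
        if is_compound(w):
            # Record only the earliest date on which each distinct form occurs.
            if w not in first_seen or day < first_seen[w]:
                first_seen[w] = day
    new_per_day = Counter(first_seen.values())
    days = sorted(new_per_day)
    return days, list(accumulate(new_per_day[d] for d in days))


# Hypothetical hits, invented for illustration (not data from the study).
hits = [
    (date(2020, 3, 1), "koronaviruset"),
    (date(2020, 3, 1), "Koronaviruset"),
    (date(2020, 3, 2), "koronatelt"),
    (date(2020, 3, 2), "korona"),
    (date(2020, 3, 3), "koronautsettelsene"),
    (date(2020, 3, 3), "coronaviruset"),
    (date(2020, 3, 3), "koronatelt"),
]

days, cumulative = cumulative_new_compounds(hits)
print(cumulative)  # [1, 2, 4]
```

Plotting `cumulative` against `days` gives a curve of the kind the figure caption describes; the bare stem "korona" and repeated forms do not add to the total.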
 
Outcome: Not only was the occurrence of new compounds with the stem corona/korona in the studied timeframe very high, but the speed of vocabulary growth and th…
Many of the new compounds are heavily context-dependent: for instance, koronatelt (corona tent), koronautsettelsene (corona postponements), coronalov…
‘This is the first study to demonstrate the effect of such a spelling change in various Norwegian media sources.’ Koenraad De Smedt
 
CLARIN Tools and Resources: This study used the Norwegian Newspaper Corpus as its data source. The corpus is part of the CLARIN Resource Family ‘Newspaper Corpora’. It is updated…
Access Corpuscle via the CLARINO Centre Bergen.
Browse newspaper corpora in CLARIN's Resource Families.
 
Views on CLARIN: 'Newspaper monitor corpora, which incorporate new materials on a regular basis, are particularly useful for tracking linguistic changes spurred by cur…
 
Koenraad De Smedt, Professor of Computational Linguistics, Department of Linguistic, Literary and Aesthetic Studies, University of Bergen, Norway
See here for more information on how CLARIN has responded to COVID-19.

plWordNet 3.0 – Słowosieć 3.0

plWordNet is a lexico-semantic network which reflects the lexical system of the Polish language. plWN currently contains 178 000 lemmas (nouns, verbs, adjectives and adverbs), 259 000 word senses, over 600 000 relations, and 240 000 inter-lingual relations between lexical units. It is now the largest wordnet in the world and is still growing.

Senses in plWordNet are interconnected by relations. In the resulting network, each word is defined implicitly in reference to other words. For example, samochód 'car' is a kind of pojazd drogowy 'road vehicle'; it is a whole consisting of silnik 'engine', spryskiwacz 'windscreen washer', podwozie 'chassis' and so on; its close counterpart is the colloquial fura 'wheels'.
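The idea that each word is defined implicitly through its relations can be illustrated with a toy graph built from the samochód example. This is a sketch under assumptions: the relation names (`hypernym`, `meronym`, and the hypothetical `colloquial_counterpart`) and the small in-memory class are invented for illustration and are not plWordNet's actual data format, relation inventory, or API.

```python
from collections import defaultdict


class SenseNetwork:
    """Toy wordnet fragment: senses are nodes, relations are labelled directed edges."""

    def __init__(self):
        # relation name -> source sense -> set of target senses
        self._edges = defaultdict(lambda: defaultdict(set))

    def add(self, source: str, relation: str, target: str) -> None:
        self._edges[relation][source].add(target)

    def related(self, source: str, relation: str) -> set:
        """Targets reachable from `source` via one `relation` edge (empty set if none)."""
        return set(self._edges.get(relation, {}).get(source, set()))

    def describe(self, sense: str) -> dict:
        """Characterise a sense only through its links to other senses."""
        return {
            relation: sorted(targets[sense])
            for relation, targets in self._edges.items()
            if sense in targets
        }


net = SenseNetwork()
net.add("samochód", "hypernym", "pojazd drogowy")  # a car is a kind of road vehicle
for part in ("silnik", "spryskiwacz", "podwozie"):
    net.add("samochód", "meronym", part)  # parts of the whole 'car'
net.add("samochód", "colloquial_counterpart", "fura")  # hypothetical relation name

print(net.describe("samochód"))
```

The description of samochód contains no definition text at all: it is just the set of labelled links to other senses, which is the sense in which a wordnet defines words implicitly.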

Among plWordNet's numerous applications is its use as a Polish-English and English-Polish dictionary, a result of its mapping onto Princeton WordNet (the first, and for many years the largest, wordnet in the world). plWordNet is also an important resource in natural language processing and artificial intelligence research. For example, it is used by Google Translate for machine translation.
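Why a single mapping yields a dictionary in both directions can be shown with a toy sketch. It assumes, for brevity, plain word-to-word links that reuse the English glosses from the samochód example; the real mapping onto Princeton WordNet links senses and synsets and is considerably richer, so these pairs are illustrative and not plWordNet data.

```python
from collections import defaultdict

# Toy interlingual links from Polish lexical units to English equivalents.
# The pairs reuse the glosses from the samochód example; they are illustrative only.
PL_TO_EN = {
    "samochód": {"car"},
    "pojazd drogowy": {"road vehicle"},
    "silnik": {"engine"},
    "podwozie": {"chassis"},
    "fura": {"wheels"},
}


def invert(links):
    """Derive the English -> Polish direction from the same set of links."""
    reverse = defaultdict(set)
    for polish, english_set in links.items():
        for english in english_set:
            reverse[english].add(polish)
    return dict(reverse)


EN_TO_PL = invert(PL_TO_EN)
print(PL_TO_EN["silnik"], EN_TO_PL["chassis"])
```

Because each link is stored once and read in either direction, maintaining the mapping maintains both dictionaries at the same time.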

The University has made plWordNet available free of charge for all applications, including commercial ones, under a licence modelled on the Princeton WordNet licence. Users can browse plWordNet via a mobile version and via WordNetLoom-Viewer (an application for displaying plWN entries), and can also download the source files. Programmers can access plWordNet via a Web service.

We also provide (currently only in the downloadable version) 31 000 lexical units annotated with sentiment values: positive, negative, ambiguous or neutral.

 

CLARIN Centre: CLARIN-PL
Project leader: Dr Maciej Piasecki
Acknowledgements: Wroclaw University of Technology, Ministry of Science and Higher Education (Poland)