linguistics

Xenophobia on Greek Twitter during and after the Financial Crisis

11 January 2022

The Project, The project presents a replication of a data-driven and linguistically inspired verbal aggression analysis framework that was designed to examine verb

‘This study is an example of how a language technology-based method can be used as a complementary research instrument in order to address broader soc

Methodology, The methodology that was initially designed and applied to 2013-2016 Twitter data as part of the XENO@GR project was reapplied to 2019 Twitter data in

‘This information is useful for researchers, such as political and social scientists, journalists, and, given the high correlation between physical an

Outcome, During the first study (2013-2016), the most discussed groups in the Twitter collections were refugees and Germans, reflecting the ongoing refugee cri

Publications and Future Plans, The project team is currently working on the extension of the framework to other targets and domains through two case studies in the context of the SS

Views on CLARIN, ‘The natural language processing tools and workflows that you can build are extremely useful for several semantic annotation analysis tasks. And in ge

Maria Pontiki, PhD, Scientific Associate at the Institute for Language and Speech Processing, Athena Research Center, Athens, Greece Maria Gavriili

Access the ILSP suite of NLP tools for Greek via CLARIN:EL:

Using a Monitor Newspaper Corpus to Trace Changing Language as a Result of COVID-19

9 December 2021

The Project, This project illustrates how a monitor newspaper corpus can be used to trace, almost in real time, changes in language in response to a crisis.

As a response to the dramatic developments that took place in early 2020, a sudden and dramatic increase in vocabulary took place in a very short peri

‘The pandemic provided an exceptional opportunity to demonstrate the use of this CLARIN monitor newspaper corpus.’ Koenraad De Smedt

Methodology, This study used the Norwegian Newspaper Corpus as its data source. All occurrences of words starting with corona/korona in the period from 9 January 2

Figure: Cumulative increase of the corona compound vocabulary.
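The kind of cumulative vocabulary count shown in the figure can be sketched in a few lines. This is a minimal illustration, not the study's actual procedure: the `articles` list, its sentences and the function name `cumulative_vocabulary` are invented for the example, and the sketch counts distinct word forms rather than lemmas or compound stems.

```python
import re
from datetime import date

# Hypothetical toy input: (publication date, article text) pairs.
# The sentences are invented for illustration and do not come from the corpus.
articles = [
    (date(2020, 1, 9), "Et nytt coronavirus er oppdaget."),
    (date(2020, 3, 12), "Koronakrisen rammer alle. Koronaviruset sprer seg."),
    (date(2020, 3, 13), "Koronakrisen fortsetter, og koronatelt settes opp."),
]

# Match whole words that begin with corona/korona, ignoring case.
PATTERN = re.compile(r"\b[ck]orona\w*", re.IGNORECASE)


def cumulative_vocabulary(dated_texts):
    """Return (date, number of distinct corona/korona word forms seen so far)."""
    seen = set()
    growth = {}
    # Process texts in chronological order so the count only ever grows.
    for day, text in sorted(dated_texts, key=lambda pair: pair[0]):
        seen.update(word.lower() for word in PATTERN.findall(text))
        growth[day] = len(seen)  # one entry per date, updated by later texts
    return list(growth.items())


for day, count in cumulative_vocabulary(articles):
    print(day.isoformat(), count)
```

Plotting the resulting counts against the dates would give a growth curve of the same general type as the one in the figure. A real analysis would also need tokenisation and normalisation choices that this sketch leaves out.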

Outcome, Not only was the occurrence of new compounds with the stem corona/korona in the studied timeframe very high, but the speed of vocabulary growth and th

Many of the new compounds are heavily context-dependent: for instance, koronatelt (corona tent), koronautsettelsene (corona postponements), coronalov

‘This is the first study to demonstrate the effect of such a spelling change in various Norwegian media sources.’ Koenraad De Smedt

CLARIN Tools and Resources, This study used the Norwegian Newspaper Corpus as its data source. The corpus is part of the CLARIN Resource Family ‘Newspaper Corpora’. It is updated

Access Corpuscle via the CLARINO Centre Bergen:

Browse newspaper corpora in CLARIN's Resource Families:

Views on CLARIN, 'Newspaper monitor corpora, which incorporate new materials on a regular basis, are particularly useful for tracking linguistic changes spurred by cur

Koenraad De Smedt, Professor of Computational Linguistics, Department of Linguistic, Literary and Aesthetic Studies, University of Bergen, Norway

See here for more information on how CLARIN has responded to COVID-19.

WordTies

WordTies is a web interface developed to visualize monolingual wordnets as well as their alignments with wordnets in other languages. Wordnets are lexical-semantic dictionaries in which concepts are linked to other concepts via semantic relations. In the WordTies browser, these semantic relations are presented in a more intuitive and graphical fashion than in most other wordnet browsers. It is often difficult to apply wordnets across languages because they exhibit very different structures. Questions that one might want to ask are: What are the differences and similarities in the way wordnets are structured in the different European languages, and are these differences rooted in actual cultural differences? Or can they rather be explained by the use of different compilation strategies (monolingually based, based on corpora, based on lexica or term lists, cross-lingually based via translations, etc.)? A flexible and intuitive way of comparing such resources is to connect them through WordTies.

To the extent that the relevant languages and domains are covered, WordTies can be used to address questions about cultural differences realized through language: How are educational systems expressed in wordnets in the different European languages? Are the divergences rooted in actual differences in the educational systems across countries? How are food taxonomies expressed in terminologies and wordnets in the different European languages? For example, are cheeses structured differently from a taxonomic viewpoint depending on whether people in a particular country typically eat them as a starter, as a dessert or in a sandwich? Do we have comparable taxonomies for bread? Are these taxonomies changing over time due to more globalised eating habits? To illustrate, follow the example search below.

Example search

Enter WordTies at https://wordties.nors.ku.dk/ and read the brief introduction on how each wordnet was developed.
Select the Danish wordnet, DanNet.
Choose Multilingual alignments.
Look up 'brød' ('bread').
Examine the subconcepts of 'bread' by considering its hyponyms (the left column shows the colour of this relation).
Scroll down to see the definitions of each Danish hyponym.
Click on alignment with Finnish.
Examine the bread hyponyms for Finnish.
Scroll down and read the definitions of the Finnish hyponyms (formulated in English).
Consider to what extent the differences between the two languages are due to cultural differences or to different strategies in building the wordnets.
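The comparison carried out by hand in the steps above can also be pictured as a small data exercise. The sketch below is only an illustration under stated assumptions: the dictionaries `danish`, `finnish` and `alignment` are invented toy data, not entries taken from DanNet or the Finnish wordnet, and `compare_hyponyms` is a hypothetical helper, not part of WordTies.

```python
# Toy wordnets: each maps a concept to its direct hyponyms.
# All entries are invented for illustration, not taken from the real wordnets.
danish = {"brød": ["rugbrød", "franskbrød", "knækbrød"]}
finnish = {"leipä": ["ruisleipä", "näkkileipä", "rieska"]}

# Hypothetical cross-lingual alignment: Danish concept -> Finnish concept.
alignment = {
    "brød": "leipä",
    "rugbrød": "ruisleipä",
    "knækbrød": "näkkileipä",
}


def compare_hyponyms(source, target, alignment, concept):
    """Split the hyponyms of an aligned concept pair into shared and unshared."""
    target_concept = alignment[concept]
    source_hyponyms = source.get(concept, [])
    target_hyponyms = set(target.get(target_concept, []))
    # Target-side concepts reachable from the source hyponyms via the alignment.
    aligned = {alignment[h] for h in source_hyponyms if h in alignment}
    return {
        "shared": sorted(
            (h, alignment[h])
            for h in source_hyponyms
            if alignment.get(h) in target_hyponyms
        ),
        "source_only": sorted(
            h for h in source_hyponyms if alignment.get(h) not in target_hyponyms
        ),
        "target_only": sorted(target_hyponyms - aligned),
    }


report = compare_hyponyms(danish, finnish, alignment, "brød")
for key, values in report.items():
    print(key, values)
```

Hyponyms that appear on one side only are the interesting cases: they may reflect a real cultural difference, or simply a different compilation strategy, which is the question the final step of the example search asks.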

The impact of the browser will be increased by extending it to also include languages outside the Nordic and Baltic area. So far, the Polish wordnet has been integrated into WordTies.

CLARIN Centre

University of Copenhagen

Project leader

Bolette Pedersen

Contact email

bspedersen@hum.ku.dk

Links

WordTies website

Acknowledgements

WordTies was developed as a prototype in the META-NORD project (no. 270899) with the Nordic and Baltic countries. Task leader was Bolette Pedersen; developers were Mitchell Seaton and Anders Johannsen, University of Copenhagen.

WordTies currently includes Danish, Finnish, Estonian, Swedish and Polish wordnets. The following wordnet developers participated in the task of aligning and evaluating the Nordic and Baltic wordnets: Lars Borin, Markus Forsberg, Neeme Kahusk, Krister Lindén, Jyrki Niemi, Niklas Nisbeth, Lars Nygaard, Heili Orav, Eirikur Rögnvaldsson, Mitchell Seaton, Kadri Vider, and Kaarlo Voionmaa. The Polish wordnet was included under the leadership of Maciej Piasecki as a joint task between CLARIN-DK and CLARIN-PL.

