
Tour de CLARIN: Resource from Poland - plWordNet

Submitted by karolina@clarin.eu on

Blog post written by Jan Wieczorek, Ewa Rudnicka and Agnieszka Dziob


plWordNet (Polish: Słowosieć) is a large lexico-semantic network that reflects the current content and structure of the Polish lexical system. It is a kind of dictionary in which word senses are represented by lexical units, which are linked by relations and grouped into synonym sets (synsets). It is inspired by the Princeton WordNet, the very first wordnet, which has been in development since the 1980s. The two wordnets are linked via inter-lingual relations, effectively creating a bilingual semantic network. plWordNet has been developed since 2006 at Wrocław University of Technology by a team of linguists and programmers.
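
The structure described above can be illustrated with a minimal Python sketch. It is not plWordNet's actual data format or API: the class names, identifiers, and relation names are all hypothetical, and the example entries are chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LexicalUnit:
    """One sense of a lemma (hypothetical representation)."""
    lemma: str
    sense: int
    pos: str  # part of speech, e.g. "noun" or "verb"


@dataclass
class Synset:
    """A set of lexical units that share a meaning."""
    synset_id: str
    units: list[LexicalUnit]
    # relation name -> ids of the target synsets
    relations: dict[str, list[str]] = field(default_factory=dict)

    def link(self, relation: str, target: "Synset") -> None:
        """Record a directed relation from this synset to the target."""
        self.relations.setdefault(relation, []).append(target.synset_id)


# Invented identifiers and relation names, for illustration only.
thief_pl = Synset("pl-1", [LexicalUnit("złodziej", 1, "noun")])
criminal_pl = Synset("pl-2", [LexicalUnit("przestępca", 1, "noun")])
thief_en = Synset("en-1", [LexicalUnit("thief", 1, "noun")])

thief_pl.link("hypernym", criminal_pl)           # relation inside the Polish network
thief_pl.link("interlingual_synonym", thief_en)  # link to the English network
```

Keeping inter-lingual links in the same relation table as the monolingual ones is one simple way to treat the two wordnets as a single bilingual graph.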

The meanings of lexical units and synsets are defined by relations; however, more and more units also contain a gloss and a usage example that further describe their meaning. From version 3.0 onwards, a growing number of units in plWordNet are marked with sentiment values: positive, negative, ambiguous, or neutral. Version 3.1 of plWordNet, published in December 2017, includes:

  • around 191,000 words (lemmas);
  • around 290,000 senses (lexical units);
  • around 600,000 relations that describe words and their meanings within plWordNet and around 239,000 inter-lingual relations;
  • around 160,000 glosses and 70,000 usage examples; and
  • around 80,000 units which contain emotive annotation.

plWordNet covers four parts of speech: nouns (around 177,000 senses), adjectives (around 54,000 senses), adverbs (around 14,000 senses), and verbs (around 40,000 senses), and it is being progressively expanded. In contrast with the Princeton WordNet, plWordNet is characterised by a wide range of relations at the level of both synsets and lexical units, largely as a result of the morphological richness of the Polish language.

plWordNet can be browsed online (Figure 1), via a mobile app available on Google Play, or via the WNLoomViewer application, which can be downloaded here. It can be used for linguistic analyses of Polish as well as in comparative and translation studies. Thanks to its open licence, which is modelled on that of the Princeton WordNet, it can also be used for data mining in both research and commercial projects.

The WordnetLoom Editor is a Java application that provides a visual, graph-based, interactive presentation of the structures of plWordNet and thereby enables browsing and direct editing of lexico-semantic relations and synsets (Figure 2). It is notable for its flexibility and for how readily it can be adjusted to the needs of individual users. It is currently used by the Portuguese Wordnet team and in a project led by Professor Ewa Geller from Warsaw University. That project aims to describe the Yiddish language and to map Yiddish senses, on the morphological and semantic level, to the corresponding senses in plWordNet, GermaNet, and the Princeton WordNet.

In 2014, plWordNet became one of the crucial components of NEKST (Natively Enhanced Knowledge Sharing Technologies), a semantic search engine for Polish that is adapted to Polish syntax (especially its flexible word order) and inflection. plWordNet served as the basis for word sense disambiguation (WSD) and for creating links between words. It was also used to develop an anti-plagiarism system based on the NEKST search engine.
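
To give a rough idea of how a wordnet can support WSD, the sketch below picks the sense whose gloss shares the most lemmas with the surrounding context (a simplified Lesk-style overlap). This is not the method used in NEKST, which is not described here; the function name, sense identifiers, and glosses are invented for illustration and are not taken from plWordNet.

```python
def pick_sense(context_lemmas: set[str], glosses: dict[str, set[str]]) -> str:
    """Return the sense id whose gloss shares the most lemmas with the context."""
    return max(glosses, key=lambda sense_id: len(glosses[sense_id] & context_lemmas))


# Invented, pre-lemmatised glosses for two senses of "zamek"
# ("castle" and "lock"); not actual plWordNet entries.
GLOSSES = {
    "zamek-1": {"budowla", "obronny", "siedziba", "władca"},
    "zamek-2": {"mechanizm", "zamykać", "drzwi", "klucz"},
}

# Lemmas from a hypothetical context about a key that does not fit a door.
context = {"klucz", "pasować", "drzwi"}

best = pick_sense(context, GLOSSES)
print(best)  # zamek-2
```

The sketch assumes the text has already been lemmatised, which matters for a highly inflected language such as Polish; a real system would also draw on relations between senses, not only on glosses.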

We encourage everybody interested in the linguistic description of senses in plWordNet to read related publications that are available here. Additionally, if you use plWordNet in your research, please cite the following papers:

  • Piasecki, M., Szpakowicz, S. & Broda, B. (2009). A Wordnet from the Ground Up. Wrocław: Oficyna Wydawnicza Politechniki Wrocławskiej.
  • Maziarz, M., Piasecki, M., Rudnicka, E., Szpakowicz, S. & Kędzia, P. (2016). plWordNet 3.0 – a Comprehensive Lexical-Semantic Resource. In Calzolari, N., Matsumoto, Y. & Prasad, R. (editors), COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, December 11-16, 2016, Osaka, Japan, pages 2259-2268. ACL.

 

Figure 1: The interface of plWordNet

 

Figure 2: The interface of the WordnetLoom Editor, visualising the wordnet structure of the lemma złodziej (English thief).

 


Click here to read more about Tour de CLARIN