Donate Speech: Digital Speech Database to Boost Research and Innovation

The Project: Donate Speech is an ambitious collection of digital speech data intended for both academic and commercial use. Underpinned by an award-winning media c
Methodology: Donate Speech set out to collect everyday, spontaneous speech from as many different groups of Finnish speakers and as many individuals as possible, i
‘The software platform has been published as open source, allowing other organisations to build their own systems for collecting similar speech materi
A key aspect of the Donate Speech project was the set of legal requirements surrounding the protection of personal data under European and national data pro
Outcome: In total, more than 25 000 citizens in Finland donated more than 220 000 speech samples comprising roughly 4000 hours of colloquial speech to be used
‘The datasets need to be openly available but in a controlled way, and in a way that still fits within the FAIR principles. This nuanced way of distri
Publications and Future Plans: Donate Speech has already extended the project to include Finnish-Swedish or the Swedish spoken in Finland (Donera Prat), and is thinking of including
Views on CLARIN: ‘Donate Speech deposited the material with CLARIN for several reasons. First, its licensing policies: CLARIN has readily available licensing and rights
The Language Bank of Finland – Kielipankki/FIN-CLARIN: Krister Lindén, University of Helsinki; Mietta Lennes, University of Helsinki; Tommi Jauhiainen
Donate Speech campaign

plWordNet 3.0 – Słowosieć 3.0

plWordNet (plWN) is a lexico-semantic network that reflects the lexical system of the Polish language. It currently contains 178 000 nouns, verbs, adjectives, and adverbs; 259 000 word senses; and over 600 000 relations and 240 000 inter-lingual relations between lexical units. It is now the largest wordnet in the world and is still growing.

Senses in plWordNet are interconnected by relations. In the resulting network, each word is defined implicitly in reference to other words. For example, samochód 'car' is a kind of pojazd drogowy 'road vehicle'; it is a whole consisting of silnik 'engine', spryskiwacz 'windscreen washer', podwozie 'chassis' and so on; its close counterpart is the colloquial fura 'wheels'.
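
The relational structure described above can be pictured as a small labelled graph. The following is a minimal sketch in Python, not plWordNet's actual data model or API: the relation labels (`hypernym`, `meronym`, `near_synonym`) and the helpers `build_network` and `related` are assumptions made for illustration, and only the example words come from the text.

```python
from collections import defaultdict

# Hypothetical relation labels; plWordNet's own relation inventory is richer
# and is not reproduced here. The words are the examples given in the text.
RELATIONS = [
    ("samochód", "hypernym", "pojazd drogowy"),  # a car is a kind of road vehicle
    ("samochód", "meronym", "silnik"),           # a car has an engine as a part
    ("samochód", "meronym", "spryskiwacz"),      # ... and a windscreen washer
    ("samochód", "meronym", "podwozie"),         # ... and a chassis
    ("samochód", "near_synonym", "fura"),        # colloquial counterpart
]


def build_network(triples):
    """Index (source, relation, target) triples by source word and relation."""
    network = defaultdict(lambda: defaultdict(list))
    for source, relation, target in triples:
        network[source][relation].append(target)
    return network


def related(network, word, relation):
    """Return the targets linked to `word` by `relation` (empty list if none)."""
    # .get() is used so that lookups do not create empty entries as a side effect.
    return list(network.get(word, {}).get(relation, []))


network = build_network(RELATIONS)
print(related(network, "samochód", "hypernym"))
print(related(network, "samochód", "meronym"))
```

In this sketch a word has no stand-alone definition: everything known about samochód is the set of labelled edges that leave it, which is the sense in which each word is defined implicitly by its neighbours.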

One of plWordNet's many applications is as a Polish-English and English-Polish dictionary, a by-product of its mapping onto Princeton WordNet (the first and for many years the largest wordnet in the world). plWordNet is also an important resource in natural language processing and in artificial intelligence research. For example, it is used by Google Translate for the purposes of machine translation.
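
To illustrate how a mapping between two wordnets can double as a bilingual dictionary, here is a minimal sketch. It assumes the inter-lingual links are available as simple (Polish, English) pairs; the real mapping connects entries through several relation types, and the link list and the `build_dictionaries` helper below are hypothetical, using only glosses that appear in the text.

```python
# Hypothetical inter-lingual links; the pairs reuse glosses from the text
# and do not reflect the actual plWordNet-to-Princeton-WordNet mapping data.
INTERLINGUAL_LINKS = [
    ("samochód", "car"),
    ("pojazd drogowy", "road vehicle"),
    ("silnik", "engine"),
    ("podwozie", "chassis"),
]


def build_dictionaries(links):
    """Derive Polish-English and English-Polish lookups from one list of links."""
    pl_en, en_pl = {}, {}
    for polish, english in links:
        # A word may have several counterparts, so each entry is a list.
        pl_en.setdefault(polish, []).append(english)
        en_pl.setdefault(english, []).append(polish)
    return pl_en, en_pl


pl_en, en_pl = build_dictionaries(INTERLINGUAL_LINKS)
print(pl_en["silnik"])
print(en_pl["road vehicle"])
```

Both directions come from the same set of links, which is why a single mapping yields a dictionary that works in either direction.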

The University has made plWordNet available free of charge for all applications, including commercial ones, under a licence modelled on the Princeton WordNet licence. Users may browse plWordNet via the mobile version and via WordNetLoom-Viewer (an application for displaying plWN entries), and may also download the source files. Programmers may access plWordNet via a Web service.

We provide (currently only in the download version) 31 000 lexical units marked with their sentiment values: positive, negative, ambiguous or neutral.

CLARIN Centre: CLARIN-PL
Project leader: Dr. Maciej Piasecki
Contact email:
Acknowledgements: Wroclaw University of Technology, Ministry of Science and Higher Education (Poland)