The 2021 Steven Krauwer Award for CLARIN Achievements was awarded to Tomaž Erjavec (Jožef Stefan Institute / CLARIN.SI) for his outstanding contributions toward CLARIN goals.
Tomaž Erjavec is an Associate Professor in the field of language technologies at the Faculty of Arts at the University of Ljubljana. He has worked at the Jožef Stefan Institute since 1984 and is now a senior researcher at its Department of Knowledge Technologies. He previously held positions at the University of Edinburgh, the University of Tokyo and the EU Joint Research Centre in Ispra, Italy. He has taught at the Jožef Stefan International Postgraduate School, the Faculty for the Humanities at the University of Nova Gorica, and the Faculty for the Humanities at the University of Graz. He has supervised several PhDs and has served as a member of several Masters and PhD committees at home and abroad.
His research interests lie in the field of computational and formal linguistics and language technologies, especially in the compilation and annotation of language resources. A large part of his work is devoted to the Slovene language: he has collaborated in the production of most of the Slovene reference corpora and many specialised corpora. In recent years, his work has also extended to the digital humanities – he is active in the area of complex digital editions and digital libraries, and in bridging the gap between humanities research and computer science. A continuing thread of his work is the emancipation of the Slovene language through the compilation of language resources and tools, and through enabling free, open and stable access to such research data and programs, as well as to the written cultural heritage of Slovene. He is the first-ranking researcher in the field of linguistics nationwide in terms of the number of citations and the h-index.
Recently, Tomaž Erjavec participated in the CLARIN-funded ParlaMint Project. This ambitious data engineering task included both creating a multilingual set of uniformly annotated corpora of parliamentary proceedings and processing the corpora linguistically to add Universal Dependencies syntactic structures and Named Entity annotation. He invented the interoperable annotation format used for the corpus based on Parla-CLARIN recommendations, created validation schemata and conversion scripts, and managed the repository and distribution of the resulting datasets. This unique data collection represents a crucial milestone for research in the digital humanities and political sciences.
Tomaž Erjavec's contributions to Parla-CLARIN (a framework for encoding corpora of parliamentary proceedings) and ParlaMint establish an innovative strategy for handling and processing parliamentary data. Its novelties relate to the proper and unified handling of data that are comparable across languages and parliaments, and to making these data uniformly available. The ParlaMint framework is becoming a de facto standard for national parliamentary data and will be further developed to cover more detailed and specific metadata across languages and parliaments.
The visibility of Tomaž Erjavec’s work goes beyond ParlaMint – he maintains the certified CLARIN.SI repository, which currently contains more than 200 language resources and tools (approximately 200 GB of data for 80 languages), 65 of which were (co-)authored by Tomaž Erjavec himself. Besides his proficiency in language resources and evaluation, linguistic standards and parliamentary corpora, what makes Tomaž Erjavec a great colleague, according to the jury, is his knowledge and scientific professionalism, commitment, sense of humour and confidence, which make working with him a real pleasure.