The Project
In recognition of its outstanding quality, Jakob Lenardič’s PhD thesis was recently awarded the title of best thesis of the year 2021/2022 at the Faculty of Arts, University of Ljubljana. What made Lenardič’s thesis stand out was that he combined two different approaches: a robust theoretical foundation rooted in formalism, paired with a corpus-based methodology more often associated with functionalism.
Apart from its method, Lenardič’s thesis stands out for another reason: it provides an explicit compositional semantics for Slovenian grammatical structures related to the extended verbal domain. For English, many such structures have already been formalised. Consider the distinction between a passive sentence such as 'The door was opened', which necessarily entails some kind of event initiator, and 'The door opened', which has a wider meaning that includes the reading 'The door opened by itself'. Lenardič explored the role that grammatical features play in the meaning-making of such sentences. He did so by trying to formally capture a piece of syntactic structure, which likely necessitates the use of specific grammatical features that govern how event initiation is realised both syntactically and semantically (i.e., participle morphemes). In addition, Lenardič focused on similar sentence constructions in Slovenian, studying grammatical voice, aspectual interpretation, and the interpretation of person and number features. This approach, he explains, has not been taken before in the case of Slovenian, so Lenardič’s thesis ‘is a bit foundational in this sense’.
Background
Lenardič holds a BA and MA in English literature and linguistics from the University of Ljubljana. When he started his PhD in 2016, he had little experience with computational linguistics or digital humanities (DH). Alongside his research, he was offered a job as an administrative assistant with Darja Fišer in the Department of Translation. That same year, Fišer was appointed Director of User Involvement at CLARIN. Over time, working together had an impact not only on Lenardič’s understanding of DH and CLARIN, but also on his linguistic research. Lenardič explains: ‘Even though formally my main job concerned mostly CLARIN-related things such as Tour de CLARIN and CLARIN Resource Families until about 2020, in practice Darja also helped me pursue corpus linguistics research by getting me involved in relevant research projects at the national level, so my role slowly but surely shifted into that of a researcher that does both corpus and theoretical linguistics, often combining the two.’
In a nutshell, Lenardič’s thesis focused on two topics. First, he explored the pronominal system and case assignment in Slovenian. Second, he focused on the syntax-semantics interface of both English and Slovenian in relation to the so-called middle construction, which in English concerns structures like 'The book reads well'.
Lenardič’s work is based on a syntax-only, formalist approach to grammar, which he claims is underrepresented in linguistics at the University of Ljubljana. More functionalist approaches, he feels, can be vague and speculative in describing interactive factors of context, in other words ‘fuzzy when they don’t need to be fuzzy’. In his view, it is a misconception that formalist approaches do not consider context, and he believes that corpus-based approaches to grammar could benefit from taking formal aspects and the associated methodologies into account. His thesis shows that combining the two approaches can lead to outstanding work.
To explore his research questions, Lenardič used the tools developed at CLARIN.SI, such as the noSketch Engine concordancer, on corpora relevant to his research interests. Specifically, Lenardič investigated two resources: Gigafida, which is the reference corpus for written standard Slovenian, and the corpora of the JANES family, which contain Slovenian computer-mediated communication on platforms such as Twitter and Facebook. Over time, and not least thanks to the expertise at CLARIN.SI, he developed more sophisticated skills in using the noSketch Engine concordancer, which were essential for exploring the linguistic structures he was interested in.
Working with corpora was essential, as it helped Lenardič to infer subtle characteristics of Slovenian language structure that he says he would never have figured out ‘by resorting to [...] intuition alone.’ In his view, corpus work requires robust assumptions and an advanced approach to querying that goes beyond simple keyword searches, which is crucial for a highly inflected language with pragmatically driven word order, such as Slovenian.
Future Directions: CLARIN and DH
Lenardič says: ‘In Slovenia, there is a sizable research community which does not seem to be aware of our national consortium and the services and wealth of data that it offers. Funding opportunities such as the Mobility Grants could be especially useful for young researchers.’ To spread the word, he recently led a face-to-face workshop in the JTDH 2022 pre-conference programme that introduced both the CLARIN.SI and CLARIN infrastructures to PhD students in linguistics and the wider humanities.
Lenardič also plans to continue collaborating with CLARIN in other ways. He says: ‘I hope to continue with the CLARIN Resource Families, which became much broader in scope this year due to the project funding, where a couple of projects are already underway.’