Tour de CLARIN highlights prominent User Involvement (UI) activities of a particular CLARIN national consortium. This time the focus is on Latvia and Sanita Reinsone, a leading researcher at the Institute of Literature, Folklore and Art at the University of Latvia. The interview was conducted via Skype by Jakob Lenardič.
1. Please describe your research background. What sparked your interest in Digital Humanities?
I work at the Institute of Literature, Folklore and Art at the University of Latvia, where I research life writing, oral history and digital participatory practices. I hold a PhD in philology, which I obtained in 2012 from the University of Latvia. I am leading several Digital Humanities and Cultural Heritage initiatives at the Institute. This gives me the opportunity to collaborate with passionate and talented researchers from diverse fields such as folkloristics, literary studies, music and theatre research, as well as history and linguistics. Concretely, my work mainly involves development and curation of different crowdsourcing initiatives within Digital Humanities and Cultural Heritage.
I became interested in digital approaches to the Humanities when I was a first-year philology student and started working at the Artificial Intelligence Laboratory of the Institute of Mathematics and Computer Sciences at the University of Latvia. This is the same team that now heads CLARIN Latvia, and the creation of some of the first Latvian corpora was already underway there at the time. I helped with digitising Latvian literary classics and folklore publications, from which I learned how digital methods can be used in the study of cultural heritage. This experience served as a foundational background for my future research at the Archives of Latvian Folklore (part of the Institute of Literature, Folklore and Art), since digital methodologies were not taught at the University at that time.
2. You are a leading researcher at the Institute of Literature, Folklore and Art at the University of Latvia. What does it mean to apply a Digital Humanities approach to folklore? Could you give a concrete example of how folklore studies can be complemented by such an approach?
A digital approach to folklore collections essentially means that we are able to work with tools that can automatically analyse unstructured collections and provide new ways of visualising, indexing and classifying the texts and other types of folklore material. Implementing such an approach has immensely sped up the initial process of collecting and sorting the data, which in our field are very diverse in terms of material type. It is now easier than ever to answer complex research questions, as computational tools allow us to examine textual and stylistic variation observed in different periods and dialects, or the geographical distribution of vernacular expressions, in a precise and very time-efficient manner. This has only become possible now that our digitized folklore texts are enriched with metadata such as geographical information and interlinked with other materials in the Archives, such as photographs and sound recordings.
For instance, Sandis Laime, who is a post-doctoral researcher at the Archives, has used geospatial analysis tools to examine the geographical distribution of legends related to witches and witchcraft in Latvia. His research turned out to be of crucial importance for a better understanding of the historical aspects of the tradition as well. Before he started working on this topic, the prevailing opinion was that witchcraft beliefs were more or less uniform across the whole country and did not differ much from the European tradition. However, Sandis was able to prove, by using digital methods, that the Latvian witchcraft belief system is not at all as homogeneous as previously believed. Along with the character of the diabolised witch, which was present in most parts of Latvia at the turn of the 20th century, he was able to identify several conservative areas in the peripheral regions of the country which were not influenced by Christian demonology. In this way, the geographical aspect of the research turned out to be historically significant.
3. How does the collection, processing, analysis and archiving of research material in folklore differ from other DH disciplines? What are the main obstacles with respect to technologies applied to your material? How could CLARIN be of help in this respect?
Since 2000, my research has mostly focused on oral history. Related to this, together with my colleagues I’ve recently launched an initiative at the Archives to create the Autobiography Collection, which consists of written diaries, memories and life stories. Such materials are relatively unstudied phenomena, at least in comparison to oral life stories. They provide a very personal insight into the lives of ordinary people and a direct perception of a historical event, especially since these texts are not written and edited by professional biographers. There are often spelling mistakes, for example, but this just means that we’re dealing with pure, unaltered text that provides a unique and often colourful perspective. The textual materials of the Autobiography Collection are very diverse, possibly more so than in other disciplines. Life writing texts such as diaries are often accompanied by additional contextual materials such as photographs and audio interviews with the authors or contributors.
In general, the Archives of Latvian Folklore hold a lot of dialectal speech, and the quality of older sound recordings often isn’t the best, which is a problem for speech-to-text transcription software. The archiving of the material is also very challenging because of its diversity, so we are planning future collaboration with the Latvian CLARIN consortium to streamline the collection and digitization process. Additionally, vernacular expressions are often used on social media these days – in a way, such informal language is a type of modern folklore – and I believe CLARIN could help us to mine such data.
4. Does your Institute collaborate with the Latvian CLARIN consortium in the digitization of folklore and the curation of digital folklore archives?
We collaborate with CLARIN Latvia at two levels. The first, of course, is the personal level, which means that we often consult with their experts on how to use a specific language tool or resource. The second, which I think is crucial, is the institutional level. This involves communication on how to improve our Archives’ infrastructure and align it with CLARIN standards.
At the Archives, we are currently creating a corpus of life writing. First, however, we have to reach out to the general public and get in touch with museums and other archives in order to get the materials. In relation to such outreach, we already cooperate closely with the CLARIN Latvia team, as we have successfully organized several awareness-raising and knowledge-sharing events for researchers and students of Humanities and Social Sciences. I see that such educational initiatives are appreciated and very much needed, since they directly showcase how language tools and resources can be applied within qualitative research and bridge the gap between computational experts and Humanities and Social Sciences researchers. For instance, one such successful event was a Digital Humanities workshop which members of our institute and the National Library of Latvia organised together with CLARIN Latvia. The interest was unexpectedly high, and we couldn’t provide enough seats for everyone who wanted to attend.
For the future, we very much look forward to incorporating some of CLARIN Latvia’s automated services for language processing at the Archives. We especially want to implement their tools for speech-to-text transcription and the automatic annotation of spoken data, since working with interviews with informants can be a very laborious process if you have to do the transcriptions by hand. I would also appreciate an automatic image annotator, given the very large number of photographs in the Archives.
5. Your Institute has also been successfully involved in crowdsourcing. Could you please describe this? Why is crowdsourcing important for Digital Humanities?
The crowdsourcing initiative began five years ago, when we set up our Archives’ online repository. We were faced with a very large number of handwritten manuscripts that had not yet been converted to a computer-readable format. Since we wanted the Archives not only to be openly accessible but also to engage with the general public, we decided to reach out and find volunteers who would be willing to transcribe the manuscripts, which were made available on the platform.
In the first year, the volunteers managed to transcribe around 1,000 pages of handwritten text. This wasn’t a very large number, but at that point we had not yet managed to fully promote the initiative, since we were mostly focusing on the further development and maintenance of the repository. Soon after, we started collaborating with the Latvian branch of UNESCO, and together we launched a special outreach campaign in which we invited schoolchildren to participate in transcribing the handwritten texts. It was a wonderful experience that lasted for a little more than 2 months. During this relatively short period, schoolchildren managed to transcribe around 15,000 pages, which is a lot of text, especially in comparison to the first round. This inspired us to continue with the initiative, which gradually built an active community of transcribers who are passionate about our materials. They regularly communicate with us and send us helpful suggestions for potential future implementations at the Archives. A concrete result of our collaboration with the transcribers is that we managed to establish a new and improved online platform for transcription which is very user-friendly and minimises the need for technical knowledge – the volunteers only need to log in, select one of the 10 languages that the manuscripts are in, and then immediately begin transcribing one of the manuscript pages. There is also an option to add comments to the text, which further strengthens our collaboration.
I think the reason why this crowdsourcing initiative has been a success is that many people take pride in their local lore. Perhaps what’s important here is that folklore does not only encompass such genres as folk tales, legends and folk songs; it also includes a lot of regional knowledge and memories of the old ways of life and traditions in rural areas that are disappearing from the modern world. This is why many people are so willing to engage with our materials.
In addition, we have recently started several other crowdsourcing initiatives. One example is a children’s poetry reading campaign, in which we invited the public to visit the database of Latvian literature, read poems out loud and record their voices for the enjoyment of future generations and for research. The poetry chosen for this project was written at least one hundred years ago, both by well-known poets loved by many generations and by lesser-known poets worthy of attention. This initiative, which was also supported by the National Library of Latvia, was very successful in that it basically led to the creation of a speech corpus of poetry, which we now use to study the different ways in which poetry is read; that is, the different manners, whether it is recited or sung, and so forth. Another initiative, which will be launched on 15 February, is called Sing with the Archive, with which we aim to popularise the musical recordings of the Archives and to collect modern musical versions performed by the participants. Additionally, a campaign called Contemporary Calendar invites the public to record their special calendar events and thus help researchers to study the contemporary ritual year.
6. How can research infrastructures such as CLARIN benefit from crowdsourcing?
I believe that CLARIN-related research could also be complemented by crowdsourcing, especially if it involves, for instance, building a spoken language corpus. In order to ensure that such a corpus is representative of the spoken language, it should also contain dialect samples. I think it wouldn’t be too difficult to motivate people to provide their own recordings, given that a person’s dialect is part of his or her personal identity in much the same way as history and folklore are. What’s crucial, though, is that CLARIN should focus on making its tools, platforms and interfaces as user-friendly as possible, which means that CLARIN experts should actively engage with the external community, be they established researchers or passionate amateurs, and try to meet their needs and expectations. As the success of our own crowdsourcing initiative shows, communication from both sides goes a long way towards establishing fruitful cooperation.
7. How are the Digital Humanities represented in Latvian research institutions and universities?
Digital Humanities in Latvia is fairly new. Although computational linguistics has quite a long tradition in Latvia, other disciplines have only recently started to adopt digital methods. I think that crucial to its promotion in our country is the digitalhumanities.lv initiative, which involves voluntary collaboration among research institutions like our Archives and CLARIN Latvia. The initiative is currently organising the 2019 Baltic Summer School of Digital Humanities. In 2018, Riga Technical University launched the first Master’s programme in Digital Humanities, which has turned out to be quite popular among students. In addition, the Faculty of Humanities at the University of Latvia has started to offer foundational courses in Digital Humanities, which often fill to capacity. Generally, I think the younger generation is keen on learning how to apply digital methodologies in their work or studies, even if they come from traditionally non-digital fields like history or philology.
For the future, we plan to collaborate further with other Latvian research institutions like Riga Technical University and the National Library of Latvia to promote Digital Humanities, computational linguistics and computational folkloristics in Latvian universities, and to include additional subjects in school curricula.
8. How in your opinion could CLARIN Latvia help promote computational methods and the use of research infrastructures in traditional fields such as your own (i.e., folklore studies)?
I think educational activities should be a major priority for CLARIN Latvia at this stage, as this is the most efficient way for experienced and novice researchers to learn how to integrate the CLARIN infrastructure into their own work. What is more, such activities can also spark new collaboration opportunities among researchers from different disciplines.
Another important topic on CLARIN’s agenda in my opinion should be copyright issues. For example, many of the materials in our Archives are challenging from the perspective of copyright, since collecting life writing such as diaries and memoirs means that we store a lot of personal and sensitive data. Although we try to be very rigorous in securing copyrights and discussing this with our informants, it would be very helpful if there were more joint discussions about the legal implications related to the creation and maintenance of such collections, as there are many other institutes that are dealing with materials that fall into a kind of legal grey area. This is why I think CLARIN could be very helpful in this respect by providing researchers with easily reusable scenarios and guidelines.