Tour de CLARIN: Interview with Dr Maciej Maryl

Submitted by karolina@clarin.eu on 5 February 2018

Tour de CLARIN highlights prominent User Involvement (UI) activities of a particular CLARIN national consortium. This time the focus is on Poland and Dr Maciej Maryl, the Deputy Director of the Institute of Literary Research of the Polish Academy of Sciences. The following interview took place via Skype and was conducted and transcribed by Jakob Lenardič.

1. Could you please briefly introduce yourself? What inspired you to start studying literature and to take an empirical approach toward it?

I became interested in applying the empirical methodologies of the social sciences to literary studies as an MA student at the University of Warsaw. I tried to quantify the way people read and approach texts, which eventually led me to computational methods. My PhD, which I defended in 2013, was dedicated to the influence of electronic media on literary communication.

2. How did you get involved with the Polish CLARIN consortium? Are you currently collaborating with them?

In 2013, when I was in the process of setting up the Digital Humanities Centre at the Institute of Literary Research of the Polish Academy of Sciences, I was introduced to Maciej Piasecki, who is the coordinator for CLARIN-PL. At the same time, the Institute was organizing the first THATCamp (The Humanities and Technology Camp) in Warsaw, so I invited him to present the tools developed by the Polish consortium. I was inspired by his talk and wanted to use the tools in my analyses of weblogs that I was conducting at the time. This in turn led to a very fruitful collaboration between CLARIN-PL and our Institute which goes on to this day. We have successfully cooperated on quite a number of projects. To name a few, one ongoing project involves the creation of the Literary Map, in which geographical information that appears in Polish literary texts is mapped onto Google Maps. Another project is LEM (Literary Exploration Machine), an online system that brings together various tools dedicated to processing and analysing literary texts. We have also started two lexicographical projects. One aims at creating a dictionary of Polish Romantic poets, using CLARIN-PL tools and WordNet, while the other, in cooperation with many institutions, is dedicated to linking together various historical dictionaries of Polish on a single platform. In most cases, we help develop the tools that CLARIN-PL has already created, contributing the expertise and needs of our field. This helps to establish a productive feedback loop between developers and users.

3. Which CLARIN services would you recommend to your colleagues working in Literary Studies?

I would especially recommend LEM. One of the biggest problems of novice literary scholars who want to conduct computational research is the lack of expertise in using linguistic tools. In other words, novice researchers are faced with elaborate and sophisticated tools which are simply overwhelming, especially for researchers without a computational background. LEM helps researchers to overcome this problem because it pools together a variety of tools into a single workflow and supplements them with detailed descriptions, which makes them user-friendly even for beginners. Work on LEM is an ongoing process and we are currently planning new features like topic modelling and descriptions of case studies which will enable a better understanding of the tools.

4. Your website says that you are involved in the following projects – The Polish Literary Bibliography and Blog as a new form of multimedia writing. Could you describe them? How do they benefit from the CLARIN infrastructure?

The Polish Literary Bibliography is an ongoing project we run in cooperation with the Poznań Supercomputing and Networking Center. We use CLARIN-PL’s INFOREX to extract structured information from scanned volumes of bibliographical records and incorporate it into a multipurpose online research platform. We are aiming at extracting bibliographical data from printed volumes covering the years 1945 to 1988 and we are currently trying to work around some problems such as the low quality of print, which makes parsing more difficult.

Blog as a new form of multimedia writing is actually the project that marks the beginning of my collaboration with CLARIN-PL. Together with the Polish consortium we worked on the tools used to classify weblogs on the basis of their genre. To give some background, what we did at first at the Institute – that is, before involving CLARIN-PL – was to draft a typology of weblog genres based on a systematic, qualitative analysis of actual texts[1]. We then started the cooperation with Maciej Piasecki in order to corroborate our findings with computational methods. We applied various clustering methods, using tools like CLUTO and CLARIN-PL’s stylometric system WebSty to see whether they would group the weblogs together in accordance with our proposed typology. Together with Maciej Piasecki and Ksenia Młynarczyk we have written an article dedicated to combining close reading with distant reading on the basis of CLARIN-PL tools.

However, weblogs are tricky when it comes to the application of computational methods. The main obstacle is that individual blogs are far from homogeneous in terms of style and other linguistic characteristics, as they consist of many different posts. So, the classification of genres did not yield satisfactory results – we were most successful with cooking blogs, which are characterized by very specific language. That is why we are currently working on shifting the unit of analysis from entire blogs to individual posts, in order to get more accurate results.
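The comparison described in this answer, checking whether automatically produced clusters line up with a hand-made genre typology, can be illustrated with a small sketch. The Python below is not the CLUTO or WebSty workflow used in the project. It assumes that cluster assignments are already available and computes cluster purity, one common agreement measure that is swapped in here purely for illustration. The function name `cluster_purity`, the toy data, and the genre names other than "cooking" are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, genre_labels):
    """Return the share of items that belong to the majority genre of their cluster."""
    if len(cluster_ids) != len(genre_labels):
        raise ValueError("cluster_ids and genre_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Group the hand-assigned genre labels by the cluster each item fell into.
    genres_by_cluster = defaultdict(Counter)
    for cluster, genre in zip(cluster_ids, genre_labels):
        genres_by_cluster[cluster][genre] += 1

    # For each cluster, count the items that carry its most common genre.
    majority_total = sum(max(counts.values()) for counts in genres_by_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: six blogs with hand-labelled genres and automatic cluster ids.
genres = ["cooking", "cooking", "diary", "diary", "review", "diary"]
clusters = [0, 0, 1, 1, 1, 2]

print(cluster_purity(clusters, genres))
```

A purity of 1.0 would mean that every cluster contains blogs of a single genre. Purity rises with the number of clusters, so under this sketch it is only meaningful when compared across runs with the same cluster count. The same function could be applied to individual posts instead of whole blogs by passing post-level labels.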

5. What are the main advantages of taking a Digital Humanist approach to literary history? Can a quantitative approach help uncover answers to more traditional questions that are at the heart of literature, such as the political and sociological aspects of writing, the value of the literary canon etc.?

There seems to be a consensus in the field that the application of computational methods actually involves a two-fold approach. First, computational tools and methodologies may be used to corroborate existing claims in the field, i.e. to see if we can arrive at similar results with empirical methodologies. And this is what we are doing in the blog project right now. Second, once we establish that our computational approach yields significant results, we may use it to uncover aspects of writing which are too difficult to assess by means of traditional non-computational methodologies, such as the problems of authorship or language change in literary history.

For me personally, a computational approach is important because it allows me to see a wider picture of the research field. However, we should not take computational results for granted. What I believe is crucial in using DH tools is that at some point we should return to actual texts in order to understand the computational results fully. In other words, the main advantage of working in Digital Humanities is the multifaceted approach that combines distant reading via the computational tools with the close reading. I think that both methodologies should be intertwined in a research workflow.

6. Can you discuss how the Internet has shaped the contemporary literary scene, esp. that of Poland? How do literary historians and critics, esp. in your country, evaluate new forms of writing, such as fictional stories published through non-traditional media like blogs and forums, in relation to the older traditional printed forms?

There is of course a division between researchers who are dedicated solely to working with traditional texts and a relatively smaller group which also focuses on digital writing. However, I do think that more and more studies are beginning to focus on new textual phenomena and sooner or later we will just have to research them together, as it is hard to talk about contemporary literature if you disregard digital writing. For instance, weblogs became popular in Poland around 2006, so slightly more than 10 years ago, and at first they served almost exclusively as a social medium through which people tried to connect with friends or write about their lives. However, blogs evolved over the years – partly thanks to social media which took over the function of the main platform for personal communication – and to some extent they now resemble print media like magazines, newspapers or books. In this process weblog genres have crystallized and now serve as a very interesting research object, especially given their accessibility for computational analyses, as one does not have to digitize them beforehand.

As to actual fiction writers, there has also been some change in the way writers make use of digital communication. When I started doing the research for my PhD thesis around 10 years ago, a rule of thumb was that the more popular a writer was, the more limited an online presence he or she maintained. Popularity meant access to mainstream media and that used to be enough not so long ago. Nowadays, there are many very successful writers who use Facebook or run their own blogs to cultivate relationships with readers. When it comes to actual experiments with literary form – such as electronic poems or interactive novels – there are many examples of interesting texts but the majority of writers remains quite conservative and tends to stick to traditional forms. As the popular interest in interactive narratives is captured by computer games, in literature there seems to be greater demand for traditional, stable, linear and finite narratives. Perhaps this is, as Umberto Eco observed 20 years ago, the real power and value of literature in the times of interactivity – it provides narratives that cannot be manipulated according to the readers’ will.

7. How do your students and fellow researchers embrace the Digital Humanist approach? How are Digital Humanities in general represented in the Polish academic environment?

There are still quite a few scholars who think that using computational approaches shifts your attention from the actual texts to the linguistic surface. I actually believe that this kind of scepticism in Polish academia is quite a widespread phenomenon due to the idea that digital approaches are reductionist and ill-suited for addressing the “big”, critical questions of literary studies. But we shouldn’t forget that similar reservations have been formulated against empirical approaches in the humanities probably since the birth of antipositivism. So, we should probably have got used to it by now. However, in the last five years, Digital Humanities have begun to flourish in Poland, entering the phase of institutionalisation. DH research centres were established, and researchers set up the CLARIN-PL and DARIAH-PL consortia. CLARIN-PL is especially eager to bridge the gap between computational experts and humanities users, organising hands-on workshops for researchers and translators. So, I expect the body of DH research to grow, but let us not fool ourselves – it is not going to be mainstream.

What we need is more DH courses at universities, so that the base of DH practitioners can steadily grow. Obviously there are many courses in linguistics departments, but we should also reach out to students of history, literature, and cultural studies. My Institute has just received a grant to start a graduate program on digital literary studies to enable the study of literature with the help of digital methods and technologies at the PhD level. This program will be carried out in cooperation with the Polish-Japanese Academy of Information Technology, which is also a member of CLARIN-PL.

8. What would you recommend CLARIN to do in order to attract more researchers from your community? How do you envision the future of the Polish CLARIN consortium?

CLARIN-PL is already very active in terms of attracting new researchers. It has already organized a series of workshops and we are proud to have hosted the first CLARIN-PL workshop in 2015. However, I believe a more structured approach to outreach is needed – that is, a long-term involvement of users through a more established educational program that would complement the workshops with something like additional online courses. Such a program could maintain researchers’ interest after the events. We also need to continue making the interfaces of the tools more user-friendly, with better documentation guiding users through the research process. What could also help is a presentation of successful case studies from a variety of fields that could serve as guidance for further research.

The future that I envision for CLARIN-PL is one where more and more new researchers join its users’ network. One of the best things about the consortium is that it always addresses the needs of the end user. I think it can only be a good thing if more institutions and individual researchers who want to perform computational analyses but lack tools or expertise reach out to CLARIN-PL.

[1] Maryl, M., Niewiadomski, K. and Kidawa, M. (2016). “Empirically Generated Typology of Weblog Genres”. CLCWeb: Comparative Literature and Culture, 18.2, June 2016.
