Tour de CLARIN highlights prominent User Involvement (UI) activities of a particular CLARIN national consortium. This time the focus is on the Netherlands and Melvin Wevers, a Digital Humanities researcher who studies cultural-historical phenomena with computational means. The following interview took place via Skype and was conducted and transcribed by Jakob Lenardič.
1. Can you please briefly describe your research background and tell us how you became a Digital Humanist who uses computational approaches to studying cultural phenomena?
I have a pretty diverse research background. I started out in the Social Sciences studying psychology, and after I received my degree in 2006 I discovered that I didn't actually want to be a psychologist but would rather do something based much more in the humanities, such as researching culture. Since I've always been interested in American culture specifically, I applied for a Master's track in American Studies at Utrecht University. What I really liked about this field is its methodological variety, in the sense that it combines elements from historical studies, media studies and literature. This kind of multidisciplinary approach made me very interested in research, and I decided to pursue a PhD after receiving my MA in 2009. However, I couldn't find a suitable PhD position at first, so I started studying Cultural Analysis at the University of Amsterdam, which was based more on quantitative methods, and at the same time began working at a software company that also used text mining, which sparked my interest in language technologies. Then I saw an advertisement for a PhD position at Utrecht University on using computational methods to research how American culture was represented in Dutch newspapers throughout the 20th century. I thought that this was a perfect opportunity for me, so I applied and began my career as a Digital Humanist!
2. How does your research benefit from the CLARIAH-NL infrastructure?
My PhD was funded by the Dutch Science Organisation, and the data that we used were provided by the National Library of the Netherlands. Though the project itself wasn't directly linked to CLARIAH-NL, I met a lot of people affiliated with CLARIAH-NL, like Arjan van Hessen and Franciska de Jong, at various conferences that I attended during the course of my studies. They pointed me to events, such as tutorials, organized by CLARIN consortia. These tutorials made it much easier for me to learn programming languages like R and Python, and there I met many of my future colleagues with whom I could discuss my work in relation to source criticism and tool criticism. For instance, through the CLARIAH-NL consortium I learned that the German DARIAH was organizing a tutorial on topic modelling, where I learned a great deal about specific algorithms that I would later use in my PhD. I think that CLARIAH-NL serves as an essential network that makes it significantly easier for researchers working in different fields to collaborate, especially since people like Arjan and Franciska put so much effort into helping novice researchers build the much-needed connections.
3. In your PhD thesis “Consuming America”, you’ve applied a quantitative approach to a socio-historical study of how American consumer culture was depicted in the Netherlands throughout the 20th century. What inspired you to start researching this topic? Could you briefly describe your approach as well as the main findings of your research?
My PhD was part of a very large project funded by the Dutch government called Translantis, which focused on determining how the United States was perceived in Dutch public discourse throughout the 20th century. My role was to focus on consumer goods such as Coca-Cola and cigarettes in order to determine how American cultural values were portrayed in Dutch newspapers. This kind of research allowed me to gain a very multifaceted understanding of how Dutch people reacted to notions such as modernisation and globalisation through their perception of consumer goods. Since I've had a lifelong interest in all aspects of American culture, especially its international impact, I felt that this kind of research was a perfect opportunity for me.
In my approach, I combined the close reading of a more traditional historian with data-driven computational methodology. I first looked at a number of specific newspaper articles to get a general feel for what Dutch people thought about American culture at different times throughout the 20th century. Then I used quantitative methods like topic modelling on millions of newspaper articles to see whether the perceptions reported in these newspapers constituted broader trends in Dutch history. In American Studies there is a deeply entrenched idea that the 1950s and 1960s were a turning point during which American influences started becoming pervasive in the Netherlands; in other words, there is an idea of an American cultural invasion after the 1950s. However, by focusing on the depiction of consumer goods in newspapers, I was able to show that the Dutch were already very much interested in and directly involved with American culture even before World War I. That is to say, the American influence in the Netherlands was relatively stable throughout the 20th century, so there was no specific mid-century turning point, as the Dutch had started to perceive themselves as modern consumers in the American sense from very early on.
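To give a flavour of what such a topic-modelling step can look like in practice, here is a minimal sketch using the gensim library. The `articles` data, the number of topics and all parameter values are illustrative assumptions, not the actual pipeline used in the thesis.

```python
# A minimal, illustrative sketch (not the thesis pipeline): fitting an LDA
# topic model over a handful of pre-tokenized newspaper articles with gensim.
# The toy `articles` data and all parameter values are assumptions.
from gensim import corpora, models

articles = [
    ["coca", "cola", "advertisement", "american", "soda"],
    ["cigarettes", "advertisement", "american", "brand"],
    ["election", "parliament", "minister", "debate"],
]

# Map tokens to integer ids and convert each article to a bag of words.
dictionary = corpora.Dictionary(articles)
bow_corpus = [dictionary.doc2bow(doc) for doc in articles]

# Fit a small LDA model; on millions of articles one would tune num_topics
# and switch to the streaming or multicore variants instead.
lda = models.LdaModel(bow_corpus, num_topics=2, id2word=dictionary, passes=10)
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
```

On real newspaper data, the resulting topics (weighted word lists) can then be traced over time to see whether a theme such as American consumer goods grows, shrinks or stays stable across decades.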
In relation to a specific finding, Coca-Cola was one of my case studies, and there I uncovered a very interesting dichotomy. In international advertisements, the Coca-Cola company strove to advertise its product as global by omitting references to its American origin; in spite of this attempt, however, Dutch newspapers continued to overwhelmingly portray Coca-Cola as something distinctively American. This in turn led me to uncover a major trend in Dutch public discourse, namely that the notion of globalisation became associated with Americanisation.
4. How does such a data-driven approach complement the traditional methods of a historian? Are there any specific advantages to such an approach?
After I finished my psychology degree, I lost interest for a time in the quantitative methods of the Social Sciences, such as statistics, and wanted to focus solely on the traditional methods of the humanities, such as close reading and reading against the grain. However, I soon became critical of the lack of empirical evidence in the humanities and became interested in data-driven approaches again. Ultimately, what I learned is that these two approaches need to be combined, since this greatly increases the breadth of the research questions that a researcher is able to ask. That is, I think that computational methodologies can greatly assist a historian, especially since they make it much easier to adopt a bird's-eye view of the periods being researched and thereby contextualize them properly as parts of overarching historical trends.
5. Have historians working in your field generally embraced such quantitative methodology? Are there any changes that you would personally like to see take place within the field?
Unfortunately, in my field, that is, Cultural History, using computational approaches is still a very new endeavour, so there are many researchers who still outright refuse to use anything other than the traditional non-quantitative methods. This is understandable to an extent, since a senior researcher probably won't find the time to learn how to program late in his or her career. However, I feel that if you want to train a future generation of humanities scholars, you should include courses on programming in the curriculum. Of course, this is far easier said than done, since I think it would require a kind of paradigm shift in which entire syllabi would have to be revised in order to explicitly define, for instance, how a programming language like Python can be used to tackle research questions in fields where it is not immediately obvious how to apply quantitative methodologies. What often happens in practice is that a humanities department offers a course on Python, but there are no related courses that would help students apply their programming knowledge to research problems directly relevant to humanities questions.
In general, my opinion is that there should be a marriage between distant reading and close reading in the humanities, so I would like to see a greater degree of collaboration between researchers from different fields, such as between historians and computational linguists. I've written some papers with people who have a better understanding of mathematics than I do. If I had been left solely to my own devices, I would have had to spend a lot of time learning advanced mathematics, which would in turn probably have made me neglect the humanities part of my research question. However, since I know some programming and some mathematics, it is easier for me to communicate with people who are experts in these fields. Such communication has already resulted in some very worthwhile interdisciplinary collaborations.
6. In your opinion, what could CLARIN do to become more widely used by historians? What activities, resources or tools would be needed to achieve this?
In the Netherlands, I think that CLARIN is still associated almost exclusively with computational linguistics, even though CLARIAH-NL tries very hard to branch out into other humanities disciplines. So I think that they should continue to organize tutorials and especially focus on showcasing how the various datasets that are already out there are relevant to different disciplines. For instance, there is a plethora of historical sources that have been digitized, but many historians aren't aware of the exciting new directions in which these widely available datasets could take their research.
7. You’ve been involved in the development of ShiCo, a tool for the analysis of how words denoting a certain concept change diachronically. Could you briefly describe how this project came about? What are the main advantages of ShiCo?
I was interested in figuring out how words denoting a certain concept change over time, but found that approaches such as topic modelling were too rigid to do this efficiently. I approached Tom Kenter, another PhD student, who specializes in Natural Language Processing and information retrieval, with this problem, and he came up with the idea of using a relatively novel technique to chart how these changes happen. We involved some other researchers working in the history department at the University of Amsterdam so that we could test whether the results of our first prototype were in accordance with their expert knowledge of the domain. Since the prototype was successful, we were encouraged by some of the professors to apply for a grant and turn the prototype into an interactive tool. By working with programmers from the eScience Centre, we eventually turned it into ShiCo.
ShiCo allows researchers to find words associated with a certain concept in various historical periods in a much more dynamic way than, for instance, topic modelling, where you always have to define the terms you're looking for in advance. If you're working with diachronic corpora, you may find that a certain word unexpectedly becomes associated with a new concept over time. This happened, for instance, with the word propaganda, which was originally synonymous with advertisement and was thus a relatively neutral term but in the 1950s and 1960s suddenly became associated with political concepts like socialism and communism. If you're doing topic modelling, you are forced to create a separate topic model for each new concept that you encounter; ShiCo, on the other hand, allows you to branch off your diachronic search on the fly to focus solely on these new topics. It was only through the use of ShiCo that we were able to determine that the Dutch associated the newer, marked use of propaganda with communism but not, for instance, with US influence.
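As an illustration of the general idea behind this kind of diachronic analysis (a minimal sketch, not ShiCo's actual implementation), one can train a separate word-embedding model per time slice with gensim and compare a seed word's nearest neighbours across periods. The toy `slices` data below is purely hypothetical.

```python
# A minimal sketch of the general idea (not ShiCo itself): train one
# word-embedding model per time slice and inspect how the neighbourhood
# of a seed term drifts between periods. The `slices` data is a toy,
# hypothetical stand-in for time-sliced newspaper text.
from gensim.models import Word2Vec

slices = {
    "1920s": [["propaganda", "advertisement", "poster", "campaign"]] * 50,
    "1950s": [["propaganda", "communism", "socialism", "soviet"]] * 50,
}

models_by_period = {
    period: Word2Vec(sentences, vector_size=50, window=5, min_count=1, seed=42)
    for period, sentences in slices.items()
}

for period, model in models_by_period.items():
    # The nearest neighbours of the seed term sketch its meaning in this period.
    print(period, model.wv.most_similar("propaganda", topn=3))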
8. Can you highlight any other project that you’re currently working on?
After finishing my PhD, I became a post-doc researcher at the National Library of the Netherlands, where I applied techniques from the field of Computer Vision to gain insights into non-textual trends in the Dutch advertising landscape. The research produced a dataset of advertisements as well as a tool called SIAMESE for finding visually similar images in a large corpus of advertisements. Currently I'm working on applying computational methods to the analysis of Dutch academic history journals in order to determine trends in how Dutch historians have understood history, for instance how notions like progress and modernity are discussed and which countries were in focus in different periods.
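The general recipe behind this kind of visual-similarity search can be sketched as follows; this is an illustration under my own assumptions (a pretrained CNN feature extractor and cosine similarity), not the SIAMESE tool's actual implementation, and the file paths are hypothetical.

```python
# A sketch of a generic visual-similarity search, not the SIAMESE tool itself:
# embed each advertisement with a pretrained CNN and rank the collection by
# how close the feature vectors are to a query image. Paths are hypothetical.
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Pretrained backbone with the classifier head removed, i.e. a feature extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

def embed(path: str) -> torch.Tensor:
    """Return an L2-normalised feature vector for one image file."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        vec = backbone(img).squeeze(0)
    return vec / vec.norm()

# Rank the collection by cosine similarity to the query advertisement.
query = embed("ads/query.jpg")
collection = ["ads/ad_001.jpg", "ads/ad_002.jpg"]
ranking = sorted(collection, key=lambda p: -float(query @ embed(p)))
print(ranking)
```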
9. What is your vision for the future of CLARIAH-NL and Digital Humanities in the Netherlands?
I believe that computational methodology should become part and parcel of all kinds of disciplines and that Digital Humanities should, at a certain point, lose the modifier digital and become the standard way of doing humanities research. I think that CLARIAH-NL can play an important role in bridging the gap between these different fields, especially by offering interactive tutorials and ensuring interoperability between repositories and tools. As I said previously, one of the problems is that researchers do not know what to do with the available data, so CLARIAH-NL could offer these much-needed guidelines and training, as well as educate researchers on concepts like open science so as to ensure that their work is as transparent as possible.
Click here to read more about Tour de CLARIN