CLARIN-DK presents: Teaching the teachers – an interactive workshop for the Voyant Tools

Submitted by Jakob Lenardič on 14 May 2019

Blog post written by Lene Offersgaard and Dorte Haltrup Hansen, edited by Darja Fišer and Jakob Lenardič

Digital methods are only slowly gaining ground in the teaching of literary studies in Denmark. While many lecturers are interested in introducing digital methods to their students, they often lack the knowledge of existing tools. From previous workshops, CLARIN-DK learned that neither traditional NLP tools like lemmatizers, POS-taggers, and named entity recognizers, nor simple command line scripting, were suitable in such teaching scenarios. This is why CLARIN-DK started to explore other technologies, such as data visualization tools that could serve as a better and easier entry point to the use of digital methodologies for non-computational researchers and teachers.

We opted for Voyant Tools, introduced to us by information specialists from HUMlab – a datalab at the Copenhagen University Library. Voyant Tools is an online environment that performs automatic text analysis with functionalities such as word frequency lists, frequency distribution plots, and KWIC displays (Figure 15). CLARIN-DK and HUMlab have organized several interactive workshops presenting the use of this environment to lecturers and researchers at the Faculty of Humanities at the University of Copenhagen. CLARIN-DK hosted a dedicated event at the Department of Nordic Studies and Linguistics on 21 November 2018, which was attended by 12 teachers and researchers.

Figure 1: the Voyant Tools

In order to tailor the events to the needs of the participants, CLARIN-DK asked some of them in advance which literary works were most relevant to be showcased and which research questions could be investigated and discussed during the events. They opted for novels written around the Modern Breakthrough period, an era in the Scandinavian literature which started at the end of the 19th century and in which Naturalism replaced Romanticism. The Archive of Danish Literature ( provided a collection of 54 novels. The novels were preprocessed and uploaded to a local instance of Voyant Tools by the CLARIN-DK team and information specialists from HUMlab.

A research question addressed the use of terms before and after the Modern Breakthrough (1870 – 1890). If it was possible to visualize changes in the use of, for example, terms for emotions (like love) which are typical for the Romanticism period compared to the use of more concrete terms (like work) which should be more common in the Naturalism novels. Using the Trends tool in Voyant (Figure 16), it was found that the term for love is used relatively more often before 1875 than after 1888. Moreover, the term for work is not used before 1875 in the novels, while it was used after then. Therefore, the use of these terms indicates that there is a shift in the use of common themes around the Modern Breakthrough. However, by using this simplistic method, it is impossible to differentiate novels representing the Modern Breakthrough.

Figure 2: The chronological distribution of the terms love vs. work for the period between 1826 and 1899 with regard to 54 novels

We therefore investigated if other tools in Voyant could also confirm the differences between the two literary periods. In the ScatterPlot tool it is, among other things, possible to visualize the results of document similarity analysis. Figure 17 shows the document similarity using the TF-IDF frequency count for all novels in the corpus. In the figure, the novels by Herman Bang and a few novels by Sophus Schandorph are clearly separated from the other works. The novels from the late 19th century of these two writers are considered representatives of the Modern Breakthrough. It was now up to the researchers to interpret the similarities in the other groups of the scatter plot and from there to pose more research questions.

Figure 3: Novel similarity based on TF-IDF counts.

In this and other workshops, the participants soon realized that studying texts through isolated words (word forms) was limiting, and there was a clear need for lemmatization. Moreover, the need for PoS-tagged texts became evident since some researchers were interested in investigating adjectives showing emotions, while others were interested in analysing events, requiring the automatic extraction of verbs. Despite this, Voyant Tools has proved to be very illustrative and useful to get a first quantitative overview of a collection of novels, and it allowed the comparison of two or more novels.

As a follow up to this event, the CLARIN-DK team will organize a workshop introducing corpus tools and corpus querying techniques in linguistically annotated texts for Literary Studies. The event will also showcase how automatic linguistic annotations are performed on texts from before and after the Danish orthographic reform of 1948, and discuss how it is possible to circumvent problems encountered when applying NLP tools developed for contemporary texts to older texts.

Click here to read more about Tour de CLARIN