Blog post written by Petra Kralj Novak, Jasmina Smailović, Borut Sluban and Igor Mozetič; edited by Darja Fišer
Emoji are Unicode graphic symbols used as shorthand to express concepts and ideas, and they can play an important role in social media text analytics. In 2015, Petra Kralj Novak, Jasmina Smailović, Borut Sluban and Igor Mozetič from the Jožef Stefan Institute in Ljubljana, Slovenia released the first emoji sentiment lexicon, Emoji Sentiment Ranking 1.0, and published it in the public language resource repository CLARIN.SI. With 78,500 downloads to date, the lexicon is the most downloaded resource in the CLARIN.SI repository.
The sentiment of each emoji was computed from the sentiment of the tweets in which it occurs. The underlying data are about 1.6 million tweets in 13 European languages, labelled for sentiment polarity (negative, neutral, or positive) by 83 human annotators. About 4% of the annotated tweets contained emoji. The sentiment score of each emoji was computed from the estimated probability that the emoji appears in a tweet of each sentiment class.
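To make the idea concrete, here is a minimal sketch of how such a score could be derived from label counts. It is an illustration under stated assumptions, not the authors' code: the add-one (Laplace) smoothing, the score defined as the positive probability minus the negative probability, the function names and the counts in the usage note are all assumptions made for this example. The exact estimator is given in Kralj Novak et al. (2015).

```python
# Sentiment labels: -1 = negative, 0 = neutral, +1 = positive.
LABELS = (-1, 0, 1)


def sentiment_distribution(counts, smoothing=1.0):
    """Estimate (p_neg, p_neut, p_pos) for one emoji.

    counts maps each label to the number of annotated tweets with that
    label in which the emoji occurs. Add-one smoothing (an assumption of
    this sketch) keeps rarely seen emoji away from extreme probabilities.
    """
    total = sum(counts.get(label, 0) for label in LABELS)
    denom = total + smoothing * len(LABELS)
    return tuple((counts.get(label, 0) + smoothing) / denom for label in LABELS)


def sentiment_score(counts, smoothing=1.0):
    """Mean of the label distribution, a value between -1 and +1."""
    p_neg, _p_neut, p_pos = sentiment_distribution(counts, smoothing)
    return p_pos - p_neg
```

With made-up counts of 10 negative, 20 neutral and 67 positive tweets, `sentiment_score({-1: 10, 0: 20, 1: 67})` comes out at about 0.57, and an emoji with no observations gets a score of 0.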
The construction and analysis of the Emoji Sentiment Ranking are described in detail by Kralj Novak et al. (2015). The authors draw a sentiment map of the 751 emoji (see Figure 1), formalize sentiment, and present a novel, intuitive visualization of sentiment distribution in the form of a sentiment bar (Figure 2). They also compare the sentiment of tweets with and without emoji and find that tweets with emoji tend to be more positive. Frequency matters as well: the more frequently used emoji tend to be more positive. Another interesting aspect is the position of emoji in tweets: the stronger the sentiment charge of an emoji, the more likely it is to appear at the end of a tweet (see Figure 3). An exception is the soccer ball emoji, which is commonly used to replace a word but nevertheless has a very positive sentiment associated with it.
Figure 1: Sentiment map of the 751 most frequently used emoji. The position of the emoji denotes its sentiment score and neutrality, while its size is proportional to the frequency of its usage. An interactive version is available here: http://kt.ijs.si/data/Emoji_sentiment_ranking/emojimap.html
Figure 2: The sentiment distribution of each emoji is visualized in the form of a sentiment bar. http://kt.ijs.si/data/Emoji_sentiment_ranking/index.html
Figure 3: Emoji position in tweets. The horizontal axis represents the length of a tweet. The vertical axis represents the neutrality of the emoji: top for very neutral and bottom for very emotional, either positive or negative. Emoji that act as word replacements, and are thus positioned in the middle of tweets, tend to have a neutral sentiment. Emoji that convey sentiment are more likely to be positioned at the end of tweets.
As a further analysis, the authors investigated whether the Emoji Sentiment Ranking can be considered a universal, language-independent resource, at least for European languages. They computed an independent ranking of emoji sentiment for each of the 13 languages and found no evidence of significant differences in emoji sentiment between the languages.
Information about the sentiment of emoji can be used in the automated sentiment classification of informal texts. A basic distinction between positive and negative emoji can be used to automatically label positive and negative samples of text. These emoji-labelled samples can then be used to train and test sentiment classifiers with machine learning techniques, without manual annotation. Emoji can also be exploited to extend the more common features used in text mining, such as sentiment-carrying words.
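The automatic labelling described above can be sketched in a few lines. This is a hedged illustration only: the `EMOJI_SCORES` table holds made-up values rather than entries from the Emoji Sentiment Ranking, and the threshold, the averaging rule and the function names are assumptions of this example.

```python
# Hypothetical emoji -> sentiment score table. The values are illustrative
# only; in practice they would be loaded from an emoji sentiment lexicon.
EMOJI_SCORES = {"😊": 0.6, "😍": 0.7, "😢": -0.4, "😡": -0.5}


def emoji_label(text, scores=EMOJI_SCORES, threshold=0.1):
    """Return 'positive' or 'negative' based on the mean score of the known
    emoji in the text, or None if there are none or the signal is weak."""
    found = [scores[ch] for ch in text if ch in scores]
    if not found:
        return None
    mean = sum(found) / len(found)
    if mean > threshold:
        return "positive"
    if mean < -threshold:
        return "negative"
    return None


def build_training_set(texts, scores=EMOJI_SCORES):
    """Keep only texts that receive a label, and strip the known emoji so a
    classifier trained on the result cannot simply relearn the labelling rule."""
    labelled = []
    for text in texts:
        label = emoji_label(text, scores)
        if label is not None:
            stripped = "".join(ch for ch in text if ch not in scores).strip()
            labelled.append((stripped, label))
    return labelled
```

The resulting `(text, label)` pairs can be passed to any standard text classifier. Texts without emoji, or with mixed signals such as one positive and one negative emoji, are left out rather than guessed.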
Kralj Novak P, Smailović J, Sluban B, Mozetič I (2015) Sentiment of Emoji. PLoS ONE 10(12): e0144296. doi:10.1371/journal.pone.0144296.