Sentiment lexicons or BERT? A comparison of sentiment analysis approaches and their performance.

Grisot G, Rebora S, Herrmann JB, Pennino F (2022)
In: The Book of Abstracts of DH2022. ADHO (Ed); Tokyo.

Conference paper | Published | English
 
Author
Grisot, Giulia (UniBi); Rebora, Simone; Herrmann, J. Berenike (UniBi); Pennino, Federico
Corporate editor
ADHO
Abstract / Notes
With the development of powerful new technologies for computational data analysis, the opportunities for, and interest in, detecting sentiment and opinion in texts have grown considerably (Liu 2015). Because of the vast amount of material available online, these investigations have focused mostly on texts gathered from social media, using traditional corpus-linguistic approaches as well as deep learning tools. Sophisticated sentiment analysis (SA) of literary texts, especially in languages other than English, is still in its infancy (Kim and Klinger 2019). This is due partly to the limited amount of digital texts available, partly to the complex structure of literary texts, and partly to methodological challenges that call for skills seldom included in the training of literary scholars. Emotions are, however, central to the experience of literary narrative (Oatley 2012; Hogan 2016), and recent advances in their systematic, quantitative analysis have been made within computational literary studies. Yet such investigations have mostly relied on existing lists of words associated with sentiment and emotion values, the so-called sentiment lexicons. While these remain conventional and useful tools, they provide only limited insight into the representation of emotions in texts. Using a lexicon-based method, we have previously investigated emotions and sentiments in relation to the representation of landscape in Swiss literature, looking in particular at the differences between the rural and urban spaces portrayed in a corpus of Swiss novels written in German (see Grisot and Herrmann, in preparation). The present paper takes a step forward, using manual annotation and advanced machine learning methods to fine-tune a model that recognises valence and arousal in a historical corpus. Our goals are higher lexical coverage and validity than in our prior results obtained with sentiment lexicons.
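To make the lexicon-based baseline and the notion of lexical coverage concrete, the following is a minimal sketch of how a sentence might be scored against a sentiment lexicon. It is an illustration only and not the authors' pipeline: the lexicon entries, their valence values, the tokenisation, and the function name `lexicon_valence` are all hypothetical.

```python
import re

# Hypothetical toy lexicon (word -> valence in [-1, 1]); real sentiment
# lexicons are far larger and these values are invented for illustration.
TOY_LEXICON = {"schön": 0.8, "freude": 0.9, "dunkel": -0.4, "angst": -0.8}


def lexicon_valence(sentence, lexicon=TOY_LEXICON):
    """Return (mean valence of matched words, lexical coverage).

    Coverage is the share of tokens found in the lexicon. The score is
    None when no token matches, which is the typical failure mode of
    lexicon-based approaches on historical or literary vocabulary.
    """
    tokens = re.findall(r"\w+", sentence.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    coverage = len(hits) / len(tokens) if tokens else 0.0
    score = sum(hits) / len(hits) if hits else None
    return score, coverage


if __name__ == "__main__":
    print(lexicon_valence("Die Angst war dunkel."))
    print(lexicon_valence("Der Berg"))
```

A sentence with no lexicon matches receives no score at all, which is one way to see why higher lexical coverage is a stated goal of moving beyond lexicons.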
We describe here the current state of our method for detecting sentiment with deep learning approaches. Using a language model trained on a large corpus (3000+) of German literary texts spanning from 1800 to 1950 (Fischer and Strötgen 2017), we make use of BERT word embeddings and manually annotated sentences to recognise sentiment. More than 500 sentences were taken from three Swiss-German novels, one of which is children's fiction, and annotated for valence (understood here as the degree of 'positivity' of the detected sentiment) and arousal (its 'intensity' or 'degree of activation') by two trained student assistants working from the same instructions. Currently, the intraclass correlation coefficient (ICC) between the manual annotators on these sentences is 0.721 for valence and 0.606 for arousal. Scores for individual texts provide preliminary evidence of a genre effect, with higher ICCs for the children's novel (valence 0.86, arousal 0.78, for 149 sentences) than for the other two, more complex, realist novels (valence 0.78, arousal 0.62, for 182 sentences; valence 0.51, arousal 0.41, for 198 sentences). The annotated sample was used to train a deep learning classifier, using linear regression to fine-tune the Literary German BERT model (https://huggingface.co/severinsimmler/literary-german-bert), which reached Pearson's r scores of 0.53 for valence and 0.63 for arousal. These scores are very promising, suggesting the possibility, provided more training data, of fully automating the annotation task in our domain of historical literary texts. We are currently extending the annotation and, by the time of the conference, shall be able to update the results on a broader data basis. While potentially taking automatic SA of German literary texts to a new level, our study also allows the performance of lexicon-based SA approaches to be evaluated in direct comparison with deep learning approaches, making it possible to gauge the validity of different SA methods on a data-driven basis.
This approach also raises questions concerning the effect of genre on the ease and validity of manual sentiment annotations.
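The two evaluation statistics reported in the abstract, the ICC between annotators and Pearson's r between model predictions and annotations, can be sketched in plain Python as follows. The abstract does not state which ICC variant was used; the two-way random-effects, absolute-agreement, single-rater form ICC(2,1) is assumed here purely for illustration, and the function names and example ratings are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per sentence, one column per rater.

    Assumed variant (two-way random effects, absolute agreement, single
    rater); the abstract does not specify which ICC form was computed.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)               # between-sentence mean square
    ms_cols = ss_cols / (k - 1)               # between-rater mean square
    ms_err = ss_err / ((n - 1) * (k - 1))     # residual mean square
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


if __name__ == "__main__":
    # Hypothetical valence ratings from two annotators for five sentences.
    ratings = [[1, 2], [3, 3], [4, 5], [2, 2], [5, 4]]
    print(icc_2_1(ratings))
    print(pearson_r([r[0] for r in ratings], [r[1] for r in ratings]))
```

Because ICC(2,1) measures absolute agreement, a constant offset between two annotators lowers it even when their ratings are perfectly correlated, whereas Pearson's r is unaffected by such an offset.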
Keywords
Sentiment Analysis; BERT; sentiment lexicons; Emotions; Swiss literature
Publication year
2022
Title of the conference proceedings
The Book of Abstracts of DH2022
Conference
International Conference Digital Humanities 2022
Conference location
Tokyo, Japan
Conference date
2022-07-27 – 2022-07-29
Page URI
https://pub.uni-bielefeld.de/record/2965612

Cite

Grisot G, Rebora S, Herrmann JB, Pennino F. Sentiment lexicons or BERT? A comparison of sentiment analysis approaches and their performance. In: ADHO, ed. The Book of Abstracts of DH2022. Tokyo; 2022.
All files available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC BY 4.0)
Full text(s)
Access Level
OA Open Access
Last uploaded
2022-09-08T13:35:44Z
MD5 checksum
98ad98d97008687be24a3cfc610c5324

