Predicting sentiments and space in Swiss literature using BERT and Prodigy

Grisot G, Pennino F, Herrmann JB (2022)
Presented at the CHR2022 - 3rd Conference on Computational Humanities Research, Antwerp.

Conference short contribution / Poster | English
 
Abstract / Note
Thanks to the development of powerful new technologies for computational data analysis, an increasing number of researchers have investigated sentiment in texts, using traditional corpus-linguistic approaches as well as machine learning tools. For literary texts, however, sentiment analysis is still in its infancy, especially for languages other than English [1]. Crucially, very few studies so far have related the representation of sentiment and emotions to that of space. This is due partly to the limited number of literary texts available digitally and partly to the challenges of defining and identifying space in literature. Emotions and space are nevertheless central to the experience of literary narrative [2, 3, 4], and recent advances in their systematic, quantitative analysis have been made within computational literary studies [5, 6, 7]. Using lexicon-based methods, Grisot and Herrmann [8] investigated emotions and sentiments in relation to the representation of literary space, looking in particular at the differences between the rural and urban landscapes portrayed in a corpus of Swiss novels written in German. The present paper takes a step forward: building on their data, it uses manual annotation and advanced machine learning methods to train a fine-tuned model that automatically detects, on the one hand, sentiment (valence, arousal) and discrete emotions (joy, anger, sadness, disgust, fear, surprise) and, on the other, spatial entities (named and unnamed) in a historical corpus of Swiss novels. With such a model, we aim at higher levels of lexical coverage and validity than existing results obtained with sentiment lexicons and entity lists.
Using a language model trained on a large corpus (3000+) of German literary texts spanning from 1800 to 1950 (Literary German BERT [9]), we make use of BERT word embeddings [10], the Prodigy active learning tool [11] and manually annotated sentences to recognise sentiment, emotions and space, and to see whether and how these relate to one another. More than 6000 sentences were taken from Swiss-German novels and annotated for discrete emotions, valence (understood here as the degree of 'positivity' of the detected emotion) and arousal (its 'intensity' or 'degree of activation'), while active learning was used on more than 4000 sentences to extend existing lists of labelled spatial entities. Annotations were carried out by several trained student assistants. The annotated samples were used to train a deep learning classifier based on BERT transformers. In this preliminary phase we reached an accuracy of over 70% on valence prediction, over 66% on emotion prediction, and around 64% on arousal prediction. For space, we used active learning on a word2vec model bootstrapped with a corpus of Swiss geographic locations, annotating sentences with six categories (geolocations: *geo-rural, geo-natural, geo-urban*, and spatial terms (unnamed): *rural, natural, urban*). For these, we obtained the following preliminary results: F1: .65, precision: .66, and recall: .64. These scores are promising: they suggest that, given more training data, the annotation task could be fully automated for our domain of historical literary texts, both for sentiments and for spatial entities. We are currently gathering more annotations; at the time of the conference we shall be able to update the results on a broader data basis and to show whether our model can predict a relation between sentiments and space in Swiss literature.
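As a quick consistency check on the spatial-entity scores reported above, the following minimal Python sketch computes F1 as the harmonic mean of precision and recall. The function names and the raw counts (64 true positives, 33 false positives, 36 false negatives) are hypothetical and chosen only for illustration; the abstract reports the rounded scores, not the underlying counts.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive
    and false-negative counts. Empty denominators yield 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded scores as reported in the abstract.
reported_f1 = f1_score(0.66, 0.64)

# Hypothetical counts (an assumption, not from the study) that would
# produce scores of the same magnitude.
p, r = precision_recall(tp=64, fp=33, fn=36)
toy_f1 = f1_score(p, r)

print(f"F1 from reported precision/recall: {reported_f1:.2f}")
print(f"Toy counts: precision={p:.2f}, recall={r:.2f}, F1={toy_f1:.2f}")
```

Under these assumptions, the reported precision and recall are consistent with the reported F1 once rounded to two decimals.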
While potentially taking automatic sentiment analysis (SA) of German literary texts to a new level, our study also makes it possible to evaluate the performance of lexicon-based SA approaches in direct comparison with deep learning ones, and thus to gauge the validity of different SA methods on a data-driven basis. This approach also raises questions concerning the effect of genre on the ease and validity of manual sentiment annotations.
References
[1] R. Klinger, S. S. Suliya, N. Reiter, Automatic Emotion Detection for Quantitative Literary Studies. A case study based on Franz Kafka's “Das Schloss” and “Amerika”, Proceedings of the Digital Humanities (2016).
[2] K. Oatley, A taxonomy of the emotions of literary response and a theory of identification in fictional narrative, Poetics 23 (1995) 53–74. URL: https://www.sciencedirect.com/science/article/pii/0304422X94P4296S. doi:10.1016/0304-422X(94)P4296-S.
[3] K. Oatley, Fiction and its study as gateways to the mind, Scientific Study of Literature 1 (2011) 153–164. doi:10.1075/ssol.1.1.16oat.
[4] P. C. Hogan, Affect Studies, 2016. URL: https://oxfordre.com/literature/view/10.1093/acrefore/9780190201098.001.0001/acrefore-9780190201098-e-105. doi:10.1093/acrefore/9780190201098.013.105.
[5] R. Heuser, M. Algee-Hewitt, A. Lockhart, Mapping the emotions of London in fiction, 1700–1900: A crowdsourcing experiment, in: Literary mapping in the digital age, Routledge, 2016, pp. 43–64.
[6] M. Jockers, Extracts sentiment and sentiment-derived plot arcs from text, R package “syuzhet” (2017).
[7] M. Burghardt, C. Wolff, T. Schmidt, Toward multimodal sentiment analysis of historic plays: A case study with text and audio for Lessing's Emilia Galotti, in: 4th Conference of the Association Digital Humanities in the Nordic Countries, Copenhagen, 2019.
[8] G. Grisot, J. B. Herrmann, Examining the representation of landscape and its emotional value in German-Swiss fiction around 1900, 2022.
[9] F. Fischer, J. Strötgen, Corpus of German-Language Fiction (txt), 2017. URL: https://figshare.com/articles/dataset/Corpus_of_German-Language_Fiction_txt_/4524680, https://ndownloader.figshare.com/files/7320866. doi:10.6084/m9.figshare.4524680.v1.
[10] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[11] I. Montani, M. Honnibal, Prodigy: A new annotation tool for radically efficient machine teaching, Artificial Intelligence to appear (2018).
Keywords
Sentiment Analysis; Geography of Literature; Machine Learning; BERT; Swiss Literature
Year of publication
2022
Conference
CHR2022 - 3rd Conference on Computational Humanities Research
Conference location
Antwerp
Conference date
2022-12-12 – 2022-12-14
Page URI
https://pub.uni-bielefeld.de/record/2969114

Cite

Grisot G, Pennino F, Herrmann JB. Predicting sentiments and space in Swiss literature using BERT and Prodigy. Presented at the CHR2022 - 3rd Conference on Computational Humanities Research, Antwerp.
Full text(s)
Access level
OA Open Access
Last uploaded
2023-02-21T14:31:44Z
MD5 checksum
feca2749159146cfc726bc9c8f9f846c
Name
229.75 KB
Access level
OA Open Access
Last uploaded
2023-02-21T14:33:07Z
MD5 checksum
105ecbb27d550eec1685ec1f9a2bc443