A multimodal, multilabel approach to recognize emotions in oral history interviews
Viswanath A, Gref M, Hassan T, Schmidt C (2024)
In: Proceedings of the 12th International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE: 291–299.
Conference paper
| Published | English
Author(s)
Viswanath, Anargh;
Gref, Michael;
Hassan, Teena;
Schmidt, Christoph
Abstract / Notes
In this paper, we present a multilabel approach for multimodal emotion recognition in the challenging but unexplored use case of analyzing oral history interviews. Oral history is a methodological tool where historical research into the past is conducted through recorded video interviews that reflect the narrator’s personal experiences. The analysis of emotional content in oral history interviews is helpful in studies to understand trauma and historical memories of individuals. The emotions present in these interviews are subtle, natural, and complex. The lack of self-reported labels necessitates the use of observer-reported emotion labels. However, the complexity of the emotions as well as the diversity in perception of emotions makes it difficult to describe these emotions using single categories. Furthermore, unimodal analysis relying on facial expressions, vocal cues, or spoken words alone is inadequate given the multimodal nature of the narration. To address these challenges, this paper proposes a multilabel, multimodal approach to perform emotion recognition on the novel HdG dataset, consisting of German oral history interviews. The proposed approach utilizes the state-of-the-art Multimodal Transformer for the fusion of audio, textual, and visual features and extends the work to perform multilabel classification on the six Ekman emotion classes. Our approach achieves a mean AUC score of 0.74 and a mean balanced accuracy (balAcc) score of 0.70, significantly outperforming previous unimodal multiclass methods and setting a benchmark for future multimodal emotion recognition research in this domain.
Keywords
dililab
Year of publication
2024
Title of the conference proceedings
Proceedings of the 12th International Conference on Affective Computing and Intelligent Interaction (ACII)
Page(s)
291–299
Copyright / Licenses
Conference
12th International Conference on Affective Computing and Intelligent Interaction (ACII 2024)
Conference location
Glasgow, UK
Conference date
2024-09-15 – 2024-09-18
Page URI
https://pub.uni-bielefeld.de/record/2992507
Cite
Viswanath A, Gref M, Hassan T, Schmidt C. A multimodal, multilabel approach to recognize emotions in oral history interviews. In: Proceedings of the 12th International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE; 2024: 291–299.
All files are available under the following license(s):
Creative Commons Attribution-ShareAlike 4.0 International Public License (CC BY-SA 4.0)
Full text(s)
Name
viswanath-etal-2024-ACII.pdf
247.33 KB
Access Level
Open Access
Last uploaded
2024-09-26T14:45:52Z
MD5 checksum
5a1a6a7b1be8f82e7403fd75a9c326b4