Assessing the impact of contextual framing on subjective TTS quality

Edlund, Jens; Tånnander, Christina; LeMaguer, Sébastien; Wagner, Petra

Edlund J, Tånnander C, LeMaguer S, Wagner P (2024)
In: Proceedings of INTERSPEECH 2024. 1205--1209.

Conference paper | Published | English

Download

No files have been uploaded. Publication record only.

DOI

https://doi.org/10.21437/Interspeech.2024-781

Author

Edlund, Jens; Tånnander, Christina; LeMaguer, Sébastien; Wagner, Petra (UniBi)

Department

Center of Excellence - Cognitive Interaction Technology CITEC > Phonetics
Faculty of Linguistics and Literary Studies

Project

Learning deep speech representations (Lernen tiefer Sprachrepräsentationen)

Abstract / Note

Text-To-Speech (TTS) evaluations are habitually carried out without contextual and situational framing. Since humans adapt their speaking style to situation-specific communicative needs, such evaluations may not generalize across situations. Without clearly defined framing, it is even unclear in which situations evaluation results hold at all. We test the hypothesized impact of framing on TTS evaluation in a crowdsourced MOS evaluation of four TTS voices, systematically varying (a) the intended TTS task (domestic humanoid robot, child’s voice replacement, fiction audio books, and long and information-rich texts) and (b) the framing of that task. The results show that framing differentiated MOS responses, with individual TTS performance varying significantly across tasks and framings. This corroborates the assumption that decontextualized MOS evaluations do not generalize, and suggests that TTS evaluations should not be reported without the type of framing that was employed, if any.

Keywords

biphonetics

Year of publication

2024

Title of conference proceedings

Proceedings of INTERSPEECH 2024

Page(s)

1205--1209

Page URI

https://pub.uni-bielefeld.de/record/2991164

Cite

Edlund J, Tånnander C, LeMaguer S, Wagner P. Assessing the impact of contextual framing on subjective TTS quality. In: Proceedings of INTERSPEECH 2024. 2024: 1205--1209.

Edlund, J., Tånnander, C., LeMaguer, S., & Wagner, P. (2024). Assessing the impact of contextual framing on subjective TTS quality. Proceedings of INTERSPEECH 2024, 1205--1209. https://doi.org/10.21437/Interspeech.2024-781

Edlund, Jens, Tånnander, Christina, LeMaguer, Sébastien, and Wagner, Petra. 2024. “Assessing the impact of contextual framing on subjective TTS quality”. In Proceedings of INTERSPEECH 2024, 1205--1209.

Edlund, J., Tånnander, C., LeMaguer, S., and Wagner, P. (2024). “Assessing the impact of contextual framing on subjective TTS quality” in Proceedings of INTERSPEECH 2024 1205--1209.

Edlund, J., et al., 2024. Assessing the impact of contextual framing on subjective TTS quality. In Proceedings of INTERSPEECH 2024. pp. 1205--1209.

J. Edlund, et al., “Assessing the impact of contextual framing on subjective TTS quality”, Proceedings of INTERSPEECH 2024, 2024, pp.1205--1209.

Edlund, J., Tånnander, C., LeMaguer, S., Wagner, P.: Assessing the impact of contextual framing on subjective TTS quality. Proceedings of INTERSPEECH 2024. p. 1205--1209. (2024).

Edlund, Jens, Tånnander, Christina, LeMaguer, Sébastien, and Wagner, Petra. “Assessing the impact of contextual framing on subjective TTS quality”. Proceedings of INTERSPEECH 2024. 2024. 1205--1209.
