Assessing the impact of contextual framing on subjective TTS quality

Edlund J, Tånnander C, LeMaguer S, Wagner P (2024)
In: Proceedings of INTERSPEECH 2024. 1205--1209.

Conference paper | Published | English
 
Authors
Edlund, Jens; Tånnander, Christina; LeMaguer, Sébastien; Wagner, Petra (UniBi)
Abstract / Note
Text-To-Speech (TTS) evaluations are habitually carried out without contextual and situational framing. Since humans adapt their speaking style to situation-specific communicative needs, such evaluations may not generalize across situations. Without clearly defined framing, it is even unclear in which situations evaluation results hold at all. We test the hypothesized impact of framing on TTS evaluation in a crowdsourced MOS evaluation of four TTS voices, systematically varying (a) the intended TTS task (domestic humanoid robot, child’s voice replacement, fiction audio books and long and information-rich texts) and (b) the framing of that task. The results show that framing differentiated MOS responses, with individual TTS performance varying significantly across tasks and framings. This corroborates the assumption that decontextualized MOS evaluations do not generalize, and suggests that TTS evaluations should not be reported without the type of framing that was employed, if any.
Keywords
biphonetics
Year of publication
2024
Title of the conference proceedings
Proceedings of INTERSPEECH 2024
Page(s)
1205--1209
Page URI
https://pub.uni-bielefeld.de/record/2991164

Cite

Edlund J, Tånnander C, LeMaguer S, Wagner P. Assessing the impact of contextual framing on subjective TTS quality. In: Proceedings of INTERSPEECH 2024. 2024: 1205--1209.
Edlund, J., Tånnander, C., LeMaguer, S., & Wagner, P. (2024). Assessing the impact of contextual framing on subjective TTS quality. Proceedings of INTERSPEECH 2024, 1205--1209. https://doi.org/10.21437/Interspeech.2024-781
Edlund, Jens, Tånnander, Christina, LeMaguer, Sébastien, and Wagner, Petra. 2024. “Assessing the impact of contextual framing on subjective TTS quality”. In Proceedings of INTERSPEECH 2024, 1205--1209.
Edlund, J., Tånnander, C., LeMaguer, S., and Wagner, P. (2024). “Assessing the impact of contextual framing on subjective TTS quality” in Proceedings of INTERSPEECH 2024 1205--1209.
Edlund, J., et al., 2024. Assessing the impact of contextual framing on subjective TTS quality. In Proceedings of INTERSPEECH 2024. pp. 1205--1209.
J. Edlund, et al., “Assessing the impact of contextual framing on subjective TTS quality”, Proceedings of INTERSPEECH 2024, 2024, pp.1205--1209.
Edlund, J., Tånnander, C., LeMaguer, S., Wagner, P.: Assessing the impact of contextual framing on subjective TTS quality. Proceedings of INTERSPEECH 2024. p. 1205--1209. (2024).
Edlund, Jens, Tånnander, Christina, LeMaguer, Sébastien, and Wagner, Petra. “Assessing the impact of contextual framing on subjective TTS quality”. Proceedings of INTERSPEECH 2024. 2024. 1205--1209.