Discerning dimensions of quality for state of the art synthetic speech

Seebauer FM, Kuhlmann M, Haeb-Umbach R, Wagner P (2023)
In: Proceedings of the 20th International Congress of Phonetic Sciences. Skarnitzl R, Volín J (Eds); Prague: 3106-3110.

Conference paper | Published | English
 
Author(s)
Seebauer, Fritz Michael (UniBi); Kuhlmann, Michael; Haeb-Umbach, Reinhold; Wagner, Petra (UniBi)
Editor(s)
Skarnitzl, Radek; Volín, Jan
Abstract / Note
This paper describes an approach for determining the dimensions of quality for state-of-the-art synthetic speech. We propose that current evaluation metrics do not fully capture the meaningful dimensions of text-to-speech (TTS) and voice conversion (VC) systems. In order to develop a revised paradigm for meaningful evaluation, we conducted two experiments. First, we determined descriptive terms by querying naïve listeners on their impressions of modern TTS and VC systems. In a second experiment, we refined these terms into dimensions of quality and similarity by showcasing a consolidation procedure of manual clusterings. The resulting dimensions contain the standard evaluation categories of “intelligibility” and “naturalness” for both conditions. We could additionally discern dimensions of “tempo” and “demographics” in both domains. The final two dimensions as well as the relationships between categories proved to be different between TTS and VC, suggesting the need for modified evaluation scales based on the target construct.
Keywords
biphonetics
Year of publication
2023
Title of the proceedings
Proceedings of the 20th International Congress of Phonetic Sciences
Page(s)
3106-3110
Conference
20th International Congress of Phonetic Sciences
Conference location
Prague
Conference date
2023-08-07 – 2023-08-11
ISBN
978-80-908114-2-3
Page URI
https://pub.uni-bielefeld.de/record/2982616

Cite

Seebauer FM, Kuhlmann M, Haeb-Umbach R, Wagner P. Discerning dimensions of quality for state of the art synthetic speech. In: Skarnitzl R, Volín J, eds. Proceedings of the 20th International Congress of Phonetic Sciences. Prague; 2023: 3106-3110.
Seebauer, F. M., Kuhlmann, M., Haeb-Umbach, R., & Wagner, P. (2023). Discerning dimensions of quality for state of the art synthetic speech. In R. Skarnitzl & J. Volín (Eds.), Proceedings of the 20th International Congress of Phonetic Sciences (pp. 3106-3110). Prague.
Seebauer, Fritz Michael, Kuhlmann, Michael, Haeb-Umbach, Reinhold, and Wagner, Petra. 2023. “Discerning dimensions of quality for state of the art synthetic speech”. In Proceedings of the 20th International Congress of Phonetic Sciences, ed. Radek Skarnitzl and Jan Volín, 3106-3110. Prague.
Seebauer, F. M., Kuhlmann, M., Haeb-Umbach, R., and Wagner, P. (2023). “Discerning dimensions of quality for state of the art synthetic speech” in Proceedings of the 20th International Congress of Phonetic Sciences, Skarnitzl, R., and Volín, J. eds. (Prague), 3106-3110.
Seebauer, F.M., et al., 2023. Discerning dimensions of quality for state of the art synthetic speech. In R. Skarnitzl & J. Volín, eds. Proceedings of the 20th International Congress of Phonetic Sciences. Prague, pp. 3106-3110.
F.M. Seebauer, et al., “Discerning dimensions of quality for state of the art synthetic speech”, Proceedings of the 20th International Congress of Phonetic Sciences, R. Skarnitzl and J. Volín, eds., Prague: 2023, pp.3106-3110.
Seebauer, F.M., Kuhlmann, M., Haeb-Umbach, R., Wagner, P.: Discerning dimensions of quality for state of the art synthetic speech. In: Skarnitzl, R. and Volín, J. (eds.) Proceedings of the 20th International Congress of Phonetic Sciences. p. 3106-3110. Prague (2023).
Seebauer, Fritz Michael, Kuhlmann, Michael, Haeb-Umbach, Reinhold, and Wagner, Petra. “Discerning dimensions of quality for state of the art synthetic speech”. Proceedings of the 20th International Congress of Phonetic Sciences. Ed. Radek Skarnitzl and Jan Volín. Prague, 2023. 3106-3110.
All files are available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC BY 4.0)
Full text(s)
Name
194.87 KB
Access Level
OA Open Access
Last uploaded
2023-09-05T14:33:22Z
MD5 checksum
7386c936108e64ad72d5249096d8215a

