Discerning dimensions of quality for state of the art synthetic speech
Seebauer FM, Kuhlmann M, Haeb-Umbach R, Wagner P (2023)
In: Proceedings of the 20th International Congress of Phonetic Sciences. Skarnitzl R, Volín J (Eds); Prague: 3106-3110.
Conference paper
| Published | English
Authors
Seebauer, Fritz Michael;
Kuhlmann, Michael;
Haeb-Umbach, Reinhold;
Wagner, Petra
Editors
Skarnitzl, Radek;
Volín, Jan
Abstract / Note
This paper describes an approach for determining the dimensions of quality for state-of-the-art synthetic speech. We propose that current evaluation metrics do not fully capture the meaningful dimensions of text-to-speech (TTS) and voice conversion (VC) systems. In order to develop a revised paradigm for meaningful evaluation, we conducted two experiments. First, we determined descriptive terms by querying naïve listeners on their impressions of modern TTS and VC systems. In a second experiment, we refined these terms into dimensions of quality and similarity by showcasing a consolidation procedure of manual clusterings. The resulting dimensions contain the standard evaluation categories of “intelligibility” and “naturalness” for both conditions. We could additionally discern dimensions of “tempo” and “demographics” in both domains. The final two dimensions as well as the relationships between categories proved to be different between TTS and VC, suggesting the need for modified evaluation scales based on the target construct.
Keywords
biphonetics
Publication year
2023
Title of the conference proceedings
Proceedings of the 20th International Congress of Phonetic Sciences
Page(s)
3106-3110
Conference
20th International Congress of Phonetic Sciences
Conference location
Prague
Conference date
2023-08-07 – 2023-08-11
ISBN
978-80-908114-2-3
Page URI
https://pub.uni-bielefeld.de/record/2982616
Cite
Seebauer FM, Kuhlmann M, Haeb-Umbach R, Wagner P. Discerning dimensions of quality for state of the art synthetic speech. In: Skarnitzl R, Volín J, eds. Proceedings of the 20th International Congress of Phonetic Sciences. Prague; 2023: 3106-3110.
Seebauer, F. M., Kuhlmann, M., Haeb-Umbach, R., & Wagner, P. (2023). Discerning dimensions of quality for state of the art synthetic speech. In R. Skarnitzl & J. Volín (Eds.), Proceedings of the 20th International Congress of Phonetic Sciences (pp. 3106-3110). Prague.
Seebauer, Fritz Michael, Kuhlmann, Michael, Haeb-Umbach, Reinhold, and Wagner, Petra. 2023. “Discerning dimensions of quality for state of the art synthetic speech”. In Proceedings of the 20th International Congress of Phonetic Sciences, ed. Radek Skarnitzl and Jan Volín, 3106-3110. Prague.
Seebauer, F. M., Kuhlmann, M., Haeb-Umbach, R., and Wagner, P. (2023). “Discerning dimensions of quality for state of the art synthetic speech” in Proceedings of the 20th International Congress of Phonetic Sciences, Skarnitzl, R., and Volín, J. eds. (Prague), 3106-3110.
Seebauer, F.M., et al., 2023. Discerning dimensions of quality for state of the art synthetic speech. In R. Skarnitzl & J. Volín, eds. Proceedings of the 20th International Congress of Phonetic Sciences. Prague, pp. 3106-3110.
F.M. Seebauer, et al., “Discerning dimensions of quality for state of the art synthetic speech”, Proceedings of the 20th International Congress of Phonetic Sciences, R. Skarnitzl and J. Volín, eds., Prague: 2023, pp. 3106-3110.
Seebauer, F.M., Kuhlmann, M., Haeb-Umbach, R., Wagner, P.: Discerning dimensions of quality for state of the art synthetic speech. In: Skarnitzl, R. and Volín, J. (eds.) Proceedings of the 20th International Congress of Phonetic Sciences. p. 3106-3110. Prague (2023).
Seebauer, Fritz Michael, Kuhlmann, Michael, Haeb-Umbach, Reinhold, and Wagner, Petra. “Discerning dimensions of quality for state of the art synthetic speech”. Proceedings of the 20th International Congress of Phonetic Sciences. Ed. Radek Skarnitzl and Jan Volín. Prague, 2023. 3106-3110.
All files available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
Full text(s)
Name
438.pdf
194.87 KB
Access Level
Open Access
Last uploaded
2023-09-05T14:33:22Z
MD5 checksum
7386c936108e64ad72d5249096d8215a