Modular Synthesis of Disfluencies for Conversational Speech Systems

Betz S, Wagner P, Schlangen D (2015)
In: Elektronische Sprachsignalverarbeitung 2015. Conference proceedings of the 26st conference in Eichstätt with 30 contributions. Wirsching G (Ed); Studientexte zur Sprachkommunikation, 78. Dresden: TUD Press: 128-134.

Conference paper | Published | English
 
Editor
Wirsching, Günther
Abstract / Note
It has been shown that dialogue systems benefit from incremental architectures, which let them respond quickly and interact with the interlocutor in a more human-like way. The advantage of quick responses comes with a disadvantage: the system may temporarily run out of things to say. On such occasions, humans tend to produce disfluencies as a listener-oriented strategy to signal the ongoing production process and to buy time for finalizing the turn. Introducing disfluency capabilities into the speech synthesis module of a dialogue system may therefore be a straightforward strategy towards conversational speech systems. Disfluencies are a very complex matter; in human communication they can take various chained and nested forms. We do not attempt to equip our system with the full range of possible disfluent time-buying strategies found in human interaction. For a first perceptual evaluation of the most suitable synthetic disfluency strategy to be integrated into the dialogue system, we focus on three structural factors that can cover a wide range of attested disfluency patterns: lengthening, word cutoffs and pauses. Combining these factors yields several different configurations that a disfluent sentence can take. Sentences from a spontaneous speech corpus were resynthesized in all possible configurations using Mary TTS. To identify euphonic configurations, these stimuli were then presented to test subjects in a perception test.
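The configuration space described in the abstract can be illustrated with a minimal sketch. It assumes that each of the three factors (lengthening, word cutoff, pause) is simply switched on or off independently; the paper itself may constrain or order the combinations differently, and the function names below are hypothetical and not taken from the paper or from Mary TTS.

```python
from itertools import product

# Assumption: the three structural factors are treated as independent
# on/off switches. The actual study may exclude or reorder combinations.
FACTORS = ("lengthening", "cutoff", "pause")


def disfluency_configurations(factors=FACTORS):
    """Return every on/off combination of the factors, fluent baseline included."""
    return [
        dict(zip(factors, flags))
        for flags in product((False, True), repeat=len(factors))
    ]


def label(config):
    """Build a readable label such as 'lengthening+pause' for one configuration."""
    active = [name for name, on in config.items() if on]
    return "+".join(active) if active else "fluent"


configs = disfluency_configurations()
labels = [label(c) for c in configs]

for name in labels:
    print(name)
```

Under this assumption, three binary factors give eight configurations, one of which is the fluent baseline with no disfluency element.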
Keywords
Incrementality; Disfluencies; Speech Synthesis; biphonetics
Publication year
2015
Title of the conference proceedings
Elektronische Sprachsignalverarbeitung 2015. Conference proceedings of the 26st conference in Eichstätt with 30 contributions.
Series or journal title
Studientexte zur Sprachkommunikation
Volume
78
Page(s)
128-134
Conference
ESSV 2015
Conference location
Eichstätt
Conference date
2015-03-25 – 2015-03-27
ISBN
978-3-959080-00-2
Page URI
https://pub.uni-bielefeld.de/record/2719973

Cite

Betz S, Wagner P, Schlangen D. Modular Synthesis of Disfluencies for Conversational Speech Systems. In: Wirsching G, ed. Elektronische Sprachsignalverarbeitung 2015. Conference proceedings of the 26st conference in Eichstätt with 30 contributions. Studientexte zur Sprachkommunikation. Vol 78. Dresden: TUD Press; 2015: 128-134.
Betz, S., Wagner, P., & Schlangen, D. (2015). Modular Synthesis of Disfluencies for Conversational Speech Systems. In G. Wirsching (Ed.), Studientexte zur Sprachkommunikation: Vol. 78. Elektronische Sprachsignalverarbeitung 2015. Conference proceedings of the 26st conference in Eichstätt with 30 contributions. (pp. 128-134). Dresden: TUD Press.
Betz, Simon, Wagner, Petra, and Schlangen, David. 2015. “Modular Synthesis of Disfluencies for Conversational Speech Systems”. In Elektronische Sprachsignalverarbeitung 2015. Conference proceedings of the 26st conference in Eichstätt with 30 contributions., ed. Günther Wirsching, 78:128-134. Studientexte zur Sprachkommunikation. Dresden: TUD Press.
Betz, S., Wagner, P., and Schlangen, D. (2015). “Modular Synthesis of Disfluencies for Conversational Speech Systems” in Elektronische Sprachsignalverarbeitung 2015. Conference proceedings of the 26st conference in Eichstätt with 30 contributions., Wirsching, G. ed. Studientexte zur Sprachkommunikation, vol. 78, (Dresden: TUD Press), 128-134.
Betz, S., Wagner, P., & Schlangen, D., 2015. Modular Synthesis of Disfluencies for Conversational Speech Systems. In G. Wirsching, ed. Elektronische Sprachsignalverarbeitung 2015. Conference proceedings of the 26st conference in Eichstätt with 30 contributions. Studientexte zur Sprachkommunikation, no. 78. Dresden: TUD Press, pp. 128-134.
S. Betz, P. Wagner, and D. Schlangen, “Modular Synthesis of Disfluencies for Conversational Speech Systems”, Elektronische Sprachsignalverarbeitung 2015. Conference proceedings of the 26st conference in Eichstätt with 30 contributions., G. Wirsching, ed., Studientexte zur Sprachkommunikation, vol. 78, Dresden: TUD Press, 2015, pp. 128-134.
Betz, S., Wagner, P., Schlangen, D.: Modular Synthesis of Disfluencies for Conversational Speech Systems. In: Wirsching, G. (ed.) Elektronische Sprachsignalverarbeitung 2015. Conference proceedings of the 26st conference in Eichstätt with 30 contributions. Studientexte zur Sprachkommunikation. 78, p. 128-134. TUD Press, Dresden (2015).
Betz, Simon, Wagner, Petra, and Schlangen, David. “Modular Synthesis of Disfluencies for Conversational Speech Systems”. Elektronische Sprachsignalverarbeitung 2015. Conference proceedings of the 26st conference in Eichstätt with 30 contributions. Ed. Günther Wirsching. Dresden: TUD Press, 2015. Vol. 78. Studientexte zur Sprachkommunikation. 128-134.
All files available under the following license(s):
Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Full text(s)
Access Level
OA Open Access
Last uploaded
2019-09-06T09:18:30Z
MD5 checksum
79600cc514ea31339ebd15644c89a0a4

