Modular Synthesis of Disfluencies for Conversational Speech Systems

Betz, Simon; Wagner, Petra; Schlangen, David

Betz S, Wagner P, Schlangen D (2015)
Presented at the ESSV 2015, Eichstätt.

Conference paper | Published | English

Download

BETZ_WAGNER_SCHLANGEN_ESSV2015.pdf

URN

urn:nbn:de:0070-pub-27199738

Author(s)

Betz, Simon (UniBi); Wagner, Petra (UniBi); Schlangen, David (UniBi)

Department

Faculty of Linguistics and Literary Studies > Department of Linguistics
Center of Excellence - Cognitive Interaction Technology CITEC
Faculty of Linguistics and Literary Studies > Applied Computational Linguistics Research Group

Abstract / Note

It has been shown that dialogue systems benefit from incremental architectures, which let them respond quickly and interact with the interlocutor in a more human-like way. The advantage of quick responses comes with a disadvantage: the system may run out of things to say for a while. On such occasions, humans tend to produce disfluencies as a listener-oriented strategy to signal the ongoing production process and to buy time for finalizing the turn. Introducing disfluency capabilities into the speech synthesis module of a dialogue system may therefore be a straightforward strategy towards conversational speech systems. Disfluencies are complex and can take various chained and nested forms in human communication. We do not attempt to equip our system with the full range of possible disfluent time-buying strategies found in human interaction. For a first perceptual evaluation of the most suitable synthetic disfluency strategy to be integrated into the dialogue system, we focus on three structural factors that can cover a wide range of attested disfluency patterns: lengthening, word cutoffs and pauses. Combining these factors yields several different configurations that a disfluent sentence can take. Sentences from a spontaneous speech corpus were resynthesized in all possible configurations using MaryTTS. To identify euphonious configurations, these stimuli were then presented to test subjects in a perception test.
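
As a rough illustration of how the three structural factors named in the abstract could combine into configurations, the sketch below enumerates every on/off combination. Treating each factor as a simple binary switch is an assumption made here for illustration; the abstract does not state how many configurations the study actually used or whether all factors can co-occur. The names `FACTORS`, `enumerate_configurations` and `describe` are hypothetical and do not come from the paper or from MaryTTS.

```python
from itertools import product

# Factor names are taken from the abstract. Modelling each one as a plain
# on/off switch is an assumption, not the paper's stated design.
FACTORS = ("lengthening", "cutoff", "pause")


def enumerate_configurations(factors=FACTORS):
    """Return every on/off combination of the given factors as dicts."""
    return [
        dict(zip(factors, flags))
        for flags in product((False, True), repeat=len(factors))
    ]


def describe(config):
    """Render a configuration as a short label, e.g. 'lengthening+pause'."""
    active = [name for name, enabled in config.items() if enabled]
    return "+".join(active) if active else "fluent"


configs = enumerate_configurations()
labels = [describe(c) for c in configs]

for label in labels:
    print(label)
```

Under this binary assumption, three factors give eight combinations, including the fully fluent baseline in which no factor is active. A real stimulus set might exclude some combinations or vary each factor in more than two steps.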

Keywords

Incrementality; Disfluencies; Speech Synthesis; biphonetics

Year of publication

2015

Page(s)

128-134

Copyright / Licenses

Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

Conference

ESSV 2015

Conference location

Eichstätt

Conference date

2015-03-25 – 2015-03-27

Page URI

https://pub.uni-bielefeld.de/record/2719973

Cite

Betz S, Wagner P, Schlangen D. Modular Synthesis of Disfluencies for Conversational Speech Systems. Presented at the ESSV 2015, Eichstätt.

Betz, S., Wagner, P., & Schlangen, D. (2015). Modular Synthesis of Disfluencies for Conversational Speech Systems. Presented at the ESSV 2015, Eichstätt.

Betz, Simon, Wagner, Petra, and Schlangen, David. 2015. “Modular Synthesis of Disfluencies for Conversational Speech Systems”. Presented at the ESSV 2015, Eichstätt, 128-134.

Betz, S., Wagner, P., and Schlangen, D. (2015). “Modular Synthesis of Disfluencies for Conversational Speech Systems”. Presented at the ESSV 2015, Eichstätt.

Betz, S., Wagner, P., & Schlangen, D., 2015. Modular Synthesis of Disfluencies for Conversational Speech Systems. Presented at the ESSV 2015, Eichstätt.

S. Betz, P. Wagner, and D. Schlangen, “Modular Synthesis of Disfluencies for Conversational Speech Systems”, Presented at the ESSV 2015, Eichstätt, 2015.

Betz, S., Wagner, P., Schlangen, D.: Modular Synthesis of Disfluencies for Conversational Speech Systems. Presented at the ESSV 2015, Eichstätt (2015).

Betz, Simon, Wagner, Petra, and Schlangen, David. “Modular Synthesis of Disfluencies for Conversational Speech Systems”. Presented at the ESSV 2015, Eichstätt, 2015.

All files available under the following license(s):

Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)