Investigation into Target Speaking Rate Adaptation for Voice Conversion

Kuhlmann, Michael; Seebauer, Fritz Michael; Ebbers, Janek; Wagner, Petra; Haeb-Umbach, Reinhold

Kuhlmann M, Seebauer FM, Ebbers J, Wagner P, Haeb-Umbach R (2022)
In: Proceedings of Interspeech 2022. ISCA: 4930-4934.

Conference paper | Published | English

Download

kuhlmann22_interspeech.pdf 303.86 KB

URL

https://www.isca-speech.org/archive/interspeech_2022/kuhlmann22_interspeech.html

DOI

https://doi.org/10.21437/Interspeech.2022-10740

URN

urn:nbn:de:0070-pub-29670233

Author

Kuhlmann, Michael; Seebauer, Fritz Michael (UniBi); Ebbers, Janek; Wagner, Petra (UniBi); Haeb-Umbach, Reinhold

Department

Faculty of Linguistics and Literary Studies
Center of Excellence - Cognitive Interaction Technology CITEC > Phonetics

Project

TRR 318 Subproject C06: Technically enabled explanation of speaker traits

Abstract / Note

Disentangling speaker and content attributes of a speech signal into separate latent representations followed by decoding the content with an exchanged speaker representation is a popular approach for voice conversion, which can be trained with non-parallel and unlabeled speech data. However, previous approaches perform disentanglement only implicitly via some sort of information bottleneck or normalization, where it is usually hard to find a good trade-off between voice conversion and content reconstruction. Further, previous works usually do not consider an adaptation of the speaking rate to the target speaker or they put some major restrictions to the data or use case. Therefore, the contribution of this work is two-fold. First, we employ an explicit and fully unsupervised disentanglement approach, which has previously only been used for representation learning, and show that it allows to obtain both superior voice conversion and content reconstruction. Second, we investigate simple and generic approaches to linearly adapt the length of a speech signal, and hence the speaking rate, to a target speaker and show that the proposed adaptation allows to increase the speaking rate similarity with respect to the target speaker.
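
The idea of linearly adapting the length of a signal to a target speaking rate can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `adapt_length`, the choice to operate on a sequence of feature frames, and the use of scalar speaking-rate estimates (for example, syllables per second) are all assumptions made for illustration only.

```python
from typing import List, Sequence


def adapt_length(frames: Sequence[Sequence[float]],
                 source_rate: float,
                 target_rate: float) -> List[List[float]]:
    """Linearly resample a feature sequence along the time axis.

    Hypothetical helper: `source_rate` and `target_rate` are assumed to be
    positive speaking-rate estimates in the same unit.
    """
    if not frames:
        return []
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("speaking rates must be positive")

    n_in = len(frames)
    # A faster target speaker (larger rate) yields a shorter output.
    n_out = max(1, round(n_in * source_rate / target_rate))
    if n_in == 1 or n_out == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(frames[0]) for _ in range(n_out)]

    # Map each output index onto a fractional position in the input.
    step = (n_in - 1) / (n_out - 1)
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), n_in - 2)   # left neighbour frame
        w = pos - lo                   # interpolation weight of the right neighbour
        out.append([(1 - w) * a + w * b
                    for a, b in zip(frames[lo], frames[lo + 1])])
    return out
```

With six input frames and a target rate twice the source rate, the sketch returns three frames; with the rates swapped, it returns twelve. Any real system would additionally need a way to estimate the two rates, which this sketch leaves open.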

Keywords

voice conversion; any-to-any; speaking rate adaptation; biphonetics

Year of publication

2022

Title of proceedings

Proceedings of Interspeech 2022

Page(s)

4930-4934

Copyright / Licenses

Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

Conference

Interspeech 2022

Conference location

Incheon, Korea

Conference date

2022-09-18 – 2022-09-22

Page URI

https://pub.uni-bielefeld.de/record/2967023

Cite

Kuhlmann M, Seebauer FM, Ebbers J, Wagner P, Haeb-Umbach R. Investigation into Target Speaking Rate Adaptation for Voice Conversion. In: Proceedings of Interspeech 2022. ISCA; 2022: 4930-4934.

Kuhlmann, M., Seebauer, F. M., Ebbers, J., Wagner, P., & Haeb-Umbach, R. (2022). Investigation into Target Speaking Rate Adaptation for Voice Conversion. Proceedings of Interspeech 2022, 4930-4934. ISCA. https://doi.org/10.21437/Interspeech.2022-10740

Kuhlmann, Michael, Seebauer, Fritz Michael, Ebbers, Janek, Wagner, Petra, and Haeb-Umbach, Reinhold. 2022. “Investigation into Target Speaking Rate Adaptation for Voice Conversion”. In Proceedings of Interspeech 2022, 4930-4934. ISCA.

Kuhlmann, M., Seebauer, F. M., Ebbers, J., Wagner, P., and Haeb-Umbach, R. (2022). “Investigation into Target Speaking Rate Adaptation for Voice Conversion” in Proceedings of Interspeech 2022 (ISCA), 4930-4934.

Kuhlmann, M., et al., 2022. Investigation into Target Speaking Rate Adaptation for Voice Conversion. In Proceedings of Interspeech 2022. ISCA, pp. 4930-4934.

M. Kuhlmann, et al., “Investigation into Target Speaking Rate Adaptation for Voice Conversion”, Proceedings of Interspeech 2022, ISCA, 2022, pp. 4930-4934.

Kuhlmann, M., Seebauer, F.M., Ebbers, J., Wagner, P., Haeb-Umbach, R.: Investigation into Target Speaking Rate Adaptation for Voice Conversion. Proceedings of Interspeech 2022. p. 4930-4934. ISCA (2022).

Kuhlmann, Michael, Seebauer, Fritz Michael, Ebbers, Janek, Wagner, Petra, and Haeb-Umbach, Reinhold. “Investigation into Target Speaking Rate Adaptation for Voice Conversion”. Proceedings of Interspeech 2022. ISCA, 2022. 4930-4934.

All files available under the following license(s):

Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0):