Investigation into Target Speaking Rate Adaptation for Voice Conversion

Kuhlmann M, Seebauer FM, Ebbers J, Wagner P, Haeb-Umbach R (2022)
In: Proceedings of Interspeech 2022. ISCA: 4930-4934.

Conference paper | Published | English
 
Authors
Kuhlmann, Michael; Seebauer, Fritz Michael (UniBi); Ebbers, Janek; Wagner, Petra (UniBi); Haeb-Umbach, Reinhold
Abstract / Note
Disentangling speaker and content attributes of a speech signal into separate latent representations followed by decoding the content with an exchanged speaker representation is a popular approach for voice conversion, which can be trained with non-parallel and unlabeled speech data. However, previous approaches perform disentanglement only implicitly via some sort of information bottleneck or normalization, where it is usually hard to find a good trade-off between voice conversion and content reconstruction. Further, previous works usually do not consider an adaptation of the speaking rate to the target speaker or they put some major restrictions to the data or use case. Therefore, the contribution of this work is two-fold. First, we employ an explicit and fully unsupervised disentanglement approach, which has previously only been used for representation learning, and show that it allows to obtain both superior voice conversion and content reconstruction. Second, we investigate simple and generic approaches to linearly adapt the length of a speech signal, and hence the speaking rate, to a target speaker and show that the proposed adaptation allows to increase the speaking rate similarity with respect to the target speaker.
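As a rough illustration of the second contribution, linear length adaptation can be pictured as resampling a feature sequence along the time axis by the ratio of source to target speaking rate. The snippet below is a minimal sketch and not the authors' implementation: the function name `adapt_length`, the use of NumPy linear interpolation, and the assumption that both speaking rates are already available as scalars in a common unit are all hypothetical choices made for illustration.

```python
import numpy as np


def adapt_length(features, source_rate, target_rate):
    """Linearly resample a (frames, dims) feature sequence along time.

    source_rate and target_rate are speaking rates in the same unit
    (for example phones per second). How they are estimated is left
    open here and is an assumption of this sketch.
    """
    features = np.asarray(features, dtype=float)
    num_frames, num_dims = features.shape
    # A faster target speaker needs fewer frames for the same content.
    new_len = max(1, int(round(num_frames * source_rate / target_rate)))
    old_pos = np.arange(num_frames)
    new_pos = np.linspace(0, num_frames - 1, new_len)
    # Interpolate every feature dimension independently on the new time grid.
    return np.stack(
        [np.interp(new_pos, old_pos, features[:, d]) for d in range(num_dims)],
        axis=1,
    )


# Example: a 10-frame sequence converted towards a 25% faster speaker.
example = np.arange(20, dtype=float).reshape(10, 2)
adapted = adapt_length(example, source_rate=4.0, target_rate=5.0)
```

The stretch is uniform over the whole utterance, which matches the "linear" adaptation named in the abstract; it does not model pauses or phone-dependent duration changes.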
Keywords
voice conversion; any-to-any; speaking rate adaptation; biphonetics
Publication year
2022
Proceedings title
Proceedings of Interspeech 2022
Page(s)
4930-4934
Conference
Interspeech 2022
Conference location
Incheon, Korea
Conference date
2022-09-18 – 2022-09-22
Page URI
https://pub.uni-bielefeld.de/record/2967023

Cite

Kuhlmann M, Seebauer FM, Ebbers J, Wagner P, Haeb-Umbach R. Investigation into Target Speaking Rate Adaptation for Voice Conversion. In: Proceedings of Interspeech 2022. ISCA; 2022: 4930-4934.
Kuhlmann, M., Seebauer, F. M., Ebbers, J., Wagner, P., & Haeb-Umbach, R. (2022). Investigation into Target Speaking Rate Adaptation for Voice Conversion. Proceedings of Interspeech 2022, 4930-4934. ISCA. https://doi.org/10.21437/Interspeech.2022-10740
Kuhlmann, Michael, Seebauer, Fritz Michael, Ebbers, Janek, Wagner, Petra, and Haeb-Umbach, Reinhold. 2022. “Investigation into Target Speaking Rate Adaptation for Voice Conversion”. In Proceedings of Interspeech 2022, 4930-4934. ISCA.
Kuhlmann, M., Seebauer, F. M., Ebbers, J., Wagner, P., and Haeb-Umbach, R. (2022). “Investigation into Target Speaking Rate Adaptation for Voice Conversion” in Proceedings of Interspeech 2022 (ISCA), 4930-4934.
Kuhlmann, M., et al., 2022. Investigation into Target Speaking Rate Adaptation for Voice Conversion. In Proceedings of Interspeech 2022. ISCA, pp. 4930-4934.
M. Kuhlmann, et al., “Investigation into Target Speaking Rate Adaptation for Voice Conversion”, Proceedings of Interspeech 2022, ISCA, 2022, pp. 4930-4934.
Kuhlmann, M., Seebauer, F.M., Ebbers, J., Wagner, P., Haeb-Umbach, R.: Investigation into Target Speaking Rate Adaptation for Voice Conversion. Proceedings of Interspeech 2022. p. 4930-4934. ISCA (2022).
Kuhlmann, Michael, Seebauer, Fritz Michael, Ebbers, Janek, Wagner, Petra, and Haeb-Umbach, Reinhold. “Investigation into Target Speaking Rate Adaptation for Voice Conversion”. Proceedings of Interspeech 2022. ISCA, 2022. 4930-4934.
All files available under the following license(s):
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Full text(s)
Access Level
OA Open Access
Last uploaded
2022-11-15T13:22:31Z
MD5 checksum
1052bb187b40117b6fc2757c29c15c23


