Investigation into Target Speaking Rate Adaptation for Voice Conversion
Kuhlmann M, Seebauer FM, Ebbers J, Wagner P, Haeb-Umbach R (2022)
In: Proceedings of Interspeech 2022. ISCA: 4930-4934.
Conference paper
| Published | English
Download
kuhlmann22_interspeech.pdf
303.86 KB
Authors
Kuhlmann, Michael;
Seebauer, Fritz Michael (UniBi);
Ebbers, Janek;
Wagner, Petra (UniBi);
Haeb-Umbach, Reinhold
Abstract / Note
Disentangling the speaker and content attributes of a speech signal
into separate latent representations, followed by decoding the content
with an exchanged speaker representation, is a popular approach to
voice conversion that can be trained with non-parallel and unlabeled
speech data. However, previous approaches perform disentanglement only
implicitly via some sort of information bottleneck or normalization,
where it is usually hard to find a good trade-off between voice
conversion and content reconstruction. Further, previous works usually
do not consider adapting the speaking rate to the target speaker, or
they impose major restrictions on the data or use case. Therefore, the
contribution of this work is two-fold. First, we employ an explicit and
fully unsupervised disentanglement approach, which has previously only
been used for representation learning, and show that it achieves both
superior voice conversion and content reconstruction. Second, we
investigate simple and generic approaches to linearly adapt the length
of a speech signal, and hence the speaking rate, to a target speaker
and show that the proposed adaptation increases the speaking rate
similarity with respect to the target speaker.
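As an illustration of what linearly adapting the length of a signal can mean in practice, the following is a minimal sketch and not the authors' implementation. It assumes the signal is given as a list of feature frames and that speaking rates for the source and target speaker are already known in some common unit (for example, syllables per second). The function name `adapt_length` and its parameters are hypothetical.

```python
def adapt_length(frames, source_rate, target_rate):
    """Linearly resample a frame sequence along the time axis.

    frames: list of equal-length lists of floats (one list per frame).
    source_rate, target_rate: speaking rates in the same unit
    (assumed to be given; how they are estimated is out of scope here).
    """
    num_frames = len(frames)
    if num_frames == 0:
        return []

    # A faster target speaker needs fewer frames for the same content,
    # so the new length scales with source_rate / target_rate.
    new_len = max(1, round(num_frames * source_rate / target_rate))
    if new_len == 1 or num_frames == 1:
        return [list(frames[0]) for _ in range(new_len)]

    out = []
    for i in range(new_len):
        # Position of the i-th output frame on the original frame axis.
        pos = i * (num_frames - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, num_frames - 1)
        w = pos - lo
        # Linear interpolation between the two neighbouring frames.
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


# Hypothetical usage: 10 frames, target speaker 25 % faster than the source.
frames = [[float(i), 2.0 * i] for i in range(10)]
shortened = adapt_length(frames, source_rate=4.0, target_rate=5.0)
```

Uniform interpolation of this kind stretches or compresses all parts of the utterance by the same factor; it is only one of several ways a linear length adaptation could be realized.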
Keywords
voice conversion;
any-to-any;
speaking rate adaptation;
biphonetics
Year of publication
2022
Proceedings title
Proceedings of Interspeech 2022
Page(s)
4930-4934
Conference
Interspeech 2022
Conference location
Incheon, Korea
Conference date
2022-09-18 – 2022-09-22
Page URI
https://pub.uni-bielefeld.de/record/2967023
Cite
Kuhlmann M, Seebauer FM, Ebbers J, Wagner P, Haeb-Umbach R. Investigation into Target Speaking Rate Adaptation for Voice Conversion. In: Proceedings of Interspeech 2022. ISCA; 2022: 4930-4934.
Kuhlmann, M., Seebauer, F. M., Ebbers, J., Wagner, P., & Haeb-Umbach, R. (2022). Investigation into Target Speaking Rate Adaptation for Voice Conversion. Proceedings of Interspeech 2022, 4930-4934. ISCA. https://doi.org/10.21437/Interspeech.2022-10740
Kuhlmann, Michael, Seebauer, Fritz Michael, Ebbers, Janek, Wagner, Petra, and Haeb-Umbach, Reinhold. 2022. “Investigation into Target Speaking Rate Adaptation for Voice Conversion”. In Proceedings of Interspeech 2022, 4930-4934. ISCA.
Kuhlmann, M., Seebauer, F. M., Ebbers, J., Wagner, P., and Haeb-Umbach, R. (2022). “Investigation into Target Speaking Rate Adaptation for Voice Conversion” in Proceedings of Interspeech 2022 (ISCA), 4930-4934.
Kuhlmann, M., et al., 2022. Investigation into Target Speaking Rate Adaptation for Voice Conversion. In Proceedings of Interspeech 2022. ISCA, pp. 4930-4934.
M. Kuhlmann, et al., “Investigation into Target Speaking Rate Adaptation for Voice Conversion”, Proceedings of Interspeech 2022, ISCA, 2022, pp. 4930-4934.
Kuhlmann, M., Seebauer, F.M., Ebbers, J., Wagner, P., Haeb-Umbach, R.: Investigation into Target Speaking Rate Adaptation for Voice Conversion. Proceedings of Interspeech 2022. p. 4930-4934. ISCA (2022).
Kuhlmann, Michael, Seebauer, Fritz Michael, Ebbers, Janek, Wagner, Petra, and Haeb-Umbach, Reinhold. “Investigation into Target Speaking Rate Adaptation for Voice Conversion”. Proceedings of Interspeech 2022. ISCA, 2022. 4930-4934.
All files available under the following license(s):
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Full text(s)
Name
kuhlmann22_interspeech.pdf
303.86 KB
Access Level
Open Access
Last uploaded
2022-11-15T13:22:31Z
MD5 checksum
1052bb187b40117b6fc2757c29c15c23
Link(s) to full text(s)
Access Level
Open Access