FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation
Harz L, Voß H, Kopp S (2023)
In: Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23). André E, Chetouani M, Vaufreydaz D, Lucas G, Schultz T, Morency L-P, Vinciarelli A (Eds); New York, NY, USA: ACM: 763–771.
Conference paper | Published | English
Author
Harz, Leon;
Voß, Hendric;
Kopp, Stefan
Editor
André, Elisabeth;
Chetouani, Mohamed;
Vaufreydaz, Dominique;
Lucas, Gale;
Schultz, Tanja;
Morency, Louis-Philippe;
Vinciarelli, Alessandro
Institution
Abstract / Notes
Human communication relies on multiple modalities such as verbal expressions, facial cues, and bodily gestures. Developing computational approaches to process and generate these multimodal signals is critical for seamless human-agent interaction. A particular challenge is the generation of co-speech gestures due to the large variability and number of gestures that can accompany a verbal utterance, leading to a one-to-many mapping problem. This paper presents an approach based on a Feature Extraction Infusion Network (FEIN-Z) that adopts insights from robot imitation learning and applies them to co-speech gesture generation. Building on the BC-Z architecture, our framework combines transformer architectures and Wasserstein generative adversarial networks. We describe the FEIN-Z methodology and evaluation results obtained within the GENEA Challenge 2023, demonstrating good results and significant improvements in human-likeness over the GENEA baseline. We discuss potential areas for improvement, such as refining input segmentation, employing more fine-grained control networks, and exploring alternative inference methods.
Year of publication
2023
Proceedings title
Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23)
Page(s)
763–771
Conference
ICMI '23: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION
Conference location
Paris, France
Conference date
2023-10-09 – 2023-10-13
ISBN
979-8-4007-0055-2
Page URI
https://pub.uni-bielefeld.de/record/2984549
Cite
Harz L, Voß H, Kopp S. FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation. In: André E, Chetouani M, Vaufreydaz D, et al., eds. Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23). New York, NY, USA: ACM; 2023: 763–771.
Harz, L., Voß, H., & Kopp, S. (2023). FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation. In E. André, M. Chetouani, D. Vaufreydaz, G. Lucas, T. Schultz, L.-P. Morency, & A. Vinciarelli (Eds.), Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23) (pp. 763–771). New York, NY, USA: ACM. https://doi.org/10.1145/3577190.3616115
Harz, Leon, Voß, Hendric, and Kopp, Stefan. 2023. “FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation”. In Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23), ed. Elisabeth André, Mohamed Chetouani, Dominique Vaufreydaz, Gale Lucas, Tanja Schultz, Louis-Philippe Morency, and Alessandro Vinciarelli, 763–771. New York, NY, USA: ACM.
Harz, L., Voß, H., and Kopp, S. (2023). “FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation” in Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23), André, E., Chetouani, M., Vaufreydaz, D., Lucas, G., Schultz, T., Morency, L.-P., and Vinciarelli, A. eds. (New York, NY, USA: ACM), 763–771.
Harz, L., Voß, H., & Kopp, S., 2023. FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation. In E. André, et al., eds. Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23). New York, NY, USA: ACM, pp. 763–771.
L. Harz, H. Voß, and S. Kopp, “FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation”, Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23), E. André, et al., eds., New York, NY, USA: ACM, 2023, pp. 763–771.
Harz, L., Voß, H., Kopp, S.: FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation. In: André, E., Chetouani, M., Vaufreydaz, D., Lucas, G., Schultz, T., Morency, L.-P., and Vinciarelli, A. (eds.) Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23). pp. 763–771. ACM, New York, NY, USA (2023).
Harz, Leon, Voß, Hendric, and Kopp, Stefan. “FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation”. Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23). Ed. Elisabeth André, Mohamed Chetouani, Dominique Vaufreydaz, Gale Lucas, Tanja Schultz, Louis-Philippe Morency, and Alessandro Vinciarelli. New York, NY, USA: ACM, 2023. 763–771.