FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation

Harz, Leon; Voß, Hendric; Kopp, Stefan

Harz L, Voß H, Kopp S (2023)
In: Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23). André E, Chetouani M, Vaufreydaz D, Lucas G, Schultz T, Morency L-P, Vinciarelli A (Eds); New York, NY, USA: ACM: 763–771.

Conference paper | Published | English

DOI

https://doi.org/10.1145/3577190.3616115

Authors

Harz, Leon; Voß, Hendric (UniBi); Kopp, Stefan (UniBi)

Editors

André, Elisabeth; Chetouani, Mohamed; Vaufreydaz, Dominique; Lucas, Gale; Schultz, Tanja; Morency, Louis-Philippe; Vinciarelli, Alessandro

Department

Technische Fakultät > AG Kognitive Systeme und soziale Interaktion
Center of Excellence - Cognitive Interaction Technology CITEC

Abstract / Note

Human communication relies on multiple modalities such as verbal expressions, facial cues, and bodily gestures. Developing computational approaches to process and generate these multimodal signals is critical for seamless human-agent interaction. A particular challenge is the generation of co-speech gestures due to the large variability and number of gestures that can accompany a verbal utterance, leading to a one-to-many mapping problem. This paper presents an approach based on a Feature Extraction Infusion Network (FEIN-Z) that adopts insights from robot imitation learning and applies them to co-speech gesture generation. Building on the BC-Z architecture, our framework combines transformer architectures and Wasserstein generative adversarial networks. We describe the FEIN-Z methodology and evaluation results obtained within the GENEA Challenge 2023, demonstrating good results and significant improvements in human-likeness over the GENEA baseline. We discuss potential areas for improvement, such as refining input segmentation, employing more fine-grained control networks, and exploring alternative inference methods.

Year of publication

2023

Proceedings title

Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23)

Page(s)

763–771

Conference

ICMI '23: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION

Conference location

Paris, France

Conference date

2023-10-09 – 2023-10-13

ISBN

979-8-4007-0055-2

Page URI

https://pub.uni-bielefeld.de/record/2984549

Cite

Harz L, Voß H, Kopp S. FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation. In: André E, Chetouani M, Vaufreydaz D, et al., eds. Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23). New York, NY, USA: ACM; 2023: 763–771.

Harz, L., Voß, H., & Kopp, S. (2023). FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation. In E. André, M. Chetouani, D. Vaufreydaz, G. Lucas, T. Schultz, L.-P. Morency, & A. Vinciarelli (Eds.), Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23) (pp. 763–771). New York, NY, USA: ACM. https://doi.org/10.1145/3577190.3616115

Harz, Leon, Voß, Hendric, and Kopp, Stefan. 2023. “FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation”. In Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23), ed. Elisabeth André, Mohamed Chetouani, Dominique Vaufreydaz, Gale Lucas, Tanja Schultz, Louis-Philippe Morency, and Alessandro Vinciarelli, 763–771. New York, NY, USA: ACM.

Harz, L., Voß, H., and Kopp, S. (2023). “FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation” in Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23), André, E., Chetouani, M., Vaufreydaz, D., Lucas, G., Schultz, T., Morency, L.-P., and Vinciarelli, A. eds. (New York, NY, USA: ACM), 763–771.

Harz, L., Voß, H., & Kopp, S., 2023. FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation. In E. André, et al., eds. Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23). New York, NY, USA: ACM, pp. 763–771.

L. Harz, H. Voß, and S. Kopp, “FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation”, Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23), E. André, et al., eds., New York, NY, USA: ACM, 2023, pp. 763–771.

Harz, L., Voß, H., Kopp, S.: FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation. In: André, E., Chetouani, M., Vaufreydaz, D., Lucas, G., Schultz, T., Morency, L.-P., and Vinciarelli, A. (eds.) Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23). pp. 763–771. ACM, New York, NY, USA (2023).

Harz, Leon, Voß, Hendric, and Kopp, Stefan. “FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation”. Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23). Ed. Elisabeth André, Mohamed Chetouani, Dominique Vaufreydaz, Gale Lucas, Tanja Schultz, Louis-Philippe Morency, and Alessandro Vinciarelli. New York, NY, USA: ACM, 2023. 763–771.
