AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis

Voß, Hendric; Kopp, Stefan

Voß H, Kopp S (2023)
In: Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023). ACM Press.

Conference paper | Published | English

DOI

https://doi.org/10.1145/3577190.3614135

Authors

Voß, Hendric (Bielefeld University); Kopp, Stefan (Bielefeld University)

Institution

Research Institute for Cognition and Robotics
Faculty of Technology > Research Group Cognitive Systems and Social Interaction

Abstract / Remarks

The generation of realistic and contextually relevant co-speech gestures is a challenging yet increasingly important task in the creation of multimodal artificial agents. Prior methods focused on learning a direct correspondence between co-speech gesture representations and produced motions, which created seemingly natural but often unconvincing gestures during human assessment. We present an approach to pre-train partial gesture sequences using a generative adversarial network with a quantization pipeline. The resulting codebook vectors serve as both input and output in our framework, forming the basis for the generation and reconstruction of gestures. By learning the mapping of a latent space representation as opposed to directly mapping it to a vector representation, this framework facilitates the generation of highly realistic and expressive gestures that closely replicate human movement and behavior, while simultaneously avoiding artifacts in the generation process. We evaluate our approach by comparing it with established methods for generating co-speech gestures as well as with existing datasets of human behavior. We also perform an ablation study to assess our findings. The results show that our approach outperforms the current state of the art by a clear margin and is partially indistinguishable from human gesturing. We make our data pipeline and the generation framework publicly available.
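The abstract describes encoding partial gesture sequences through a quantization pipeline whose codebook vectors act as both the input and the output of the generator. As a rough illustration of that general idea, here is a minimal sketch of a generic vector-quantization step. It assumes a fixed, already-learned codebook and nearest-neighbour assignment by squared Euclidean distance, as in standard VQ-VAE/VQ-GAN setups. The function names, the codebook values and the 2-D latent size are hypothetical and are not taken from the AQ-GT code release.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch between latent and codebook entry")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    latents: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Map each latent vector to its nearest codebook entry.

    Generic VQ step (assumption), not the AQ-GT implementation. Returns the
    codebook indices (discrete tokens a sequence model could consume and
    predict) and the corresponding codebook vectors (what a decoder would
    turn back into motion). Ties go to the lowest index.
    """
    if not codebook:
        raise ValueError("codebook must not be empty")
    indices: List[int] = []
    quantized: List[List[float]] = []
    for z in latents:
        best = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


def dequantize(indices: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Look predicted indices back up in the codebook (hypothetical helper)."""
    return [list(codebook[i]) for i in indices]
```

For example, with the hypothetical codebook `[[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]`, the latents `[[0.1, -0.2], [0.9, 1.2], [-0.8, 0.4]]` map to indices `[0, 1, 2]`. Working with indices rather than raw vectors means a generator only ever emits entries that already exist in the learned codebook. This is one plausible reading of how such a design can avoid out-of-distribution artifacts.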

Year of publication

2023

Proceedings title

Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023)

Conference

ICMI 2023

Conference location

Paris

Page URI

https://pub.uni-bielefeld.de/record/2980542

Cite

Voß H, Kopp S. AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis. In: Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023). ACM Press; 2023.

Voß, H., & Kopp, S. (2023). AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis. In Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023). ACM Press. https://doi.org/10.1145/3577190.3614135

Voß, Hendric, and Kopp, Stefan. 2023. “AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis”. In Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023). ACM Press.

Voß, H., and Kopp, S. (2023). “AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis” in Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023) (ACM Press).

Voß, H., & Kopp, S., 2023. AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis. In Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023). ACM Press.

H. Voß and S. Kopp, “AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis”, Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023), ACM Press, 2023.

Voß, H., Kopp, S.: AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis. Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023). ACM Press (2023).

Voß, Hendric, and Kopp, Stefan. “AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis”. Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023). ACM Press, 2023.

Sources

arXiv: 2305.01241