AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis
Voß H, Kopp S (2023)
In: Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023). ACM Press.
Conference paper | Published | English
No files have been uploaded. Publication record only.
Abstract
The generation of realistic and contextually relevant co-speech gestures is a
challenging yet increasingly important task in the creation of multimodal
artificial agents. Prior methods focused on learning a direct correspondence
between co-speech gesture representations and produced motions, which created
seemingly natural but often unconvincing gestures during human assessment. We
present an approach to pre-train partial gesture sequences using a generative
adversarial network with a quantization pipeline. The resulting codebook
vectors serve as both input and output in our framework, forming the basis for
the generation and reconstruction of gestures. By learning the mapping of a
latent space representation as opposed to directly mapping it to a vector
representation, this framework facilitates the generation of highly realistic
and expressive gestures that closely replicate human movement and behavior,
while simultaneously avoiding artifacts in the generation process. We evaluate
our approach by comparing it with established methods for generating co-speech
gestures as well as with existing datasets of human behavior. We also perform
an ablation study to assess our findings. The results show that our approach
outperforms the current state of the art by a clear margin and is partially
indistinguishable from human gesturing. We make our data pipeline and the
generation framework publicly available.
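The abstract describes codebook vectors produced by a quantization pipeline that serve as both input and output of the framework. As a hedged illustration only, the Python sketch below shows generic vector quantization: each latent vector is replaced by its nearest codebook entry, yielding discrete indices that can be mapped back to vectors. The helper names `quantize` and `dequantize`, the toy 2-D codebook, and the squared Euclidean distance are assumptions chosen for illustration; this is not the AQ-GT implementation.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Vector], codebook: Sequence[Vector]) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical helper: replace each latent vector with its nearest codebook entry.

    Returns the chosen codebook indices (discrete tokens) and the quantized vectors.
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for z in latents:
        # Nearest-neighbour lookup over all codebook entries.
        best = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


def dequantize(indices: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Hypothetical helper: map token indices back to their codebook vectors."""
    return [list(codebook[i]) for i in indices]


# Toy example: a 3-entry codebook of 2-D vectors (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]
latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.4]]
indices, quantized = quantize(latents, codebook)
print(indices)  # [0, 1, 2]
```

In such a setup, a model that predicts codebook indices rather than raw coordinates can only emit vectors that already exist in the learned codebook, which is one plausible reading of how working in a quantized latent space might help avoid generation artifacts.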
Year of publication
2023
Proceedings title
Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023)
Conference
ICMI 2023
Conference location
Paris
Page URI
https://pub.uni-bielefeld.de/record/2980542
Cite
Voß H, Kopp S. AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis. In: Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023). ACM Press; 2023.
Voß, H., & Kopp, S. (2023). AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis. In Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023). ACM Press. https://doi.org/10.1145/3577190.3614135
Voß, Hendric, and Kopp, Stefan. 2023. “AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis”. In Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023). ACM Press.
Voß, H., and Kopp, S. (2023). “AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis” in Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023) (ACM Press).
Voß, H., & Kopp, S., 2023. AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis. In Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023). ACM Press.
H. Voß and S. Kopp, “AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis”, Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023), ACM Press, 2023.
Voß, H., Kopp, S.: AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis. Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023). ACM Press (2023).
Voß, Hendric, and Kopp, Stefan. “AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis”. Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023). ACM Press, 2023.
Sources
arXiv: 2305.01241