More Than Just Natural. Contextually Relevant and Semantically Meaningful Gesture Generation

Voß H (2026)
Bielefeld: Universität Bielefeld.

Bielefeld E-Dissertation | English
 
Download: Open Access, 17.42 MB
Abstract / Note
Human communication is inherently multimodal, with gestures playing a crucial role in conveying meaning beyond words alone. However, current gesture generation systems predominantly focus on visual naturalness while neglecting the communicative purpose of gestures, resulting in movements that appear fluid yet lack semantic information. This thesis addresses the fundamental challenge of creating non-verbal behaviors that are not only visually natural but also semantically meaningful and contextually grounded, thereby enhancing multimodal communication. It establishes a new category of gesture generation approaches, called context-driven, which combines the strengths of both intent-driven and speech-driven gesture generation methods. Through five interconnected papers, this work develops and evaluates several novel frameworks. First, the AQ-GT model demonstrates how quantization and hybrid GRU-Transformer architectures can generate highly realistic beat gestures, while the AQ-GT-A model extends this work by incorporating form and meaning annotations to guide the gesture generation process. An evaluation study of both models highlights the weaknesses of current speech-driven gesture generation and shifts the focus of this thesis from speech-driven to context-driven approaches. Building on this, the TF-JAX-IK algorithm provides a real-time inverse kinematics solution for mapping high-level gesture concepts onto natural human motion. TF-JAX-IK is then used in the ImaGGen framework, which introduces a Semantic Planning approach that generates contextually grounded semantic gestures by analyzing visual input without requiring extensive training data. The evaluation of ImaGGen's contextually grounded gestures shows that they significantly improve the delivery of information, particularly when speech is ambiguous, while maintaining naturalness through the integration of speech-driven beat gestures.
The work presented here makes several theoretical and practical contributions to the field of multimodal interaction and gesture generation. It challenges the assumption, prevalent in recent years, that visual naturalness alone is sufficient for effective gesture generation, emphasizing instead the crucial role of communicative intent and semantic grounding in producing meaningful gestures. The developed methods have direct applications in virtual reality and assistive communication tools, enabling virtual agents with ImaGGen-like capabilities to deliver clearer explanations and support more natural interactions. Finally, this thesis follows an open-source research approach, with a transparent and extensible code base for all presented papers.
Year
2026
Pages
275
Page URI
https://pub.uni-bielefeld.de/record/3016358

Cite

Voß H. More Than Just Natural. Contextually Relevant and Semantically Meaningful Gesture Generation. Bielefeld: Universität Bielefeld; 2026.
Voß, H. (2026). More Than Just Natural. Contextually Relevant and Semantically Meaningful Gesture Generation. Bielefeld: Universität Bielefeld. https://doi.org/10.4119/unibi/3016358
Voß, Hendric. 2026. More Than Just Natural. Contextually Relevant and Semantically Meaningful Gesture Generation. Bielefeld: Universität Bielefeld.
Voß, H. (2026). More Than Just Natural. Contextually Relevant and Semantically Meaningful Gesture Generation. Bielefeld: Universität Bielefeld.
Voß, H., 2026. More Than Just Natural. Contextually Relevant and Semantically Meaningful Gesture Generation, Bielefeld: Universität Bielefeld.
H. Voß, More Than Just Natural. Contextually Relevant and Semantically Meaningful Gesture Generation, Bielefeld: Universität Bielefeld, 2026.
Voß, H.: More Than Just Natural. Contextually Relevant and Semantically Meaningful Gesture Generation. Universität Bielefeld, Bielefeld (2026).
Voß, Hendric. More Than Just Natural. Contextually Relevant and Semantically Meaningful Gesture Generation. Bielefeld: Universität Bielefeld, 2026.
All files available under the following license(s):
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Full Text(s)
Access Level
Open Access
Last Uploaded
2026-04-29T13:22:41Z
MD5 Checksum
d236bfbb3e2f81ceb87662e3658c9caa
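The published MD5 checksum can be used to confirm that a downloaded copy of the full text arrived intact. The following is a minimal sketch using only Python's standard `hashlib` module; `md5_of_file` and `verify_download` are hypothetical helper names, not part of any PUB tooling, and the sketch assumes the listed checksum refers to the single full-text file.

```python
import hashlib
from pathlib import Path

# Checksum listed on this record page; assumed to refer to the single full-text file.
EXPECTED_MD5 = "d236bfbb3e2f81ceb87662e3658c9caa"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks
    so large downloads need not fit in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Compare a local file's MD5 digest with the published checksum
    (case-insensitive, since hexdigest() returns lowercase)."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download(Path("thesis.pdf"))` returns `True` only if the local file (the name `thesis.pdf` is a placeholder) matches the published digest. MD5 is adequate for catching accidental corruption during a download, but it is not collision-resistant, so a matching digest does not rule out deliberate tampering.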


Material in PUB:
Part of this dissertation
AQ-GT: A Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis
Voß H, Kopp S (2023)
In: Proceedings of the 25th International Conference on Multimodal Interaction (ICMI 2023). André E, Chetouani M, Vaufreydaz D, Lucas G, Schultz T, Morency L-P, Vinciarelli A (Eds); New York: ACM Press: 60-69.
Part of this dissertation
Augmented Co-Speech Gesture Generation: Including Form and Meaning Features to Guide Learning-Based Gesture Synthesis
Voß H, Kopp S (2023)
In: ACM International Conference on Intelligent Virtual Agents (IVA '23). New York: ACM.
Part of this dissertation
Real-Time Inverse Kinematics for Generating Multi-Constrained Movements of Virtual Human Characters
Voß H, Kopp S (2025)
In: Proceedings of the 25th ACM International Conference on Intelligent Virtual Agents. Gebhard P, Schneeberger T, Biancardi B, Sabouret N, Schmitz M, Yumak Z (Eds); New York, NY, USA: ACM.