The Production of Co-Speech Iconic Gestures: Empirical Study and Computational Simulation with Virtual Agents
Bergmann K (2012)
Bielefeld, Germany: University.
Bielefeld e-dissertation | English
Author
Bergmann, Kirsten
Reviewer / Supervisor
Kopp, Stefan
Institution
Abstract / Remarks
The use of speech-accompanying iconic gestures is a ubiquitous characteristic of human-human communication, especially when spatial information is expressed. At the starting point of this thesis, however, it was largely an open question why gestures take the particular physical forms they do, and previous computational models simulating gesture use were accordingly of limited significance. The goal of this thesis was to develop a comprehensive computational simulation model for the production of co-speech iconic gestures, to be realized in virtual agents. The rationale behind this objective was twofold: to devise and probe a predictive model of gesture use in order to gain insight into human gesture production, and thereby to improve human-agent interaction so that it progresses towards intuitive, human-like communication.
As an empirical basis for the generation model, a corpus of natural speech and gesture use was statistically analyzed, yielding novel findings on when and how speakers use gestures. Iconic gesture use was found to be influenced not only by the shape of the object to be depicted, but also by other characteristics of the referent, by the linguistic and discourse-contextual situation, and by the speaker's previous gestural behavior. Further, the gestural representation technique employed (such as placing or drawing) was shown to be decisive for the physical form of iconic gestures. Finally, the analysis revealed clear inter-individual differences, both at the surface of gestural behavior and in the strength of particular influencing relations.
Based on these empirical insights, the Generation Network for Iconic Gestures (GNetIc) was developed: a computational simulation model for the production of speech-accompanying iconic gestures. It goes beyond previous systems in several respects. First, the model combines data-driven machine learning techniques with rule-based decision making to account both for inter-individual differences in gesture use and for patterns of form-meaning mapping specific to representation techniques. Second, the network accounts for the fact that the physical appearance of generated gestures is influenced by multiple factors: characteristic features of the referent, which establish iconicity, as well as contextual factors such as the current communicative goal, the information state, and previous gesture use. Third, learning gesture networks from individual speakers' data yields an easily interpretable, visual picture of a speaker's preferences and strategies in composing gestures, and makes these available for generating novel gesture forms in the style of the respective speaker.
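To make the hybrid character of such a model concrete, the following is a minimal, purely illustrative Python sketch, not the actual GNetIc implementation: a representation technique is sampled from probabilities such as might be learned from an individual speaker's corpus data, after which rule-based form-meaning mappings fix handshape and movement for the chosen technique. All feature names, techniques, and probability values here are hypothetical.

```python
# Illustrative sketch only (not GNetIc): a hybrid decision step combining
# probabilities learned from a speaker's corpus data with rule-based
# form-meaning mappings per representation technique.
import random

# Hypothetical learned distribution P(technique | referent shape).
LEARNED_TECHNIQUE_CPT = {
    "longish": {"drawing": 0.6, "placing": 0.3, "shaping": 0.1},
    "round":   {"drawing": 0.2, "placing": 0.3, "shaping": 0.5},
    "flat":    {"drawing": 0.1, "placing": 0.7, "shaping": 0.2},
}

# Hypothetical rules: a chosen technique determines gesture form features.
TECHNIQUE_FORM_RULES = {
    "drawing": {"handshape": "index-extended", "movement": "trace-contour"},
    "placing": {"handshape": "flat-hand",      "movement": "static-hold"},
    "shaping": {"handshape": "curved-hand",    "movement": "mould-surface"},
}

def generate_gesture(shape: str, seed: int = 0) -> dict:
    """Data-driven technique choice followed by rule-based form specification."""
    rng = random.Random(seed)
    dist = LEARNED_TECHNIQUE_CPT[shape]
    techniques, weights = zip(*dist.items())
    technique = rng.choices(techniques, weights=weights, k=1)[0]
    return {"technique": technique, **TECHNIQUE_FORM_RULES[technique]}

if __name__ == "__main__":
    print(generate_gesture("longish"))  # e.g. {'technique': 'drawing', ...}
```

In GNetIc itself, the data-driven part consists of networks learned from individual speakers' data rather than a single hand-written table; the sketch only illustrates how learned choices and rule-based mappings can be chained.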
GNetIc models were put to use in an overall architecture for integrated speech and gesture generation. Equipped with the necessary knowledge sources, i.e., communicative plans, a lexicon, a grammar, and propositional and imagistic knowledge, a virtual agent was enabled to autonomously explain buildings of a virtual environment using speech and gestures. By switching between the respective decision networks, the system can simulate speaker-specific gesture use.
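As an illustration of how such speaker-specific networks might be embedded in an integrated generation pipeline, here is a minimal Python sketch under strong simplifying assumptions: the speech formulator is a mere placeholder, and the per-speaker "networks" are stand-in functions rather than learned GNetIc models.

```python
# Illustrative pipeline sketch only: one communicative goal is mapped to a
# coordinated speech/gesture pair, with the gesture decision switched by speaker.
from dataclasses import dataclass

@dataclass
class Utterance:
    speech: str
    gesture: dict

# Hypothetical per-speaker decision "networks" (stand-ins for learned models).
SPEAKER_NETWORKS = {
    "speaker_A": lambda referent: {"technique": "drawing", "handshape": "index-extended"},
    "speaker_B": lambda referent: {"technique": "placing", "handshape": "flat-hand"},
}

def generate(goal: str, referent: dict, speaker: str) -> Utterance:
    """Produce speech and a gesture specification for one communicative goal."""
    speech = f"Description of {goal}."             # placeholder for the speech formulator
    gesture = SPEAKER_NETWORKS[speaker](referent)  # speaker-specific gesture decision
    return Utterance(speech=speech, gesture=gesture)

print(generate("the church tower", {"shape": "longish"}, "speaker_A"))
```

Switching the `speaker` argument corresponds to switching between decision networks in the architecture described above.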
In keeping with the two-fold rationale of this thesis, the GNetIc model was finally evaluated in two ways. First, in comparison with empirically observed gestural behavior, the model was shown to successfully approximate human use of iconic gestures, especially when capturing the characteristics of an individual speaker's gesture style. Second, when deployed in a virtual agent, the generated gestural behavior was rated positively by human recipients. In particular, individualized GNetIc-generated gestures increased the perceived quality of object descriptions. Moreover, the virtual agent itself was rated more positively in terms of verbal capability, likeability, competence, and human-likeness.
Accordingly, the results of this work provide first steps towards a more thorough understanding of iconic gesture production in humans, and of how gesture use may improve human-agent interaction.
Year
2012
Page URI
https://pub.uni-bielefeld.de/record/2460005
Cite
Bergmann, Kirsten. 2012. The Production of Co-Speech Iconic Gestures: Empirical Study and Computational Simulation with Virtual Agents. Bielefeld, Germany: University.
All files available under the following license(s):
Copyright Statement:
This object is protected by copyright and/or related rights. [...]
Full text(s)
Access Level
Open Access
Last uploaded
2019-09-06T09:17:59Z
MD5 checksum
e108b0e02c165911dad2c2b7a19de121