The Production of Co-Speech Iconic Gestures: Empirical Study and Computational Simulation with Virtual Agents

Bergmann K (2012)
Bielefeld, Germany: University.

Bielefeld Dissertation | English
Supervisor
Kopp, Stefan
Abstract
The use of speech-accompanying iconic gestures is a ubiquitous characteristic of human-human communication, especially when spatial information is expressed. When this thesis was begun, however, it was a largely open question why gestures take the particular physical form they do. Accordingly, previous computational models simulating the use of gestures were of limited significance. The goal of this thesis was to develop a comprehensive computational simulation model for the production of co-speech iconic gestures, to be realized in virtual agents. The rationale behind this objective was two-fold: to devise and probe a predictive model of gesture use in order to gain insight into human gesture production, and thereby to improve human-agent interaction so that it progresses towards intuitive, human-like communication.

As an empirical basis for the generation model, a corpus of natural speech and gesture use was statistically analyzed, yielding novel findings on when and how speakers use gestures. Iconic gesture use was found to be influenced not only by the shape of the object to be depicted, but also by other characteristics of the referent, by the linguistic and discourse-contextual situation, and by the speaker's previous gestural behavior. Further, gestural representation techniques (such as placing or drawing) were shown to play a decisive role in determining the physical form of iconic gestures. Finally, the analysis revealed marked inter-individual differences, both in the surface form of gestural behavior and in the strength of particular influencing relations.

Based on these empirical insights, the Generation Network for Iconic Gestures (GNetIc) was developed: a computational simulation model for the production of speech-accompanying iconic gestures. It goes beyond previous systems in several respects.
First, the model combines data-driven machine learning techniques with rule-based decision making to account both for inter-individual differences in gesture use and for patterns of form-meaning mapping specific to representation techniques. Second, the network accounts for the fact that the physical appearance of generated gestures is influenced by multiple factors: characteristic features of the referent, which account for iconicity, as well as contextual factors such as the current communicative goal, the information state, and previous gesture use. Third, networks learned from an individual speaker's data provide an easily interpretable picture of that speaker's preferences and strategies in composing gestures, and can be used to generate novel gesture forms in that speaker's style.

GNetIc models were put to use in an overall architecture for integrated speech and gesture generation. Equipped with the necessary knowledge sources (communicative plans, lexicon, grammar, and propositional and imagistic knowledge), a virtual agent was enabled to autonomously explain buildings in a virtual environment using speech and gestures. By switching between the respective decision networks, the system can simulate speaker-specific gesture use.

In line with the two-fold rationale followed in this thesis, the GNetIc model was finally evaluated in two ways. First, in comparison with empirically observed gestural behavior, the model was shown to successfully approximate human use of iconic gestures, especially when capturing the gesture style of individual speakers. Second, when deployed in a virtual agent, the generated gestural behavior was rated positively by human recipients. In particular, individualized GNetIc-generated gestures increased the perceived quality of object descriptions.
Moreover, the virtual agent itself was rated more positively in terms of verbal capability, likeability, competence, and human-likeness. 
Accordingly, the results of this work provide first steps towards a more thorough understanding of iconic gesture production in humans, and of how gesture use may improve human-agent interaction.

Cite this

Bergmann K. The Production of Co-Speech Iconic Gestures: Empirical Study and Computational Simulation with Virtual Agents. Bielefeld, Germany: University; 2012.
Access Level
Open Access
Last Uploaded
2012-01-20 14:02:15
