Multilevel Integration of Vision and Speech Understanding Using Bayesian Networks

Wachsmuth S, Brandt-Pook H, Socher G, Kummert F, Sagerer G (1999)
In: Computer Vision Systems. Christensen HI (Ed); Lecture Notes in Computer Science, 1542. Berlin, Heidelberg: Springer: 231-254.

Download
No full text has been uploaded. Publication record only!
Conference paper | Published | English
Editor
Christensen HI
Abstract / Note
The interaction of image and speech processing is a crucial property of multimedia systems. Classical systems using inferences on purely qualitative high-level descriptions miss a lot of information when concerned with erroneous, vague, or incomplete data. We propose a new architecture that integrates various levels of processing by using multiple representations of the visually observed scene. They are vertically connected by Bayesian networks in order to find the most plausible interpretation of the scene. The interpretation of a spoken utterance naming an object in the visually observed scene is modeled as another partial representation of the scene. Using this concept, the key problem is the identification of the verbally specified object instances in the visually observed scene. Therefore, a Bayesian network is generated dynamically from the spoken utterance and the visual scene representation. In this network, spatial knowledge as well as knowledge extracted from psycholinguistic experiments is coded. First results show the robustness of our approach.
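As a rough illustration of the identification step described in the abstract, the following Python sketch scores each visually detected object against the properties named in an utterance using Bayes' rule. It is not the authors' network: the scene contents, the confusion probabilities, and the names `likelihood` and `identify_object` are illustrative assumptions, and the model is reduced to a naive-Bayes structure with independent type and color evidence and no spatial knowledge.

```python
from typing import Dict, List

# Hypothetical representation: each detected object is a dict with the
# "type" and "color" that the vision component assigned to it.
SceneObject = Dict[str, str]


def likelihood(named: str, observed: str, p_match: float, n_values: int) -> float:
    """P(observed value | speaker meant `named`) under a symmetric confusion model.

    Assumption: with probability p_match the named and observed values agree;
    the remaining mass is spread evenly over the other n_values - 1 values.
    """
    if named == observed:
        return p_match
    return (1.0 - p_match) / (n_values - 1)


def identify_object(
    scene: List[SceneObject],
    utterance: Dict[str, str],
    p_type: float = 0.8,   # assumed reliability of type evidence
    p_color: float = 0.7,  # assumed reliability of color evidence
    n_types: int = 4,      # assumed number of possible object types
    n_colors: int = 4,     # assumed number of possible colors
) -> List[float]:
    """Return the posterior probability that each scene object is the one meant."""
    scores = []
    for obj in scene:
        score = 1.0 / len(scene)  # uniform prior over the scene objects
        if "type" in utterance:
            score *= likelihood(utterance["type"], obj["type"], p_type, n_types)
        if "color" in utterance:
            score *= likelihood(utterance["color"], obj["color"], p_color, n_colors)
        scores.append(score)
    total = sum(scores)
    return [s / total for s in scores]  # normalize to a proper distribution


if __name__ == "__main__":
    scene = [
        {"type": "bolt", "color": "red"},
        {"type": "bolt", "color": "blue"},
        {"type": "cube", "color": "red"},
    ]
    posterior = identify_object(scene, {"type": "bolt", "color": "red"})
    for obj, p in zip(scene, posterior):
        print(obj, round(p, 3))
```

Because every candidate keeps a non-zero probability, a partly misrecognized utterance or object still yields a ranked interpretation rather than a failed match, which is the kind of robustness to erroneous or vague data that the abstract motivates.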
Publication year
1999
Title of conference proceedings
Computer Vision Systems
Volume
1542
Pages
231-254
Conference
1st International Conference on Computer Vision Systems
Conference location
Las Palmas, Gran Canaria, Spain
Conference date
1999-01-13 – 1999-01-15
PUB-ID

Cite

Wachsmuth S, Brandt-Pook H, Socher G, Kummert F, Sagerer G. Multilevel Integration of Vision and Speech Understanding Using Bayesian Networks. In: Christensen HI, ed. Computer Vision Systems. Lecture Notes in Computer Science. Vol 1542. Berlin, Heidelberg: Springer; 1999: 231-254.
Wachsmuth, S., Brandt-Pook, H., Socher, G., Kummert, F., & Sagerer, G. (1999). Multilevel Integration of Vision and Speech Understanding Using Bayesian Networks. In H. I. Christensen (Ed.), Lecture Notes in Computer Science: Vol. 1542. Computer Vision Systems (pp. 231-254). Berlin, Heidelberg: Springer.
Wachsmuth, S., Brandt-Pook, H., Socher, G., Kummert, F., and Sagerer, G. (1999). “Multilevel Integration of Vision and Speech Understanding Using Bayesian Networks” in Computer Vision Systems, Christensen, H. I. ed. Lecture Notes in Computer Science, vol. 1542, (Berlin, Heidelberg: Springer), 231-254.
Wachsmuth, S., et al., 1999. Multilevel Integration of Vision and Speech Understanding Using Bayesian Networks. In H. I. Christensen, ed. Computer Vision Systems. Lecture Notes in Computer Science. no. 1542. Berlin, Heidelberg: Springer, pp. 231-254.
S. Wachsmuth, et al., “Multilevel Integration of Vision and Speech Understanding Using Bayesian Networks”, Computer Vision Systems, H.I. Christensen, ed., Lecture Notes in Computer Science, vol. 1542, Berlin, Heidelberg: Springer, 1999, pp. 231-254.
Wachsmuth, S., Brandt-Pook, H., Socher, G., Kummert, F., Sagerer, G.: Multilevel Integration of Vision and Speech Understanding Using Bayesian Networks. In: Christensen, H.I. (ed.) Computer Vision Systems. Lecture Notes in Computer Science. 1542, p. 231-254. Springer, Berlin, Heidelberg (1999).
Wachsmuth, Sven, Brandt-Pook, Hans, Socher, Gudrun, Kummert, Franz, and Sagerer, Gerhard. “Multilevel Integration of Vision and Speech Understanding Using Bayesian Networks”. Computer Vision Systems. Ed. Henrik I. Christensen. Berlin, Heidelberg: Springer, 1999. Vol. 1542. Lecture Notes in Computer Science. 231-254.
Link(s) to full text(s)
Access Level
Restricted Closed Access
