MExiCo: A Library for Managing Multimodal Data Collections

Menke P, Cimiano P (2013)
In: Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013). Vargas-Sierra C (Ed); Procedia - Social and Behavioral Sciences, 95. Elsevier BV: 105-110.

Conference Paper | Published | English
Editor
Vargas-Sierra, Chelo
Abstract
We present MExiCo (short for "Multimodal Experiment Corpora"), a programming API and library for the management, creation and analysis of multimodal corpora and data collections. It is being created in the infrastructure project of a Collaborative Research Centre that investigates the phenomenon of alignment in communication, as introduced by Pickering & Garrod (2004). Its 13 projects bring together researchers from many different fields (among them psychology, linguistics, computer science, and cognitive science) who have produced a variety of data sets containing communicative phenomena. These cover multiple modalities (speech, gesture, gaze, facial expressions, etc.), are represented and stored in different formats, and are coded according to different theories. With interoperability as one of the major goals, our infrastructure project evaluated several theories and data models for corpus and data management, looking for a candidate that was able to
• express microstructural entities and relations such as elements in transcripts, annotation documents, treebanks, etc.;
• express macrostructural entities and relations such as the experimental setup (consisting of roles and types of participants, variables, number, type, and duration of trials, etc.) and the resources that together form a corpus or data collection (such as audio and video files, speech transcripts and annotation documents as a whole, etc.);
• allow for multiple versions of data sets (e.g., for multiple annotations of the same phenomenon as a basis for agreement calculations).
Our result was that none of these models (although perfectly eligible for special subsets of our data) was able to handle the entirety of our data collection. In many cases we identified the main problem as a different understanding of central terms such as “corpus” or “transcript”.
Although “corpus” is usually defined as “a finite set of concrete linguistic utterances that serves as an empirical basis for linguistic research” (Bußmann 1996:106), along with subsequent annotations, this definition is too narrow for our field. Even with the addition of an abstract timeline for anchoring multiple events (as in, among others, Bird & Liberman 2001, or Evert et al. 2003), we require a more complex axis system that also supports multiple timelines (for cases where data sets are bound to multiple timelines for which no synchronisation has been defined yet) and spatial systems (necessary for modeling, e.g., gestures, head movements, and actions in dialogue games where spatial actions are of interest, such as object arrangement games). On the basis of those theories we propose a generic data model capable of dealing with heterogeneous data collections such as those present in our Collaborative Research Centre: MExiCo. It will be available to researchers in three ways: as a library to be used in console scripts, as an HTTP API that can be accessed as a web service, and as a backend of Phoibos, a web-based corpus management application (Menke & Mehler 2011, Menke & Cimiano 2012) where researchers can benefit from its functionality without being required to perform actual programming. Even programming is not difficult, however: being implemented in Ruby, MExiCo’s core functionality benefits from Ruby’s flexible syntax and is designed as a DSL (domain-specific language). This means that researchers can formulate queries, scripts and batch processes in an easy-to-understand language that attempts to be as close to human language as possible, with as few of the formal requirements of a programming language as possible.
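The axis system described in the abstract (items bound to several timelines that need not be synchronised, plus spatial axes) can be illustrated with a minimal sketch. This is not MExiCo's API, which is implemented in Ruby and not documented on this page; it is written in Python, and all class, field and variable names below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Timeline:
    """A temporal axis; two timelines need not be synchronised."""
    name: str
    unit: str = "s"


@dataclass(frozen=True)
class SpatialAxis:
    """A spatial axis, e.g. the x coordinate of a game board."""
    name: str
    unit: str = "cm"


@dataclass(frozen=True)
class IntervalLink:
    """Anchors an item to a stretch of one timeline."""
    timeline: Timeline
    start: float
    end: float


@dataclass(frozen=True)
class PointLink:
    """Anchors an item to a position on one spatial axis."""
    axis: SpatialAxis
    position: float


@dataclass
class Item:
    """An annotated unit (word, gesture, move) with any number of anchors."""
    label: str
    layer: str
    links: list = field(default_factory=list)

    def on(self, timeline):
        """Return the interval links that bind this item to `timeline`."""
        return [link for link in self.links
                if isinstance(link, IntervalLink) and link.timeline == timeline]


# Hypothetical example data: two recordings with no defined synchronisation.
audio = Timeline("audio recording")
video = Timeline("video recording")
board_x = SpatialAxis("board x")

# A spatial action anchored both in time (video) and in space (board).
move = Item("places block", layer="actions", links=[
    IntervalLink(video, 12.4, 13.1),
    PointLink(board_x, 30.0),
])
# A spoken word anchored only on the audio timeline.
word = Item("here", layer="speech", links=[IntervalLink(audio, 3.2, 3.5)])

print(len(move.on(video)), len(move.on(audio)), len(word.on(audio)))
```

Because each link names its own axis, an item can be queried per timeline without assuming that the audio and video recordings share a common clock; a synchronisation, once defined, could be added as a separate mapping between timelines.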
Publishing Year
2013
Conference
5th International Conference on Corpus Linguistics (CILC2013)
Location
Alicante, Spain
Conference Date
2013-03-14 – 2013-03-16

Cite this

Menke P, Cimiano P. MExiCo: A Library for Managing Multimodal Data Collections. In: Vargas-Sierra C, ed. Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013). Procedia - Social and Behavioral Sciences. Vol 95. Elsevier BV; 2013: 105-110.
Menke, P., & Cimiano, P. (2013). MExiCo: A Library for Managing Multimodal Data Collections. In C. Vargas-Sierra (Ed.), Procedia - Social and Behavioral Sciences: Vol. 95. Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013) (pp. 105-110). Elsevier BV. doi:10.1016/j.sbspro.2013.10.628
Menke, Peter, and Cimiano, Philipp. “MExiCo: A Library for Managing Multimodal Data Collections”. Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013). Ed. Chelo Vargas-Sierra. Elsevier BV, 2013. Vol. 95. Procedia - Social and Behavioral Sciences. 105-110.
Main File(s)
Access Level
Restricted Closed Access
Last Uploaded
2017-05-09T11:56:14Z
