MExiCo: A Library for Managing Multimodal Data Collections

Menke P, Cimiano P (2013)
In: Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013). Vargas-Sierra C (Ed); Procedia - Social and Behavioral Sciences, 95. Elsevier BV: 105-110.

Conference Paper | Published | English
 
Editor
Vargas-Sierra, Chelo
Abstract / Note
We present MExiCo (short for "Multimodal Experiment Corpora"), a programming API and library for the management, creation and analysis of multimodal corpora and data collections. It is being developed in the infrastructure project of a Collaborative Research Centre that investigates the phenomenon of alignment in communication, as introduced by Pickering & Garrod (2004). The Centre's 13 projects bring together researchers from many different fields (among them psychology, linguistics, computer science and cognitive science) who have produced a variety of data sets containing communicative phenomena. These data sets cover multiple modalities (speech, gesture, gaze, facial expressions, etc.), are represented and stored in different formats, and are coded according to different theories. With interoperability as one of its major goals, our infrastructure project evaluated several theories and data models for corpus and data management, looking for a candidate that was able to:
• express microstructural entities and relations, such as elements in transcripts, annotation documents, treebanks, etc.;
• express macrostructural entities and relations, such as the experimental setup (consisting of roles and types of participants, variables, and the number, type and duration of trials, etc.) and the resources that together form a corpus or data collection (such as audio and video files, and speech transcripts and annotation documents as a whole, etc.); and
• allow for multiple versions of data sets (e.g., multiple annotations of the same phenomenon as a basis for agreement calculations).
We found that none of these models (although perfectly suitable for particular subsets of our data) was able to handle our data collection in its entirety. In many cases, the main problem turned out to be differing understandings of central terms such as "corpus" or "transcript".
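The macrostructural and versioning requirements listed above can be made concrete with a small data model. The following Python sketch is purely illustrative: MExiCo itself is implemented in Ruby, and none of the names below (`Participant`, `Trial`, `Resource`, `Corpus`, `add_annotation_version`, `percent_agreement`) are taken from its actual API. It models participants with roles, trials with durations and attached resources, and keeps several versions of an annotation of the same phenomenon side by side so that an agreement score can be computed between two coders. Simple percent agreement is used here as a stand-in; a chance-corrected measure such as Cohen's kappa would typically be preferred in practice. All example data is made up.

```python
from dataclasses import dataclass, field

# Hypothetical sketch; not MExiCo's actual (Ruby) API.

@dataclass
class Participant:
    identifier: str
    role: str  # e.g. "instructor" or "follower"

@dataclass
class Resource:
    name: str
    kind: str  # e.g. "video", "audio", "transcript", "annotation"

@dataclass
class Trial:
    number: int
    duration_s: float
    resources: list[Resource] = field(default_factory=list)

@dataclass
class Corpus:
    name: str
    participants: list[Participant] = field(default_factory=list)
    trials: list[Trial] = field(default_factory=list)
    # phenomenon -> version label (e.g. coder) -> one label per annotated unit
    annotations: dict[str, dict[str, list[str]]] = field(default_factory=dict)

    def add_annotation_version(self, phenomenon: str, version: str, labels: list[str]) -> None:
        # Each version is stored under its own label, so several codings
        # of the same phenomenon coexist.
        self.annotations.setdefault(phenomenon, {})[version] = list(labels)

    def percent_agreement(self, phenomenon: str, version_a: str, version_b: str) -> float:
        """Share of units on which two annotation versions assign the same label."""
        a = self.annotations[phenomenon][version_a]
        b = self.annotations[phenomenon][version_b]
        if not a or len(a) != len(b):
            raise ValueError("versions must label the same, non-empty sequence of units")
        return sum(x == y for x, y in zip(a, b)) / len(a)

# Example with made-up data.
corpus = Corpus("alignment-pilot")
corpus.participants += [Participant("P1", "instructor"), Participant("P2", "follower")]
corpus.trials.append(Trial(1, 312.5, [Resource("trial1.mp4", "video"),
                                      Resource("trial1_gesture.xml", "annotation")]))
corpus.add_annotation_version("gesture", "coder_A", ["point", "beat", "none", "point"])
corpus.add_annotation_version("gesture", "coder_B", ["point", "none", "none", "point"])
print(corpus.percent_agreement("gesture", "coder_A", "coder_B"))  # 0.75
```

Keeping versions as parallel entries rather than overwriting one annotation with another is what makes agreement calculations possible at all: the data model has to treat "two codings of the same thing" as a first-class situation.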
Although "corpus" is usually defined as "a finite set of concrete linguistic utterances that serves as an empirical basis for linguistic research" (Bußmann 1996:106), possibly together with subsequent annotations, this definition is too narrow for our field. Even an abstract timeline for anchoring multiple events (as in Bird & Liberman 2001 or Evert et al. 2003, among others) is not sufficient: we require a more complex axis system that also supports multiple timelines (for data sets that are bound to several timelines for which no synchronisation has been defined yet) as well as spatial systems (necessary for modelling, e.g., gestures, head movements, or actions in dialogue games where spatial actions are of interest, such as object arrangement games). On the basis of these theories, we propose a generic data model capable of dealing with heterogeneous data collections such as the one in our Collaborative Research Centre: MExiCo. It will be available to researchers in several ways: as a library to be used in console scripts, as an HTTP API that can be accessed as a web service, and, finally, as a backend of Phoibos, a web-based corpus management application (Menke & Mehler 2011, Menke & Cimiano 2012), where researchers can benefit from its functionality without having to do any actual programming. Even programming is not difficult, though: since it is implemented in Ruby, MExiCo’s core functionality benefits from Ruby’s flexible syntax and is designed as a DSL (domain-specific language). This means that researchers can formulate queries, scripts and batch processes in an easy-to-understand language that aims to be as close to human language as possible, with as few of the formal requirements of a programming language as possible.
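The axis system described above, with several possibly unsynchronised timelines plus spatial axes, can be sketched in the same hypothetical style. Again, this is an assumption-laden Python illustration, not MExiCo's Ruby DSL: the names `Axis`, `Event`, `Document`, `synchronise`, `to_axis` and `overlapping` are invented for this example, and synchronisation between two timelines is assumed to be a simple linear mapping (scale and offset). Events are anchored to intervals on named axes; a query on one timeline includes events from another timeline only when a synchronisation between the two has been defined, and skips the rest instead of guessing. The example data is made up.

```python
from dataclasses import dataclass, field

# Hypothetical sketch; not MExiCo's actual (Ruby) API.

@dataclass(frozen=True)
class Axis:
    name: str
    kind: str  # "time" or "space"
    unit: str  # e.g. "s" or "cm"

@dataclass
class Event:
    label: str
    # Axis name -> (start, end) interval on that axis.
    anchors: dict[str, tuple[float, float]]

@dataclass
class Document:
    axes: dict[str, Axis] = field(default_factory=dict)
    # (a, b) -> (scale, offset), meaning value_on_b = scale * value_on_a + offset.
    sync: dict[tuple[str, str], tuple[float, float]] = field(default_factory=dict)
    events: list[Event] = field(default_factory=list)

    def add_axis(self, axis: Axis) -> None:
        self.axes[axis.name] = axis

    def synchronise(self, a: str, b: str, scale: float = 1.0, offset: float = 0.0) -> None:
        # Assumes a non-zero scale so that the mapping can be inverted.
        self.sync[(a, b)] = (scale, offset)

    def to_axis(self, value: float, source: str, target: str) -> float:
        if source == target:
            return value
        if (source, target) in self.sync:
            scale, offset = self.sync[(source, target)]
            return scale * value + offset
        if (target, source) in self.sync:
            scale, offset = self.sync[(target, source)]
            return (value - offset) / scale
        raise LookupError(f"no synchronisation between {source!r} and {target!r}")

    def overlapping(self, start: float, end: float, axis: str) -> list[str]:
        """Labels of events overlapping [start, end] on `axis`, directly or via a known sync."""
        hits = []
        for event in self.events:
            for name, (s, e) in event.anchors.items():
                if self.axes[name].kind != self.axes[axis].kind:
                    continue  # never compare a timeline with a spatial axis
                try:
                    lo, hi = sorted((self.to_axis(s, name, axis), self.to_axis(e, name, axis)))
                except LookupError:
                    continue  # unsynchronised timeline: not comparable (yet)
                if lo <= end and hi >= start:
                    hits.append(event.label)
                    break
        return hits

# Example with made-up data: video and eye tracker are synchronised,
# the motion-capture timeline is not.
doc = Document()
for ax in (Axis("video", "time", "s"), Axis("eyetracker", "time", "s"),
           Axis("mocap", "time", "s"), Axis("table_x", "space", "cm")):
    doc.add_axis(ax)
doc.synchronise("video", "eyetracker", offset=-2.5)  # eye tracker started 2.5 s after the video
doc.events += [
    Event("utterance", {"video": (10.0, 12.0)}),
    Event("fixation", {"eyetracker": (8.0, 8.4)}),
    Event("head_nod", {"mocap": (10.5, 11.0)}),
    Event("place_block", {"video": (30.0, 31.0), "table_x": (40.0, 55.0)}),
]
print(doc.overlapping(9.0, 13.0, "video"))     # ['utterance', 'fixation']
print(doc.overlapping(50.0, 60.0, "table_x"))  # ['place_block']
```

Keeping unsynchronised timelines as separate axes, rather than forcing everything onto one global timeline, means that data can be imported before its alignment is known; once a synchronisation is added, the same query picks up the additional events.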
Keywords
multimodality; corpus design; data management; sustainability
Publication Year
2013
Conference Proceedings Title
Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013)
Series or Journal Title
Procedia - Social and Behavioral Sciences
Volume
95
Page(s)
105-110
Conference
5th International Conference on Corpus Linguistics (CILC2013)
Conference Location
Alicante, Spain
Conference Date
2013-03-14 – 2013-03-16
ISSN
1877-0428
Page URI
https://pub.uni-bielefeld.de/record/2561892

Cite

Menke P, Cimiano P. MExiCo: A Library for Managing Multimodal Data Collections. In: Vargas-Sierra C, ed. Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013). Procedia - Social and Behavioral Sciences. Vol 95. Elsevier BV; 2013: 105-110.
Menke, P., & Cimiano, P. (2013). MExiCo: A Library for Managing Multimodal Data Collections. In C. Vargas-Sierra (Ed.), Procedia - Social and Behavioral Sciences: Vol. 95. Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013) (pp. 105-110). Elsevier BV. doi:10.1016/j.sbspro.2013.10.628
Menke, Peter, and Cimiano, Philipp. 2013. “MExiCo: A Library for Managing Multimodal Data Collections”. In Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013), ed. Chelo Vargas-Sierra, 95:105-110. Procedia - Social and Behavioral Sciences. Elsevier BV.
Menke, P., and Cimiano, P. (2013). “MExiCo: A Library for Managing Multimodal Data Collections” in Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013), Vargas-Sierra, C. ed. Procedia - Social and Behavioral Sciences, vol. 95, (Elsevier BV), 105-110.
Menke, P., & Cimiano, P., 2013. MExiCo: A Library for Managing Multimodal Data Collections. In C. Vargas-Sierra, ed. Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013). Procedia - Social and Behavioral Sciences. no. 95. Elsevier BV, pp. 105-110.
P. Menke and P. Cimiano, “MExiCo: A Library for Managing Multimodal Data Collections”, Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013), C. Vargas-Sierra, ed., Procedia - Social and Behavioral Sciences, vol. 95, Elsevier BV, 2013, pp. 105-110.
Menke, P., Cimiano, P.: MExiCo: A Library for Managing Multimodal Data Collections. In: Vargas-Sierra, C. (ed.) Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013). Procedia - Social and Behavioral Sciences. 95, p. 105-110. Elsevier BV (2013).
Menke, Peter, and Cimiano, Philipp. “MExiCo: A Library for Managing Multimodal Data Collections”. Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013). Ed. Chelo Vargas-Sierra. Elsevier BV, 2013. Vol. 95. Procedia - Social and Behavioral Sciences. 105-110.
Full Text(s)
Name
1-s2.0-S1877042813041487-main.pdf
Access Level
Restricted Closed Access
Last Uploaded
2019-09-06T09:18:11Z
MD5 Checksum
e9f4437149f1fc3a972a994d526790df

