MExiCo: A Library for Managing Multimodal Data Collections

Menke, Peter; Cimiano, Philipp

Menke P, Cimiano P (2013)
In: Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013). Vargas-Sierra C (Ed); Procedia - Social and Behavioral Sciences, 95. Elsevier BV: 105-110.

Conference paper | Published | English

DOI

https://doi.org/10.1016/j.sbspro.2013.10.628

Author

Menke, Peter (UniBi); Cimiano, Philipp (UniBi)

Editor

Vargas-Sierra, Chelo

Department

Center of Excellence - Cognitive Interaction Technology CITEC
SFB 673 Alignment in Communication > X1 - Multimodal alignment corpora: ...
Technische Fakultät > AG Semantische Datenbanken
Fakultät für Linguistik und Literaturwissenschaft

Abstract / Note

We present MExiCo (short for "Multimodal Experiment Corpora"), a programming API and library for the management, creation and analysis of multimodal corpora and data collections. It is being created in the infrastructure project of a Collaborative Research Centre that investigates the phenomenon of alignment in communication, introduced by Pickering & Garrod (2004). Its 13 projects bring together researchers from many different fields (among them psychology, linguistics, computer science and cognitive science) who have produced a variety of data sets containing communicative phenomena. These data sets cover multiple modalities (speech, gesture, gaze, facial expressions, etc.), are represented and stored in different formats, and are coded according to different theories. With interoperability as one of the major goals, our infrastructure project evaluated several theories and data models for corpus and data management, looking for a candidate that was able to
• express microstructural entities and relations such as elements in transcripts, annotation documents, treebanks, etc.;
• express macrostructural entities and relations such as the experimental setup (consisting of roles and types of participants, variables, number, type, and duration of trials, etc.) and the resources that together form a corpus or data collection (such as audio and video files, speech transcripts and annotation documents as a whole, etc.);
• allow for multiple versions of data sets (e.g., for multiple annotation of the same phenomenon as a basis for agreement calculations).
Our result was that none of these models, although perfectly suitable for special subsets of our data, was able to handle the entirety of our data collection. In many cases we identified the main problem as a differing understanding of central terms such as "corpus" or "transcript".
Although "corpus" is usually defined as "a finite set of concrete linguistic utterances that serves as an empirical basis for linguistic research" (Bußmann 1996:106), along with subsequent annotations, this definition is too narrow for our field. Even with the addition of an abstract timeline for anchoring multiple events (as in, among others, Bird & Liberman 2001, or Evert et al. 2003), we require a more complex axis system that supports multiple timelines (for cases where data sets are bound to multiple timelines for which no synchronisation has been defined yet) as well as spatial systems (necessary for modeling, e.g., gestures, head movements, and actions in dialogue games where spatial actions are of interest, such as object arrangement games). On the basis of those theories we propose a generic data model capable of dealing with heterogeneous data collections such as those present in our Collaborative Research Centre: MExiCo. It will be available to researchers in three ways: as a library to be used in console scripts, as an HTTP API that can be accessed as a web service, and as a backend of Phoibos, a web-based corpus management application (Menke & Mehler 2011, Menke & Cimiano 2012) where researchers can benefit from its functionality without being required to perform actual programming. Even programming against the library is not difficult: being implemented in Ruby, MExiCo's core functionality benefits from Ruby's flexible syntax and is designed as a DSL (domain-specific language). This means that researchers can formulate queries, scripts and batch processes in an easy-to-understand language that attempts to be as close to human language as possible, with as few of the formal requirements of a programming language as possible.
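The axis system described in the abstract can be illustrated with a minimal Ruby sketch. It is not MExiCo's actual API: every name in it (Scale, Anchor, Item, Collection, overlapping) is hypothetical and chosen only for illustration. It assumes that each item may be anchored to intervals on several scales, either timelines or spatial axes, and that intervals are compared only when they lie on the same scale, so that unsynchronised timelines are never mixed up.

```ruby
# Hypothetical sketch; none of these names are taken from MExiCo itself.

# A named axis: a timeline (unit "s") or a spatial dimension (unit "mm").
Scale = Struct.new(:name, :unit)

# Anchors an item to the closed range [min, max] on one scale.
Anchor = Struct.new(:scale, :min, :max) do
  # Two anchors overlap only if they lie on the same scale.
  def overlaps?(other)
    scale == other.scale && min <= other.max && other.min <= max
  end
end

# An annotated unit (word, gesture phase, ...) with any number of anchors.
Item = Struct.new(:layer, :label, :anchors)

class Collection
  attr_reader :items

  def initialize
    @items = []
  end

  # Adds an item to a layer; returns self so that calls can be chained.
  def add(layer, label, *anchors)
    @items << Item.new(layer, label, anchors)
    self
  end

  # Items on `layer` with at least one anchor overlapping `anchor`.
  def overlapping(layer, anchor)
    @items.select do |item|
      item.layer == layer && item.anchors.any? { |a| a.overlaps?(anchor) }
    end
  end
end

audio_time = Scale.new("audio timeline", "s")
video_time = Scale.new("video timeline", "s") # not synchronised with audio_time
table_x    = Scale.new("table x axis", "mm")  # spatial axis for pointing targets

corpus = Collection.new
corpus.add(:speech, "put",   Anchor.new(audio_time, 1.0, 1.3))
corpus.add(:speech, "there", Anchor.new(audio_time, 1.3, 1.8))
corpus.add(:gesture, "pointing",
           Anchor.new(video_time, 4.0, 5.2),
           Anchor.new(table_x, 120, 340))

query = Anchor.new(audio_time, 1.2, 1.5)
hits = corpus.overlapping(:speech, query).map(&:label)
```

In this sketch the query returns the two speech items, while the same query on the gesture layer returns nothing: the gesture is anchored to the video timeline and a spatial axis, and no synchronisation with the audio timeline has been defined.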

Keywords

multimodality; corpus design; data management; sustainability

Publication year

2013

Proceedings title

Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013)

Series or journal title

Procedia - Social and Behavioral Sciences

Volume

95

Page(s)

105-110

Conference

5th International Conference on Corpus Linguistics (CILC2013)

Conference location

Alicante, Spain

Conference date

2013-03-14 – 2013-03-16

ISSN

1877-0428

Page URI

https://pub.uni-bielefeld.de/record/2561892

Cite

Menke P, Cimiano P. MExiCo: A Library for Managing Multimodal Data Collections. In: Vargas-Sierra C, ed. Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013). Procedia - Social and Behavioral Sciences. Vol 95. Elsevier BV; 2013: 105-110.

Menke, P., & Cimiano, P. (2013). MExiCo: A Library for Managing Multimodal Data Collections. In C. Vargas-Sierra (Ed.), Procedia - Social and Behavioral Sciences: Vol. 95. Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013) (pp. 105-110). Elsevier BV. doi:10.1016/j.sbspro.2013.10.628

Menke, Peter, and Cimiano, Philipp. 2013. “MExiCo: A Library for Managing Multimodal Data Collections”. In Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013), ed. Chelo Vargas-Sierra, 95:105-110. Procedia - Social and Behavioral Sciences. Elsevier BV.

Menke, P., and Cimiano, P. (2013). “MExiCo: A Library for Managing Multimodal Data Collections” in Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013), Vargas-Sierra, C. ed. Procedia - Social and Behavioral Sciences, vol. 95, (Elsevier BV), 105-110.

Menke, P., & Cimiano, P., 2013. MExiCo: A Library for Managing Multimodal Data Collections. In C. Vargas-Sierra, ed. Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013). Procedia - Social and Behavioral Sciences, no. 95. Elsevier BV, pp. 105-110.

P. Menke and P. Cimiano, “MExiCo: A Library for Managing Multimodal Data Collections”, Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013), C. Vargas-Sierra, ed., Procedia - Social and Behavioral Sciences, vol. 95, Elsevier BV, 2013, pp. 105-110.

Menke, P., Cimiano, P.: MExiCo: A Library for Managing Multimodal Data Collections. In: Vargas-Sierra, C. (ed.) Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013). Procedia - Social and Behavioral Sciences. 95, p. 105-110. Elsevier BV (2013).

Menke, Peter, and Cimiano, Philipp. “MExiCo: A Library for Managing Multimodal Data Collections”. Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013). Ed. Chelo Vargas-Sierra. Elsevier BV, 2013. Vol. 95. Procedia - Social and Behavioral Sciences. 105-110.

Full text(s)

Name

1-s2.0-S1877042813041487-main.pdf

Access Level

Closed Access

Last uploaded

2019-09-06T09:18:11Z

MD5 checksum

e9f4437149f1fc3a972a994d526790df