Exploiting Wikipedia for cross-lingual and multilingual information retrieval

Sorg P, Cimiano P (2012)
Data & Knowledge Engineering 74: 26-45.

Journal article | Published | English
 
Abstract / Note
In this article we show how Wikipedia, as a multilingual knowledge resource, can be exploited for Cross-Language and Multilingual Information Retrieval (CLIR/MLIR). We describe an approach we call Cross-Language Explicit Semantic Analysis (CL-ESA), which indexes documents with respect to explicit interlingual concepts. These concepts are considered interlingual and universal and in our case correspond either to Wikipedia articles or to categories. Each concept is associated with a text signature in each language, which can be used to estimate language-specific term distributions for each concept. This knowledge can then be used to calculate the strength of association between a term and a concept, which in turn is used to map documents into the concept space. With CL-ESA we thus move from a Bag-Of-Words model to a Bag-Of-Concepts model that allows language-independent document representations in the vector space spanned by interlingual and universal concepts. We show how different vector-based retrieval models and term weighting strategies can be used in conjunction with CL-ESA and experimentally analyze the performance of the different choices. We evaluate the approach on a mate retrieval task on two datasets: JRC-Acquis and Multext. We show that in the MLIR settings, CL-ESA benefits from a certain level of abstraction, in the sense that using categories as concepts, rather than articles as in the original ESA model, delivers better results. © 2012 Elsevier B.V. All rights reserved.
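The concept mapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two concepts, their toy English and German text signatures, the tf-idf style association weight, and the cosine ranking are all assumptions chosen for illustration, and every name in the code (`SIGNATURES`, `build_index`, `to_concept_vector`, `cosine`) is hypothetical. The article itself compares several retrieval models and term weighting strategies, which this sketch does not reproduce.

```python
import math
from collections import Counter

# Hypothetical toy text signatures: one short text per concept and language.
# In CL-ESA these would come from Wikipedia articles or categories that are
# linked across languages.
SIGNATURES = {
    "en": {
        "Music": "music song band guitar concert",
        "Law": "law court judge regulation treaty",
    },
    "de": {
        "Music": "musik lied band gitarre konzert",
        "Law": "gesetz gericht richter verordnung vertrag",
    },
}
CONCEPTS = ["Music", "Law"]  # shared, language-independent dimensions


def tokenize(text):
    return text.lower().split()


def build_index(signatures):
    """Per language, map each term to {concept: association strength}."""
    index = {}
    for lang, concepts in signatures.items():
        tf = {c: Counter(tokenize(t)) for c, t in concepts.items()}
        # Number of concept signatures in this language containing each term.
        df = Counter(term for counts in tf.values() for term in counts)
        n = len(concepts)
        weights = {}
        for concept, counts in tf.items():
            for term, freq in counts.items():
                # Assumed weighting: term frequency times a smoothed idf.
                idf = math.log(1 + n / df[term])
                weights.setdefault(term, {})[concept] = freq * idf
        index[lang] = weights
    return index


def to_concept_vector(text, lang, index):
    """Map a document to the concept space using its own language's index."""
    vec = dict.fromkeys(CONCEPTS, 0.0)
    for term in tokenize(text):
        for concept, weight in index[lang].get(term, {}).items():
            vec[concept] += weight
    return [vec[c] for c in CONCEPTS]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


index = build_index(SIGNATURES)
query = to_concept_vector("the judge cited the treaty", "en", index)
doc_law = to_concept_vector("der richter und das gericht", "de", index)
doc_music = to_concept_vector("ein konzert mit gitarre", "de", index)

print(cosine(query, doc_law), cosine(query, doc_music))
```

Because both the English query and the German documents end up as vectors over the same concept dimensions, they can be compared directly even though they share no vocabulary. A mate retrieval evaluation would rank the candidate documents in the other language by such a similarity score and check where the translation of the query document appears.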
Keywords
Cross-Lingual Information Retrieval; Social Web; Wikipedia; Concept-based Information Retrieval
Publication year
2012
Journal title
Data & Knowledge Engineering
Volume
74
Page(s)
26-45
ISSN
0169-023X
Page URI
https://pub.uni-bielefeld.de/record/2518076

Cite

Sorg P, Cimiano P. Exploiting Wikipedia for cross-lingual and multilingual information retrieval. Data & Knowledge Engineering. 2012;74:26-45.
Sorg, P., & Cimiano, P. (2012). Exploiting Wikipedia for cross-lingual and multilingual information retrieval. Data & Knowledge Engineering, 74, 26-45. doi:10.1016/j.datak.2012.02.003
Sorg, P., and Cimiano, Philipp. 2012. “Exploiting Wikipedia for cross-lingual and multilingual information retrieval”. Data & Knowledge Engineering 74: 26-45.
Sorg, P., and Cimiano, P. (2012). Exploiting Wikipedia for cross-lingual and multilingual information retrieval. Data & Knowledge Engineering 74, 26-45.
Sorg, P., & Cimiano, P., 2012. Exploiting Wikipedia for cross-lingual and multilingual information retrieval. Data & Knowledge Engineering, 74, p 26-45.
P. Sorg and P. Cimiano, “Exploiting Wikipedia for cross-lingual and multilingual information retrieval”, Data & Knowledge Engineering, vol. 74, 2012, pp. 26-45.
Sorg, P., Cimiano, P.: Exploiting Wikipedia for cross-lingual and multilingual information retrieval. Data & Knowledge Engineering. 74, 26-45 (2012).
Sorg, P., and Cimiano, Philipp. “Exploiting Wikipedia for cross-lingual and multilingual information retrieval”. Data & Knowledge Engineering 74 (2012): 26-45.