Structural classifiers of text types: Towards a novel model of text representation

Mehler A, Geibel P, Pustylnikov O (2007)
LDV-Forum : Zeitschrift für Computerlinguistik und Sprachtechnologie ; GLDV-Journal for Computational Linguistics and Language Technology 22(2): 51-66.

Journal article | Published | English
Author
Mehler, Alexander; Geibel, Peter; Pustylnikov, Olga
Abstract / Note
Texts can be distinguished in terms of their content, function, structure or layout (Brinker, 1992; Bateman et al., 2001; Joachims, 2002; Power et al., 2003). These reference points do not necessarily open orthogonal perspectives on text classification. As part of exploratory data analysis, text classification aims at automatically dividing sets of textual objects into classes of maximum internal homogeneity and external heterogeneity. This paper deals with classifying texts into text types whose instances serve more or less homogeneous functions. Unlike mainstream approaches, which rely on the vector space model (Sebastiani, 2002) or some of its descendants (Baeza-Yates and Ribeiro-Neto, 1999) and thus on content-related lexical features, we refer solely to structural differentiae. That is, we explore patterns of text structure as determinants of class membership. Our starting point is tree-like text representations, which induce feature vectors and tree kernels. These kernels are used in supervised learning, with cross-validation as the method of model selection (Hastie et al., 2001), exemplified by a corpus of press communication. For a subset of categories we show that classification can be performed very well by structural differentiae alone.
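The abstract does not specify the tree encoding, the kernel, or the learner, so the following is only a rough sketch of the general idea under stated assumptions: documents are reduced to ordered trees of structural units (no words), a simple complete-subtree kernel compares two trees, and a leave-one-out nearest-neighbour rule stands in for the supervised learner. All names (tree, subtree_kernel, structural_features, the toy "report" and "listing" classes) are hypothetical and not taken from the paper.

```python
import math
from collections import Counter

# A tree is (label, [children]). Labels name structural units only, never words.
def tree(label, *children):
    return (label, list(children))

def subtree_signatures(node):
    """Count the canonical string of the complete subtree rooted at every node."""
    counts = Counter()
    def visit(n):
        label, children = n
        sig = label + "(" + ",".join(visit(c) for c in children) + ")"
        counts[sig] += 1
        return sig
    visit(node)
    return counts

def subtree_kernel(t1, t2):
    """Number of pairs of identical complete subtrees (dot product of counts)."""
    c1, c2 = subtree_signatures(t1), subtree_signatures(t2)
    return sum(c1[s] * c2[s] for s in c1.keys() & c2.keys())

def normalized_kernel(t1, t2):
    """Cosine-style normalization so that large trees do not dominate."""
    return subtree_kernel(t1, t2) / math.sqrt(
        subtree_kernel(t1, t1) * subtree_kernel(t2, t2))

def structural_features(node, depth=1):
    """Feature vector (number of nodes, maximum depth, maximum out-degree)."""
    _, children = node
    nodes, max_depth, max_degree = 1, depth, len(children)
    for child in children:
        n, d, b = structural_features(child, depth + 1)
        nodes += n
        max_depth = max(max_depth, d)
        max_degree = max(max_degree, b)
    return (nodes, max_depth, max_degree)

def leave_one_out_accuracy(trees, labels):
    """Hold out each tree once and label it by its most similar neighbour."""
    correct = 0
    for i, held_out in enumerate(trees):
        others = [j for j in range(len(trees)) if j != i]
        best = max(others, key=lambda j: normalized_kernel(held_out, trees[j]))
        correct += labels[best] == labels[i]
    return correct / len(trees)

# Hypothetical toy corpus: two "report" and two "listing" documents.
report_a = tree("doc", tree("head"),
                tree("p", tree("s"), tree("s")),
                tree("p", tree("s"), tree("s")))
report_b = tree("doc", tree("head"),
                tree("p", tree("s"), tree("s")),
                tree("p", tree("s")))
listing_a = tree("doc", tree("list", tree("item"), tree("item"), tree("item")))
listing_b = tree("doc", tree("head"), tree("list", tree("item"), tree("item")))

trees = [report_a, report_b, listing_a, listing_b]
labels = ["report", "report", "listing", "listing"]

print(structural_features(report_a))
print(leave_one_out_accuracy(trees, labels))
```

Leave-one-out is the extreme case of k-fold cross-validation and is used here only because the toy corpus is tiny. With a realistic corpus one would more plausibly compare several kernels or feature sets by k-fold cross-validation and keep the best-scoring one.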
Year of publication
2007
Journal title
LDV-Forum : Zeitschrift für Computerlinguistik und Sprachtechnologie ; GLDV-Journal for Computational Linguistics and Language Technology
Volume
22
Issue
2
Pages
51-66
ISSN
PUB-ID

Cite

Mehler A, Geibel P, Pustylnikov O. Structural classifiers of text types: Towards a novel model of text representation. LDV-Forum : Zeitschrift für Computerlinguistik und Sprachtechnologie ; GLDV-Journal for Computational Linguistics and Language Technology. 2007;22(2):51-66.
Mehler, A., Geibel, P., & Pustylnikov, O. (2007). Structural classifiers of text types: Towards a novel model of text representation. LDV-Forum : Zeitschrift für Computerlinguistik und Sprachtechnologie ; GLDV-Journal for Computational Linguistics and Language Technology, 22(2), 51-66.
Mehler, A., Geibel, P., and Pustylnikov, O. (2007). Structural classifiers of text types: Towards a novel model of text representation. LDV-Forum : Zeitschrift für Computerlinguistik und Sprachtechnologie ; GLDV-Journal for Computational Linguistics and Language Technology 22, 51-66.
Mehler, A., Geibel, P., & Pustylnikov, O., 2007. Structural classifiers of text types: Towards a novel model of text representation. LDV-Forum : Zeitschrift für Computerlinguistik und Sprachtechnologie ; GLDV-Journal for Computational Linguistics and Language Technology, 22(2), p 51-66.
A. Mehler, P. Geibel, and O. Pustylnikov, “Structural classifiers of text types: Towards a novel model of text representation”, LDV-Forum : Zeitschrift für Computerlinguistik und Sprachtechnologie ; GLDV-Journal for Computational Linguistics and Language Technology, vol. 22, 2007, pp. 51-66.
Mehler, A., Geibel, P., Pustylnikov, O.: Structural classifiers of text types: Towards a novel model of text representation. LDV-Forum : Zeitschrift für Computerlinguistik und Sprachtechnologie ; GLDV-Journal for Computational Linguistics and Language Technology. 22, 51-66 (2007).
Mehler, Alexander, Geibel, Peter, and Pustylnikov, Olga. “Structural classifiers of text types: Towards a novel model of text representation”. LDV-Forum : Zeitschrift für Computerlinguistik und Sprachtechnologie ; GLDV-Journal for Computational Linguistics and Language Technology 22.2 (2007): 51-66.
Full text(s)
Name
mehler_geibel_pustylnikov_2007.pdf
Access Level
Restricted Closed Access
Last uploaded
2012-03-29T15:41:01Z
