Dissimilarity-based learning for complex data

Mokbel B (2016)
Bielefeld: Universität Bielefeld.

Bielefeld Dissertation | English
Supervisors
Hammer, Barbara; Sperduti, Alessandro
Abstract
Rapid advances in information technology have brought about an ever-increasing amount of digital data, which raises the demand for powerful data mining and machine learning tools. Due to modern methods for gathering, preprocessing, and storing information, the collected data become more and more complex: a simple vectorial representation and comparison in terms of the Euclidean distance are often no longer appropriate to capture relevant aspects of the data. Instead, problem-adapted similarity or dissimilarity measures refer directly to the given encoding scheme, allowing information constituents to be treated in a relational manner. This thesis addresses several challenges of complex data sets and their representation in the context of machine learning. The goal is to investigate possible remedies and to propose corresponding improvements of established methods, accompanied by examples from various application domains. The main scientific contributions are the following: (I) Many well-established machine learning techniques are restricted to vectorial input data. Therefore, we propose the extension of two popular prototype-based clustering and classification algorithms to non-negative symmetric dissimilarity matrices. (II) Some dissimilarity measures incorporate a fine-grained parameterization, which makes it possible to configure the comparison scheme with respect to the given data and the problem at hand. However, finding adequate parameters can be hard or even impossible for human users, due to the intricate effects of parameter changes and the lack of detailed prior knowledge. Therefore, we propose to integrate a metric learning scheme into a dissimilarity-based classifier, which can automatically adapt the parameters of a sequence alignment measure according to the given classification task.
(III) Dimensionality reduction techniques are a valuable instrument for making complex data sets accessible: they can provide an approximate low-dimensional embedding of the given data set and, as a special case, a planar map that visualizes the data's neighborhood structure. To assess the reliability of such an embedding, we propose the extension of a well-known quality measure to enable a fine-grained, tractable quantitative analysis, which can be integrated into a visualization. This tool can also help to compare different dissimilarity measures (and parameter settings) if ground truth is not available. (IV) All techniques are demonstrated on real-world examples from a variety of application domains, including bioinformatics, motion capturing, music, and education.
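Contributions (I) and (II) fit together naturally. A parameterized sequence alignment yields a symmetric dissimilarity matrix, and a prototype-based method can then work on that matrix directly. The Python sketch below illustrates both ideas under stated assumptions.

The alignment is a plain global edit distance. Its substitution costs `sub` and gap costs `gap` stand in for the parameters that a metric learning scheme would adapt. The toy alphabet `ACGT` and all function names are hypothetical.

For the prototypes, the sketch uses a common relational construction, known for example from relational neural gas and relational GLVQ. Each prototype is an implicit convex combination of data points with coefficients `alpha`, and its distance to every point is computed from the dissimilarity matrix alone. The abstract does not state which algorithms or formulas the thesis actually uses, so this is an illustration, not the thesis's method.

```python
import numpy as np

# Hypothetical toy alphabet; any finite symbol set works the same way.
ALPHABET = "ACGT"
IDX = {c: k for k, c in enumerate(ALPHABET)}


def alignment_distance(a, b, sub, gap):
    """Global alignment (edit) distance between symbol sequences a and b.

    sub[p, q]: cost of aligning symbol p with symbol q; gap[p]: cost of
    inserting or deleting p. Together they are the tunable parameters."""
    n, m = len(a), len(b)
    dp = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        dp[i, 0] = dp[i - 1, 0] + gap[IDX[a[i - 1]]]
    for j in range(1, m + 1):
        dp[0, j] = dp[0, j - 1] + gap[IDX[b[j - 1]]]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            p, q = IDX[a[i - 1]], IDX[b[j - 1]]
            dp[i, j] = min(dp[i - 1, j - 1] + sub[p, q],  # match / substitute
                           dp[i - 1, j] + gap[p],         # delete a[i-1]
                           dp[i, j - 1] + gap[q])         # insert b[j-1]
    return dp[n, m]


def dissimilarity_matrix(seqs, sub, gap):
    """Pairwise alignment distances; symmetric as long as sub is symmetric."""
    k = len(seqs)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            D[i, j] = D[j, i] = alignment_distance(seqs[i], seqs[j], sub, gap)
    return D


def relational_distances(D, alpha):
    """Distance of each data point x_i to a prototype w = sum_l alpha[l] x_l,
    using only D (alpha must sum to 1):
        d(x_i, w) = (D @ alpha)[i] - 0.5 * alpha @ D @ alpha
    Exact if D holds squared Euclidean distances; applied as-is otherwise."""
    alpha = np.asarray(alpha, dtype=float)
    return D @ alpha - 0.5 * alpha @ D @ alpha


# Usage: unit costs reduce the alignment to the plain Levenshtein distance.
seqs = ["ACGT", "AGT", "ACGA", "TTGA"]
sub = 1.0 - np.eye(len(ALPHABET))  # 0 for identical symbols, 1 otherwise
gap = np.ones(len(ALPHABET))
D = dissimilarity_matrix(seqs, sub, gap)
alpha = np.array([0.5, 0.5, 0.0, 0.0])  # prototype "between" seqs 0 and 1
print(relational_distances(D, alpha))

# Adapting a single parameter changes the geometry: make A <-> G cheap.
sub_adapted = sub.copy()
sub_adapted[IDX["A"], IDX["G"]] = sub_adapted[IDX["G"], IDX["A"]] = 0.2
print(alignment_distance("ACGT", "GCGT", sub, gap),
      alignment_distance("ACGT", "GCGT", sub_adapted, gap))
```

The identity used in `relational_distances` is exact when the matrix contains squared Euclidean distances. For general non-Euclidean dissimilarities such as edit distances, it is applied formally, and the resulting values need not be non-negative.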
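For contribution (III), one frequently used quality measure for embeddings is the co-ranking-based Q_NX(K) of Lee and Verleysen. It averages, over all points, the fraction of each point's K nearest neighbors that the embedding preserves.

The abstract does not name the measure that the thesis extends, so the Python sketch below is only an illustration under that assumption. It computes the pointwise K-neighborhood overlap directly, without building the co-ranking matrix. This yields one value per point, which could, for instance, color the points of a planar map. The function names are hypothetical, and the input dissimilarities need not be Euclidean.

```python
import numpy as np


def euclidean_distances(Y):
    """Pairwise Euclidean distances between the rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def knn_sets(D, K):
    """K nearest neighbors of each point under dissimilarity matrix D
    (the point itself excluded; ties broken by index)."""
    D = np.array(D, dtype=float)   # copy, so the caller's matrix is untouched
    np.fill_diagonal(D, np.inf)    # a point is not its own neighbor
    order = np.argsort(D, axis=1, kind="stable")
    return [set(row[:K].tolist()) for row in order]


def pointwise_qnx(D_high, D_low, K):
    """Per-point fraction of the K nearest neighbors in the original
    dissimilarities that remain among the K nearest neighbors in the embedding."""
    hi = knn_sets(D_high, K)
    lo = knn_sets(D_low, K)
    return np.array([len(h & l) / K for h, l in zip(hi, lo)])


def qnx(D_high, D_low, K):
    """Global Q_NX(K): the mean of the pointwise values."""
    return float(pointwise_qnx(D_high, D_low, K).mean())


# Usage: a 2-D PCA embedding of random 5-D data as a stand-in DR method.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Y = Xc @ Vt[:2].T
q = pointwise_qnx(euclidean_distances(X), euclidean_distances(Y), K=10)
print(f"Q_NX(10) = {q.mean():.3f}; worst-preserved point: {q.argmin()}")
```

Sweeping K from small to large values shows at which neighborhood scale an embedding is faithful. The pointwise values show where in the map it is not.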

Cite this

Mokbel B. Dissimilarity-based learning for complex data. Bielefeld: Universität Bielefeld; 2016.
Mokbel, B. (2016). Dissimilarity-based learning for complex data. Bielefeld: Universität Bielefeld.
Mokbel, B., 2016. Dissimilarity-based learning for complex data, Bielefeld: Universität Bielefeld.
B. Mokbel, Dissimilarity-based learning for complex data, Bielefeld: Universität Bielefeld, 2016.
Mokbel, B.: Dissimilarity-based learning for complex data. Universität Bielefeld, Bielefeld (2016).
Mokbel, Bassam. Dissimilarity-based learning for complex data. Bielefeld: Universität Bielefeld, 2016.
Main File(s)
Access Level
Open Access
Last Uploaded
2016-09-29T11:59:57Z
MD5 Checksum
d9e2ca8a051e166e8bc33a714d929bb9
