Learning with high dimensional data and preprocessing in non-stationary environments
Heusinger M (2023)
Bielefeld: Universität Bielefeld.
Bielefeld E-Dissertation | English
Author
Heusinger, Moritz
Abstract / Note
The Internet of Things generates huge amounts of multidimensional sensor readings. The analysis of such high-dimensional data is challenging and has not yet been sufficiently addressed. In this thesis, methods to analyze such data are developed, evaluated, and discussed.
There are currently several open issues in the context of stream analysis. These include adaptation to Concept Drift, a shift in the data distribution that classification algorithms need to handle. Furthermore, real-world data is often high-dimensional, whereas current research focuses mainly on low-dimensional problems that do not represent real applications well. In particular, more complex stream algorithms become very slow when operating in higher-dimensional spaces. Finally, real-world data is often not linearly separable. For this scenario, the field of kernel learning allows the data to be transformed into a kernel space in which it can become linearly separable. In the streaming context, this field has not yet gained much attention.
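As a toy illustration of the kernel-learning point (not taken from the thesis), the XOR pattern below cannot be separated by any straight line in the plane. The explicit feature map of the degree-2 polynomial kernel k(x, y) = (x · y)² makes it linearly separable, and the kernel value equals the inner product of the mapped points, so a kernel method never has to compute the map explicitly. The function names are hypothetical.

```python
import math


def phi(x):
    """Explicit feature map of the homogeneous degree-2 polynomial kernel in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly_kernel(x, y):
    """k(x, y) = (x . y)^2, evaluated directly in the input space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# XOR-style toy data: no straight line in the plane separates the two classes.
X = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
y = [1, 1, -1, -1]

# In the feature space, the hyperplane w . phi(x) = 0 with w = (0, 1, 0) separates them.
w = (0.0, 1.0, 0.0)
predictions = [1 if dot(w, phi(x)) > 0 else -1 for x in X]

# Kernel trick: k(a, b) equals phi(a) . phi(b), up to floating-point rounding.
kernel_matches = all(
    math.isclose(poly_kernel(a, b), dot(phi(a), phi(b)), abs_tol=1e-9)
    for a in X
    for b in X
)

print(predictions)     # [1, 1, -1, -1]
print(kernel_matches)  # True
```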
We address these problems with three major contributions:
__A__ In the first part of this thesis, we introduce sparse prototype-based algorithms that adapt quickly to Concept Drift by using momentum-based gradient descent techniques. The algorithms outperform their base versions on synthetic and real-world problems and are competitive with other state-of-the-art algorithms, while offering the advantages of interpretability and sparsity. Furthermore, one of these algorithms is combined with a statistical test to handle a greater variety of drifts.
__B__ To reduce the complexity of high-dimensional data streams, the Random Projection technique is analyzed in non-stationary environments. It is shown that the Johnson-Lindenstrauss Lemma also holds for stream classification tasks. Further, performance comparisons of different classifiers on the projected and the original space are provided, and it is shown how Random Projection can help tackle problems of non-stationary environments. To this end, a method is proposed that transforms a problem of changing dimensionality into a distribution change, which can then be handled by Concept Drift detectors.
__C__ Besides prototype vectors, sparse representations can also be obtained by constructing coresets. Thus, the final chapter provides techniques that use coresets to maintain a Minimum Enclosing Ball in stream settings. These methods have the advantage of also working on non-linear data when a suitable kernel is chosen. Specifically, a stream coreset-based classifier is proposed, which performs well on a variety of tested streams. As this classifier has the downside of using a separate ball for each class, a combined multiclass Core Vector Machine for data streams is additionally provided, which performs better on multiclass problems. Finally, it is shown that coresets can be used to detect Concept Drift, outperforming many state-of-the-art algorithms at the cost of higher runtime.
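For contribution A, the abstract states only that the prototype-based algorithms use momentum-based gradient descent. It does not give the update rules. The following is therefore a hypothetical sketch, not the thesis's algorithm. It runs plain LVQ1 (only the nearest prototype moves) with a heavy-ball momentum term on a toy two-class stream after an abrupt drift. The names `lvq1_momentum` and `toy_stream` and all parameter values are assumptions.

```python
import math
import random


def lvq1_momentum(stream, prototypes, lr=0.05, beta=0.5):
    """Online LVQ1 with a heavy-ball momentum term (hypothetical sketch).

    prototypes: list of (position, label) pairs. Only the nearest prototype
    (the winner) moves: towards x if its label matches y, away otherwise.
    beta=0.0 reduces to plain LVQ1 without momentum.
    """
    positions = [list(p) for p, _ in prototypes]
    labels = [label for _, label in prototypes]
    velocities = [[0.0] * len(p) for p in positions]
    for x, y in stream:
        j = min(range(len(positions)), key=lambda i: math.dist(positions[i], x))
        sign = 1.0 if labels[j] == y else -1.0
        # v <- beta * v + lr * sign * (x - w);  w <- w + v
        velocities[j] = [
            beta * v + lr * sign * (xi - wi)
            for v, xi, wi in zip(velocities[j], x, positions[j])
        ]
        positions[j] = [wi + vi for wi, vi in zip(positions[j], velocities[j])]
    return positions


def toy_stream(centers, n, noise, seed=0):
    """Alternating two-class stream around the given class centres."""
    rng = random.Random(seed)
    samples = []
    for t in range(n):
        label = t % 2
        cx, cy = centers[label]
        samples.append(((cx + rng.gauss(0.0, noise), cy + rng.gauss(0.0, noise)), label))
    return samples


# Prototypes learned before the drift sit on the old class centres.
initial = [((0.0, 0.0), 0), ((5.0, 5.0), 1)]
# Abrupt drift: both classes shift up by 3 units; noise=0.0 keeps the demo deterministic.
new_centers = [(0.0, 3.0), (5.0, 8.0)]
drifted = toy_stream(new_centers, n=60, noise=0.0)

plain = lvq1_momentum(drifted, initial, beta=0.0)
with_momentum = lvq1_momentum(drifted, initial, beta=0.5)
err_plain = math.dist(plain[0], new_centers[0])
err_momentum = math.dist(with_momentum[0], new_centers[0])
print(err_plain, err_momentum)
```

With beta = 0.0, each update shrinks the winning prototype's distance to the new class centre by the factor 1 - lr. After its 30 updates the distance is therefore 3 · 0.95³⁰ ≈ 0.64. With beta = 0.5, the accumulated velocity carries the prototype further per step, so it ends noticeably closer to the new centre.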
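Contribution B builds on Random Projection and the Johnson-Lindenstrauss Lemma. As a minimal sketch, not the thesis's implementation, the code below draws a Gaussian projection matrix once and applies it to each sample as it arrives. The target dimension uses the Dasgupta-Gupta form of the lemma, k ≥ 4 ln n / (ε²/2 - ε³/3). The function names and the synthetic data are hypothetical. The number of points n that the guarantee covers is an assumed budget, because a stream has no natural end.

```python
import math
import random


def jl_min_dim(n_samples, eps):
    """Target dimension from the Dasgupta-Gupta form of the Johnson-Lindenstrauss Lemma.

    For a random projection to this k, all pairwise squared distances of n_samples
    points are preserved within (1 +/- eps) with positive probability.
    """
    return math.ceil(4.0 * math.log(n_samples) / (eps ** 2 / 2.0 - eps ** 3 / 3.0))


def gaussian_projection(d, k, seed=0):
    """k x d matrix with i.i.d. N(0, 1/k) entries, drawn once before the stream starts."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Map one d-dimensional sample to k dimensions (one matrix-vector product)."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in matrix]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


# Synthetic toy stream: 50 samples with d = 500 features (illustration only).
d, n, eps = 500, 50, 0.5
rng = random.Random(42)
stream = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

k = jl_min_dim(n, eps)
R = gaussian_projection(d, k)
projected = [project(R, x) for x in stream]  # each sample is projected as it arrives

# Ratio of projected to original squared distance for every pair of samples.
ratios = [
    sq_dist(projected[i], projected[j]) / sq_dist(stream[i], stream[j])
    for i in range(n)
    for j in range(i + 1, n)
]
print(k, min(ratios), max(ratios))
```

Because the matrix is fixed in advance, each arriving sample costs one k × d matrix-vector product, and no pass over past data is needed.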
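Contribution C maintains a Minimum Enclosing Ball through coresets. The abstract does not describe the thesis's streaming procedure. As a stand-in, the sketch below implements the batch core-set algorithm of Bădoiu and Clarkson. It repeatedly moves the center toward the current farthest point and returns a (1 + ε)-approximate ball together with the small set of points that determined it. It works in the input space only (no kernel), and the function name `approx_meb` is hypothetical.

```python
import math


def approx_meb(points, eps):
    """Bădoiu-Clarkson core-set algorithm for a (1 + eps)-approximate Minimum Enclosing Ball.

    Returns (center, radius, coreset). The coreset contains the starting point and the
    farthest points visited, so it has at most ceil(1 / eps**2) + 1 elements,
    independent of the number of input points and of their dimension.
    """
    center = list(points[0])
    coreset = [tuple(points[0])]
    for i in range(1, math.ceil(1.0 / eps ** 2) + 1):
        far = max(points, key=lambda p: math.dist(p, center))
        if tuple(far) not in coreset:
            coreset.append(tuple(far))
        # Move the centre a 1/(i + 1) fraction of the way towards the farthest point.
        center = [c + (f - c) / (i + 1) for c, f in zip(center, far)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius, coreset


# Corners of a 2 x 2 square plus two interior points; the exact MEB has
# centre (1, 1) and radius sqrt(2).
corners = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
points = corners + [(1.0, 0.5), (0.5, 1.5)]
center, radius, coreset = approx_meb(points, eps=0.1)
print(center, radius, coreset)
```

A streaming version would have to update the ball as points arrive. A kernelized version would replace `math.dist` with distances in the kernel-induced feature space. Neither is shown here.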
Year
2023
Pages
149
Page URI
https://pub.uni-bielefeld.de/record/2981005
Cite
Heusinger, M. (2023). Learning with high dimensional data and preprocessing in non-stationary environments. Bielefeld: Universität Bielefeld. https://doi.org/10.4119/unibi/2981005
All files are available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
Full text(s)
Access level: Open Access
Last uploaded: 2023-07-18T13:46:23Z
MD5 checksum: 1175b32d8161f93df3bd529ab8d1c4b9