Relevance learning for redundant features
Pfannschmidt L (2021)
Bielefeld: Universität Bielefeld.
Bielefeld E-Dissertation | English
Abstract / Note
Feature selection is a widely used strategy in machine learning
for the reduction of feature sets to their relevant essence to
improve predictions and performance. It is also employed for
knowledge discovery in applied disciplines such as biology and
medicine to find potentially causal factors. But machine learning
models often do not represent a unique solution to a given problem,
especially in high-dimensional settings where redundant
factors are likely and spurious correlations exist.
Basing decisions about causal elements on feature selection can therefore be inaccurate or wrong if redundant but nonetheless relevant features are not taken into account. Most existing selection algorithms specifically remove redundancies and are therefore unsuitable for all-relevant feature selection, or they require careful parametrization and are hard to interpret, which makes them difficult to use.
This thesis focuses on feature selection methods for the analytical use case, to facilitate the understanding of potential causal factors in linear and non-linear problems. We propose several new algorithms and methods for all-relevant feature selection to improve knowledge discovery. They build on statistical methods that improve the accuracy of existing solutions and allow different types of relevance to be distinguished. Furthermore, we offer a new heuristic to automatically group related features, and we analyse the definition of relevance in the context of privileged information, where some data is available only during training.
We also introduce software implementations that were specifically designed to be modular, efficient and parallelizable for applications in high-dimensional problems. The methods and implementations were evaluated on a wide range of synthetic and real datasets to compare their performance with that of existing algorithms.
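The distinction the abstract draws between redundancy-removing selection and all-relevant selection can be illustrated with a small toy example. The sketch below is not the thesis's algorithm: the feature names `x0`, `x1`, `x2`, the correlation-based relevance score, and the thresholds 0.3 and 0.9 are all assumptions chosen only for illustration. It shows how a selector that drops redundant features discards a near-copy of a relevant feature, while an all-relevant view keeps both.

```python
import random


def pearson(a, b):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(0)
n = 500

# Hypothetical toy data: x0 drives the target, x1 is a noisy copy of x0
# (redundant but still relevant), x2 is unrelated noise.
x0 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [v + rng.gauss(0, 0.1) for v in x0]
x2 = [rng.gauss(0, 1) for _ in range(n)]
y = [v + rng.gauss(0, 0.5) for v in x0]
features = {"x0": x0, "x1": x1, "x2": x2}

# Crude univariate relevance score (assumption: absolute correlation with y).
relevance = {name: abs(pearson(col, y)) for name, col in features.items()}

# All-relevant view: keep every feature whose score exceeds a threshold.
all_relevant = [name for name, score in relevance.items() if score > 0.3]

# Redundancy-removing view: greedily keep the best-scoring features, but skip
# any feature that is strongly correlated with one already selected.
minimal = []
for name in sorted(all_relevant, key=relevance.get, reverse=True):
    if all(abs(pearson(features[name], features[s])) < 0.9 for s in minimal):
        minimal.append(name)

print("all-relevant:", all_relevant)
print("minimal:", minimal)
```

In this toy setting the minimal set contains only one of `x0` and `x1`, so an analyst reading it as a list of candidate causal factors would miss the other, which is the risk the abstract describes.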
Year
2021
Page(s)
120
Page URI
https://pub.uni-bielefeld.de/record/2959861
Cite
Pfannschmidt L. Relevance learning for redundant features. Bielefeld: Universität Bielefeld; 2021.
Pfannschmidt, L. (2021). Relevance learning for redundant features. Bielefeld: Universität Bielefeld. https://doi.org/10.4119/unibi/2959861
Pfannschmidt, Lukas. 2021. Relevance learning for redundant features. Bielefeld: Universität Bielefeld.
Pfannschmidt, L. (2021). Relevance learning for redundant features. Bielefeld: Universität Bielefeld.
Pfannschmidt, L., 2021. Relevance learning for redundant features, Bielefeld: Universität Bielefeld.
L. Pfannschmidt, Relevance learning for redundant features, Bielefeld: Universität Bielefeld, 2021.
Pfannschmidt, L.: Relevance learning for redundant features. Universität Bielefeld, Bielefeld (2021).
Pfannschmidt, Lukas. Relevance learning for redundant features. Bielefeld: Universität Bielefeld, 2021.
All files available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC BY 4.0)
Full text(s)
Name
thesis.pdf
960.59 KB
Access Level
Open Access
Last uploaded
2021-12-12T18:47:01Z
MD5 checksum
5f9416f168e0ebca02f9f0bbe51df3d0