Active data selection in supervised and unsupervised learning

Hasenjäger M (2000)
Bielefeld: Bielefeld University.

Bielefeld Dissertation | English
Author
Hasenjäger, Martina
Supervisor
Ritter, Helge
Abstract
In computer science, learning is applied to problems so complex that conventional programming techniques are either unavailable or impractical. Often, however, empirical knowledge about the phenomenon or process under consideration is available in the form of data from repeated measurements. Learning then means extracting the basic regularities from these empirical data, and can thus be seen as building an abstraction of the phenomenon that yields a complete and robust description of its interesting aspects. The success of the learning process depends on several factors: the concrete form of the learning system, the procedure used for learning, and the data at the learner's disposal. This last point, data selection, is the main focus of this thesis.
In general, the data are selected at random in such a way that they capture the interesting aspects of the phenomenon. This is not necessarily the most efficient way to acquire data, since the selection procedure receives no feedback from the learner. The data may therefore not match the learner's current state of knowledge, and the learning process is less efficient than it could be. In this thesis, we discuss a new paradigm for learning that aims to improve the efficiency of neural network training procedures: active learning. Here, the learner uses the information already available to select the training data that it expects to be most informative. The learner is thus no longer a passive recipient of information but takes an active role in selecting the training data.
After a review of the state of the art in active learning, we turn to active learning in binary classification tasks, where we study in detail an approach based on concepts from information theory. We then develop a new heuristic algorithm for data selection in local models, a class of learners that has so far not been considered in this context. Finally, we extend active learning techniques to unsupervised learning: we propose an algorithm for active data selection in topographic pairwise clustering that is founded on statistical decision theory.
Our results show that active learning may be computationally expensive but that, compared with random data selection, these active strategies considerably reduce the number of training samples needed. This makes active data selection a viable alternative, especially when the cost of data acquisition is high.
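The contrast between random and active data selection can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not one of the algorithms developed in the thesis: a one-dimensional binary classification task in which the learner must locate a hidden threshold in a sorted pool of unlabeled points. The active learner always queries the point that halves the region still consistent with the labels seen so far, while the passive learner queries points in random order. All names (`make_oracle`, `active_learner`, `random_learner`, `compare`) and the pool size, threshold and seed are hypothetical choices made for illustration.

```python
import random


def make_oracle(threshold):
    """Return a labeling function and a counter of requested labels (illustrative)."""
    calls = {"n": 0}

    def oracle(x):
        calls["n"] += 1
        return 1 if x >= threshold else 0

    return oracle, calls


def active_learner(pool, oracle):
    """Query the point that halves the still-uncertain region of a sorted pool.

    Returns the index of the first point labeled 1 (len(pool) if there is none).
    """
    lo, hi = 0, len(pool)  # the boundary index always lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle(pool[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    return lo


def random_learner(pool, oracle, rng):
    """Query points in random order, without feedback, until the boundary is fixed."""
    lo, hi = 0, len(pool)
    order = list(range(len(pool)))
    rng.shuffle(order)
    for i in order:
        if lo == hi:
            break
        if oracle(pool[i]) == 1:
            hi = min(hi, i)
        else:
            lo = max(lo, i + 1)
    return lo


def compare(n_pool=1000, threshold=0.37, seed=0):
    """Run both learners on the same pool; return the pool and (boundary, labels used)."""
    rng = random.Random(seed)
    pool = sorted(rng.random() for _ in range(n_pool))
    results = {}
    learners = (
        ("active", active_learner),
        ("random", lambda p, o: random_learner(p, o, rng)),
    )
    for name, learner in learners:
        oracle, calls = make_oracle(threshold)
        boundary = learner(pool, oracle)
        results[name] = (boundary, calls["n"])
    return pool, results


if __name__ == "__main__":
    _, results = compare()
    for name, (boundary, n_labels) in results.items():
        print(f"{name}: boundary index {boundary}, labels requested {n_labels}")
```

In this toy setting the active learner needs a number of labels that grows only logarithmically with the pool size, whereas the random learner typically has to label a large fraction of the pool before the boundary is pinned down. The sketch only illustrates the kind of saving in training samples that the abstract describes; it says nothing about the specific methods or results of the thesis.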
Year
2000

Cite this

Hasenjäger M. Active data selection in supervised and unsupervised learning. Bielefeld: Bielefeld University; 2000.
Hasenjäger, M. (2000). Active data selection in supervised and unsupervised learning. Bielefeld: Bielefeld University.
Hasenjäger, M., 2000. Active data selection in supervised and unsupervised learning, Bielefeld: Bielefeld University.
M. Hasenjäger, Active data selection in supervised and unsupervised learning, Bielefeld: Bielefeld University, 2000.
Hasenjäger, M.: Active data selection in supervised and unsupervised learning. Bielefeld University, Bielefeld (2000).
Hasenjäger, Martina. Active data selection in supervised and unsupervised learning. Bielefeld: Bielefeld University, 2000.
Access Level
OA Open Access
