Dealing with Inaccurate and Incomplete Labels in Industrial Streaming Data
Castellani A (2024)
Bielefeld: Universität Bielefeld.
Bielefeld E-Dissertation | English
Author
Castellani, Andrea
Reviewer / Supervisor
Hammer, Barbara (UniBi);
Schmitt, Sebastian;
Alippi, Cesare
Abstract / Note
The pressure to increase the energetic efficiency of industrial facilities has led to a strong
increase in the number of installed measurement sensors. These collect large volumes
of data that need to be processed and analyzed. As manual data processing methods are
not appropriate due to the sheer amount of data, automated and intelligent solutions are
needed. Machine learning techniques are a viable option for processing large volumes
of data and are capable of capturing complex relationships within it. However, obtaining
meaningfully annotated data is a real challenge and typically incurs large costs, especially
in an industrial setting, where the generic data type is streaming data, which is constantly
evolving. Thus, devising machine learning models which are able to perform a desired task
in industrial environments with few labeled data samples and drifting data features poses a
severe challenge. In this thesis, we will address two main technical challenges in the field of
analyzing industrial streaming data: (1) how to efficiently train models with only partially
labeled data, and (2) how to train models when a sizable fraction of the label information
is not correct or is changing over time. We propose several strategies for dealing with
these questions and evaluate their performance on stationary and non-stationary benchmark
data sets, as well as real-world industrial application data. As one central aspect of the
approaches, we propose to use constrained embedding representations for the raw input
data. These representations are shown to be efficient for dealing with limited annotated
data by analysis of the labeled and unlabeled data based on similarities in the embedding
space. They allow for robust semi-supervised training of deep neural networks in the
presence of label noise, and even gradually correct the mislabeled samples during training.
Similarly, connecting these latent representations to a network performing predefined tasks
is shown to be useful for accurate concept drift detection. Another core aspect of these
approaches is their capability to handle sparsely labeled data in streaming environments.
By propagating the available labels to unlabeled samples, based on their proximity in the
embedding space and the time of arrival of the labels, we can successfully train accurate
models with incomplete and delayed labels in resource-constrained settings. Our work
shows that the proposed methods are very effective for analyzing streaming data with sparse
and even incorrect or delayed labels, as well as concept drift. We apply our methods to
real-world industrial data in different tasks, such as robust anomaly detection with few
labeled samples, predictive modeling of industrial machinery in the presence of label noise
and delayed labels, and semi-supervised concept drift detection.
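The abstract describes propagating available labels to unlabeled samples based on proximity in the embedding space and the arrival time of the labels. The following is a minimal sketch of that general idea only, not the method from the thesis. The function name `propagate_labels`, the parameters `k` and `tau`, and the specific weighting (inverse distance multiplied by an exponential recency factor) are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def propagate_labels(embeddings, labels, label_times, now, k=3, tau=10.0):
    """Fill in missing labels (None) by a weighted vote of the k nearest labeled samples.

    Hypothetical weighting, not taken from the thesis:
      weight = 1 / (1 + distance) * exp(-(now - label_time) / tau)
    so close neighbors and recently arrived labels count more.
    """
    labeled = [i for i, y in enumerate(labels) if y is not None]
    result = list(labels)
    for i, y in enumerate(labels):
        if y is not None or not labeled:
            continue
        # The k labeled samples closest to sample i in the embedding space.
        nearest = sorted(
            labeled, key=lambda j: math.dist(embeddings[i], embeddings[j])
        )[:k]
        votes = defaultdict(float)
        for j in nearest:
            closeness = 1.0 / (1.0 + math.dist(embeddings[i], embeddings[j]))
            recency = math.exp(-(now - label_times[j]) / tau)
            votes[labels[j]] += closeness * recency
        result[i] = max(votes, key=votes.get)
    return result


# Toy data: two well-separated clusters, the last two samples are unlabeled.
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.05, 0.1), (4.9, 5.2)]
labels = ["normal", "normal", "anomaly", "anomaly", None, None]
label_times = [1.0, 2.0, 3.0, 4.0, None, None]
propagated = propagate_labels(embeddings, labels, label_times, now=5.0, k=2)
print(propagated)
```

In this toy setup each unlabeled point inherits the label of the cluster it sits in. A streaming variant would call such a function whenever delayed labels arrive, with `now` set to the current stream time.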
Year
2024
Page(s)
144
Copyright / Licenses
Page URI
https://pub.uni-bielefeld.de/record/2985975
Cite
Castellani A. Dealing with Inaccurate and Incomplete Labels in Industrial Streaming Data. Bielefeld: Universität Bielefeld; 2024.
Castellani, A. (2024). Dealing with Inaccurate and Incomplete Labels in Industrial Streaming Data. Bielefeld: Universität Bielefeld. https://doi.org/10.4119/unibi/2985975
Castellani, Andrea. 2024. Dealing with Inaccurate and Incomplete Labels in Industrial Streaming Data. Bielefeld: Universität Bielefeld.
Castellani, A. (2024). Dealing with Inaccurate and Incomplete Labels in Industrial Streaming Data. Bielefeld: Universität Bielefeld.
Castellani, A., 2024. Dealing with Inaccurate and Incomplete Labels in Industrial Streaming Data, Bielefeld: Universität Bielefeld.
A. Castellani, Dealing with Inaccurate and Incomplete Labels in Industrial Streaming Data, Bielefeld: Universität Bielefeld, 2024.
Castellani, A.: Dealing with Inaccurate and Incomplete Labels in Industrial Streaming Data. Universität Bielefeld, Bielefeld (2024).
Castellani, Andrea. Dealing with Inaccurate and Incomplete Labels in Industrial Streaming Data. Bielefeld: Universität Bielefeld, 2024.
All files available under the following license(s):
Creative Commons Public Domain Dedication (CC0 1.0)
Full text(s)
Name
Castellani_thesis_full.pdf
14.27 MB
Access Level
Open Access
Last Uploaded
2024-01-09T15:24:22Z
MD5 Checksum
48a26d5f386e3cd52092a327479ce414
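The MD5 checksum listed above can be used to confirm that a downloaded copy of the file is intact. The sketch below uses only Python's standard `hashlib` module. The helper names `md5_of_file` and `verify_download` are hypothetical, and the local path is an assumption about where the file was saved.

```python
import hashlib

# Checksum as listed in this record.
EXPECTED_MD5 = "48a26d5f386e3cd52092a327479ce414"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("Castellani_thesis_full.pdf")` returns `True` when the local copy matches the listed checksum. MD5 is suitable here for detecting corrupted or incomplete downloads, not as protection against deliberate tampering.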