On Deep Network Optimization

Kenneweg P (2025)
Bielefeld: Universität Bielefeld.

Bielefeld e-dissertation | English
 
Abstract / Note
Optimization is a fundamental task in machine learning, critical for training the large and deep networks that have become prevalent in recent years. These networks demand significant computational resources, motivating research into improving the efficiency and effectiveness of the optimization process. This thesis focuses on two aspects of optimization:
First, improving the gradient descent procedure used in every deep learning pipeline by drawing on the classical field of line search methods. Second, improving learning procedures for transformers by means of individual learning rates, attention-head design, and debiasing steps.
More precisely, current back-propagation schemes are dominated by stochastic gradient techniques that depend on the step size as a hyperparameter. The first part of this research therefore investigates line search methods as more general schemes that determine the optimal learning rate automatically. By analyzing failure cases and the loss landscape, we propose several enhancements to the current state of the art in deep learning: adaptations of the classical line search to better handle mini-batching, a numerical approach for integrating line searches with other optimizers, and a layer-wise line search algorithm. The proposed methods outperform existing techniques across a variety of tasks and architectures.
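To make the line search idea concrete, here is a minimal sketch of a classical backtracking line search with the Armijo sufficient-decrease condition, applied to a toy quadratic loss. It illustrates the general principle only, not the thesis's specific algorithms; the loss function and all constants are assumptions chosen for the example.

# Minimal sketch: backtracking (Armijo) line search on a toy quadratic.
# Illustrative only; not the thesis's exact algorithm.
import numpy as np

def loss(w):
    # Toy quadratic loss with minimum at w = (1, -2).
    return 0.5 * ((w[0] - 1.0) ** 2 + 3.0 * (w[1] + 2.0) ** 2)

def grad(w):
    # Analytic gradient of the toy loss.
    return np.array([w[0] - 1.0, 3.0 * (w[1] + 2.0)])

def armijo_step(w, eta0=1.0, c=1e-4, shrink=0.5, max_tries=30):
    """Shrink the step size eta until the Armijo condition
    loss(w - eta*g) <= loss(w) - c*eta*||g||^2 is satisfied."""
    g = grad(w)
    f0 = loss(w)
    eta = eta0
    for _ in range(max_tries):
        if loss(w - eta * g) <= f0 - c * eta * (g @ g):
            break
        eta *= shrink
    return w - eta * g, eta

w = np.zeros(2)
for _ in range(20):
    w, eta = armijo_step(w)
print(w)  # approaches (1, -2) without hand-tuning a learning rate

The point of the sketch is that the step size is found per update by the sufficient-decrease test rather than fixed as a hyperparameter; the thesis's contributions concern making this robust in the stochastic, mini-batched deep learning setting.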
Transformers constitute a particularly prominent deep architecture, used in diverse tasks including natural language processing, vision, time-series prediction, and more. The second part of this thesis addresses several issues that arise when training or fine-tuning transformer-based architectures. We explore optimization avenues such as mitigating catastrophic forgetting in transformers, addressing biases in large language models, and improving transformer classification head design. For each issue, we identify specific problems, propose enhancements, and validate them through extensive experiments.
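As one concrete illustration of individual learning rates: a standard way to realize them when fine-tuning a transformer is layer-wise learning rate decay via optimizer parameter groups, which is also a common heuristic against catastrophic forgetting. The sketch below uses PyTorch; the architecture and the base_lr and decay values are illustrative assumptions, not the thesis's scheme.

# Minimal sketch: per-layer learning rates via optimizer parameter groups.
# Hyperparameters and architecture are made-up examples.
import torch
import torch.nn as nn

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=4,
)

# Decay the learning rate geometrically from the top layer down, so
# layers closer to the input change less and thus forget less.
base_lr, decay = 1e-4, 0.8
n = len(encoder.layers)
groups = [
    {"params": layer.parameters(), "lr": base_lr * decay ** (n - 1 - i)}
    for i, layer in enumerate(encoder.layers)
]
optimizer = torch.optim.AdamW(groups)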
Our findings indicate that applying these line search methods and transformer improvements, whether individually or in combination, significantly advances the state of the art in optimizing deep networks, particularly transformers. This research not only provides a solid foundation for future studies but also has broad implications for the development of more efficient and robust neural network training methodologies.
The contributions of this thesis lie in both theoretical advancements and practical applications, offering new insights and tools for optimizing deep learning processes.
Year
2025
Page(s)
107
Page URI
https://pub.uni-bielefeld.de/record/3002246

Cite

Kenneweg, P. (2025). On Deep Network Optimization. Bielefeld: Universität Bielefeld. https://doi.org/10.4119/unibi/3002246
All files available under the following license(s):
Full text(s)
Name
16.44 MB
Access Level
OA Open Access
Last Uploaded
2025-04-07T07:43:54Z
MD5 Checksum
8f847f0ab1b558913d64f64ac4d1a729

