On Deep Network Optimization
Kenneweg P (2025)
Bielefeld: Universität Bielefeld.
Bielefeld E-Dissertation | English

Abstract / Note
Optimization is a fundamental task in machine learning, critical for training the large and deep networks that have become prevalent in recent years. These networks demand significant computational resources, motivating research into improving the efficiency and effectiveness of the optimization process. We focus on two optimization aspects in this thesis:
First, improving the gradient descent procedure used in every deep learning pipeline by drawing on the classical field of line search methods. Second, improving learning procedures for transformers by means of individual learning rates, attention heads, and debiasing steps.
To be more precise, current back-propagation schemes are dominated by stochastic gradient techniques that depend on the step size as a hyperparameter. The first part of this research therefore investigates line search methods as more general schemes that determine the optimal learning rate automatically. We do this by analyzing failure cases and the loss landscape, and on that basis propose several enhancements to current state-of-the-art techniques in deep learning. Specifically, we introduce adaptations of the classical line search to better handle mini-batching, a numerical approach for integrating line searches with other optimizers, and a layer-wise line search algorithm. Our proposed methods outperform existing approaches across various tasks and architectures.
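As background, the sketch below shows a plain backtracking (Armijo) line search evaluated on a single mini-batch in PyTorch. It illustrates the classical building block referred to above, not the specific algorithms developed in the thesis; the function name, the constants, and the model / loss_fn arguments are illustrative assumptions.

import torch

def armijo_line_search(model, loss_fn, batch, eta_init=1.0, c=1e-4, beta=0.5, max_steps=20):
    # Pick a step size along the negative gradient that satisfies the Armijo
    # (sufficient decrease) condition on this mini-batch. Illustrative sketch only.
    x, y = batch
    params = [p for p in model.parameters() if p.requires_grad]

    # Loss and gradient at the current parameters.
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)
    grad_sq_norm = sum((g * g).sum() for g in grads)

    eta = eta_init
    with torch.no_grad():
        for _ in range(max_steps):
            # Trial step: theta <- theta - eta * grad.
            for p, g in zip(params, grads):
                p.add_(g, alpha=-eta)
            trial_loss = loss_fn(model(x), y)
            # Undo the trial step before possibly trying a smaller eta.
            for p, g in zip(params, grads):
                p.add_(g, alpha=eta)
            if trial_loss <= loss - c * eta * grad_sq_norm:
                break  # sufficient decrease reached
            eta *= beta  # shrink the step and retry
    return eta

The returned step size would then be used for the actual parameter update on that batch; the thesis refines this basic scheme, for instance with mini-batch-aware and layer-wise variants.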
Transformers constitute a particularly prominent deep architecture, used in diverse tasks including natural language processing, vision, time-series prediction, and many more. The second part of this thesis addresses several issues that arise when training or fine-tuning transformer-based architectures. We explore optimization avenues such as mitigating catastrophic forgetting in transformers, addressing biases in large language models, and improving the design of transformer classification heads. For each issue, we identify the specific problems, propose enhancements, and validate them through extensive experiments.
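To make the notion of individual learning rates concrete, the following generic PyTorch sketch assigns different learning rates to a pretrained transformer encoder and a freshly initialized classification head via optimizer parameter groups, a common fine-tuning pattern. It is not necessarily the scheme proposed in the thesis; the encoder / classifier attribute names, the choice of AdamW, and the learning-rate values are illustrative assumptions.

import torch

def build_optimizer(model, encoder_lr=1e-5, head_lr=1e-3, weight_decay=0.01):
    # Smaller steps for the pretrained encoder, larger steps for the
    # randomly initialized classification head.
    param_groups = [
        {"params": model.encoder.parameters(), "lr": encoder_lr},
        {"params": model.classifier.parameters(), "lr": head_lr},
    ]
    return torch.optim.AdamW(param_groups, weight_decay=weight_decay)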
Our findings indicate that applying these line search methods and transformer improvements, whether individually or in combination, significantly advances the state of the art in optimizing deep networks, particularly transformers. This research not only provides a solid foundation for future studies but also has broad implications for the development of more efficient and robust neural network training methodologies.
The contributions of this thesis lie in both theoretical advancements and practical applications, offering new insights and tools for optimizing deep learning processes.
Year
2025
Page(s)
107
Copyright / Licenses
Page URI
https://pub.uni-bielefeld.de/record/3002246
Cite
Kenneweg, Philip. 2025. On Deep Network Optimization. Bielefeld: Universität Bielefeld. https://doi.org/10.4119/unibi/3002246
All files are available under the following license(s):
Creative Commons Public Domain Dedication (CC0 1.0):
Full Text(s)
Name
main.pdf
16.44 MB
Access Level

Last Uploaded
2025-04-07T07:43:54Z
MD5 Checksum
8f847f0ab1b558913d64f64ac4d1a729