On Deep Network Optimization
Kenneweg P (2025)
Bielefeld: Universität Bielefeld.
Bielefeld E-Dissertation | English

Abstract / Note
Optimization is a fundamental task in machine learning, critical for training the large and deep networks that have become prevalent in recent years. These networks demand significant computational resources, motivating research into improving the efficiency and effectiveness of the optimization process. We focus on two optimization aspects in this thesis:
First, improving the gradient descent procedure used in every deep learning pipeline by drawing on the classical field of line search methods. Second, improving learning procedures for transformers by means of individual learning rates, attention heads, and debiasing steps.
To be more precise, current back-propagation schemes are dominated by stochastic gradient techniques that depend on the step size as a hyperparameter. The first part of this research therefore investigates line search methods as more general schemes that determine the optimal learning rate automatically. We do this by analyzing failure cases and the loss landscape, and on that basis propose several enhancements to current state-of-the-art techniques in deep learning. Specifically, we introduce adaptations of the classical line search to better handle mini-batching, a numerical approach for integrating line searches with other optimizers, and a layer-wise line search algorithm. Our proposed methods outperform existing approaches across various tasks and architectures.
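As background, the sketch below shows a plain backtracking (Armijo) line search evaluated on a single mini-batch in PyTorch. It illustrates the classical building block referred to above, not the specific algorithms developed in the thesis; the function name, the constants, and the model / loss_fn arguments are illustrative assumptions.

import torch

def armijo_line_search(model, loss_fn, batch, eta_init=1.0, c=1e-4, beta=0.5, max_steps=20):
    # Pick a step size along the negative gradient that satisfies the Armijo
    # (sufficient decrease) condition on this mini-batch. Illustrative sketch only.
    x, y = batch
    params = [p for p in model.parameters() if p.requires_grad]

    # Loss and gradient at the current parameters.
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)
    grad_sq_norm = sum((g * g).sum() for g in grads)

    eta = eta_init
    with torch.no_grad():
        for _ in range(max_steps):
            # Trial step: theta <- theta - eta * grad.
            for p, g in zip(params, grads):
                p.add_(g, alpha=-eta)
            trial_loss = loss_fn(model(x), y)
            # Undo the trial step before possibly trying a smaller eta.
            for p, g in zip(params, grads):
                p.add_(g, alpha=eta)
            if trial_loss <= loss - c * eta * grad_sq_norm:
                break  # sufficient decrease reached
            eta *= beta  # shrink the step and retry
    return eta

The returned step size would then be used for the actual parameter update on that batch; the thesis refines this basic scheme, for instance with mini-batch-aware and layer-wise variants.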
Transformers constitute a particularly prominent deep architecture, used in diverse tasks including natural language processing, vision, time-series prediction, and many more. The second part of this thesis addresses several issues that arise when training or fine-tuning transformer-based architectures. We explore optimization avenues such as mitigating catastrophic forgetting in transformers, addressing biases in large language models, and improving the design of transformer classification heads. For each issue, we identify the specific problems, propose enhancements, and validate them through extensive experiments.
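To make the notion of individual learning rates concrete, the following generic PyTorch sketch assigns different learning rates to a pretrained transformer encoder and a freshly initialized classification head via optimizer parameter groups, a common fine-tuning pattern. It is not necessarily the scheme proposed in the thesis; the encoder / classifier attribute names, the choice of AdamW, and the learning-rate values are illustrative assumptions.

import torch

def build_optimizer(model, encoder_lr=1e-5, head_lr=1e-3, weight_decay=0.01):
    # Smaller steps for the pretrained encoder, larger steps for the
    # randomly initialized classification head.
    param_groups = [
        {"params": model.encoder.parameters(), "lr": encoder_lr},
        {"params": model.classifier.parameters(), "lr": head_lr},
    ]
    return torch.optim.AdamW(param_groups, weight_decay=weight_decay)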
Our findings indicate that applying these line search methods and transformer improvements, whether individually or in combination, significantly advances the state of the art in optimizing deep networks, particularly transformers. This research not only provides a solid foundation for future studies but also has broad implications for the development of more efficient and robust neural network training methodologies.
The contributions of this thesis lie in both theoretical advancements and practical applications, offering new insights and tools for optimizing deep learning processes.
Year
2025
Page(s)
107
Copyright / Licenses
Page URI
https://pub.uni-bielefeld.de/record/3002246
Cite
Kenneweg, Philip. 2025. On Deep Network Optimization. Bielefeld: Universität Bielefeld. https://doi.org/10.4119/unibi/3002246
All files are available under the following license(s):
Creative Commons Public Domain Dedication (CC0 1.0):
Full Text(s)
Name
main.pdf
16.44 MB
Access Level

Last Uploaded
2025-04-07T07:43:54Z
MD5 Checksum
8f847f0ab1b558913d64f64ac4d1a729