Faster Convergence for Transformer Fine-tuning with Line Search Methods
Kenneweg P, Galli L, Kenneweg T, Hammer B (2023)
In: 2023 International Joint Conference on Neural Networks (IJCNN). IEEE: 1-8.
Conference paper
| Published | English
Abstract / Note
Recent works have shown that line search methods greatly increase the performance of traditional stochastic gradient descent methods on a variety of datasets and architectures [1], [2]. In this work we succeed in extending line search methods to the novel and highly popular Transformer architecture and to dataset domains in natural language processing. More specifically, we combine the Armijo line search with the Adam optimizer and extend it by subdividing the network's architecture into sensible units and performing the line search separately on these local units. Our optimization method outperforms the traditional Adam optimizer and achieves significant performance improvements for small datasets or small training budgets, while performing equally well or better in the other tested cases. Our work is publicly available as a Python package, which provides a hyperparameter-free PyTorch optimizer that is compatible with arbitrary network architectures.
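The abstract names the Armijo line search as the building block. As a rough illustration only, the following is a minimal sketch of the textbook backtracking Armijo rule applied to plain gradient descent on a toy quadratic. It is not the authors' optimizer: it does not use Adam, it does not split a network into units, and it does not reproduce the API of their package. All names here (`armijo_step_size`, `eta0`, `c`, `beta`, `quadratic`, `minimize`) and the default constants are assumptions chosen for the example.

```python
from typing import Callable, List

Vector = List[float]


def armijo_step_size(
    f: Callable[[Vector], float],
    x: Vector,
    grad: Vector,
    eta0: float = 1.0,   # initial trial step size (assumed default)
    c: float = 0.1,      # sufficient-decrease constant (assumed default)
    beta: float = 0.5,   # shrink factor applied on each backtrack (assumed)
    max_backtracks: int = 50,
) -> float:
    """Shrink the step size until the Armijo condition holds.

    Armijo condition for the direction -grad:
        f(x - eta * grad) <= f(x) - c * eta * ||grad||^2
    """
    fx = f(x)
    grad_sq = sum(g * g for g in grad)
    eta = eta0
    for _ in range(max_backtracks):
        candidate = [xi - eta * gi for xi, gi in zip(x, grad)]
        if f(candidate) <= fx - c * eta * grad_sq:
            return eta
        eta *= beta
    # Fall back to the smallest step tried if the condition never held.
    return eta


def quadratic(x: Vector) -> float:
    """Toy ill-conditioned objective, used only for illustration."""
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def quadratic_grad(x: Vector) -> Vector:
    """Analytic gradient of `quadratic`."""
    return [x[0], 10.0 * x[1]]


def minimize(x: Vector, steps: int = 100) -> Vector:
    """Gradient descent where every step size comes from the line search."""
    for _ in range(steps):
        g = quadratic_grad(x)
        eta = armijo_step_size(quadratic, x, g)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    x_final = minimize([3.0, -2.0])
    print("final point:", x_final, "loss:", quadratic(x_final))
```

The point of the sketch is that no learning rate has to be tuned by hand: each step starts from a generous trial value and backs off until the loss decreases sufficiently. Running such a search separately for each group of parameters, as the abstract describes, would mean calling a routine like `armijo_step_size` once per group instead of once for the whole parameter vector.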
Year of publication
2023
Title of conference proceedings
2023 International Joint Conference on Neural Networks (IJCNN)
Page(s)
1-8
Conference
2023 International Joint Conference on Neural Networks (IJCNN)
Conference location
Gold Coast, Australia
Conference date
2023-06-18 – 2023-06-23
eISBN
978-1-6654-8867-9
Page URI
https://pub.uni-bielefeld.de/record/2984047
Cite
Kenneweg P, Galli L, Kenneweg T, Hammer B. Faster Convergence for Transformer Fine-tuning with Line Search Methods. In: 2023 International Joint Conference on Neural Networks (IJCNN). IEEE; 2023: 1-8.
Kenneweg, P., Galli, L., Kenneweg, T., & Hammer, B. (2023). Faster Convergence for Transformer Fine-tuning with Line Search Methods. 2023 International Joint Conference on Neural Networks (IJCNN), 1-8. IEEE. https://doi.org/10.1109/IJCNN54540.2023.10192001
Kenneweg, Philip, Galli, Leonardo, Kenneweg, Tristan, and Hammer, Barbara. 2023. “Faster Convergence for Transformer Fine-tuning with Line Search Methods”. In 2023 International Joint Conference on Neural Networks (IJCNN), 1-8. IEEE.
Kenneweg, P., Galli, L., Kenneweg, T., and Hammer, B. (2023). “Faster Convergence for Transformer Fine-tuning with Line Search Methods” in 2023 International Joint Conference on Neural Networks (IJCNN) (IEEE), 1-8.
Kenneweg, P., et al., 2023. Faster Convergence for Transformer Fine-tuning with Line Search Methods. In 2023 International Joint Conference on Neural Networks (IJCNN). IEEE, pp. 1-8.
P. Kenneweg, et al., “Faster Convergence for Transformer Fine-tuning with Line Search Methods”, 2023 International Joint Conference on Neural Networks (IJCNN), IEEE, 2023, pp.1-8.
Kenneweg, P., Galli, L., Kenneweg, T., Hammer, B.: Faster Convergence for Transformer Fine-tuning with Line Search Methods. 2023 International Joint Conference on Neural Networks (IJCNN). p. 1-8. IEEE (2023).
Kenneweg, Philip, Galli, Leonardo, Kenneweg, Tristan, and Hammer, Barbara. “Faster Convergence for Transformer Fine-tuning with Line Search Methods”. 2023 International Joint Conference on Neural Networks (IJCNN). IEEE, 2023. 1-8.