Critical assessment of transformer-based AI models for German clinical notes

Lentzen M, Madan S, Lage-Rupprecht V, Kühnel L, Fluck J, Jacobs M, Mittermaier M, Witzenrath M, Brunecker P, Hofmann-Apitius M, Weber J, et al. (2022)
JAMIA Open 5(4): ooac087.

Journal Article | Published | English
 
Download
No files have been uploaded. Bibliographic record only.
Authors
Lentzen, Manuel; Madan, Sumit; Lage-Rupprecht, Vanessa; Kühnel, Lisa; Fluck, Juliane; Jacobs, Marc; Mittermaier, Mirja; Witzenrath, Martin; Brunecker, Peter; Hofmann-Apitius, Martin; Weber, Joachim; Froehlich, Holger
Abstract / Notes
Lay Summary
In 2022, the majority of clinical documents are still written as free text. Assuming that these records are consistently and correctly transformed into structured data, they present an opportunity for health-economic optimization as well as personalized patient care. Deep-learning methods, particularly transformer-based models, have recently received much attention as they excel in a variety of fields; however, the majority of applications are currently only available in English. Although there are general-language models in German, none have been developed specifically for biomedical or clinical documents. In this context, this study systematically compared 8 previously published general-language models and 3 newly trained biomedical domain models in information extraction and document classification tasks. Our findings show that while training entirely new models with currently available data has proven ineffective, adapting existing models for biomedical language holds a lot of promise. Furthermore, we found that even models that have not been specifically developed for biomedical applications can achieve excellent results in the specified fields.

Objective
Healthcare data such as clinical notes are primarily recorded in an unstructured manner. If adequately translated into structured data, they can be utilized for health economics and set the groundwork for better individualized patient care. To structure clinical notes, deep-learning methods, particularly transformer-based models like Bidirectional Encoder Representations from Transformers (BERT), have recently received much attention. Currently, biomedical applications are primarily focused on the English language. While general-purpose German-language models such as GermanBERT and GottBERT have been published, adaptations for biomedical data are unavailable. This study evaluated the suitability of existing and novel transformer-based models for the German biomedical and clinical domain.
Materials and Methods
We used 8 transformer-based models and pre-trained 3 new models on a newly generated biomedical corpus, and systematically compared them with each other. We annotated a new dataset of clinical notes and used it with 4 other corpora (BRONCO150, CLEF eHealth 2019 Task 1, GGPONC, and JSynCC) to perform named entity recognition (NER) and document classification tasks.

Results
General-purpose language models can be used effectively for biomedical and clinical natural language processing (NLP) tasks; still, our newly trained BioGottBERT model outperformed GottBERT on both clinical NER tasks. However, training new biomedical models from scratch proved ineffective.

Discussion
The domain-adaptation strategy's potential is currently limited due to a lack of pre-training data. Since general-purpose language models are only marginally inferior to domain-specific models, both options are suitable for developing German-language biomedical applications.

Conclusion
General-purpose language models perform remarkably well on biomedical and clinical NLP tasks. If larger corpora become available in the future, domain-adapting these models may improve performances.
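NER benchmarks such as those mentioned above (BRONCO150, GGPONC) are conventionally scored with entity-level precision, recall, and F1 over BIO-tagged token sequences: a predicted entity counts only if its span and label exactly match a gold entity. A minimal, self-contained sketch of that metric in Python follows (the tag labels `MED` and `DIAG` are illustrative only; this is not the authors' evaluation code):

```python
def extract_entities(tags):
    """Collect (start, end, label) spans from a BIO tag sequence.

    `end` is exclusive; an ill-formed I- tag without a preceding B- is
    treated as the start of a new entity (a common lenient convention).
    """
    entities, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # "O" sentinel flushes the last span
        # Close the open span on O, on a new B-, or on an I- with a new label.
        if start is not None and (
            tag == "O" or tag.startswith("B-") or tag[2:] != label
        ):
            entities.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return entities


def entity_f1(gold_tags, pred_tags):
    """Entity-level precision, recall, and F1 for one tagged sequence."""
    gold = set(extract_entities(gold_tags))
    pred = set(extract_entities(pred_tags))
    tp = len(gold & pred)  # exact span-and-label matches only
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Example with hypothetical labels: the prediction finds the medication
# span but misses the diagnosis, so precision is perfect and recall is 0.5.
gold = ["B-MED", "I-MED", "O", "B-DIAG"]
pred = ["B-MED", "I-MED", "O", "O"]
print(entity_f1(gold, pred))
```

Libraries such as seqeval implement the same scheme (plus IOB2/IOBES variants); the point of the sketch is that partial span overlaps earn no credit, which is why entity-level F1 is stricter than token-level accuracy.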
Keywords
clinical concept extraction; natural language processing; transformer-based models
Publication Year
2022
Journal Title
JAMIA Open
Volume
5
Issue
4
Article Number
ooac087
eISSN
2574-2531
Page URI
https://pub.uni-bielefeld.de/record/2967396

Cite

Lentzen, M., Madan, S., Lage-Rupprecht, V., Kühnel, L., Fluck, J., Jacobs, M., Mittermaier, M., Witzenrath, M., Brunecker, P., Hofmann-Apitius, M., Weber, J., & Froehlich, H. (2022). Critical assessment of transformer-based AI models for German clinical notes. JAMIA Open, 5(4), ooac087. https://doi.org/10.1093/jamiaopen/ooac087
Sources

PMID: 36380848
PubMed | Europe PMC

Suchen in

Google Scholar