Critical assessment of transformer-based AI models for German clinical notes

Lentzen M, Madan S, Lage-Rupprecht V, Kühnel L, Fluck J, Jacobs M, Mittermaier M, Witzenrath M, Brunecker P, Hofmann-Apitius M, Weber J, et al. (2022)
JAMIA Open 5(4): ooac087.

Journal Article | Published | English
 
Download
No files have been uploaded. Publication record only!
Author(s)
Lentzen, Manuel; Madan, Sumit; Lage-Rupprecht, Vanessa; Kühnel, Lisa; Fluck, Juliane; Jacobs, Marc; Mittermaier, Mirja; Witzenrath, Martin; Brunecker, Peter; Hofmann-Apitius, Martin; Weber, Joachim; Froehlich, Holger
Abstract / Remarks
Lay Summary
In 2022, the majority of clinical documents are still written as free text. If these records are consistently and correctly transformed into structured data, they present an opportunity for optimized health-economic purposes as well as personalized patient care. Deep-learning methods, particularly transformer-based models, have recently received much attention as they excel in a variety of fields; however, the majority of applications are currently only available in English. Although there are general-language models in German, none have been developed specifically for biomedical or clinical documents. In this context, this study systematically compared 8 previously published general-language models and 3 newly trained biomedical domain models on information extraction and document classification tasks. Our findings show that while training entirely new models with currently available data has proven ineffective, adapting existing models to biomedical language holds considerable promise. Furthermore, we found that even models that have not been specifically developed for biomedical applications can achieve excellent results in these fields.

Objective
Healthcare data such as clinical notes are primarily recorded in an unstructured manner. If adequately translated into structured data, they can be utilized for health economics and set the groundwork for better individualized patient care. To structure clinical notes, deep-learning methods, particularly transformer-based models like Bidirectional Encoder Representations from Transformers (BERT), have recently received much attention. Currently, biomedical applications are primarily focused on the English language. While general-purpose German-language models such as GermanBERT and GottBERT have been published, adaptations for biomedical data are unavailable. This study evaluated the suitability of existing and novel transformer-based models for the German biomedical and clinical domain.

Materials and Methods
We used 8 transformer-based models and pre-trained 3 new models on a newly generated biomedical corpus, then systematically compared them with each other. We annotated a new dataset of clinical notes and used it together with 4 other corpora (BRONCO150, CLEF eHealth 2019 Task 1, GGPONC, and JSynCC) to perform named entity recognition (NER) and document classification tasks.

Results
General-purpose language models can be used effectively for biomedical and clinical natural language processing (NLP) tasks; still, our newly trained BioGottBERT model outperformed GottBERT on both clinical NER tasks. However, training new biomedical models from scratch proved ineffective.

Discussion
The domain-adaptation strategy's potential is currently limited by a lack of pre-training data. Since general-purpose language models are only marginally inferior to domain-specific models, both options are suitable for developing German-language biomedical applications.

Conclusion
General-purpose language models perform remarkably well on biomedical and clinical NLP tasks. If larger corpora become available in the future, domain-adapting these models may improve performance.
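The Materials and Methods above describe fine-tuning transformer models for NER on annotated German clinical corpora. As a rough illustration only (not the authors' published pipeline), the sketch below shows how such a token-classification fine-tuning run is typically set up with the Hugging Face transformers and datasets libraries; the model ID, label set, and one-sentence toy corpus are assumptions made for this example.

```python
# Minimal sketch: fine-tuning a German transformer (e.g., GottBERT) for clinical NER.
# Model ID, label set, and the toy example are illustrative assumptions, not the
# authors' actual setup or the BRONCO150 annotation scheme.
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          TrainingArguments, Trainer)
from datasets import Dataset

MODEL_ID = "uklfr/gottbert-base"  # assumed Hugging Face Hub ID for GottBERT
LABELS = ["O", "B-MEDICATION", "I-MEDICATION", "B-DIAGNOSIS", "I-DIAGNOSIS"]
label2id = {l: i for i, l in enumerate(LABELS)}

# Toy stand-in for an annotated clinical corpus (word-level BIO tags).
examples = {
    "tokens": [["Patient", "erhielt", "Ibuprofen", "bei", "Kopfschmerzen", "."]],
    "ner_tags": [[0, 0, 1, 0, 3, 0]],
}
ds = Dataset.from_dict(examples)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def tokenize_and_align(batch):
    """Tokenize pre-split words and align word-level labels to sub-word tokens."""
    enc = tokenizer(batch["tokens"], is_split_into_words=True,
                    truncation=True, padding="max_length", max_length=64)
    all_labels = []
    for i, word_labels in enumerate(batch["ner_tags"]):
        word_ids = enc.word_ids(batch_index=i)
        prev, aligned = None, []
        for wid in word_ids:
            if wid is None:
                aligned.append(-100)              # ignore special/padding tokens
            elif wid != prev:
                aligned.append(word_labels[wid])  # label only the first sub-token
            else:
                aligned.append(-100)              # mask continuation sub-tokens
            prev = wid
        all_labels.append(aligned)
    enc["labels"] = all_labels
    return enc

tokenized = ds.map(tokenize_and_align, batched=True, remove_columns=ds.column_names)

model = AutoModelForTokenClassification.from_pretrained(
    MODEL_ID, num_labels=len(LABELS),
    id2label={i: l for l, i in label2id.items()}, label2id=label2id)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gottbert-clinical-ner",
                           num_train_epochs=1, per_device_train_batch_size=8),
    train_dataset=tokenized,
)
trainer.train()
```

In practice, domain adaptation as evaluated in the study would add a continued pre-training step (masked language modeling on a biomedical corpus) before this fine-tuning stage, and evaluation would use held-out splits of the annotated corpora.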
Keywords
clinical concept extraction; natural language processing; transformer-based models
Year of Publication
2022
Journal Title
JAMIA Open
Volume
5
Issue
4
Article Number
ooac087
eISSN
2574-2531
Page URI
https://pub.uni-bielefeld.de/record/2967396

Cite

Lentzen, M., Madan, S., Lage-Rupprecht, V., Kühnel, L., Fluck, J., Jacobs, M., Mittermaier, M., Witzenrath, M., Brunecker, P., Hofmann-Apitius, M., Weber, J., & Froehlich, H. (2022). Critical assessment of transformer-based AI models for German clinical notes. JAMIA Open, 5(4), ooac087. https://doi.org/10.1093/jamiaopen/ooac087


Sources

PMID: 36380848
PubMed | Europe PMC
