“UDE DIATOMS in the Wild 2024”: a new image dataset of freshwater diatoms for training deep learning models

Venkataramanan A, Kloster M, Burfeid-Castellanos A, Dani M, Mayombo NAS, Vidakovic D, Langenkämper D, Tan M, Pradalier C, Nattkemper TW, Laviale M, et al. (2024)
GigaScience 13.

Journal article | Published | English
 
Authors
Venkataramanan, Aishwarya; Kloster, Michael (UniBi); Burfeid-Castellanos, Andrea; Dani, Mimoza; Mayombo, Ntambwe A S; Vidakovic, Danijela; Langenkämper, Daniel (UniBi); Tan, Mingkun (UniBi); Pradalier, Cedric; Nattkemper, Tim Wilhelm (UniBi); Laviale, Martin; Beszteri, Bánk
Abstract / Note

**Background**
Diatoms are microalgae with finely ornamented microscopic silica shells. Their taxonomic identification by light microscopy is routinely used as part of community ecological research as well as ecological status assessment of aquatic ecosystems, and a need for digitalization of these methods has long been recognized. Alongside their high taxonomic and morphological diversity, several other factors make diatoms highly challenging for deep learning–based identification using light microscopy images. These include (i) an unusually high intraclass variability combined with small between-class differences, (ii) a rather different visual appearance of specimens depending on their orientation on the microscope slide, and (iii) the limited availability of diatom experts for accurate taxonomic annotation.

**Findings**
We present the largest diatom image dataset thus far, aimed at facilitating the application and benchmarking of innovative deep learning methods to the diatom identification problem on realistic research data, “UDE DIATOMS in the Wild 2024.” The dataset contains 83,570 images of 611 diatom taxa, 101 of which are represented by at least 100 examples and 144 by at least 50 examples each. We showcase this dataset in 2 innovative analyses that address individual aspects of the above challenges using subclustering to deal with visually heterogeneous classes, out-of-distribution sample detection, and semi-supervised learning.

**Conclusions**
The problem of image-based identification of diatoms is both important for environmental research and challenging from the machine learning perspective. By making available the so far largest image dataset, accompanied by innovative analyses, this contribution will facilitate addressing these points by the scientific community.
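The class-size figures quoted in the Findings (taxa with at least 100 or at least 50 examples) are the kind of summary that can be derived from a flat list of per-image labels. The following is a minimal sketch of that count, not the authors' code: the function name `taxa_with_at_least` and the toy labels are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def taxa_with_at_least(labels, min_examples):
    """Count the distinct labels that occur at least `min_examples` times."""
    counts = Counter(labels)
    return sum(1 for n in counts.values() if n >= min_examples)


# Toy labels for illustration only (hypothetical, not from the dataset).
toy_labels = ["taxon_a"] * 120 + ["taxon_b"] * 60 + ["taxon_c"] * 7

print(taxa_with_at_least(toy_labels, 100))  # taxa with >= 100 examples
print(taxa_with_at_least(toy_labels, 50))   # taxa with >= 50 examples
```

Applied to the real annotation list, the same two calls would be expected to reproduce the counts stated in the abstract, assuming one label per image.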

Publication year
2024
Journal title
GigaScience
Volume
13
eISSN
2047-217X
Page URI
https://pub.uni-bielefeld.de/record/2994713

Cite

Venkataramanan A, Kloster M, Burfeid-Castellanos A, et al. “UDE DIATOMS in the Wild 2024”: a new image dataset of freshwater diatoms for training deep learning models. GigaScience. 2024;13.
Venkataramanan, A., Kloster, M., Burfeid-Castellanos, A., Dani, M., Mayombo, N. A. S., Vidakovic, D., Langenkämper, D., et al. (2024). “UDE DIATOMS in the Wild 2024”: a new image dataset of freshwater diatoms for training deep learning models. GigaScience, 13. https://doi.org/10.1093/gigascience/giae087
Venkataramanan, Aishwarya, Kloster, Michael, Burfeid-Castellanos, Andrea, Dani, Mimoza, Mayombo, Ntambwe A S, Vidakovic, Danijela, Langenkämper, Daniel, et al. 2024. ““UDE DIATOMS in the Wild 2024”: a new image dataset of freshwater diatoms for training deep learning models”. GigaScience 13.
Venkataramanan, A., Kloster, M., Burfeid-Castellanos, A., Dani, M., Mayombo, N. A. S., Vidakovic, D., Langenkämper, D., Tan, M., Pradalier, C., Nattkemper, T. W., et al. (2024). “UDE DIATOMS in the Wild 2024”: a new image dataset of freshwater diatoms for training deep learning models. GigaScience 13.
Venkataramanan, A., et al., 2024. “UDE DIATOMS in the Wild 2024”: a new image dataset of freshwater diatoms for training deep learning models. GigaScience, 13.
A. Venkataramanan, et al., ““UDE DIATOMS in the Wild 2024”: a new image dataset of freshwater diatoms for training deep learning models”, GigaScience, vol. 13, 2024.
Venkataramanan, A., Kloster, M., Burfeid-Castellanos, A., Dani, M., Mayombo, N.A.S., Vidakovic, D., Langenkämper, D., Tan, M., Pradalier, C., Nattkemper, T.W., Laviale, M., Beszteri, B.: “UDE DIATOMS in the Wild 2024”: a new image dataset of freshwater diatoms for training deep learning models. GigaScience. 13, (2024).
Venkataramanan, Aishwarya, Kloster, Michael, Burfeid-Castellanos, Andrea, Dani, Mimoza, Mayombo, Ntambwe A S, Vidakovic, Danijela, Langenkämper, Daniel, Tan, Mingkun, Pradalier, Cedric, Nattkemper, Tim Wilhelm, Laviale, Martin, and Beszteri, Bánk. ““UDE DIATOMS in the Wild 2024”: a new image dataset of freshwater diatoms for training deep learning models”. GigaScience 13 (2024).
All files are available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC BY 4.0)
Full text(s)
Access level
OA Open Access
Last uploaded
2024-11-29T06:58:26Z
MD5 checksum
4387b3334e7fa73ab280e70e2abe21eb
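A downloaded copy of the full-text file can be checked against the MD5 checksum listed above. The sketch below uses Python's standard `hashlib` module. The helper names and the local file name in the usage note are hypothetical, since the record does not state the file name.

```python
import hashlib

# MD5 checksum as listed in this record.
EXPECTED_MD5 = "4387b3334e7fa73ab280e70e2abe21eb"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("fulltext.pdf")` (with `fulltext.pdf` standing in for wherever the download was saved) returns `True` only if the file is byte-identical to the uploaded version. MD5 is adequate for detecting accidental corruption but is not a defence against deliberate tampering.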


Sources

PMID: 39607983