“UDE DIATOMS in the Wild 2024”: a new image dataset of freshwater diatoms for training deep learning models
Venkataramanan A, Kloster M, Burfeid-Castellanos A, Dani M, Mayombo NAS, Vidakovic D, Langenkämper D, Tan M, Pradalier C, Nattkemper TW, Laviale M, et al. (2024)
GigaScience 13.
**Background**
Diatoms are microalgae with finely ornamented microscopic silica shells. Their taxonomic identification by light microscopy is routinely used as part of community ecological research as well as ecological status assessment of aquatic ecosystems, and a need for digitalization of these methods has long been recognized. Alongside their high taxonomic and morphological diversity, several other factors make diatoms highly challenging for deep learning–based identification using light microscopy images. These include (i) an unusually high intraclass variability combined with small between-class differences, (ii) a rather different visual appearance of specimens depending on their orientation on the microscope slide, and (iii) the limited availability of diatom experts for accurate taxonomic annotation.
**Findings**
We present the largest diatom image dataset thus far, aimed at facilitating the application and benchmarking of innovative deep learning methods to the diatom identification problem on realistic research data, “UDE DIATOMS in the Wild 2024.” The dataset contains 83,570 images of 611 diatom taxa, 101 of which are represented by at least 100 examples and 144 by at least 50 examples each. We showcase this dataset in 2 innovative analyses that address individual aspects of the above challenges using subclustering to deal with visually heterogeneous classes, out-of-distribution sample detection, and semi-supervised learning.
**Conclusions**
The problem of image-based identification of diatoms is both important for environmental research and challenging from the machine learning perspective. By making available the so far largest image dataset, accompanied by innovative analyses, this contribution will facilitate addressing these points by the scientific community.
Zitieren
Daten bereitgestellt von European Bioinformatics Institute (EBI)
Zitationen in Europe PMC
Daten bereitgestellt von Europe PubMed Central.
References
Daten bereitgestellt von Europe PubMed Central.
Markieren/ Markierung löschen
Markierte Publikationen
Web of Science
Dieser Datensatz im Web of Science®PMID: 39607983
PubMed | Europe PMC