Metagenomics-Toolkit: The Flexible and Efficient Cloud-Based Metagenomics Workflow featuring Machine Learning-Enabled Resource Allocation
Belmann P, Osterholz B, Kleinboelting N, Pühler A, Schlüter A, Sczyrba A (2024)
bioRxiv.
Preprint
Published | English
Authors
Belmann, Peter (UniBi);
Osterholz, Benedikt (UniBi);
Kleinboelting, Nils;
Pühler, Alfred (UniBi);
Schlüter, Andreas (UniBi);
Sczyrba, Alexander (UniBi)
Abstract / Note
The metagenome analysis of complex environments with thousands of datasets, such as those available in the Sequence Read Archive, requires immense computational resources to complete within an acceptable time frame. Such large-scale analyses require that the underlying infrastructure is used efficiently. In addition, any analysis should be fully reproducible, and the workflow must be publicly available to allow other researchers to understand the reasoning behind computed results. Here, we introduce the Metagenomics-Toolkit, a scalable, data-agnostic workflow that automates the analysis of short and long metagenomic reads obtained from Illumina or Oxford Nanopore Technologies devices, respectively. The Metagenomics-Toolkit offers not only standard features expected in a metagenome workflow, such as quality control, assembly, binning, and annotation, but also distinctive features, such as plasmid identification based on various tools, the recovery of unassembled microbial community members, and the discovery of microbial interdependencies through a combination of dereplication, co-occurrence, and genome-scale metabolic modeling. Furthermore, the Metagenomics-Toolkit includes a machine learning-optimized assembly step that tailors the peak RAM value requested by a metagenome assembler to match actual requirements, thereby minimizing the dependency on dedicated high-memory hardware. While the Metagenomics-Toolkit can be executed on user workstations, it also offers several optimizations for efficient cloud-based cluster execution. We compare the Metagenomics-Toolkit to five commonly used metagenomics workflows and demonstrate its capabilities by executing it on 757 metagenome datasets from sewage samples to investigate a possible sewage core microbiome. The Metagenomics-Toolkit is open source and available at https://github.com/metagenomics/metagenomics-tk.
Year of publication
2024
Journal title
bioRxiv
Page URI
https://pub.uni-bielefeld.de/record/2993878
Cite
Belmann P, Osterholz B, Kleinboelting N, Pühler A, Schlüter A, Sczyrba A. Metagenomics-Toolkit: The Flexible and Efficient Cloud-Based Metagenomics Workflow featuring Machine Learning-Enabled Resource Allocation. bioRxiv. 2024.
Belmann, P., Osterholz, B., Kleinboelting, N., Pühler, A., Schlüter, A., & Sczyrba, A. (2024). Metagenomics-Toolkit: The Flexible and Efficient Cloud-Based Metagenomics Workflow featuring Machine Learning-Enabled Resource Allocation. bioRxiv. https://doi.org/10.1101/2024.10.22.619569