On the use of sequence-quality information in OTU clustering

Müller, Robert; Nebel, Markus

Müller R, Nebel M (2021)
PeerJ 9: e11717.

Journal article | Published | English

Download

No files have been uploaded. Publication record only.

DOI

https://doi.org/10.7717/peerj.11717

Authors

Müller, Robert (UniBi); Nebel, Markus (UniBi)

Institution

Faculty of Technology (Technische Fakultät) > AG Algorithmik und Bioinformatik (Algorithms and Bioinformatics group)

Abstract / Notes

Background

High-throughput sequencing has become an essential technology in life science research. Despite continuous technological improvements, the sequences produced are still not entirely accurate. Consequently, sequences are usually annotated with per-base error probabilities. This quality information is already used to find better solutions to a number of bioinformatics problems (e.g. read mapping). Data-processing pipelines benefit in particular, especially when the quality information is incorporated early, since better results in one step can improve all subsequent ones. Preprocessing steps therefore regularly consider sequence quality to fix errors or discard low-quality data. Other steps, however, such as clustering sequences into operational taxonomic units (OTUs), a common task in the analysis of microbial communities, are typically performed without using the available quality information.

Results

In this paper, we present quality-aware clustering methods inspired by quality-weighted alignments and model-based denoising, and explore their applicability to OTU clustering. We implemented the quality-aware methods in a revised version of our de novo clustering tool GeFaST and evaluated their clustering quality and performance on mock-community data sets. Quality-weighted alignments improved the clustering quality of GeFaST by up to 10%. The model-supported methods gave a more mixed picture, suggesting narrower applicability, but they attained similar improvements. Considering the quality information increased both runtime and memory consumption, although the runtime increase depended heavily on the applied method and clustering threshold.

Conclusions

The quality-aware methods extend the iterative de novo clustering approach with new clustering and cluster-refinement methods. Our results indicate that OTU clustering is yet another analysis step that benefits from integrating quality information. Beyond the potential demonstrated here, the quality-aware methods offer a range of opportunities for fine-tuning and further extension.
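
To illustrate the idea behind quality-weighted alignment mentioned in the abstract, the following minimal Python sketch decodes Phred+33 quality strings, converts each score Q to an error probability P = 10^(-Q/10), and uses those probabilities to down-weight mismatches in a simple edit-distance computation. The weighting function (a mismatch costs the probability that both bases were called correctly) and all function names are hypothetical choices made for illustration only; this is not the scoring scheme implemented in GeFaST, which the abstract does not specify.

```python
from typing import List


def decode_phred33(quality_string: str) -> List[int]:
    """Decode a FASTQ quality string with the common Phred+33 offset."""
    return [ord(c) - 33 for c in quality_string]


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mismatch_cost(qa: int, qb: int) -> float:
    """Hypothetical weighting: a mismatch costs the probability that both
    bases were called correctly, so mismatches involving a low-quality
    base are penalised less than mismatches between confident calls."""
    return (1 - phred_to_error_prob(qa)) * (1 - phred_to_error_prob(qb))


def quality_weighted_distance(a: str, qual_a: str, b: str, qual_b: str,
                              gap_cost: float = 1.0) -> float:
    """Edit distance whose substitution cost depends on base qualities.

    Gaps keep a fixed cost here; weighting them as well would be
    another (equally hypothetical) design choice."""
    qa, qb = decode_phred33(qual_a), decode_phred33(qual_b)
    n, m = len(a), len(b)
    # d[i][j] = cheapest cost of aligning a[:i] with b[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap_cost
    for j in range(1, m + 1):
        d[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                sub = 0.0
            else:
                sub = mismatch_cost(qa[i - 1], qb[j - 1])
            d[i][j] = min(d[i - 1][j - 1] + sub,   # match / mismatch
                          d[i - 1][j] + gap_cost,  # gap in b
                          d[i][j - 1] + gap_cost)  # gap in a
    return d[n][m]


if __name__ == "__main__":
    # Same mismatch (G vs C at the third position), once with a confident
    # call and once with a low-quality call in the first read.
    print(quality_weighted_distance("ACGT", "IIII", "ACCT", "IIII"))
    print(quality_weighted_distance("ACGT", "II#I", "ACCT", "IIII"))
```

With uniformly high quality ('I', Q = 40) the single mismatch between ACGT and ACCT costs almost a full unit (about 0.9998), whereas the same mismatch at a low-quality position ('#', Q = 2) costs about 0.37. Under this weighting, the two reads would look closer and be more likely to end up in the same cluster at a given distance threshold.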

Year of publication

2021

Journal title

PeerJ

Volume

9

Article no.

e11717

Copyright / Licences

Creative Commons Attribution 4.0 International Public License (CC BY 4.0)

eISSN

2167-8359

Page URI

https://pub.uni-bielefeld.de/record/2956771

Cite

Müller R, Nebel M. On the use of sequence-quality information in OTU clustering. PeerJ. 2021;9:e11717.

Müller, R., & Nebel, M. (2021). On the use of sequence-quality information in OTU clustering. PeerJ, 9, e11717. https://doi.org/10.7717/peerj.11717

Müller, Robert, and Nebel, Markus. 2021. “On the use of sequence-quality information in OTU clustering”. PeerJ 9: e11717.

Müller, R., and Nebel, M. (2021). On the use of sequence-quality information in OTU clustering. PeerJ 9:e11717.

Müller, R., & Nebel, M., 2021. On the use of sequence-quality information in OTU clustering. PeerJ, 9: e11717.

R. Müller and M. Nebel, “On the use of sequence-quality information in OTU clustering”, PeerJ, vol. 9, 2021, e11717.

Müller, R., Nebel, M.: On the use of sequence-quality information in OTU clustering. PeerJ. 9, e11717 (2021).

Müller, Robert, and Nebel, Markus. “On the use of sequence-quality information in OTU clustering”. PeerJ 9 (2021): e11717.

Sources

PMID: 34458017
PubMed | Europe PMC

PUB - Publications at Bielefeld University