Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach"

Müller R, Nebel M (2018)
Bielefeld University.

Data publication
 
Download
OA 62.08 MB
OA eldermet_subsamples_5.tar.bz2 11.03 MB
OA eldermet_subsamples_10.tar.bz2 22.05 MB
Abstract / Notes
Data sets and results of the comparative analyses of [GeFaST](https://github.com/romueller/gefast) performed in "[GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2349-1)". The scripts for the analyses are available [here](https://github.com/romueller/gefast-paper-analysis).

**dereplicated.tar.bz2**: Data sets used in the analyses of performance (_ELDERMET_ [1]) and clustering quality (_even_ & _uneven_ [2], _ELDERMET_). The original data sets (see below) have been dereplicated, and sequences containing ambiguous bases (IUPAC code _n_ or _N_) have been removed.
* _even_: [http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/even.fasta.bz2](http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/even.fasta.bz2)
* _uneven_: [http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/uneven.fasta.bz2](http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/uneven.fasta.bz2)
* _ELDERMET_: [http://www.ebi.ac.uk/ena/data/view/SRP003158](http://www.ebi.ac.uk/ena/data/view/SRP003158)

The analysis of the clustering quality also requires the [reference data set](https://raw.githubusercontent.com/torognes/vsearch-eval/master/cluster/data/rrna_reference.fasta).
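The preprocessing described for _dereplicated.tar.bz2_ (dereplication plus removal of sequences with ambiguous bases) can be illustrated with a minimal sketch. This is not the authors' script (those are in the linked analysis repository). The function names `parse_fasta` and `dereplicate`, the case-insensitive comparison, and the ordering by decreasing abundance are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """Merge identical sequences and drop those containing n/N.

    Returns (first header seen, sequence, abundance) tuples sorted by
    decreasing abundance (assumed ordering; ties keep first-seen order).
    """
    seen: Dict[str, List] = {}
    for header, seq in records:
        seq = seq.upper()  # assumption: compare sequences case-insensitively
        if "N" in seq:     # ambiguous base, sequence is discarded
            continue
        if seq in seen:
            seen[seq][1] += 1
        else:
            seen[seq] = [header, 1]
    merged = [(h, s, n) for s, (h, n) in seen.items()]
    merged.sort(key=lambda rec: -rec[2])
    return merged


demo = [">a", "ACGT", ">b", "acgt", ">c", "ACNT", ">d", "GGCC", ">e", "ACGT"]
for header, seq, abundance in dereplicate(parse_fasta(demo)):
    print(header, seq, abundance)
```

How the abundance is written back into the FASTA header depends on the clustering tool, so the sketch returns it as a separate value.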

**eldermet\_subsamples\_X.tar.bz2**: Each archive contains three random subsamples of _ELDERMET_ of size X, with X being the percentage of sequences from _eldermet\_derep.fasta_ (in _dereplicated.tar.bz2_) in the subsample.

**uneven\_subsamples\_80.tar.bz2**: Archive containing five random subsamples of _uneven_, each containing 80 % of the sequences from _uneven\_derep.fasta_ (in _dereplicated.tar.bz2_).

**even\_subsamples\_80.tar.bz2**: Archive containing five random subsamples of _even_, each containing 80 % of the sequences from _even\_derep.fasta_ (in _dereplicated.tar.bz2_).

**eldermet\_reduced\_subsamples\_80.tar.bz2**: Archive containing the reduced _ELDERMET_ data set and five random subsamples of it, each containing 80 % of the sequences from _eldermet\_derep.reduced.fasta_, plus the corresponding taxonomic assignments.
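The subsample archives above each hold several random subsets containing a fixed percentage of the sequences in a dereplicated file. The following sketch shows one way such a subset could be drawn. It assumes sampling without replacement over the dereplicated records and a fixed seed per subsample. The helper name `subsample` and the seed values are hypothetical and do not reproduce the published subsamples.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample(records: Sequence[T], percent: float, seed: int) -> List[T]:
    """Return `percent` % of the records, chosen at random without replacement.

    The original record order is preserved in the result.
    """
    rng = random.Random(seed)  # dedicated generator for reproducibility
    k = round(len(records) * percent / 100)
    chosen = sorted(rng.sample(range(len(records)), k))
    return [records[i] for i in chosen]


# Placeholder records standing in for (header, sequence) pairs.
records = [f"seq{i}" for i in range(1000)]

# Five subsamples of 80 %, one per (hypothetical) seed.
subsamples = [subsample(records, 80, seed) for seed in range(5)]
print([len(s) for s in subsamples])
```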

**results.tar.bz2**: Result files containing the measurements of performance and clustering quality.
* _eldermet-performance-measurements.csv_: runtime and memory consumption for different thresholds
* _eldermet-subsampling-measurements.csv_: runtime and memory consumption for different data set sizes
* _eldermet-sub-fixed-red-log.csv_: runtime and memory consumption for different thresholds (on subsamples of reduced data set)
* _eldermet-sub-fixed-red-metrics.csv_: clustering quality for different thresholds (on subsamples of reduced data set)
* _even\_0.95-metrics.csv_: clustering quality for different thresholds, 95 % ground truth
* _even\_0.97-metrics.csv_: clustering quality for different thresholds, 97 % ground truth
* _even\_0.99-metrics.csv_: clustering quality for different thresholds, 99 % ground truth
* _uneven\_0.95-metrics.csv_: clustering quality for different thresholds, 95 % ground truth
* _uneven\_0.97-metrics.csv_: clustering quality for different thresholds, 97 % ground truth
* _uneven\_0.99-metrics.csv_: clustering quality for different thresholds, 99 % ground truth
* _uneven-sub-fixed-metrics.csv_: clustering quality for different thresholds (on subsamples)
* _even-sub-fixed-metrics.csv_: clustering quality for different thresholds (on subsamples)
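To get a first overview of the result files without unpacking the archive, the CSV members can be read directly from the bzip2-compressed tarball. The sketch below only reports the first row of each CSV file. It assumes comma-separated files with UTF-8 encoding, and it makes no assumption about column names or the directory layout inside the archive. The helper name `csv_headers` is hypothetical.

```python
import csv
import io
import tarfile
from typing import Dict, List


def csv_headers(archive_path: str) -> Dict[str, List[str]]:
    """Map each CSV member of a .tar.bz2 archive to its first row."""
    headers: Dict[str, List[str]] = {}
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        for member in archive.getmembers():
            if not (member.isfile() and member.name.endswith(".csv")):
                continue
            raw = archive.extractfile(member)
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # Assumption: comma-delimited; an empty file yields [].
            headers[member.name] = next(csv.reader(text), [])
    return headers
```

Calling `csv_headers("results.tar.bz2")` on the downloaded archive would then list which columns each measurement file provides.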

References:

[1] Claesson M.J., Cusack S., O'Sullivan O., Greene-Diniz R., de Weerd H., Flannery E., Marchesi J.R., Falush D., Dinan T., Fitzgerald G., Stanton C., van Sinderen D., O'Connor M., Harnedy N., O'Connor K., Henry C., O'Mahony D., Fitzgerald A.P., Shanahan F., Twomey C., Hill C., Ross R.P., O'Toole P.W.: Composition, variability, and temporal stability of the intestinal microbiota of the elderly. Proceedings of the National Academy of Sciences 108 (Supplement 1), 4586-4591 (2011). doi: [10.1073/pnas.1000097107](https://doi.org/10.1073/pnas.1000097107)

[2] Mahé F., Rognes T., Quince C., de Vargas C., Dunthorn M.: Swarm: robust and fast clustering method for amplicon-based studies. PeerJ 2, 593 (2014). doi: [10.7717/peerj.593](https://doi.org/10.7717/peerj.593)

Year of publication
2018
Copyright and licenses
Page URI
https://pub.uni-bielefeld.de/record/2918928

Cite

Müller R, Nebel M. Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University; 2018.
Müller, R., & Nebel, M. (2018). Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University. https://doi.org/10.4119/unibi/2918928
Müller, Robert, and Nebel, Markus. 2018. Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University.
Müller, R., and Nebel, M. (2018). Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University.
Müller, R., & Nebel, M., 2018. Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach", Bielefeld University.
R. Müller and M. Nebel, Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach", Bielefeld University, 2018.
Müller, R., Nebel, M.: Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University (2018).
Müller, Robert, and Nebel, Markus. Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University, 2018.
All files are available under the following license(s):
Full text(s)
Access level for all 25 listed files: OA (Open Access)
Last uploaded (all files): 2019-09-06T09:18:58Z
MD5 checksums, in listing order:
* 150e9e9a8ef7e2f0aaad5b1b1235010b
* c6dd1cfa37d0161c004b72b28f819511
* c689defcf74b1bcd8da8d23911a34b05
* 48dfec5f54580b35134887a7c047606a
* 6c1de7f90195215f188ee79b7746e0e1
* 6c575d4c49ab9bab518fa711b6ceb2d2
* a2c93cf19d9469dd4d439923f4bb17b5
* 5ba90afc833f8a67dd9ea3f4ca06614b
* 97d1d6e8a6976c866fe77981f822a483
* c10b0add3364da0242664403b53b9082
* ccfd6baa5215dfb8f811d4913358bba1
* 96fe6feb471b772ff1070bebe47bee40
* 50f2ef13058193c8c0451bea4ec187a1
* 4931149ede73efa364f0d2687810197e
* 57862cf2129e84effe9149ca0a2dff65
* 1cc1663a043dec85f62afbd1d182a66f
* 55fbee72aa3804d6c910d5ac1f9f7e74
* fdb91e96f120a8fd1c1c58a03540412d
* acd9ff9e3e8d4c8d868c92735b3057f0
* ba37acac4baba5a8f03b16fba8aa5021
* 80c2bf23dc9baed9131c8dd95bed0e69
* eacd41b4d2efc162030c53b9b4726801
* 4442052e62cdd9c8f9990cf1700068cd
* 8b681f9a7ab1ae15370be2d26e402e16
* 94a5cca09e90528cea373e0bcc0ef41a
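
The MD5 checksums listed above can be used to check that a download is intact. Because this listing does not pair the checksums with file names, the sketch below only tests whether a file's digest appears anywhere in the published set. The helper names `md5_of_file` and `matches_published` are hypothetical, and the set shown is abbreviated to two of the values above.

```python
import hashlib
from typing import Iterable


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published: Iterable[str]) -> bool:
    """Return True if the file's MD5 digest is among the published values."""
    return md5_of_file(path) in {value.lower() for value in published}


# Two of the checksums from the record, used here as an abbreviated set.
PUBLISHED_MD5 = {
    "150e9e9a8ef7e2f0aaad5b1b1235010b",
    "94a5cca09e90528cea373e0bcc0ef41a",
}
```

A call such as `matches_published("results.tar.bz2", PUBLISHED_MD5)` would then confirm or reject a downloaded archive, provided the full list of checksums is supplied.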

