Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach"

Müller R, Nebel M (2018): Bielefeld University. doi:10.4119/unibi/2918928.

Download
OA 62.08 MB
OA eldermet_subsamples_5.tar.bz2 11.03 MB
OA eldermet_subsamples_10.tar.bz2 22.05 MB
Research Data
Abstract
Data sets and results of the comparative analyses of [GeFaST](https://github.com/romueller/gefast) performed in "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach" (submitted). The scripts used for the analyses are available in the [gefast-paper-analysis repository](https://github.com/romueller/gefast-paper-analysis).

**dereplicated.tar.bz2**: Data sets used in the analyses of performance (_ELDERMET_ [1]) and clustering quality (_even_ & _uneven_ [2], _ELDERMET_). The original data sets (see below) have been dereplicated, and sequences containing ambiguous bases (IUPAC code _n_ or _N_) have been removed.
* _even_: [http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/even.fasta.bz2](http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/even.fasta.bz2)
* _uneven_: [http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/uneven.fasta.bz2](http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/uneven.fasta.bz2)
* _ELDERMET_: [http://www.ebi.ac.uk/ena/data/view/SRP003158](http://www.ebi.ac.uk/ena/data/view/SRP003158)

The analysis of the clustering quality also requires the [reference data set](https://raw.githubusercontent.com/torognes/vsearch-eval/master/cluster/data/rrna_reference.fasta).
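The preprocessing described for _dereplicated.tar.bz2_ (collapse identical sequences, drop any sequence containing an ambiguous base) can be illustrated with the minimal Python sketch below. It is not the authors' script; it assumes plain FASTA input and case-insensitive comparison of sequences. The function names, the `seqN` headers and the `;size=N;` abundance annotation (a common USEARCH/VSEARCH-style convention) are hypothetical choices for illustration only.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # multi-line sequences are concatenated
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Collapse identical sequences and drop those with ambiguous bases (n/N).

    Returns (sequence, abundance) pairs, most abundant first; ties are
    broken by sequence so that the output is deterministic.
    """
    counts = Counter()
    for _, seq in records:
        seq = seq.upper()  # assumption: comparison is case-insensitive
        if "N" in seq:
            continue  # ambiguous base (n or N): discard the sequence
        counts[seq] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def to_fasta(derep: List[Tuple[str, int]]) -> str:
    """Format dereplicated sequences with a hypothetical ';size=N;' annotation."""
    return "".join(
        f">seq{i};size={size};\n{seq}\n"
        for i, (seq, size) in enumerate(derep, start=1)
    )


if __name__ == "__main__":
    demo = [">r1", "ACGT", ">r2", "acgt", ">r3", "ACNT", ">r4", "GGCC"]
    print(to_fasta(dereplicate(read_fasta(demo))), end="")
```

In this toy input, `ACGT` and `acgt` collapse into one entry of abundance 2, while `ACNT` is discarded because it contains an _N_.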

**eldermet\_subsamples\_X.tar.bz2**: Each archive contains three random subsamples of _ELDERMET_, each containing X % of the sequences from _eldermet\_derep.fasta_ (in _dereplicated.tar.bz2_).
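Drawing a subsample that contains a fixed percentage of the sequences of a dereplicated file can be sketched as follows. This is a minimal illustration, not the procedure used to build the archives: it assumes uniform sampling without replacement over FASTA records, rounding of the target count with Python's `round()`, and one seed per replicate. The names `subsample` and `replicate_subsamples` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Record = Tuple[str, str]  # (header, sequence)


def subsample(records: Sequence[Record], percent: float, seed: int) -> List[Record]:
    """Draw a uniform random subset containing `percent` % of the records.

    Sampling is without replacement and the selected records keep their
    original order. Rounding the target size with round() is an assumption.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie in [0, 100]")
    k = round(len(records) * percent / 100)
    rng = random.Random(seed)  # independent, reproducible generator per subsample
    chosen = sorted(rng.sample(range(len(records)), k))
    return [records[i] for i in chosen]


def replicate_subsamples(
    records: Sequence[Record], percent: float, n: int, base_seed: int = 0
) -> List[List[Record]]:
    """Return n independent subsamples, e.g. three per ELDERMET archive."""
    return [subsample(records, percent, base_seed + i) for i in range(n)]


if __name__ == "__main__":
    reads = [(f"s{i}", "ACGT") for i in range(200)]
    for rep in replicate_subsamples(reads, percent=5, n=3):
        print(len(rep))  # 10 records per subsample (5 % of 200)
```

Fixing the seed per replicate keeps each subsample reproducible while still making the replicates differ from one another.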

**uneven\_subsamples\_80.tar.bz2**: Archive containing five random subsamples of _uneven_, each containing 80 % of the sequences from _uneven\_derep.fasta_ (in _dereplicated.tar.bz2_).

**even\_subsamples\_80.tar.bz2**: Archive containing five random subsamples of _even_, each containing 80 % of the sequences from _even\_derep.fasta_ (in _dereplicated.tar.bz2_).

**eldermet\_reduced\_subsamples\_80.tar.bz2**: Archive containing the reduced _ELDERMET_ data set and five random subsamples of it, each containing 80 % of the sequences from _eldermet\_derep.reduced.fasta_, plus the corresponding taxonomic assignments.

**results.tar.bz2**: Result files containing the measurements of performance and clustering quality:
* _eldermet-performance-measurements.csv_: runtime and memory consumption for different thresholds
* _eldermet-subsampling-measurements.csv_: runtime and memory consumption for different data set sizes
* _eldermet-sub-fixed-red-log.csv_: runtime and memory consumption for different thresholds (on subsamples of reduced data set)
* _eldermet-sub-fixed-red-metrics.csv_: clustering quality for different thresholds (on subsamples of reduced data set)
* _even\_0.95-metrics.csv_: clustering quality for different thresholds, 95 % ground truth
* _even\_0.97-metrics.csv_: clustering quality for different thresholds, 97 % ground truth
* _even\_0.99-metrics.csv_: clustering quality for different thresholds, 99 % ground truth
* _uneven\_0.95-metrics.csv_: clustering quality for different thresholds, 95 % ground truth
* _uneven\_0.97-metrics.csv_: clustering quality for different thresholds, 97 % ground truth
* _uneven\_0.99-metrics.csv_: clustering quality for different thresholds, 99 % ground truth
* _uneven-sub-fixed-metrics.csv_: clustering quality for different thresholds (on subsamples)
* _even-sub-fixed-metrics.csv_: clustering quality for different thresholds (on subsamples)

References:

[1] Claesson M.J., Cusack S., O'Sullivan O., Greene-Diniz R., de Weerd H., Flannery E., Marchesi J.R., Falush D., Dinan T., Fitzgerald G., Stanton C., van Sinderen D., O'Connor M., Harnedy N., O'Connor K., Henry C., O'Mahony D., Fitzgerald A.P., Shanahan F., Twomey C., Hill C., Ross R.P., O'Toole P.W.: Composition, variability, and temporal stability of the intestinal microbiota of the elderly. Proceedings of the National Academy of Sciences 108 (Supplement 1), 4586-4591 (2011). doi: [10.1073/pnas.1000097107](https://doi.org/10.1073/pnas.1000097107)

[2] Mahé F., Rognes T., Quince C., de Vargas C., Dunthorn M.: Swarm: robust and fast clustering method for amplicon-based studies. PeerJ 2, 593 (2014). doi: [10.7717/peerj.593](https://doi.org/10.7717/peerj.593)
Data Re-Use License
This data set, Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach", is made available under the Open Database License (http://opendatacommons.org/licenses/odbl/1.0). Any rights in individual contents of the database are licensed under the Database Contents License (http://opendatacommons.org/licenses/dbcl/1.0/).

Cite this

Müller R, Nebel M (2018): Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University. doi:10.4119/unibi/2918928.
Müller, R., & Nebel, M. (2018). Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University. doi:10.4119/unibi/2918928
Müller, R., and Nebel, M. (2018). Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University. doi:10.4119/unibi/2918928.
Müller, R., & Nebel, M., 2018. Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University. doi:10.4119/unibi/2918928
R. Müller and M. Nebel, Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University, 2018. doi:10.4119/unibi/2918928.
Müller, R., Nebel, M.: Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University (2018). doi:10.4119/unibi/2918928.
Müller, Robert, and Nebel, Markus. Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University, 2018. doi:10.4119/unibi/2918928
All files are Open Access; they were last uploaded between 2018-03-30T21:56:59Z and 2018-03-31T03:39:25Z.
