Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach"

Müller R, Nebel M (2018): Bielefeld University. doi:10.4119/unibi/2918928.

Download
OA 62.08 MB
OA eldermet_subsamples_5.tar.bz2 11.03 MB
OA eldermet_subsamples_10.tar.bz2 22.05 MB
Data publication
Abstract / Note
Data sets and results of the comparative analyses of [GeFaST](https://github.com/romueller/gefast) performed in "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach" (submitted). The scripts for the analyses are available [here](https://github.com/romueller/gefast-paper-analysis).

**dereplicated.tar.bz2**: Data sets used in the analyses of performance (_ELDERMET_ [1]) and clustering quality (_even_ & _uneven_ [2], _ELDERMET_). The original data sets (see below) have been dereplicated, and sequences containing ambiguous bases (IUPAC code _n_ or _N_) have been removed.
* _even_: [http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/even.fasta.bz2](http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/even.fasta.bz2)
* _uneven_: [http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/uneven.fasta.bz2](http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/uneven.fasta.bz2)
* _ELDERMET_: [http://www.ebi.ac.uk/ena/data/view/SRP003158](http://www.ebi.ac.uk/ena/data/view/SRP003158)

The analysis of the clustering quality also requires the [reference data set](https://raw.githubusercontent.com/torognes/vsearch-eval/master/cluster/data/rrna_reference.fasta).
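The preprocessing described above (dereplication plus removal of sequences with ambiguous bases) can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which is available in the linked analysis repository. The function names `read_fasta` and `dereplicate`, the case-insensitive comparison, and the Swarm-style `_abundance` header suffix in the demo output are assumptions made for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(records):
    """Collapse identical sequences and drop those containing n/N.

    Returns (first_header, sequence, abundance) tuples sorted by
    decreasing abundance; ties keep first-seen order.
    """
    counts = {}
    for header, seq in records:
        seq = seq.upper()  # assumption: comparison is case-insensitive
        if "N" in seq:     # ambiguous base, sequence is discarded
            continue
        if seq in counts:
            counts[seq][1] += 1
        else:
            counts[seq] = [header, 1]
    ordered = sorted(counts.items(), key=lambda item: -item[1][1])
    return [(h, s, n) for s, (h, n) in ordered]


if __name__ == "__main__":
    demo = io.StringIO(">a\nACGT\n>b\nACNT\n>c\nacgt\n")
    for header, seq, abundance in dereplicate(read_fasta(demo)):
        # assumed Swarm-style header: identifier, underscore, abundance
        print(f">{header}_{abundance}\n{seq}")
```

Keeping the first-seen header as the representative identifier is one possible convention; the published files may use a different one.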

**eldermet\_subsamples\_X.tar.bz2**: Each archive contains three random subsamples of _ELDERMET_ of size X, with X being the percentage of sequences from _eldermet\_derep.fasta_ (in _dereplicated.tar.bz2_) in the subsample.

**uneven\_subsamples\_80.tar.bz2**: Archive containing five random subsamples of _uneven_, each containing 80 % of the sequences from _uneven\_derep.fasta_ (in _dereplicated.tar.bz2_).

**even\_subsamples\_80.tar.bz2**: Archive containing five random subsamples of _even_, each containing 80 % of the sequences from _even\_derep.fasta_ (in _dereplicated.tar.bz2_).

**eldermet\_reduced\_subsamples\_80.tar.bz2**: Archive containing the reduced _ELDERMET_ data set and five random subsamples of it, each containing 80 % of the sequences from _eldermet\_derep.reduced.fasta_, plus the corresponding taxonomic assignments.
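As a rough illustration of how subsamples like those in the archives above could be drawn, the following Python sketch selects a given percentage of records uniformly at random without replacement. The record does not state how the subsamples were actually generated, so the uniform sampling over dereplicated sequences, the preserved input order, the seeds, and the function name `subsample` are all assumptions.

```python
import random


def subsample(records, percent, seed):
    """Return `percent` % of the records, drawn without replacement.

    The selected records keep their original relative order.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    records = list(records)
    k = round(len(records) * percent / 100)
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    chosen = sorted(rng.sample(range(len(records)), k))
    return [records[i] for i in chosen]


if __name__ == "__main__":
    sequences = [f"seq{i}" for i in range(10)]
    # hypothetical: five subsamples of 80 %, one seed per subsample
    for seed in range(5):
        print(seed, subsample(sequences, 80, seed))
```

Using one seed per subsample keeps each draw independent and repeatable, which is convenient when the same subsamples are clustered with several tools.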

**results.tar.bz2**: Result files containing the performance and clustering-quality measurements.
* _eldermet-performance-measurements.csv_: runtime and memory consumption for different thresholds
* _eldermet-subsampling-measurements.csv_: runtime and memory consumption for different data set sizes
* _eldermet-sub-fixed-red-log.csv_: runtime and memory consumption for different thresholds (on subsamples of reduced data set)
* _eldermet-sub-fixed-red-metrics.csv_: clustering quality for different thresholds (on subsamples of reduced data set)
* _even\_0.95-metrics.csv_: clustering quality for different thresholds, 95 % ground truth
* _even\_0.97-metrics.csv_: clustering quality for different thresholds, 97 % ground truth
* _even\_0.99-metrics.csv_: clustering quality for different thresholds, 99 % ground truth
* _uneven\_0.95-metrics.csv_: clustering quality for different thresholds, 95 % ground truth
* _uneven\_0.97-metrics.csv_: clustering quality for different thresholds, 97 % ground truth
* _uneven\_0.99-metrics.csv_: clustering quality for different thresholds, 99 % ground truth
* _uneven-sub-fixed-metrics.csv_: clustering quality for different thresholds (on subsamples)
* _even-sub-fixed-metrics.csv_: clustering quality for different thresholds (on subsamples)
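Because the column layout of the result files is not described here, a first step when working with them might be to inspect the header rows directly from the archive. The sketch below does this with the Python standard library; it assumes that the files are comma-separated, UTF-8 encoded, and start with a header row, none of which is confirmed by this record. The function name `csv_headers` is hypothetical.

```python
import csv
import io
import tarfile


def csv_headers(archive_path):
    """Map each CSV member of a .tar.bz2 archive to its first row."""
    headers = {}
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        for member in archive.getmembers():
            if not (member.isfile() and member.name.endswith(".csv")):
                continue
            raw = archive.extractfile(member)
            # assumption: UTF-8 text with a comma delimiter
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            headers[member.name] = next(csv.reader(text), [])
    return headers


if __name__ == "__main__":
    # expects results.tar.bz2 in the current working directory
    for name, columns in csv_headers("results.tar.bz2").items():
        print(name, columns)
```

Reading members through `extractfile` avoids unpacking the archive to disk, which is enough for a quick look at the available columns.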

References:

[1] Claesson M.J., Cusack S., O'Sullivan O., Greene-Diniz R., de Weerd H., Flannery E., Marchesi J.R., Falush D., Dinan T., Fitzgerald G., Stanton C., van Sinderen D., O'Connor M., Harnedy N., O'Connor K., Henry C., O'Mahony D., Fitzgerald A.P., Shanahan F., Twomey C., Hill C., Ross R.P., O'Toole P.W.: Composition, variability, and temporal stability of the intestinal microbiota of the elderly. Proceedings of the National Academy of Sciences 108 (Supplement 1), 4586-4591 (2011). doi: [10.1073/pnas.1000097107](https://doi.org/10.1073/pnas.1000097107)

[2] Mahé F., Rognes T., Quince C., de Vargas C., Dunthorn M.: Swarm: robust and fast clustering method for amplicon-based studies. PeerJ 2, 593 (2014). doi: [10.7717/peerj.593](https://doi.org/10.7717/peerj.593)

Cite

Müller R, Nebel M. (2018): Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University. doi:10.4119/unibi/2918928.
Full text(s)
Access Level
OA Open Access
Last uploaded
2018-03-30T22:05:53Z
