Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach"
Müller R, Nebel M (2018)
Bielefeld University.
Data publication
Abstract / Note
Data sets and results of the comparative analyses of [GeFaST](https://github.com/romueller/gefast) performed in "[GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2349-1)". The analysis scripts are available in the [gefast-paper-analysis repository](https://github.com/romueller/gefast-paper-analysis).
**dereplicated.tar.bz2**: Data sets used in the analyses of performance (_ELDERMET_ [1]) and clustering quality (_even_ & _uneven_ [2], _ELDERMET_). The original data sets (see below) were dereplicated, and sequences containing ambiguous bases (IUPAC code _n_ or _N_) were removed.

* _even_: [http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/even.fasta.bz2](http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/even.fasta.bz2)
* _uneven_: [http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/uneven.fasta.bz2](http://sbr2.sb-roscoff.fr/download/externe/de/fmahe/uneven.fasta.bz2)
* _ELDERMET_: [http://www.ebi.ac.uk/ena/data/view/SRP003158](http://www.ebi.ac.uk/ena/data/view/SRP003158)

The analysis of the clustering quality also requires the [reference data set](https://raw.githubusercontent.com/torognes/vsearch-eval/master/cluster/data/rrna_reference.fasta).
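A minimal Python sketch of the preprocessing described for _dereplicated.tar.bz2_ (dereplication plus removal of sequences containing _n_ or _N_). It is not the authors' script (their analysis scripts live in the gefast-paper-analysis repository) and it rests on assumptions: input is plain FASTA, sequences are compared case-insensitively, and output headers follow a hypothetical `ampN;size=M` scheme in the style of VSEARCH's `--sizeout` annotation. Whether the published `*_derep.fasta` files use the same header format is not stated here.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Collapse identical sequences and drop those with ambiguous bases.

    Assumption: sequences are upper-cased first, so 'acgt' and 'ACGT' count
    as the same amplicon and both 'n' and 'N' are caught by one check.
    Returns (sequence, abundance) pairs, most abundant first.
    """
    counts: Counter = Counter()
    for _, seq in records:
        seq = seq.upper()
        if "N" in seq:
            continue  # ambiguous base (IUPAC n/N): discard the whole sequence
        counts[seq] += 1
    # Decreasing abundance, ties broken by sequence for a deterministic order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def to_fasta(derep: List[Tuple[str, int]]) -> str:
    """Render dereplicated sequences with hypothetical 'ampN;size=M' headers."""
    return "".join(
        f">amp{i};size={size}\n{seq}\n" for i, (seq, size) in enumerate(derep, start=1)
    )


if __name__ == "__main__":
    # Toy input: ACGT occurs twice (differing only in case), r3 is ambiguous,
    # and r4 is split over two lines.
    toy = [">r1", "ACGT", ">r2", "acgt", ">r3", "ACNT", ">r4", "GG", "CC"]
    print(to_fasta(dereplicate(read_fasta(toy))), end="")
```

For a real file, `read_fasta` accepts an open file handle directly, since iterating over it yields lines.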
**eldermet\_subsamples\_X.tar.bz2**: Each archive contains three random subsamples of _ELDERMET_, each containing X % of the sequences from _eldermet\_derep.fasta_ (in _dereplicated.tar.bz2_). X ranges from 5 to 100 in steps of 5.
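Drawing a random subsample that contains X % of the sequences, as in the subsample archives, can be sketched as below. The exact sampling procedure is not described on this page, so the sketch assumes uniform sampling without replacement over the records of a dereplicated FASTA file, a sample size rounded with Python's `round` (ties to even), and one seed per subsample for reproducibility. `subsample` and `subsamples` are hypothetical helpers, not part of GeFaST.

```python
import random
from typing import List, Sequence, Tuple

Record = Tuple[str, str]  # (header, sequence)


def subsample(records: Sequence[Record], percent: float, seed: int) -> List[Record]:
    """Draw `percent` % of the records uniformly at random, without replacement.

    Assumptions: the size is rounded with round() (ties to even) and the
    selected records keep their original input order.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie in [0, 100]")
    k = round(len(records) * percent / 100)
    rng = random.Random(seed)  # private RNG, so each subsample is reproducible
    chosen = sorted(rng.sample(range(len(records)), k))
    return [records[i] for i in chosen]


def subsamples(
    records: Sequence[Record], percent: float, n: int, base_seed: int = 0
) -> List[List[Record]]:
    """Create n independent subsamples (e.g. n=3 as for ELDERMET, n=5 as for even/uneven)."""
    return [subsample(records, percent, base_seed + i) for i in range(n)]
```

Using a separate `random.Random` instance per subsample, rather than the global generator, keeps each subsample reproducible from its seed alone.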
**uneven\_subsamples\_80.tar.bz2**: Archive containing five random subsamples of _uneven_, each containing 80 % of the sequences from _uneven\_derep.fasta_ (in _dereplicated.tar.bz2_).
**even\_subsamples\_80.tar.bz2**: Archive containing five random subsamples of _even_, each containing 80 % of the sequences from _even\_derep.fasta_ (in _dereplicated.tar.bz2_).
**eldermet\_reduced\_subsamples\_80.tar.bz2**: Archive containing the reduced _ELDERMET_ data set and five random subsamples of it, each containing 80 % of the sequences from _eldermet\_derep.reduced.fasta_, plus the corresponding taxonomic assignments.
**results.tar.bz2**: Result files containing the performance and clustering-quality measurements:

* _eldermet-performance-measurements.csv_: runtime and memory consumption for different thresholds
* _eldermet-subsampling-measurements.csv_: runtime and memory consumption for different data set sizes
* _eldermet-sub-fixed-red-log.csv_: runtime and memory consumption for different thresholds (on subsamples of reduced data set)
* _eldermet-sub-fixed-red-metrics.csv_: clustering quality for different thresholds (on subsamples of reduced data set)
* _even\_0.95-metrics.csv_: clustering quality for different thresholds, 95 % ground truth
* _even\_0.97-metrics.csv_: clustering quality for different thresholds, 97 % ground truth
* _even\_0.99-metrics.csv_: clustering quality for different thresholds, 99 % ground truth
* _uneven\_0.95-metrics.csv_: clustering quality for different thresholds, 95 % ground truth
* _uneven\_0.97-metrics.csv_: clustering quality for different thresholds, 97 % ground truth
* _uneven\_0.99-metrics.csv_: clustering quality for different thresholds, 99 % ground truth
* _uneven-sub-fixed-metrics.csv_: clustering quality for different thresholds (on subsamples)
* _even-sub-fixed-metrics.csv_: clustering quality for different thresholds (on subsamples)

References:

[1] Claesson M.J., Cusack S., O'Sullivan O., Greene-Diniz R., de Weerd H., Flannery E., Marchesi J.R., Falush D., Dinan T., Fitzgerald G., Stanton C., van Sinderen D., O'Connor M., Harnedy N., O'Connor K., Henry C., O'Mahony D., Fitzgerald A.P., Shanahan F., Twomey C., Hill C., Ross R.P., O'Toole P.W.: Composition, variability, and temporal stability of the intestinal microbiota of the elderly. Proceedings of the National Academy of Sciences 108 (Supplement 1), 4586-4591 (2011). doi: [10.1073/pnas.1000097107](https://doi.org/10.1073/pnas.1000097107)

[2] Mahé F., Rognes T., Quince C., de Vargas C., Dunthorn M.: Swarm: robust and fast clustering method for amplicon-based studies. PeerJ 2, e593 (2014). doi: [10.7717/peerj.593](https://doi.org/10.7717/peerj.593)

Publication year
2018
Copyright and licenses
Page URI
https://pub.uni-bielefeld.de/record/2918928
Cite
Müller R, Nebel M. Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University; 2018.
Müller, R., & Nebel, M. (2018). Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University. https://doi.org/10.4119/unibi/2918928
Müller, Robert, and Nebel, Markus. 2018. Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University.
Müller, R., and Nebel, M. (2018). Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University.
Müller, R., & Nebel, M., 2018. Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach", Bielefeld University.
R. Müller and M. Nebel, Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach", Bielefeld University, 2018.
Müller, R., Nebel, M.: Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University (2018).
Müller, Robert, and Nebel, Markus. Evaluation data for "GeFaST: An improved method for OTU assignment by generalising Swarm's fastidious clustering approach". Bielefeld University, 2018.
All files are available under the following license:
Open Database License (ODbL) v1.0
Files

| Name | Size | Access level | Last uploaded | MD5 checksum |
|------|------|--------------|---------------|--------------|
| dereplicated.tar.bz2 | 62.08 MB | Open Access | 2019-09-06T09:18:58Z | 150e9e9a8ef7e2f0aaad5b1b1235010b |
| eldermet_subsamples_5.tar.bz2 | 11.03 MB | Open Access | 2019-09-06T09:18:58Z | c6dd1cfa37d0161c004b72b28f819511 |
| eldermet_subsamples_10.tar.bz2 | 22.05 MB | Open Access | 2019-09-06T09:18:58Z | c689defcf74b1bcd8da8d23911a34b05 |
| eldermet_subsamples_15.tar.bz2 | 33.09 MB | Open Access | 2019-09-06T09:18:58Z | 48dfec5f54580b35134887a7c047606a |
| eldermet_subsamples_20.tar.bz2 | 44.12 MB | Open Access | 2019-09-06T09:18:58Z | 6c1de7f90195215f188ee79b7746e0e1 |
| eldermet_subsamples_25.tar.bz2 | 55.15 MB | Open Access | 2019-09-06T09:18:58Z | 6c575d4c49ab9bab518fa711b6ceb2d2 |
| eldermet_subsamples_30.tar.bz2 | 66.16 MB | Open Access | 2019-09-06T09:18:58Z | a2c93cf19d9469dd4d439923f4bb17b5 |
| eldermet_subsamples_35.tar.bz2 | 77.17 MB | Open Access | 2019-09-06T09:18:58Z | 5ba90afc833f8a67dd9ea3f4ca06614b |
| eldermet_subsamples_40.tar.bz2 | 88.22 MB | Open Access | 2019-09-06T09:18:58Z | 97d1d6e8a6976c866fe77981f822a483 |
| eldermet_subsamples_45.tar.bz2 | 99.22 MB | Open Access | 2019-09-06T09:18:58Z | c10b0add3364da0242664403b53b9082 |
| eldermet_subsamples_50.tar.bz2 | 110.28 MB | Open Access | 2019-09-06T09:18:58Z | ccfd6baa5215dfb8f811d4913358bba1 |
| eldermet_subsamples_55.tar.bz2 | 121.31 MB | Open Access | 2019-09-06T09:18:58Z | 96fe6feb471b772ff1070bebe47bee40 |
| eldermet_subsamples_60.tar.bz2 | 132.30 MB | Open Access | 2019-09-06T09:18:58Z | 50f2ef13058193c8c0451bea4ec187a1 |
| eldermet_subsamples_65.tar.bz2 | 143.34 MB | Open Access | 2019-09-06T09:18:58Z | 4931149ede73efa364f0d2687810197e |
| eldermet_subsamples_70.tar.bz2 | 154.41 MB | Open Access | 2019-09-06T09:18:58Z | 57862cf2129e84effe9149ca0a2dff65 |
| eldermet_subsamples_75.tar.bz2 | 165.40 MB | Open Access | 2019-09-06T09:18:58Z | 1cc1663a043dec85f62afbd1d182a66f |
| eldermet_subsamples_80.tar.bz2 | 176.41 MB | Open Access | 2019-09-06T09:18:58Z | 55fbee72aa3804d6c910d5ac1f9f7e74 |
| eldermet_subsamples_85.tar.bz2 | 187.48 MB | Open Access | 2019-09-06T09:18:58Z | fdb91e96f120a8fd1c1c58a03540412d |
| eldermet_subsamples_90.tar.bz2 | 198.49 MB | Open Access | 2019-09-06T09:18:58Z | acd9ff9e3e8d4c8d868c92735b3057f0 |
| eldermet_subsamples_95.tar.bz2 | 209.53 MB | Open Access | 2019-09-06T09:18:58Z | ba37acac4baba5a8f03b16fba8aa5021 |
| eldermet_subsamples_100.tar.bz2 | 220.55 MB | Open Access | 2019-09-06T09:18:58Z | 80c2bf23dc9baed9131c8dd95bed0e69 |
| uneven_subsamples_80.tar.bz2 | 4.55 MB | Open Access | 2019-09-06T09:18:58Z | eacd41b4d2efc162030c53b9b4726801 |
| even_subsamples_80.tar.bz2 | 12.82 MB | Open Access | 2019-09-06T09:18:58Z | 4442052e62cdd9c8f9990cf1700068cd |
| eldermet_reduced_subsamples_80.tar.bz2 | 158.66 MB | Open Access | 2019-09-06T09:18:58Z | 8b681f9a7ab1ae15370be2d26e402e16 |
| results.tar.bz2 | 61.26 KB | Open Access | 2019-09-06T09:18:58Z | 94a5cca09e90528cea373e0bcc0ef41a |

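Downloaded archives can be checked against the MD5 checksums listed for each file. The sketch below uses only Python's standard library (`hashlib`, `pathlib`); the `EXPECTED_MD5` dictionary copies three of the listed checksums as an example and would need the remaining entries added by hand, and `verify` is a hypothetical helper that simply skips files not present locally.

```python
import hashlib
from pathlib import Path
from typing import Dict

# Three of the published checksums, copied as an example; extend as needed.
EXPECTED_MD5: Dict[str, str] = {
    "dereplicated.tar.bz2": "150e9e9a8ef7e2f0aaad5b1b1235010b",
    "eldermet_subsamples_5.tar.bz2": "c6dd1cfa37d0161c004b72b28f819511",
    "results.tar.bz2": "94a5cca09e90528cea373e0bcc0ef41a",
}


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Path, expected: Dict[str, str] = EXPECTED_MD5) -> Dict[str, bool]:
    """Map each expected file that exists in `directory` to whether its MD5 matches."""
    results: Dict[str, bool] = {}
    for name, checksum in expected.items():
        path = directory / name
        if path.exists():  # missing files are skipped, not reported as failures
            results[name] = md5sum(path) == checksum.lower()
    return results
```

Once a file checks out, it can be unpacked with `tar -xjf <archive>` or opened from Python with `tarfile.open(path, "r:bz2")`.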
Material in PUB:
Cited by
GeFaST: An improved method for OTU assignment by generalising Swarm’s fastidious clustering approach
Müller R, Nebel M (2018)
BMC Bioinformatics 19(1): 321.