Combining peak- and chromatogram-based retention time alignment algorithms for multiple chromatography-mass spectrometry datasets

Hoffmann N, Keck M, Neuweger H, Wilhelm M, Högy P, Niehaus K, Stoye J (2012)
BMC Bioinformatics 13(1): 21.

Journal Article | Original Article | Published | English
Abstract
Background Modern analytical methods in biology and chemistry use separation techniques coupled to sensitive detectors, such as gas chromatography-mass spectrometry (GC-MS) and liquid chromatography-mass spectrometry (LC-MS). These hyphenated methods produce high-dimensional data. Comparing such data manually to find corresponding signals is laborious, because each experiment usually consists of thousands of individual scans, each containing hundreds or even thousands of distinct signals. To identify metabolites or proteins successfully within such data, especially in metabolomics and proteomics, corresponding features must be accurately aligned and matched across two or more experiments. Such a matching algorithm should capture fluctuations in the chromatographic system, which lead to non-linear distortions of the time axis, as well as systematic changes in recorded intensities. Many algorithms for the retention time alignment of GC-MS and LC-MS data have been published, but each of them focuses either on aligning previously extracted peak features or on aligning and comparing the complete raw data containing all available features.
Results In this paper we introduce two algorithms for the retention time alignment of multiple GC-MS datasets: multiple alignment by bidirectional best hits peak assignment and cluster extension (BIPACE) and center-star multiple alignment by pairwise partitioned dynamic time warping (CEMAPP-DTW). We show how the similarity-based peak group matching method BIPACE can be used on its own to compute a multiple alignment, and how it can serve as a preprocessing step for the pairwise alignments performed by CEMAPP-DTW. We evaluate the algorithms individually and in combination on a previously published small GC-MS dataset studying the Leishmania parasite and on a larger GC-MS dataset studying grains of wheat (Triticum aestivum).
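The bidirectional best hits idea named in BIPACE can be illustrated with a minimal sketch. This is not the Maltcms implementation: it assumes each peak is represented only by an intensity vector (for example a binned mass spectrum) and uses cosine similarity as a stand-in similarity function. The function names `cosine` and `bidirectional_best_hits` and the `threshold` parameter are hypothetical. Two peaks from different chromatograms are paired only if each is the other's most similar peak; the cluster extension step that BIPACE applies across more than two datasets is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length intensity vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def bidirectional_best_hits(peaks_a, peaks_b, sim=cosine, threshold=0.0):
    """Return index pairs (i, j) where peak i of A and peak j of B are mutual best hits.

    Hypothetical sketch: `sim` is any pairwise similarity, and pairs whose score
    does not exceed `threshold` are discarded.
    """
    if not peaks_a or not peaks_b:
        return []
    # Full similarity matrix between all peaks of A (rows) and B (columns).
    scores = [[sim(pa, pb) for pb in peaks_b] for pa in peaks_a]
    # Best partner in B for every peak in A (first index wins on ties).
    best_in_b = [max(range(len(peaks_b)), key=lambda j: row[j]) for row in scores]
    # Best partner in A for every peak in B.
    best_in_a = [
        max(range(len(peaks_a)), key=lambda i: scores[i][j])
        for j in range(len(peaks_b))
    ]
    pairs = []
    for i, j in enumerate(best_in_b):
        # Keep the pair only if the choice is mutual and the score is high enough.
        if best_in_a[j] == i and scores[i][j] > threshold:
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    a = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    b = [[0, 0.9, 0.1], [1, 0, 0]]
    print(bidirectional_best_hits(a, b))
```

In this toy input, the third peak of `a` has its best hit in the first peak of `b`, but that peak prefers the second peak of `a`. The pair is therefore rejected, which is the property that keeps false positive assignments low in a best-hits scheme.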
Conclusions We have shown that BIPACE achieves very high precision and recall and a very low number of false positive peak assignments on both evaluation datasets. CEMAPP-DTW finds a high number of true positives when executed on its own, but achieves even better results when BIPACE is used to constrain its search space. The source code of both algorithms is included in the open-source software framework Maltcms, which is available from http://maltcms.sf.net. The evaluation scripts of the present study are available from the same source.
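One way that peak matches can constrain the search space of a pairwise dynamic time warping (DTW) alignment is to force the warping path through matched "anchor" pairs. The minimal sketch below partitions both sequences at the anchors and runs classic DTW on each segment separately. It is an illustration under our own assumptions, not the CEMAPP-DTW implementation in Maltcms. It assumes 1-D intensity profiles (for example total ion chromatograms), absolute difference as the local cost, and anchors given as 0-based index pairs that increase strictly in both sequences. The names `dtw` and `anchored_dtw` are hypothetical. In a center-star scheme, such pairwise alignments would be computed between one reference chromatogram and every other chromatogram; that step is omitted.

```python
def dtw(x, y):
    """Classic DTW with |x_i - y_j| as local cost; returns (total cost, warping path)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j] = cost of the best path aligning x[:i] with y[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from (n, m); on ties prefer the diagonal step.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = min((D[i - 1][j - 1], 0), (D[i - 1][j], 1), (D[i][j - 1], 2))[1]
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return D[n][m], path


def anchored_dtw(x, y, anchors):
    """DTW forced through anchor pairs by aligning the segments between them separately."""
    bounds = [(0, 0)] + sorted(anchors) + [(len(x), len(y))]
    total, path = 0.0, []
    for (si, sj), (ei, ej) in zip(bounds, bounds[1:]):
        if ei == si and ej == sj:
            continue  # empty on both sides, e.g. an anchor at (0, 0)
        if ei <= si or ej <= sj:
            raise ValueError("anchors must increase strictly in both sequences")
        # Each segment starts at an anchor, so the anchor lies on the local path.
        cost, segment = dtw(x[si:ei], y[sj:ej])
        total += cost
        path.extend((i + si, j + sj) for i, j in segment)
    return total, path


if __name__ == "__main__":
    x, y = [0, 1, 2, 3], [0, 0, 1, 2, 3]
    print(dtw(x, y))
    print(anchored_dtw(x, y, [(1, 1)]))
```

Partitioning at anchors restricts each DTW run to a rectangular block of the full cost matrix, which reduces the work when anchors are dense. A wrong anchor, however, forces the path through a bad match: in the usage example, the anchor `(1, 1)` raises the alignment cost from 0 to 1.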
Financial disclosure
Article Processing Charge funded by the Deutsche Forschungsgemeinschaft and the Open Access Publication Fund of Bielefeld University.

Cite this

Hoffmann, N., Keck, M., Neuweger, H., Wilhelm, M., Högy, P., Niehaus, K., & Stoye, J. (2012). Combining peak- and chromatogram-based retention time alignment algorithms for multiple chromatography-mass spectrometry datasets. BMC Bioinformatics, 13(1), 21. doi:10.1186/1471-2105-13-21
Hoffmann, Nils, Keck, Matthias, Neuweger, Heiko, Wilhelm, Mathias, Högy, Petra, Niehaus, Karsten, and Stoye, Jens. “Combining peak- and chromatogram-based retention time alignment algorithms for multiple chromatography-mass spectrometry datasets”. BMC Bioinformatics 13.1 (2012): 21.
Access level: Open Access
Last uploaded: 2015-12-22T10:06:15Z


9 Citations in Europe PMC

Data provided by Europe PubMed Central.

Supporting metabolomics with adaptable software: design architectures for the end-user.
Sarpe V, Schriemer DC., Curr. Opin. Biotechnol. 43, 2017
PMID: 27870998
A hybrid retention time alignment algorithm for SWATH-MS data.
Wu L, Amon S, Lam H., Proteomics 16(15-16), 2016
PMID: 27302277
Further development of biomarkers in amyotrophic lateral sclerosis.
Blasco H, Vourc'h P, Pradat PF, Gordon PH, Andres CR, Corcia P., Expert Rev. Mol. Diagn. 16(8), 2016
PMID: 27275785
Comparative analysis of targeted metabolomics: dominance-based rough set approach versus orthogonal partial least square-discriminant analysis.
Blasco H, Blaszczynski J, Billaut JC, Nadal-Desbarats L, Pradat PF, Devos D, Moreau C, Andres CR, Emond P, Corcia P, Slowinski R., J Biomed Inform 53, 2015
PMID: 25499899
BiPACE 2D--graph-based multiple alignment for comprehensive 2D gas chromatography-mass spectrometry.
Hoffmann N, Wilhelm M, Doebbe A, Niehaus K, Stoye J., Bioinformatics 30(7), 2014
PMID: 24363380
Nonlinear alignment of chromatograms by means of moving window fast Fourier transfrom cross-correlation.
Li Z, Wang JJ, Huang J, Zhang ZM, Lu HM, Zheng YB, Zhan DJ, Liang YZ., J Sep Sci 36(9-10), 2013
PMID: 23436496


PMID: 22920415
