mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets

Wilhelm M, Kirchner M, Steen JAJ, Steen H (2012)
Molecular & Cellular Proteomics 11(1): O111.011379.

Journal article | Published | English
No files have been uploaded. Publication record only!
Wilhelm, Mathias (UniBi); Kirchner, Marc; Steen, Judith A. J.; Steen, Hanno
Abstract / Note
Across a host of MS-driven -omics fields, researchers witness the acquisition of ever-increasing amounts of high-throughput MS data and face the need for their compact yet efficiently accessible storage. Addressing the need for an open data exchange format, the Proteomics Standards Initiative and the Seattle Proteome Center at the Institute for Systems Biology independently developed the mzData and mzXML formats, respectively. In a subsequent joint effort, they defined an ontology and associated controlled vocabulary that specifies the contents of MS data files, implemented as the newer mzML format. All three formats are based on XML and are thus not particularly efficient in either storage space requirements or read/write speed. This contribution introduces mz5, a complete reimplementation of the mzML ontology that is based on the efficient, industrial-strength storage backend HDF5. Compared with the current mzML standard, this strategy yields an average file size reduction to ~54% and increases linear read and write speeds ~3-4-fold. The format is implemented as part of the ProteoWizard project and is available under a permissive Apache license. Additional information and download links are available from Molecular & Cellular Proteomics 11: 10.1074/mcp.O111.011379, 1-5, 2012.


Wilhelm M, Kirchner M, Steen JAJ, Steen H. mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets. Molecular & Cellular Proteomics. 2012;11(1): O111.011379.
Wilhelm, M., Kirchner, M., Steen, J. A. J., & Steen, H. (2012). mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets. Molecular & Cellular Proteomics, 11(1), O111.011379. doi:10.1074/mcp.O111.011379
Wilhelm, Mathias, Kirchner, Marc, Steen, Judith A. J., and Steen, Hanno. 2012. “mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets”. Molecular & Cellular Proteomics 11 (1): O111.011379.
Wilhelm, M., Kirchner, M., Steen, J. A. J., and Steen, H. (2012). mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets. Molecular & Cellular Proteomics 11:O111.011379.
Wilhelm, M., et al., 2012. mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets. Molecular & Cellular Proteomics, 11(1): O111.011379.
M. Wilhelm, et al., “mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets”, Molecular & Cellular Proteomics, vol. 11, 2012, O111.011379.
Wilhelm, M., Kirchner, M., Steen, J.A.J., Steen, H.: mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets. Molecular & Cellular Proteomics. 11, O111.011379 (2012).
Wilhelm, Mathias, Kirchner, Marc, Steen, Judith A. J., and Steen, Hanno. “mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets”. Molecular & Cellular Proteomics 11.1 (2012): O111.011379.

11 Citations in Europe PMC

Data provided by Europe PubMed Central.

BASIS: High-performance bioinformatics platform for processing of large-scale mass spectrometry imaging data in chemically augmented histology.
Veselkov K, Sleeman J, Claude E, Vissers JPC, Galea D, Mroz A, Laponogov I, Towers M, Tonge R, Mirnezami R, Takats Z, Nicholson JK, Langridge JI., Sci Rep 8(1), 2018
PMID: 29511258
multiplierz v2.0: A Python-based ecosystem for shared access and analysis of native mass spectrometry data.
Alexander WM, Ficarro SB, Adelmant G, Marto JA., Proteomics 17(15-16), 2017
PMID: 28686798
JS-MS: a cross-platform, modular javascript viewer for mass spectrometry signals.
Rosen J, Handy K, Gillan A, Smith R., BMC Bioinformatics 18(1), 2017
PMID: 29110634
mzDB: a file format using multiple indexing strategies for the efficient analysis of large LC-MS/MS and SWATH-MS data sets.
Bouyssié D, Dubois M, Nasso S, Gonzalez de Peredo A, Burlet-Schiltz O, Aebersold R, Monsarrat B., Mol Cell Proteomics 14(3), 2015
PMID: 25505153
Thermal proteome profiling for unbiased identification of direct and indirect drug targets using multiplexed quantitative mass spectrometry.
Franken H, Mathieson T, Childs D, Sweetman GM, Werner T, Tögel I, Doce C, Gade S, Bantscheff M, Drewes G, Reinhard FB, Huber W, Savitski MM., Nat Protoc 10(10), 2015
PMID: 26379230
Open source libraries and frameworks for mass spectrometry based proteomics: a developer's perspective.
Perez-Riverol Y, Wang R, Hermjakob H, Müller M, Vesada V, Vizcaíno JA., Biochim Biophys Acta 1844(1 pt a), 2014
PMID: 23467006
Numerical compression schemes for proteomics mass spectrometry data.
Teleman J, Dowsey AW, Gonzalez-Galarza FF, Perkins S, Pratt B, Röst HL, Malmström L, Malmström J, Jones AR, Deutsch EW, Levander F., Mol Cell Proteomics 13(6), 2014
PMID: 24677029
A guide for integration of proteomic data standards into laboratory workflows.
Medina-Aunon JA, Krishna R, Ghali F, Albar JP, Jones AJ., Proteomics 13(3-4), 2013
PMID: 23319203
The HUPO proteomics standards initiative- mass spectrometry controlled vocabulary.
Mayer G, Montecchi-Palazzi L, Ovelleiro D, Jones AR, Binz PA, Deutsch EW, Chambers M, Kallhardt M, Levander F, Shofstahl J, Orchard S, Vizcaíno JA, Hermjakob H, Stephan C, Meyer HE, Eisenacher M, HUPO-PSI Group., Database (Oxford) 2013(), 2013
PMID: 23482073
Preparing to work with big data in proteomics - a report on the HUPO-PSI Spring Workshop: April 15-17, 2013, Liverpool, UK.
Orchard S, Binz PA, Jones AR, Vizcaino JA, Deutsch EW, Hermjakob H., Proteomics 13(20), 2013
PMID: 24108681

21 References

Data provided by Europe PubMed Central.

ProteoWizard: open source software for rapid proteomics tools development.
Kessner D, Chambers M, Burke R, Agus D, Mallick P., Bioinformatics 24(21), 2008
PMID: 18606607
A guided tour of the Trans-Proteomic Pipeline.
Deutsch EW, Mendoza L, Shteynberg D, Farrah T, Lam H, Tasman N, Sun Z, Nilsson E, Pratt B, Prazen B, Eng JK, Martin DB, Nesvizhskii AI, Aebersold R., Proteomics 10(6), 2010
PMID: 20101611
OpenMS and TOPP: open source software for LC-MS data analysis.
Bertsch A, Gropl C, Reinert K, Kohlbacher O., Methods Mol. Biol. 696(), 2011
PMID: 21063960
A common open representation of mass spectrometry data and its application to proteomics research.
Pedrioli PG, Eng JK, Hubley R, Vogelzang M, Deutsch EW, Raught B, Pratt B, Nilsson E, Angeletti RH, Apweiler R, Cheung K, Costello CE, Hermjakob H, Huang S, Julian RK, Kapp E, McComb ME, Oliver SG, Omenn G, Paton NW, Simpson R, Smith R, Taylor CF, Zhu W, Aebersold R., Nat. Biotechnol. 22(11), 2004
PMID: 15529173
Five years of progress in the Standardization of Proteomics Data 4th Annual Spring Workshop of the HUPO-Proteomics Standards Initiative April 23-25, 2007 Ecole Nationale Superieure (ENS), Lyon, France.
Orchard S, Montechi-Palazzi L, Deutsch EW, Binz PA, Jones AR, Paton N, Pizarro A, Creasy DM, Wojcik J, Hermjakob H., Proteomics 7(19), 2007
PMID: 17907277

mzML--a community standard for mass spectrometry data.
Martens L, Chambers M, Sturm M, Kessner D, Levander F, Shofstahl J, Tang WH, Rompp A, Neumann S, Pizarro AD, Montecchi-Palazzi L, Tasman N, Coleman M, Reisinger F, Souda P, Hermjakob H, Binz PA, Deutsch EW., Mol. Cell Proteomics 10(1), 2010
PMID: 20716697
Data deposition as an integral part of the publication process
Orchard S., 2009
PRIDE: new developments and new datasets.
Jones P, Cote RG, Cho SY, Klie S, Martens L, Quinn AF, Thorneycroft D, Hermjakob H., Nucleic Acids Res. 36(Database issue), 2007
PMID: 18033805
Integration with the human genome of peptide sequences obtained by high-throughput mass spectrometry.
Desiere F, Deutsch EW, Nesvizhskii AI, Mallick P, King NL, Eng JK, Aderem A, Boyle R, Brunner E, Donohoe S, Fausto N, Hafen E, Hood L, Katze MG, Kennedy KA, Kregenow F, Lee H, Lin B, Martin D, Ranish JA, Rawlings DJ, Samelson LE, Shiio Y, Watts JD, Wollscheid B, Wright ME, Yan W, Yang L, Yi EC, Zhang H, Aebersold R., Genome Biol. 6(1), 2004
PMID: 15642101
Hierarchical data format version 5
Adaptive informatics for multifactorial and high-content biological data.
Millard BL, Niepel M, Menden MP, Muhlich JL, Sorger PK., Nat. Methods 8(6), 2011
PMID: 21516115
LOFAR and HDF5: Toward a new radio data standard
Anderson K., Alexov A., Baehren L., Griessmeier J., Wise M., Renting A., 2010
The ALPS project release 2.0: Open source software for strongly correlated systems
Bauer B., Carr L., Evertz H., Feiguin A., Freire J., Fuchs S., Gamper L., Gukelberger J., Gull E., Guertler S., Hehn A., Igarashi R., Isakov S., Koop D., Ma P., Mates P., Matsuo H., Parcollet O., Pawlowski G., Picon J., Pollet L., Santos E., Scarola V., Schollwöck U., Silva C., Surer B., Todo S., Trebst S., Troyer M., Wall M., Werner P., Wessel S., 2011
Unifying Biological Image Formats with HDF5.
Dougherty MT, Folk MJ, Zadok E, Bernstein HJ, Bernstein FC, Eliceiri KW, Benger W, Best C., Commun ACM 52(10), 2009
PMID: 21218176
Tuning HDF5 for lustre file systems
Howison M., Koziol Q., Knaak D., Mainzer J., Shalf J., 2010
Protein identification by spectral networks analysis.
Bandeira N, Tsur D, Frank A, Pevzner PA., Proc. Natl. Acad. Sci. U.S.A. 104(15), 2007
PMID: 17404225
Development and validation of a spectral library searching method for peptide identification from MS/MS.
Lam H, Deutsch EW, Eddes JS, Eng JK, King N, Stein SE, Aebersold R., Proteomics 7(5), 2007
PMID: 17295354
mzServer: Web-based programmatic access for mass spectrometry data analysis
Askenazi M., Webber J., Marto J., 2011
mzResults: An interactive viewer for interrogation and distribution of proteomics results
Webber J., Askenazi M., Marto J., 2011

PMID: 21960719