mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets

Wilhelm M, Kirchner M, Steen JAJ, Steen H (2012)
Molecular & Cellular Proteomics 11(1).

Journal Article | Published | English

Across a host of MS-driven -omics fields, researchers witness the acquisition of ever-increasing amounts of high-throughput MS data and face the need for their compact yet efficiently accessible storage. Addressing the need for an open data exchange format, the Proteomics Standards Initiative and the Seattle Proteome Center at the Institute for Systems Biology independently developed the mzData and mzXML formats, respectively. In a subsequent joint effort, they defined an ontology and associated controlled vocabulary that specifies the contents of MS data files, implemented as the newer mzML format. All three formats are based on XML and are thus not particularly efficient in either storage space requirements or read/write speed. This contribution introduces mz5, a complete reimplementation of the mzML ontology that is based on the efficient, industrial-strength storage backend HDF5. Compared with the current mzML standard, this strategy yields an average file size reduction to ~54% and increases linear read and write speeds ~3-4-fold. The format is implemented as part of the ProteoWizard project and is available under a permissive Apache license. Additional information and download links are available from Molecular & Cellular Proteomics 11: 10.1074/mcp.O111.011379, 1-5, 2012.
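One source of the storage overhead that the abstract attributes to XML-based formats can be illustrated with a minimal sketch: numeric arrays embedded in XML have to be text-encoded (mzML uses base64), whereas a binary container such as HDF5 can store the packed values directly. The example below is not the mz5 layout and does not reproduce the paper's measurements. The m/z values are hypothetical, and XML tag overhead, optional zlib compression, and HDF5 chunking are all ignored.

```python
import base64
import struct


def encode_raw(values):
    """Pack floats as little-endian 64-bit doubles, as a binary container could store them."""
    return struct.pack(f"<{len(values)}d", *values)


def encode_base64(values):
    """Base64-encode the same packed doubles, as required for embedding in XML text."""
    return base64.b64encode(encode_raw(values))


def decode_base64(payload):
    """Reverse encode_base64 and return the values as a list of floats."""
    raw = base64.b64decode(payload)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


# Hypothetical m/z axis of one spectrum (illustrative values only).
mz = [100.0 + 0.01 * i for i in range(3000)]

raw = encode_raw(mz)
b64 = encode_base64(mz)
overhead = len(b64) / len(raw)

print(f"raw bytes: {len(raw)}, base64 bytes: {len(b64)}, ratio: {overhead:.3f}")
```

Base64 maps every 3 bytes to 4 characters, so the text-encoded array is about one third larger before any XML markup is added. This accounts for only part of the difference reported in the abstract, and the actual figures depend on the data and on the compression settings used.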
Cite this

Wilhelm M, Kirchner M, Steen JAJ, Steen H. mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets. Molecular & Cellular Proteomics. 2012;11(1).
Wilhelm, M., Kirchner, M., Steen, J. A. J., & Steen, H. (2012). mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets. Molecular & Cellular Proteomics, 11(1).
Wilhelm, M., Kirchner, M., Steen, J. A. J., and Steen, H. (2012). mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets. Molecular & Cellular Proteomics 11.
Wilhelm, M., et al., 2012. mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets. Molecular & Cellular Proteomics, 11(1).
M. Wilhelm, et al., “mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets”, Molecular & Cellular Proteomics, vol. 11, 2012.
Wilhelm, M., Kirchner, M., Steen, J.A.J., Steen, H.: mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets. Molecular & Cellular Proteomics. 11, (2012).
Wilhelm, Mathias, Kirchner, Marc, Steen, Judith A. J., and Steen, Hanno. “mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets”. Molecular & Cellular Proteomics 11.1 (2012).

6 Citations in Europe PMC

Data provided by Europe PubMed Central.

Thermal proteome profiling for unbiased identification of direct and indirect drug targets using multiplexed quantitative mass spectrometry.
Franken H, Mathieson T, Childs D, Sweetman GM, Werner T, Togel I, Doce C, Gade S, Bantscheff M, Drewes G, Reinhard FB, Huber W, Savitski MM., Nat Protoc 10(10), 2015
PMID: 26379230
mzDB: a file format using multiple indexing strategies for the efficient analysis of large LC-MS/MS and SWATH-MS data sets.
Bouyssie D, Dubois M, Nasso S, Gonzalez de Peredo A, Burlet-Schiltz O, Aebersold R, Monsarrat B., Mol. Cell Proteomics 14(3), 2015
PMID: 25505153
Numerical compression schemes for proteomics mass spectrometry data.
Teleman J, Dowsey AW, Gonzalez-Galarza FF, Perkins S, Pratt B, Rost HL, Malmstrom L, Malmstrom J, Jones AR, Deutsch EW, Levander F., Mol. Cell Proteomics 13(6), 2014
PMID: 24677029
Open source libraries and frameworks for mass spectrometry based proteomics: a developer's perspective.
Perez-Riverol Y, Wang R, Hermjakob H, Muller M, Vesada V, Vizcaino JA., Biochim. Biophys. Acta 1844(1 Pt A), 2014
PMID: 23467006
The HUPO proteomics standards initiative- mass spectrometry controlled vocabulary.
Mayer G, Montecchi-Palazzi L, Ovelleiro D, Jones AR, Binz PA, Deutsch EW, Chambers M, Kallhardt M, Levander F, Shofstahl J, Orchard S, Vizcaino JA, Hermjakob H, Stephan C, Meyer HE, Eisenacher M; HUPO-PSI Group., Database (Oxford) 2013(), 2013
PMID: 23482073



PMID: 21960719