XML schemas for common bioinformatic data types and their application in workflow systems

Seibel PN, Krüger J, Hartmeier S, Schwarzer K, Löwenthal K, Mersch H, Dandekar T, Giegerich R (2006)
BMC Bioinformatics 7(1).

Journal Article | Published | English
Author
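Philipp N. Seibel; Jan Krüger; Sven Hartmeier; Knut Schwarzer; Kai Löwenthal; Henning Mersch; Thomas Dandekar; Robert Giegerich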
Abstract
Background: Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data; therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats.
Results: Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics, developed appropriate format descriptions formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at http://bioschemas.sourceforge.net; the BioDOM library can be obtained at http://biodom.sourceforge.net.
Conclusion: The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios.
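As a rough illustration of the idea in the abstract (two tools exchanging sequence data through one shared, program-independent XML format), the following minimal sketch uses only the Python standard library. The element names `sequences` and `sequence`, the `id` attribute, and both function names are hypothetical and chosen for this example; they are not the HOBIT schemas and not the BioDOM API, which is a Java library.

```python
import xml.etree.ElementTree as ET


def write_sequences(records):
    """Producer tool: serialize (id, residues) pairs to XML.

    The layout <sequences><sequence id="...">...</sequence></sequences>
    is a hypothetical stand-in, not the actual HOBIT sequence schema.
    """
    root = ET.Element("sequences")
    for seq_id, residues in records:
        entry = ET.SubElement(root, "sequence", attrib={"id": seq_id})
        entry.text = residues
    return ET.tostring(root, encoding="unicode")


def read_sequences(xml_text):
    """Consumer tool: parse the shared format back into (id, residues) pairs.

    A few structural checks stand in for real XML Schema validation,
    which the standard library does not provide.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "sequences":
        raise ValueError("unexpected root element: " + root.tag)
    records = []
    for entry in root.findall("sequence"):
        seq_id = entry.get("id")
        residues = (entry.text or "").strip()
        if not seq_id or not residues:
            raise ValueError("sequence entry needs an id and residues")
        records.append((seq_id, residues))
    return records


if __name__ == "__main__":
    # Output of one tool becomes the input of the next one in the chain.
    document = write_sequences([("seq1", "ACGUACGU"), ("seq2", "GGCCAUUA")])
    for seq_id, residues in read_sequences(document):
        print(seq_id, len(residues))
```

Because both sides agree on one format, neither tool needs a converter written for the other's native output, which is the interoperability argument the abstract makes.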
Publishing Year
2006

Cite this

Seibel, Philipp N., Krüger, Jan, Hartmeier, Sven, Schwarzer, Knut, Löwenthal, Kai, Mersch, Henning, Dandekar, Thomas, and Giegerich, Robert. “XML schemas for common bioinformatic data types and their application in workflow systems”. BMC Bioinformatics 7.1 (2006).
Access Level
Open Access

12 Citations in Europe PMC

Data provided by Europe PubMed Central.

Experiences with workflows for automating data-intensive bioinformatics.
Spjuth O, Bongcam-Rudloff E, Hernandez GC, Forer L, Giovacchini M, Guimera RV, Kallio A, Korpelainen E, Kandula MM, Krachunov M, Kreil DP, Kulev O, Labaj PP, Lampa S, Pireddu L, Schonherr S, Siretskiy A, Vassilev D., Biol. Direct 10, 2015
PMID: 26282399
Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package.
El-Kalioby M, Abouelhoda M, Kruger J, Giegerich R, Sczyrba A, Wall DP, Tonellato P., BMC Bioinformatics 13 Suppl 17, 2012
PMID: 23281941
Conveyor: a workflow engine for bioinformatic analyses.
Linke B, Giegerich R, Goesmann A., Bioinformatics 27(7), 2011
PMID: 21278189
BioXSD: the common data-exchange format for everyday bioinformatics web services.
Kalas M, Puntervoll P, Joseph A, Bartaseviciute E, Topfer A, Venkataraman P, Pettifer S, Bryne JC, Ison J, Blanchet C, Rapacki K, Jonassen I., Bioinformatics 26(18), 2010
PMID: 20823319
Towards interoperable and reproducible QSAR analyses: Exchange of datasets.
Spjuth O, Willighagen EL, Guha R, Eklund M, Wikberg JE., J Cheminform 2(1), 2010
PMID: 20591161
Trends in modeling Biomedical Complex Systems.
Milanesi L, Romano P, Castellani G, Remondini D, Lio P., BMC Bioinformatics 10 Suppl 12, 2009
PMID: 19828068
Techniques for integrating -omics data.
Akula SP, Miriyala RN, Thota H, Rao AA, Gedela S., Bioinformation 3(6), 2009
PMID: 19255651
A standard MIGS/MIMS compliant XML Schema: toward the development of the Genomic Contextual Data Markup Language (GCDML).
Kottmann R, Gray T, Murphy S, Kagan L, Kravitz S, Lombardot T, Field D, Glockner FO; Genomic Standards Consortium., OMICS 12(2), 2008
PMID: 18479204
GeneFisher-P: variations of GeneFisher as processes in Bio-jETI.
Lamprecht AL, Margaria T, Steffen B, Sczyrba A, Hartmeier S, Giegerich R., BMC Bioinformatics 9 Suppl 4, 2008
PMID: 18460174
Integrating sequence and structural biology with DAS.
Prlic A, Down TA, Kulesha E, Finn RD, Kahari A, Hubbard TJ., BMC Bioinformatics 8, 2007
PMID: 17850653
RNA Movies 2: sequential animation of RNA secondary structures.
Kaiser A, Kruger J, Evers DJ., Nucleic Acids Res. 35(Web Server issue), 2007
PMID: 17567618
4SALE--a tool for synchronous RNA sequence and secondary structure alignment and editing.
Seibel PN, Muller T, Dandekar T, Schultz J, Wolf M., BMC Bioinformatics 7, 2006
PMID: 17101042

Sources

PMID: 17087823