XML schemas for common bioinformatic data types and their application in workflow systems

Seibel PN, Krüger J, Hartmeier S, Schwarzer K, Löwenthal K, Mersch H, Dandekar T, Giegerich R (2006)
BMC Bioinformatics 7(1): 490.

Journal article | Published | English
Seibel, Philipp N.; Krüger, Jan; Hartmeier, Sven; Schwarzer, Knut; Löwenthal, Kai; Mersch, Henning; Dandekar, Thomas; Giegerich, Robert
Abstract / Note
Background: Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data; therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats.
Results: Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics, developed appropriate format descriptions formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at http://bioschemas.sourceforge.net; the BioDOM library can be obtained at http://biodom.sourceforge.net.
Conclusion: The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios.
BMC Bioinformatics
Seibel PN, Krüger J, Hartmeier S, et al. XML schemas for common bioinformatic data types and their application in workflow systems. BMC Bioinformatics. 2006;7(1): 490.
Seibel, P. N., Krüger, J., Hartmeier, S., Schwarzer, K., Löwenthal, K., Mersch, H., Dandekar, T., et al. (2006). XML schemas for common bioinformatic data types and their application in workflow systems. BMC Bioinformatics, 7(1), 490. https://doi.org/10.1186/1471-2105-7-490
Seibel, Philipp N., Krüger, Jan, Hartmeier, Sven, Schwarzer, Knut, Löwenthal, Kai, Mersch, Henning, Dandekar, Thomas, and Giegerich, Robert. 2006. “XML schemas for common bioinformatic data types and their application in workflow systems”. BMC Bioinformatics 7 (1): 490.
Seibel, P. N., Krüger, J., Hartmeier, S., Schwarzer, K., Löwenthal, K., Mersch, H., Dandekar, T., and Giegerich, R. (2006). XML schemas for common bioinformatic data types and their application in workflow systems. BMC Bioinformatics 7:490.
Seibel, P.N., et al., 2006. XML schemas for common bioinformatic data types and their application in workflow systems. BMC Bioinformatics, 7(1): 490.
P.N. Seibel, et al., “XML schemas for common bioinformatic data types and their application in workflow systems”, BMC Bioinformatics, vol. 7, 2006, 490.
Seibel, P.N., Krüger, J., Hartmeier, S., Schwarzer, K., Löwenthal, K., Mersch, H., Dandekar, T., Giegerich, R.: XML schemas for common bioinformatic data types and their application in workflow systems. BMC Bioinformatics. 7, 490 (2006).
Seibel, Philipp N., Krüger, Jan, Hartmeier, Sven, Schwarzer, Knut, Löwenthal, Kai, Mersch, Henning, Dandekar, Thomas, and Giegerich, Robert. “XML schemas for common bioinformatic data types and their application in workflow systems”. BMC Bioinformatics 7.1 (2006): 490.
All files available under the following license(s):
Copyright Statement:
This object is protected by copyright and/or related rights. [...]
Access Level
OA Open Access

13 citations in Europe PMC

Data provided by Europe PubMed Central.

Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data.
Zhang C, Bijlard J, Staiger C, Scollen S, van Enckevort D, Hoogstrate Y, Senf A, Hiltemann S, Repo S, Pipping W, Bierkens M, Payralbe S, Stringer B, Heringa J, Stubbs A, Bonino Da Silva Santos LO, Belien J, Weistra W, Azevedo R, van Bochove K, Meijer G, Boiten JW, Rambla J, Fijneman R, Spalding JD, Abeln S., F1000Res 6, 2017
PMID: 29123641
Experiences with workflows for automating data-intensive bioinformatics.
Spjuth O, Bongcam-Rudloff E, Hernández GC, Forer L, Giovacchini M, Guimera RV, Kallio A, Korpelainen E, Kańduła MM, Krachunov M, Kreil DP, Kulev O, Łabaj PP, Lampa S, Pireddu L, Schönherr S, Siretskiy A, Vassilev D., Biol Direct 10, 2015
PMID: 26282399
Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package.
El-Kalioby M, Abouelhoda M, Krüger J, Giegerich R, Sczyrba A, Wall DP, Tonellato P., BMC Bioinformatics 13 Suppl 17, 2012
PMID: 23281941
Conveyor: a workflow engine for bioinformatic analyses.
Linke B, Giegerich R, Goesmann A., Bioinformatics 27(7), 2011
PMID: 21278189
Towards interoperable and reproducible QSAR analyses: Exchange of datasets.
Spjuth O, Willighagen EL, Guha R, Eklund M, Wikberg JE., J Cheminform 2(1), 2010
PMID: 20591161
BioXSD: the common data-exchange format for everyday bioinformatics web services.
Kalas M, Puntervoll P, Joseph A, Bartaseviciūte E, Töpfer A, Venkataraman P, Pettifer S, Bryne JC, Ison J, Blanchet C, Rapacki K, Jonassen I., Bioinformatics 26(18), 2010
PMID: 20823319
Techniques for integrating -omics data.
Akula SP, Miriyala RN, Thota H, Rao AA, Gedela S., Bioinformation 3(6), 2009
PMID: 19255651
Trends in modeling Biomedical Complex Systems.
Milanesi L, Romano P, Castellani G, Remondini D, Liò P., BMC Bioinformatics 10 Suppl 12, 2009
PMID: 19828068
GeneFisher-P: variations of GeneFisher as processes in Bio-jETI.
Lamprecht AL, Margaria T, Steffen B, Sczyrba A, Hartmeier S, Giegerich R., BMC Bioinformatics 9 Suppl 4, 2008
PMID: 18460174
A standard MIGS/MIMS compliant XML Schema: toward the development of the Genomic Contextual Data Markup Language (GCDML).
Kottmann R, Gray T, Murphy S, Kagan L, Kravitz S, Lombardot T, Field D, Glöckner FO, Genomic Standards Consortium., OMICS 12(2), 2008
PMID: 18479204
RNA Movies 2: sequential animation of RNA secondary structures.
Kaiser A, Krüger J, Evers DJ., Nucleic Acids Res 35(web server issue), 2007
PMID: 17567618
Integrating sequence and structural biology with DAS.
Prlić A, Down TA, Kulesha E, Finn RD, Kähäri A, Hubbard TJ., BMC Bioinformatics 8, 2007
PMID: 17850653
4SALE--a tool for synchronous RNA sequence and secondary structure alignment and editing.
Seibel PN, Müller T, Dandekar T, Schultz J, Wolf M., BMC Bioinformatics 7, 2006
PMID: 17101042



PMID: 17087823