XML schemas for common bioinformatic data types and their application in workflow systems

Seibel, Philipp N.; Krüger, Jan; Hartmeier, Sven; Schwarzer, Knut; Löwenthal, Kai; Mersch, Henning; Dandekar, Thomas; Giegerich, Robert

Seibel PN, Krüger J, Hartmeier S, Schwarzer K, Löwenthal K, Mersch H, Dandekar T, Giegerich R (2006)
BMC Bioinformatics 7(1): 490.

Journal article | Published | English

DOI

https://doi.org/10.1186/1471-2105-7-490

URN

urn:nbn:de:0070-pub-17738718

Authors

Seibel, Philipp N.; Krüger, Jan (UniBi); Hartmeier, Sven (UniBi); Schwarzer, Knut; Löwenthal, Kai; Mersch, Henning; Dandekar, Thomas; Giegerich, Robert (UniBi)

Institution

Centrum für Biotechnologie > Arbeitsgruppe R. Giegerich
Technische Fakultät > AG Praktische Informatik
Centrum für Biotechnologie > Institut für Bioinformatik

Abstract / Note

Background: Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data – therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats.

Results: Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at http://bioschemas.sourceforge.net; the BioDOM library can be obtained at http://biodom.sourceforge.net.

Conclusion: The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios.
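The idea of a program-independent exchange format can be illustrated with a minimal sketch. It assumes a made-up XML layout: the element names `sequences` and `sequence` and the `id` attribute are hypothetical and are not taken from the HOBIT schemas, and the code uses only the Python standard library rather than BioDOM. One function wraps FASTA input in XML, and a second function, standing in for a downstream tool, reads the same document back.

```python
import xml.etree.ElementTree as ET


def fasta_to_xml(fasta_text):
    """Wrap FASTA records in a hypothetical, tool-neutral XML document.

    The element and attribute names are illustrative assumptions,
    not the HOBIT schema definitions.
    """
    root = ET.Element("sequences")
    record = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The first word of the header line serves as the identifier.
            header = line[1:].split()
            seq_id = header[0] if header else ""
            record = ET.SubElement(root, "sequence", id=seq_id)
            record.text = ""
        elif record is not None:
            # Sequence data may span several lines; concatenate them.
            record.text += line
    return ET.tostring(root, encoding="unicode")


def read_sequences(xml_text):
    """Stand-in for a downstream tool that consumes the same XML format."""
    root = ET.fromstring(xml_text)
    return {s.get("id"): s.text or "" for s in root.findall("sequence")}


fasta = ">seq1 example\nACGU\nGGCC\n>seq2\nUUAA\n"
xml_doc = fasta_to_xml(fasta)
sequences = read_sequences(xml_doc)
```

Because both functions agree on one document structure, neither needs to know which program produced or will consume the data. A real workflow would additionally validate each document against the published schema.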

Year of publication

2006

Journal title

BMC Bioinformatics

Volume

7

Issue

1

Article number

490

ISSN

1471-2105

Page URI

https://pub.uni-bielefeld.de/record/1773871

Cite

Seibel PN, Krüger J, Hartmeier S, et al. XML schemas for common bioinformatic data types and their application in workflow systems. BMC Bioinformatics. 2006;7(1): 490.

Seibel, P. N., Krüger, J., Hartmeier, S., Schwarzer, K., Löwenthal, K., Mersch, H., Dandekar, T., et al. (2006). XML schemas for common bioinformatic data types and their application in workflow systems. BMC Bioinformatics, 7(1), 490. https://doi.org/10.1186/1471-2105-7-490

Seibel, Philipp N., Krüger, Jan, Hartmeier, Sven, Schwarzer, Knut, Löwenthal, Kai, Mersch, Henning, Dandekar, Thomas, and Giegerich, Robert. 2006. “XML schemas for common bioinformatic data types and their application in workflow systems”. BMC Bioinformatics 7 (1): 490.

Seibel, P. N., Krüger, J., Hartmeier, S., Schwarzer, K., Löwenthal, K., Mersch, H., Dandekar, T., and Giegerich, R. (2006). XML schemas for common bioinformatic data types and their application in workflow systems. BMC Bioinformatics 7:490.

Seibel, P.N., et al., 2006. XML schemas for common bioinformatic data types and their application in workflow systems. BMC Bioinformatics, 7(1): 490.

P.N. Seibel, et al., “XML schemas for common bioinformatic data types and their application in workflow systems”, BMC Bioinformatics, vol. 7, 2006, 490.

Seibel, P.N., Krüger, J., Hartmeier, S., Schwarzer, K., Löwenthal, K., Mersch, H., Dandekar, T., Giegerich, R.: XML schemas for common bioinformatic data types and their application in workflow systems. BMC Bioinformatics. 7, 490 (2006).

Seibel, Philipp N., Krüger, Jan, Hartmeier, Sven, Schwarzer, Knut, Löwenthal, Kai, Mersch, Henning, Dandekar, Thomas, and Giegerich, Robert. “XML schemas for common bioinformatic data types and their application in workflow systems”. BMC Bioinformatics 7.1 (2006): 490.

All files are available under the following license(s):

Copyright Statement:

This item is protected by copyright and/or related rights. [...]

Full text(s)

Name

1471_2105_7_490.pdf

Access Level

Open Access

Last uploaded

2019-09-06T08:48:09Z

MD5 checksum

b1f5b68b6f81104ec97287f4f389e64b

Data provided by the European Bioinformatics Institute (EBI)

13 citations in Europe PMC

Data provided by Europe PubMed Central.

Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data.
Zhang C, Bijlard J, Staiger C, Scollen S, van Enckevort D, Hoogstrate Y, Senf A, Hiltemann S, Repo S, Pipping W, Bierkens M, Payralbe S, Stringer B, Heringa J, Stubbs A, Bonino Da Silva Santos LO, Belien J, Weistra W, Azevedo R, van Bochove K, Meijer G, Boiten JW, Rambla J, Fijneman R, Spalding JD, Abeln S., F1000Res 6, 2017
PMID: 29123641

Experiences with workflows for automating data-intensive bioinformatics.
Spjuth O, Bongcam-Rudloff E, Hernández GC, Forer L, Giovacchini M, Guimera RV, Kallio A, Korpelainen E, Kańduła MM, Krachunov M, Kreil DP, Kulev O, Łabaj PP, Lampa S, Pireddu L, Schönherr S, Siretskiy A, Vassilev D., Biol Direct 10, 2015
PMID: 26282399

Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package.
El-Kalioby M, Abouelhoda M, Krüger J, Giegerich R, Sczyrba A, Wall DP, Tonellato P., BMC Bioinformatics 13 Suppl 17, 2012
PMID: 23281941

Conveyor: a workflow engine for bioinformatic analyses.
Linke B, Giegerich R, Goesmann A., Bioinformatics 27(7), 2011
PMID: 21278189

Towards interoperable and reproducible QSAR analyses: Exchange of datasets.
Spjuth O, Willighagen EL, Guha R, Eklund M, Wikberg JE., J Cheminform 2(1), 2010
PMID: 20591161

BioXSD: the common data-exchange format for everyday bioinformatics web services.
Kalas M, Puntervoll P, Joseph A, Bartaseviciūte E, Töpfer A, Venkataraman P, Pettifer S, Bryne JC, Ison J, Blanchet C, Rapacki K, Jonassen I., Bioinformatics 26(18), 2010
PMID: 20823319

Techniques for integrating -omics data.
Akula SP, Miriyala RN, Thota H, Rao AA, Gedela S., Bioinformation 3(6), 2009
PMID: 19255651

Trends in modeling Biomedical Complex Systems.
Milanesi L, Romano P, Castellani G, Remondini D, Liò P., BMC Bioinformatics 10 Suppl 12, 2009
PMID: 19828068

GeneFisher-P: variations of GeneFisher as processes in Bio-jETI.
Lamprecht AL, Margaria T, Steffen B, Sczyrba A, Hartmeier S, Giegerich R., BMC Bioinformatics 9 Suppl 4, 2008
PMID: 18460174

A standard MIGS/MIMS compliant XML Schema: toward the development of the Genomic Contextual Data Markup Language (GCDML).
Kottmann R, Gray T, Murphy S, Kagan L, Kravitz S, Lombardot T, Field D, Glöckner FO, Genomic Standards Consortium., OMICS 12(2), 2008
PMID: 18479204

RNA Movies 2: sequential animation of RNA secondary structures.
Kaiser A, Krüger J, Evers DJ., Nucleic Acids Res 35(web server issue), 2007
PMID: 17567618

Integrating sequence and structural biology with DAS.
Prlić A, Down TA, Kulesha E, Finn RD, Kähäri A, Hubbard TJ., BMC Bioinformatics 8, 2007
PMID: 17850653

4SALE--a tool for synchronous RNA sequence and secondary structure alignment and editing.
Seibel PN, Müller T, Dandekar T, Schultz J, Wolf M., BMC Bioinformatics 7, 2006
PMID: 17101042

Sources

PMID: 17087823
PubMed | Europe PMC
