Quality control for terms and definitions in ontologies and taxonomies

Köhler J, Munn K, Rüegg A, Skusa A, Smith B (2006)
BMC Bioinformatics 7(1): 212.

Journal article | Published | English
Köhler, Jacob; Munn, Katherine; Rüegg, Alexander; Skusa, Andre; Smith, Barry
Abstract / Note
Background: Ontologies and taxonomies are among the most important computational resources for molecular biology and bioinformatics. A series of recent papers has shown that the Gene Ontology (GO), the most prominent taxonomic resource in these fields, is marked by flaws of certain characteristic types, which flow from a failure to address basic ontological principles. As yet, no methods have been proposed which would allow ontology curators to pinpoint flawed terms or definitions in ontologies in a systematic way. Results: We present computational methods that automatically identify terms and definitions which are defined in a circular or unintelligible way. We further demonstrate the potential of these methods by applying them to isolate a subset of 6001 problematic GO terms. By automatically aligning GO with other ontologies and taxonomies we were able to propose alternative synonyms and definitions for some of these problematic terms. This allows us to demonstrate that these other resources do not contain definitions superior to those supplied by GO. Conclusion: Our methods provide reliable indications of the quality of terms and definitions in ontologies and taxonomies. Further, they are well suited to assist ontology curators in drawing their attention to those terms that are ill-defined. We have further shown the limitations of ontology mapping and alignment in assisting ontology curators in rectifying problems, thus pointing to the need for manual curation.

Köhler, J., Munn, K., Rüegg, A., Skusa, A., & Smith, B. (2006). Quality control for terms and definitions in ontologies and taxonomies. BMC Bioinformatics, 7(1), 212. https://doi.org/10.1186/1471-2105-7-212
All files available under the following license(s):
Copyright Statement:
This object is protected by copyright and/or related rights. [...]
Access Level
OA Open Access

15 Citations in Europe PMC

Data provided by Europe PubMed Central.

Relating Complexity and Error Rates of Ontology Concepts. More Complex NCIt Concepts Have More Errors.
Min H, Zheng L, Perl Y, Halper M, De Coronado S, Ochs C., Methods Inf Med 56(3), 2017
PMID: 28244549
Easy Extraction of Terms and Definitions with OWL2TL.
Judkins J, Utecht J, Brochhausen M., CEUR Workshop Proc 1747, 2016
PMID: 28035214
Towards natural language question generation for the validation of ontologies and mappings.
Ben Abacha A, Dos Reis JC, Mrabet Y, Pruski C, Da Silveira M., J Biomed Semantics 7(1), 2016
PMID: 27502477
Measuring the evolution of ontology complexity: the gene ontology case study.
Dameron O, Bettembourg C, Le Meur N., PLoS One 8(10), 2013
PMID: 24146805
A UML profile for the OBO relation ontology.
Guardia GD, Vêncio RZ, de Farias CR., BMC Genomics 13 Suppl 5, 2012
PMID: 23095840
Saliva Ontology: an ontology-based framework for a Salivaomics Knowledge Base.
Ai J, Smith B, Wong DT., BMC Bioinformatics 11, 2010
PMID: 20525291
Structural group auditing of a UMLS semantic type's extent.
Chen Y, Gu HH, Perl Y, Geller J, Halper M., J Biomed Inform 42(1), 2009
PMID: 18619563
Structural group-based auditing of missing hierarchical relationships in UMLS.
Chen Y, Gu HH, Perl Y, Geller J., J Biomed Inform 42(3), 2009
PMID: 18824248
Data integration for plant genomics--exemplars from the integration of Arabidopsis thaliana databases.
Lysenko A, Hindle MM, Taubert J, Saqi M, Rawlings CJ., Brief Bioinform 10(6), 2009
PMID: 19933213
Two Is Not Always Better Than One: A Critical Evaluation of Two-System Theories.
Keren G, Schul Y., Perspect Psychol Sci 4(6), 2009
PMID: 26161732
Automated comparative auditing of NCIT genomic roles using NCBI.
Cohen B, Oren M, Min H, Perl Y, Halper M., J Biomed Inform 41(6), 2008
PMID: 18486558
Experiences mapping a legacy interface terminology to SNOMED CT.
Wade G, Rosenbloom ST., BMC Med Inform Decis Mak 8 Suppl 1, 2008
PMID: 19007440
Topological analysis of large-scale biomedical terminology structures.
Bales ME, Lussier YA, Johnson SB., J Am Med Inform Assoc 14(6), 2007
PMID: 17712094
Protein microarray platforms for clinical proteomics.
Pollard HB, Srivastava M, Eidelman O, Jozwik C, Rothwell SW, Mueller GP, Jacobowitz DM, Darling T, Guggino WB, Wright J, Zeitlin PL, Paweletz CP., Proteomics - Clinical Applications 1(9), 2007
PMID: C1879
Bio-ontologies: current trends and future directions.
Bodenreider O, Stevens R., Brief Bioinform 7(3), 2006
PMID: 16899495

46 References

Data provided by Europe PubMed Central.

Creating the gene ontology resource: design and implementation.
Gene Ontology Consortium., Genome Res. 11(8), 2001
PMID: 11483584
The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology.
Camon E, Magrane M, Barrell D, Lee V, Dimmer E, Maslen J, Binns D, Harte N, Lopez R, Apweiler R., Nucleic Acids Res. 32(Database issue), 2004
PMID: 14681408
The Gene Ontology (GO) database and informatics resource.
Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, Richter J, Rubin GM, Blake JA, Bult C, Dolan M, Drabkin H, Eppig JT, Hill DP, Ni L, Ringwald M, Balakrishnan R, Cherry JM, Christie KR, Costanzo MC, Dwight SS, Engel S, Fisk DG, Hirschman JE, Hong EL, Nash RS, Sethuraman A, Theesfeld CL, Botstein D, Dolinski K, Feierbach B, Berardini T, Mundodi S, Rhee SY, Apweiler R, Barrell D, Camon E, Dimmer E, Lee V, Chisholm R, Gaudet P, Kibbe W, Kishore R, Schwarz EM, Sternberg P, Gwinn M, Hannick L, Wortman J, Berriman M, Wood V, de la Cruz N, Tonellato P, Jaiswal P, Seigfried T, White R; Gene Ontology Consortium., Nucleic Acids Res. 32(Database issue), 2004
PMID: 14681407
Terminology-driven literature mining and knowledge acquisition in biomedicine.
Nenadic G, Mima H, Spasic I, Ananiadou S, Tsujii J., Int J Med Inform 67(1-3), 2002
PMID: 12460630
Integration of Life Science Databases
Köhler J., 2004
A novel view on information content of concepts in extremely large ontologies.
Van Buggenhout C, Ceusters W., Stud Health Technol Inform 95, 2003
PMID: 14664021

Ceusters W., 2001
Implications of compositionality in the gene ontology for its curation and usage.
Ogren PV, Cohen KB, Hunter L., Pac Symp Biocomput, 2005
PMID: 15759624
Mistakes in medical ontologies: where do they come from and how can they be detected?
Ceusters W, Smith B, Kumar A, Dhaen C., Stud Health Technol Inform 102, 2004
PMID: 15853269
Comparing Sets of Semantic Relations in Ontologies
Hovy EH., 2002

Noy NF, McGuinness DL., 2001
A reference ontology for biomedical informatics: the Foundational Model of Anatomy.
Rosse C, Mejino JL Jr., J Biomed Inform 36(6), 2003
PMID: 14759820
Ontologies for molecular biology and bioinformatics
Schulze-Kremer S., 2002

Smith B, Köhler J, Kumar A., 2004
The Role of Foundational Relations in the Alignment of Biomedical Ontologies; San Francisco.
Smith B, Rosse C., 2004
Relations in biomedical ontologies.
Smith B, Ceusters W, Klagges B, Kohler J, Kumar A, Lomax J, Mungall C, Neuhaus F, Rector AL, Rosse C., Genome Biol. 6(5), 2005
PMID: 15892874
Obol: integrating language and meaning in bio-ontologies.
Mungall CJ., Comp. Funct. Genomics 5(6-7), 2004
PMID: 18629143
SEMEDA: ontology based semantic integration of biological databases.
Kohler J, Philippi S, Lange M., Bioinformatics 19(18), 2003
PMID: 14668226
A methodology to migrate the gene ontology to a description logic environment using DAML+OIL.
Wroe CJ, Stevens R, Goble CA, Ashburner M., Pac Symp Biocomput, 2003
PMID: 12603063
The compositional structure of Gene Ontology terms.
Ogren PV, Cohen KB, Acquaah-Mensah GK, Eberlein J, Hunter L., Pac Symp Biocomput, 2004
PMID: 14992505

Kumar A, Smith B., 2003
Semantic Conflict Resolution Ontology (SCROL): An Ontology for Detecting and Resolving Data and Schema-Level Semantic Conflicts
Ram S, Park J., 2004
Characterizing Quality of Knowledge on Semantic Web
Supekar K, Patel C, Lee Y., 2004

Parsia B, Sirin E, Kalyanpur A., 2005
Consistency Checking of Semantic Web Ontologies
Baclawski K, Kokar MM, Waldinger RJ, Kogut PA., 2002
Law and order: Assessing and enforcing compliance with ontological modeling principles in the Foundational Model of Anatomy
Zhang S, Bodenreider O., 2005
Knowledge acquisition, consistency checking and concurrency control for Gene Ontology (GO).
Yeh I, Karp PD, Noy NF, Altman RB., Bioinformatics 19(2), 2003
PMID: 12538245
ONTOMETRIC: A Method to Choose the Appropriate Ontology
Lozano-Tello A, Gomez-Perez A., 2004

Haldar A, Mahadevan S., 2000

Copi IM, Cohen C., 2004
GO Editorial Guide
The role of definitions in biomedical concept representation.
Michael J, Mejino JL Jr, Rosse C., Proc AMIA Symp, 2001
PMID: 11825231
MetaCyc: a multiorganism database of metabolic pathways and enzymes.
Krieger CJ, Zhang P, Mueller LA, Wang A, Paley S, Arnaud M, Pick J, Rhee SY, Karp PD., Nucleic Acids Res. 32(Database issue), 2004
PMID: 14681452
WordNet: an electronic lexical database
Fellbaum C., 1998
Medical Subject Headings (MeSH).
Lipscomb CE., Bull Med Libr Assoc 88(3), 2000
PMID: 10928714

Automatic ontology construction from the literature.
Blaschke C, Valencia A., Genome Inform 13, 2002
PMID: 14571389

Sanderson M, Croft WB., 1999
Building mouse phenotype ontologies.
Gkoutos GV, Green EC, Mallon AM, Hancock JM, Davidson D., Pac Symp Biocomput, 2004
PMID: 14992502
IR and AI: Using Co-Occurrence Theory to Generate Lightweight Ontologies: September 03 - 07 2001; Munich, Germany.
Ding Y., 2001
Linking experimental results, biological networks and sequence analysis methods using Ontologies and Generalized Data Structures
Köhler J, Rawlings C, Verrier P, Mitchell R, Skusa A, Ruegg A, Philippi S., 2004
Mappings of External Classification Systems to GO
Graph-based analysis and visualization of experimental results with ONDEX.
Kohler J, Baumbach J, Taubert J, Specht M, Skusa A, Ruegg A, Rawlings C, Verrier P, Philippi S., Bioinformatics 22(11), 2006
PMID: 16533819

PMID: 16623942