Identifying Gene Clusters by Discovering Common Intervals in Indeterminate Strings

Dörr, Daniel; Stoye, Jens; Böcker, Sebastian; Jahn, Katharina

Dörr D, Stoye J, Böcker S, Jahn K (2014)
BMC Genomics 15(Suppl. 6: Proc. of RECOMB-CG 2014): S2.

Journal article | Published | English

Download

1471-2164-15-S6-S2.pdf

DOI

https://doi.org/10.1186/1471-2164-15-S6-S2

URN

urn:nbn:de:0070-pub-26870336

Author(s)

Dörr, Daniel (UniBi); Stoye, Jens (UniBi); Böcker, Sebastian; Jahn, Katharina (UniBi)

Department

Faculty of Technology > International Research Training Group DiDy (GRK 1906)
Center for Biotechnology > Graduate Center > Graduate Cluster Industrial Biotechnology
Center for Biotechnology > Institute for Bioinformatics
Center for Biotechnology > Research Group J. Stoye
Faculty of Technology > Genome Informatics Group

Abstract / Note

Background: Comparative analyses of chromosomal gene orders are successfully used to predict gene clusters in bacterial and fungal genomes. Present models for detecting sets of co-localized genes in chromosomal sequences require prior knowledge of the gene family assignments of the genes in the dataset of interest. These families are often computationally predicted on the basis of sequence similarity or higher-order features of gene products. Errors introduced in this process are amplified in subsequent gene order analyses and may thus degrade gene cluster prediction.

Results: In this work, we present a new dynamic model and efficient computational approaches for gene cluster prediction. They are suitable for scenarios ranging from traditional gene family-based gene cluster prediction, via multiple conflicting gene family annotations, to gene family-free analysis, in which gene clusters are predicted solely on the basis of a pairwise similarity measure between the genes of different genomes. We evaluate our gene family-free model against a gene family-based model on a dataset of 93 bacterial genomes.

Conclusions: Our model is able to detect gene clusters that would also be detected with well-established gene family-based approaches. Moreover, we show that it is able to detect conserved regions that are missed by gene family-based methods because of wrong or deficient gene family assignments.
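As an illustration of the traditional gene family-based setting mentioned in the abstract, the following is a minimal sketch of finding common intervals in two gene sequences: pairs of contiguous regions that contain the same set of gene families, regardless of order. It is a naive enumeration written for clarity, not the model or the algorithms of the paper, and it does not cover indeterminate strings or the gene family-free case. The function names `interval_charsets` and `common_intervals`, the `min_size` parameter, and the single-letter family labels are assumptions made for this example.

```python
from collections import defaultdict


def interval_charsets(seq, min_size=2):
    """Map each character set to the (start, end) intervals of seq that have it.

    Intervals are inclusive, 0-based index pairs. Only character sets with at
    least min_size distinct elements are recorded.
    """
    found = defaultdict(list)
    for start in range(len(seq)):
        seen = set()
        for end in range(start, len(seq)):
            seen.add(seq[end])
            if len(seen) >= min_size:
                found[frozenset(seen)].append((start, end))
    return found


def common_intervals(s, t, min_size=2):
    """Return character sets that occur as an interval in both s and t.

    The result maps each shared character set to a pair of lists:
    its intervals in s and its intervals in t.
    """
    sets_s = interval_charsets(s, min_size)
    sets_t = interval_charsets(t, min_size)
    return {c: (sets_s[c], sets_t[c]) for c in sets_s if c in sets_t}


if __name__ == "__main__":
    # Hypothetical genomes, each gene labeled by a made-up family identifier.
    genome_1 = ["a", "b", "c", "d"]
    genome_2 = ["c", "a", "b", "e"]
    for families, (in_1, in_2) in common_intervals(genome_1, genome_2).items():
        print(sorted(families), in_1, in_2)
```

In this toy input, the family sets {a, b} and {a, b, c} each occur as a contiguous region in both sequences, so they are reported as candidate clusters. The enumeration grows quadratically with sequence length per genome, which is acceptable for a sketch but not for genome-scale data.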

Year of publication

2014

Journal title

BMC Genomics

Volume

15

Issue

Suppl. 6: Proc. of RECOMB-CG 2014

Article number

S2

ISSN

1471-2164

Funding information

Open access publication costs were funded by the Deutsche Forschungsgemeinschaft and Bielefeld University.

Page URI

https://pub.uni-bielefeld.de/record/2687033

Cite

Dörr D, Stoye J, Böcker S, Jahn K. Identifying Gene Clusters by Discovering Common Intervals in Indeterminate Strings. BMC Genomics. 2014;15(Suppl. 6: Proc. of RECOMB-CG 2014): S2.

Dörr, D., Stoye, J., Böcker, S., & Jahn, K. (2014). Identifying Gene Clusters by Discovering Common Intervals in Indeterminate Strings. BMC Genomics, 15(Suppl. 6: Proc. of RECOMB-CG 2014), S2. doi:10.1186/1471-2164-15-S6-S2

Dörr, Daniel, Stoye, Jens, Böcker, Sebastian, and Jahn, Katharina. 2014. “Identifying Gene Clusters by Discovering Common Intervals in Indeterminate Strings”. BMC Genomics 15 (Suppl. 6: Proc. of RECOMB-CG 2014): S2.

Dörr, D., Stoye, J., Böcker, S., and Jahn, K. (2014). Identifying Gene Clusters by Discovering Common Intervals in Indeterminate Strings. BMC Genomics 15:S2.

Dörr, D., et al., 2014. Identifying Gene Clusters by Discovering Common Intervals in Indeterminate Strings. BMC Genomics, 15(Suppl. 6: Proc. of RECOMB-CG 2014): S2.

D. Dörr, et al., “Identifying Gene Clusters by Discovering Common Intervals in Indeterminate Strings”, BMC Genomics, vol. 15, 2014, S2.

Dörr, D., Stoye, J., Böcker, S., Jahn, K.: Identifying Gene Clusters by Discovering Common Intervals in Indeterminate Strings. BMC Genomics. 15, S2 (2014).

Dörr, Daniel, Stoye, Jens, Böcker, Sebastian, and Jahn, Katharina. “Identifying Gene Clusters by Discovering Common Intervals in Indeterminate Strings”. BMC Genomics 15.Suppl. 6: Proc. of RECOMB-CG 2014 (2014): S2.

All files available under the following license(s):

Copyright Statement:

This object is protected by copyright and/or related rights. [...]

Full text(s)

Name

1471-2164-15-S6-S2.pdf

Access Level

Open Access

Last uploaded

2019-09-06T09:18:25Z

MD5 checksum

084536c32a0dcb8d56920d30f53c67e3

Data provided by the European Bioinformatics Institute (EBI)

1 citation in Europe PMC

Data provided by Europe PubMed Central.

Finding approximate gene clusters with Gecko 3.
Winter S, Jahn K, Wehner S, Kuchenbecker L, Marz M, Stoye J, Bocker S., Nucleic Acids Res. 44(20), 2016
PMID: 27679480

Sources

PMID: 25571793
PubMed | Europe PMC

PUB - Publications at Bielefeld University