Identifying Gene Clusters by Discovering Common Intervals in Indeterminate Strings

Dörr D, Stoye J, Böcker S, Jahn K (2014)
BMC Genomics 15(Suppl. 6: Proc. of RECOMB-CG 2014): S2.

Journal article | Published | English
Full text available for this record
Abstract / Note
Background: Comparative analyses of chromosomal gene orders are successfully used to predict gene clusters in bacterial and fungal genomes. Present models for detecting sets of co-localized genes in chromosomal sequences require prior knowledge of gene family assignments of genes in the dataset of interest. These families are often computationally predicted on the basis of sequence similarity or higher order features of gene products. Errors introduced in this process amplify in subsequent gene order analyses and thus may deteriorate gene cluster prediction.

Results: In this work, we present a new dynamic model and efficient computational approaches for gene cluster prediction suitable in scenarios ranging from traditional gene family-based gene cluster prediction, via multiple conflicting gene family annotations, to gene family-free analysis, in which gene clusters are predicted solely on the basis of a pairwise similarity measure of the genes of different genomes. We evaluate our gene family-free model against a gene family-based model on a dataset of 93 bacterial genomes.

Conclusions: Our model is able to detect gene clusters that would be also detected with well-established gene family-based approaches. Moreover, we show that it is able to detect conserved regions which are missed by gene family-based methods due to wrong or deficient gene family assignments.
Article Processing Charge funded by the Deutsche Forschungsgemeinschaft and the Open Access Publication Fund of Bielefeld University.


Dörr, D., Stoye, J., Böcker, S., & Jahn, K. (2014). Identifying Gene Clusters by Discovering Common Intervals in Indeterminate Strings. BMC Genomics, 15(Suppl. 6: Proc. of RECOMB-CG 2014), S2. doi:10.1186/1471-2164-15-S6-S2
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Access Level
OA Open Access

1 Citation in Europe PMC

Data provided by Europe PubMed Central.

Finding approximate gene clusters with Gecko 3.
Winter S, Jahn K, Wehner S, Kuchenbecker L, Marz M, Stoye J, Bocker S., Nucleic Acids Res. 44(20), 2016
PMID: 27679480




PMID: 25571793