Identifying Gene Clusters by Discovering Common Intervals in Indeterminate Strings

Dörr D, Stoye J, Böcker S, Jahn K (2014)
BMC Genomics 15(Suppl. 6: Proc. of RECOMB-CG 2014).

Journal Article | Published | English
Abstract
Background: Comparative analyses of chromosomal gene orders are successfully used to predict gene clusters in bacterial and fungal genomes. Present models for detecting sets of co-localized genes in chromosomal sequences require prior knowledge of the gene family assignments of the genes in the dataset of interest. These families are often computationally predicted on the basis of sequence similarity or higher-order features of gene products. Errors introduced in this process are amplified in subsequent gene order analyses and may thus degrade gene cluster prediction.

Results: In this work, we present a new dynamic model and efficient computational approaches for gene cluster prediction. They are suitable for scenarios ranging from traditional gene family-based gene cluster prediction, via multiple conflicting gene family annotations, to gene family-free analysis, in which gene clusters are predicted solely on the basis of a pairwise similarity measure between the genes of different genomes. We evaluate our gene family-free model against a gene family-based model on a dataset of 93 bacterial genomes.

Conclusions: Our model is able to detect gene clusters that would also be detected with well-established gene family-based approaches. Moreover, we show that it is able to detect conserved regions that are missed by gene family-based methods because of wrong or deficient gene family assignments.
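To make the notion of a common interval concrete, here is a minimal sketch of the traditional gene family-based scenario mentioned in the abstract. It is not the authors' algorithm. It is a naive enumeration that assumes each genome is given as a list of gene family labels and that a common interval is a pair of intervals, one per genome, containing exactly the same set of families. The function names `interval_sets` and `common_intervals` and the parameter `min_size` are hypothetical, chosen for illustration only. The sketch does not filter for maximal intervals and does not cover the indeterminate-string or gene family-free models.

```python
from collections import defaultdict


def interval_sets(genome, min_size=2):
    """Map each family set to the (start, end) intervals of `genome`
    whose genes cover exactly that set (indices are inclusive)."""
    found = defaultdict(list)
    for start in range(len(genome)):
        seen = set()
        for end in range(start, len(genome)):
            seen.add(genome[end])
            if len(seen) >= min_size:
                found[frozenset(seen)].append((start, end))
    return found


def common_intervals(genome_a, genome_b, min_size=2):
    """Return {family set: (intervals in A, intervals in B)} for every
    family set that occurs as an interval in both genomes.
    Naive quadratic enumeration, for illustration only."""
    sets_a = interval_sets(genome_a, min_size)
    sets_b = interval_sets(genome_b, min_size)
    shared = sets_a.keys() & sets_b.keys()
    return {chars: (sets_a[chars], sets_b[chars]) for chars in shared}


# Toy input: gene orders written as lists of made-up family labels.
genome_a = ["a", "b", "c", "d", "e"]
genome_b = ["x", "c", "b", "a", "y", "e", "d"]

for chars, (in_a, in_b) in common_intervals(genome_a, genome_b).items():
    print(sorted(chars), in_a, in_b)
```

In this toy input the families a, b and c are adjacent in both genomes, although in a different order, so their intervals are reported as a candidate cluster. A wrong family label on any of these genes would hide the cluster from such a method, which is the failure mode the abstract attributes to gene family-based approaches.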
Financial disclosure
Article Processing Charge funded by the Deutsche Forschungsgemeinschaft and the Open Access Publication Fund of Bielefeld University.

Cite this

Dörr, Daniel, Stoye, Jens, Böcker, Sebastian, and Jahn, Katharina. “Identifying Gene Clusters by Discovering Common Intervals in Indeterminate Strings”. BMC Genomics 15.Suppl. 6: Proc. of RECOMB-CG 2014 (2014).
Main File(s)
Access Level: Open Access
Last Uploaded: 2016-11-18T14:29:07Z




Sources

PMID: 25571793
