Statistics for approximate gene clusters

Jahn K, Winter S, Stoye J, Böcker S (2013)
BMC Bioinformatics 14(Suppl 15: Proc. of RECOMB-CG 2013).

Journal Article | Published | English
Background
Genes that are co-localized in multiple genomes can be strong indicators of either functional constraints on genome organization or remnant ancestral gene order. The computational detection of these patterns, usually referred to as gene clusters, has become increasingly sensitive over the past decade. The most powerful approaches allow for various types of imperfect cluster conservation: cluster locations may be internally rearranged; individual cluster locations may contain only a subset of the cluster genes and may be disrupted by uninvolved genes; moreover, cluster locations may not occur at all in some or even most of the studied genomes. The detection of such low-quality clusters increases the risk of mistaking faint patterns that occur merely by chance for genuine findings. It is therefore crucial to estimate the significance of computational gene cluster predictions and to discriminate between true conservation and coincidental clustering.

Results
In this paper, we present an efficient and accurate approach to estimate the significance of gene cluster predictions under the approximate common intervals model. Given a single gene cluster prediction, we calculate the probability of observing it with the same or a higher degree of conservation under the null hypothesis of random gene order, and add a correction factor to account for multiple testing. Our approach considers all parameters that define the quality of gene cluster conservation: the number of genomes in which the cluster occurs, the number of involved genes, the degree of conservation in the different genomes, and the frequency of the clustered genes within each genome. We apply our approach to evaluate gene cluster predictions in a large set of well-annotated genomes.
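To make the idea of a significance estimate under random gene order concrete, the following Python sketch works through a deliberately simplified case. It is not the method of the paper: it assumes a single linear genome of n distinct genes in uniformly random order, a cluster of k genes that each occur exactly once, and a fixed window length, and it uses a hypergeometric tail plus a union bound over window positions, followed by a Bonferroni correction as one generic choice of multiple-testing correction. All function and parameter names are hypothetical.

```python
from math import comb


def window_tail_probability(n, k, window, min_hits):
    """Probability that one fixed window of `window` positions contains at
    least `min_hits` of the k cluster genes, when the genome is a uniformly
    random permutation of n distinct genes (hypergeometric tail)."""
    total = comb(n, window)
    upper = min(k, window)
    favourable = sum(
        comb(k, hits) * comb(n - k, window - hits)
        for hits in range(min_hits, upper + 1)
    )
    return favourable / total


def genome_bound(n, k, window, min_hits):
    """Union bound over all window start positions of a linear genome.
    This is an upper bound on the chance that any window qualifies."""
    positions = n - window + 1
    return min(1.0, positions * window_tail_probability(n, k, window, min_hits))


def bonferroni(p_value, num_tests):
    """Generic Bonferroni correction (an assumption of this sketch, not
    necessarily the correction factor used in the paper)."""
    return min(1.0, p_value * num_tests)


if __name__ == "__main__":
    # Hypothetical numbers: 4000 genes, a 6-gene cluster, window of 8 genes,
    # at least 5 cluster genes present, 1000 cluster predictions tested.
    p = genome_bound(n=4000, k=6, window=8, min_hits=5)
    print(p, bonferroni(p, 1000))
```

The sketch covers only one genome and ignores gene families and multiple gene copies. Extending it to several genomes would require further assumptions, such as independence between genomes.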
Publishing Year
2013

Cite this

Jahn K, Winter S, Stoye J, Böcker S. Statistics for approximate gene clusters. BMC Bioinformatics. 2013;14(Suppl 15: Proc. of RECOMB-CG 2013).
Jahn, K., Winter, S., Stoye, J., & Böcker, S. (2013). Statistics for approximate gene clusters. BMC Bioinformatics, 14(Suppl 15: Proc. of RECOMB-CG 2013).
Jahn, K., Winter, S., Stoye, J., and Böcker, S. (2013). Statistics for approximate gene clusters. BMC Bioinformatics 14.
Jahn, K., et al., 2013. Statistics for approximate gene clusters. BMC Bioinformatics, 14(Suppl 15: Proc. of RECOMB-CG 2013).
K. Jahn, et al., “Statistics for approximate gene clusters”, BMC Bioinformatics, vol. 14, 2013.
Jahn, K., Winter, S., Stoye, J., Böcker, S.: Statistics for approximate gene clusters. BMC Bioinformatics. 14, (2013).
Jahn, Katharina, Winter, Sascha, Stoye, Jens, and Böcker, Sebastian. “Statistics for approximate gene clusters”. BMC Bioinformatics 14.Suppl 15: Proc. of RECOMB-CG 2013 (2013).
Main File(s)
Access Level
Open Access
Last Uploaded
2014-01-15 15:06:21

PMID: 24564620