Statistics for approximate gene clusters
Jahn K, Winter S, Stoye J, Böcker S (2013)
BMC Bioinformatics 14(Suppl 15: Proc. of RECOMB-CG 2013): S14.
Journal article
| Published | English
Abstract / Note
Background
Genes that occur co-localized in multiple genomes can be strong indicators of either functional constraints on genome organization or remnant ancestral gene order. The computational detection of these patterns, usually referred to as gene clusters, has become increasingly sensitive over the past decade. The most powerful approaches allow for various types of imperfect cluster conservation: cluster locations may be internally rearranged; individual cluster locations may contain only a subset of the cluster genes and may be disrupted by uninvolved genes; and cluster locations may not occur at all in some or even most of the studied genomes. Detecting such low-quality clusters increases the risk of mistaking faint patterns that occur merely by chance for genuine findings. It is therefore crucial to estimate the significance of computational gene cluster predictions and to discriminate between true conservation and coincidental clustering.
Results
In this paper, we present an efficient and accurate approach to estimate the significance of gene cluster predictions under the approximate common intervals model. Given a single gene cluster prediction, we calculate the probability of observing it with the same or a higher degree of conservation under the null hypothesis of random gene order, and add a correction factor to account for multiple testing. Our approach considers all parameters that define the quality of gene cluster conservation: the number of genomes in which the cluster occurs, the number of involved genes, the degree of conservation in the different genomes, and the frequency of the clustered genes within each genome. We apply our approach to evaluate gene cluster predictions in a large set of well-annotated genomes.
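To make the idea of a significance estimate under random gene order more concrete, the following is a minimal sketch of a much simpler calculation than the one the paper describes. It is not the authors' method and does not implement the approximate common intervals model. It assumes that every genome is a random permutation of the same `n` distinct genes, that each cluster gene occurs exactly once per genome, that genomes are independent, and that a cluster "occurs" in a genome when all `k` genes fall within some window of `w` consecutive positions. All function and parameter names are hypothetical, and the multiple testing correction shown is a plain Bonferroni factor.

```python
from math import comb


def window_hit_prob(n: int, k: int, w: int) -> float:
    """Union-bound estimate of the chance that k given genes all lie
    within some window of w consecutive positions in a random
    permutation of n distinct genes (assumed null model)."""
    if not (0 < k <= w <= n):
        raise ValueError("require 0 < k <= w <= n")
    # Chance that all k genes fall into one fixed window of length w.
    per_window = comb(w, k) / comb(n, k)
    # Sum over the n - w + 1 possible window positions (upper bound).
    return min(1.0, (n - w + 1) * per_window)


def binomial_tail(g: int, m: int, p: float) -> float:
    """P(X >= m) for X ~ Binomial(g, p): the chance that at least m of
    g independent genomes contain the cluster by coincidence."""
    return sum(comb(g, i) * p**i * (1 - p) ** (g - i) for i in range(m, g + 1))


def corrected_p_value(n: int, k: int, w: int, g: int, m: int, num_tests: int) -> float:
    """Raw tail probability multiplied by a Bonferroni factor, capped at 1."""
    raw = binomial_tail(g, m, window_hit_prob(n, k, w))
    return min(1.0, raw * num_tests)


if __name__ == "__main__":
    # Hypothetical numbers: 5 genes within 8 positions, seen in 10 of 20
    # genomes of 1000 genes each, with 10**6 candidate clusters tested.
    print(corrected_p_value(n=1000, k=5, w=8, g=20, m=10, num_tests=10**6))
```

For `w == k` the windows are mutually exclusive, so the bound is exact: two given genes are adjacent in a random permutation of 10 genes with probability 2/10. The sketch ignores gene frequency within a genome and varying degrees of conservation across genomes, both of which the abstract names as parameters of the actual approach.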
Publication year
2013
Journal title
BMC Bioinformatics
Volume
14
Issue
Suppl 15: Proc. of RECOMB-CG 2013
Article number
S14
ISSN
1471-2105
Page URI
https://pub.uni-bielefeld.de/record/2611633
Cite
Jahn K, Winter S, Stoye J, Böcker S. Statistics for approximate gene clusters. BMC Bioinformatics. 2013;14(Suppl 15: Proc. of RECOMB-CG 2013): S14.
Jahn, K., Winter, S., Stoye, J., & Böcker, S. (2013). Statistics for approximate gene clusters. BMC Bioinformatics, 14(Suppl 15: Proc. of RECOMB-CG 2013), S14. doi:10.1186/1471-2105-14-S15-S14
Jahn, Katharina, Winter, Sascha, Stoye, Jens, and Böcker, Sebastian. 2013. “Statistics for approximate gene clusters”. BMC Bioinformatics 14 (Suppl 15: Proc. of RECOMB-CG 2013): S14.
Jahn, K., Winter, S., Stoye, J., and Böcker, S. (2013). Statistics for approximate gene clusters. BMC Bioinformatics 14:S14.
Jahn, K., et al., 2013. Statistics for approximate gene clusters. BMC Bioinformatics, 14(Suppl 15: Proc. of RECOMB-CG 2013): S14.
K. Jahn, et al., “Statistics for approximate gene clusters”, BMC Bioinformatics, vol. 14, 2013, S14.
Jahn, K., Winter, S., Stoye, J., Böcker, S.: Statistics for approximate gene clusters. BMC Bioinformatics. 14, S14 (2013).
Jahn, Katharina, Winter, Sascha, Stoye, Jens, and Böcker, Sebastian. “Statistics for approximate gene clusters”. BMC Bioinformatics 14.Suppl 15: Proc. of RECOMB-CG 2013 (2013): S14.
All files available under the following license(s):
Copyright Statement:
This object is protected by copyright and/or related rights. [...]
Full text(s)
Access Level
Open Access
Last uploaded
2019-09-06T09:18:17Z
MD5 checksum
4390fd17f33eda00cbfbfeb7ea065c0f
Data provided by the European Bioinformatics Institute (EBI)
1 citation in Europe PMC
Data provided by Europe PubMed Central.
Proteny: discovering and visualizing statistically significant syntenic clusters at the proteome level.
Gehrmann T, Reinders MJ., Bioinformatics 31(21), 2015
PMID: 26116928
28 References
Data provided by Europe PubMed Central.
The COG database: an updated version includes eukaryotes.
Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA., BMC Bioinformatics 4(), 2003
PMID: 12969510
Conserved clusters of functionally related genes in two bacterial genomes.
Tamames J, Casari G, Ouzounis C, Valencia A., J. Mol. Evol. 44(1), 1997
PMID: 9010137
Conservation of gene order: a fingerprint of proteins that physically interact.
Dandekar T, Snel B, Huynen M, Bork P., Trends Biochem. Sci. 23(9), 1998
PMID: 9787636
The use of gene clusters to infer functional coupling.
Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N., Proc. Natl. Acad. Sci. U.S.A. 96(6), 1999
PMID: 10077608
A novel method for accurate operon predictions in all sequenced prokaryotes.
Price MN, Huang KH, Alm EJ, Arkin AP., Nucleic Acids Res. 33(3), 2005
PMID: 15701760
Identifying conserved gene clusters in the presence of homology families.
He X, Goldwasser MH., J. Comput. Biol. 12(6), 2005
PMID: 16108708
The statistical analysis of spatially clustered genes under the maximum gap criterion.
Hoberman R, Sankoff D, Durand D., J. Comput. Biol. 12(8), 2005
PMID: 16241899
Statistical inference of chromosomal homology based on gene colinearity and applications to Arabidopsis and rice.
Wang X, Shi X, Li Z, Zhu Q, Kong L, Tang W, Ge S, Luo J., BMC Bioinformatics 7(), 2006
PMID: 17038171
Gene cluster analysis method identifies horizontally transferred genes with high reliability and indicates that they provide the main mechanism of operon gain in 8 species of gamma-Proteobacteria.
Homma K, Fukuchi S, Nakamura Y, Gojobori T, Nishikawa K., Mol. Biol. Evol. 24(3), 2006
PMID: 17185745
Gapped permutation pattern discovery for gene order comparisons.
Parida L., J. Comput. Biol. 14(1), 2007
PMID: 17381345
Identifying clusters of functionally related genes in genomes.
Yi G, Sze SH, Thon MR., Bioinformatics 23(9), 2007
PMID: 17237058
Gecko and GhostFam: rigorous and efficient gene cluster detection in prokaryotic genomes.
Schmidt T, Stoye J., Methods Mol. Biol. 396(), 2007
PMID: 18025693
Two plus two does not equal three: statistical tests for multiple genome comparison.
Raghupathy N, Hoberman R, Durand D., J Bioinform Comput Biol 6(1), 2008
PMID: 18324742
Efficiently identifying max-gap clusters in pairwise genome comparison.
Ling X, He X, Xin D, Han J., J. Comput. Biol. 15(6), 2008
PMID: 18631023
Detecting gene clusters under evolutionary constraint in a large number of genomes.
Ling X, He X, Xin D., Bioinformatics 25(5), 2009
PMID: 19158161
Gene cluster statistics with gene families.
Raghupathy N, Durand D., Mol. Biol. Evol. 26(5), 2009
PMID: 19150803
Computation of median gene clusters.
Bocker S, Jahn K, Mixtacki J, Stoye J., J. Comput. Biol. 16(8), 2009
PMID: 19689215
CYNTENATOR: progressive gene order alignment of 17 vertebrate genomes.
Rodelsperger C, Dieterich C., PLoS ONE 5(1), 2010
PMID: 20126624
Efficient computation of approximate gene clusters based on reference occurrences.
Jahn K., J. Comput. Biol. 18(9), 2011
PMID: 21899430
NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy.
Pruitt KD, Tatusova T, Brown GR, Maglott DR., Nucleic Acids Res. 40(Database issue), 2011
PMID: 22121212
i-ADHoRe 3.0--fast and sensitive detection of genomic homology in extremely large data sets.
Proost S, Fostier J, De Witte D, Dhoedt B, Demeester P, Van de Peer Y, Vandepoele K., Nucleic Acids Res. 40(2), 2011
PMID: 22102584
RegulonDB v8.0: omics data sets, evolutionary conservation, regulatory phrases, cross-validated gold standards and more.
Salgado H, Peralta-Gil M, Gama-Castro S, Santos-Zavaleta A, Muniz-Rascado L, Garcia-Sotelo JS, Weiss V, Solano-Lira H, Martinez-Flores I, Medina-Rivera A, Salgado-Osorio G, Alquicira-Hernandez S, Alquicira-Hernandez K, Lopez-Fuentes A, Porron-Sotelo L, Huerta AM, Bonavides-Martinez C, Balderas-Martinez YI, Pannier L, Olvera M, Labastida A, Jimenez-Jacinto V, Vega-Alvarado L, Del Moral-Chavez V, Hernandez-Alvarez A, Morett E, Collado-Vides J., Nucleic Acids Res. 41(Database issue), 2012
PMID: 23203884
Gene and context: integrative approaches to genome analysis.
Huynen MA, Snel B., Adv. Protein Chem. 54(), 2000
PMID: 10829232
Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context.
Wolf YI, Rogozin IB, Kondrashov AS, Koonin EV., Genome Res. 11(3), 2001
PMID: 11230160
Identifying functional links between genes using conserved chromosomal proximity.
Yanai I, Mellor JC, DeLisi C., Trends Genet. 18(4), 2002
PMID: 11932011
Fast identification and statistical evaluation of segmental homologies in comparative maps.
Calabrese PP, Chakravarty S, Vision TJ., Bioinformatics 19 Suppl 1(), 2003
PMID: 12855440
Computational approaches for the analysis of gene neighbourhoods in prokaryotic genomes.
Rogozin IB, Makarova KS, Wolf YI, Koonin EV., Brief. Bioinformatics 5(2), 2004
PMID: 15260894
Sources
PMID: 24564620
PubMed | Europe PMC