Orthology Detection Combining Clustering and Synteny for Very Large Datasets

Lechner M, Hernandez-Rosales M, Dörr D, Wieseke N, Thévenin A, Stoye J, Hartmann RK, Prohaska SJ, Stadler PF (2014)
PLoS ONE 9(8): e105015.

Journal article | Published | English
Lechner, Marcus; Hernandez-Rosales, Maribel; Dörr, Daniel (UniBi); Wieseke, Nicolas; Thévenin, Annelyse (UniBi); Stoye, Jens (UniBi); Hartmann, Roland K.; Prohaska, Sonja J.; Stadler, Peter F.
Abstract / Note
The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.
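The adjacency-based similarity mentioned in the abstract can be illustrated with a minimal sketch. This is not the FFAdj-MCS heuristic or PoFF's actual code: it assumes a one-to-one gene mapping is already given (the hypothetical `ortholog_of` dictionary), ignores gene orientation and sequence similarity scores, and only counts how many neighbouring gene pairs of one genome are also neighbours in the other. Genomes are modelled as lists of linear chromosomes, so no adjacency spans a chromosome boundary.

```python
def adjacencies(chromosome):
    """Return the set of unordered neighbouring gene pairs on one linear chromosome."""
    return {frozenset(pair) for pair in zip(chromosome, chromosome[1:])}


def conserved_adjacencies(genome_a, genome_b, ortholog_of):
    """Count adjacencies of genome_a that are also adjacencies in genome_b.

    genome_a, genome_b: lists of linear chromosomes, each a list of gene IDs.
    ortholog_of: hypothetical one-to-one map from genes of A to genes of B
                 (an assumption of this sketch, not something PoFF is given).
    """
    # Collect all adjacencies of genome B, chromosome by chromosome.
    adj_b = set()
    for chrom in genome_b:
        adj_b |= adjacencies(chrom)

    count = 0
    for chrom in genome_a:
        for g1, g2 in zip(chrom, chrom[1:]):
            # Only pairs where both genes have a counterpart can be conserved.
            if g1 in ortholog_of and g2 in ortholog_of:
                if frozenset((ortholog_of[g1], ortholog_of[g2])) in adj_b:
                    count += 1
    return count


# Toy example with two chromosomes per genome.
genome_a = [["a1", "a2", "a3"], ["a4", "a5"]]
genome_b = [["b2", "b1", "b4"], ["b3", "b5"]]
ortholog_of = {f"a{i}": f"b{i}" for i in range(1, 6)}
print(conserved_adjacencies(genome_a, genome_b, ortholog_of))
```

In the toy example only the pair (a1, a2) keeps its neighbourhood in the second genome. A higher count indicates better conserved gene order, which is the kind of signal the abstract describes as complementing sequence similarity.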
Open access publication costs were funded by the Deutsche Forschungsgemeinschaft and Bielefeld University.

Lechner, M., Hernandez-Rosales, M., Dörr, D., Wieseke, N., Thévenin, A., Stoye, J., Hartmann, R. K., et al. (2014). Orthology Detection Combining Clustering and Synteny for Very Large Datasets. PLoS ONE, 9(8), e105015. doi:10.1371/journal.pone.0105015
All files available under the following license(s):
Copyright Statement:
This object is protected by copyright and/or related rights. [...]
Access Level
OA Open Access

25 Citations in Europe PMC

Data provided by Europe PubMed Central.

Cost-effective assembly of the African wild dog (Lycaon pictus) genome using linked reads.
Armstrong EE, Taylor RW, Prost S, Blinston P, van der Meer E, Madzikanda H, Mufute O, Mandisodza-Chikerema R, Stuelpnagel J, Sillero-Zubiri C, Petrov D., Gigascience 8(2), 2019
PMID: 30346553
Best match graphs.
Geiß M, Chávez E, González Laffitte M, López Sánchez A, Stadler BMR, Valdivia DI, Hellmuth M, Hernández Rosales M, Stadler PF., J Math Biol 78(7), 2019
PMID: 30968198
Comparative analyses identify genomic features potentially involved in the evolution of birds-of-paradise.
Prost S, Armstrong EE, Nylander J, Thomas GWC, Suh A, Petersen B, Dalen L, Benz BW, Blom MPK, Palkopoulou E, Ericson PGP, Irestedt M., Gigascience 8(5), 2019
PMID: 30689847
Extreme sensitivity to ultraviolet light in the fungal pathogen causing white-nose syndrome of bats.
Palmer JM, Drees KP, Foster JT, Lindner DL., Nat Commun 9(1), 2018
PMID: 29295979
Time-consistent reconciliation maps and forbidden time travel.
Nøjgaard N, Geiß M, Merkle D, Stadler PF, Wieseke N, Hellmuth M., Algorithms Mol Biol 13(), 2018
PMID: 29441122
TERribly Difficult: Searching for Telomerase RNAs in Saccharomycetes.
Waldl M, Thiel BC, Ochsenreiter R, Holzenleiter A, de Araujo Oliveira JV, Walter MEMT, Wolfinger MT, Stadler PF., Genes (Basel) 9(8), 2018
PMID: 30049970
The molecular genetic basis of herbivory between butterflies and their host plants.
Nallu S, Hill JA, Don K, Sahagun C, Zhang W, Meslin C, Snell-Rood E, Clark NL, Morehouse NI, Bergelson J, Wheat CW, Kronforst MR., Nat Ecol Evol 2(9), 2018
PMID: 30076351
SynerClust: a highly scalable, synteny-aware orthologue clustering tool.
Georgescu CH, Manson AL, Griggs AD, Desjardins CA, Pironti A, Wapinski I, Abeel T, Haas BJ, Earl AM., Microb Genom 4(11), 2018
PMID: 30418868
Transcriptologs: A Transcriptome-Based Approach to Predict Orthology Relationships.
Ambrosino L, Chiusano ML., Bioinform Biol Insights 11(), 2017
PMID: 28469416
Contrasting evolutionary genome dynamics between domesticated and wild yeasts.
Yue JX, Li J, Aigrain L, Hallin J, Persson K, Oliver K, Bergström A, Coupland P, Warringer J, Lagomarsino MC, Fischer G, Durbin R, Liti G., Nat Genet 49(6), 2017
PMID: 28416820
Positive diversifying selection is a pervasive adaptive force throughout the Drosophila radiation.
Cicconardi F, Marcatili P, Arthofer W, Schlick-Steiner BC, Steiner FM., Mol Phylogenet Evol 112(), 2017
PMID: 28458014
No evidence for a bovine mastitis Escherichia coli pathotype.
Leimbach A, Poehlein A, Vollmers J, Görlich D, Daniel R, Dobrindt U., BMC Genomics 18(1), 2017
PMID: 28482799
The gene family-free median of three.
Doerr D, Balaban M, Feijão P, Chauve C., Algorithms Mol Biol 12(), 2017
PMID: 28559921
New Genome Similarity Measures based on Conserved Gene Adjacencies.
Doerr D, Kowada LAB, Araujo E, Deshpande S, Dantas S, Moret BME, Stoye J., J Comput Biol 24(6), 2017
PMID: 28590847
OrthoReD: a rapid and accurate orthology prediction tool with low computational requirement.
Battenberg K, Lee EK, Chiu JC, Berry AM, Potter D., BMC Bioinformatics 18(1), 2017
PMID: 28633662
Microbial genome analysis: the COG approach.
Galperin MY, Kristensen DM, Makarova KS, Wolf YI, Koonin EV., Brief Bioinform (), 2017
PMID: 28968633
Genome-Guided Phylo-Transcriptomic Methods and the Nuclear Phylogentic Tree of the Paniceae Grasses.
Washburn JD, Schnable JC, Conant GC, Brutnell TP, Shao Y, Zhang Y, Ludwig M, Davidse G, Pires JC., Sci Rep 7(1), 2017
PMID: 29051622
OrthoGNC: A Software for Accurate Identification of Orthologs Based on Gene Neighborhood Conservation.
Jahangiri-Tazehkand S, Wong L, Eslahchi C., Genomics Proteomics Bioinformatics 15(6), 2017
PMID: 29133277
Elastic K-means using posterior probability.
Zheng A, Jiang B, Li Y, Zhang X, Ding C., PLoS One 12(12), 2017
PMID: 29240756
Functional Annotations of Paralogs: A Blessing and a Curse.
Zallot R, Harrison KJ, Kolaczkowski B, de Crécy-Lagard V., Life (Basel) 6(3), 2016
PMID: 27618105
An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species.
Galpert D, Del Río S, Herrera F, Ancede-Gallardo E, Antunes A, Agüero-Chapin G., Biomed Res Int 2015(), 2015
PMID: 26605337
Genomic legacy of the African cheetah, Acinonyx jubatus.
Dobrynin P, Liu S, Tamazian G, Xiong Z, Yurchenko AA, Krasheninnikova K, Kliver S, Schmidt-Küntzel A, Koepfli KP, Johnson W, Kuderna LF, García-Pérez R, Manuel Md, Godinez R, Komissarov A, Makunin A, Brukhin V, Qiu W, Zhou L, Li F, Yi J, Driscoll C, Antunes A, Oleksyk TK, Eizirik E, Perelman P, Roelke M, Wildt D, Diekhans M, Marques-Bonet T, Marker L, Bhak J, Wang J, Zhang G, O'Brien SJ., Genome Biol 16(), 2015
PMID: 26653294


PMID: 25137074