A novel approach to remote homology detection: jumping alignments
Spang R, Rehmsmeier M, Stoye J (2002)
Journal of Computational Biology 9(5): 747-760.
Journal article
| Published | English
Abstract / Note
We describe a new algorithm for protein classification and the detection of remote homologs. The rationale is to exploit both vertical and horizontal information of a multiple alignment in a well-balanced manner. This is in contrast to established methods such as profiles and profile hidden Markov models which focus on vertical information as they model the columns of the alignment independently and to family pairwise search which focuses on horizontal information as it treats given sequences separately. In our setting, we want to select from a given database of "candidate sequences" those proteins that belong to a given superfamily. In order to do so, each candidate sequence is separately tested against a multiple alignment of the known members of the superfamily by means of a new jumping alignment algorithm. This algorithm is an extension of the Smith-Waterman algorithm and computes a local alignment of a single sequence and a multiple alignment. In contrast to traditional methods, however, this alignment is not based on a summary of the individual columns of the multiple alignment. Rather, the candidate sequence is at each position aligned to one sequence of the multiple alignment, called the "reference sequence". In addition, the reference sequence may change within the alignment, while each such jump is penalized. To evaluate the discriminative quality of the jumping alignment algorithm, we compare it to profiles, profile hidden Markov models, and family pairwise search on a subset of the SCOP database of protein domains. The discriminative quality is assessed by median false positive counts (med-FP-counts). For moderate med-FP-counts, the number of successful searches with our method is considerably higher than with the competing methods.
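The jumping idea described in the abstract can be illustrated with a small dynamic program. The sketch below is not the authors' implementation; it is a simplified illustration under several assumptions that do not come from the record: identity scoring (`match`, `mismatch`) instead of a substitution matrix, a linear `gap` penalty instead of affine gap costs, a fixed `jump` penalty, and an ad hoc rule that skipping a column costs nothing when the current reference row has a gap character there. The function name `jumping_alignment_score` and all parameter names are hypothetical.

```python
def jumping_alignment_score(candidate, alignment,
                            match=2, mismatch=-1, gap=-2, jump=-3):
    """Best local score of `candidate` against the rows of `alignment`.

    Simplified sketch: identity scoring, linear gaps, fixed jump penalty.
    `alignment` is a list of equal-length strings with '-' as gap character.
    """
    rows, cols = len(alignment), len(alignment[0])
    if any(len(row) != cols for row in alignment):
        raise ValueError("all alignment rows must have the same length")
    n = len(candidate)

    # score[i][j][r]: best local alignment ending at candidate prefix i and
    # alignment column j, with row r as the current reference sequence.
    score = [[[0] * rows for _ in range(cols + 1)] for _ in range(n + 1)]
    # top[i][j]: maximum of score[i][j][r] over all reference rows r.
    top = [[0] * (cols + 1) for _ in range(n + 1)]

    def reach(i, j, r):
        # Arrive in row r either by staying in r or by jumping from the
        # best row of the predecessor cell and paying the jump penalty.
        return max(score[i][j][r], top[i][j] + jump)

    best = 0
    for i in range(1, n + 1):
        for j in range(1, cols + 1):
            cell = score[i][j]
            for r in range(rows):
                ref = alignment[r][j - 1]
                if ref == "-":
                    # Assumption: a residue opposite a reference gap is
                    # charged as a gap; skipping the column is free.
                    diag = reach(i - 1, j - 1, r) + gap
                    skip_column = reach(i, j - 1, r)
                else:
                    pair = match if candidate[i - 1] == ref else mismatch
                    diag = reach(i - 1, j - 1, r) + pair
                    skip_column = reach(i, j - 1, r) + gap
                skip_residue = reach(i - 1, j, r) + gap
                # The 0 option makes the alignment local (Smith-Waterman).
                cell[r] = max(0, diag, skip_column, skip_residue)
            top[i][j] = max(cell)
            best = max(best, top[i][j])
    return best


if __name__ == "__main__":
    toy_alignment = ["ACDEYYYYY", "YYYYFGHIK"]
    print(jumping_alignment_score("ACDEFGHIK", toy_alignment))
```

In the toy call, the candidate matches the first row on its left half and the second row on its right half, so the best alignment under these assumed parameters pays one jump penalty to switch reference sequences. With a single-row alignment the recurrence reduces to an ordinary Smith-Waterman score with linear gaps.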
Keywords
Homology detection;
Jumping alignments;
Protein classification;
Sequence analysis
Publication year
2002
Journal title
Journal of Computational Biology
Volume
9
Issue
5
Page(s)
747-760
ISSN
1066-5277
eISSN
1557-8666
Page URI
https://pub.uni-bielefeld.de/record/1773578
Cite
Spang R, Rehmsmeier M, Stoye J. A novel approach to remote homology detection: jumping alignments. Journal of Computational Biology. 2002;9(5):747-760.
Spang, R., Rehmsmeier, M., & Stoye, J. (2002). A novel approach to remote homology detection: jumping alignments. Journal of Computational Biology, 9(5), 747-760. https://doi.org/10.1089/106652702761034172
Spang, Rainer, Rehmsmeier, Marc, and Stoye, Jens. 2002. “A novel approach to remote homology detection: jumping alignments”. Journal of Computational Biology 9 (5): 747-760.
Spang, R., Rehmsmeier, M., and Stoye, J. (2002). A novel approach to remote homology detection: jumping alignments. Journal of Computational Biology 9, 747-760.
Spang, R., Rehmsmeier, M., & Stoye, J., 2002. A novel approach to remote homology detection: jumping alignments. Journal of Computational Biology, 9(5), p 747-760.
R. Spang, M. Rehmsmeier, and J. Stoye, “A novel approach to remote homology detection: jumping alignments”, Journal of Computational Biology, vol. 9, 2002, pp. 747-760.
Spang, R., Rehmsmeier, M., Stoye, J.: A novel approach to remote homology detection: jumping alignments. Journal of Computational Biology. 9, 747-760 (2002).
Spang, Rainer, Rehmsmeier, Marc, and Stoye, Jens. “A novel approach to remote homology detection: jumping alignments”. Journal of Computational Biology 9.5 (2002): 747-760.
All files available under the following license(s):
Copyright Statement:
This object is protected by copyright and/or related rights. [...]
Full text(s)
Access Level
Open Access
Last uploaded
2019-09-06T08:48:08Z
MD5 checksum
d74c8f1ce5f4359744115523b4715a1d
Data provided by the European Bioinformatics Institute (EBI)
14 citations in Europe PMC
Data provided by Europe PubMed Central.
Decoding noises in HIV computational genotyping.
Jia M, Shaw T, Zhang X, Liu D, Shen Y, Ezeamama AE, Yang C, Zhang M., Virology 511(), 2017
PMID: 28918303
Pareto optimization in algebraic dynamic programming.
Saule C, Giegerich R., Algorithms Mol Biol 10(), 2015
PMID: 26150892
Probabilistic inference of viral quasispecies subject to recombination.
Töpfer A, Zagordi O, Prabhakaran S, Roth V, Halperin E, Beerenwinkel N., J Comput Biol 20(2), 2013
PMID: 23383997
Haploid to diploid alignment for variation calling assessment.
Mäkinen V, Rahkola J., BMC Bioinformatics 14 Suppl 15(), 2013
PMID: 24564537
TCRep 3D: an automated in silico approach to study the structural properties of TCR repertoires.
Leimgruber A, Ferber M, Irving M, Hussain-Kahn H, Wieckowski S, Derré L, Rufer N, Zoete V, Michielin O., PLoS One 6(10), 2011
PMID: 22053188
HIV classification using the coalescent theory.
Bulla I, Schultz AK, Schreiber F, Zhang M, Leitner T, Korber B, Morgenstern B, Stanke M., Bioinformatics 26(11), 2010
PMID: 20400454
Improving accuracy of multiple sequence alignment algorithms based on alignment of neighboring residues.
Lu Y, Sze SH., Nucleic Acids Res 37(2), 2009
PMID: 19056820
Homology and phylogeny and their automated inference.
Fuellen G., Naturwissenschaften 95(6), 2008
PMID: 18288471
Recco: recombination analysis using cost optimization.
Maydt J, Lengauer T., Bioinformatics 22(9), 2006
PMID: 16488909
A jumping profile Hidden Markov Model and applications to recombination sites in HIV and HCV genomes.
Schultz AK, Zhang M, Leitner T, Kuiken C, Korber B, Morgenstern B, Stanke M., BMC Bioinformatics 7(), 2006
PMID: 16716226
jpHMM at GOBICS: a web server to detect genomic recombinations in HIV-1.
Zhang M, Schultz AK, Calef C, Kuiken C, Leitner T, Korber B, Morgenstern B, Stanke M., Nucleic Acids Res 34(web server issue), 2006
PMID: 16845050
A sequence sub-sampling algorithm increases the power to detect distant homologues.
Johnston CR, Shields DC., Nucleic Acids Res 33(12), 2005
PMID: 16006623
Sensitive detection of sequence similarity using combinatorial pattern discovery: a challenging study of two distantly related protein families.
Darzentas N, Rigoutsos I, Ouzounis CA., Proteins 61(4), 2005
PMID: 16224785
A robust method to detect structural and functional remote homologues.
Shachar O, Linial M., Proteins 57(3), 2004
PMID: 15382232
Sources
PMID: 12487762