Significant speedup of database searches with HMMs by search space reduction with PSSM family models

Beckstette M, Homann R, Giegerich R, Kurtz S (2009)
Bioinformatics 25(24): 3251-3258.

Journal article | Published | English
 
No full text has been uploaded; this is a publication record only.
Author(s)
Michael Beckstette; Robert Homann; Robert Giegerich; Stefan Kurtz
Abstract / Note
Motivation: Profile hidden Markov models (pHMMs) are currently the most popular modeling concept for protein families. They provide sensitive family descriptors, and sequence database searching with pHMMs has become a standard task in today's genome annotation pipelines. On the downside, searching with pHMMs is computationally expensive.

Results: We propose a new method for efficient protein family classification and for speeding up database searches with pHMMs, as is necessary for large-scale analysis scenarios. We employ simpler models of protein families called position-specific scoring matrices family models (PSSM-FMs). For fast database search, we combine full-text indexing, efficient exact p-value computation of PSSM match scores and fast fragment chaining. The resulting method is well suited to prefilter the set of sequences to be searched for subsequent database searches with pHMMs. We achieved a classification performance only marginally inferior to hmmsearch, yet results could be obtained in a fraction of the runtime, with a speedup of >64-fold. In experiments addressing the method's ability to prefilter the sequence space for subsequent database searches with pHMMs, our method reduces the number of sequences to be searched with hmmsearch to only 0.80% of all sequences. The filter is very fast and leads to a total speedup of factor 43 over the unfiltered search, while retaining >99.5% of the original results. In a lossless filter setup for hmmsearch on UniProtKB/Swiss-Prot, we observed a speedup of factor 92.
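The prefiltering idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it omits full-text indexing and fragment chaining, uses a single toy PSSM over a two-letter alphabet instead of a family model, and computes the exact score distribution by plain dynamic programming under an assumed i.i.d. background. All function names, scores and thresholds are hypothetical.

```python
from collections import defaultdict


def score_distribution(pssm, background):
    """Exact distribution of PSSM match scores under an i.i.d. background.

    pssm: list of dicts (one per column) mapping residue -> integer score.
    background: dict mapping residue -> probability (assumed to sum to 1).
    """
    dist = {0: 1.0}
    for column in pssm:
        nxt = defaultdict(float)
        for partial, prob in dist.items():
            for residue, s in column.items():
                nxt[partial + s] += prob * background[residue]
        dist = dict(nxt)
    return dist


def p_value(dist, threshold):
    """Probability that a random window scores at least `threshold`."""
    return sum(p for s, p in dist.items() if s >= threshold)


def best_window_score(pssm, seq):
    """Highest PSSM score over all windows of seq, or None if seq is too short."""
    m = len(pssm)
    scores = [
        sum(col[c] for col, c in zip(pssm, seq[i:i + m]))
        for i in range(len(seq) - m + 1)
    ]
    return max(scores) if scores else None


def prefilter(pssm, background, sequences, max_p):
    """Keep only sequences whose best match is significant at level max_p."""
    dist = score_distribution(pssm, background)
    kept = []
    for name, seq in sequences.items():
        best = best_window_score(pssm, seq)
        if best is not None and p_value(dist, best) <= max_p:
            kept.append(name)
    return kept


# Toy data (hypothetical): alphabet {A, C}, motif "ACA".
toy_pssm = [{"A": 2, "C": -1}, {"A": -1, "C": 2}, {"A": 2, "C": -1}]
toy_background = {"A": 0.5, "C": 0.5}
toy_sequences = {"seq1": "CCACACC", "seq2": "CCCCCC", "seq3": "AC"}

# Only the surviving sequences would be passed on to the expensive pHMM search.
candidates = prefilter(toy_pssm, toy_background, toy_sequences, max_p=0.2)
```

In this toy setup only `seq1` contains the motif and passes the filter; the remaining sequences would never reach the slower pHMM search, which is the source of the speedup the abstract reports. The sketch assumes every residue in a sequence has an entry in each PSSM column.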
Publication year
2009
Journal title
Bioinformatics
Volume
25
Issue
24
Page(s)
3251-3258
ISSN
1367-4803
eISSN
1460-2059
Page URI
https://pub.uni-bielefeld.de/record/1589458

Cite

Beckstette M, Homann R, Giegerich R, Kurtz S. Significant speedup of database searches with HMMs by search space reduction with PSSM family models. Bioinformatics. 2009;25(24):3251-3258.
Beckstette, M., Homann, R., Giegerich, R., & Kurtz, S. (2009). Significant speedup of database searches with HMMs by search space reduction with PSSM family models. Bioinformatics, 25(24), 3251-3258. doi:10.1093/bioinformatics/btp593
Beckstette, M., Homann, R., Giegerich, R., and Kurtz, S. (2009). Significant speedup of database searches with HMMs by search space reduction with PSSM family models. Bioinformatics 25, 3251-3258.
Beckstette, M., et al., 2009. Significant speedup of database searches with HMMs by search space reduction with PSSM family models. Bioinformatics, 25(24), p 3251-3258.
M. Beckstette, et al., “Significant speedup of database searches with HMMs by search space reduction with PSSM family models”, Bioinformatics, vol. 25, 2009, pp. 3251-3258.
Beckstette, M., Homann, R., Giegerich, R., Kurtz, S.: Significant speedup of database searches with HMMs by search space reduction with PSSM family models. Bioinformatics. 25, 3251-3258 (2009).
Beckstette, Michael, Homann, Robert, Giegerich, Robert, and Kurtz, Stefan. “Significant speedup of database searches with HMMs by search space reduction with PSSM family models”. Bioinformatics 25.24 (2009): 3251-3258.

Sources

PMID: 19828575