Accurate statistics for local sequence alignment with position-dependent scoring by rare-event sampling

Wolfsheimer S, Herms I, Rahmann S, Hartmann AK (2011)
BMC Bioinformatics 12(1): 47.

Journal article | Published | English
 
Download
No full text has been uploaded. Publication record only.
Author(s)
Wolfsheimer, Stefan; Herms, Inke; Rahmann, Sven; Hartmann, Alexander K.
Abstract / Note
Background: Molecular database search tools need statistical models to assess the significance of the resulting hits. In the classical approach, one asks how probable it is to observe a certain score by pure chance. Asymptotic theories for such questions are available for two random i.i.d. sequences. Some effort has been made to include effects of finite sequence lengths and to account for specific compositions of the sequences. In many applications, such as a large-scale database homology search for transmembrane proteins, these models are not the most appropriate ones. Search sensitivity and specificity benefit from position-dependent scoring schemes or the use of hidden Markov models. Additionally, one may wish to go beyond the assumption that the sequences are i.i.d. Despite their practical importance, the statistical properties of these settings have not yet been well investigated.
Results: In this paper, we discuss an efficient and general method to compute the score distribution to any desired accuracy. The general approach may be applied to different sequence models and various similarity measures that satisfy a few weak assumptions. We have access to the low-probability region ("tail") of the distribution, where scores are larger than expected by pure chance and therefore relevant for practical applications. Our method uses recent ideas from rare-event simulations, combining Markov chain Monte Carlo simulations with importance sampling and generalized ensembles. We present results for the score statistics of fixed and random queries against random sequences. In a second step, we extend the approach to a model of transmembrane proteins, which can hardly be described as i.i.d. sequences. For this case, we compare the statistical properties of a fixed query model as well as a hidden Markov sequence model in connection with a position-based scoring scheme against the classical approach.
Conclusions: The results illustrate that the sensitivity and specificity strongly depend on the underlying scoring and sequence model. A specific ROC analysis for the case of transmembrane proteins supports our observation.
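Illustrative sketch (not taken from the paper): the abstract describes combining Markov chain Monte Carlo with importance sampling to reach the tail of the score distribution. The Python below shows one simple way this idea can look for a fixed query against a random sequence. Everything specific here is an assumption made for illustration: the four-letter uniform i.i.d. null model, the match/mismatch/gap scores, the single "temperature" parameter, and the function names `local_score`, `sample_biased`, and `reweight`. The authors' actual scheme (generalized ensembles, position-dependent scoring, hidden Markov sequence models) is more elaborate.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"  # assumption: uniform i.i.d. null model over four letters


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with linear gap costs."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cell = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


def sample_biased(query, length, temperature, steps, rng):
    """Metropolis chain over subject sequences, biased by exp(score / temperature).

    High-scoring sequences are visited far more often than under the null
    model, which is what gives access to the rare-event tail.
    """
    subject = [rng.choice(ALPHABET) for _ in range(length)]
    score = local_score(query, subject)
    histogram = Counter()
    for _ in range(steps):
        pos = rng.randrange(length)
        old = subject[pos]
        subject[pos] = rng.choice(ALPHABET)  # symmetric single-letter proposal
        new_score = local_score(query, subject)
        # The null model is uniform, so only the bias factor enters the ratio.
        delta = new_score - score
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            score = new_score
        else:
            subject[pos] = old  # reject: restore the previous letter
        histogram[score] += 1
    return histogram


def reweight(histogram, temperature):
    """Undo the bias: P(s) is proportional to H(s) * exp(-s / temperature)."""
    weights = {s: h * math.exp(-s / temperature) for s, h in histogram.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in sorted(weights.items())}


if __name__ == "__main__":
    rng = random.Random(1)
    hist = sample_biased("ACGTACGT", length=30, temperature=1.5, steps=5000, rng=rng)
    for s, p in reweight(hist, 1.5).items():
        print(s, f"{p:.3e}")
```

A single temperature only covers part of the score range, and the normalization above is reliable only if the sampled histogram also reaches the typical (low-score) region. In practice one would combine runs at several temperatures and discard an equilibration phase before collecting the histogram.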
Publication year
2011
Journal title
BMC Bioinformatics
Volume
12
Issue
1
Page(s)
47
ISSN
1471-2105
Page URI
https://pub.uni-bielefeld.de/record/2425310

Cite

Wolfsheimer S, Herms I, Rahmann S, Hartmann AK. Accurate statistics for local sequence alignment with position-dependent scoring by rare-event sampling. BMC Bioinformatics. 2011;12(1):47.
Wolfsheimer, S., Herms, I., Rahmann, S., & Hartmann, A. K. (2011). Accurate statistics for local sequence alignment with position-dependent scoring by rare-event sampling. BMC Bioinformatics, 12(1), 47. doi:10.1186/1471-2105-12-47
Wolfsheimer, S., Herms, I., Rahmann, S., and Hartmann, A. K. (2011). Accurate statistics for local sequence alignment with position-dependent scoring by rare-event sampling. BMC Bioinformatics 12, 47.
Wolfsheimer, S., et al., 2011. Accurate statistics for local sequence alignment with position-dependent scoring by rare-event sampling. BMC Bioinformatics, 12(1), p 47.
S. Wolfsheimer, et al., “Accurate statistics for local sequence alignment with position-dependent scoring by rare-event sampling”, BMC Bioinformatics, vol. 12, 2011, p. 47.
Wolfsheimer, S., Herms, I., Rahmann, S., Hartmann, A.K.: Accurate statistics for local sequence alignment with position-dependent scoring by rare-event sampling. BMC Bioinformatics. 12, 47 (2011).
Wolfsheimer, Stefan, Herms, Inke, Rahmann, Sven, and Hartmann, Alexander K. “Accurate statistics for local sequence alignment with position-dependent scoring by rare-event sampling”. BMC Bioinformatics 12.1 (2011): 47.

Sources

PMID: 21291566