Enhancing sensitivity and controlling false discovery rate in somatic indel discovery using a latent variable model

Dijkstra LJ, Köster J, Marschall T, Schönhuth A (2017)
bioRxiv.

Preprint | Published | English
 
Author
Dijkstra, Louis J.; Köster, Johannes; Marschall, Tobias; Schönhuth, Alexander (UniBi)
Abstract / Note
Cancer is a genetic disorder in the first place. Therefore, next-generation sequencing (NGS) based discovery of somatically acquired genetic variants has gained widespread attention. Computational prediction of somatic variants, however, is affected by a variety of confounding factors. In addition to the uncertainties that one commonly encounters in germline variation prediction, such as misplaced and/or inaccurate read alignments, cancer heterogeneity and impure samples add significantly to the difficulties. Overall, this prevents state-of-the-art indel discovery tools from discovering somatic indels at operable performance rates, although they perform excellently when calling germline indels. While all size ranges are affected, common and cancer-specific problems interfere in particularly unfavorable ways in the prediction of somatic midsize (30-150 bp) insertions and deletions.

Here, we present a latent variable model that accounts for the major confounding factors and uncertainties in a unified way. Using this modeling framework, we first demonstrate how to efficiently compute the probability for a (putative) indel to be somatic, thereby resolving a principled computational runtime bottleneck in Bayesian uncertainty quantification. Second, we show how to reliably estimate the allele frequencies for a given list of indels. Third, we present an intuitive and effective way to control the false discovery rate, an issue in genetic variant discovery that has proven notoriously hard to deal with. As a tool that implements all methodology developed, we present PROSIC (PROcessing Somatic Indel Calls). PROSIC achieves significant improvements, in particular in terms of recall, when applied to deletion call sheets as provided by prevalent state-of-the-art tools, in comparison to their integrated somatic indel calling routines.

The software is publicly available at https://prosic.github.io and can be easily installed via https://bioconda.github.io.
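The abstract does not describe how the false discovery rate is controlled, so the following is only a generic sketch of one common approach, not PROSIC's procedure. It assumes each candidate indel already comes with a posterior probability of being somatic; the function name `select_at_fdr` and the example values are hypothetical. Calls are ranked by posterior, and the largest top-ranked set whose mean posterior error probability stays at or below the target `alpha` is kept.

```python
from typing import List, Sequence


def select_at_fdr(posteriors: Sequence[float], alpha: float) -> List[int]:
    """Return indices of calls kept so that the expected FDR stays <= alpha.

    posteriors[i] is the (assumed given) posterior probability that call i
    is a true somatic indel; 1 - posteriors[i] is its posterior error
    probability. This is a generic illustration, not PROSIC's implementation.
    """
    # Rank calls from most to least confident.
    order = sorted(range(len(posteriors)), key=lambda i: posteriors[i], reverse=True)
    selected: List[int] = []
    error_sum = 0.0
    for i in order:
        error_sum += 1.0 - posteriors[i]
        # Expected FDR of the candidate set is its mean posterior error probability.
        # Errors grow along the ranking, so the mean never decreases and we can stop
        # at the first violation.
        if error_sum / (len(selected) + 1) > alpha:
            break
        selected.append(i)
    return selected


if __name__ == "__main__":
    # Hypothetical posteriors for four candidate indels.
    print(select_at_fdr([0.99, 0.6, 0.95, 0.2], alpha=0.05))
```

With the hypothetical values above, only the two most confident calls are kept, because adding the third would raise the mean error probability above 0.05.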
Publishing Year
2017
Journal Title
bioRxiv
Page URI
https://pub.uni-bielefeld.de/record/2941789

Cite

Dijkstra LJ, Köster J, Marschall T, Schönhuth A. Enhancing sensitivity and controlling false discovery rate in somatic indel discovery using a latent variable model. bioRxiv. 2017.
Dijkstra, L. J., Köster, J., Marschall, T., & Schönhuth, A. (2017). Enhancing sensitivity and controlling false discovery rate in somatic indel discovery using a latent variable model. bioRxiv.
Dijkstra, L. J., Köster, J., Marschall, T., and Schönhuth, A. (2017). Enhancing sensitivity and controlling false discovery rate in somatic indel discovery using a latent variable model. bioRxiv.
Dijkstra, L.J., et al., 2017. Enhancing sensitivity and controlling false discovery rate in somatic indel discovery using a latent variable model. bioRxiv.
L.J. Dijkstra, et al., “Enhancing sensitivity and controlling false discovery rate in somatic indel discovery using a latent variable model”, bioRxiv, 2017.
Dijkstra, L.J., Köster, J., Marschall, T., Schönhuth, A.: Enhancing sensitivity and controlling false discovery rate in somatic indel discovery using a latent variable model. bioRxiv. (2017).
Dijkstra, Louis J., Köster, Johannes, Marschall, Tobias, and Schönhuth, Alexander. “Enhancing sensitivity and controlling false discovery rate in somatic indel discovery using a latent variable model”. bioRxiv (2017).


Sources

Preprint: 10.1101/121954
