Using the structure of genome data in the design of deep neural networks for predicting amyotrophic lateral sclerosis from genotype

Yin, Bojian; Balvert, Marleen; van der Spek, Rick A A; Dutilh, Bas E; Bohté, Sander; Veldink, Jan; Schönhuth, Alexander

Yin B, Balvert M, van der Spek RAA, Dutilh BE, Bohté S, Veldink J, Schönhuth A (2019)
Bioinformatics 35(14): i538-i547.

Journal article | Published | English

Download

No files have been uploaded. Publication record only!

DOI

https://doi.org/10.1093/bioinformatics/btz369

Author

Yin, Bojian; Balvert, Marleen; van der Spek, Rick A A; Dutilh, Bas E; Bohté, Sander; Veldink, Jan; Schönhuth, Alexander (UniBi)

Department

Faculty of Technology > Genome Data Science Group

Abstract / Notes

Motivation: Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease caused by aberrations in the genome. While several disease-causing variants have been identified, a major part of heritability remains unexplained. ALS is believed to have a complex genetic basis where non-additive combinations of variants constitute disease, which cannot be picked up using the linear models employed in classical genotype–phenotype association studies. Deep learning on the other hand is highly promising for identifying such complex relations. We therefore developed a deep-learning based approach for the classification of ALS patients versus healthy individuals from the Dutch cohort of the Project MinE dataset. Based on recent insight that regulatory regions harbor the majority of disease-associated variants, we employ a two-step approach: first promoter regions that are likely associated to ALS are identified, and second individuals are classified based on their genotype in the selected genomic regions. Both steps employ a deep convolutional neural network. The network architecture accounts for the structure of genome data by applying convolution only to parts of the data where this makes sense from a genomics perspective.

Results: Our approach identifies potentially ALS-associated promoter regions, and generally outperforms other classification methods. Test results support the hypothesis that non-additive combinations of variants contribute to ALS. Architectures and protocols developed are tailored toward processing population-scale, whole-genome data. We consider this a relevant first step toward deep learning assisted genotype–phenotype association in whole genome-sized data.

Availability and implementation: Our code will be available on Github, together with a synthetic dataset (https://github.com/byin-cwi/ALS-Deeplearning). The data used in this study is available to bona-fide researchers upon request.

Supplementary information: Supplementary data are available at Bioinformatics online.
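
The idea of applying convolution only where it makes sense genomically can be illustrated with a small sketch. This is not the authors' architecture (their code is in the repository linked above); it is a toy NumPy example under several assumptions: genotypes are encoded as minor-allele counts 0/1/2, every promoter region has been padded to the same length, and one shared kernel slides along positions within a region but never across the boundary between two regions. The function names, array shapes, kernel and weights are all hypothetical.

```python
import numpy as np

def regionwise_conv_features(genotypes, kernel):
    """Slide one shared 1D kernel along each region separately.

    genotypes: (n_individuals, n_regions, region_len), allele counts 0/1/2
    kernel:    (kernel_width,)
    returns:   (n_individuals, n_regions), max-pooled ReLU response per region
    """
    width = kernel.shape[0]
    # Windows are taken along the position axis only, so no window ever
    # spans two regions. Shape: (n_ind, n_regions, region_len - width + 1, width)
    windows = np.lib.stride_tricks.sliding_window_view(genotypes, width, axis=2)
    responses = np.maximum(windows @ kernel, 0.0)  # ReLU activation
    return responses.max(axis=2)                   # max pooling within each region

def predict_proba(genotypes, kernel, weights, bias):
    """Toy classifier: logistic layer on top of the per-region features."""
    feats = regionwise_conv_features(genotypes, kernel)
    logits = feats @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))

# Hypothetical data: 4 individuals, 5 regions, 16 variant positions per region.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(4, 5, 16))
kernel = rng.normal(size=3)
weights = rng.normal(size=5)
probs = predict_proba(genotypes, kernel, weights, bias=0.0)
print(probs.shape)  # one case probability per individual
```

Keeping regions on their own axis is the design choice that matters here: adjacent rows of a genotype matrix may come from distant parts of the genome, so a kernel that crossed region boundaries would mix unrelated positions.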

Keywords

Statistics and Probability; Computational Theory and Mathematics; Biochemistry; Molecular Biology; Computational Mathematics; Computer Science Applications

Publication year

2019

Journal title

Bioinformatics

Volume

35

Issue

14

Page(s)

i538-i547

ISSN

1367-4803

eISSN

1460-2059

Page URI

https://pub.uni-bielefeld.de/record/2941756

Cite

Yin B, Balvert M, van der Spek RAA, et al. Using the structure of genome data in the design of deep neural networks for predicting amyotrophic lateral sclerosis from genotype. Bioinformatics. 2019;35(14):i538-i547.

Yin, B., Balvert, M., van der Spek, R. A. A., Dutilh, B. E., Bohté, S., Veldink, J., & Schönhuth, A. (2019). Using the structure of genome data in the design of deep neural networks for predicting amyotrophic lateral sclerosis from genotype. Bioinformatics, 35(14), i538-i547. doi:10.1093/bioinformatics/btz369

Yin, Bojian, Balvert, Marleen, van der Spek, Rick A A, Dutilh, Bas E, Bohté, Sander, Veldink, Jan, and Schönhuth, Alexander. 2019. “Using the structure of genome data in the design of deep neural networks for predicting amyotrophic lateral sclerosis from genotype”. Bioinformatics 35 (14): i538-i547.

Yin, B., Balvert, M., van der Spek, R. A. A., Dutilh, B. E., Bohté, S., Veldink, J., and Schönhuth, A. (2019). Using the structure of genome data in the design of deep neural networks for predicting amyotrophic lateral sclerosis from genotype. Bioinformatics 35, i538-i547.

Yin, B., et al., 2019. Using the structure of genome data in the design of deep neural networks for predicting amyotrophic lateral sclerosis from genotype. Bioinformatics, 35(14), p i538-i547.

B. Yin, et al., “Using the structure of genome data in the design of deep neural networks for predicting amyotrophic lateral sclerosis from genotype”, Bioinformatics, vol. 35, 2019, pp. i538-i547.

Yin, B., Balvert, M., van der Spek, R.A.A., Dutilh, B.E., Bohté, S., Veldink, J., Schönhuth, A.: Using the structure of genome data in the design of deep neural networks for predicting amyotrophic lateral sclerosis from genotype. Bioinformatics. 35, i538-i547 (2019).

Yin, Bojian, Balvert, Marleen, van der Spek, Rick A A, Dutilh, Bas E, Bohté, Sander, Veldink, Jan, and Schönhuth, Alexander. “Using the structure of genome data in the design of deep neural networks for predicting amyotrophic lateral sclerosis from genotype”. Bioinformatics 35.14 (2019): i538-i547.

Data provided by European Bioinformatics Institute (EBI)