GraphTeams. A method for discovering spatial gene clusters in Hi-C sequencing data

Schulz, Tizian; Stoye, Jens; Dörr, Daniel

Schulz T, Stoye J, Dörr D (2018)
BMC Genomics 19(Suppl. 5): 308.

Journal article | Published | English

DOI

https://doi.org/10.1186/s12864-018-4622-0

URN

urn:nbn:de:0070-pub-29190056

Authors

Schulz, Tizian (UniBi); Stoye, Jens (UniBi); Dörr, Daniel (UniBi)

Department

Centrum für Biotechnologie > Arbeitsgruppe J. Stoye
Centrum für Biotechnologie > Institut für Bioinformatik
Technische Fakultät > AG Genominformatik
Technische Fakultät > Int. Graduiertenkolleg DiDy (GRK 1906)

Abstract / Note

Background: Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species.

Results: We present the first gene cluster model capable of handling spatial data. Our model generalizes a popular computational model for gene cluster prediction, called δ-teams, from sequences to graphs. Following previous lines of research, we subsequently extend our model to allow for several vertices being associated with the same label. The model, called δ-teams with families, is particularly suitable for our application as it enables handling of gene duplicates. We develop algorithmic solutions for both models. We implemented the algorithm for discovering δ-teams with families and integrated it into a fully automated workflow for discovering gene clusters in Hi-C data, called GraphTeams. We applied it to human and mouse data to find intra- and interchromosomal gene cluster candidates. The results include intrachromosomal clusters that seem to exhibit a closer proximity in space than on their chromosomal DNA sequence. We further discovered interchromosomal gene clusters that contain genes from different chromosomes within the human genome, but are located on a single chromosome in mouse.

Conclusions: By identifying δ-teams with families, we provide a flexible model to discover gene cluster candidates in Hi-C data. Our analysis of Hi-C data from human and mouse reveals several known gene clusters (thus validating our approach), but also a few sparsely studied or possibly unknown gene cluster candidates that could be the subject of further experimental investigation.
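To make the idea of δ-teams on graphs more concrete, the following is a minimal sketch and not the authors' algorithm or the GraphTeams implementation. It assumes the simple case without gene families (each gene occurs exactly once per genome), assumes pairwise distances are already available as nested dictionaries, and assumes the usual reading of a δ-team: a maximal gene set that, in both genomes, stays connected when only pairs at distance at most δ are linked (single-linkage clustering). All function names (`components`, `delta_teams`, `line_distances`) are hypothetical.

```python
def components(vertices, dist, delta):
    """Single-linkage clusters: connected components of the graph that
    links two vertices whenever their distance is at most delta."""
    remaining = set(vertices)
    comps = []
    while remaining:
        seed = remaining.pop()
        comp, stack = {seed}, [seed]
        while stack:
            u = stack.pop()
            # Collect all not-yet-assigned vertices within delta of u.
            near = {v for v in remaining if dist[u][v] <= delta}
            remaining -= near
            comp |= near
            stack.extend(near)
        comps.append(frozenset(comp))
    return comps


def delta_teams(genes, dist_g, dist_h, delta):
    """Naive partition refinement: split each candidate set by genome G,
    then by genome H, and repeat until a set survives both unsplit."""
    teams, todo = [], [frozenset(genes)]
    while todo:
        candidate = todo.pop()
        parts = [
            part
            for comp in components(candidate, dist_g, delta)
            for part in components(comp, dist_h, delta)
        ]
        if len(parts) == 1:
            teams.append(parts[0])  # connected in both genomes
        else:
            todo.extend(parts)      # refine the smaller pieces further
    return teams


def line_distances(pos):
    """Toy distance table built from 1-D positions (illustration only)."""
    return {u: {v: abs(pos[u] - pos[v]) for v in pos} for u in pos}


# Toy data: the same four genes placed differently in two genomes.
pos_g = {"a": 0, "b": 1, "c": 2, "d": 10}
pos_h = {"a": 0, "b": 5, "c": 6, "d": 7}
teams = delta_teams(list(pos_g), line_distances(pos_g), line_distances(pos_h), 2)
print(sorted(sorted(t) for t in teams))
```

In the toy data, genes b and c are close in both genomes, while a and d are each isolated in one of them, so only {b, c} forms a team with more than one member. With Hi-C data, the distance tables would instead be derived from the contact maps; how that is done is outside this sketch.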

Keywords

Spatial gene cluster; Gene teams; Single-linkage clustering; Graph teams; Hi-C data

Publication year

2018

Journal title

BMC Genomics

Volume

19

Issue

Suppl. 5

Article no.

308

ISSN

1471-2164

eISSN

1471-2164

Funding information

Open-access publication costs were funded by the Deutsche Forschungsgemeinschaft and Bielefeld University.

Page URI

https://pub.uni-bielefeld.de/record/2919005

Cite

Schulz T, Stoye J, Dörr D. GraphTeams. A method for discovering spatial gene clusters in Hi-C sequencing data. BMC Genomics. 2018;19(Suppl. 5): 308.

Schulz, T., Stoye, J., & Dörr, D. (2018). GraphTeams. A method for discovering spatial gene clusters in Hi-C sequencing data. BMC Genomics, 19(Suppl. 5), 308. doi:10.1186/s12864-018-4622-0

Schulz, Tizian, Stoye, Jens, and Dörr, Daniel. 2018. “GraphTeams. A method for discovering spatial gene clusters in Hi-C sequencing data”. BMC Genomics 19 (Suppl. 5): 308.

Schulz, T., Stoye, J., and Dörr, D. (2018). GraphTeams. A method for discovering spatial gene clusters in Hi-C sequencing data. BMC Genomics 19:308.

Schulz, T., Stoye, J., & Dörr, D., 2018. GraphTeams. A method for discovering spatial gene clusters in Hi-C sequencing data. BMC Genomics, 19(Suppl. 5): 308.

T. Schulz, J. Stoye, and D. Dörr, “GraphTeams. A method for discovering spatial gene clusters in Hi-C sequencing data”, BMC Genomics, vol. 19, 2018, 308.

Schulz, T., Stoye, J., Dörr, D.: GraphTeams. A method for discovering spatial gene clusters in Hi-C sequencing data. BMC Genomics. 19, 308 (2018).

Schulz, Tizian, Stoye, Jens, and Dörr, Daniel. “GraphTeams. A method for discovering spatial gene clusters in Hi-C sequencing data”. BMC Genomics 19.Suppl. 5 (2018): 308.

All files are available under the following license(s):

Copyright Statement:

This object is protected by copyright and/or related rights. [...]

Full text(s)

Name

s12864-018-4622-0.doerr.pdf 1.42 MB

Access Level

Open Access

Last uploaded

2019-09-06T09:18:58Z

MD5 checksum

e4d55d6b5efe5d4984c599f279e64085

