Fast and sensitive multiple alignment of large genomic sequences

Brudno M, Chapman M, Göttgens B, Batzoglou S, Morgenstern B (2003)
BMC Bioinformatics 4(1).

Journal Article | Published | English
Author
Brudno M; Chapman M; Göttgens B; Batzoglou S; Morgenstern B
Abstract
Background: Genomic sequence alignment is a powerful method for genome analysis and annotation, as alignments are routinely used to identify functional sites such as genes or regulatory elements. With a growing number of partially or completely sequenced genomes, multiple alignment is playing an increasingly important role in these studies. In recent years, various tools for pair-wise and multiple genomic alignment have been proposed. Some of them are extremely fast, but often efficiency is achieved at the expense of sensitivity. One way of combining speed and sensitivity is to use an anchored-alignment approach. In a first step, a fast search program identifies a chain of strong local sequence similarities. In a second step, regions between these anchor points are aligned using a slower but more accurate method.
Results: Herein, we present CHAOS, a novel algorithm for rapid identification of chains of local pair-wise sequence similarities. Local alignments calculated by CHAOS are used as anchor points to improve the running time of DIALIGN, a slow but sensitive multiple-alignment tool. We show that this way, the running time of DIALIGN can be reduced by more than 95% for BAC-sized and longer sequences, without affecting the quality of the resulting alignments. We apply our approach to a set of five genomic sequences around the stem-cell-leukemia (SCL) gene and demonstrate that exons and small regulatory elements can be identified by our multiple-alignment procedure.
Conclusion: We conclude that the novel CHAOS local alignment tool is an effective way to significantly speed up global alignment tools such as DIALIGN without reducing the alignment quality. We likewise demonstrate that the DIALIGN/CHAOS combination is able to accurately align short regulatory sequences in distant orthologues.
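The two-step anchored-alignment idea described in the abstract can be illustrated with a small sketch. This is not the CHAOS algorithm and not DIALIGN: it assumes exact k-mer matches as the "local similarities", a simple quadratic dynamic program for chaining, and hypothetical function names (`find_seeds`, `chain_seeds`, `regions_between`) chosen only for illustration. It shows how a chain of anchors splits two sequences into small sub-regions that a slower, more sensitive aligner would then process.

```python
from collections import defaultdict


def find_seeds(a, b, k):
    """Return sorted (i, j) pairs with a[i:i+k] == b[j:j+k] (exact k-mer seeds)."""
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    seeds = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            seeds.append((i, j))
    return sorted(seeds)


def chain_seeds(seeds, k):
    """Pick a longest chain of seeds that is collinear and non-overlapping.

    Expects seeds sorted by (i, j). Simple O(n^2) dynamic program,
    used here only as a stand-in for a real chaining method.
    """
    if not seeds:
        return []
    best = [1] * len(seeds)   # best[t]: length of the best chain ending at seed t
    prev = [-1] * len(seeds)  # back-pointers for reconstructing the chain
    for t, (i, j) in enumerate(seeds):
        for s in range(t):
            pi, pj = seeds[s]
            # Seed s may precede seed t only if it ends before t starts
            # in both sequences.
            if pi + k <= i and pj + k <= j and best[s] + 1 > best[t]:
                best[t] = best[s] + 1
                prev[t] = s
    t = max(range(len(seeds)), key=best.__getitem__)
    chain = []
    while t != -1:
        chain.append(seeds[t])
        t = prev[t]
    return chain[::-1]


def regions_between(chain, k, len_a, len_b):
    """Return the (start, end) intervals in both sequences that lie between anchors."""
    regions = []
    pa, pb = 0, 0
    for i, j in chain:
        regions.append(((pa, i), (pb, j)))
        pa, pb = i + k, j + k
    regions.append(((pa, len_a), (pb, len_b)))
    return regions


if __name__ == "__main__":
    a = "AAACGTCCCTGCAGGG"
    b = "TTACGTTTTTGCATT"
    k = 4
    anchors = chain_seeds(find_seeds(a, b, k), k)
    for (a0, a1), (b0, b1) in regions_between(anchors, k, len(a), len(b)):
        # Each pair of intervals would be handed to the slow, sensitive aligner.
        print(a[a0:a1], b[b0:b1])
```

The speed-up reported in the abstract comes from the same principle: the expensive aligner only sees the short regions between anchors instead of the full sequences.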

Cite this

Brudno, Michael, Chapman, Michael, Göttgens, Berthold, Batzoglou, Serafim, and Morgenstern, Burkhard. “Fast and sensitive multiple alignment of large genomic sequences”. BMC Bioinformatics 4.1 (2003).
Access Level
Open Access

65 Citations in Europe PMC

Data provided by Europe PubMed Central.

First genome-wide association study in an Australian aboriginal population provides insights into genetic risk factors for body mass index and type 2 diabetes.
Anderson D, Cordell HJ, Fakiola M, Francis RW, Syn G, Scaman ES, Davis E, Miles SJ, McLeay T, Jamieson SE, Blackwell JM., PLoS ONE 10(3), 2015
PMID: 25760438
Comparative genomics reveals tissue-specific regulation of prolactin receptor gene expression.
Schennink A, Trott JF, Manjarin R, Lemay DG, Freking BA, Hovey RC., J. Mol. Endocrinol. 54(1), 2015
PMID: 25358647
Insights into the possible role of IFNG and IFNGR1 in Kala-azar and Post Kala-azar Dermal Leishmaniasis in Sudanese patients.
Salih MA, Fakiola M, Abdelraheem MH, Younis BM, Musa AM, ElHassan AM, Blackwell JM, Ibrahim ME, Mohamed HS., BMC Infect. Dis. 14, 2014
PMID: 25466928
Genetic and functional evidence for a locus controlling otitis media at chromosome 10q26.3.
Rye MS, Scaman ES, Thornton RB, Vijayasekaran S, Coates HL, Francis RW, Pennell CE, Blackwell JM, Jamieson SE., BMC Med. Genet. 15, 2014
PMID: 24499112
A theoretical model for whole genome alignment.
Belal NA, Heath LS., J. Comput. Biol. 18(5), 2011
PMID: 21210739
Cgaln: fast and space-efficient whole-genome alignment.
Nakato R, Gotoh O., BMC Bioinformatics 11, 2010
PMID: 20433723
A new measurement of sequence conservation.
Cai X, Hu H, Li X., BMC Genomics 10, 2009
PMID: 20028539
Lossless filter for multiple repeats with bounded edit distance.
Peterlongo P, Sacomoto GA, do Lago AP, Pisanti N, Sagot MF., Algorithms Mol Biol 4, 2009
PMID: 19183438
Evaluation of cis-regulatory function in zebrafish.
Pashos EE, Kague E, Fisher S., Brief Funct Genomic Proteomic 7(6), 2008
PMID: 18820318
Aligning sequences by minimum description length.
Conery JS., EURASIP J Bioinform Syst Biol, 2007
PMID: 18274649
Analysis of invariant sequences in 266 complete genomes.
Goto N, Kurokawa K, Yasunaga T., Gene 401(1-2), 2007
PMID: 17728079
MAP2: multiple alignment of syntenic genomic sequences.
Ye L, Huang X., Nucleic Acids Res. 33(1), 2005
PMID: 15640451
Improvement of alignment accuracy utilizing sequentially conserved motifs.
Chakrabarti S, Bhardwaj N, Anand PA, Sowdhamini R., BMC Bioinformatics 5, 2004
PMID: 15509307
AGenDA: gene prediction by cross-species sequence comparison.
Taher L, Rinner O, Garg S, Sczyrba A, Morgenstern B., Nucleic Acids Res. 32(Web Server issue), 2004
PMID: 15215399

Sources

PMID: 14693042