Fast and sensitive multiple alignment of large genomic sequences
Brudno M, Chapman M, Göttgens B, Batzoglou S, Morgenstern B (2003)
BMC Bioinformatics 4(1): 66.
Journal Article | Published | English
Author
Brudno, Michael; Chapman, Michael; Göttgens, Berthold; Batzoglou, Serafim; Morgenstern, Burkhard
Abstract
Background: Genomic sequence alignment is a powerful method for genome analysis and annotation, as alignments are routinely used to identify functional sites such as genes or regulatory elements. With a growing number of partially or completely sequenced genomes, multiple alignment is playing an increasingly important role in these studies. In recent years, various tools for pair-wise and multiple genomic alignment have been proposed. Some of them are extremely fast, but often efficiency is achieved at the expense of sensitivity. One way of combining speed and sensitivity is to use an anchored-alignment approach. In a first step, a fast search program identifies a chain of strong local sequence similarities. In a second step, regions between these anchor points are aligned using a slower but more accurate method.

Results: Herein, we present CHAOS, a novel algorithm for rapid identification of chains of local pair-wise sequence similarities. Local alignments calculated by CHAOS are used as anchor points to improve the running time of DIALIGN, a slow but sensitive multiple-alignment tool. We show that this way, the running time of DIALIGN can be reduced by more than 95% for BAC-sized and longer sequences, without affecting the quality of the resulting alignments. We apply our approach to a set of five genomic sequences around the stem-cell-leukemia (SCL) gene and demonstrate that exons and small regulatory elements can be identified by our multiple-alignment procedure.

Conclusion: We conclude that the novel CHAOS local alignment tool is an effective way to significantly speed up global alignment tools such as DIALIGN without reducing the alignment quality. We likewise demonstrate that the DIALIGN/CHAOS combination is able to accurately align short regulatory sequences in distant orthologues.
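The two-step anchored-alignment idea described in the abstract can be illustrated with a minimal Python sketch. This is not CHAOS or DIALIGN: exact k-mer matching and a simple quadratic chaining step are stand-ins chosen for brevity, and the function names (`find_seeds`, `chain_seeds`, `regions_between_anchors`) and the parameter `k` are hypothetical. The sketch only shows how a chain of anchors splits two sequences into smaller regions that a slower, more sensitive aligner would then process.

```python
from collections import defaultdict


def find_seeds(a, b, k):
    """Return sorted (i, j) pairs with a[i:i+k] == b[j:j+k] (exact k-mer matches)."""
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    seeds = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            seeds.append((i, j))
    return sorted(seeds)


def chain_seeds(seeds, k):
    """Longest chain of non-overlapping seeds that increases in both sequences.

    Simple O(n^2) dynamic programme over the sorted seeds (an assumption made
    for clarity; real tools use faster chaining and similarity-based scores).
    """
    n = len(seeds)
    if n == 0:
        return []
    best = [1] * n   # best[t]: length of the longest chain ending at seed t
    prev = [-1] * n  # prev[t]: predecessor of seed t in that chain
    for t, (i, j) in enumerate(seeds):
        for s in range(t):
            pi, pj = seeds[s]
            if pi + k <= i and pj + k <= j and best[s] + 1 > best[t]:
                best[t] = best[s] + 1
                prev[t] = s
    t = max(range(n), key=best.__getitem__)
    chain = []
    while t != -1:
        chain.append(seeds[t])
        t = prev[t]
    return chain[::-1]


def regions_between_anchors(a, b, chain, k):
    """Return the pairs of subsequences lying before, between and after anchors."""
    regions = []
    pa = pb = 0
    for i, j in chain:
        regions.append((a[pa:i], b[pb:j]))
        pa, pb = i + k, j + k
    regions.append((a[pa:], b[pb:]))
    return regions


# Toy example: two sequences sharing a 7-mer at each end.
a = "GATTACA" + "TTTT" + "CCGGAAC"
b = "GATTACA" + "GG" + "CCGGAAC"
anchors = chain_seeds(find_seeds(a, b, 7), 7)
for left, right in regions_between_anchors(a, b, anchors, 7):
    print(repr(left), repr(right))
```

In this toy case only the short middle region remains for the slower aligner, which suggests why anchoring can cut running time sharply when the anchors cover most of the input.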
Cite this
Brudno M, Chapman M, Göttgens B, Batzoglou S, Morgenstern B. Fast and sensitive multiple alignment of large genomic sequences. BMC Bioinformatics. 2003;4(1):66.
Brudno, M., Chapman, M., Göttgens, B., Batzoglou, S., & Morgenstern, B. (2003). Fast and sensitive multiple alignment of large genomic sequences. BMC Bioinformatics, 4(1), 66. doi:10.1186/1471-2105-4-66
Brudno, M., Chapman, M., Göttgens, B., Batzoglou, S., and Morgenstern, B. (2003). Fast and sensitive multiple alignment of large genomic sequences. BMC Bioinformatics 4, 66.
Brudno, M., et al., 2003. Fast and sensitive multiple alignment of large genomic sequences. BMC Bioinformatics, 4(1), p 66.
M. Brudno, et al., “Fast and sensitive multiple alignment of large genomic sequences”, BMC Bioinformatics, vol. 4, 2003, pp. 66.
Brudno, M., Chapman, M., Göttgens, B., Batzoglou, S., Morgenstern, B.: Fast and sensitive multiple alignment of large genomic sequences. BMC Bioinformatics. 4, 66 (2003).
Brudno, Michael, Chapman, Michael, Göttgens, Berthold, Batzoglou, Serafim, and Morgenstern, Burkhard. “Fast and sensitive multiple alignment of large genomic sequences”. BMC Bioinformatics 4.1 (2003): 66.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
70 Citations in Europe PMC
Data provided by Europe PubMed Central.
Incorporating information from length-mutational events into phylogenetic analysis.
Müller K., Mol Phylogenet Evol 38(3), 2006
PMID: 16129628
The CD8alpha from sea bass (Dicentrarchus labrax L.): Cloning, expression and 3D modelling.
Buonocore F, Randelli E, Bird S, Secombes CJ, Costantini S, Facchiano A, Mazzini M, Scapigliati G., Fish Shellfish Immunol 20(4), 2006
PMID: 16230027
Accurate anchoring alignment of divergent sequences.
Huang W, Umbach DM, Li L., Bioinformatics 22(1), 2006
PMID: 16301203
Shuffling of cis-regulatory elements is a pervasive feature of the vertebrate lineage.
Sanges R, Kalmar E, Claudiani P, D'Amato M, Muller F, Stupka E., Genome Biol 7(7), 2006
PMID: 16859531
Multiple sequence alignment with user-defined constraints at GOBICS.
Morgenstern B, Werner N, Prohaska SJ, Steinkamp R, Schneider I, Subramanian AR, Stadler PF, Weyer-Menkhoff J., Bioinformatics 21(7), 2005
PMID: 15546937
MAP2: multiple alignment of syntenic genomic sequences.
Ye L, Huang X., Nucleic Acids Res 33(1), 2005
PMID: 15640451
Gene expression patterns associated with blood-feeding in the malaria mosquito Anopheles gambiae.
Dana AN, Hong YS, Kern MK, Hillenmeyer ME, Harker BW, Lobo NF, Hogan JR, Romans P, Collins FH., BMC Genomics 6(), 2005
PMID: 15651988
GMAP: a genomic mapping and alignment program for mRNA and EST sequences.
Wu TD, Watanabe CK., Bioinformatics 21(9), 2005
PMID: 15728110
Identification of genetic polymorphisms through comparative DNA sequence analysis on the K-ras gene: implications for lung tumor susceptibility.
Wang M, Wang Y, You M., Exp Lung Res 31(2), 2005
PMID: 15824019
DIALIGN-T: an improved algorithm for segment-based multiple sequence alignment.
Subramanian AR, Weyer-Menkhoff J, Kaufmann M, Morgenstern B., BMC Bioinformatics 6(), 2005
PMID: 15784139
Multiple sequence alignments.
Wallace IM, Blackshields G, Higgins DG., Curr Opin Struct Biol 15(3), 2005
PMID: 15963889
Multiple alignment of genomic sequences using CHAOS, DIALIGN and ABC.
Pöhler D, Werner N, Steinkamp R, Morgenstern B., Nucleic Acids Res 33(web server issue), 2005
PMID: 15980528
DIALIGN: multiple DNA and protein sequence alignment at BiBiServ.
Morgenstern B., Nucleic Acids Res 32(web server issue), 2004
PMID: 15215344
The CHAOS/DIALIGN WWW server for multiple alignment of genomic sequences.
Brudno M, Steinkamp R, Morgenstern B., Nucleic Acids Res 32(web server issue), 2004
PMID: 15215346
AGenDA: gene prediction by cross-species sequence comparison.
Taher L, Rinner O, Garg S, Sczyrba A, Morgenstern B., Nucleic Acids Res 32(web server issue), 2004
PMID: 15215399
AUGUSTUS: a web server for gene finding in eukaryotes.
Stanke M, Steinkamp R, Waack S, Morgenstern B., Nucleic Acids Res 32(web server issue), 2004
PMID: 15215400
Bioinformatics for the 'bench biologist': how to find regulatory regions in genomic DNA.
Nardone J, Lee DU, Ansel KM, Rao A., Nat Immunol 5(8), 2004
PMID: 15282556
DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.
Schmollinger M, Nieselt K, Kaufmann M, Morgenstern B., BMC Bioinformatics 5(), 2004
PMID: 15357879
Improvement of alignment accuracy utilizing sequentially conserved motifs.
Chakrabarti S, Bhardwaj N, Anand PA, Sowdhamini R., BMC Bioinformatics 5(), 2004
PMID: 15509307
Noncoding regulatory sequences of Ciona exhibit strong correspondence between evolutionary constraint and functional importance.
Johnson DS, Davidson B, Brown CD, Smith WC, Sidow A., Genome Res 14(12), 2004
PMID: 15545496
PMID: 14693042