Fast and sensitive multiple alignment of large genomic sequences

Brudno M, Chapman M, Göttgens B, Batzoglou S, Morgenstern B (2003)
BMC Bioinformatics 4(1): 66.

Journal Article | Published | English
Author
Brudno, Michael; Chapman, Michael; Göttgens, Berthold; Batzoglou, Serafim; Morgenstern, Burkhard
Abstract
Background: Genomic sequence alignment is a powerful method for genome analysis and annotation, as alignments are routinely used to identify functional sites such as genes or regulatory elements. With a growing number of partially or completely sequenced genomes, multiple alignment is playing an increasingly important role in these studies. In recent years, various tools for pair-wise and multiple genomic alignment have been proposed. Some of them are extremely fast, but their efficiency is often achieved at the expense of sensitivity. One way of combining speed and sensitivity is an anchored-alignment approach. In the first step, a fast search program identifies a chain of strong local sequence similarities. In the second step, the regions between these anchor points are aligned using a slower but more accurate method.

Results: Herein, we present CHAOS, a novel algorithm for the rapid identification of chains of local pair-wise sequence similarities. Local alignments calculated by CHAOS are used as anchor points to speed up DIALIGN, a slow but sensitive multiple-alignment tool. We show that, in this way, the running time of DIALIGN can be reduced by more than 95% for BAC-sized and longer sequences without affecting the quality of the resulting alignments. We apply our approach to a set of five genomic sequences around the stem-cell-leukemia (SCL) gene and demonstrate that exons and small regulatory elements can be identified by our multiple-alignment procedure.

Conclusion: We conclude that the novel CHAOS local alignment tool is an effective way to significantly speed up global alignment tools such as DIALIGN without reducing alignment quality. We likewise demonstrate that the DIALIGN/CHAOS combination is able to accurately align short regulatory sequences in distant orthologues.
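The two-step anchored-alignment idea described in the abstract can be illustrated with a minimal Python sketch for two sequences. This is not CHAOS or DIALIGN, and every function name below is hypothetical: exact k-mer matches stand in for the local similarities that CHAOS finds, a simple O(n²) dynamic program selects the highest-scoring co-linear chain of non-overlapping anchors, and plain Needleman-Wunsch (with assumed scores: match +1, mismatch -1, gap -2) stands in for the slower, more accurate method applied to the regions between anchor points. DIALIGN itself performs multiple, segment-based alignment, which this sketch does not attempt.

```python
from typing import Dict, List, Tuple

# A hit (anchor): (start in a, start in b, length) of an exact, gap-free match.
Hit = Tuple[int, int, int]


def find_kmer_hits(a: str, b: str, k: int) -> List[Hit]:
    """Step 1, simplified: exact k-mer matches via a hash index.
    Hypothetical stand-in for a real local-similarity search such as CHAOS."""
    index: Dict[str, List[int]] = {}
    for i in range(len(a) - k + 1):
        index.setdefault(a[i:i + k], []).append(i)
    hits: List[Hit] = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], []):
            hits.append((i, j, k))
    return hits


def chain_hits(hits: List[Hit]) -> List[Hit]:
    """Highest-scoring co-linear chain of non-overlapping hits (score = total length)."""
    if not hits:
        return []
    hits = sorted(hits)
    best = [h[2] for h in hits]  # best chain score ending at hit t
    prev = [-1] * len(hits)      # back-pointer to the previous hit in that chain
    for t, (i, j, n) in enumerate(hits):
        for s in range(t):
            pi, pj, pn = hits[s]
            # predecessor must end before this hit starts in BOTH sequences
            if pi + pn <= i and pj + pn <= j and best[s] + n > best[t]:
                best[t] = best[s] + n
                prev[t] = s
    t = max(range(len(hits)), key=best.__getitem__)
    chain: List[Hit] = []
    while t != -1:
        chain.append(hits[t])
        t = prev[t]
    return chain[::-1]


def global_align(x: str, y: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[str, str]:
    """Step 2 stand-in: Needleman-Wunsch global alignment of one short region."""
    n, m = len(x), len(y)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if x[i - 1] == y[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(i, j),
                          S[i - 1][j] + gap, S[i][j - 1] + gap)
    # traceback from the bottom-right cell, preferring diagonal moves
    ax: List[str] = []
    ay: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(i, j):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return "".join(reversed(ax)), "".join(reversed(ay))


def anchored_alignment(a: str, b: str, k: int = 4) -> Tuple[str, str]:
    """Keep chained anchors fixed; align only the regions between them."""
    chain = chain_hits(find_kmer_hits(a, b, k))
    out_a: List[str] = []
    out_b: List[str] = []
    ca, cb = 0, 0  # current positions in a and b
    for i, j, n in chain:
        ga, gb = global_align(a[ca:i], b[cb:j])  # region before this anchor
        out_a += [ga, a[i:i + n]]
        out_b += [gb, b[j:j + n]]
        ca, cb = i + n, j + n
    ga, gb = global_align(a[ca:], b[cb:])  # tail after the last anchor
    out_a.append(ga)
    out_b.append(gb)
    return "".join(out_a), "".join(out_b)
```

Because each anchor fixes one stretch of both sequences, the quadratic alignment step only sees the short regions between consecutive anchors, which is where the running-time savings of an anchored approach come from.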
Publishing Year
2003

Cite this

Brudno M, Chapman M, Göttgens B, Batzoglou S, Morgenstern B. Fast and sensitive multiple alignment of large genomic sequences. BMC Bioinformatics. 2003;4(1):66.
Brudno, M., Chapman, M., Göttgens, B., Batzoglou, S., & Morgenstern, B. (2003). Fast and sensitive multiple alignment of large genomic sequences. BMC Bioinformatics, 4(1), 66. doi:10.1186/1471-2105-4-66
Brudno, M., Chapman, M., Göttgens, B., Batzoglou, S., and Morgenstern, B. (2003). Fast and sensitive multiple alignment of large genomic sequences. BMC Bioinformatics 4, 66.
Brudno, M., et al., 2003. Fast and sensitive multiple alignment of large genomic sequences. BMC Bioinformatics, 4(1), p 66.
M. Brudno, et al., “Fast and sensitive multiple alignment of large genomic sequences”, BMC Bioinformatics, vol. 4, 2003, pp. 66.
Brudno, M., Chapman, M., Göttgens, B., Batzoglou, S., Morgenstern, B.: Fast and sensitive multiple alignment of large genomic sequences. BMC Bioinformatics. 4, 66 (2003).
Brudno, Michael, Chapman, Michael, Göttgens, Berthold, Batzoglou, Serafim, and Morgenstern, Burkhard. “Fast and sensitive multiple alignment of large genomic sequences”. BMC Bioinformatics 4.1 (2003): 66.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Main File(s)
Access Level: Open Access

70 Citations in Europe PMC

Data provided by Europe PubMed Central.

The CD8alpha from sea bass (Dicentrarchus labrax L.): Cloning, expression and 3D modelling.
Buonocore F, Randelli E, Bird S, Secombes CJ, Costantini S, Facchiano A, Mazzini M, Scapigliati G., Fish Shellfish Immunol 20(4), 2006
PMID: 16230027
Accurate anchoring alignment of divergent sequences.
Huang W, Umbach DM, Li L., Bioinformatics 22(1), 2006
PMID: 16301203
Shuffling of cis-regulatory elements is a pervasive feature of the vertebrate lineage.
Sanges R, Kalmar E, Claudiani P, D'Amato M, Muller F, Stupka E., Genome Biol 7(7), 2006
PMID: 16859531
Multiple sequence alignment with user-defined constraints at GOBICS.
Morgenstern B, Werner N, Prohaska SJ, Steinkamp R, Schneider I, Subramanian AR, Stadler PF, Weyer-Menkhoff J., Bioinformatics 21(7), 2005
PMID: 15546937
MAP2: multiple alignment of syntenic genomic sequences.
Ye L, Huang X., Nucleic Acids Res 33(1), 2005
PMID: 15640451
Gene expression patterns associated with blood-feeding in the malaria mosquito Anopheles gambiae.
Dana AN, Hong YS, Kern MK, Hillenmeyer ME, Harker BW, Lobo NF, Hogan JR, Romans P, Collins FH., BMC Genomics 6, 2005
PMID: 15651988
GMAP: a genomic mapping and alignment program for mRNA and EST sequences.
Wu TD, Watanabe CK., Bioinformatics 21(9), 2005
PMID: 15728110
DIALIGN-T: an improved algorithm for segment-based multiple sequence alignment.
Subramanian AR, Weyer-Menkhoff J, Kaufmann M, Morgenstern B., BMC Bioinformatics 6, 2005
PMID: 15784139
Multiple sequence alignments.
Wallace IM, Blackshields G, Higgins DG., Curr Opin Struct Biol 15(3), 2005
PMID: 15963889
Multiple alignment of genomic sequences using CHAOS, DIALIGN and ABC.
Pöhler D, Werner N, Steinkamp R, Morgenstern B., Nucleic Acids Res 33(web server issue), 2005
PMID: 15980528
DIALIGN: multiple DNA and protein sequence alignment at BiBiServ.
Morgenstern B., Nucleic Acids Res 32(web server issue), 2004
PMID: 15215344
The CHAOS/DIALIGN WWW server for multiple alignment of genomic sequences.
Brudno M, Steinkamp R, Morgenstern B., Nucleic Acids Res 32(web server issue), 2004
PMID: 15215346
AGenDA: gene prediction by cross-species sequence comparison.
Taher L, Rinner O, Garg S, Sczyrba A, Morgenstern B., Nucleic Acids Res 32(web server issue), 2004
PMID: 15215399
AUGUSTUS: a web server for gene finding in eukaryotes.
Stanke M, Steinkamp R, Waack S, Morgenstern B., Nucleic Acids Res 32(web server issue), 2004
PMID: 15215400
Bioinformatics for the 'bench biologist': how to find regulatory regions in genomic DNA.
Nardone J, Lee DU, Ansel KM, Rao A., Nat Immunol 5(8), 2004
PMID: 15282556
DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.
Schmollinger M, Nieselt K, Kaufmann M, Morgenstern B., BMC Bioinformatics 5, 2004
PMID: 15357879
Improvement of alignment accuracy utilizing sequentially conserved motifs.
Chakrabarti S, Bhardwaj N, Anand PA, Sowdhamini R., BMC Bioinformatics 5, 2004
PMID: 15509307

54 References

Data provided by Europe PubMed Central.

Rose: generating sequence families.
Stoye J, Evers D, Meyer F., Bioinformatics 14(2), 1998
PMID: 9545448
Efficient string matching: an aid to bibliographic search.
Aho A, Corasick M., 1975
Trie memory.
Fredkin E., 1960
Skip lists: A probabilistic alternative to balanced trees.
Pugh W., 1990

Sources

PMID: 14693042