Benchmarking tools for the alignment of functional noncoding DNA

Pollard DA, Bergman CM, Stoye J, Celniker SE, Eisen MB (2004)
BMC Bioinformatics 5(1): 6.

Journal Article | Original Article | Published | English
Author
Pollard DA; Bergman CM; Stoye J; Celniker SE; Eisen MB
Abstract / Notes
Background: Numerous tools have been developed to align genomic sequences, but their relative performance in specific applications remains poorly characterized. Alignments of protein-coding sequences have typically been benchmarked against "correct" alignments inferred from structural data. For noncoding sequences, where such independent validation is lacking, simulation provides an effective means of generating "correct" alignments against which to benchmark alignment tools.
Results: Using rates of noncoding sequence evolution estimated from the genus Drosophila, we simulated alignments over a range of divergence times under varying models incorporating point substitutions, insertion/deletion events, and short blocks of constrained sequence such as those found in cis-regulatory regions. We then compared the "correct" alignments generated by a modified version of the ROSE simulation platform with alignments of the simulated derived sequences produced by eight pairwise alignment tools (Avid, BlastZ, Chaos, ClustalW, DiAlign, Lagan, Needle, and WABA) to determine the off-the-shelf performance of each tool. As expected, the ability to align noncoding sequences accurately decreases with increasing divergence for all tools, and declines faster in the presence of insertion/deletion evolution. Global alignment tools (Avid, ClustalW, Lagan, and Needle) typically have higher sensitivity over entire noncoding sequences as well as in constrained sequences. Local tools (BlastZ, Chaos, and WABA) have lower overall sensitivity as a consequence of incomplete coverage, but have high specificity for detecting constrained sequences as well as high sensitivity within the subset of sequences they align. Tools such as DiAlign, which generate both local and global outputs, produce alignments of constrained sequences with both high sensitivity and high specificity at divergence distances in the range of 1.25 to 3.0 substitutions per site.
Conclusion: For species with genomic properties similar to Drosophila, we conclude that a single pair of optimally diverged species analyzed with a high performance alignment tool can yield accurate and specific alignments of functionally constrained noncoding sequences. Further algorithm development, optimization of alignment parameters, and benchmarking studies will be necessary to extract the maximal biological information from alignments of functional noncoding DNA.
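The benchmarking above rests on scoring each tool's output against the simulated "correct" alignment. The Python below is a minimal sketch of one common way to do this, assuming an aligned-pair formulation: a residue pair in the test alignment counts as correct if the reference alignment places the same two sequence positions in one column. It reports sensitivity (correct pairs over reference pairs) and a positive predictive value (correct pairs over test pairs). The function names `aligned_pairs` and `score_alignment`, and the `region` argument used as a stand-in for annotated constrained positions, are hypothetical; the paper's own definitions, particularly of specificity, may differ.

```python
from typing import Iterable, Optional, Set, Tuple

# Hypothetical helpers illustrating aligned-pair scoring; this is an assumed
# formulation, not necessarily the measure used in the paper.
Pair = Tuple[int, int]


def aligned_pairs(row1: str, row2: str, gap: str = "-") -> Set[Pair]:
    """Return the set of (i, j) residue pairs placed in the same column.

    i and j are 0-based positions in the ungapped sequences of row1 and row2.
    Columns where either row has a gap contribute no pair.
    """
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    pairs: Set[Pair] = set()
    i = j = 0
    for a, b in zip(row1, row2):
        if a != gap and b != gap:
            pairs.add((i, j))
        # Advance each sequence index only when that row holds a residue.
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def score_alignment(ref: Tuple[str, str], test: Tuple[str, str],
                    region: Optional[Iterable[int]] = None) -> Tuple[float, float]:
    """Compare a test alignment with a reference ("correct") alignment.

    Returns (sensitivity, ppv):
      sensitivity = correct test pairs / reference pairs
      ppv         = correct test pairs / all test pairs
    If `region` (hypothetical) is given, only pairs whose first-sequence
    position lies in it are counted, e.g. a constrained block.
    """
    ref_pairs = aligned_pairs(*ref)
    test_pairs = aligned_pairs(*test)
    if region is not None:
        keep = set(region)
        ref_pairs = {p for p in ref_pairs if p[0] in keep}
        test_pairs = {p for p in test_pairs if p[0] in keep}
    correct = len(ref_pairs & test_pairs)
    sens = correct / len(ref_pairs) if ref_pairs else 0.0
    ppv = correct / len(test_pairs) if test_pairs else 0.0
    return sens, ppv


if __name__ == "__main__":
    # Toy example: the test alignment misplaces one gap relative to the reference.
    ref = ("ACG-TA", "ACGGTA")
    test = ("ACGT-A", "ACGGTA")
    print(score_alignment(ref, test))                      # (0.8, 0.8)
    print(score_alignment(ref, test, region=range(3, 5)))  # (0.5, 0.5)
```

This sketch assumes both rows begin at the first residue of each sequence, as in a global alignment. Scoring a local alignment, such as a single BlastZ or WABA hit, would also require the hit's start offsets in each sequence, which are omitted here.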
Publishing Year
2004
ISSN
1471-2105
PUB-ID

Cite this

Pollard DA, Bergman CM, Stoye J, Celniker SE, Eisen MB. Benchmarking tools for the alignment of functional noncoding DNA. BMC Bioinformatics. 2004;5(1):6.
Pollard, D. A., Bergman, C. M., Stoye, J., Celniker, S. E., & Eisen, M. B. (2004). Benchmarking tools for the alignment of functional noncoding DNA. BMC Bioinformatics, 5(1), 6. doi:10.1186/1471-2105-5-6
Pollard, D. A., Bergman, C. M., Stoye, J., Celniker, S. E., and Eisen, M. B. (2004). Benchmarking tools for the alignment of functional noncoding DNA. BMC Bioinformatics 5, 6.
Pollard, D.A., et al., 2004. Benchmarking tools for the alignment of functional noncoding DNA. BMC Bioinformatics, 5(1), p. 6.
D.A. Pollard, et al., “Benchmarking tools for the alignment of functional noncoding DNA”, BMC Bioinformatics, vol. 5, 2004, p. 6.
Pollard, D.A., Bergman, C.M., Stoye, J., Celniker, S.E., Eisen, M.B.: Benchmarking tools for the alignment of functional noncoding DNA. BMC Bioinformatics. 5, 6 (2004).
Pollard, Daniel A., Bergman, Casey M., Stoye, Jens, Celniker, Susan E., and Eisen, Michael B. “Benchmarking tools for the alignment of functional noncoding DNA”. BMC Bioinformatics 5.1 (2004): 6.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Main File(s)
Access Level
Open Access

71 Citations in Europe PMC

Data provided by Europe PubMed Central.

MySSP: non-stationary evolutionary sequence simulation, including indels.
Rosenberg MS., Evol Bioinform Online 1, 2007
PMID: 19325855
BlastAlign: a program that uses blast to align problematic nucleotide sequences.
Belshaw R, Katzourakis A., Bioinformatics 21(1), 2005
PMID: 15310559
Multiple sequence alignment with user-defined constraints at GOBICS.
Morgenstern B, Werner N, Prohaska SJ, Steinkamp R, Schneider I, Subramanian AR, Stadler PF, Weyer-Menkhoff J., Bioinformatics 21(7), 2005
PMID: 15546937
Bases of motifs for generating repeated patterns with wild cards.
Pisanti N, Crochemore M, Grossi R, Sagot MF., IEEE/ACM Trans Comput Biol Bioinform 2(1), 2005
PMID: 17044163
DIALIGN-T: an improved algorithm for segment-based multiple sequence alignment.
Subramanian AR, Weyer-Menkhoff J, Kaufmann M, Morgenstern B., BMC Bioinformatics 6, 2005
PMID: 15784139
ThurGood: evaluating assembly-to-assembly mapping.
Shatkay H, Miller J, Mobarry C, Flanigan M, Yooseph S, Sutton G., J Comput Biol 11(5), 2004
PMID: 15700403
Aligning multiple genomic sequences with the threaded blockset aligner.
Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, Haussler D, Miller W., Genome Res 14(4), 2004
PMID: 15060014
Correction: Benchmarking tools for the alignment of functional noncoding DNA.
Pollard DA, Bergman CM, Stoye J, Celniker SE, Eisen MB., BMC Bioinformatics 5, 2004
PMID: 15186509
DIALIGN: multiple DNA and protein sequence alignment at BiBiServ.
Morgenstern B., Nucleic Acids Res 32(Web Server issue), 2004
PMID: 15215344
The CHAOS/DIALIGN WWW server for multiple alignment of genomic sequences.
Brudno M, Steinkamp R, Morgenstern B., Nucleic Acids Res 32(Web Server issue), 2004
PMID: 15215346
VISTA: computational tools for comparative genomics.
Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I., Nucleic Acids Res 32(Web Server issue), 2004
PMID: 15215394
CSTminer: a web tool for the identification of coding and noncoding conserved sequence tags through cross-species genome comparison.
Castrignanò T, Canali A, Grillo G, Liuni S, Mignone F, Pesole G., Nucleic Acids Res 32(Web Server issue), 2004
PMID: 15215464
Bioinformatics for the 'bench biologist': how to find regulatory regions in genomic DNA.
Nardone J, Lee DU, Ansel KM, Rao A., Nat Immunol 5(8), 2004
PMID: 15282556
DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.
Schmollinger M, Nieselt K, Kaufmann M, Morgenstern B., BMC Bioinformatics 5, 2004
PMID: 15357879
Marine organism cell biology and regulatory sequence discovery in comparative functional genomics.
Barnes DW, Mattingly CJ, Parton A, Dowell LM, Bayne CJ, Forrest JN., Cytotechnology 46(2-3), 2004
PMID: 19003267
Bioinformatics: harvesting information for plant and crop science.
King GJ., Semin Cell Dev Biol 15(6), 2004
PMID: 15561592

71 References

On some criteria for estimating the order of a Markov chain.
Katz RW., 1981
PAML (version 3.13)
AUTHOR UNKNOWN
Dating of the human-ape splitting by a molecular clock of mitochondrial DNA.
Hasegawa M, Kishino H, Yano T., J. Mol. Evol. 22(2), 1985
PMID: 3934395
Codon usage bias and base composition of nuclear genes in Drosophila.
Moriyama EN, Hartl DL., Genetics 134(3), 1993
PMID: 8349115
Intraspecific nuclear DNA variation in Drosophila.
Moriyama EN, Powell JR., Mol. Biol. Evol. 13(1), 1996
PMID: 8583899
ROSE (version 1.3)
AUTHOR UNKNOWN
Bayesian adaptive sequence alignment algorithms.
Zhu J, Liu JS, Lawrence CE., Bioinformatics 14(1), 1998
PMID: 9520499
BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences.
Tatusova TA, Madden TL., FEMS Microbiol. Lett. 174(2), 1999
PMID: 10339815
Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs.
Jareborg N, Birney E, Durbin R., Genome Res. 9(9), 1999
PMID: 10508839
Fast algorithms for large-scale genome alignment and comparison.
Delcher AL, Phillippy A, Carlton J, Salzberg SL., Nucleic Acids Res. 30(11), 2002
PMID: 12034836
OWEN: aligning long collinear regions of genomes.
Ogurtsov AY, Roytberg MA, Shabalina SA, Kondrashov AS., Bioinformatics 18(12), 2002
PMID: 12490463
Improved tools for biological sequence comparison.
Pearson WR, Lipman DJ., Proc. Natl. Acad. Sci. U.S.A. 85(8), 1988
PMID: 3162770
Human-mouse alignments with BLASTZ.
Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W., Genome Res. 13(1), 2003
PMID: 12529312
EMBOSS: the European Molecular Biology Open Software Suite.
Rice P, Longden I, Bleasby A., Trends Genet. 16(6), 2000
PMID: 10827456

Sources

PMID: 14736341