Benchmarking tools for the alignment of functional noncoding DNA

Pollard DA, Bergman CM, Stoye J, Celniker SE, Eisen MB (2004)
BMC Bioinformatics 5(1): 6.

Journal Article | Original Article | Published | English
Background: Numerous tools have been developed to align genomic sequences. However, their relative performance in specific applications remains poorly characterized. Alignments of protein-coding sequences typically have been benchmarked against "correct" alignments inferred from structural data. For noncoding sequences, where such independent validation is lacking, simulation provides an effective means to generate "correct" alignments with which to benchmark alignment tools.
Results: Using rates of noncoding sequence evolution estimated from the genus Drosophila, we simulated alignments over a range of divergence times under varying models incorporating point substitution, insertion/deletion events, and short blocks of constrained sequences such as those found in cis-regulatory regions. We then compared "correct" alignments generated by a modified version of the ROSE simulation platform to alignments of the simulated derived sequences produced by eight pairwise alignment tools (Avid, BlastZ, Chaos, ClustalW, DiAlign, Lagan, Needle, and WABA) to determine the off-the-shelf performance of each tool. As expected, the ability to align noncoding sequences accurately decreases with increasing divergence for all tools, and declines faster in the presence of insertion/deletion evolution. Global alignment tools (Avid, ClustalW, Lagan, and Needle) typically have higher sensitivity over entire noncoding sequences as well as in constrained sequences. Local tools (BlastZ, Chaos, and WABA) have lower overall sensitivity as a consequence of incomplete coverage, but have high specificity to detect constrained sequences as well as high sensitivity within the subset of sequences they align. Tools such as DiAlign, which generate both local and global outputs, produce alignments of constrained sequences with both high sensitivity and specificity for divergence distances in the range of 1.25 to 3.0 substitutions per site.
Conclusion: For species with genomic properties similar to Drosophila, we conclude that a single pair of optimally diverged species analyzed with a high performance alignment tool can yield accurate and specific alignments of functionally constrained noncoding sequences. Further algorithm development, optimization of alignment parameters, and benchmarking studies will be necessary to extract the maximal biological information from alignments of functional noncoding DNA.
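The benchmarking logic summarized above (evolve two sequences from a common ancestor, record which positions are truly homologous, then score an aligner's output against that record) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses a Jukes-Cantor (JC69) substitution model and a hypothetical single-base indel process with per-site probability `indel_prob`, rather than the modified ROSE platform and Drosophila-derived rates. It also treats "specificity" as the fraction of predicted aligned pairs that are correct, which may differ from the paper's exact definition. All function names (`jc_change_prob`, `evolve`, `simulate_pair`, `score`) are hypothetical. `t` is the branch length from the ancestor to each descendant, so the pairwise divergence is `2 * t` substitutions per site.

```python
import math
import random

BASES = "ACGT"


def jc_change_prob(t):
    """Probability that a site shows a different base after branch length t (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))


def evolve(ancestor, t, indel_prob, rng):
    """Evolve a list of (site_id, base) pairs along one branch.

    Inserted bases get site_id None because they have no ancestral homolog.
    indel_prob is a hypothetical per-site probability used for both
    single-base deletions and single-base insertions (an assumption,
    not the ROSE indel model).
    """
    p_change = jc_change_prob(t)
    child = []
    for site_id, base in ancestor:
        if rng.random() < indel_prob:
            continue  # deletion: this ancestral site is lost on this branch
        if rng.random() < p_change:
            # Under JC69 a changed site is equally likely to become any other base
            base = rng.choice([x for x in BASES if x != base])
        child.append((site_id, base))
        if rng.random() < indel_prob:
            child.append((None, rng.choice(BASES)))  # single-base insertion
    return child


def simulate_pair(length, t, indel_prob, seed=0):
    """Return two descendant sequences and the set of truly homologous (i, j) pairs."""
    rng = random.Random(seed)
    ancestor = [(i, rng.choice(BASES)) for i in range(length)]
    a = evolve(ancestor, t, indel_prob, rng)
    b = evolve(ancestor, t, indel_prob, rng)
    # Map each surviving ancestral site to its position in descendant b
    pos_in_b = {sid: j for j, (sid, _) in enumerate(b) if sid is not None}
    true_pairs = {
        (i, pos_in_b[sid])
        for i, (sid, _) in enumerate(a)
        if sid is not None and sid in pos_in_b
    }
    seq_a = "".join(base for _, base in a)
    seq_b = "".join(base for _, base in b)
    return seq_a, seq_b, true_pairs


def score(predicted, true_pairs):
    """Sensitivity = correct / true pairs; specificity (precision here) = correct / predicted."""
    correct = len(predicted & true_pairs)
    sens = correct / len(true_pairs) if true_pairs else 0.0
    spec = correct / len(predicted) if predicted else 0.0
    return sens, spec


if __name__ == "__main__":
    seq_a, seq_b, truth = simulate_pair(200, t=0.6, indel_prob=0.02, seed=1)
    # A deliberately naive "aligner" that pairs position i with position i
    naive = {(i, i) for i in range(min(len(seq_a), len(seq_b)))}
    print(score(naive, truth))
```

A real benchmark would replace the naive pairing with the aligned position pairs parsed from each tool's output. Because indels shift positions, the naive pairing tends to lose sensitivity as `indel_prob` grows, a toy analogue of the faster decline under insertion/deletion evolution described in the Results.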

Cite this

Pollard DA, Bergman CM, Stoye J, Celniker SE, Eisen MB. Benchmarking tools for the alignment of functional noncoding DNA. BMC Bioinformatics. 2004;5(1):6.
Pollard, D. A., Bergman, C. M., Stoye, J., Celniker, S. E., & Eisen, M. B. (2004). Benchmarking tools for the alignment of functional noncoding DNA. BMC Bioinformatics, 5(1), 6. doi:10.1186/1471-2105-5-6
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Main file(s) access level: Open Access


71 Citations in Europe PMC

Data provided by Europe PubMed Central.

The appeasement of Doug: a synthetic approach to enhancer biology.
Vincent BJ, Estrada J, DePace AH., Integr Biol (Camb) 8(4), 2016
PMID: 26936291
BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements.
De Witte D, Van de Velde J, Decap D, Van Bel M, Audenaert P, Demeester P, Dhoedt B, Vandepoele K, Fostier J., Bioinformatics 31(23), 2015
PMID: 26254488
Association of obesity with serum leptin, adiponectin, and serotonin and gut microflora in beagle dogs.
Park HJ, Lee SE, Kim HB, Isaacson RE, Seo KW, Song KH., J. Vet. Intern. Med. 29(1), 2015
PMID: 25407880
Methods to detect selection on noncoding DNA.
Zhen Y, Andolfatto P., Methods Mol. Biol. 856, 2012
PMID: 22399458
Use of ChIP-Seq data for the design of a multiple promoter-alignment method.
Erb I, Gonzalez-Vallinas JR, Bussotti G, Blanco E, Eyras E, Notredame C., Nucleic Acids Res. 40(7), 2012
PMID: 22230796
Cgaln: fast and space-efficient whole-genome alignment.
Nakato R, Gotoh O., BMC Bioinformatics 11, 2010
PMID: 20433723
Patterns of DNA-sequence divergence between Drosophila miranda and D. pseudoobscura.
Marion de Proce S, Halligan DL, Keightley PD, Charlesworth B., J. Mol. Evol. 69(6), 2009
PMID: 19859648
Comparative genomic workflow: discovery of conserved noncoding DNA patterns.
Rajapakse J, Pooja, Chen C, Ho SL., IEEE Eng Med Biol Mag 28(4), 2009
PMID: 19622420
MySSP: non-stationary evolutionary sequence simulation, including indels.
Rosenberg MS., Evol. Bioinform. Online 1, 2007
PMID: 19325855
Bases of motifs for generating repeated patterns with wild cards.
Pisanti N, Crochemore M, Grossi R, Sagot MF., IEEE/ACM Trans Comput Biol Bioinform 2(1), 2005
PMID: 17044163
Genomic multiple sequence alignments: refinement using a genetic algorithm.
Wang C, Lefkowitz EJ., BMC Bioinformatics 6, 2005
PMID: 16086841
Multiple alignment of genomic sequences using CHAOS, DIALIGN and ABC.
Pohler D, Werner N, Steinkamp R, Morgenstern B., Nucleic Acids Res. 33(Web Server issue), 2005
PMID: 15980528
Marine organism cell biology and regulatory sequence discovery in comparative functional genomics.
Barnes DW, Mattingly CJ, Parton A, Dowell LM, Bayne CJ, Forrest JN Jr., Cytotechnology 46(2-3), 2004
PMID: 19003267
ThurGood: evaluating assembly-to-assembly mapping.
Shatkay H, Miller J, Mobarry C, Flanigan M, Yooseph S, Sutton G., J. Comput. Biol. 11(5), 2004
PMID: 15700403
Bioinformatics: harvesting information for plant and crop science.
King GJ., Semin. Cell Dev. Biol. 15(6), 2004
PMID: 15561592
DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.
Schmollinger M, Nieselt K, Kaufmann M, Morgenstern B., BMC Bioinformatics 5, 2004
PMID: 15357879
Correction: Benchmarking tools for the alignment of functional noncoding DNA.
Pollard DA, Bergman CM, Stoye J, Celniker SE, Eisen MB., BMC Bioinformatics 5, 2004
PMID: 15186509



PMID: 14736341