Ensemble approach combining multiple methods improves human transcription start site prediction

Dineen DG, Schroeder M, Higgins DG, Cunningham P (2010)
BMC Genomics 11(1).

Journal Article | Published | English
Author
David G. Dineen; Markus Schroeder; Desmond G. Higgins; Padraig Cunningham
Abstract
Background: The computational prediction of transcription start sites is an important unsolved problem. Some recent progress has been made, but many promoters, particularly those not associated with CpG islands, are still difficult to locate using current methods. These methods use different features, training sets, and machine learning techniques, and they therefore produce different prediction sets.
Results: We demonstrate the heterogeneity of current prediction sets, and take advantage of this heterogeneity to construct a two-level classifier ('Profisi Ensemble') using predictions from 7 programs, along with 2 other data sources. Support vector machines trained on 'full' and 'reduced' data sets are combined in an either/or approach. We achieve a 14% increase in performance over the current state of the art, as benchmarked by a third-party tool.
Conclusions: Supervised learning methods are a useful way to combine predictions from diverse sources.
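The abstract does not spell out how the 'full' and 'reduced' classifiers are combined. The sketch below shows one possible reading of the either/or approach, offered as an assumption rather than as the authors' method: a candidate site is scored by the full model when all 9 sources report a value, and by the reduced model otherwise. The names `either_or_predict`, `mean_threshold`, and `REDUCED_IDX`, the cutoffs, and the example scores are all hypothetical, and the threshold functions are simple stand-ins for trained support vector machines.

```python
from typing import Callable, Optional, Sequence

N_SOURCES = 9  # 7 prediction programs + 2 other data sources, per the abstract

Model = Callable[[Sequence[float]], bool]


def mean_threshold(cutoff: float) -> Model:
    """Hypothetical stand-in for a trained SVM: positive if the mean score exceeds cutoff."""
    return lambda scores: sum(scores) / len(scores) > cutoff


def either_or_predict(
    scores: Sequence[Optional[float]],
    full_model: Model,
    reduced_model: Model,
    reduced_idx: Sequence[int],
) -> bool:
    """Assumed either/or rule: use the full model when every source has a value,
    otherwise fall back to the reduced model on a subset of sources."""
    if len(scores) != N_SOURCES:
        raise ValueError(f"expected {N_SOURCES} scores, got {len(scores)}")
    if all(s is not None for s in scores):
        return full_model([s for s in scores if s is not None])
    subset = [scores[i] for i in reduced_idx]
    if any(s is None for s in subset):
        raise ValueError("reduced feature subset is incomplete")
    return reduced_model([s for s in subset if s is not None])


# Hypothetical second-level models and reduced feature subset.
full = mean_threshold(0.5)
reduced = mean_threshold(0.6)
REDUCED_IDX = (0, 1, 2)

# Illustrative first-level scores for two candidate sites (not real data).
complete = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.6, 0.7, 0.3]
partial = [0.9, 0.2, 0.4, None, 0.5, None, 0.6, 0.7, 0.3]

print(either_or_predict(complete, full, reduced, REDUCED_IDX))
print(either_or_predict(partial, full, reduced, REDUCED_IDX))
```

Another reading, in which a site is called whenever either classifier predicts it, would replace the selection step with a logical OR of the two model outputs.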
Publishing Year
2010

Cite this

Dineen, David G., Schroeder, Markus, Higgins, Desmond G., and Cunningham, Padraig. “Ensemble approach combining multiple methods improves human transcription start site prediction”. BMC Genomics 11.1 (2010).
Access Level
Open Access
Last Uploaded
2012-02-16 08:37:57


2 Citations in Europe PMC

Data provided by Europe PubMed Central.

The impact of sequence length and number of sequences on promoter prediction performance.
Carvalho SG, Guerra-Sa R, de C Merschmann LH., BMC Bioinformatics 16 Suppl 19, 2015
PMID: 26695879


Sources

PMID: 21118509