Ensemble approach combining multiple methods improves human transcription start site prediction

Dineen DG, Schroeder M, Higgins DG, Cunningham P (2010)
BMC Genomics 11(1): 677.

Journal article | Published | English
Full text available for this record
Author
Dineen DG; Schroeder M; Higgins DG; Cunningham P
Abstract / Note
Background: The computational prediction of transcription start sites is an important unsolved problem. Some recent progress has been made, but many promoters, particularly those not associated with CpG islands, are still difficult to locate using current methods. These methods use different features and training sets, along with a variety of machine learning techniques, and therefore produce different prediction sets. Results: We demonstrate the heterogeneity of current prediction sets, and take advantage of this heterogeneity to construct a two-level classifier ('Profisi Ensemble') using predictions from 7 programs, along with 2 other data sources. Support vector machines using 'full' and 'reduced' data sets are combined in an either/or approach. We achieve a 14% increase in performance over the current state-of-the-art, as benchmarked by a third-party tool. Conclusions: Supervised learning methods are a useful way to combine predictions from diverse sources.
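The two-level, either/or idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not spell out the combination rule, so the sketch assumes that "either/or" means a site is called a TSS when either the 'full' or the 'reduced' model calls it. To stay dependency-free, a simple centroid-based linear rule stands in for the support vector machines. The toy scores, the four input columns (three hypothetical level-one programs plus one extra data source) and the choice of columns in the 'reduced' set are all invented for illustration.

```python
from typing import List, Sequence, Tuple

# A linear decision rule: (weights, bias). Stand-in for a trained SVM.
Model = Tuple[List[float], float]


def fit_centroid_rule(X: Sequence[Sequence[float]], y: Sequence[int]) -> Model:
    """Fit a linear rule whose boundary lies halfway between class centroids.

    ASSUMPTION: this replaces the SVMs named in the abstract, purely to keep
    the sketch self-contained. Labels are 1 (TSS) and 0 (not a TSS).
    """
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    n_features = len(X[0])
    mu_pos = [sum(x[j] for x in pos) / len(pos) for j in range(n_features)]
    mu_neg = [sum(x[j] for x in neg) / len(neg) for j in range(n_features)]
    w = [p - n for p, n in zip(mu_pos, mu_neg)]
    # The bias puts the decision boundary at the midpoint of the two centroids.
    b = -sum(wj * (p + n) / 2 for wj, p, n in zip(w, mu_pos, mu_neg))
    return w, b


def decision(model: Model, x: Sequence[float]) -> float:
    """Signed score of the linear rule; positive means 'TSS'."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def select(x: Sequence[float], cols: Sequence[int]) -> List[float]:
    """Keep only the columns used by the 'reduced' model."""
    return [x[j] for j in cols]


def either_or(full: Model, reduced: Model, reduced_cols: Sequence[int],
              x: Sequence[float]) -> bool:
    """ASSUMED combination rule: call a TSS if either model does."""
    return (decision(full, x) > 0
            or decision(reduced, select(x, reduced_cols)) > 0)


# Level one: each row holds the scores of three hypothetical predictors plus
# one extra source (e.g. a CpG island flag) for a candidate site. Toy values.
X_TRAIN = [
    [0.9, 0.8, 0.7, 1.0],
    [0.8, 0.9, 0.6, 1.0],
    [0.7, 0.2, 0.9, 0.0],
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.1, 0.3, 0.0],
    [0.3, 0.3, 0.2, 1.0],
]
Y_TRAIN = [1, 1, 1, 0, 0, 0]
REDUCED_COLS = (0, 2)  # hypothetical choice of 'reduced' columns

# Level two: one model on all columns, one on the reduced subset.
full_model = fit_centroid_rule(X_TRAIN, Y_TRAIN)
reduced_model = fit_centroid_rule(
    [select(x, REDUCED_COLS) for x in X_TRAIN], Y_TRAIN
)

if __name__ == "__main__":
    for site in ([0.7, 0.0, 0.6, 0.0], [0.3, 0.9, 0.3, 1.0], [0.1, 0.1, 0.1, 0.0]):
        print(site, either_or(full_model, reduced_model, REDUCED_COLS, site))
```

In this toy setup the first candidate is rejected by the full model but accepted by the reduced one, so the combined rule still reports it. That is the kind of gain an either/or combination can offer when the level-one predictors disagree.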
Publication Year
2010
Journal Title
BMC Genomics
Volume
11
Issue
1
Page(s)
677
ISSN
PUB-ID

Cite

Dineen DG, Schroeder M, Higgins DG, Cunningham P. Ensemble approach combining multiple methods improves human transcription start site prediction. BMC Genomics. 2010;11(1):677.
Dineen, D. G., Schroeder, M., Higgins, D. G., & Cunningham, P. (2010). Ensemble approach combining multiple methods improves human transcription start site prediction. BMC Genomics, 11(1), 677. doi:10.1186/1471-2164-11-677
Dineen, D. G., Schroeder, M., Higgins, D. G., and Cunningham, P. (2010). Ensemble approach combining multiple methods improves human transcription start site prediction. BMC Genomics 11, 677.
Dineen, D.G., et al., 2010. Ensemble approach combining multiple methods improves human transcription start site prediction. BMC Genomics, 11(1), p 677.
D.G. Dineen, et al., “Ensemble approach combining multiple methods improves human transcription start site prediction”, BMC Genomics, vol. 11, 2010, p. 677.
Dineen, D.G., Schroeder, M., Higgins, D.G., Cunningham, P.: Ensemble approach combining multiple methods improves human transcription start site prediction. BMC Genomics. 11, 677 (2010).
Dineen, David G., Schroeder, Markus, Higgins, Desmond G., and Cunningham, Padraig. “Ensemble approach combining multiple methods improves human transcription start site prediction”. BMC Genomics 11.1 (2010): 677.
All files are available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Full text(s)
Access Level
OA Open Access
Last Uploaded
2012-02-16T08:37:57Z

2 Citations in Europe PMC

Data provided by Europe PubMed Central.

The impact of sequence length and number of sequences on promoter prediction performance.
Carvalho SG, Guerra-Sá R, de C Merschmann LH., BMC Bioinformatics 16 Suppl 19, 2015
PMID: 26695879

33 References


The transcriptional landscape of the mammalian genome.
Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, Kodzius R, Shimokawa K, Bajic VB, Brenner SE, Batalov S, Forrest AR, Zavolan M, Davis MJ, Wilming LG, Aidinis V, Allen JE, Ambesi-Impiombato A, Apweiler R, Aturaliya RN, Bailey TL, Bansal M, Baxter L, Beisel KW, Bersano T, Bono H, Chalk AM, Chiu KP, Choudhary V, Christoffels A, Clutterbuck DR, Crowe ML, Dalla E, Dalrymple BP, de Bono B, Della Gatta G, di Bernardo D, Down T, Engstrom P, Fagiolini M, Faulkner G, Fletcher CF, Fukushima T, Furuno M, Futaki S, Gariboldi M, Georgii-Hemming P, Gingeras TR, Gojobori T, Green RE, Gustincich S, Harbers M, Hayashi Y, Hensch TK, Hirokawa N, Hill D, Huminiecki L, Iacono M, Ikeo K, Iwama A, Ishikawa T, Jakt M, Kanapin A, Katoh M, Kawasawa Y, Kelso J, Kitamura H, Kitano H, Kollias G, Krishnan SP, Kruger A, Kummerfeld SK, Kurochkin IV, Lareau LF, Lazarevic D, Lipovich L, Liu J, Liuni S, McWilliam S, Madan Babu M, Madera M, Marchionni L, Matsuda H, Matsuzawa S, Miki H, Mignone F, Miyake S, Morris K, Mottagui-Tabar S, Mulder N, Nakano N, Nakauchi H, Ng P, Nilsson R, Nishiguchi S, Nishikawa S, Nori F, Ohara O, Okazaki Y, Orlando V, Pang KC, Pavan WJ, Pavesi G, Pesole G, Petrovsky N, Piazza S, Reed J, Reid JF, Ring BZ, Ringwald M, Rost B, Ruan Y, Salzberg SL, Sandelin A, Schneider C, Schonbach C, Sekiguchi K, Semple CA, Seno S, Sessa L, Sheng Y, Shibata Y, Shimada H, Shimada K, Silva D, Sinclair B, Sperling S, Stupka E, Sugiura K, Sultana R, Takenaka Y, Taki K, Tammoja K, Tan SL, Tang S, Taylor MS, Tegner J, Teichmann SA, Ueda HR, van Nimwegen E, Verardo R, Wei CL, Yagi K, Yamanishi H, Zabarovsky E, Zhu S, Zimmer A, Hide W, Bult C, Grimmond SM, Teasdale RD, Liu ET, Brusic V, Quackenbush J, Wahlestedt C, Mattick JS, Hume DA, Kai C, Sasaki D, Tomaru Y, Fukuda S, Kanamori-Katayama M, Suzuki M, Aoki J, Arakawa T, Iida J, Imamura K, Itoh M, Kato T, Kawaji H, Kawagashira N, Kawashima T, Kojima M, Kondo S, Konno H, Nakano K, Ninomiya N, Nishio T, Okada M, Plessy C, Shibata K, Shiraki T, Suzuki S, Tagami M, Waki K, Watahiki A, Okamura-Oho Y, Suzuki H, Kawai J, Hayashizaki Y; FANTOM Consortium; RIKEN Genome Exploration Research Group and Genome Science Group (Genome Network Project Core Group)., Science 309(5740), 2005
PMID: 16141072
Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution.
Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, Helt G, Sementchenko V, Piccolboni A, Bekiranov S, Bailey DK, Ganesh M, Ghosh S, Bell I, Gerhard DS, Gingeras TR., Science 308(5725), 2005
PMID: 15790807
RNA maps reveal new RNA classes and a possible function for pervasive transcription.
Kapranov P, Cheng J, Dike S, Nix DA, Duttagupta R, Willingham AT, Stadler PF, Hertel J, Hackermuller J, Hofacker IL, Bell I, Cheung E, Drenkow J, Dumais E, Patel S, Helt G, Ganesh M, Ghosh S, Piccolboni A, Sementchenko V, Tammana H, Gingeras TR., Science 316(5830), 2007
PMID: 17510325
Genome-wide analysis of mammalian promoter architecture and evolution.
Carninci P, Sandelin A, Lenhard B, Katayama S, Shimokawa K, Ponjavic J, Semple CA, Taylor MS, Engstrom PG, Frith MC, Forrest AR, Alkema WB, Tan SL, Plessy C, Kodzius R, Ravasi T, Kasukawa T, Fukuda S, Kanamori-Katayama M, Kitazume Y, Kawaji H, Kai C, Nakamura M, Konno H, Nakano K, Mottagui-Tabar S, Arner P, Chesi A, Gustincich S, Persichetti F, Suzuki H, Grimmond SM, Wells CA, Orlando V, Wahlestedt C, Liu ET, Harbers M, Kawai J, Bajic VB, Hume DA, Hayashizaki Y., Nat. Genet. 38(6), 2006
PMID: 16645617
Toward a gold standard for promoter prediction evaluation.
Abeel T, Van de Peer Y, Saeys Y., Bioinformatics 25(12), 2009
PMID: 19478005
A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters.
Saxonov S, Berg P, Brutlag DL., Proc. Natl. Acad. Sci. U.S.A. 103(5), 2006
PMID: 16432200
Determining promoter location based on DNA structure first-principles calculations.
Goni JR, Perez A, Torrents D, Orozco M., Genome Biol. 8(12), 2007
PMID: 18072969
ARTS: accurate recognition of transcription starts in human.
Sonnenburg S, Zien A, Ratsch G., Bioinformatics 22(14), 2006
PMID: 16873509
High DNA melting temperature predicts transcription start site location in human and mouse
AUTHOR UNKNOWN, 2009
MetaProm: a neural network based meta-predictor for alternative human promoter prediction.
Wang J, Ungar LH, Tseng H, Hannenhalli S., BMC Genomics 8, 2007
PMID: 17941982
Meta-prediction of phosphorylation sites with weighted voting and restricted grid search parameter selection
AUTHOR UNKNOWN, 2008
Classifier ensembles for protein structural class prediction with varying homology.
Kedarisetti KD, Kurgan L, Dick S., Biochem. Biophys. Res. Commun. 348(3), 2006
PMID: 16904630
Ensemble Based Systems in Decision Making
AUTHOR UNKNOWN, 2006
Is Combining Classifiers Better than Selecting the Best One
AUTHOR UNKNOWN, 2004
Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment.
Bajic VB, Brent MR, Brown RH, Frankish A, Harrow J, Ohler U, Solovyev VV, Tan SL., Genome Biol. 7 Suppl 1, 2006
PMID: 16925837
Developmental programming of CpG island methylation profiles in the human genome.
Straussman R, Nejman D, Roberts D, Steinfeld I, Blum B, Benvenisty N, Simon I, Yakhini Z, Cedar H., Nat. Struct. Mol. Biol. 16(5), 2009
PMID: 19377480
The effect of methylation on some biological parameters in Salmonella enterica serovar Typhimurium.
Aloui A, Tagourti J, El May A, Joseleau Petit D, Landoulsi A., Pathol. Biol. 59(4), 2009
PMID: 19477083
Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences.
King DC, Taylor J, Elnitski L, Chiaromonte F, Miller W, Hardison RC., Genome Res. 15(8), 2005
PMID: 16024817
Genome-wide analysis of core promoter elements from conserved human and mouse orthologous pairs.
Jin VX, Singer GA, Agosto-Perez FJ, Liyanarachchi S, Davuluri RV., BMC Bioinformatics 7, 2006
PMID: 16522199
ProSOM: core promoter prediction based on unsupervised clustering of DNA physical profiles.
Abeel T, Saeys Y, Rouze P, Van de Peer Y., Bioinformatics 24(13), 2008
PMID: 18586720
Generic eukaryotic core promoter prediction using structural features of DNA.
Abeel T, Saeys Y, Bonnet E, Rouze P, Van de Peer Y., Genome Res. 18(2), 2007
PMID: 18096745
Computational identification of promoters and first exons in the human genome.
Davuluri RV, Grosse I, Zhang MQ., Nat. Genet. 29(4), 2001
PMID: 11726928
Using multiple alignments to improve gene prediction.
Gross SS, Brent MR., J. Comput. Biol. 13(2), 2006
PMID: 16597247
Combining classifiers for improved classification of proteins from sequence or structure.
Melvin I, Weston J, Leslie CS, Noble WS., BMC Bioinformatics 9, 2008
PMID: 18808707
The human genome browser at UCSC.
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D., Genome Res. 12(6), 2002
PMID: 12045153
DBTSS: DataBase of human Transcriptional Start Sites and full-length cDNAs.
Suzuki Y, Yamashita R, Nakai K, Sugano S., Nucleic Acids Res. 30(1), 2002
PMID: 11752328
The WEKA data mining software: an update
AUTHOR UNKNOWN, 2009
LIBSVM: a library for support vector machines
AUTHOR UNKNOWN, 2001

Sources

PMID: 21118509