IsoSVM - Distinguishing isoforms and paralogs on the protein level

Spitzer M, Lorkowski S, Cullen P, Sczyrba A, Fuellen G (2006)
BMC Bioinformatics 7(1): 110.

Journal article | Published | English
Full text available for this record
Abstract
Background: Recent progress in cDNA and EST sequencing is yielding a deluge of sequence data. Like database search results and proteome databases, this data gives rise to inferred protein sequences without ready access to the underlying genomic data. Analysis of this information (e.g. for EST clustering or phylogenetic reconstruction from proteome data) is hampered because it is not known if two protein sequences are isoforms (splice variants) or not (i.e. paralogs/orthologs). However, even without knowing the intron/exon structure, visual analysis of the pattern of similarity across the alignment of the two protein sequences is usually helpful since paralogs and orthologs feature substitutions with respect to each other, as opposed to isoforms, which do not.

Results: The IsoSVM tool introduces an automated approach to identifying isoforms on the protein level using a support vector machine (SVM) classifier. Based on three specific features used as input of the SVM classifier, it is possible to automatically identify isoforms with little effort and with an accuracy of more than 97%. We show that the SVM is superior to a radial basis function network and to a linear classifier. As an example application we use IsoSVM to estimate that a set of Xenopus laevis EST clusters consists of approximately 81% cases where sequences are each other's paralogs and 19% cases where sequences are each other's isoforms. The number of isoforms and paralogs in this allotetraploid species is of interest in the study of evolution.

Conclusion: We developed an SVM classifier that can be used to distinguish isoforms from paralogs with high accuracy and without access to the genomic data. It can be used to analyze, for example, EST data and database search results. Our software is freely available on the Web, under the name IsoSVM.
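The observation in the abstract, that paralogs and orthologs show substitutions across a pairwise alignment while isoforms do not, can be illustrated with a short sketch. The three quantities computed here (identity over aligned columns, substitution count, number of gap blocks) are illustrative assumptions and are not claimed to be the three features used by IsoSVM, which the abstract does not name. The function names, the `AlignmentFeatures` type, the example sequences and the 0.98 threshold are likewise hypothetical, and the threshold rule is only a toy stand-in for a trained SVM.

```python
from dataclasses import dataclass


@dataclass
class AlignmentFeatures:
    aligned_identity: float  # identical residues / columns where both sequences have a residue
    substitutions: int       # columns where both sequences have a residue and the residues differ
    gap_blocks: int          # maximal runs of columns in which exactly one sequence has a gap


def alignment_features(seq_a: str, seq_b: str) -> AlignmentFeatures:
    """Summarize a pairwise protein alignment given as two gapped strings ('-' = gap)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    matches = substitutions = gap_blocks = 0
    prev_gap_owner = None  # sequence that carried the gap in the previous column
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue  # a column of two gaps carries no information
        gap_owner = "a" if a == "-" else "b" if b == "-" else None
        if gap_owner is None:
            if a == b:
                matches += 1
            else:
                substitutions += 1
        elif gap_owner != prev_gap_owner:
            gap_blocks += 1  # a new run of gaps starts here
        prev_gap_owner = gap_owner

    aligned = matches + substitutions
    identity = matches / aligned if aligned else 0.0
    return AlignmentFeatures(identity, substitutions, gap_blocks)


def looks_like_isoform_pair(features: AlignmentFeatures, min_identity: float = 0.98) -> bool:
    """Toy rule (assumption): near-perfect identity in aligned columns suggests isoforms."""
    return features.aligned_identity >= min_identity


if __name__ == "__main__":
    # Made-up sequences: the second pair member lacks an internal segment (isoform-like),
    # the third differs by scattered substitutions (paralog-like).
    reference = "MKTAYIAKQRQISFVKSHFSRQ"
    isoform_like = "MKTAYIAK------VKSHFSRQ"
    paralog_like = "MKSAYLAKQREISFVRSHFTRQ"
    for other in (isoform_like, paralog_like):
        feats = alignment_features(reference, other)
        print(feats, looks_like_isoform_pair(feats))
```

In a setting closer to the one the abstract describes, feature vectors of this kind would be computed for labeled isoform and paralog pairs and passed to an SVM for training, rather than compared against a fixed threshold.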
Year of publication
2006
Journal title
BMC Bioinformatics
Volume
7
Issue
1
Page(s)
110
ISSN
PUB-ID

Cite

Spitzer, M., Lorkowski, S., Cullen, P., Sczyrba, A., & Fuellen, G. (2006). IsoSVM - Distinguishing isoforms and paralogs on the protein level. BMC Bioinformatics, 7(1), 110. doi:10.1186/1471-2105-7-110
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Full text(s)
Access Level
OA Open Access

5 citations in Europe PMC

Data provided by Europe PubMed Central.

The p53 gene with emphasis on its paralogues in mosquitoes.
Chen TH, Wu YJ, Hou JN, Chiu CH, Chen WJ., J Microbiol Immunol Infect 50(6), 2017
PMID: 28690024
Ancient dynamin segments capture early stages of host-mitochondrial integration.
Purkanti R, Thattai M., Proc Natl Acad Sci U S A 112(9), 2015
PMID: 25691734


Sources

PMID: 16519805