IsoSVM - Distinguishing isoforms and paralogs on the protein level

Spitzer M, Lorkowski S, Cullen P, Sczyrba A, Fuellen G (2006)
BMC Bioinformatics 7(1): 110.

Journal article | Published | English
 
Author(s)
Spitzer M; Lorkowski S; Cullen P; Sczyrba A; Fuellen G
Abstract / Note
Background: Recent progress in cDNA and EST sequencing is yielding a deluge of sequence data. Like database search results and proteome databases, this data gives rise to inferred protein sequences without ready access to the underlying genomic data. Analysis of this information (e.g., for EST clustering or phylogenetic reconstruction from proteome data) is hampered because it is not known whether two protein sequences are isoforms (splice variants) or not (i.e., paralogs/orthologs). However, even without knowing the intron/exon structure, visual analysis of the pattern of similarity across the alignment of the two protein sequences is usually helpful, since paralogs and orthologs feature substitutions with respect to each other, whereas isoforms do not.

Results: The IsoSVM tool introduces an automated approach to identifying isoforms on the protein level using a support vector machine (SVM) classifier. Based on three specific features used as input to the SVM classifier, it is possible to automatically identify isoforms with little effort and with an accuracy of more than 97%. We show that the SVM is superior to a radial basis function network and to a linear classifier. As an example application, we use IsoSVM to estimate that a set of Xenopus laevis EST clusters consists of approximately 81% cases where sequences are each other's paralogs and 19% cases where sequences are each other's isoforms. The number of isoforms and paralogs in this allotetraploid species is of interest in the study of evolution.

Conclusion: We developed an SVM classifier that can be used to distinguish isoforms from paralogs with high accuracy and without access to the genomic data. It can be used to analyze, for example, EST data and database search results. Our software is freely available on the Web, under the name IsoSVM.
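The intuition stated in the abstract (paralogs and orthologs show substitutions in aligned regions, while isoforms differ mainly by inserted or deleted segments) can be illustrated with a small sketch. The abstract does not name the three features IsoSVM uses, so the quantities computed below (`substitution_fraction`, `gap_blocks`) and the function name `pair_features` are hypothetical stand-ins chosen for illustration, not the tool's actual inputs. The sketch assumes the two protein sequences have already been aligned and padded with `-` gap characters.

```python
def pair_features(aln_a: str, aln_b: str, gap: str = "-") -> dict:
    """Summarize a pairwise protein alignment (hypothetical features,
    not the ones used by IsoSVM)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")

    matches = mismatches = gap_blocks = 0
    in_gap = False
    for a, b in zip(aln_a, aln_b):
        if a == gap and b == gap:
            continue  # column carries no information
        if a == gap or b == gap:
            if not in_gap:
                gap_blocks += 1  # start of a new run of gapped columns
            in_gap = True
            continue
        in_gap = False
        if a == b:
            matches += 1
        else:
            mismatches += 1

    aligned = matches + mismatches
    return {
        "aligned_columns": aligned,
        "substitution_fraction": mismatches / aligned if aligned else 0.0,
        "gap_blocks": gap_blocks,
    }


# Toy sequences invented for this example.
isoform_like = pair_features("MKTAYIAKQRQISFVK", "MKTAYI----QISFVK")
paralog_like = pair_features("MKTAYIAKQRQISFVK", "MRTAYLAKQKQLSFIK")
```

In the first toy pair the aligned columns are identical and the only difference is one missing segment, which is the pattern expected of splice variants. In the second pair there are no gaps but several substitutions, which is the pattern expected of paralogs or orthologs. A trained classifier such as an SVM would take numeric features of this kind as input rather than relying on a hand-set threshold.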
Year of publication
2006
Journal title
BMC Bioinformatics
Volume
7
Issue
1
Page(s)
110
ISSN
1471-2105
Page URI
https://pub.uni-bielefeld.de/record/1599756

Cite

Spitzer, M., Lorkowski, S., Cullen, P., Sczyrba, A., & Fuellen, G. (2006). IsoSVM - Distinguishing isoforms and paralogs on the protein level. BMC Bioinformatics, 7(1), 110. doi:10.1186/1471-2105-7-110
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Full text(s)
Access Level
OA Open Access
Last uploaded
2019-09-06T08:48:02Z
MD5 checksum
099f888119699839585e94a45cb72b54

Sources

PMID: 16519805