Genome analysis based on EST collections : a clustering pipeline and a database on Xenopus laevis

Sczyrba A (2006)
Bielefeld (Germany): Bielefeld University.

Bielefeld E-Dissertation | English
 
Author
Sczyrba, Alexander
Reviewer / Supervisor
Giegerich, Robert (Prof. Dr.)
Abstract / Note
Collections of Expressed Sequence Tags (ESTs) provide the most extensive available survey of the transcriptome of an organism and, with it, evidence for the existence of genes. They are indispensable for gene discovery, gene structure prediction, and genomic mapping. The price of this low-cost, high-throughput data is that ESTs have high error rates and are poorly annotated. The low-quality sequence data can be improved by several processing steps and by clustering into gene-oriented clusters, which in turn can be assembled into contig sequences for further analyses. The first part of this thesis describes an EST clustering pipeline that makes use of enhanced suffix arrays as implemented in the software tool Vmatch. Enhanced suffix arrays are a data structure that has been shown to be as powerful as suffix trees, with the advantage of a reduced space requirement and reduced processing time. Furthermore, enhanced suffix arrays have been shown to be superior to other matching tools for a variety of applications. We validate the clustering results on a "gold-standard" EST data set of Arabidopsis thaliana and compare them with the results of other widely used clustering tools. The implemented clustering pipeline takes advantage of the underlying database and enables unique batch functionality for mapping results from other organisms to the species of interest. For some species, EST projects provide the only information about their gene content. One of these species is the African clawed frog Xenopus laevis. Research using this model system has provided critical insights into the mechanisms of early vertebrate development and cell biology. Despite the interest in this model organism, no genome project is currently ongoing, and EST and cDNA sequences are the only resource available.
To further improve Xenopus as a non-mammalian model system, one of the highest-priority goals is the generation of EST and full-length cDNA collections, as they facilitate functional assays, one of the particular strengths of Xenopus. We have applied the EST clustering pipeline described in the first part of this thesis to Xenopus laevis ESTs to identify both full-length protein-encoding sequences and full-length cDNA clones. The unique database system XenDB supports comparative approaches between Xenopus laevis and other model systems, and enables the retrieval of their potential full-length clones.
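The clustering idea in the abstract can be illustrated with a small sketch. It is not the thesis pipeline and does not use Vmatch: it builds a plain suffix array naively by sorting suffixes, rather than an enhanced suffix array, and it links two sequences only when they share an exact substring of at least `min_len` characters. The function name `cluster_by_shared_substring`, the `$` separator, and the toy sequences are assumptions made for illustration. A real EST pipeline would also need to tolerate sequencing errors and consider reverse complements.

```python
from bisect import bisect_right


def cluster_by_shared_substring(seqs, min_len):
    """Single-linkage clustering of sequences (hypothetical helper).

    Two sequences are linked if they share an exact substring of at
    least min_len characters. Sequences must not contain '$'.
    Returns a sorted list of clusters, each a list of sequence indices.
    """
    sep = "$"
    text = sep.join(seqs) + sep

    # Start offset of every sequence inside the concatenated text.
    starts = []
    pos = 0
    for s in seqs:
        starts.append(pos)
        pos += len(s) + 1

    def owner(i):
        # Index of the sequence that contains text position i.
        return bisect_right(starts, i) - 1

    # Naive suffix array: fine for a toy example, far too slow for real data.
    sa = sorted(range(len(text)), key=lambda i: text[i:])

    def lcp(i, j):
        # Longest common prefix of two suffixes, never crossing a separator.
        n = 0
        while (i + n < len(text) and j + n < len(text)
               and text[i + n] == text[j + n] and text[i + n] != sep):
            n += 1
        return n

    # Union-find over sequence indices.
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Suffixes sharing a long prefix sit next to each other in the suffix
    # array, so checking neighbours is enough to find connected components.
    for a, b in zip(sa, sa[1:]):
        if lcp(a, b) >= min_len:
            ra, rb = find(owner(a)), find(owner(b))
            if ra != rb:
                parent[rb] = ra

    clusters = {}
    for idx in range(len(seqs)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())


if __name__ == "__main__":
    toy = ["ACGTACGTGG", "GTACGTGGTT", "TTTTCCCCAA", "CCCCAAGG"]
    print(cluster_by_shared_substring(toy, 6))
```

Checking only neighbouring suffixes is sufficient here because every suffix sorted between two suffixes with a common prefix also starts with that prefix, so the links form a chain that the union-find structure merges into one cluster.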
Keywords
African clawed frog, cluster analysis, bioinformatics, EST clustering, suffix arrays, full-length clone prediction
Year
2006
Page URI
https://pub.uni-bielefeld.de/record/2303538

Cite

Sczyrba A. Genome analysis based on EST collections : a clustering pipeline and a database on Xenopus laevis. Bielefeld (Germany): Bielefeld University; 2006.
Sczyrba, A. (2006). Genome analysis based on EST collections : a clustering pipeline and a database on Xenopus laevis. Bielefeld (Germany): Bielefeld University.
Sczyrba, Alexander. 2006. Genome analysis based on EST collections : a clustering pipeline and a database on Xenopus laevis. Bielefeld (Germany): Bielefeld University.
Sczyrba, A., 2006. Genome analysis based on EST collections : a clustering pipeline and a database on Xenopus laevis, Bielefeld (Germany): Bielefeld University.
A. Sczyrba, Genome analysis based on EST collections : a clustering pipeline and a database on Xenopus laevis, Bielefeld (Germany): Bielefeld University, 2006.
Sczyrba, A.: Genome analysis based on EST collections : a clustering pipeline and a database on Xenopus laevis. Bielefeld University, Bielefeld (Germany) (2006).
Sczyrba, Alexander. Genome analysis based on EST collections : a clustering pipeline and a database on Xenopus laevis. Bielefeld (Germany): Bielefeld University, 2006.
All files available under the following license(s):
Copyright Statement:
This object is protected by copyright and/or related rights. [...]
Full text(s)
Access Level
OA Open Access
Last uploaded
2019-09-06T08:57:43Z
MD5 checksum
70f784c3bfa9787d7b16c209af15c0a4

