Construction of a Public CHO Cell Line Transcript Database Using Versatile Bioinformatics Analysis Pipelines

Rupp O, Becker J, Brinkrolf K, Timmermann C, Borth N, Pühler A, Noll T, Goesmann A (2014)
PLoS ONE 9(1).

Journal Article | Published | English


Abstract
Chinese hamster ovary (CHO) cell lines represent the most commonly used mammalian expression system for the production of therapeutic proteins. In this context, detailed knowledge of the CHO cell transcriptome might help to improve biotechnological processes conducted by specific cell lines. Nevertheless, until recently very few assembled cDNA sequences of CHO cells had been publicly released, which severely limited biotechnological research. Two extended annotation systems and web-based tools, one for browsing eukaryotic genomes (GenDBE) and one for viewing eukaryotic transcriptomes (SAMS), were established as the first step towards a publicly usable CHO cell genome/transcriptome analysis platform. This is complemented by the development of a new strategy to assemble the ca. 100 million reads, sequenced from a broad range of diverse transcripts, into a high-quality CHO cell transcript set. The cDNA libraries were constructed from different CHO cell lines grown under various culture conditions and sequenced using Roche/454 and Illumina sequencing technologies; sequencing reads from a previous study were included as well. Two pipelines to extend and improve the CHO cell line transcripts were established. First, de novo assemblies were carried out with the Trinity and Oases assemblers, using varying k-mer sizes. The resulting contigs were screened for potential coding sequences (CDS) using ESTScan. Redundant contigs were filtered out using cd-hit-est. The remaining CDS contigs were re-assembled with CAP3. Second, a reference-based assembly with the TopHat/Cufflinks pipeline was performed, using the recently published draft genome sequence of CHO-K1 as reference. Additionally, the de novo contigs were mapped to the reference genome using GMAP and merged with the Cufflinks assembly using the cuffmerge software. With this approach, 28,874 transcripts located on 16,492 gene loci could be assembled.
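The redundancy-filtering step can be illustrated with a minimal sketch of greedy incremental clustering, the general idea behind cd-hit-est: contigs are visited from longest to shortest, and a contig is treated as redundant when it is sufficiently identical to a representative that has already been kept. This is not cd-hit-est itself. For brevity it assumes an ungapped identity measure (best match count when the shorter sequence is slid along the longer one, divided by the shorter length), whereas cd-hit-est relies on a short-word filter and alignment. The function names, the 0.95 threshold, and the demo sequences are hypothetical.

```python
# Hypothetical toy sketch of greedy incremental redundancy filtering
# (the general idea behind cd-hit-est, NOT its actual algorithm).
# Assumption: identity = best ungapped match count / length of the shorter sequence.


def ungapped_identity(seq_a: str, seq_b: str) -> float:
    """Best fraction of matching bases when the shorter sequence is slid along the longer one."""
    short_seq, long_seq = sorted((seq_a, seq_b), key=len)
    if not short_seq:
        return 0.0
    best = 0
    for offset in range(len(long_seq) - len(short_seq) + 1):
        window = long_seq[offset:offset + len(short_seq)]
        best = max(best, sum(a == b for a, b in zip(short_seq, window)))
    return best / len(short_seq)


def filter_redundant(contigs: dict[str, str], threshold: float = 0.95) -> dict[str, list[str]]:
    """Map each kept representative ID to the contig IDs it absorbed (itself included)."""
    # Longest first, so every representative is at least as long as its members.
    order = sorted(contigs, key=lambda cid: len(contigs[cid]), reverse=True)
    clusters: dict[str, list[str]] = {}
    for cid in order:
        for rep in clusters:
            if ungapped_identity(contigs[cid], contigs[rep]) >= threshold:
                clusters[rep].append(cid)  # redundant: absorbed by an existing representative
                break
        else:
            clusters[cid] = [cid]  # no sufficiently similar representative: keep it
    return clusters


if __name__ == "__main__":
    demo = {
        "c1": "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
        "c2": "GGCCATTGTAATGGGCCGCTGAAAGG",  # exact substring of c1
        "c3": "TTTTACGTACGTACGTACGTAAAA",    # unrelated sequence
    }
    print(filter_redundant(demo))  # -> {'c1': ['c1', 'c2'], 'c3': ['c3']}
```

Processing longest sequences first means a short fragment contained in a longer contig is absorbed by it rather than the other way round; the representatives are the contigs that would be passed on to the next step.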
Combining the results of both approaches, 65,561 transcripts were identified for CHO cell lines, which could be clustered by sequence identity into 17,598 gene clusters.
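Grouping transcripts into gene clusters by sequence identity can be sketched as single-linkage clustering with a union-find (disjoint-set) structure. The abstract does not say how the clustering was implemented, so this is only an illustration under stated assumptions: pairwise identities are taken as given `(query, subject, percent_identity)` tuples, such as might be parsed from an all-vs-all similarity search, and any pair at or above a hypothetical 95% threshold is linked. `UnionFind` and `cluster_transcripts` are illustrative names.

```python
# Hypothetical sketch: single-linkage clustering of transcripts from pairwise identities.
# Assumption: hits are (query, subject, percent_identity) tuples computed elsewhere.
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.size: dict[str, int] = {}

    def __contains__(self, x: str) -> bool:
        return x in self.parent

    def add(self, x: str) -> None:
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x: str) -> str:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # attach the smaller tree under the larger one
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_transcripts(
    transcripts: list[str],
    hits: list[tuple[str, str, float]],
    min_identity: float = 95.0,
) -> list[list[str]]:
    """Return sorted clusters of transcript IDs linked by hits with identity >= min_identity."""
    uf = UnionFind()
    for tid in transcripts:
        uf.add(tid)  # every transcript starts as its own cluster
    for query, subject, pident in hits:
        # Ignore hits to unknown IDs and hits below the threshold.
        if query in uf and subject in uf and pident >= min_identity:
            uf.union(query, subject)
    groups: dict[str, list[str]] = defaultdict(list)
    for tid in transcripts:
        groups[uf.find(tid)].append(tid)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    ids = ["t1", "t2", "t3", "t4", "t5"]
    pairwise = [
        ("t1", "t2", 99.1),  # linked
        ("t2", "t3", 96.4),  # linked, so t1, t2 and t3 share one cluster
        ("t3", "t4", 80.0),  # below threshold, not linked
    ]
    print(cluster_transcripts(ids, pairwise))  # -> [['t1', 't2', 't3'], ['t4'], ['t5']]
```

Single linkage is transitive: t1 and t3 end up in the same cluster through t2 even without a direct hit. That keeps the sketch simple, but with loose thresholds it can chain distinct genes together, so a real pipeline may prefer stricter linkage criteria.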

Cite this

Rupp O, Becker J, Brinkrolf K, et al. Construction of a Public CHO Cell Line Transcript Database Using Versatile Bioinformatics Analysis Pipelines. PLoS ONE. 2014;9(1).
Rupp, O., Becker, J., Brinkrolf, K., Timmermann, C., Borth, N., Pühler, A., Noll, T., et al. (2014). Construction of a Public CHO Cell Line Transcript Database Using Versatile Bioinformatics Analysis Pipelines. PLoS ONE, 9(1).
Rupp, O., Becker, J., Brinkrolf, K., Timmermann, C., Borth, N., Pühler, A., Noll, T., and Goesmann, A. (2014). Construction of a Public CHO Cell Line Transcript Database Using Versatile Bioinformatics Analysis Pipelines. PLoS ONE 9.
Rupp, O., et al., 2014. Construction of a Public CHO Cell Line Transcript Database Using Versatile Bioinformatics Analysis Pipelines. PLoS ONE, 9(1).
O. Rupp, et al., “Construction of a Public CHO Cell Line Transcript Database Using Versatile Bioinformatics Analysis Pipelines”, PLoS ONE, vol. 9, 2014.
Rupp, O., Becker, J., Brinkrolf, K., Timmermann, C., Borth, N., Pühler, A., Noll, T., Goesmann, A.: Construction of a Public CHO Cell Line Transcript Database Using Versatile Bioinformatics Analysis Pipelines. PLoS ONE. 9, (2014).
Rupp, Oliver, Becker, Jennifer, Brinkrolf, Karina, Timmermann, Christina, Borth, Nicole, Pühler, Alfred, Noll, Thomas, and Goesmann, Alexander. “Construction of a Public CHO Cell Line Transcript Database Using Versatile Bioinformatics Analysis Pipelines”. PLoS ONE 9.1 (2014).

9 Citations in Europe PMC

Data provided by Europe PubMed Central.

Effect of Temperature Downshift on the Transcriptomic Responses of Chinese Hamster Ovary Cells Using Recombinant Human Tissue Plasminogen Activator Production Culture.
Bedoya-Lopez A, Estrada K, Sanchez-Flores A, Ramirez OT, Altamirano C, Segovia L, Miranda-Rios J, Trujillo-Roldan MA, Valdez-Cruz NA., PLoS ONE 11(3), 2016
PMID: 26991106
Towards next generation CHO cell biology: Bioinformatics methods for RNA-Seq-based expression profiling.
Monger C, Kelly PS, Gallagher C, Clynes M, Barron N, Clarke C., Biotechnol J 10(7), 2015
PMID: 26058739
The DNA methylation landscape of Chinese hamster ovary (CHO) DP-12 cells.
Wippermann A, Rupp O, Brinkrolf K, Hoffrogge R, Noll T., J. Biotechnol. 199, 2015
PMID: 25701679
Engineering the supply chain for protein production/secretion in yeasts and mammalian cells.
Klein T, Niklas J, Heinzle E., J. Ind. Microbiol. Biotechnol. 42(3), 2015
PMID: 25561318
Global insights into the Chinese hamster and CHO cell transcriptomes.
Vishwanathan N, Yongky A, Johnson KC, Fu HY, Jacob NM, Le H, Yusufi FN, Lee DY, Hu WS., Biotechnol. Bioeng. 112(5), 2015
PMID: 25450749
Cross-species transcriptomic approach reveals genes in hamster implantation sites.
Lei W, Herington J, Galindo CL, Ding T, Brown N, Reese J, Paria BC., Reproduction 148(6), 2014
PMID: 25252651
Discovery of transcription start sites in the Chinese hamster genome by next-generation RNA sequencing.
Jakobi T, Brinkrolf K, Tauch A, Noll T, Stoye J, Pühler A, Goesmann A., J. Biotechnol. 190, 2014
PMID: 25086342
Advancing biopharmaceutical process science through transcriptome analysis.
Vishwanathan N, Le H, Le T, Hu WS., Curr. Opin. Biotechnol. 30, 2014
PMID: 25014889

49 References


The Sequence Analysis and Management System -- SAMS-2.0: data management and sequence analysis adapted to changing requirements from traditional sanger sequencing to ultrafast sequencing technologies.
Bekel T, Henckel K, Küster H, Meyer F, Mittard-Runte V, Neuweger H, Paarmann D, Rupp O, Zakrzewski M, Pühler A, Stoye J, Goesmann A., J. Biotechnol. 140(1-2), 2009
PMID: 19297685
POAVIZ: a Partial order multiple sequence alignment visualizer.
Grasso C, Quist M, Ke K, Lee C., Bioinformatics 19(11), 2003
PMID: 12874062
TopHat: discovering splice junctions with RNA-Seq.
Trapnell C, Pachter L, Salzberg SL., Bioinformatics 25(9), 2009
PMID: 19289445
Full-length transcriptome assembly from RNA-Seq data without a reference genome.
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A., Nat. Biotechnol. 29(7), 2011
PMID: 21572440
Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels.
Schulz MH, Zerbino DR, Vingron M, Birney E., Bioinformatics 28(8), 2012
PMID: 22368243
GMAP: a genomic mapping and alignment program for mRNA and EST sequences.
Wu TD, Watanabe CK., Bioinformatics 21(9), 2005
PMID: 15728110
CAP3: A DNA sequence assembly program.
Huang X, Madan A., Genome Res. 9(9), 1999
PMID: 10508846
BLAT--the BLAST-like alignment tool.
Kent WJ., Genome Res. 12(4), 2002
PMID: 11932250
Biological evaluation of d2, an algorithm for high-performance sequence comparison.
Hide W, Burke J, Davison DB., J. Comput. Biol. 1(3), 1994
PMID: 8790465
RAPYD--rapid annotation platform for yeast data.
Schneider J, Blom J, Jaenicke S, Linke B, Brinkrolf K, Neuweger H, Tauch A, Goesmann A., J. Biotechnol. 155(1), 2011
PMID: 21040748
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ., Nucleic Acids Res. 25(17), 1997
PMID: 9254694
KEGG: Kyoto Encyclopedia of Genes and Genomes.
Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M., Nucleic Acids Res. 27(1), 1999
PMID: 9847135
The COG database: an updated version includes eukaryotes.
Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA., BMC Bioinformatics 4, 2003
PMID: 12969510
eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges.
Powell S, Szklarczyk D, Trachana K, Roth A, Kuhn M, Muller J, Arnold R, Rattei T, Letunic I, Doerks T, Jensen LJ, von Mering C, Bork P., Nucleic Acids Res. 40(Database issue), 2012
PMID: 22096231
The Pfam protein families database.
Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, Holm L, Sonnhammer EL, Eddy SR, Bateman A., Nucleic Acids Res. 38(Database issue), 2010
PMID: 19920124

Sources

PMID: 24427317
PubMed | Europe PMC