Shape based indexing for faster search of RNA family databases

Janssen S, Reeder J, Giegerich R (2008)
BMC Bioinformatics 9(1): 131.

Journal Article | Original Article | Published | English
Background: Most non-coding RNA families exert their function by means of a conserved, common secondary structure. The Rfam database contains more than five hundred structurally annotated RNA families. Unfortunately, searching for new family members using covariance models (CMs) is very time-consuming. Filtering approaches that use sequence conservation to reduce the number of CM searches are fast, but it is unknown at what cost.
Results: We present a new filtering approach, which exploits the family-specific secondary structure and significantly reduces the number of CM searches. The filter eliminates approximately 85% of the queries and discards only 2.6% of true positives when evaluating Rfam against itself. First results also capture previously undetected non-coding RNAs in a recent human RNAz screen.
Conclusion: The RNA shape index filter (RNAsifter) is based on the following rationale: an RNA family is characterised by structure much more succinctly than by sequence content. Structures of individual family members, which naturally have different lengths and sequence compositions, may exhibit structural variation in detail, but overall they have a common shape in a more abstract sense. Given a fixed release of the Rfam database, we can compute these abstract shapes for all families. The result is called a shape index. If a query sequence belongs to a certain family, it must be able to fold into the family shape with reasonable free energy. Therefore, rather than matching the query against all families in the database, we can first (and quickly) compute its feasible shape(s), and use the shape index to access only those families where a good match is possible due to a common shape with the query.
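The lookup step described in the conclusion can be illustrated with a minimal sketch. It is not the RNAsifter implementation: the family names, the shape strings, and the functions `build_shape_index` and `candidate_families` are hypothetical, and the shapes of families and queries are assumed to have been computed beforehand by a separate shape-prediction tool. The sketch only shows the indexing idea, namely that a query is passed on to the expensive CM search only for families that share a shape with it.

```python
from collections import defaultdict


def build_shape_index(family_shapes):
    """Map each abstract shape string to the set of families that exhibit it."""
    index = defaultdict(set)
    for family, shapes in family_shapes.items():
        for shape in shapes:
            index[shape].add(family)
    return index


def candidate_families(index, query_shapes):
    """Return the families that share at least one shape with the query."""
    candidates = set()
    for shape in query_shapes:
        # .get() avoids creating empty entries for unseen shapes
        candidates |= index.get(shape, set())
    return candidates


# Illustrative data only: family names and shapes are made up for this sketch.
family_shapes = {
    "FAM_A": {"[]"},
    "FAM_B": {"[[][][]]"},
    "FAM_C": {"[][]", "[]"},
}
index = build_shape_index(family_shapes)

# Feasible shapes of a query, assumed to come from an external folding step.
query_shapes = {"[][]"}
to_search = candidate_families(index, query_shapes)
print(sorted(to_search))
```

In this toy example only FAM_C remains as a candidate, so a CM search would be run against one family instead of three. A query whose shapes match no entry in the index would be discarded without any CM search.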

Cite this

Janssen, S., Reeder, J., & Giegerich, R. (2008). Shape based indexing for faster search of RNA family databases. BMC Bioinformatics, 9(1), 131. doi:10.1186/1471-2105-9-131

14 Citations in Europe PMC

Data provided by Europe PubMed Central.

Exploring Consensus RNA Substructural Patterns Using Subgraph Mining.
Chen Q, Lan C, Chen B, Wang L, Li J, Zhang C., IEEE/ACM Trans Comput Biol Bioinform 14(5), 2017
PMID: 28026781
Accurate Classification of RNA Structures Using Topological Fingerprints.
Huang J, Li K, Gribskov M., PLoS One 11(10), 2016
PMID: 27755571
Chaining sequence/structure seeds for computing RNA similarity.
Bourgeade L, Chauve C, Allali J., J Comput Biol 22(3), 2015
PMID: 25768236
A Machine Learning Approach for Accurate Annotation of Noncoding RNAs.
Song Y, Liu C, Wang Z., IEEE/ACM Trans Comput Biol Bioinform 12(3), 2015
PMID: 26357266
Shape and secondary structure prediction for ncRNAs including pseudoknots based on linear SVM.
Achawanantakun R, Sun Y., BMC Bioinformatics 14 Suppl 2, 2013
PMID: 23369147
Interval-based distance function for identifying RNA structure candidates.
Chen Q, Li G, Phoebe Chen YP., J Theor Biol 269(1), 2011
PMID: 21056578
Lost in folding space? Comparing four variants of the thermodynamic model for RNA secondary structure prediction.
Janssen S, Schudoma C, Steger G, Giegerich R., BMC Bioinformatics 12, 2011
PMID: 22051375
Sequence-structure relationships in RNA loops: establishing the basis for loop homology modeling.
Schudoma C, May P, Nikiforova V, Walther D., Nucleic Acids Res 38(3), 2010
PMID: 19923230
De novo prediction of structured RNAs from genomic sequences.
Gorodkin J, Hofacker IL, Torarinsson E, Yao Z, Havgaard JH, Ruzzo WL., Trends Biotechnol 28(1), 2010
PMID: 19942311
Faster computation of exact RNA shape probabilities.
Janssen S, Giegerich R., Bioinformatics 26(5), 2010
PMID: 20080511
A genome-wide survey of sRNAs in the symbiotic nitrogen-fixing alpha-proteobacterium Sinorhizobium meliloti.
Schlüter JP, Reinkensmeier J, Daschkey S, Evguenieva-Hackenberg E, Janssen S, Jänicke S, Becker JD, Giegerich R, Becker A., BMC Genomics 11, 2010
PMID: 20398411
Identification and classification of ncRNA molecules using graph properties.
Childs L, Nikoloski Z, May P, Walther D., Nucleic Acids Res 37(9), 2009
PMID: 19339518
Regulatory RNAs in prokaryotes: here, there and everywhere.
Narberhaus F, Vogel J., Mol Microbiol 74(2), 2009
PMID: 19732342




PMID: 18312625