RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations

Hartung M, Zwick M (2014)
Bielefeld University.

Abstract
We release the RAMBO 800+ corpus, which provides manual annotations for Rare and AMBiguOus abbreviations of gene names in about 800 MEDLINE abstracts. It can be used to train gene recognition systems for this class of abbreviations, as discussed in Hartung et al. (BioNLP 2014). The corpus covers eight gene name abbreviation types: AHR, CLI, CLU, COPD, HF, MOX, PLS, and SAH. For each type, 100 abstracts (81 in the case of MOX) were randomly sampled from MEDLINE. In each abstract, every mention of the abbreviation of interest was manually annotated as denoting a gene/protein or not. In addition, all other tokens in these abstracts were annotated in the same way.
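The phrase "about 800" can be made concrete from the per-type sample sizes given in the abstract. The following is a minimal sketch of that arithmetic only. The variable names are illustrative and not part of the corpus distribution, and the total assumes that no abstract was sampled for more than one abbreviation type, which the abstract does not state.

```python
# Per-type sample sizes as stated in the abstract:
# 100 abstracts per abbreviation type, except MOX with 81.
ABBREVIATION_TYPES = ["AHR", "CLI", "CLU", "COPD", "HF", "MOX", "PLS", "SAH"]

abstracts_per_type = {abbr: 100 for abbr in ABBREVIATION_TYPES}
abstracts_per_type["MOX"] = 81

# Assumption: the per-type samples do not overlap, so the counts simply add up.
total_abstracts = sum(abstracts_per_type.values())
print(total_abstracts)  # 7 * 100 + 81 = 781
```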
Publishing Year
2014
Data Re-Use License
RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations is made available under the Open Data Commons Attribution License: http://opendatacommons.org/licenses/by/1.0
Cite this

Hartung M, Zwick M. RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations. Bielefeld University; 2014.
Hartung, M., & Zwick, M. (2014). RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations. Bielefeld University.
Hartung, M., and Zwick, M. (2014). RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations. Bielefeld University.
Hartung, M., & Zwick, M., 2014. RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations, Bielefeld University.
M. Hartung and M. Zwick, RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations, Bielefeld University, 2014.
Hartung, M., Zwick, M.: RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations. Bielefeld University (2014).
Hartung, Matthias, and Zwick, Matthias. RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations. Bielefeld University, 2014.
Access Level
Open Access
Last Uploaded
2014-07-31 16:50:28

This data publication is cited in the following publications:
2673430
Towards Gene Recognition from Rare and Ambiguous Abbreviations using a Filtering Approach
Hartung M, Klinger R, Zwick M, Cimiano P (2014)
Presented at the BioNLP Workshop. The 52nd Annual Meeting of the Association for Computational Linguistics (ACL), Baltimore.