RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations

Hartung M, Zwick M (2014)
Bielefeld University.

Data publication | English
 
Abstract / Note
We release the RAMBO 800+ corpus, which provides manual annotations for Rare and AMBiguOus abbreviations of gene names in about 800 MEDLINE abstracts. It can be used to train gene recognition systems for this class of abbreviations, as discussed in Hartung et al. (BioNLP 2014). The corpus covers eight gene name abbreviation types: AHR, CLI, CLU, COPD, HF, MOX, PLS, and SAH. For each type, 100 abstracts (81 in the case of MOX) were randomly sampled from MEDLINE. In each abstract, every mention of an abbreviation of interest was manually annotated as denoting a gene/protein or not. In addition, all other tokens in the roughly 800 abstracts were annotated in the same way.
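The "about 800" figure follows from the per-type sample sizes given in the abstract. The short sketch below only restates that arithmetic. It assumes that no abstract was sampled for more than one abbreviation type, which the record does not state, so the sum is best read as an upper bound on the number of distinct abstracts. The name `ABSTRACTS_PER_TYPE` is illustrative and not part of the corpus.

```python
# Abstracts sampled per abbreviation type, as stated in the abstract:
# 100 for every type except MOX, which has 81.
ABSTRACTS_PER_TYPE = {
    "AHR": 100,
    "CLI": 100,
    "CLU": 100,
    "COPD": 100,
    "HF": 100,
    "MOX": 81,
    "PLS": 100,
    "SAH": 100,
}

# Assumption: the samples for different types do not overlap.
total_abstracts = sum(ABSTRACTS_PER_TYPE.values())

print(len(ABSTRACTS_PER_TYPE), "types,", total_abstracts, "abstracts")
# prints: 8 types, 781 abstracts
```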
Keywords
gene/protein recognition; named entity recognition; gene/protein name abbreviations; natural language processing; machine learning; life sciences
Year of publication
2014
Page URI
https://pub.uni-bielefeld.de/record/2673424

Cite

Hartung M, Zwick M. RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations. Bielefeld University; 2014.
Hartung, M., & Zwick, M. (2014). RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations. Bielefeld University. doi:10.4119/unibi/2673424
Hartung, M., and Zwick, M. (2014). RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations. Bielefeld University.
Hartung, M., & Zwick, M., 2014. RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations, Bielefeld University.
M. Hartung and M. Zwick, RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations, Bielefeld University, 2014.
Hartung, M., Zwick, M.: RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations. Bielefeld University (2014).
Hartung, Matthias, and Zwick, Matthias. RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations. Bielefeld University, 2014.
All files available under the following license(s):
Full text(s)
Access Level
OA Open Access
Last uploaded
2019-09-25T06:36:31Z
MD5 checksum
6a7ff5be1c40701c62b3caf1d41a7a1c
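The checksum above can be used to check that a downloaded copy of the corpus file is intact. The following is a minimal sketch using Python's standard `hashlib` module. The function names and the file name in the usage note are hypothetical, since this record does not state the name of the downloadable file.

```python
import hashlib
from pathlib import Path

# MD5 checksum as published on this record page (hex-encoded).
EXPECTED_MD5 = "6a7ff5be1c40701c62b3caf1d41a7a1c"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 digest with the expected hex string."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_published_checksum("rambo800.zip")` would return `True` for an unmodified download, where `rambo800.zip` is a placeholder for whatever name the downloaded file actually has. MD5 detects accidental corruption during transfer; it is not a safeguard against deliberate tampering.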

Material in PUB:
Cited by
Towards Gene Recognition from Rare and Ambiguous Abbreviations using a Filtering Approach
Hartung M, Klinger R, Zwick M, Cimiano P (2014)
Presented at the BioNLP Workshop. The 52nd Annual Meeting of the Association for Computational Linguistics (ACL), Baltimore.
