RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations
Hartung M, Zwick M (2014)
Bielefeld University.
Data publication | English
Creator
Hartung, Matthias (UniBi)
Zwick, Matthias
Abstract / Note
We release the RAMBO 800+ corpus, which provides manual annotations for Rare and AMBiguOus abbreviations of gene names in about 800 MEDLINE abstracts. It can be used to train gene recognition systems for this class of abbreviations, as discussed in Hartung et al. (BioNLP 2014). The corpus covers eight gene name abbreviation types: AHR, CLI, CLU, COPD, HF, MOX, PLS, and SAH. For each type, 100 abstracts (81 for MOX) were randomly sampled from MEDLINE. In each abstract, every mention of the abbreviation of interest was manually annotated as denoting a gene/protein or not. In addition, all other tokens in the 800 abstracts were annotated in the same way.
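The record does not describe the corpus's file format. As a minimal sketch, assuming a hypothetical in-memory representation in which every token carries a binary gene/protein label, the annotation scheme and the sample sizes stated above can be expressed as follows. `Token`, `total_abstracts`, and `gene_mentions` are illustrative names, not part of the release.

```python
from dataclasses import dataclass

# Abstracts sampled from MEDLINE per abbreviation type, as stated in the
# record: 100 each, except MOX with 81.
SAMPLES_PER_TYPE: dict[str, int] = {
    "AHR": 100, "CLI": 100, "CLU": 100, "COPD": 100,
    "HF": 100, "MOX": 81, "PLS": 100, "SAH": 100,
}


@dataclass
class Token:
    """Hypothetical token record; NOT the corpus's actual file format."""
    text: str
    is_gene: bool  # True if annotated as denoting a gene/protein


def total_abstracts(samples: dict[str, int]) -> int:
    """Total number of sampled abstracts across all abbreviation types."""
    return sum(samples.values())


def gene_mentions(tokens: list[Token], abbreviation: str) -> tuple[int, int]:
    """Return (gene/protein mentions, other mentions) of one abbreviation."""
    matches = [t for t in tokens if t.text == abbreviation]
    genes = sum(1 for t in matches if t.is_gene)
    return genes, len(matches) - genes


print(total_abstracts(SAMPLES_PER_TYPE))  # 781

# Toy tokens, invented for illustration only (not taken from the corpus).
toy = [Token("HF", True), Token("HF", False), Token("patients", False)]
print(gene_mentions(toy, "HF"))  # (1, 1)
```

Seven types with 100 abstracts plus 81 for MOX give 781 abstracts, which matches the "about 800" in the description.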
Keywords
gene/protein recognition
named entity recognition
gene/protein name abbreviations
natural language processing
machine learning
life sciences
Year of publication
2014
Copyright and licenses
Page URI
https://pub.uni-bielefeld.de/record/2673424
Citation
Hartung, M., & Zwick, M. (2014). RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations. Bielefeld University. doi:10.4119/unibi/2673424
All files are available under the following license(s):
Open Data Commons Attribution License (ODC-By) v1.0
Full text(s)
Access Level
Open Access
Last uploaded
2019-09-25T06:36:31Z
MD5 checksum
6a7ff5be1c40701c62b3caf1d41a7a1c
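The MD5 checksum above can be used to check that a downloaded file is complete and unmodified. The following is a minimal sketch using Python's standard-library `hashlib` module; the record does not name the downloaded file, so the path is left to the caller, and `md5_of_file` and `verify_download` are illustrative helper names.

```python
import hashlib
from pathlib import Path

# Checksum published on this record page.
EXPECTED_MD5 = "6a7ff5be1c40701c62b3caf1d41a7a1c"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 digest matches the expected hex string."""
    return md5_of_file(path) == expected.lower()


# Usage (the file name is a placeholder; use the actual downloaded file):
# verify_download(Path("path/to/downloaded/file"))
```

MD5 is adequate for detecting accidental corruption during download, but it does not protect against deliberate tampering.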
Material in PUB:
Cited by
Towards Gene Recognition from Rare and Ambiguous Abbreviations using a Filtering Approach
Hartung M, Klinger R, Zwick M, Cimiano P (2014)
Presented at the BioNLP Workshop, 52nd Annual Meeting of the Association for Computational Linguistics (ACL), Baltimore.