Committee-based Selection of Weakly Labeled Instances for Learning Relation Extraction

Bobic T, Klinger R (2013)
In: Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics. Research in Computing Science.

Conference Paper | Published | English
Author
Bobic, Tamara; Klinger, Roman
Abstract
Manual annotation is a tedious and time-consuming process, but it is usually needed to generate training corpora for machine learning. The distant supervision paradigm aims at generating such corpora automatically from structured data, while the active learning paradigm aims at reducing the effort needed for manual annotation. We explore active and distant learning approaches jointly to limit the amount of automatically generated data needed for the use case of relation extraction by increasing the quality of the annotations. The main idea of using distantly labeled corpora is that they can simplify and speed up the generation of models, e.g. for extracting relationships between entities of interest; in this setting, however, training instances are typically selected at random. We instead propose using query-by-committee to select instances. This approach is similar to the active learning paradigm, with the difference that the selected instances are weakly (automatically) annotated rather than labeled by human experts. Different strategies that select instances of low or high committee confidence are compared to random selection. Experiments on publicly available data sets for the detection of protein-protein interactions show a statistically significant improvement in F1 measure when adding instances with a high agreement of the committee.
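The abstract describes the selection step only at a high level. The following is a minimal Python sketch, assuming a committee of already-trained classifiers (represented here as plain functions) and a pool of distantly labeled (text, weak_label) pairs. The agreement score (share of members voting for the majority label), the function names `agreement` and `select_instances`, and the toy keyword-based committee are hypothetical illustrations, not the authors' implementation.

```python
import random
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

# Hypothetical types (assumption): a committee member maps an instance text
# to a label, and each pool entry pairs a text with its weak (distant) label.
Member = Callable[[str], Hashable]
WeakInstance = Tuple[str, Hashable]


def agreement(committee: Sequence[Member], text: str) -> float:
    """Share of committee members that vote for the majority label."""
    votes = Counter(member(text) for member in committee)
    return votes.most_common(1)[0][1] / len(committee)


def select_instances(
    pool: Sequence[WeakInstance],
    committee: Sequence[Member],
    k: int,
    strategy: str = "high",
    seed: int = 0,
) -> List[WeakInstance]:
    """Pick k weakly labeled instances to add to the training data.

    "high" keeps the instances the committee agrees on most, "low" keeps
    those it agrees on least, and "random" is the baseline. The weak labels
    are kept as-is; no human annotator is queried.
    """
    if strategy == "random":
        return random.Random(seed).sample(list(pool), k)
    if strategy not in ("high", "low"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    # sorted() is stable (also with reverse=True), so ties keep pool order.
    ranked = sorted(
        pool,
        key=lambda inst: agreement(committee, inst[0]),
        reverse=(strategy == "high"),
    )
    return ranked[:k]


# Toy example (hypothetical): keyword rules stand in for trained classifiers.
pool = [
    ("A binds B", True),
    ("A interacts with B", True),
    ("A and B were measured", False),
    ("A binds and interacts with B", True),
]
committee = [
    lambda s: "bind" in s,
    lambda s: "bind" in s or "interact" in s,
    lambda s: "interact" in s,
]

print(select_instances(pool, committee, 2, "high"))
# -> [('A and B were measured', False), ('A binds and interacts with B', True)]
print(select_instances(pool, committee, 2, "low"))
# -> [('A binds B', True), ('A interacts with B', True)]
```

In this sketch agreement is measured among the committee members only; whether a selected instance should also agree with its weak label is a design choice left open here. Note that the "high" strategy is the opposite of classic uncertainty-based active learning, which would query the low-agreement instances.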
Publishing Year
2013
Conference
International Conference on Intelligent Text Processing and Computational Linguistics
Location
Samos, Greece
Conference Date
2013-03-24 – 2013-03-30
PUB-ID

Cite this

Bobic T, Klinger R. Committee-based Selection of Weakly Labeled Instances for Learning Relation Extraction. In: Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics. Research in Computing Science; 2013.
Bobic, T., & Klinger, R. (2013). Committee-based Selection of Weakly Labeled Instances for Learning Relation Extraction. Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (70).
Bobic, T., and Klinger, R. (2013). “Committee-based Selection of Weakly Labeled Instances for Learning Relation Extraction” in Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (Research in Computing Science).
Bobic, T., & Klinger, R., 2013. Committee-based Selection of Weakly Labeled Instances for Learning Relation Extraction. In Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics. Research in Computing Science.
T. Bobic and R. Klinger, “Committee-based Selection of Weakly Labeled Instances for Learning Relation Extraction”, Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics, Research in Computing Science, 2013.
Bobic, T., Klinger, R.: Committee-based Selection of Weakly Labeled Instances for Learning Relation Extraction. Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics. Research in Computing Science (2013).
Bobic, Tamara, and Klinger, Roman. “Committee-based Selection of Weakly Labeled Instances for Learning Relation Extraction”. Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics. Research in Computing Science, 2013.
Main File(s)
Access Level
Open Access
Last Uploaded
2013-09-26 21:45:57
