Committee-based Selection of Weakly Labeled Instances for Learning Relation Extraction
Bobic T, Klinger R (2013)
In: Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics. Research in Computing Science (70).
Conference paper
Published | English
Author(s)
Bobic, Tamara;
Klinger, Roman (UniBi)
Abstract / Note
Manual annotation is a tedious and time-consuming process, usually
needed for generating training corpora to be used in a machine learning scenario.
The distant supervision paradigm aims at automatically generating such corpora
from structured data. The active learning paradigm aims at reducing the effort
needed for manual annotation. We explore active and distant learning approaches
jointly to limit the amount of automatically generated data needed for the use case
of relation extraction by increasing the quality of the annotations.
The main idea of using distantly labeled corpora is that they can simplify and
speed up the generation of models, e.g., for extracting relationships between entities
of interest, while the selection of instances is typically performed randomly.
We propose the use of query-by-committee to select instances instead. This approach
is similar to the active learning paradigm, with the difference that unlabeled
instances are annotated weakly rather than by human experts. Different strategies
using low or high confidence are compared to random selection. Experiments on
publicly available data sets for detection of protein-protein interactions show a
statistically significant improvement in F1 measure when adding instances with a
high agreement of the committee.
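The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy keyword classifiers, and the choice of agreement measure (the fraction of committee members backing the majority vote) are all assumptions made for illustration. The weak labels stand in for labels produced by distant supervision.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Instance = str
Classifier = Callable[[Instance], int]  # 1 = interaction, 0 = no interaction
WeakExample = Tuple[Instance, int]      # (sentence, weak label)


def vote_agreement(committee: Sequence[Classifier], instance: Instance) -> float:
    """Fraction of committee members that back the majority vote (assumed measure)."""
    votes = Counter(member(instance) for member in committee)
    return votes.most_common(1)[0][1] / len(committee)


def select_instances(
    committee: Sequence[Classifier],
    pool: Sequence[WeakExample],
    k: int,
    strategy: str = "high",
) -> List[WeakExample]:
    """Pick k weakly labeled instances with the highest or lowest agreement."""
    scored = [(vote_agreement(committee, text), text, label) for text, label in pool]
    # Stable sort: instances with equal agreement keep their pool order.
    scored.sort(key=lambda item: item[0], reverse=(strategy == "high"))
    return [(text, label) for _, text, label in scored[:k]]


# Hypothetical committee of keyword classifiers and a toy weakly labeled pool.
committee = [
    lambda s: int("binds" in s),
    lambda s: int("interacts" in s or "binds" in s),
    lambda s: int("protein" in s),
]
pool = [
    ("A binds protein B", 1),
    ("A and B were measured", 1),
    ("A interacts with B", 1),
]

high = select_instances(committee, pool, k=2, strategy="high")
low = select_instances(committee, pool, k=1, strategy="low")
```

Random selection, the baseline named in the abstract, would correspond to drawing k items from the pool without looking at the committee at all, for example with random.sample.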
Year of publication
2013
Title of proceedings
Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics
Issue
70
Conference
International Conference on Intelligent Text Processing and Computational Linguistics
Conference location
Samos, Greece
Conference date
2013-03-24 – 2013-03-30
Page URI
https://pub.uni-bielefeld.de/record/2603361
Cite
Bobic T, Klinger R. Committee-based Selection of Weakly Labeled Instances for Learning Relation Extraction. In: Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics. Research in Computing Science; 2013.
All files available under the following license(s):
Copyright Statement:
This object is protected by copyright and/or related rights. [...]
Full text(s)
Access Level
Open Access
Last uploaded
2019-09-06T09:18:15Z
MD5 checksum
63ad71d2c1830bab9342b731037fac2f
Link(s) to full text(s)
Access Level
Closed Access