Weakly Labeled Corpora as Silver Standard for Drug-Drug and Protein-Protein Interaction

Thomas P, Bobić T, Leser U, Hofmann-Apitius M, Klinger R (2012)
In: Proceedings of the Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM) on Language Resources and Evaluation Conference (LREC). Istanbul, Turkey.

Conference paper | Published | English
 
Author(s)
Thomas, Philippe; Bobić, Tamara; Leser, Ulf; Hofmann-Apitius, Martin; Klinger, Roman (UniBi)
Abstract / Note
Relation extraction is frequently and successfully addressed by machine learning methods. The downside of this approach is the need for annotated training data, which is typically produced through tedious, cost-intensive manual work. Distantly supervised approaches instead use weakly annotated data, which can be derived automatically. Recent work in the biomedical domain has applied distant supervision to protein-protein interaction (PPI) extraction with reasonable results by employing the IntAct database. Training on distantly labeled corpora is more challenging than training on manually curated ones, as such data is inherently noisy. With this paper, we make two corpora publicly available to the community so that different methods for dealing with this noise can be compared in a uniform setting. The first corpus addresses protein-protein interaction (PPI) and is based on named entity recognition and the IntAct and KUPS databases; the second addresses drug-drug interaction (DDI) and makes use of the DrugBank database. Both corpora are additionally labeled by five state-of-the-art classifiers trained on annotated data, to allow for the development of filter methods. Furthermore, we briefly present our approach and results for distant supervision on these corpora as a strong baseline for future research.
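The distant-supervision idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the entity names, the toy knowledge base `KNOWN_INTERACTIONS`, and the function `label_pairs` are hypothetical placeholders, and the sketch assumes that named entity recognition has already produced the entity mentions of one sentence.

```python
from itertools import combinations

# Hypothetical stand-in for a knowledge base such as IntAct or DrugBank:
# a set of unordered entity pairs recorded as interacting (placeholder names).
KNOWN_INTERACTIONS = {
    frozenset({"druga", "drugb"}),
}


def label_pairs(sentence_entities, knowledge_base):
    """Weakly label every unordered entity pair that co-occurs in a sentence.

    A pair is labeled True if the knowledge base lists it as interacting,
    regardless of what the sentence itself says. This is the source of the
    label noise that distant supervision has to cope with.
    """
    labeled = []
    for e1, e2 in combinations(sentence_entities, 2):
        pair = frozenset({e1.lower(), e2.lower()})
        labeled.append((e1, e2, pair in knowledge_base))
    return labeled


# Entities assumed to have been recognized in a single sentence.
example = label_pairs(["DrugA", "DrugB", "DrugC"], KNOWN_INTERACTIONS)
for e1, e2, is_positive in example:
    print(e1, e2, is_positive)
```

In this sketch the pair (DrugA, DrugB) receives a positive label purely because it appears in the knowledge base, even if the sentence only mentions both entities without describing an interaction. Classifier-based labels of the kind the abstract mentions could then be used to filter such instances.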
Keywords
Distant Supervision; Relation Extraction; Silver Standard
Year of publication
2012
Title of conference proceedings
Proceedings of the Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM) on Language Resources and Evaluation Conference (LREC)
Page URI
https://pub.uni-bielefeld.de/record/2603565

Cite

Thomas P, Bobić T, Leser U, Hofmann-Apitius M, Klinger R. Weakly Labeled Corpora as Silver Standard for Drug-Drug and Protein-Protein Interaction. In: Proceedings of the Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM) on Language Resources and Evaluation Conference (LREC). Istanbul, Turkey; 2012.
Thomas, P., Bobić, T., Leser, U., Hofmann-Apitius, M., & Klinger, R. (2012). Weakly Labeled Corpora as Silver Standard for Drug-Drug and Protein-Protein Interaction. Proceedings of the Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM) on Language Resources and Evaluation Conference (LREC) Istanbul, Turkey.
Thomas, Philippe, Bobić, Tamara, Leser, Ulf, Hofmann-Apitius, Martin, and Klinger, Roman. 2012. “Weakly Labeled Corpora as Silver Standard for Drug-Drug and Protein-Protein Interaction”. In Proceedings of the Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM) on Language Resources and Evaluation Conference (LREC). Istanbul, Turkey.
Thomas, P., Bobić, T., Leser, U., Hofmann-Apitius, M., and Klinger, R. (2012). “Weakly Labeled Corpora as Silver Standard for Drug-Drug and Protein-Protein Interaction” in Proceedings of the Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM) on Language Resources and Evaluation Conference (LREC) (Istanbul, Turkey).
Thomas, P., et al., 2012. Weakly Labeled Corpora as Silver Standard for Drug-Drug and Protein-Protein Interaction. In Proceedings of the Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM) on Language Resources and Evaluation Conference (LREC). Istanbul, Turkey.
P. Thomas, et al., “Weakly Labeled Corpora as Silver Standard for Drug-Drug and Protein-Protein Interaction”, Proceedings of the Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM) on Language Resources and Evaluation Conference (LREC), Istanbul, Turkey: 2012.
Thomas, P., Bobić, T., Leser, U., Hofmann-Apitius, M., Klinger, R.: Weakly Labeled Corpora as Silver Standard for Drug-Drug and Protein-Protein Interaction. Proceedings of the Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM) on Language Resources and Evaluation Conference (LREC). Istanbul, Turkey (2012).
Thomas, Philippe, Bobić, Tamara, Leser, Ulf, Hofmann-Apitius, Martin, and Klinger, Roman. “Weakly Labeled Corpora as Silver Standard for Drug-Drug and Protein-Protein Interaction”. Proceedings of the Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM) on Language Resources and Evaluation Conference (LREC). Istanbul, Turkey, 2012.
All files available under the following license(s):
Copyright Statement:
This object is protected by copyright and/or related rights. [...]
Full text(s)
Access Level
OA Open Access
Last uploaded
2019-09-06T09:18:15Z
MD5 checksum
b73b664545ef38fa00b07124d58f03b5

