MENLI: Robust Evaluation Metrics from Natural Language Inference

Chen Y, Eger S (2023)
Transactions of the Association for Computational Linguistics (TACL) 11: 804-825.

Journal article | Published | English
 
Download
No files have been uploaded; publication record only.
Author(s)
Chen, Yanran; Eger, Steffen (UniBi)
Abstract / Remarks
Recently proposed BERT-based evaluation metrics for text generation perform well on standard benchmarks but are vulnerable to adversarial attacks, e.g., relating to information correctness. We argue that this stems (in part) from the fact that they are models of semantic similarity. In contrast, we develop evaluation metrics based on Natural Language Inference (NLI), which we deem a more appropriate modeling choice. We design a preference-based adversarial attack framework and show that our NLI-based metrics are much more robust to the attacks than the recent BERT-based metrics. On standard benchmarks, our NLI-based metrics outperform existing summarization metrics but perform below SOTA MT metrics. However, when combining existing metrics with our NLI metrics, we obtain both higher adversarial robustness (15%-30%) and higher-quality metrics as measured on standard benchmarks (+5% to 30%).
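The combination step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the symmetrization over both entailment directions, and the simple weighted average are assumptions made here for clarity.

```python
def nli_metric(p_entail_fwd: float, p_entail_bwd: float) -> float:
    """Symmetrize NLI entailment probabilities from both directions
    (reference -> hypothesis and hypothesis -> reference).
    Illustrative assumption; the paper explores several NLI formulations."""
    return 0.5 * (p_entail_fwd + p_entail_bwd)


def combined_metric(nli_score: float, sim_score: float, w: float = 0.5) -> float:
    """Weighted combination of an NLI-based score and a similarity-based
    score (e.g., from a BERT-based metric); both assumed to lie in [0, 1].
    The weight w trades off adversarial robustness against benchmark quality."""
    return w * nli_score + (1.0 - w) * sim_score


# Example: blend a symmetrized NLI score with a similarity score of 0.7.
score = combined_metric(nli_metric(0.9, 0.8), 0.7, w=0.3)
```

In practice the two input scores would come from an NLI model and a trained similarity metric; the weighted average simply shows how a single combined score can inherit robustness from the NLI component while retaining the similarity metric's benchmark performance.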
Year of publication
2023
Journal title
Transactions of the Association for Computational Linguistics (TACL)
Volume
11
Page(s)
804-825
eISSN
2307-387X
Page URI
https://pub.uni-bielefeld.de/record/2981372

Cite

Chen, Y., & Eger, S. (2023). MENLI: Robust Evaluation Metrics from Natural Language Inference. Transactions of the Association for Computational Linguistics, 11, 804-825. https://doi.org/10.1162/tacl_a_00576