The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns

Bunzeck, Bastian; Zarrieß, Sina


Bunzeck B, Zarrieß S (2024)
In: Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP. Hupkes D, Dankers V, Batsuren K, Kazemnejad A, Christodoulopoulos C, Giulianelli M, Cotterell R (Eds); Miami, Florida, USA: Association for Computational Linguistics: 42-53.

Conference paper | English

Download

No files have been uploaded. Publication record only!

URL

https://aclanthology.org/2024.genbench-1.3/

Author

Bunzeck, Bastian (UniBi); Zarrieß, Sina (UniBi)

Editor

Hupkes, Dieuwke; Dankers, Verna; Batsuren, Khuyagbaatar; Kazemnejad, Amirhossein; Christodoulopoulos, Christos; Giulianelli, Mario; Cotterell, Ryan

Department

Faculty of Linguistics and Literary Studies > Department of Linguistics

Abstract / Note

We introduce SlayQA, a novel benchmark data set designed to evaluate language models' ability to handle gender-inclusive language, specifically the use of neopronouns, in a question-answering setting. Derived from the Social IQa data set, SlayQA modifies context-question-answer triples to include gender-neutral pronouns, creating a significant linguistic distribution shift in comparison to common pre-training corpora like C4 or Dolma. Our results show that state-of-the-art language models struggle with the challenge, exhibiting small but noticeable performance drops when answering questions containing neopronouns compared to those without.

Year of publication

2024

Title of the conference proceedings

Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP

Page(s)

42-53

Conference

2nd GenBench Workshop on Generalisation (Benchmarking) in NLP

Conference location

Miami, Florida

Page URI

https://pub.uni-bielefeld.de/record/2994136

Cite

Bunzeck B, Zarrieß S. The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns. In: Hupkes D, Dankers V, Batsuren K, et al., eds. Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP. Miami, Florida, USA: Association for Computational Linguistics; 2024: 42-53.

Bunzeck, B., & Zarrieß, S. (2024). The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns. In D. Hupkes, V. Dankers, K. Batsuren, A. Kazemnejad, C. Christodoulopoulos, M. Giulianelli, & R. Cotterell (Eds.), Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP (pp. 42-53). Miami, Florida, USA: Association for Computational Linguistics.

Bunzeck, Bastian, and Zarrieß, Sina. 2024. “The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns”. In Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP, ed. Dieuwke Hupkes, Verna Dankers, Khuyagbaatar Batsuren, Amirhossein Kazemnejad, Christos Christodoulopoulos, Mario Giulianelli, and Ryan Cotterell, 42-53. Miami, Florida, USA: Association for Computational Linguistics.

Bunzeck, B., and Zarrieß, S. (2024). “The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns” in Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP, Hupkes, D., Dankers, V., Batsuren, K., Kazemnejad, A., Christodoulopoulos, C., Giulianelli, M., and Cotterell, R. eds. (Miami, Florida, USA: Association for Computational Linguistics), 42-53.

Bunzeck, B., & Zarrieß, S., 2024. The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns. In D. Hupkes, et al., eds. Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP. Miami, Florida, USA: Association for Computational Linguistics, pp. 42-53.

B. Bunzeck and S. Zarrieß, “The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns”, Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP, D. Hupkes, et al., eds., Miami, Florida, USA: Association for Computational Linguistics, 2024, pp. 42-53.

Bunzeck, B., Zarrieß, S.: The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns. In: Hupkes, D., Dankers, V., Batsuren, K., Kazemnejad, A., Christodoulopoulos, C., Giulianelli, M., and Cotterell, R. (eds.) Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP. p. 42-53. Association for Computational Linguistics, Miami, Florida, USA (2024).

Bunzeck, Bastian, and Zarrieß, Sina. “The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns”. Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP. Ed. Dieuwke Hupkes, Verna Dankers, Khuyagbaatar Batsuren, Amirhossein Kazemnejad, Christos Christodoulopoulos, Mario Giulianelli, and Ryan Cotterell. Miami, Florida, USA: Association for Computational Linguistics, 2024. 42-53.

Link(s) to full text(s)

URL

https://aclanthology.org/2024.genbench-1.3/

Access Level

Open Access
