The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns
Bunzeck B, Zarrieß S (2024)
In: Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP. Hupkes D, Dankers V, Batsuren K, Kazemnejad A, Christodoulopoulos C, Giulianelli M, Cotterell R (Eds); Miami, Florida, USA: Association for Computational Linguistics: 42-53.
Conference paper | English
Author
Bunzeck, Bastian;
Zarrieß, Sina
Editor
Hupkes, Dieuwke;
Dankers, Verna;
Batsuren, Khuyagbaatar;
Kazemnejad, Amirhossein;
Christodoulopoulos, Christos;
Giulianelli, Mario;
Cotterell, Ryan
Abstract / Note
We introduce SlayQA, a novel benchmark data set designed to evaluate language models' ability to handle gender-inclusive language, specifically the use of neopronouns, in a question-answering setting. Derived from the Social IQa data set, SlayQA modifies context-question-answer triples to include gender-neutral pronouns, creating a significant linguistic distribution shift in comparison to common pre-training corpora like C4 or Dolma. Our results show that state-of-the-art language models struggle with the challenge, exhibiting small but noticeable performance drops when answering questions containing neopronouns compared to those without.
Year of publication
2024
Title of the conference proceedings
Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP
Page(s)
42-53
Conference
2nd GenBench Workshop on Generalisation (Benchmarking) in NLP
Conference location
Miami, Florida
Page URI
https://pub.uni-bielefeld.de/record/2994136
Cite
Bunzeck B, Zarrieß S. The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns. In: Hupkes D, Dankers V, Batsuren K, et al., eds. Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP. Miami, Florida, USA: Association for Computational Linguistics; 2024: 42-53.
Bunzeck, B., & Zarrieß, S. (2024). The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns. In D. Hupkes, V. Dankers, K. Batsuren, A. Kazemnejad, C. Christodoulopoulos, M. Giulianelli, & R. Cotterell (Eds.), Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP (pp. 42-53). Miami, Florida, USA: Association for Computational Linguistics.
Bunzeck, Bastian, and Zarrieß, Sina. 2024. “The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns”. In Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP, ed. Dieuwke Hupkes, Verna Dankers, Khuyagbaatar Batsuren, Amirhossein Kazemnejad, Christos Christodoulopoulos, Mario Giulianelli, and Ryan Cotterell, 42-53. Miami, Florida, USA: Association for Computational Linguistics.
Bunzeck, B., and Zarrieß, S. (2024). “The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns” in Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP, Hupkes, D., Dankers, V., Batsuren, K., Kazemnejad, A., Christodoulopoulos, C., Giulianelli, M., and Cotterell, R. eds. (Miami, Florida, USA: Association for Computational Linguistics), 42-53.
Bunzeck, B., & Zarrieß, S., 2024. The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns. In D. Hupkes, et al., eds. Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP. Miami, Florida, USA: Association for Computational Linguistics, pp. 42-53.
B. Bunzeck and S. Zarrieß, “The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns”, Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP, D. Hupkes, et al., eds., Miami, Florida, USA: Association for Computational Linguistics, 2024, pp. 42-53.
Bunzeck, B., Zarrieß, S.: The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns. In: Hupkes, D., Dankers, V., Batsuren, K., Kazemnejad, A., Christodoulopoulos, C., Giulianelli, M., and Cotterell, R. (eds.) Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP. p. 42-53. Association for Computational Linguistics, Miami, Florida, USA (2024).
Bunzeck, Bastian, and Zarrieß, Sina. “The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns”. Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP. Ed. Dieuwke Hupkes, Verna Dankers, Khuyagbaatar Batsuren, Amirhossein Kazemnejad, Christos Christodoulopoulos, Mario Giulianelli, and Ryan Cotterell. Miami, Florida, USA: Association for Computational Linguistics, 2024. 42-53.