12 Publications
-
2025 | Conference paper | PUB-ID: 3000275
Bunzeck, Bastian, Duran, Daniel, Schade, Leonie, and Zarrieß, Sina. “Small Language Models Also Work With Small Vocabularies: Probing the Linguistic Abilities of Grapheme- and Phoneme-Based Baby Llamas”. Proceedings of the 31st International Conference on Computational Linguistics. Ed. Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, and Steven Schockaert. Abu Dhabi, UAE: Association for Computational Linguistics, 2025. 6039-6048.
-
2024 | Conference paper | PUB-ID: 3001254
Bunzeck, Bastian, Duran, Daniel, Schade, Leonie, and Zarrieß, Sina. “Graphemes vs. phonemes: battling it out in character-based language models”. The 2nd BabyLM Challenge at the 28th Conference on Computational Natural Language Learning. Ed. Michael Y. Hu, Aaron Mueller, Candace Ross, Adina Williams, Tal Linzen, Chengxu Zhuang, Leshem Choshen, Ryan Cotterell, Alex Warstadt, and Ethan Gotlieb Wilcox. Miami, FL, USA: Association for Computational Linguistics, 2024. 54-64.
-
2024 | Conference paper | PUB-ID: 2993430
Bunzeck, Bastian, and Zarrieß, Sina. “Fifty shapes of BLiMP: syntactic learning curves in language models are not uniform, but sometimes unruly”. Proceedings of the 2024 CLASP Conference on Multimodality and Interaction in Language Learning. Ed. Amy Qiu, Bill Noble, David Pagmar, Vladislav Maraev, and Nikolai Ilinykh. Kerrville, TX: Association for Computational Linguistics, 2024. 39-55.
-
2024 | Conference paper | PUB-ID: 2994136
Bunzeck, Bastian, and Zarrieß, Sina. “The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns”. Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP. Ed. Dieuwke Hupkes, Verna Dankers, Khuyagbaatar Batsuren, Amirhossein Kazemnejad, Christos Christodoulopoulos, Mario Giulianelli, and Ryan Cotterell. Miami, Florida, USA: Association for Computational Linguistics, 2024. 42-53.
-
2023 | Data publication | PUB-ID: 2993810
Wojcik, Paula, Bunzeck, Bastian, and Zarrieß, Sina. Replication Data for: "The Wikipedia Republic of Literary Characters". Harvard Dataverse, 2023.
-
2023 | Conference paper | Published | PUB-ID: 2985109
Bunzeck, Bastian, and Zarrieß, Sina. “GPT-wee: How Small Can a Small Language Model Really Get?”. Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning. Ed. Alex Warstadt, Aaron Mueller, Leshem Choshen, Ethan Wilcox, Chengxu Zhuang, Juan Ciro, Rafael Mosquera, Bhargavi Paranjabe, Adina Williams, Tal Linzen, and Ryan Cotterell. Stroudsburg, PA: Association for Computational Linguistics, 2023. 35-46.
-
2023 | Journal article | Published | PUB-ID: 2980943
Druskat, Stephan, Krause, Thomas, Lachenmaier, Clara, and Bunzeck, Bastian. “Hexatomic: An extensible, OS-independent platform for deep multi-layer linguistic annotation of corpora”. Journal of Open Source Software 8.86 (2023): 4825.
-
2023 | Conference paper | Published | PUB-ID: 2982902
Bunzeck, Bastian, and Zarrieß, Sina. “Entrenchment Matters: Investigating Positional and Constructional Sensitivity in Small and Large Language Models”. Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD). Ed. Ellen Breitholtz, Shalom Lappin, Sharid Loaiciga, Nikolai Ilinykh, and Simon Dobnik. Stroudsburg, PA: Association for Computational Linguistics, 2023. 25-37.