ACORDAR 2.0: A Test Collection for Ad Hoc Dataset Retrieval with Densely Pooled Datasets and Question-Style Queries

Chen Q, Luo W, Huang Z, Lin T, Wang X, Soylu A, Ell B, Zhou B, Kharlamov E, Cheng G (2024)
In: Proceedings of the 47th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024. New York: Association for Computing Machinery: 303-312.

Conference paper | Published | English
 
Authors
Chen, Qiaosheng; Luo, Weiqing; Huang, Zixian; Lin, Tengteng; Wang, Xiaxia; Soylu, Ahmet; Ell, Basil; Zhou, Baifan; Kharlamov, Evgeny; Cheng, Gong
Abstract / Note
Dataset search, or more specifically ad hoc dataset retrieval, is a trending specialized IR task that has received increasing attention in both academia and industry. While methods and systems continue to evolve, existing test collections for this task exhibit shortcomings: in particular, they suffer from lexical bias in pooling and are limited to keyword-style queries for evaluation. To address these limitations, we construct ACORDAR 2.0, a new test collection for this task, which is also the largest to date. To reduce lexical bias in pooling, we adapt dense retrieval models to large structured data and use them to find an extended set of semantically relevant datasets to be annotated. To diversify query forms, we employ a large language model to rewrite keyword queries into high-quality question-style queries. We use the test collection to evaluate popular sparse and dense retrieval models, establishing a baseline for future studies. The test collection and source code are publicly available.
Keywords
dataset search; ad hoc dataset retrieval; test collection; RDF
Publication year
2024
Proceedings title
Proceedings of the 47th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024
Page(s)
303-312
Conference
47th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
Conference location
Washington, DC
Conference date
2024-07-14 – 2024-07-18
ISBN
979-8-4007-0431-4
Page URI
https://pub.uni-bielefeld.de/record/2993435

Cite

Chen Q, Luo W, Huang Z, et al. ACORDAR 2.0: A Test Collection for Ad Hoc Dataset Retrieval with Densely Pooled Datasets and Question-Style Queries. In: Proceedings of the 47th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024. New York: Association for Computing Machinery; 2024: 303-312.
Chen, Q., Luo, W., Huang, Z., Lin, T., Wang, X., Soylu, A., Ell, B., et al. (2024). ACORDAR 2.0: A Test Collection for Ad Hoc Dataset Retrieval with Densely Pooled Datasets and Question-Style Queries. Proceedings of the 47th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024, 303-312. New York: Association for Computing Machinery. https://doi.org/10.1145/3626772.3657866
Chen, Qiaosheng, Luo, Weiqing, Huang, Zixian, Lin, Tengteng, Wang, Xiaxia, Soylu, Ahmet, Ell, Basil, Zhou, Baifan, Kharlamov, Evgeny, and Cheng, Gong. 2024. “ACORDAR 2.0: A Test Collection for Ad Hoc Dataset Retrieval with Densely Pooled Datasets and Question-Style Queries”. In Proceedings of the 47th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024, 303-312. New York: Association for Computing Machinery.
Chen, Q., Luo, W., Huang, Z., Lin, T., Wang, X., Soylu, A., Ell, B., Zhou, B., Kharlamov, E., and Cheng, G. (2024). “ACORDAR 2.0: A Test Collection for Ad Hoc Dataset Retrieval with Densely Pooled Datasets and Question-Style Queries” in Proceedings of the 47th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024 (New York: Association for Computing Machinery), 303-312.
Chen, Q., et al., 2024. ACORDAR 2.0: A Test Collection for Ad Hoc Dataset Retrieval with Densely Pooled Datasets and Question-Style Queries. In Proceedings of the 47th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024. New York: Association for Computing Machinery, pp. 303-312.
Q. Chen, et al., “ACORDAR 2.0: A Test Collection for Ad Hoc Dataset Retrieval with Densely Pooled Datasets and Question-Style Queries”, Proceedings of the 47th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024, New York: Association for Computing Machinery, 2024, pp.303-312.
Chen, Q., Luo, W., Huang, Z., Lin, T., Wang, X., Soylu, A., Ell, B., Zhou, B., Kharlamov, E., Cheng, G.: ACORDAR 2.0: A Test Collection for Ad Hoc Dataset Retrieval with Densely Pooled Datasets and Question-Style Queries. Proceedings of the 47th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024. p. 303-312. Association for Computing Machinery, New York (2024).
Chen, Qiaosheng, Luo, Weiqing, Huang, Zixian, Lin, Tengteng, Wang, Xiaxia, Soylu, Ahmet, Ell, Basil, Zhou, Baifan, Kharlamov, Evgeny, and Cheng, Gong. “ACORDAR 2.0: A Test Collection for Ad Hoc Dataset Retrieval with Densely Pooled Datasets and Question-Style Queries”. Proceedings of the 47th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024. New York: Association for Computing Machinery, 2024. 303-312.