TRAVELER: A Benchmark for Evaluating Temporal Reasoning Across Vague, Implicit and Explicit References
Kenneweg S, Deigmöller J, Cimiano P, Eggert J (2026)
SN Computer Science 7(5): 379.
Journal article
| Published | English
Abstract / Note
Temporal references to future, past or ongoing events are ubiquitous in natural language. Yet there is no comprehensive benchmark dataset for assessing the ability of Large Language Models (LLMs) to correctly resolve these references. In this work we close this gap and provide a benchmark that supports evaluating the interpretation of Explicit (e.g., “on 30.12.2024”), Implicit (e.g., “yesterday”) and Vague (e.g., “recently”) temporal references. We introduce TRAVELER, a synthetic question-answering benchmark that evaluates a system’s ability to perform temporal reasoning with respect to a set of past events. The benchmark comprises 3,300 English questions, categorized by their temporal reference and automatically generated via four templates from event sets that contain 5 to 100 events from a household domain. For the Vague category, ground-truth answers were established via human surveys on Prolific, following a procedure inspired by Kenneweg et al. (in: Proceedings of the Workshop on Cognitive Aspects of the Lexicon @ LREC-COLING 2024, Torino, 2024). To demonstrate the benchmark’s applicability, we evaluate four state-of-the-art LLMs on it. All benchmarked LLMs can successfully answer questions with Explicit temporal references over event sets containing a handful of events, but performance clearly deteriorates as the event sets grow and as the temporal references become less explicit. Notably, the Vague question category exhibits the lowest performance across all models. TRAVELER thus exposes limitations in current LLMs’ event-temporal reasoning capabilities: their performance clearly deteriorates with longer event sets and when Vague temporal references are included. The benchmark, which is publicly available (https://gitlab.ub.uni-bielefeld.de/s.kenneweg/TRAVELER), can also be used to test the event-temporal reasoning capabilities of models beyond those tested in this work.
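As an illustration of the Explicit and Implicit categories named in the abstract, the following minimal Python sketch resolves one reference of each kind against a small event set. It is not taken from the TRAVELER code base: the `Event` class, the function names, the example events and the assumed reference date are all hypothetical. Vague references such as “recently” are deliberately left out, since the abstract states that their ground truth comes from human surveys rather than from a fixed rule.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass(frozen=True)
class Event:
    """A past event (hypothetical structure, not the benchmark's own format)."""
    description: str
    day: date


def resolve_explicit(reference: str) -> date:
    """Parse an Explicit reference written as DD.MM.YYYY, e.g. '30.12.2024'."""
    return datetime.strptime(reference, "%d.%m.%Y").date()


def resolve_implicit(reference: str, today: date) -> date:
    """Resolve an Implicit reference relative to an assumed 'today'."""
    # Only a few illustrative expressions are covered here.
    offsets = {"today": 0, "yesterday": 1, "the day before yesterday": 2}
    return today - timedelta(days=offsets[reference])


def events_on(events: list[Event], day: date) -> list[str]:
    """Return the descriptions of all events that happened on the given day."""
    return [e.description for e in events if e.day == day]


# Invented household events, for illustration only.
events = [
    Event("watered the plants", date(2024, 12, 30)),
    Event("cleaned the kitchen", date(2024, 12, 29)),
]
today = date(2024, 12, 31)  # assumed reference date

print(events_on(events, resolve_explicit("30.12.2024")))
print(events_on(events, resolve_implicit("yesterday", today)))
```

With the assumed reference date of 31.12.2024, both “on 30.12.2024” and “yesterday” denote the same day, so both queries select the same event.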
Keywords
Temporal question answering;
Vagueness;
Events;
Synthetic benchmark
Publication year
2026
Journal title
SN Computer Science
Volume
7
Issue
5
Article no.
379
Copyright / Licenses
eISSN
2661-8907
Page URI
https://pub.uni-bielefeld.de/record/3016157
Cite
Kenneweg S, Deigmöller J, Cimiano P, Eggert J. TRAVELER: A Benchmark for Evaluating Temporal Reasoning Across Vague, Implicit and Explicit References. SN Computer Science. 2026;7(5): 379.
Kenneweg, S., Deigmöller, J., Cimiano, P., & Eggert, J. (2026). TRAVELER: A Benchmark for Evaluating Temporal Reasoning Across Vague, Implicit and Explicit References. SN Computer Science, 7(5), 379. https://doi.org/10.1007/s42979-026-04973-y
Kenneweg, Svenja, Deigmöller, Jörg, Cimiano, Philipp, and Eggert, Julian. 2026. “TRAVELER: A Benchmark for Evaluating Temporal Reasoning Across Vague, Implicit and Explicit References”. SN Computer Science 7 (5): 379.
Kenneweg, S., Deigmöller, J., Cimiano, P., and Eggert, J. (2026). TRAVELER: A Benchmark for Evaluating Temporal Reasoning Across Vague, Implicit and Explicit References. SN Computer Science 7:379.
Kenneweg, S., et al., 2026. TRAVELER: A Benchmark for Evaluating Temporal Reasoning Across Vague, Implicit and Explicit References. SN Computer Science, 7(5): 379.
S. Kenneweg, et al., “TRAVELER: A Benchmark for Evaluating Temporal Reasoning Across Vague, Implicit and Explicit References”, SN Computer Science, vol. 7, 2026, 379.
Kenneweg, S., Deigmöller, J., Cimiano, P., Eggert, J.: TRAVELER: A Benchmark for Evaluating Temporal Reasoning Across Vague, Implicit and Explicit References. SN Computer Science. 7, 379 (2026).
Kenneweg, Svenja, Deigmöller, Jörg, Cimiano, Philipp, and Eggert, Julian. “TRAVELER: A Benchmark for Evaluating Temporal Reasoning Across Vague, Implicit and Explicit References”. SN Computer Science 7.5 (2026): 379.
All files are available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
Full text(s)
Name
s42979-026-04973-y.pdf
2.10 MB
Access Level
Open Access
Last uploaded
2026-05-11T13:11:45Z
MD5 checksum
5fd77fda86b2f62318884a659bc4e051
Link(s) to full text(s)
Access Level
Open Access