Towards an Interaction-Centered and Dynamically Constructed Episodic Memory for Social Robots

This paper outlines an interaction-centered and dynamically constructed episodic memory for social robots, in order to enable naturalistic, social human-robot interaction. The proposed model includes a record of multi-timescale events stored in the event history, a record of multi-timescale interval definitions stored as interaction episodes, and a set of links associating specific elements of the two records. The event history is constructed dynamically, depending on the occurrence of internal and external events. The interaction episodes are defined on the basis of robot-initiated and user-initiated interactions. The episodic memory is realised within a social human-robot interaction architecture, whose components generate events pertaining to the context and state of interaction.


INTRODUCTION
Social interaction evolves over time and thus requires access to past experiences and events. In humans, this information is provided by an episodic memory [11]. With intelligent agents and social robots increasingly present in social contexts as interaction partners, it becomes essential that they too possess the capacity to evolve, learn, and adapt based on past experiences, in order to enable naturalistic interaction with humans as well as the environment.
Several models have been proposed for building episodic memories within a cognitive or interaction architecture for intelligent agents (e.g. [6,7]) or social robots (e.g. [1,3]). These include [9], cognitive models (e.g. in the SOAR architecture [6]), and computational models (e.g. [2,3,7]) that either pre-define the type of information to be stored in each episode (cf. [3,7]) or define episodes as a sequence of events that occurred during the execution of a task (cf. [1,3]). However, due to the complexity and highly dynamic nature of social interactions, it is not possible to predict which internal or external events will occur during an interaction, or when they will occur. Therefore, pre-defined templates for episodes [3,7] cannot cover all the variations involved in an interaction and its context. Moreover, an interaction could be influenced by events that occurred before the start of the interaction, and an interaction itself could become the cause of future events, whose temporal distances to the current time instant are difficult to predict. Therefore, defining episodes based on tasks [1,3] makes it difficult to establish associations between events and episodes in cases where the events occur before or after the tasks.
To overcome these limitations, we propose a computational model for episodic memory that (i) records internal and external events in an event history that grows dynamically, i.e. as and when the events occur; (ii) separately records interaction episodes, which are time intervals tagged on the basis of interactions initiated either by the robot via a plan of actions or by the user via dialogue; and (iii) defines associative links between events in the event history and time intervals within the interaction episodes. The episodes are structured based on interactions, and the internal and external events are dynamically embedded in or linked to the different stages of an episode. This episodic memory is realised within an interaction architecture [10] that supports needs-based, lively behaviour for social robots, unlike the architectures used in [1,3]. The architecture in [10] is event-based, and its components deliver information about the context and state of interactions.
The proposed episodic memory could enable several interesting long-term social human-robot interactions. Analysis of events, episodes, and their associations could reveal patterns in user activity, allowing the robot to form expectations about user behaviour and interaction outcomes. Based on interaction episodes and associated events, user preferences could be inferred, which could influence future interactions. The temporal structure of episodes and the links between episodes and events could enable the robot to construct rich spoken narratives. This could involve recalling past experiences, or offering diverse, situation-specific explanations about its current behaviour. The following sections introduce the architecture and the proposed episodic memory.
Late-Breaking Report, HRI '20 Companion, March 23-26, 2020, Cambridge, United Kingdom

INTERACTION ARCHITECTURE
Figure 1 shows an adapted version of the interaction architecture in [10] that is being developed as part of this work. The perception components provide continuous or event-based streams of data about audiovisual stimuli (e.g. face, sound). The multimodal perception processor integrates perception information and triggers reactive behaviours. Memory is a central component that temporally integrates information about perception, decision, and action occurring or performed at different timescales [4,5], i.e. at different orders of magnitude of time. The memory models and keeps track of the current state of the user, the robot, and the interaction between them. It is composed of two sub-components: working memory and episodic memory. The working memory stores and processes recent events. The episodic memory integrates information over time, enabling a declarative representation of sequences of interactions and events. The needs engine controls and regulates the intrinsic needs of the robot, which are influenced by updates from the memory about changes in the interaction context. The decision engine chooses a plan of actions (referred to as a strategy) based on the user's and the robot's needs, in order to fulfil an intention or goal. It sends a decision snapshot to memory indicating 'which' decision was made (i.e. the strategy to be executed), 'why' this decision was made (i.e. the needs of user and robot, and the expected utility of the strategy), and optionally, 'how' this decision was made (for example, an instantiated decision rule, or a path through a decision tree). A selected strategy is executed via the action planner and executor, the behaviour controller, and the different hardware-control engines (e.g. speech, head, body). The components and interfaces of the architecture have been implemented in Python.
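The 'which', 'why', and 'how' parts of a decision snapshot described above could be held in a small record type. The following is a minimal sketch rather than the architecture's actual interface: the class name, the field names, the representation of needs as name-to-level dictionaries, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class DecisionSnapshot:
    """Hypothetical record of one decision sent to memory (names are assumed)."""

    timestamp: float               # time at which the decision was made, in seconds
    strategy: str                  # 'which': the strategy to be executed
    user_needs: Dict[str, float]   # 'why': needs of the user (assumed name -> level)
    robot_needs: Dict[str, float]  # 'why': needs of the robot (assumed name -> level)
    expected_utility: float        # 'why': expected utility of the strategy
    how: Optional[Any] = None      # 'how' (optional): e.g. a rule or a tree path

    def why(self) -> Dict[str, Any]:
        """Collect the parts of the snapshot that justify the decision."""
        return {
            "user_needs": self.user_needs,
            "robot_needs": self.robot_needs,
            "expected_utility": self.expected_utility,
        }


# Illustrative values only; none of these come from the paper.
snapshot = DecisionSnapshot(
    timestamp=12.5,
    strategy="greet_user",
    user_needs={"social_contact": 0.8},
    robot_needs={"social_contact": 0.6},
    expected_utility=0.7,
    how=["user_detected", "need_above_threshold"],
)
minimal = DecisionSnapshot(0.0, "idle", {}, {}, 0.0)  # the 'how' part is omitted
```

Making the record immutable reflects the idea that a snapshot describes a decision already taken; this is a design choice of the sketch, not something the paper prescribes.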

EPISODIC MEMORY
The proposed episodic memory is composed of two records: (i) a multi-timescale [4,5] event history containing processed perception events from the working memory, decision snapshots from the decision engine, and other events communicating changes in user and robot states; and (ii) multi-timescale interaction episodes that primarily tag the start and end times of interactive sessions involving the robot and the human or the environment. These interactions can be initiated either by the user (e.g. by starting a conversation) or by the robot (e.g. by triggering strategies or actions to fulfil its own needs). Accordingly, interaction episodes are defined as continuous temporal intervals during which the robot and the user engage in a conversation and/or the robot executes one or more strategies. The action planner and executor component of the architecture provides information about the active strategy and the active actions. Information about an ongoing conversation with the user is reported by a dialogue manager integrated within the architecture.
Figure 2 illustrates the proposed structure of episodic memory. In the event history, each perception event is represented as a discrete-valued, continuous-time signal that captures the state changes associated with the event. For compactness, only the discrete state changes in the perception events are recorded in the event history. Decision snapshots, in contrast, are recorded as instantaneous events, although, depending on the models used for decision-making, their 'why' and 'how' parts could hold information spread over a nonzero interval of time.
The entries in the event history and the elements of the interaction episodes could be connected through associative links that are either known or inferred up to the time of the last update of the episodic memory. These could be causal, enabler, or goal links.
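Two of the definitions above lend themselves to a short illustration: recording only the state changes of a perception event, and treating an interaction episode as a continuous interval covered by a conversation and/or one or more strategies. The sketch below is one possible reading, not the authors' implementation. The function names, the tuple representations, the assumption that samples arrive in time order, and the choice to merge intervals that merely touch are all hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, str]      # (time in seconds, discrete state); assumed format
Interval = Tuple[float, float]  # (start, end) in seconds; assumed format


def state_changes(samples: List[Sample]) -> List[Sample]:
    """Keep only the samples at which the discrete state differs from the last kept one.

    Assumes the samples are already sorted by time.
    """
    changes: List[Sample] = []
    for time, state in samples:
        if not changes or changes[-1][1] != state:
            changes.append((time, state))
    return changes


def interaction_episodes(activity: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching activity intervals into continuous episodes.

    Each input interval stands for one conversation or one strategy execution.
    """
    episodes: List[Interval] = []
    for start, end in sorted(activity):
        if episodes and start <= episodes[-1][1]:
            # The interval overlaps or touches the current episode, so extend it.
            episodes[-1] = (episodes[-1][0], max(episodes[-1][1], end))
        else:
            episodes.append((start, end))
    return episodes


# Illustrative data only: a face-detection signal sampled once per second.
face = [(0.0, "absent"), (1.0, "absent"), (2.0, "present"),
        (3.0, "present"), (4.0, "absent")]
recorded = state_changes(face)

# A conversation (10-20 s) overlapping a strategy (15-30 s), then a later strategy.
episodes = interaction_episodes([(10.0, 20.0), (15.0, 30.0), (50.0, 60.0)])
```

In this example the five face samples reduce to three recorded state changes, and the overlapping conversation and strategy form a single episode from 10 s to 30 s.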
Interactions could be initiated as a response to internal or external events that are recorded in the event history. In these cases, the corresponding events are linked to the elements of the interaction episodes via causal links. Events representing preconditions and postconditions of actions are connected with the elements of the interaction episodes through enabler and goal links, respectively. Decision snapshots are connected with interaction episodes through causal links once the selected strategy is started by the action planner and executor component. Elements of interaction episodes could themselves cause events, which can be represented via causal links from episodes to events.
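The three link types described above could be stored as typed, directed edges between event-history entries and elements of interaction episodes. The following is a minimal sketch under stated assumptions: the class names, the string identifiers, and the direction chosen for enabler links (precondition event to action) and goal links (action to postcondition event) are hypothetical, since the text does not fix a direction for them.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class LinkType(Enum):
    CAUSAL = "causal"
    ENABLER = "enabler"
    GOAL = "goal"


@dataclass(frozen=True)
class Link:
    source: str          # assumed id of an event or an episode element
    target: str          # assumed id of an event or an episode element
    link_type: LinkType


@dataclass
class LinkStore:
    """Hypothetical container for the associative links of the episodic memory."""

    links: List[Link] = field(default_factory=list)

    def add(self, source: str, target: str, link_type: LinkType) -> Link:
        link = Link(source, target, link_type)
        self.links.append(link)
        return link

    def links_to(self, target: str,
                 link_type: Optional[LinkType] = None) -> List[Link]:
        """Return links pointing at `target`, optionally filtered by type."""
        return [l for l in self.links
                if l.target == target
                and (link_type is None or l.link_type == link_type)]


# Illustrative identifiers only; none of these come from the paper.
store = LinkStore()
# An external event that triggered a conversation (event -> episode element).
store.add("event:user_greets", "episode1:conversation", LinkType.CAUSAL)
# A precondition event enabling an action (direction is an assumption).
store.add("event:user_in_range", "episode1:action:wave", LinkType.ENABLER)
# An action linked to its postcondition event (direction is an assumption).
store.add("episode1:action:wave", "event:user_waves_back", LinkType.GOAL)
# An episode element that itself caused an event (episode element -> event).
store.add("episode1:conversation", "event:user_smiles", LinkType.CAUSAL)
```

Keeping the links separate from both records mirrors the structure of the proposed model, in which events and episodes are recorded independently and associated afterwards.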

THE NEXT STEPS
The event history and interaction episodes have been implemented in the architecture and tested on the Pepper robot [8]. Ongoing work is directed towards using this model as a basis for generating behaviour explanations [10]. As a next step, the creation of causal links between interaction episodes and events will be examined, in order to create expectations about future events. User studies are planned to test the explanations and expectations constructed online from the dynamic episodic memory.

ACKNOWLEDGMENTS
This work was funded by the German Federal Ministry of Education and Research (BMBF) through the project 'VIVA' (FKZ 16SV7959). Special thanks to Hendrik Buschmeier and Sonja Stange for their feedback on the contents of this paper.