Towards an Hierarchical Model-Based Reinforcement Learning Approach to Dynamic Decision-Making in Uncertain Environment

Österdiekhoff A, Heinrich NW, Rußwinkel N, Kopp S (Accepted)
Presented at the 15th International Conference on Agents and Artificial Intelligence (ICAART), Lisbon.

Short Conference Contribution / Poster | Accepted | English
 
Author(s)
Österdiekhoff, Annika (UniBi); Heinrich, Nils Wendel; Rußwinkel, Nele; Kopp, Stefan (UniBi)
Abstract / Remarks
Autonomous intelligent agents often fail to act in uncertain environments that humans can handle with ease. One possible reason for this performance gap is that humans have a sense of control (SoC), which helps them act successfully in an environment. The SoC is the subjective feeling of control over a specific action. It ranges from feeling in full control, when everything happens as intended, to feeling out of control, when nothing happens as intended. However, it is unclear how to model a SoC and make it usable for autonomous intelligent agents. Currently, reinforcement learning (RL) is the most established approach for modeling how an agent learns to act in an uncertain environment. We assume that including a SoC in an RL-based autonomous intelligent agent will improve the agent's behavior.

To test this, we implement a so-called "moonlander" environment in which an agent has to steer a spaceship through a world. The world is bounded by walls on the left-hand and right-hand sides. The distance between the walls can vary, so that the walls form a funnel the agent has to pass through. Moreover, there are difficulties such as obstacles that have to be circumnavigated. In addition to the overall level design, we include a drift that can push the agent to the left or to the right. The agent has to learn to steer successfully through this world.

The learning process is modeled as a reinforcement learning setup with an actor-critic algorithm. The agent can move left, move right, or stay in the same position; with every action, it automatically advances one step in the world. After each action, the agent receives an observation and a reward. A positive reward indicates that the agent has not crashed into a wall or an obstacle, whereas a negative reward indicates a crash. The observation is a pixel map of the next few steps of the world, including the positions of the agent, the walls, the obstacles, and the drifts.

The SoC is calculated from sensorimotor prediction errors as implemented in previous research (Kahl et al., 2022). In every situation, the agent is equipped with a value ranging from low to high that indicates its current SoC. We assume that an agent equipped with a SoC performs better because it chooses different actions at different levels of control. Imagine a situation in which the agent is close to the left wall. An obstacle appears on the left side of the world, but it is still possible to pass it on the left; the agent would not have to steer at all and could simply stay in position. However, the gap between the left wall and the obstacle is very narrow, and if a drift occurred, counter-steering against it would be difficult. Alternatively, the agent could pass the obstacle on its right-hand side, where the gap between the obstacle and the right wall is much wider, but it would first have to steer to the right to get into position. We assume that the chosen path depends on the level of SoC: with a high SoC, the agent stays in place and takes the narrow pass on the left, choosing the riskier path because it feels in full control of the spaceship; with a low SoC, it chooses the safer path and steers to the right to pass the obstacle on the wider right-hand side.
Thus, a SoC would help RL-based autonomous intelligent agents find better-performing strategies in uncertain environments. In general, we expect that equipping agents with a SoC leads to better agent behavior and improves overall performance. The sketches below illustrate the setup.
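The environment described above maps naturally onto a step-based interface. The following is a minimal sketch, assuming a discrete grid world with a fixed drift probability and a random obstacle map as a stand-in for the actual level design; the class and parameter names (MoonlanderEnv, drift_prob, horizon) are illustrative and not taken from the paper.

```python
import numpy as np

class MoonlanderEnv:
    """Minimal sketch of the described world: the agent moves left/right or
    stays, advances one row per step, and crashes on walls, obstacles, or
    when a drift pushes it out of bounds."""

    LEFT, STAY, RIGHT = 0, 1, 2

    def __init__(self, width=11, horizon=5, drift_prob=0.2, length=200):
        self.width = width            # columns between the outer walls
        self.horizon = horizon        # rows of look-ahead in the observation
        self.drift_prob = drift_prob
        self.length = length          # total rows in the level
        self.rng = np.random.default_rng()
        self.reset()

    def reset(self):
        self.x = self.width // 2      # agent starts mid-column
        self.row = 0
        # 0 = free, 1 = obstacle; a random map stands in for the level design
        self.grid = (self.rng.random((self.length, self.width)) < 0.1).astype(np.int8)
        self.grid[0] = 0              # start row is free
        return self._observe()

    def step(self, action):
        self.x += action - 1                       # map {0,1,2} to {-1,0,+1}
        if self.rng.random() < self.drift_prob:    # drift pushes left or right
            self.x += self.rng.choice([-1, 1])
        self.row += 1                              # automatic forward motion
        crashed = (self.x < 0 or self.x >= self.width
                   or self.grid[self.row, self.x] == 1)
        reward = -1.0 if crashed else 1.0          # negative reward on a crash
        done = crashed or self.row >= self.length - self.horizon
        return self._observe(), reward, done

    def _observe(self):
        """Pixel map of the next few rows, with the agent marked as 2."""
        patch = self.grid[self.row:self.row + self.horizon].copy()
        if 0 <= self.x < self.width:
            patch[0, self.x] = 2
        return patch
```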
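The learning process itself is a standard actor-critic setup. A minimal one-step advantage actor-critic update in PyTorch could look as follows; the network size, learning rate, and discount factor are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Small actor-critic over the flattened pixel-map observation."""

    def __init__(self, obs_size, n_actions=3):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_size, 64), nn.ReLU())
        self.actor = nn.Linear(64, n_actions)   # logits over left/stay/right
        self.critic = nn.Linear(64, 1)          # state-value estimate

    def forward(self, obs):
        h = self.body(obs)
        return self.actor(h), self.critic(h).squeeze(-1)

def a2c_update(model, optimizer, obs, action, reward, next_value, gamma=0.99):
    """One-step advantage actor-critic update on a single transition.
    next_value is the critic's (already detached) estimate for the next state."""
    logits, value = model(obs)
    dist = torch.distributions.Categorical(logits=logits)
    target = reward + gamma * next_value         # bootstrapped one-step return
    advantage = target - value.detach()          # positive if better than expected
    loss = (-dist.log_prob(action) * advantage   # policy-gradient term
            + (target - value).pow(2))           # critic regression term
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Usage on a dummy transition (a flattened 5x11 pixel map has 55 entries):
model = ActorCritic(obs_size=55)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
obs = torch.randn(55)
logits, _ = model(obs)
action = torch.distributions.Categorical(logits=logits).sample()
a2c_update(model, opt, obs, action, reward=1.0, next_value=0.0)
```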
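Finally, how the SoC could enter action selection can be sketched as well. The exact formulation of Kahl et al. (2022) is not reproduced here; the snippet below uses a simple stand-in in which the SoC tracks recent sensorimotor prediction errors, and a low SoC biases the agent toward the path with the widest safety margin. The functions update_soc and choose_path and the threshold are hypothetical.

```python
import numpy as np

def update_soc(soc, prediction_error, alpha=0.1):
    """Hypothetical stand-in for the SoC computation of Kahl et al. (2022):
    the SoC in [0, 1] tracks an exponential moving average of sensorimotor
    prediction errors, so reliably predicted outcomes push it towards 1."""
    return (1 - alpha) * soc + alpha * float(np.exp(-prediction_error))

def choose_path(soc, margins, rewards, soc_threshold=0.5):
    """Risk-sensitive choice between candidate paths.

    margins: free space around each path (e.g. gap width in cells)
    rewards: estimated return of each path under full control
    With a high SoC the agent follows its value estimate and may take the
    narrow, risky pass; with a low SoC it falls back on the widest gap."""
    if soc >= soc_threshold:
        return int(np.argmax(rewards))   # in control: the riskier path is fine
    return int(np.argmax(margins))       # out of control: play it safe

# The scenario from the abstract: narrow pass on the left (stay in place)
# versus wide pass on the right (steer right first).
print(choose_path(0.9, margins=[1, 4], rewards=[1.0, 0.8]))  # -> 0 (left pass)
print(choose_path(0.2, margins=[1, 4], rewards=[1.0, 0.8]))  # -> 1 (right pass)
```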
Keywords
Model-Based Reinforcement Learning; Dynamic Decision-Making; Goal-Directed Planning
Publication Year
2023
Conference
15th International Conference on Agents and Artificial Intelligence (ICAART)
Conference Location
Lisbon
Conference Date
2023-02-22 – 2023-02-24
Page URI
https://pub.uni-bielefeld.de/record/2969470

Cite

Österdiekhoff, A., Heinrich, N. W., Rußwinkel, N., & Kopp, S. (Accepted). Towards an Hierarchical Model-Based Reinforcement Learning Approach to Dynamic Decision-Making in Uncertain Environment. Presented at the 15th International Conference on Agents and Artificial Intelligence (ICAART), Lisbon.
All files available under the following license(s):
Full Text(s)
Access Level: OA Open Access
Last Uploaded: 2024-04-17T09:51:28Z
MD5 Checksum: 5be30518d9b6d689f2b6e5a602609cbc

