Agent navigation

In this tutorial we show how to perform active inference in a simple agent navigation model. In the explanation of active inference given in a previous tutorial we emphasized the fact that active inference agents need an environment to interact with. We will use the classic GridWorld environment to demonstrate agent navigation.

In this tutorial you will learn:

  • How to build POMDP models

  • How to perform action selection in the model editor or the Python SDK

  • How to interpret the results of action selection

The model file associated with this example is shown below:

Model file for the agent navigation example

The GridWorld environment

In GridWorld, an agent inhabits a simple environment consisting of 9 squares:

The GridWorld environment.

The agent will start the simulation in one of these nine squares. Its goal is then to navigate to another square on the grid. To make this example more realistic, let's suppose that these squares correspond to physical locations or areas in a large, simplified model of a real-world warehouse. The table below lists each location and its description:

| Grid # | Position | Description |
| --- | --- | --- |
| 0 | Loading dock (load) | A large area near the entrance where goods are loaded and unloaded |
| 1 | Aisle near shelving units (aisle) | An aisle between rows of storage shelves |
| 2 | Bulk storage (store) | A large open space or room where large or bulk items are stored on pallets |
| 3 | Inventory office (office) | An office or small room where warehouse operations are monitored and controlled |
| 4 | Conveyor belt area (convey) | A section of the warehouse with moving conveyor belts used to transport items |
| 5 | Forklift charging station (charge) | A designated area where forklifts are recharged or stored when not in use |
| 6 | Packing station (pack) | An area where goods are being packaged or processed, with tables and packing materials |
| 7 | Shipping and receiving desk (desk) | An office or small room where warehouse operations are monitored and controlled |
| 8 | End of aisle (end) | A location near the end of a particular aisle that may have exits or walkways leading outside |

Problem statement: If the agent begins in bulk storage (position 2), how can it get to the packing station (position 6)?

According to this problem statement, the agent must begin at the START and navigate to the red star:

The agent's goal.

The active inference agent

The first goal will be to develop an active inference agent. As explained in the active inference tutorial, active inference agents utilize a Bayesian network with a special structure known as a partially observable Markov decision process (POMDP). This model has the following structure:

The active inference agent's model.

Defining the model variables

Creating an active inference agent therefore involves defining the structure of the variables and factors in the model. Because we are operating under partial observability, the agent does not know its actual position in the warehouse at any given moment. All it has access to are its sensors, which stream in data that we will call the agent's observations. The agent must use these observations alone to infer its actual position in the environment. We can think of the observations as context clues that help identify its position. With this in mind, we can define our states and observations.

  • States: The agent's position in the warehouse, which changes over time as it moves. The position variable has nine categories, one for each warehouse location. Because the true position is hidden, the agent maintains a belief about which of these categories it is in.

  • Observations: What the agent observes when it is in a particular position in the warehouse. These observations can help the agent identify where it is because certain features of its world (the warehouse) are associated with each position. For example, when the agent is in the state aisle, it would expect to observe shelves.

| Position category | Observation category | Observation description |
| --- | --- | --- |
| load | door | A large overhead door with a textured ramp |
| aisle | shelf | Rows of shelves filled with boxes or goods |
| store | pallet | Large stacks of pallets on industrial shelves |
| office | table | Tables and desks with packing tape and materials |
| convey | belt | Boxes moving on a belt |
| charge | light | Forklift charging lights |
| pack | box | Boxes being packed |
| desk | paper | Desks with paperwork and packing lists |
| end | ramp | A large concrete ramp |

Each position has one characteristic observation, but the agent may still be unable to identify its position uniquely from a single observation. In the agent's model, observations such as shelves, desks, or ramps can also occur, with smaller probability, in other positions. This is not a problem in itself. It simply reflects a situation of decision-making under uncertainty.

We also need to specify the actions the agent can perform in this environment. We will assume that the agent has five possible actions. It can move UP, DOWN, LEFT, or RIGHT to travel between neighboring warehouse locations, or it can STAY where it is.
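The variables can be written down as plain Python lists. This is a minimal sketch, not the format of the tutorial's JSON model file. The names `POSITIONS`, `OBSERVATIONS`, `ACTIONS`, and `to_row_col` are hypothetical. The 3 x 3 row-major layout (index = 3 × row + column) is an assumption, inferred from the moves described later in this tutorial: DOWN takes store to charge, and LEFT takes store to aisle.

```python
# Hypothetical encoding of the model variables (not the tutorial's JSON schema).
# Index i of each list refers to grid square i.
POSITIONS = ["load", "aisle", "store", "office", "convey",
             "charge", "pack", "desk", "end"]
# Characteristic observation of each position, in the same order as POSITIONS
OBSERVATIONS = ["door", "shelf", "pallet", "table", "belt",
                "light", "box", "paper", "ramp"]
ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT", "STAY"]

GRID_SIZE = 3  # assumed 3 x 3 grid, numbered row by row


def to_row_col(position: int) -> tuple:
    """Convert a grid index (0-8) into (row, column) on the assumed layout."""
    return divmod(position, GRID_SIZE)


for index, name in enumerate(POSITIONS):
    row, col = to_row_col(index)
    print(f"{index}: {name:<6} row={row} col={col} observation={OBSERVATIONS[index]}")
```

On this layout, the start (store) sits in the top-right corner and the goal (pack) sits in the bottom-left corner.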

Defining the model factors

Now our goal is to set the model factors. The model factors capture the probabilities that relate the model variables to one another. We start with the likelihood, which defines the agent's belief about how states generate observations. In other words, if the agent is in a particular state (position in the warehouse), what probability does it assign to receiving a particular observation (sensor data)? This can be compactly represented in a likelihood tensor, which captures part of the agent's understanding of the structure of the world. We will use the following likelihood tensor:

The agent's likelihood tensor

This likelihood tensor has the state categories along the columns and the observation categories along the rows. Each column is a categorical distribution that must sum to 1. The largest probability in each column lies on the main diagonal (from the top-left corner to the bottom-right corner), so in every position the agent most expects the observation that corresponds to that position. However, there is a small probability that the same state produces other observations.
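A likelihood tensor with this layout can be sketched with NumPy. The numbers below are hypothetical (0.7 on the diagonal, with the remaining 0.3 spread evenly over the other eight observations) and are not the values in the tutorial's model file. The sketch shows only the layout (rows are observations, columns are states) and the constraint that each column sums to 1.

```python
import numpy as np

N = 9  # nine positions (states) and nine observation categories

# Hypothetical likelihood A[o, s] = P(observation o | state s).
# Off-diagonal entries share the remaining probability mass evenly.
A = np.full((N, N), 0.3 / (N - 1))
np.fill_diagonal(A, 0.7)

# Each column is a categorical distribution over observations
column_sums = A.sum(axis=0)
print(np.allclose(column_sums, 1.0))  # True

# The most likely observation for each state lies on the main diagonal
print(A.argmax(axis=0))  # [0 1 2 3 4 5 6 7 8]
```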

For example, suppose the agent believes it is in position 3 (office). To see which observations it considers most likely, we take the office column out of the tensor and examine it. Each element of this slice corresponds to an observation category, with the state fixed at office. The observation category table has a probability of 0.7. This means that when the agent is in the office, it expects to see a table with the highest probability. However, it may also expect to see paper, shelves, or a door.

The agent's belief about observations it will receive when in the office
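Slicing out one column fixes the state and leaves a distribution over observations. In the sketch below, only the 0.7 for table comes from the text. The 0.1 values for paper, shelf, and door are hypothetical placeholders, and the other columns use an illustrative, evenly spread likelihood.

```python
import numpy as np

POSITIONS = ["load", "aisle", "store", "office", "convey",
             "charge", "pack", "desk", "end"]
OBSERVATIONS = ["door", "shelf", "pallet", "table", "belt",
                "light", "box", "paper", "ramp"]
N = len(POSITIONS)

# Illustrative likelihood A[o, s] with 0.7 on the diagonal
A = np.full((N, N), 0.3 / (N - 1))
np.fill_diagonal(A, 0.7)

# Hypothetical office column: table = 0.7 (from the text), 0.1 each for paper, shelf, door
office = POSITIONS.index("office")
A[:, office] = 0.0
A[OBSERVATIONS.index("table"), office] = 0.7
for name in ("paper", "shelf", "door"):
    A[OBSERVATIONS.index(name), office] = 0.1

# Fix state = office by taking that column: P(o | s = office)
p_obs_given_office = A[:, office]
for name, p in zip(OBSERVATIONS, p_obs_given_office):
    if p > 0:
        print(f"{name:<6} {p:.1f}")
```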

Next, we define the initial state prior factor. This is the agent's belief about its position at the beginning of the experiment. We are free to specify this model component however we like for the simulation. We will assume that the agent (correctly) believes that it starts in its actual starting position, the bulk storage area (store). We indicate this by creating a state vector in which each element corresponds to the agent's belief that it is in a given position, and we mark the store position with a 1. This denotes complete certainty by the agent about where it is. Note that this may not be where the agent actually is. It is only where the agent believes it is before it has seen any data to confirm or deny this initial hypothesis.

The agent's initial belief about its position
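The one-hot prior is a single vector with a 1 in the store entry. The variable name `D` is hypothetical here (it follows a common convention in the POMDP active inference literature) and is not necessarily the key used in the model file.

```python
import numpy as np

POSITIONS = ["load", "aisle", "store", "office", "convey",
             "charge", "pack", "desk", "end"]

# One-hot initial state prior: the agent is certain it starts in bulk storage
D = np.zeros(len(POSITIONS))
D[POSITIONS.index("store")] = 1.0

print(D)
```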

Next, we need a state-transition factor. This factor specifies how the agent believes the environment will change over time given a particular action it performs.

The agent's belief about state transitions for each possible action it can take

The above five tensors represent "slices" of the full state-transition tensor, one for each possible action. To determine the agent's belief about the next state of the environment, the agent will follow these steps:

  1. Determine the action the agent will perform. We will say the agent is going to move DOWN. If the agent is moving DOWN it will use the third tensor to determine the next state (green box with Action: DOWN).

  2. Determine the agent's current belief about its position. We will say the agent believes it is in position 2 (store). We use this to pick the store column of the matrix (vertical red box).

  3. Determine the probabilities of transitioning to the next state. According to the column we selected in step 2, the agent believes it has a probability of 1 of transitioning to position 5 (charge).


In the agent's model, the five slices of the state-transition tensor would be combined into a single object rather than five separate objects as shown in the figure.
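The five slices can be generated from the grid geometry and stacked into one tensor indexed as `B[next_state, current_state, action]`. This sketch rests on three assumptions:

- The 3 x 3 row-major layout is inferred from the moves described in this tutorial.
- The action order in `ACTIONS` is hypothetical and may not match the figure's ordering.
- A move off the edge of the grid leaves the agent in place. The tutorial does not state this rule.

```python
import numpy as np

GRID_SIZE = 3
N_STATES = GRID_SIZE * GRID_SIZE
ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT", "STAY"]
# (row, column) displacement for each action
MOVES = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1), "STAY": (0, 0)}


def next_position(position: int, action: str) -> int:
    """Deterministic move on the assumed grid; off-grid moves keep the agent in place."""
    row, col = divmod(position, GRID_SIZE)
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < GRID_SIZE and 0 <= new_col < GRID_SIZE:
        return new_row * GRID_SIZE + new_col
    return position


# B[next_state, current_state, action]: one 9 x 9 slice per action, stacked into one tensor
B = np.zeros((N_STATES, N_STATES, len(ACTIONS)))
for a, action in enumerate(ACTIONS):
    for s in range(N_STATES):
        B[next_position(s, action), s, a] = 1.0

# The lookup from the steps above: action DOWN, current belief store (position 2)
down = ACTIONS.index("DOWN")
print(B[:, 2, down].argmax())  # 5 (charge)
```

Keeping the action as the last axis means that selecting a slice (`B[:, :, a]`) and multiplying it by a belief vector gives the predicted belief over next positions.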

Finally, we need a preference factor. This factor encodes the agent's "desire", "goal", or, more accurately, its expectations about the types of observations it wishes to receive from the environment. Note that we define preferences here in terms of the observations the agent expects to receive, not the state it desires to be in. For this example, the agent desires to be in the packing station, pack. So, it would expect to receive the observation that indicates it is in this position (box):

The agent's preferences for observations

This vector has a "1" in the entry corresponding to box, which indicates that the agent's goal is to receive observations from the environment that correspond to boxes. According to the agent's model, boxes are most strongly associated with the packing station. That is the position the agent would end up in if it pursued the actions that bring about the highest probability of observing boxes. However, other positions in the warehouse could also produce this observation. This fact once again highlights that the agent must make decisions under uncertainty.
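The preference vector and its relationship to the likelihood can be sketched as follows. Scoring each position by `C @ A` gives the probability that the position produces a preferred observation. Under a hypothetical likelihood (0.7 on the diagonal, the rest spread evenly), this score shows that pack is the best place to be. Every other position still has a small chance of yielding box. The name `C` is a conventional but hypothetical label.

```python
import numpy as np

POSITIONS = ["load", "aisle", "store", "office", "convey",
             "charge", "pack", "desk", "end"]
OBSERVATIONS = ["door", "shelf", "pallet", "table", "belt",
                "light", "box", "paper", "ramp"]
N = len(OBSERVATIONS)

# Preference vector over observations: 1 for "box", 0 elsewhere
C = np.zeros(N)
C[OBSERVATIONS.index("box")] = 1.0

# Hypothetical likelihood A[o, s] = P(o | s): 0.7 on the diagonal
A = np.full((N, N), 0.3 / (N - 1))
np.fill_diagonal(A, 0.7)

# score[s] = sum over o of C[o] * P(o | s): chance that position s yields a preferred observation
score = C @ A
print(POSITIONS[int(score.argmax())])  # pack
print(bool((score > 0).all()))         # True: every position can produce "box"
```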

The action-perception loop

As indicated in the previous active inference tutorial, the agent uses its model to select sequences of actions that achieve its goal. Each action it executes alters the environment (its true position), which leads to a new observation being generated. A full step of the simulation consists of the following steps:

  1. Agent:

    1. Receive observation from warehouse environment

    2. Use observation to determine a belief about the current state of the environment (perception)

    3. Determine the correct action to take based on prior preferences (decision-making / action selection)

    4. Execute action and move to new position in the warehouse

  2. Environment:

    1. Receive action from agent

    2. Use action to transition to a new state (state-transition)

    3. Generate observation based on this new state

  3. Repeat steps 1 and 2

This is sometimes referred to as the action-perception cycle in the active inference literature. In our particular example, this process entails the following:

  • The agent determines its position in the warehouse from the observation it receives from the environment

  • The agent uses its preferences to decide on and execute an action that will bring it closer to its goal

  • The environment (the agent's position) changes to a new state as a result of the action

  • This new state generates a new observation that the agent can receive
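The loop above can be sketched end to end in Python. This is a simplified, self-contained stand-in, not the Genius agent's algorithm:

- Perception is exact Bayesian updating.
- Action selection exhaustively scores every 4-step action sequence by the probability it expects to place on the preferred observation. This is loosely analogous to the preference (pragmatic) part of expected free energy and omits the information-seeking term entirely.
- The model numbers, the 3 x 3 layout, the off-grid rule, and the assumption that the environment always emits each position's characteristic observation are all illustrative.

```python
import itertools

import numpy as np

POSITIONS = ["load", "aisle", "store", "office", "convey",
             "charge", "pack", "desk", "end"]
OBSERVATIONS = ["door", "shelf", "pallet", "table", "belt",
                "light", "box", "paper", "ramp"]
# STAY is listed first so that ties between equally good plans favor staying put
ACTIONS = ["STAY", "UP", "DOWN", "LEFT", "RIGHT"]
MOVES = {"STAY": (0, 0), "UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}
N = len(POSITIONS)


def next_position(position, action):
    """Assumed 3 x 3 row-major grid; moves off the grid leave the agent in place."""
    row, col = divmod(position, 3)
    d_row, d_col = MOVES[action]
    r, c = row + d_row, col + d_col
    return r * 3 + c if 0 <= r < 3 and 0 <= c < 3 else position


# Hypothetical generative model (illustrative numbers, not the tutorial's model file)
A = np.full((N, N), 0.3 / (N - 1))
np.fill_diagonal(A, 0.7)                      # A[o, s] = P(o | s)
B = np.zeros((N, N, len(ACTIONS)))            # B[s', s, a] = P(s' | s, a)
for a, action in enumerate(ACTIONS):
    for s in range(N):
        B[next_position(s, action), s, a] = 1.0
C = np.zeros(N)
C[OBSERVATIONS.index("box")] = 1.0            # preference for observing boxes
D = np.zeros(N)
D[POSITIONS.index("store")] = 1.0             # initial belief: store


def perceive(prior, obs):
    """Perception: Bayes' rule, posterior(s) proportional to P(obs | s) * prior(s)."""
    posterior = A[obs] * prior
    return posterior / posterior.sum()


def choose_action(belief, horizon=4):
    """Action selection (simplified): score every action sequence of length `horizon`
    by its summed expected preference C . (A @ q_t), then return the first action
    of the best-scoring sequence (the earliest one in iteration order on ties)."""
    best_score, best_first = -np.inf, None
    for policy in itertools.product(range(len(ACTIONS)), repeat=horizon):
        q, score = belief, 0.0
        for a in policy:
            q = B[:, :, a] @ q                # predicted state distribution
            score += C @ (A @ q)              # expected probability of a preferred observation
        if score > best_score:
            best_score, best_first = score, policy[0]
    return best_first


true_position = POSITIONS.index("store")
belief = D
taken, visited = [], [POSITIONS[true_position]]
for t in range(6):
    # Environment emits the position's characteristic observation (same index; an assumption)
    obs = true_position
    belief = perceive(belief, obs)            # agent: perception
    a = choose_action(belief)                 # agent: action selection
    taken.append(ACTIONS[a])
    true_position = next_position(true_position, ACTIONS[a])  # environment: state transition
    belief = B[:, :, a] @ belief              # agent: predicted belief for the next step
    visited.append(POSITIONS[true_position])

print(taken)
print(" -> ".join(visited))
```

With these assumptions, the sketched agent reaches pack after four moves and then chooses STAY twice. It takes an equally short route (store, charge, end, desk, pack) rather than the route through aisle and convey reported later in this tutorial. Several shortest routes score identically, and ties are broken by the order of `ACTIONS`.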

Querying the model

In this section we demonstrate how to perform active inference in the model editor and in the Python SDK.

To query the model, we first need to connect to the agent, load a JSON model file, and send the loaded model to the agent. We will use the GridWorld POMDP file, which can either be pasted into the box during loading or saved locally. After this is done, we are ready to query the model.

To perform active inference in this scenario, we need to work out the agent's new position from each action it performs and manually supply the corresponding observation. At the start of the simulation, the agent believes that it is in position 2 (store), which means it is in the storage area. Since the agent really is there (its beliefs conform with reality), it receives the pallet observation. We provide this observation to the agent and examine the result:

The agent's first action

The agent selects LEFT, which moves it into the aisle position. The corresponding observation it receives here is shelf. We provide this observation to the agent and see that the selected action is DOWN, which moves it into the convey position.

From here, the agent receives the belt observation and selects the DOWN action. This brings it to the desk position where the agent receives the paper observation. Continuing in this fashion we see that the agent makes its way to the final pack position where it performs the STAY action.

Interpreting the results

According to these results, the agent takes six total actions. It starts in position 2 (bulk storage) and moves to position 6 (packing station) where it stays until the end of the simulation. The figure below shows the agent's progression toward its goal. Once it reaches the goal square it stays there for two further time steps.

The agent's path to its goal in the warehouse
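The reported path can be checked against the grid layout. The sketch below replays the actions on an assumed 3 x 3 row-major grid. The first three actions (LEFT, DOWN, DOWN) and the final two STAYs come from the walkthrough. The fourth action (LEFT, from desk to pack) is inferred from the layout rather than stated explicitly.

```python
POSITIONS = ["load", "aisle", "store", "office", "convey",
             "charge", "pack", "desk", "end"]
MOVES = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1), "STAY": (0, 0)}


def next_position(position, action):
    """Assumed 3 x 3 row-major grid; off-grid moves leave the agent in place."""
    row, col = divmod(position, 3)
    d_row, d_col = MOVES[action]
    r, c = row + d_row, col + d_col
    return r * 3 + c if 0 <= r < 3 and 0 <= c < 3 else position


# Actions from the walkthrough; the fourth (desk -> pack) is inferred from the layout
actions = ["LEFT", "DOWN", "DOWN", "LEFT", "STAY", "STAY"]
position = POSITIONS.index("store")
trajectory = [POSITIONS[position]]
for action in actions:
    position = next_position(position, action)
    trajectory.append(POSITIONS[position])

print(" -> ".join(trajectory))
# store -> aisle -> convey -> desk -> pack -> pack -> pack
```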

Each time we perform action selection the Genius agent will return information that reveals how the agent made its decision about which action to perform. This information is returned in both the model editor and Python SDK.


The action data reference page gives a more thorough analysis of all elements that are returned from action selection and their purpose.

For example, at any given time step we can determine which state the agent thinks is most likely. We can also see the agent's evaluation of different policies at a given time step. Below we analyze the agent's behavior when it is in position 4 (convey).

When the agent is in position 4 (convey), its belief state is:

The agent's state belief after receiving an observation in position 4

This indicates that after receiving the belt observation, the agent believes it is in the conveyor belt area. We can see that it has selected to move DOWN. Examining the action values, we can see that DOWN has the highest probability:

The agent's policy belief after receiving an observation in position 4

Inverse EFE and utility are explained in more detail in the multi-armed bandit tutorial. They do not apply in this simple agent navigation example.
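Reading off the most likely state is an argmax over the belief vector. The sketch below computes a belief with Bayes' rule after the belt observation. It uses a hypothetical likelihood (0.7 on the diagonal, the rest spread evenly) and, to make the update visible, a uniform prior instead of the prior the agent would carry over from the previous step. Neither the numbers nor the function name `update_belief` come from the Genius SDK.

```python
import numpy as np

POSITIONS = ["load", "aisle", "store", "office", "convey",
             "charge", "pack", "desk", "end"]
OBSERVATIONS = ["door", "shelf", "pallet", "table", "belt",
                "light", "box", "paper", "ramp"]
N = len(POSITIONS)

# Hypothetical likelihood A[o, s] = P(o | s)
A = np.full((N, N), 0.3 / (N - 1))
np.fill_diagonal(A, 0.7)


def update_belief(prior, obs_name):
    """Bayes' rule: posterior(s) proportional to P(obs | s) * prior(s)."""
    likelihood = A[OBSERVATIONS.index(obs_name)]  # row of A for this observation
    posterior = likelihood * prior
    return posterior / posterior.sum()


uniform_prior = np.full(N, 1.0 / N)
posterior = update_belief(uniform_prior, "belt")
print(POSITIONS[int(posterior.argmax())])   # convey
print(round(float(posterior.max()), 3))     # 0.7
```

With a uniform prior, the posterior simply mirrors the belt row of the likelihood, so convey is the most likely state. A prior that already favors convey would make the posterior even more peaked.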
