Insulin pump

One of the most powerful features of Genius is that agents can learn to make decisions under uncertainty. In this example, we will use the Genius SDK to build an agent that acts as an insulin pump for users with Type I diabetes. This will showcase a Genius agent's ability to perform continual parameter learning.

Whenever foods containing carbohydrates are consumed, glucose from the food is released into the bloodstream as a source of energy for the body. Individuals with Type I diabetes cannot regulate the glucose levels in their bloodstream, which may become dangerously high and cause numerous medical complications. Administering the hormone insulin allows these glucose levels to return to normal.

According to the Cleveland Clinic, an insulin pump is:

A wearable medical device that supplies a continuous flow of rapid-acting insulin underneath your skin. Most pumps are small, computerized devices that are roughly the size of a juice box or a deck of cards. Insulin pumps are an alternative to multiple daily injection (MDI) insulin therapy (syringe or pen injections) for people with diabetes who require insulin to manage the condition.

We can simulate diabetic users with SimGlucose, an OpenAI Gym environment based on the FDA-approved UVa/Padova Simulator (2008 version). It models 30 virtual patients (10 adolescents, 10 adults, and 10 children) who consume meals at random times, causing their blood glucose levels to rise. Besides the patient's age, our agent will only be able to observe the user's CGM reading, which is a noisy signal of their blood glucose levels. Minute by minute, the agent must administer the right level of basal insulin to keep the user's blood glucose levels in a healthy range.

For Type I diabetes in general, the healthy range for blood glucose levels is around 70-180 mg/dL. "Time in range" (TIR) is the percentage of time that the blood glucose level is in the healthy range, and the general target for insulin pumps is 70% TIR. For children, who have a higher risk of hypoglycemia, 70% TIR is quite challenging, and the main goal is to avoid hypoglycemia. In our example, we will only use the "adult" and "adolescent" patients.
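As a rough illustration of the TIR metric, the following sketch computes the percentage of readings that fall inside a range, assuming a healthy range of 70-180 mg/dL. The function name `time_in_range` and the sample readings are made up for illustration; they are not part of the Genius SDK or SimGlucose.

```python
import numpy as np

def time_in_range(cgm_readings, low=70.0, high=180.0):
    """Return the percentage of readings with low <= value <= high."""
    readings = np.asarray(cgm_readings, dtype=float)
    if readings.size == 0:
        return 0.0
    in_range = (readings >= low) & (readings <= high)
    return 100.0 * in_range.mean()

# Hypothetical CGM readings in mg/dL (7 of the 10 are inside 70-180)
sample = [65, 90, 120, 150, 185, 200, 110, 100, 95, 170]
print(f"TIR: {time_in_range(sample):.0f}%")  # TIR: 70%
```

With these made-up readings the result sits exactly at the 70% target mentioned above.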

Building active inference models for a real-world example is more complex and challenging than for toy examples like agent navigation and the multi-armed bandit. As you will see below, a significant amount of extra overhead is needed to make the model performant. The model we will be building has the following form when represented as a factor graph:

Each variable and factor in this model will be explained in more detail in the model variables and model factors sections below.

The model file associated with this example is available below:

Imports

In order to build the insulin pump model we will need to allow our Genius agent to interface with the SimGlucose gym environment. The following imports pull in the components we need to create our Genius agent/model with the SDK and build the SimGlucose environment. We will also import matplotlib so that we can create plotting tools to visualize the results.

For reproducibility, please match the following versions:

genius-client-sdk==5.0.0
gymnasium>=0.29.1
gym>=0.9.4
numpy>=1.26.4
pyvfg==6.1.0
simglucose>=0.2.9
setuptools>=78.1.0
matplotlib>=3.8.0

import gymnasium as gym
import matplotlib.pyplot as plt
import numpy as np
import random

from datetime import datetime
from IPython import display

from genius_client_sdk.agent import GeniusAgent
from genius_client_sdk.auth import ApiKeyConfig
from genius_client_sdk.pomdp import POMDPModel
from genius_client_sdk.utils import control_map

from gymnasium.envs.registration import EnvSpec
from simglucose.actuator.pump import InsulinPump
from simglucose.controller.base import Action
from simglucose.patient.t1dpatient import T1DPatient
from simglucose.sensor.cgm import CGMSensor
from simglucose.simulation.env import T1DSimEnv
from simglucose.simulation.scenario_gen import RandomScenario

Building the simulation environment

Here we set up the ModernGlucoseEnv class which specifies the environment. This environment represents a simulated adult or adolescent patient.

def init_glucose_env(env_class, patient_age, patient_number):
    """
    Creates a SimGlucose environment for a given patient age and number.
    """

    assert patient_age in ["adolescent", "child", "adult"]
    # SimGlucose patients are numbered 001-010 within each age group.
    assert patient_number in range(1, 11)
    # Zero-pad to two digits; the f-string below adds the third leading zero.
    patient_number = str(patient_number).zfill(2)

    try:
        env = env_class(patient_name=f'{patient_age}#0{patient_number}')
    except Exception as exc:
        # Raising a bare string is a TypeError in Python 3, so wrap the
        # original error in a proper exception instead.
        raise RuntimeError(
            "Use an Environment with gym 0.9.4 to use SimGlucose."
        ) from exc
    return env

def get_env_name(patient_age, patient_number):
    return f"simglucose-{patient_age}-v{patient_number}"

def generate_glucose_gyms(env_class):
    """
    Generates a dictionary of SimGlucose environments for 10 patients in each
    of the adolescent and adult age groups.
    """

    patient_type_list = ["adolescent", "adult"]

    GLUCOSE_GYMS = dict()
    for patient_age in patient_type_list:
        # Patients 1 through 10 inclusive.
        for patient_number in range(1, 11):
            env_name = get_env_name(patient_age, patient_number)
            GLUCOSE_GYMS[env_name] = init_glucose_env(env_class, patient_age, patient_number)
    return GLUCOSE_GYMS


def get_response_from_glucose_gym(action, GYM):
    """
    Executes an action in the SimGlucose environment and returns 
    the response.
    """

    obs, reward, done, info = GYM.step(float(action))
    info["CGM"] = float(obs[0])
    
    if "adolescent" in info["patient_name"]:
        info["patient_age"] = "adolescent"
    elif "child" in info["patient_name"]:
        info["patient_age"] = "child"
    else:
        info["patient_age"] = "adult"
    
    info["time"] = int(info["time"].strftime("%Y%m%d%H%M%S"))
    response = {
        "state": info,
        "reward": reward,
        "done": done,
        "action": 0,
    }
    return response


class ModernGlucoseEnv(gym.Wrapper):
    """
    A wrapper around the SimGlucose environment.
    """

    def __init__(self, patient_name):
        MAX_BG = 1000
        # Create base environment
        patient = T1DPatient.withName(patient_name)
        sensor = CGMSensor.withName('Dexcom')
        pump = InsulinPump.withName('Insulet')
        scenario = RandomScenario(start_time=datetime.now())
        env = T1DSimEnv(patient, sensor, pump, scenario)
        super().__init__(env)
        
        self.observation_space = gym.spaces.Box(
            low=0, high=MAX_BG, shape=(1,), dtype=np.float32
        )
        self.action_space = gym.spaces.Box(
            low=0, high=self.env.pump._params["max_basal"], shape=(1,), dtype=np.float32
        )
        # Create a proper spec
        self._spec = EnvSpec(
            id="glucose"
        )

    @property
    def spec(self):
        return self._spec

    def step(self, action):
        """
        Take one action in the environment and return the result.

        Though the reward is returned, we will not use it and instead
        use the CGM value as our reward.

        Returns:
            observation: The CGM observation from the environment.
            reward: The reward from the environment.
            done: Whether the episode is done.
            info: Additional information from the environment.
        """
        act = Action(basal=float(action), bolus=0)
        state = self.env.step(act)
        observation = np.array([state.observation.CGM], dtype=np.float32)
        return observation, state.reward, state.done, state.info

    def reset(self, seed=None, options=None):
        if seed is not None:
            np.random.seed(seed)
            
        state = self.env.reset()
        if "adolescent" in state.info["patient_name"]:
            patient_age = "adolescent"
        elif "child" in state.info["patient_name"]:
            patient_age = "child"
        else:
            # "adult" in info["patient_name"]:
            patient_age = "adult"
        # observation = np.array([state.observation.CGM], dtype=np.float32)
        observation = {"CGM": state.observation.CGM, "patient_age": patient_age}
        return observation, state.info

if "gyms" not in globals():
    # This line should only be run once per session, even if the cell is run multiple times.
    gyms = generate_glucose_gyms(ModernGlucoseEnv)

POMDP model helper functions

In order to build the POMDP model we need some helper functions to initialize the POMDP factors and a binning function which chunks continuous observations into discrete elements.

def create_normalized_random_tensor(shape: tuple) -> np.ndarray:
    """
    Creates a tensor of the given shape with random uniform values,
    normalized along the first axis to create a probability distribution.

    The parameters will be learned continuously by interacting with the environment.
    """
    tensor = np.random.uniform(size=shape)
    sum_ax0 = np.sum(tensor, axis=0, keepdims=True)
    # Use np.where to handle potential zero sums gracefully
    normalized_tensor = np.where(sum_ax0 == 0, 1.0 / shape[0], tensor / sum_ax0)
    normalized_tensor /= np.sum(normalized_tensor, axis=0, keepdims=True)
    return normalized_tensor


def create_normalized_identity_transition_tensor(shape: tuple) -> np.ndarray:
    """
    Creates a tensor of the given shape with identity values, normalized
    along the first axis to create a probability distribution.
    Assumes a 3d tensor.
    """
    # Number of states (L) and number of action values (A).
    L = shape[0]
    A = shape[2]
    tensor = np.broadcast_to(
        np.eye(L)[:, :, np.newaxis],
        (L, L, A)
    )
    return tensor

def create_initialized_likelihood_tensor(
    cgm_values: list[str],
    state1_values: list[str], # e.g., glucose_utilization (Insulin Effect)
    state2_values: list[str], # e.g., uncontrolled_glucose_dynamics (Baseline State)
    center_mode: float = 0.4, # Target mode for 'ideal' CGM (adjust based on cgm_values bins)
    baseline_influence: float = 0.3, # How much state2 shifts the mode (+/-)
    insulin_influence: float = -0.5, # How much state1 shifts the mode (should be negative)
    default_std: float = 1.5,
    default_skew: float = 0.0 # Skew: positive = tail to right, negative = tail to left
    ) -> np.ndarray:
    """
    Creates an initialized likelihood tensor P(CGM | State1, State2).

    Assumes State1 relates to insulin effect (higher index = more effect -> lower CGM)
    and State2 relates to baseline glucose factors (higher index = higher baseline -> higher CGM).

    Args:
        cgm_values: List of discretized CGM observation values (strings).
        state1_values: List of possible values for latent state 1 (strings).
        state2_values: List of possible values for latent state 2 (strings).
        center_mode: The target scaled_mode for CGM when states are 'neutral'.
        baseline_influence: Factor controlling how much state 2 shifts the mode.
        insulin_influence: Factor controlling how much state 1 shifts the mode (negative).
        default_std: Default standard deviation for the gaussian distributions.
        default_skew: Default skewness for the gaussian distributions.

    Returns:
        A numpy array of shape (len(cgm_values), len(state1_values), len(state2_values)).
    """
    n_cgm = len(cgm_values)
    n_s1 = len(state1_values)
    n_s2 = len(state2_values)

    likelihood_tensor = np.zeros((n_cgm, n_s1, n_s2))

    # Avoid division by zero if a state has only one value
    norm_factor_s1 = n_s1 - 1 if n_s1 > 1 else 1
    norm_factor_s2 = n_s2 - 1 if n_s2 > 1 else 1

    for s1_idx in range(n_s1):
        for s2_idx in range(n_s2):
            # Normalize indices to range [0, 1]
            # If only one state value exists, norm is 0.5 to represent 'medium'
            norm_s1 = s1_idx / norm_factor_s1 if n_s1 > 1 else 0.5
            norm_s2 = s2_idx / norm_factor_s2 if n_s2 > 1 else 0.5

            # Calculate the target mode for the CGM distribution based on state levels
            # State 2 shifts mode around the center: high s2 -> higher mode
            base_mode = center_mode + baseline_influence * (norm_s2 * 2 - 1)
            # State 1 shifts mode based on its level: high s1 -> lower mode
            mode_shift = insulin_influence * norm_s1
            
            target_mode = base_mode + mode_shift

            # Clip mode to avoid extreme values near 0 or 1
            target_mode = np.clip(target_mode, 0.05, 0.95)

            # Generate the probability distribution P(CGM | s1, s2)
            # You could potentially adjust std or skew here based on states too
            cgm_dist = dist_gaussian(
                N=n_cgm,
                scaled_mode=target_mode,
                std=default_std,
                # Use skew to potentially model faster drops vs rises, etc.
                # e.g., left_skew = default_std * (1 - default_skew),
                #       right_skew = default_std * (1 + default_skew)
            )

            likelihood_tensor[:, s1_idx, s2_idx] = cgm_dist

    return likelihood_tensor


def discretize_observation(obs: dict, observation_elements: dict[str, list]) -> dict:
    """
    Convert continuous observations to discrete elements.

    obs: A dictionary of continuous observations.
    observation_elements: A dictionary of variables and the bin edges to use for discretization.

    Returns: A dictionary of discrete observations.
    """
    for map_key, map_elements in observation_elements.items():
        if map_key in obs:
            obs[map_key] = map_elements[np.digitize(obs[map_key], [float(x) for x in map_elements[:-1]])]
    return obs

def dist_gaussian(N, scaled_mode=0.5, left_skew=None, right_skew=None, std=1.0):
    """
    Returns a length-N array shaped like a skewed discrete Gaussian.
    scaled_mode: float between 0 and 1 indicating relative position of peak in array (default: 0.5)
    left_skew: spread parameter for values less than mode (default: std)
    right_skew: spread parameter for values greater than mode (default: std)
    std: overall scale factor for the spread (default: 1.0)
    """

    # Handle edge cases for scaled_mode
    scaled_mode = np.clip(scaled_mode, 0.0, 1.0)  # Constrain to valid range

    if left_skew is None:
        left_skew = 1.0  # will be scaled by std
    if right_skew is None:
        right_skew = 1.0  # will be scaled by std

    # Ensure positive spread parameters
    left_skew = abs(left_skew)
    right_skew = abs(right_skew)
    std = abs(std)

    # Scale the skew values by std
    left_skew *= std
    right_skew *= std

    mode = scaled_mode * (N - 1)
    i_vals = np.arange(N)

    # Use different spreads for left and right sides of the mode
    raw = np.zeros(N)
    left_mask = i_vals <= mode
    right_mask = i_vals > mode

    # Calculate exponential terms separately for left and right sides
    raw[left_mask] = np.exp(-0.5 * ((i_vals[left_mask] - mode) / left_skew) ** 2)
    raw[right_mask] = np.exp(-0.5 * ((i_vals[right_mask] - mode) / right_skew) ** 2)

    total = raw.sum()
    if total > 0:
        return raw / total
    else:
        return np.ones(N) / N


def create_modulated_tensor(shape: tuple, std: float = 1.0, max_shift_effect: float = 0.2) -> np.ndarray:
    """
    Creates a 3D tensor P(Dim0 | Dim1, Dim2).

    The value P(Dim0=i | Dim1=j, Dim2=k) is derived from a Gaussian function
    whose mode depends on the target index 'i' and is modulated by the index 'k'.
    The final tensor is normalized such that Sum_i T[i, j, k] = 1 for all j, k.

    Args:
        shape: A tuple (N0, N1, N2).
        std: Standard deviation for the Gaussian distributions used internally.
        max_shift_effect: Controls how much the index 'k' (Dim2) can shift the mode.
                          The shift ranges proportionally from -max_shift_effect (k=0)
                          to +max_shift_effect (k=N2-1), being zero at the midpoint.

    Returns:
        A numpy array of the specified shape, where sum over axis 0 is 1.0 for each [:, j, k].
    """
    if len(shape) != 3:
        raise ValueError(f"Shape must be a 3-tuple (N0, N1, N2), got {shape}")

    N0, N1, N2 = shape
    tensor = np.zeros(shape)

    # Avoid division by zero if only one state or action
    norm_factor_0 = N0 - 1 if N0 > 1 else 1
    norm_factor_2 = N2 - 1 if N2 > 1 else 1

    # Generate the tensor slice by slice for the third dimension (k)
    for k in range(N2):
        # Calculate shift factor for this k: ranges from -1 to 1
        mod_factor = ((k / norm_factor_2) * 2 - 1) if N2 > 1 else 0.0
        shift = mod_factor * max_shift_effect

        # Create the intermediate N0 x N1 matrix for this k
        # This matrix holds the unnormalized values derived from Gaussian distributions
        intermediate_matrix_k = np.zeros((N0, N1))
        for i in range(N0): # Loop over the first dimension index (target)
            # Base mode depends on i (normalized index of the first dimension)
            base_mode = i / norm_factor_0 if N0 > 1 else 0.5
            # Apply the shift based on k
            final_mode = np.clip(base_mode + shift, 0.0, 1.0)

            # Generate the i-th row: a Gaussian distribution of length N1,
            # with the calculated mode.
            row_dist = dist_gaussian(
                N=N1,
                scaled_mode=final_mode,
                std=std,
            )
            intermediate_matrix_k[i, :] = row_dist

        # Normalize the columns (axis 0) of the intermediate matrix
        # This ensures Sum_i M[i, j] = 1 for each j
        matrix_sum_axis0 = np.sum(intermediate_matrix_k, axis=0, keepdims=True)

        # Avoid division by zero if a column sums to 0
        normalized_matrix_k = np.divide(intermediate_matrix_k, matrix_sum_axis0,
                                        out=np.zeros_like(intermediate_matrix_k),
                                        where=matrix_sum_axis0 != 0)

        # Handle columns that summed to zero (assign uniform probability if N0 > 0)
        # This might happen if std is extremely small and mode is outside [0,1] range
        # before clipping, though dist_gaussian should handle most cases.
        zero_sum_mask = (matrix_sum_axis0 == 0)[0, :] # Get mask for columns j
        if N0 > 0 and np.any(zero_sum_mask):
            uniform_val = 1.0 / N0
            normalized_matrix_k[:, zero_sum_mask] = uniform_val

        # Assign the normalized matrix to the k-th slice of the final tensor
        tensor[:, :, k] = normalized_matrix_k

    return tensor
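To make the binning behavior concrete, here is a small standalone sketch of the `np.digitize` lookup that `discretize_observation` relies on. The labels mirror a 20-level grid from 40 to 600 mg/dL; the helper name `to_bin_label` is hypothetical and exists only for this illustration.

```python
import numpy as np

# Bin labels as strings, as the model variables require.
labels = [str(int(round(x))) for x in np.linspace(40, 600, 20)]
# All labels except the last act as bin edges.
edges = [float(x) for x in labels[:-1]]

def to_bin_label(value):
    """Map a continuous reading to the label at the upper edge of its bin."""
    return labels[np.digitize(value, edges)]

print(to_bin_label(35))    # 40  (below the first edge)
print(to_bin_label(120))   # 128 (between the edges 99 and 128)
print(to_bin_label(1000))  # 600 (above every edge)
```

Note that a reading is labeled with the upper edge of the interval it falls into, and out-of-range readings are clamped to the first or last label.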

Now we have all the pieces we need to build the POMDP Model. In preparation for the next few steps, where we will build the model, let's initialize the POMDPModel class.

pump = POMDPModel()

Model variables

Actions

The POMDP has one meaningful action variable, basal_insulin, which corresponds to the insulin dose administered by the insulin pump. We restrict the dose to a small set of values commonly used in insulin pumps: five discrete categories ranging from 0 to 0.06 units per minute.

POMDP models require that transition factors depend on either another latent state or an action. For this reason, we will also include a dummy action variable called constant which fulfills the action dependency requirement for the transition factor but otherwise has no effect on the model itself.

The following code adds the action variables to the POMDP model.

# Define the variables in the model, with a string for each value the variable can take.
# We use strings because Genius requires the values of variables to be strings.

# The agent's action is to administer a basal insulin dose.
# A normal basal insulin dose is 0.008 - 0.033 units/min.
ACTION_VALUES_BASAL_INSULIN = [str(x) for x in [0, .008, .016, .032, .06]]
pump.add_action_variable(name="basal_insulin", values=ACTION_VALUES_BASAL_INSULIN)

# For a POMDP, we need all latent states to have either an action or another latent state as a parent.
# This is a dummy action that does nothing but be a parent for exogenous latent variables.
ACTION_VALUES_CONSTANT = ["constant"]
pump.add_action_variable(name="constant", values=ACTION_VALUES_CONSTANT)

Observations

There are two observations in this model which are available to the insulin pump Genius agent in 1-minute timestep intervals.

CGM - The continuous glucose monitoring (CGM) reading of the glucose levels. Glucose levels will change over the course of the simulation so that the insulin pump agent can administer the correct insulin levels to keep glucose in the correct range. Since CGM is a continuous variable, its values need to be binned. After binning, there are 20 categories spanning 40-600 mg/dL in increments of roughly 30 mg/dL.
patient_age - The age of the patient. This variable will remain constant throughout the simulation. There are two categories: "adult" or "adolescent".

The following code adds the observation variables to the POMDP model.

# The agent observes the patient's CGM reading and age group.
# ideal range is 70-180 mg/dL, but it can go as high as 600 mg/dL.
# Here are some values that are broken into fixed-width bins.
OBSERVATION_VALUES_CGM_LEVELS = 20
OBSERVATION_VALUES_CGM = [str(round(x)) for x in np.linspace(40, 600, OBSERVATION_VALUES_CGM_LEVELS)]

print("Observation values:")
print(OBSERVATION_VALUES_CGM)
pump.add_observation_variable(name="CGM", values=OBSERVATION_VALUES_CGM)

# The patient's age group is a categorical variable with 2 levels.
OBSERVATION_VALUES_PATIENT_AGE = ["adolescent", "adult"]
pump.add_observation_variable(name="patient_age", values=OBSERVATION_VALUES_PATIENT_AGE)

States

There are three hidden (latent) states of interest in this model:

glucose_utilization - The amount of glucose the patient's body consumes over time. This matters because glucose needs differ between adolescents and adults, so we need to consider how we expect the levels to change for the type of patient we are modeling. There are five ordinal categories for glucose utilization, ranging from 0 to 4.
uncontrolled_glucose_dynamics - The change in glucose levels in the body over time. By expressing this hidden state variable in our model we can infer how glucose levels will change over time. This change is represented in discrete categories ranging from 0 to 14.
latent_patient_age - The patient's age. Even though we directly observe the patient's age we need to include it in the model because it has an effect on glucose dynamics. There are two categories: "adult" or "adolescent".

The following code adds the hidden state variables to the POMDP model.

# A state for the patient's glucose utilization.
STATE_VALUES_GLUCOSE_UTILIZATION = [str(x) for x in range(5)]
pump.add_state_variable(name="glucose_utilization", values=STATE_VALUES_GLUCOSE_UTILIZATION)

# A state for the patient's glucose dynamics.
STATE_VALUES_UNCONTROLLED_GLUCOSE_DYNAMICS = [str(x) for x in range(15)]
pump.add_state_variable(name="uncontrolled_glucose_dynamics", values=STATE_VALUES_UNCONTROLLED_GLUCOSE_DYNAMICS)

# A state for the patient's age (doesn't change in an episode)
# Even though we observe this directly, we still add a latent state for it,
# because that state has an effect on the glucose dynamics and in turn the CGM.
STATE_VALUES_PATIENT_AGE = ["adolescent", "adult"]
pump.add_state_variable(name="latent_patient_age", values=STATE_VALUES_PATIENT_AGE)

Model factors

Preferences

The POMDP will have a preference that the CGM stays within the ideal range of 70-180 mg/dL. The figure below shows this relationship.

Here we see the individual discrete glucose measurements and the assigned preference level, where 0 is a neutral preference (lines are added between the discrete points to make the trend easier to see). In this model we focus primarily on preventing hypoglycemia, which occurs at dangerously low blood sugar levels.

The following code adds the preference factor to the POMDP model.

# The agent prefers to keep CGM within a certain range.
# We use a gaussian distribution to reward the model for 
# being in the center of the range.
preferences = dist_gaussian(
    N=len(OBSERVATION_VALUES_CGM),
    scaled_mode=0.3,
    right_skew=.1,
    std=1.0,
)
# Subtracting a small value creates a penalty for being outside the range.
TENSOR_CGM_PREFERENCES = np.array([x -.01 for x in preferences])


# Values at the extremes of the range should be penalized more heavily.
# This produces a reward gradient for the model to follow back
# to where it should be.
max_idx = np.argmax(TENSOR_CGM_PREFERENCES)

# For each negative value, multiply by distance from max
for i in range(len(TENSOR_CGM_PREFERENCES)):
    if TENSOR_CGM_PREFERENCES[i] < 0:
        distance = i - max_idx
        if distance < 0:
            # heavier penalty for hypoglycemia
            TENSOR_CGM_PREFERENCES[i] *= distance * -20
        else:
            TENSOR_CGM_PREFERENCES[i] *= distance

print("Preference values:")
print([round(x, 2) for x in TENSOR_CGM_PREFERENCES])

pump.add_preference_factor(
    values=np.array(TENSOR_CGM_PREFERENCES),
    target="CGM",
)
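The asymmetric penalty scaling above can be easier to follow on a toy vector. The sketch below applies the same rule (negative entries left of the peak are multiplied by 20 times their distance, negative entries right of the peak by their distance) to made-up numbers; the values are illustrative and are not the preferences used by the model.

```python
import numpy as np

# Toy preference vector: small negative values outside the preferred region.
prefs = np.array([-0.01, -0.01, 0.2, 0.5, 0.1, -0.01, -0.01])
peak = int(np.argmax(prefs))  # index 3

scaled = prefs.copy()
for i, value in enumerate(prefs):
    if value < 0:
        distance = i - peak
        if distance < 0:
            # Left of the peak (low glucose): 20x heavier penalty
            scaled[i] = value * distance * -20
        else:
            # Right of the peak (high glucose): penalty grows with distance
            scaled[i] = value * distance

print(np.round(scaled, 2))
```

The leftmost entry ends up at -0.6 while the rightmost ends up at only -0.03, which is how the preference encodes that low readings are treated as far more costly than high ones.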

Transitions

The transition factors in this model specify how the hidden state variables in our model change over time.

Glucose utilization - This transition models how the glucose_utilization hidden state variable changes over time. It is initialized as a discrete Gaussian distribution for each column with a unit standard deviation.
Uncontrolled glucose dynamics - This transition models how the uncontrolled glucose dynamics hidden state variable changes over time. It is initialized randomly (while ensuring columns sum to one).
Patient age - This transition models how the latent_patient_age hidden state variable changes over time. Because age does not change within an episode, it is initialized as an identity matrix.

The following code adds the transition factors to the POMDP model.

# A transition factor represents how a variable changes over time,
# given all the things that influence it (its parents). Since the
# next state of a variable is a function of its current state and its
# parents' states, the variable has itself as its first parent.

# In this version of Genius, every transition factor must be 3-dimensional.
# The first dimension is the conditioned variable (the "downstream" variable),
# which is the "output" of all the parents.
# The rest of the dimensions are the parents, and the first parent must also
# be the conditioned variable, as discussed above. That only leaves one dimension
# for another parent, which can be an action or another latent state.

# In a future version of Genius, we'll be able to use higher-dimensional
# transition factors, allowing us to represent more complex relationships.

TENSOR_GLUCOSE_UTILIZATION = create_modulated_tensor(
    shape=(
        len(STATE_VALUES_GLUCOSE_UTILIZATION),
        len(STATE_VALUES_GLUCOSE_UTILIZATION),
        len(ACTION_VALUES_BASAL_INSULIN),
    )
)       
pump.add_transition_factor(
    values=TENSOR_GLUCOSE_UTILIZATION,
    target="glucose_utilization",
    parents=[
        "glucose_utilization",
        "basal_insulin",
    ],
)

TENSOR_UNCONTROLLED_GLUCOSE_DYNAMICS = create_normalized_random_tensor(
    shape=(
        len(STATE_VALUES_UNCONTROLLED_GLUCOSE_DYNAMICS),
        len(STATE_VALUES_UNCONTROLLED_GLUCOSE_DYNAMICS),
        len(STATE_VALUES_PATIENT_AGE),
    )
)
pump.add_transition_factor(
    values=TENSOR_UNCONTROLLED_GLUCOSE_DYNAMICS,
    target="uncontrolled_glucose_dynamics",
    parents=[
        "uncontrolled_glucose_dynamics",
        "latent_patient_age",
    ],
)

# Patient age is a constant across an episode,
# so we can represent it as an identity matrix.
# We then add a third dimension for the action variable.
TENSOR_PATIENT_AGE = create_normalized_identity_transition_tensor(
    shape=(
        len(STATE_VALUES_PATIENT_AGE),
        len(STATE_VALUES_PATIENT_AGE),
        len(ACTION_VALUES_CONSTANT),
    )
)
pump.add_transition_factor(
    values=TENSOR_PATIENT_AGE,
    target="latent_patient_age",
    parents=[
        "latent_patient_age",
        "constant",
    ],
)
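Each transition tensor is a conditional distribution over its first axis, so every slice along axis 0 should sum to one. A minimal sanity check, assuming plain NumPy arrays, might look like the following; `check_conditional_tensor` is a hypothetical helper and not an SDK function.

```python
import numpy as np

def check_conditional_tensor(tensor, atol=1e-8):
    """Return True if the tensor is non-negative and sums to 1 along axis 0."""
    tensor = np.asarray(tensor, dtype=float)
    sums_to_one = np.allclose(tensor.sum(axis=0), 1.0, atol=atol)
    return bool(np.all(tensor >= 0) and sums_to_one)

# Identity transition with a trailing singleton action axis, shape (2, 2, 1)
identity = np.broadcast_to(np.eye(2)[:, :, np.newaxis], (2, 2, 1))
print(check_conditional_tensor(identity))  # True

# A random tensor normalized along axis 0, shape (15, 15, 2)
rng = np.random.default_rng(0)
random_tensor = rng.uniform(size=(15, 15, 2))
random_tensor /= random_tensor.sum(axis=0, keepdims=True)
print(check_conditional_tensor(random_tensor))  # True

# An unnormalized tensor fails the check
print(check_conditional_tensor(np.ones((3, 3, 1))))  # False
```

Running a check like this on each tensor before adding it as a factor can catch normalization mistakes early.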

Likelihoods

The likelihood factors represent how likely an observation (data) is given the hidden states that influence it. We will initialize the likelihoods with random values which will be learned over the course of the simulation using the available data.

CGM - This likelihood models the relationship between the CGM observation and the glucose_utilization, uncontrolled_glucose_dynamics, and latent_patient_age hidden states. It is initialized randomly (while ensuring columns sum to one).
Patient age - This likelihood models the relationship between the patient_age observation and the glucose_utilization, uncontrolled_glucose_dynamics, and latent_patient_age hidden states. It is initialized randomly (while ensuring columns sum to one).

The following code adds the likelihood factors to the POMDP model.

# A likelihood factor represents how likely an observation is,
# given the hidden states that influence it. This allows the POMDP
# to reason backwards to infer the most likely hidden states.

# In the current version of Genius, a likelihood factor must be conditioned
# on all the latent state variables. In a future version, we'll be able to
# condition on only some of the latent state variables, providing "sparse"
# likelihood factors that more faithfully represent the causal relationships.

# A likelihood factor for the CGM reading, which is
# a function of the glucose utilization, uncontrolled glucose dynamics,
# and patient age.
TENSOR_CGM = create_normalized_random_tensor(
    shape=(
        len(OBSERVATION_VALUES_CGM),
        len(STATE_VALUES_GLUCOSE_UTILIZATION),
        len(STATE_VALUES_UNCONTROLLED_GLUCOSE_DYNAMICS),
        len(STATE_VALUES_PATIENT_AGE),
    )
)
pump.add_likelihood_factor(
    values=TENSOR_CGM,
    target="CGM",
    parents=[
        "glucose_utilization",
        "uncontrolled_glucose_dynamics",
        "latent_patient_age",
    ],
)

# a likelihood factor for the patient's age, which is an indicator 
# of the endogenous insulin production.
TENSOR_PATIENT_AGE = create_normalized_random_tensor(
    shape=(
        len(OBSERVATION_VALUES_PATIENT_AGE),
        len(STATE_VALUES_GLUCOSE_UTILIZATION),
        len(STATE_VALUES_UNCONTROLLED_GLUCOSE_DYNAMICS),
        len(STATE_VALUES_PATIENT_AGE),
    )
)
pump.add_likelihood_factor(
    values=TENSOR_PATIENT_AGE,
    target="patient_age",
    parents=[
        "glucose_utilization",
        "uncontrolled_glucose_dynamics",
        "latent_patient_age",
    ],
)
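Because each likelihood is conditioned on all three latent states, the tensors grow with the product of the state sizes. The sketch below only does the arithmetic for the sizes used in this example; the variable names are illustrative.

```python
import numpy as np

# Sizes used in this example: 20 CGM bins, 2 observed age labels,
# and latent states with 5, 15, and 2 values respectively.
n_cgm, n_age_obs = 20, 2
state_sizes = (5, 15, 2)

n_joint_states = int(np.prod(state_sizes))
cgm_entries = n_cgm * n_joint_states
age_entries = n_age_obs * n_joint_states

print(n_joint_states)  # 150
print(cgm_entries)     # 3000
print(age_entries)     # 300
```

Each of the 150 joint state combinations gets its own distribution over observations, so the CGM likelihood alone holds 3000 values.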

Loading the model to the Genius agent

Now we can load this model we created into a Genius agent. Please be sure to set the correct URL to your agent and API key.

agent = GeniusAgent(
    agent_http_protocol=<my_agent_http_protocol>,
    agent_hostname=<my_agent_hostname>,
    agent_port=<my_agent_port>,
    auth_config=ApiKeyConfig(api_key=<my_api_key>)
)

agent.load_genius_model(model=pump)
agent.log_model()

Inference with continual learning

It's time to put our model to the test. We'll run through an episode with each patient, and see how well our model improves over time. First, we set some initial parameters of the simulation.

MAX_EPISODE_LENGTH - Corresponding to eight hours, this is the length of time over which the insulin pump will be active in the simulation.
SOJOURN_TIME - This is the total time the agent spends in a particular state.
POLICY_LEN - This is the length of the policy the agent will consider. Since the policy length is set to 1, the agent will only look ahead one action into the future.

By default, we will learn every 100 episode steps. In general, it is good practice to avoid learning at every time step; active inference models perform better when learning occurs after some number of time steps have passed. In this case, learning occurs at 20% intervals of the total episode length.

MAX_EPISODE_LENGTH = 500 # about 8 hours
SOJOURN_TIME = 1
POLICY_LEN = 1

# Tracking variables
total_steps_across_episodes = 0
total_episodes_completed = 0
total_survival_steps = 0
total_optimal_steps = 0

We will use two metrics of success:

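Survival: How much of each episode can the agent get through without the risk reaching 100? This is reported as the percentage of the maximum episode length that was survived.
Time in range (TIR): What percentage of the time does the patient's CGM reading stay in the range of 100-180 mg/dL?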

We should expect both metrics to improve as the agent learns, since the likelihoods are updated continually from the observed data.
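As a minimal sketch of how these two metrics can be computed from logged data, assuming the 100-180 mg/dL range stated above (the function names and the sample readings are made up for illustration):

```python
def time_in_range(cgm_readings, low=100.0, high=180.0):
    """Percentage of readings that fall inside [low, high] mg/dL."""
    if not cgm_readings:
        return 0.0
    in_range = sum(1 for reading in cgm_readings if low <= reading <= high)
    return 100.0 * in_range / len(cgm_readings)


def survival_score(steps_survived, max_episode_length=500):
    """Percentage of the maximum episode length that was survived."""
    return 100.0 * steps_survived / max_episode_length


# Made-up readings: three of the five fall inside 100-180 mg/dL.
readings = [95.0, 110.0, 150.0, 185.0, 170.0]
print(time_in_range(readings))  # 60.0
print(survival_score(450))      # 90.0
```

The simulation loop tracks the same quantities incrementally, using running counters rather than stored lists.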

Everything needed to actually run the agent is inside the section with the comment ACTION LOOP. The remainder of the code shown below is simply for plotting purposes and tracking statistics.

for gym_name, gym in random_gyms:
    
    # Get observation from environment 
    obs, state_info = gym.reset()
    obs = {"CGM": obs["CGM"]} # exclude patient_age
    obs = discretize_observation(obs, {"CGM": OBSERVATION_VALUES_CGM})
    step = 0

    # Begin interactive plotting
    plt.ioff() # Turn off interactive mode
    fig, ax = plt.subplots(3, 1, figsize=(10, 9), sharex=True)
    fig.suptitle(f"Gym: {gym_name} (Episode {total_episodes_completed + 1})")

    # Setup subplots
    ax[0].set_ylabel("Value")
    ax[0].set_title("Risk and CGM (Discretized) over Time")
    ax[0].grid(True)
    line_risk, = ax[0].plot([], [], label='Risk')
    line_cgm, = ax[0].plot([], [], label='CGM (Discretized)')
    line_cgm_raw, = ax[0].plot([], [], label='CGM (Raw)')
    ax[0].legend(loc='upper left')
    meal_lines_collection = []

    ax[1].set_ylabel("Insulin Action")
    ax[1].set_title("Action (Basal Insulin) over Time")
    ax[1].grid(True)
    line_action, = ax[1].plot([], [], marker='.')
    # Set fixed y-axis for insulin action
    max_insulin_value = max(float(x) for x in ACTION_VALUES_BASAL_INSULIN) if ACTION_VALUES_BASAL_INSULIN else 1 # Handle empty list case
    ax[1].set_ylim(-0.05 * max_insulin_value, max_insulin_value * 1.05) # Add 5% padding

    ax[2].set_ylabel("Score (%)")
    ax[2].set_title("Episodes Survived and Current TIR")
    ax[2].set_ylim(0, 105)
    ax[2].grid(True)
    line_optimality, = ax[2].plot([], [], label='Current TIR (%)', color='g')
    score_text = ax[2].text(0.02, 0.95, "", transform=ax[2].transAxes, verticalalignment='top')
    ax[2].set_xlabel("Step")
    # Add legend to the third subplot
    ax[2].legend(loc='lower left')

    # Data storage for plotting
    steps_list = []
    cgm_values_discrete = []
    cgm_values_raw = []
    risk_values = []
    action_values = []
    meal_steps = []
    optimal_steps_count_episode = 0
    optimality_perc_list = [] # running TIR (%) at each step

    # Display initial plot structure
    display_handle = display.display(fig, display_id=True)
    plt.ion() # Turn interactive mode back on

    learn_history = []
    
    # Begin main loop
    while state_info["risk"] < 100 and step < MAX_EPISODE_LENGTH:

        # --------------------------- ACTION LOOP ---------------------------
        if step % SOJOURN_TIME == 0:
            control_result = agent.act(
                observation=obs,
                policy_len=POLICY_LEN,
                learn_likelihoods=True,
                # learn_transitions=True,
            )
            action = control_map(
                controls=ACTION_VALUES_BASAL_INSULIN,
                chosen_action=control_result["action_data"]["basal_insulin"]["selected_action"],
            )
        response = get_response_from_glucose_gym(action, gym)
        learn_history.append(obs)

        # Capture original CGM before discretization for optimality check
        original_cgm = response['state']['CGM']

        # Now discretize the state
        response["state"] = discretize_observation(response["state"], {"CGM": OBSERVATION_VALUES_CGM})
        obs = {"CGM": response["state"]["CGM"]}

        # --------------------------- / ACTION LOOP ---------------------------

        # Store history data
        steps_list.append(step)
        current_cgm_discrete = response['state']['CGM']
        cgm_values_discrete.append(float(current_cgm_discrete))
        cgm_values_raw.append(float(original_cgm))
        risk_values.append(response['state']['risk'])
        action_values.append(float(action))
        if response['state']['meal'] > 0:
            meal_steps.append(step)

        # Check for the TIR range (100-180 mg/dL) using the original CGM value
        if 100 <= original_cgm <= 180:
            optimal_steps_count_episode += 1
        # Running TIR for this episode; step is zero-based, so step + 1 steps have elapsed
        current_optimality_perc = optimal_steps_count_episode / (step + 1) * 100
        optimality_perc_list.append(current_optimality_perc)

        line_risk.set_data(steps_list, risk_values)
        line_cgm.set_data(steps_list, cgm_values_discrete)
        line_cgm_raw.set_data(steps_list, cgm_values_raw)
        line_action.set_data(steps_list, action_values)
        line_optimality.set_data(steps_list, optimality_perc_list)

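        # Redraw the meal markers, removing the previous collection so artists do not pile up
        if not isinstance(meal_lines_collection, list):
            meal_lines_collection.remove()
        meal_lines_collection = ax[0].vlines(meal_steps, ymin=ax[0].get_ylim()[0], ymax=ax[0].get_ylim()[1], color='r', linestyle='--', lw=0.8)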

        # Plotting details to rescale axes for interactive plot
        ax[0].relim()
        ax[0].autoscale_view()
        ax[1].relim()
        ax[1].autoscale_view(scalex=True, scaley=False)
        # Autoscale x-axis for the third plot, y-axis is fixed
        ax[2].relim()
        ax[2].autoscale_view(scalex=True, scaley=False) # <-- Rescale ax[2] x-axis

        # Metrics
        current_survival_perc = (step + 1) / MAX_EPISODE_LENGTH * 100 # Still needed for display below if desired, but not plotted
        avg_optimality_perc = (total_optimal_steps / total_steps_across_episodes * 100) if total_steps_across_episodes > 0 else 0.0
        avg_survival_perc = (total_survival_steps / total_episodes_completed / MAX_EPISODE_LENGTH * 100) if total_episodes_completed > 0 else 0.0

        score_str = (
            f"Avg Survival (Prev Episodes): {avg_survival_perc:.1f}%\n"
            f"Avg Time in Range (Prev Episodes): {avg_optimality_perc:.1f}%"
        )
        score_text.set_text(score_str)
        ax[2].legend(loc='lower left')

        # Plotting canvas update
        fig.canvas.draw()
        display_handle.update(fig)

        state_info = response["state"]
        step += 1

    final_step_count = step

    if meal_steps:
        ax[0].plot([], [], color='r', linestyle='--', label='Meal', lw=0.8)
    ax[0].legend(loc='upper left')

    fig.suptitle(f"Gym: {gym_name} (Episode {total_episodes_completed + 1}) - Ended at step {final_step_count}/{MAX_EPISODE_LENGTH}")
    display_handle.update(fig)

    plot_filename = f"out/{TIMESTAMP}/episode_{total_episodes_completed + 1}_{gym_name}.png"
    fig.savefig(plot_filename)
    print(f"Saved plot to {plot_filename}")
    plt.close(fig)
    
    total_steps_across_episodes += final_step_count
    total_episodes_completed += 1
    total_survival_steps += final_step_count
    total_optimal_steps += optimal_steps_count_episode

Finally, we can print the summary of the simulation:

print("\n--- Simulation Summary ---")
print(f"Total Episodes Completed: {total_episodes_completed}")
print(f"Total Steps Across All Episodes: {total_steps_across_episodes}")
if total_episodes_completed > 0:
    avg_steps = total_survival_steps / total_episodes_completed
    avg_survival_score = avg_steps / MAX_EPISODE_LENGTH * 100
    print(f"Average Steps Survived per Episode: {avg_steps:.2f}")
    print(f"Overall Average Survival Score: {avg_survival_score:.2f}%")
if total_steps_across_episodes > 0:
    overall_optimality_score = total_optimal_steps / total_steps_across_episodes * 100
    print(f"Overall Average Optimality Score (CGM 70-180): {overall_optimality_score:.2f}%")
else:
    print("No steps were taken across episodes.")
if total_episodes_completed == 0:
    print("No episodes were completed.")

Interpreting the results

According to our simulation summary:

--- Simulation Summary ---
Total Episodes Completed: 18
Total Steps Across All Episodes: 8756
Average Steps Survived per Episode: 486.44
Overall Average Survival Score: 97.29%
Overall Average Optimality Score (CGM 70-180): 38.07%
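As a quick sanity check, the two survival figures in this summary can be re-derived from the reported totals. The sketch below only redoes the arithmetic of the summary code (in which the survival steps equal the total steps), using the numbers printed above:

```python
total_episodes = 18
total_steps = 8756
max_episode_length = 500

# Average steps survived per episode.
avg_steps = total_steps / total_episodes
# Survival score: average steps as a percentage of the maximum episode length.
avg_survival = avg_steps / max_episode_length * 100

print(f"{avg_steps:.2f}")     # 486.44
print(f"{avg_survival:.2f}")  # 97.29
```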

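In total we ran 18 episodes (trials), for a total of 8756 steps. Agents survived for an average of 486.44 of the 500 steps per episode, giving an overall survival score of 97.29%. Finally, the optimality score, which measures the percentage of steps the simulated patient stayed in range, was 38.07%, well short of the 70% TIR that insulin pumps generally target. (The printed label reads "CGM 70-180", but the simulation loop checks the raw CGM reading against a lower bound of about 100 mg/dL rather than 70.) Let's examine one of the results in more detail: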

The top panel shows the risk value and the glucose levels at each time step. Here we have overlaid the continuous glucose signal (green) on the discretized version of the signal (orange) that we fed to the model as observations. The dotted red lines indicate when a meal was eaten. As expected, each meal is followed by a spike in blood sugar. It is this spike that the insulin pump needs to correct to bring glucose back down to the appropriate range.

The middle panel shows the basal insulin released by the pump, as selected by active inference. The amount of insulin released rises to saturation after each meal, which eventually brings glucose levels back down into the acceptable range.

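The bottom panel shows the running time in range: the percentage of steps so far in the current episode for which blood glucose has remained in the target range. The text annotation in this panel reports the average survival and average TIR over the previous episodes.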
