Food delivery
This example uses Genius to predict delivery acceptance behavior among food delivery workers. The Bayesian network allows us to understand the complex interrelationships between the factors affecting a delivery worker's decision to accept or decline delivery offers. This example showcases the capabilities of Genius (both the model editor and the SDK) in the following areas:
Continual learning
Learning in the presence of latent variables
Delivery acceptance decisions are influenced by many different variables. We will utilize the following variables in our model:
| Variable | Categories | Description |
| --- | --- | --- |
| time_of_day | [early_morning, morning, midday, afternoon, evening, night] | When the delivery request was made |
| maintenance | [needed, not_needed] | Whether the vehicle needs maintenance |
| availability | [available, unavailable] | Delivery worker's general availability status |
| courier_experience | [novice, experienced] | Experience level of the delivery worker |
| trip_distance | [short, medium, long] | Estimated delivery distance |
| courier_customer_distance | [close, far] | Distance between the delivery worker and the pickup location |
| trip_efficiency | [low, medium, high] | Expected profitability of the trip |
| delivery_acceptance | [rejected (0), accepted (1)] | Whether the delivery was accepted or rejected |
The model we will build to solve this problem will make the following assumptions:
time_of_day and maintenance influence availability
trip_distance and courier_customer_distance determine trip_efficiency
availability, courier_experience, and trip_efficiency directly affect delivery_acceptance
Putting the above information together produces the following Bayesian network:

Using this model we will now perform continual learning and latent variable inference.
The model files and sample data associated with this example are provided below:
trip_efficiency unobserved

Building the model
This section will show how to build the model in the Python SDK and the model editor.
Although it is possible to build the model in the model editor using the data-to-model wizard, in this tutorial we will instead build it by hand. First, build the model in the editor by adding variable and factor nodes and edges to the editing canvas. Here is the resulting model:

Next, click on each variable and manually add the categories according to the table at the beginning of this tutorial. If you examine the factor probabilities you will see that they have been set automatically to discrete uniform.
The first step is to import the Genius model from the Python SDK:
Next, we build the model by adding the variables and factors. For factor probabilities, we will initialize all values with a discrete uniform distribution.
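Since the exact Genius SDK calls are not reproduced here, the following plain-Python sketch illustrates what a discrete uniform initialization means for one factor, P(availability | time_of_day, maintenance). The variable names and categories come from the table at the beginning of this tutorial; the data structure itself is only an illustration, not the Genius API:

```python
import itertools

# Variable categories from the table above
time_of_day = ["early_morning", "morning", "midday", "afternoon", "evening", "night"]
maintenance = ["needed", "not_needed"]
availability = ["available", "unavailable"]

# Conditional probability table P(availability | time_of_day, maintenance),
# initialized to a discrete uniform distribution: every parent configuration
# assigns equal probability to each category of the child variable.
uniform = {cat: 1.0 / len(availability) for cat in availability}
cpt = {parents: dict(uniform) for parents in itertools.product(time_of_day, maintenance)}

print(len(cpt))                    # 12 parent configurations (6 x 2)
print(cpt[("morning", "needed")])  # {'available': 0.5, 'unavailable': 0.5}
```

Every row of the table sums to 1, which is the invariant any initialization must preserve.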
Our model is now prepared.
Continual learning
Continual learning refers to the idea that a model can be updated whenever new data is presented. For example, suppose that we collect 5000 samples of food delivery data for the eight variables above. Using these samples we can learn the parameters of our model ("training"). Now suppose that a few months later we want to make sure that our model is still up to date in case anything has changed in the real world. We can collect 5000 more samples and retrain the model to learn the new parameters. Learning in this way, continuously, allows us to keep the model current as new information arrives. We now demonstrate continual learning in both the model editor and the Python SDK.
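As a concept sketch (not the Genius API), continual learning for a single categorical factor can be pictured as accumulating counts across batches, so each new batch refines the running probability estimate rather than starting over:

```python
from collections import Counter

def learn_batch(counts, batch):
    """Accumulate category counts from a new batch and return updated probabilities."""
    counts.update(batch)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

# Uniform prior as pseudo-counts for the maintenance variable
counts = Counter({"needed": 1, "not_needed": 1})

batch1 = ["not_needed"] * 8 + ["needed"] * 2   # first batch of samples
batch2 = ["not_needed"] * 7 + ["needed"] * 3   # later batch of samples

probs = learn_batch(counts, batch1)  # estimate after the first batch
probs = learn_batch(counts, batch2)  # refined after the second batch
print(round(probs["needed"], 3))     # 0.273
```

The same counts object persists across batches, which is what makes the learning "continual": each call folds new evidence into the existing estimate.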
To perform continual learning in the model editor, we merely need to train the model repeatedly with different batches of samples. Navigate to the menu Model > Train. A prompt will appear allowing you to upload data. Upload the first batch, which is available in the model files and sample data section of this tutorial. After training (parameter learning), the model probabilities will have changed from their initial values.
Suppose that some time has passed and another batch of data became available. To perform continual learning, you would simply train the model again with this new dataset. Each time a new batch of data is available the model can be trained again to learn the parameters.
First we import the necessary components. Remember to set your API key. Assume that the sample CSV dataset is titled bayesian_network_samples.
Next we initialize the Genius agent and load the model we created in the previous section of this tutorial.
First, we load the data and split it into 4 batches. This is meant to replicate the idea that we may have multiple batches of data available at different times.
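The split itself is simple; the sketch below uses 5000 placeholder rows instead of the actual CSV (which would typically be loaded with something like pandas.read_csv) so the example is self-contained:

```python
# Placeholder for the loaded dataset: 5000 row indices standing in for samples
rows = list(range(5000))

# Split into 4 equal batches to mimic data arriving at different times
n_batches = 4
size = len(rows) // n_batches
batches = [rows[i * size:(i + 1) * size] for i in range(n_batches)]

print([len(b) for b in batches])  # [1250, 1250, 1250, 1250]
```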
Next, we will feed each batch to the agent for learning (training) and pull out the factor probabilities after training is complete on each batch. The first loop in the code below does not require learning because we are just extracting the initial state of the factor probabilities that we specified when we built the model in the previous section of this tutorial.
Although the code below has some extra details that will enable us to gather the data for plotting later, the key point is that each time we call agent.learn() with the same agent but pass in a different CSV, the factor probabilities will be updated.
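The pattern can be sketched with a hypothetical stub agent (StubAgent and its methods are illustrative stand-ins, not the Genius SDK): record the initial probabilities without learning, then call learn once per batch on the same agent and record the updated probabilities each time:

```python
class StubAgent:
    """Illustrative stand-in for the Genius agent: learn() accumulates counts."""
    def __init__(self):
        self.counts = {"rejected": 1, "accepted": 1}  # uniform pseudo-counts

    def learn(self, batch):
        for sample in batch:
            self.counts[sample] += 1

    def get_probabilities(self):
        total = sum(self.counts.values())
        return {k: v / total for k, v in self.counts.items()}

agent = StubAgent()
results = [agent.get_probabilities()]  # initial state, no learning needed
for batch in (["accepted"] * 9 + ["rejected"],
              ["accepted"] * 8 + ["rejected"] * 2):
    agent.learn(batch)                 # same agent, a different batch each time
    results.append(agent.get_probabilities())

print(len(results))  # 3: initialization plus one entry per batch
```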
The results variable now contains the results of learning which we will analyze below.
Preparing the results for visualization
Next let's analyze the results by looking at a visualization. If you are just interested in the results you can skip this subsection.
Below is the code used to prepare the data for graphing. First we create a graph_data dictionary that initializes empty arrays that we will use to get the data in a convenient format for graphing. Then we loop over our results and add the correct results to the graph_data dict. This is necessary because the results are in terms of batches but we want to plot in terms of the different factors.
Conditional distributions are flattened into a 1-dimensional array so that all the parameters can be plotted in two dimensions.
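For example, a factor such as P(trip_efficiency | trip_distance, courier_customer_distance) has 3 x 2 x 3 = 18 parameters. Flattening produces a 1-D vector that can be plotted directly (NumPy sketch with illustrative uniform values):

```python
import numpy as np

# P(trip_efficiency | trip_distance, courier_customer_distance):
# shape (3 distances, 2 courier-customer distances, 3 efficiency levels),
# shown here at its uniform initialization
cpt = np.full((3, 2, 3), 1.0 / 3.0)

flat = cpt.flatten()  # all 18 parameters as a 1-D array for plotting
print(flat.shape)     # (18,)
```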
We will also define the ground truth probabilities. These are the true probabilities of the model. Normally this information would not be available but in this teaching example, we have designed the model with specific probabilities in mind.
Now we plot the data
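A minimal matplotlib sketch of this kind of plot, using made-up trajectories for the maintenance factor (all numbers below are illustrative placeholders, not the tutorial's actual results):

```python
import matplotlib
matplotlib.use("Agg")        # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt

batch_idx = [0, 1, 2, 3, 4]  # 0 = initialization, 1-4 = after each batch
trajectories = {             # illustrative parameter values per batch
    "needed":     [0.5, 0.32, 0.30, 0.31, 0.30],
    "not_needed": [0.5, 0.68, 0.70, 0.69, 0.70],
}
ground_truth = [0.30, 0.70]  # assumed true probabilities for this sketch

fig, ax = plt.subplots()
for label, probs in trajectories.items():
    ax.plot(batch_idx, probs, marker="o", label=label)
ax.scatter([batch_idx[-1]] * 2, ground_truth, color="black", zorder=3,
           label="ground truth")
ax.set_xlabel("batch")
ax.set_ylabel("probability")
ax.set_title("maintenance")
ax.legend()
fig.savefig("maintenance_parameters.png")
```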
Analyzing the results

This figure shows each factor and how the parameter probabilities change across batches. For example, "time of day" has three parameters, so the three lines show how each parameter changes from initialization through each presented batch of data. For the last three factors, which are multi-dimensional, we plot each individual parameter by flattening the factor to one dimension. The black circles denote the ground truth values.
The results indicate that all parameters start at the specific initialization values and quickly, through learning, converge upon the correct values as more data is presented. Since the data does not change much from batch to batch, the lines are relatively flat after the first batch. If the data changed drastically, which could be the case in the real world, continual learning would automatically adjust in response to obtain the correct parameters.
Latent variable learning
Latent variables are variables in the model for which no data is available. In this sense, latent variables are "unobservable" or "hidden". We will use a special version of the food delivery dataset in which the trip_efficiency variable is unavailable. This dataset is provided in the model files and sample data section of this tutorial.
Below, we demonstrate how latent variable learning is done in both the model editor and Python SDK:
Using the model we created above, we click on the trip_efficiency variable node and change the Role of the variable to "hidden" as shown in the image below:

Now when we perform parameter learning (training), Genius will automatically attempt to learn the parameters of the factor despite lacking the data.
First, we recreate the model from the "Building the model" section with the following difference:
Here we let Genius know that trip_efficiency is a latent variable. Now that the variable is specified as latent, we merely need to call learn with the dataset that is missing the trip_efficiency column.
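Genius's latent-variable learning procedure is not spelled out in this tutorial; expectation-maximization (EM) is the standard technique for this problem. The toy sketch below (plain Python, not the Genius API) runs EM for a single hidden binary variable H with one observed child X. Note that the data only pins down the observable marginal P(X=1), not the individual latent parameters, which is exactly why latent-variable estimates are approximate:

```python
import random

random.seed(0)

# Ground-truth generative model, used only to create data:
# hidden H ~ Bernoulli(0.6); X | H=1 ~ Bernoulli(0.9), X | H=0 ~ Bernoulli(0.2)
def sample_x():
    h = random.random() < 0.6
    return 1 if random.random() < (0.9 if h else 0.2) else 0

data = [sample_x() for _ in range(2000)]  # only X is ever observed

# EM with H latent: p = P(H=1), q0 = P(X=1|H=0), q1 = P(X=1|H=1)
p, q0, q1 = 0.5, 0.3, 0.7  # initial guesses
for _ in range(50):
    # E-step: responsibility r_i = P(H=1 | x_i) under the current parameters
    r = []
    for x in data:
        l1 = p * (q1 if x else 1 - q1)
        l0 = (1 - p) * (q0 if x else 1 - q0)
        r.append(l1 / (l1 + l0))
    # M-step: re-estimate parameters from the expected counts
    p = sum(r) / len(r)
    q1 = sum(ri * x for ri, x in zip(r, data)) / sum(r)
    q0 = sum((1 - ri) * x for ri, x in zip(r, data)) / sum(1 - ri for ri in r)

# The fitted marginal p*q1 + (1-p)*q0 matches the empirical frequency of X=1,
# even though p, q0, q1 individually need not match the ground truth.
print(round(p * q1 + (1 - p) * q0, 3))
```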
Genius will now automatically attempt to learn the probabilities of this variable despite the lack of data. Since learning in the presence of latent variables is more complex than learning in the fully observable case, it will take longer.
Analyzing the results
Let's examine the root mean squared error (RMSE) between the parameter estimates and the true probabilities. We see an RMSE of 0.158. This means that, on average, the parameters were estimated with an error of 0.158 in probability.
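The RMSE computation itself is straightforward; with NumPy it looks like this (the parameter vectors below are made-up placeholders, not the tutorial's actual estimates):

```python
import numpy as np

# Hypothetical flattened parameter vectors: the learned estimates and the
# ground truth (known only because this is a teaching example)
estimated  = np.array([0.25, 0.55, 0.20, 0.35, 0.45, 0.20])
true_probs = np.array([0.30, 0.50, 0.20, 0.20, 0.50, 0.30])

rmse = np.sqrt(np.mean((estimated - true_probs) ** 2))
print(round(float(rmse), 3))  # 0.082
```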
It is important to note that latent variable inference is always an approximate process and is never guaranteed to converge upon the true probabilities. The more data that is available for the variables interacting with these factors, the more accurate the estimation will be.
Although we were able to compare the estimate to the true probabilities in this example, note that this is not actually possible in a real world scenario. In this example we started with a complete dataset and simply removed the trip_efficiency column which enabled us to compare to the actual values.