In this chapter and beyond, we will be using the term ‘event logging’ to describe the process of generating a step-by-step log of what happens to each entity as they pass through our system. The resulting file will be an ‘event log’.
But why bother with event logging? In previous chapters, we have managed to record a range of useful metrics from our simulation, and used these to display the performance of our model with different sets of parameters. In the debugging chapters, we have explored how to track down issues in our simulation with a combination of print statements and the logging module. So what else is there left to do?
Event logging can be a way to tackle both of the above in a robust, efficient and reusable way - while also opening up the use of a wide range of reusable boilerplate code for visualising, animating and formally testing our simulation outputs.
Tip
While this chapter - and the rest of the book - will make use of a particular structure for our event log, this isn’t the only way you can set up an event log. However, this structure has been tested to work well across a range of different models, and interfaces well with the vidigi package for animated simulation outputs.
30.1 A sample event log
First, it may be helpful to see what an event log looks like.
A sample log is displayed below:
import pandas as pdevent_log = pd.read_csv("resources/sample_event_log.csv")event_log
entity_id
pathway
event_type
event
time
resource_id
run
0
1
Simplest
arrival_departure
arrival
0.000000
NaN
0
1
1
Simplest
queue
treatment_wait_begins
0.000000
NaN
0
2
1
Simplest
resource_use
treatment_begins
0.000000
1.0
0
3
2
Simplest
arrival_departure
arrival
3.399660
NaN
0
4
2
Simplest
queue
treatment_wait_begins
3.399660
NaN
0
5
2
Simplest
resource_use
treatment_begins
3.399660
2.0
0
6
3
Simplest
arrival_departure
arrival
8.497645
NaN
0
7
3
Simplest
queue
treatment_wait_begins
8.497645
NaN
0
8
3
Simplest
resource_use
treatment_begins
8.497645
3.0
0
9
4
Simplest
arrival_departure
arrival
8.596678
NaN
0
10
4
Simplest
queue
treatment_wait_begins
8.596678
NaN
0
11
4
Simplest
resource_use
treatment_begins
8.596678
4.0
0
12
5
Simplest
arrival_departure
arrival
8.608025
NaN
0
13
5
Simplest
queue
treatment_wait_begins
8.608025
NaN
0
14
6
Simplest
arrival_departure
arrival
11.359739
NaN
0
15
6
Simplest
queue
treatment_wait_begins
11.359739
NaN
0
16
7
Simplest
arrival_departure
arrival
19.509442
NaN
0
17
7
Simplest
queue
treatment_wait_begins
19.509442
NaN
0
18
8
Simplest
arrival_departure
arrival
22.877356
NaN
0
19
8
Simplest
queue
treatment_wait_begins
22.877356
NaN
0
20
9
Simplest
arrival_departure
arrival
26.653863
NaN
0
21
9
Simplest
queue
treatment_wait_begins
26.653863
NaN
0
22
1
Simplest
resource_use_end
treatment_complete
40.317385
1.0
0
23
1
Simplest
arrival_departure
depart
40.317385
NaN
0
24
5
Simplest
resource_use
treatment_begins
40.317385
1.0
0
25
10
Simplest
arrival_departure
arrival
40.737793
NaN
0
26
10
Simplest
queue
treatment_wait_begins
40.737793
NaN
0
27
2
Simplest
resource_use_end
treatment_complete
42.443230
2.0
0
28
2
Simplest
arrival_departure
depart
42.443230
NaN
0
29
6
Simplest
resource_use
treatment_begins
42.443230
2.0
0
30
4
Simplest
resource_use_end
treatment_complete
48.809628
4.0
0
31
4
Simplest
arrival_departure
depart
48.809628
NaN
0
32
7
Simplest
resource_use
treatment_begins
48.809628
4.0
0
33
3
Simplest
resource_use_end
treatment_complete
51.483457
3.0
0
34
3
Simplest
arrival_departure
depart
51.483457
NaN
0
35
8
Simplest
resource_use
treatment_begins
51.483457
3.0
0
36
11
Simplest
arrival_departure
arrival
71.026558
NaN
0
37
11
Simplest
queue
treatment_wait_begins
71.026558
NaN
0
38
5
Simplest
resource_use_end
treatment_complete
77.447488
1.0
0
39
5
Simplest
arrival_departure
depart
77.447488
NaN
0
40
9
Simplest
resource_use
treatment_begins
77.447488
1.0
0
41
6
Simplest
resource_use_end
treatment_complete
83.962251
2.0
0
42
6
Simplest
arrival_departure
depart
83.962251
NaN
0
43
10
Simplest
resource_use
treatment_begins
83.962251
2.0
0
44
12
Simplest
arrival_departure
arrival
87.458700
NaN
0
45
12
Simplest
queue
treatment_wait_begins
87.458700
NaN
0
46
13
Simplest
arrival_departure
arrival
87.465138
NaN
0
47
13
Simplest
queue
treatment_wait_begins
87.465138
NaN
0
48
7
Simplest
resource_use_end
treatment_complete
95.498040
4.0
0
49
7
Simplest
arrival_departure
depart
95.498040
NaN
0
Let’s break down the key components.
30.1.1 Mandatory Columns
We have several mandatory columns:
entity_id
event_type
event
time
entity_id: a unique identifider to allow us to follow a given entity through their journey
event_type: this column is used to distinguish between three key kinds of events:
arrival_departure: an entity first entering the system, or the entity leaving the system
queue: an entity beginning to queue for a resource
this can also be used to just generally record the movement of someone through points of a system that don’t necessarily have a traditional ‘queue’
resource_use: this relates to anything where someone starts or ends their time with a resource in the system
event: this column further breaks down what is happening during each event type, such as what stage of the system people are waiting to interact with
time: this can be an absolute timestamp in the form of a datetime (e.g. 2027-01-01 23:01:47), or a relative timestamp in time units from the start of the simulation.
30.1.2 Recommended Columns
In addition, the following columns are highly recommended to include:
run: this separates out results across multiple simulations, allowing for easy grouping of results by run
30.1.3 Optional Columns
There are a few more recommended but optional columns:
resource_id
pathway
resource_id: While a traditional simpy resource does not have a concept of a resource ID, there are various ways we can tackle adding one into our system. Let’s assume we have 5 nurses - it’s very helpful to know which of these 5 nurses are in use at any given time for visualisation and auditing purposes. A discussion of approaches to this will be found in a later chapter (coming soon), or you can take a look at the populate_store function from vidigi for a discussion on one approach to this: click here
30.2 Exploring our log
Let’s explore what we can find out even from this small sample of logs.
30.2.1 Tracking the journey of a single entity/patient
We can easily filter down by entity ID to see how the journey of an individual looks
event_log[event_log["entity_id"] ==1]
entity_id
pathway
event_type
event
time
resource_id
run
0
1
Simplest
arrival_departure
arrival
0.000000
NaN
0
1
1
Simplest
queue
treatment_wait_begins
0.000000
NaN
0
2
1
Simplest
resource_use
treatment_begins
0.000000
1.0
0
22
1
Simplest
resource_use_end
treatment_complete
40.317385
1.0
0
23
1
Simplest
arrival_departure
depart
40.317385
NaN
0
We can see that this person arrived and was seen immediately.
30.2.2 Calculating waits
In this event log, patients who wait for treatment (an event of ‘treatment_wait_begins’) will have the next event of ‘treatment_wait’. We’ll start by just pulling out those instances.
Finally, let’s work out the average wait for only those patients who were seen by someone before our simulation finished. We first need to calculate the difference between the two stamps, which in this simulation, represents a number of minutes
This would be a lot more powerful if we weren’t just working with a small sample of 50 rows of an event log. Let’s instead load in the full event log from this model, which runs for a lot longer and includes multiple runs. All our existing code can be reused.
event_log = pd.read_csv("resources/sample_event_log_10_day_10_run.csv")# We'll limit this to the first 24 hours of our clinicevent_log = event_log[event_log["time"] <=60*24]event_log.sample(10)
entity_id
pathway
event_type
event
time
resource_id
run
40922
238
Simplest
queue
treatment_wait_begins
1082.291238
NaN
4
70708
165
Simplest
arrival_departure
arrival
782.125020
NaN
7
70407
73
Simplest
arrival_departure
arrival
389.943492
NaN
7
50236
15
Simplest
resource_use
treatment_begins
132.662770
3.0
5
468
69
Simplest
arrival_departure
depart
716.403642
NaN
0
913
139
Simplest
arrival_departure
depart
1404.088281
NaN
0
60922
105
Simplest
arrival_departure
depart
1109.931131
NaN
6
20023
5
Simplest
queue
treatment_wait_begins
42.496097
NaN
2
41064
278
Simplest
arrival_departure
arrival
1287.351088
NaN
4
879
242
Simplest
arrival_departure
arrival
1342.820571
NaN
0
Let’s recalculate our waits. In fact - let’s write a function that could be used to make a wait table for any pair of events!
We’ll also make sure it takes account of the ‘run’ column.
# Create a suitable class to pass in the resource numbers to the animation functionclass model_params():def__init__(self):self.n_cubicles =4# In this case, I know that there were four resources available.params = model_params()print(f"Number of nurses: {params.n_cubicles}")
Number of nurses: 4
from vidigi.animation import animate_activity_loganimate_activity_log(# We need to ensure we only pass in a single run of the model# We also need to ensure that the column 'entity_id' is renamed to 'patient' to match# vidigi's expectations - though this will be made more flexible in a future version# of the package event_log=event_log[event_log['run']==1].rename(columns={"entity_id":"patient"}), event_position_df=event_position_df, scenario=params, debug_mode=True, setup_mode=False, every_x_time_units=5, include_play_button=True, icon_and_text_size=20, gap_between_entities=6, gap_between_rows=25, plotly_height=700, frame_duration=200, plotly_width=1200, override_x_max=300, override_y_max=500, limit_duration=event_log["time"].max().astype('int'), wrap_queues_at=25, step_snapshot_max=125, time_display_units="dhm", display_stage_labels=False, add_background_image="https://raw.githubusercontent.com/Bergam0t/vidigi/refs/heads/main/examples/example_1_simplest_case/Simplest%20Model%20Background%20Image%20-%20Horizontal%20Layout.drawio.png", )
Animation function called at 17:38:12
Iteration through minute-by-minute logs complete 17:38:13
Snapshot df concatenation complete at 17:38:13
Reshaped animation dataframe finished construction at 17:38:14
Placement dataframe finished construction at 17:38:14
C:\Users\Sammi\anaconda3\envs\des\Lib\site-packages\vidigi\animation.py:119: FutureWarning:
The 'unit' keyword in TimedeltaIndex construction is deprecated and will be removed in a future version. Use pd.to_timedelta instead.
Output animation generation complete at 17:38:15
Total Time Elapsed: 2.19 seconds
Changed code will be highlighted in the right-hand column, with unchanged column displayed less brightly.
30.4.1 the g Class
Our g class is unchanged.
30.4.2 the Entity Class
Our entity class - in this case, Patient - is unchanged.
30.4.3 the Model Class
30.4.3.1 The init method
To our init method for the Model class, we add an empty list that will store event logs throughout the model run for each patient.
30.4.3.1.1 Original
def__init__(self, run_number):# Create a SimPy environment in which everything will liveself.env = simpy.Environment()# Create a patient counter (which we'll use as a patient ID)self.patient_counter =0# Create an empty list to store patient objects inself.patients = []# Create our resourcesself.init_resources()# Store the passed in run numberself.run_number = run_number# Create a new Pandas DataFrame that will store some results# against the patient ID (which we'll use as the index).self.results_df = pd.DataFrame()self.results_df["Patient ID"] = [1]self.results_df["Queue Time Cubicle"] = [0.0]self.results_df["Time with Nurse"] = [0.0]self.results_df.set_index("Patient ID", inplace=True)# Create an attribute to store the mean queuing times# across this run of the modelself.mean_q_time_cubicle =0self.patient_inter_arrival_dist = Exponential( mean = g.arrival_rate, random_seed =self.run_number*g.random_number_set )self.treat_dist = Lognormal( mean = g.trauma_treat_mean, stdev = g.trauma_treat_var, random_seed =self.run_number*g.random_number_set )
30.4.3.1.2 With Event Logging Modifications
def__init__(self, run_number):# Create a SimPy environment in which everything will liveself.env = simpy.Environment()# Add an empty list to store our event logs in self.event_log = [] # Create a patient counter (which we'll use as a patient ID)self.patient_counter =0# Create an empty list to store patient objects inself.patients = []# Create our resourcesself.init_resources()# Store the passed in run numberself.run_number = run_number# Create a new Pandas DataFrame that will store some results# against the patient ID (which we'll use as the index)self.results_df = pd.DataFrame()self.results_df["Patient ID"] = [1]self.results_df["Queue Time Cubicle"] = [0.0]self.results_df["Time with Nurse"] = [0.0]self.results_df.set_index("Patient ID", inplace=True)# Create an attribute to store the mean queuing times# across this run of the modelself.mean_q_time_cubicle =0self.patient_inter_arrival_dist = Exponential( mean = g.arrival_rate, random_seed =self.run_number*g.random_number_set )self.treat_dist = Lognormal( mean = g.trauma_treat_mean, stdev = g.trauma_treat_var, random_seed =self.run_number*g.random_number_set )
30.4.3.2 the generator_patient_arrivals method
This method is unchanged.
30.4.3.3 the attend_clinic method
This is the key place in which we add our logging. The logs are what vidigi relies on to calculate who should be where, when, within the animation.
Tip
Appending rows to a dataframe is a very slow and memory-intensive operation and could negatively affect our simulation runtime.
A more efficient approach is to add dictionaries to a list. Each dictionary corresponds to one row of our eventual event log dataframe, and contains the keys that will form our column names, and the values that will be the values in each row.
At the end of the process, we will pass these dictionaries to pandas and it will convert them into a dataframe - so the end result is the same, but this approach is much faster overall.
Event logging takes the format below:
self.event_log.append( {'patient': entity_identifier,'pathway': 'My_Pathway_Name','event_type': 'arrival_departure', # or 'queue', 'resource_use', or 'resource_use_end''event': 'arrival', # or 'depart', or for 'queue' and 'resource_use' or 'resource_use_end' you can determine your own event name'time': self.env.now} )
Tip
While these columns are the bare minimum, you can take this opportunity to add as many additional fields as you see fit!
Bear in mind that the event log files can get quite unwieldy, and it may not be very efficient if you are adding in extra columns where the values are not always present.
This is another benefit of using dictionaries, however - you don’t always have to pass the same number of key-value pairs; the resulting dataframe will just deal with this and leave any missing ‘columns’ empty for that row.
Here, we will begin to add our logs.
In this case, we are adding logging at 5 points
arriving
beginning to queue (in this case, this will be the same as the arrival time)
beginning treatment
ending treatment
departing
30.4.3.3.1 Original
def attend_clinic(self, patient):self.arrival =self.env.now# request examination resource start_wait =self.env.nowwithself.treatment_cubicles.request() as req:# Seize a treatment resource when availableyield req# record the waiting time for registrationself.wait_treat =self.env.now - start_wait# sample treatment durationself.treat_duration =self.treat_dist.sample()yieldself.env.timeout(self.treat_duration)# total time in systemself.total_time =self.env.now -self.arrival
30.4.3.3.2 With Event Logging Modifications
def attend_clinic(self, patient):self.arrival =self.env.nowself.event_log.append( {'patient': patient.identifier, 'pathway': 'Simplest', 'event_type': 'arrival_departure', # you must use this event name for arrival events'event': 'arrival', 'time': self.env.now} ) # request examination resource start_wait =self.env.nowself.event_log.append( {'patient': patient.identifier, 'pathway': 'Simplest', # for a queue, you can define your chosen event name'event': 'treatment_wait_begins', 'event_type': 'queue', 'time': self.env.now} ) # Seize a treatment resource when availablewithself.treatment_cubicles.request() as req:# Seize a treatment resource when availableyield req# record the waiting time for registrationself.wait_treat =self.env.now - start_waitself.event_log.append( {'patient': patient.identifier, 'pathway': 'Simplest', 'event': 'treatment_begins', # for a resource_use, you can define your chosen event name'event_type': 'resource_use', 'time': self.env.now } ) # sample treatment durationself.treat_duration =self.treat_dist.sample()yieldself.env.timeout(self.treat_duration)self.event_log.append( {'patient': patient.identifier, 'pathway': 'Simplest', # for a resource_use_end, you can define your chosen event name'event': 'treatment_complete', 'event_type': 'resource_use_end', 'time': self.env.now } ) # total time in systemself.total_time =self.env.now -self.arrivalself.event_log.append( {'patient': patient.identifier, 'pathway': 'Simplest', 'event': 'depart', # you must use this event name for departure events 'event_type': 'arrival_departure', 'time': self.env.now} )
30.4.3.4 the calculate_run_results method
This method is unchanged.
30.4.3.5 the run method
Our concern in the run method is adding a way to extract the event log.
It will also be important to augment the event log with a run number - we already pass a run number attribute when initialising our Model class, so we can just access that.
Finally, we change our model to output a dictionary of our results_df and our event log. You could approach this in different ways, like returning a list, or only returning the event log and calculating all results from the event log instead.
30.4.3.5.1 Original
def run(self):# Start up our DES entity generators that create new patients. We've# only got one in this model, but we'd need to do this for each one if# we had multiple generators.self.env.process(self.generator_patient_arrivals())# Run the model for the duration specified in g classself.env.run(until=g.sim_duration)# Now the simulation run has finished, call the method that calculates# run resultsself.calculate_run_results()
30.4.3.5.2 With Event Logging Modifications
def run(self):# Start up our DES entity generators that create new patients. We've# only got one in this model, but we'd need to do this for each one if# we had multiple generators.self.env.process(self.generator_patient_arrivals())# Run the model for the duration specified in g classself.env.run(until=g.sim_duration)# Now the simulation run has finished, call the method that calculates# run resultsself.calculate_run_results()self.event_log = pd.DataFrame(self.event_log) self.event_log["run"] =self.run_number return {'results': self.results_df, 'event_log': self.event_log}
30.4.4 the Trial Class
30.4.4.1 the init method
In this, we just need to add an empty list which will hold all the event log dataframes from the individual runs.
Finally, in every iteration, we split out the results from our model (the dictionary) into two objects - one of which is our event log.
We can then add this event log - remembering that it also has a column added identifying which run the event log relates to - into our list.
Once we have finished the for loop - i.e. we have completed every run of the model - we can join our event log into one large dataframe.
Finally, to make it easier to experiment with building visualisations, metrics, animations and more without having to rerun the model each time, we will save the resulting dataframe as a csv.
Tip
You may wish to explore a datatype-preserving format like pickle, arrow or parquet for this export so that you don’t run into issues with dates - but csv has the benefit of being cross-plaform, human readable and machine readable.
If you are using version control (Git/GitHub), it’s recommended to add this file to your .gitignore - it’s quite large and will change frequently!
If we want to make use of our full ability to monitor resource utilisation at a detailed level, and to create animations with the vidigi package, we also need to change the way we request resources to account for the fact we are now using a simpy store instead of directly interacting with our simpy resources.
Where we would have previously used
withself.treatment_cubicles.request() as req:# Seize a treatment resource when availableyield req# ALL CODE WHERE WE NEED TO KEEP HOLD OF THE RESOURCE# CONTINUE AFTER RELEASING RESOURCE HERE
we instead now use
# Seize a treatment resource when availabletreatment_resource =yieldself.treatment_cubicles.get()# ALL CODE WHERE WE NEED TO KEEP HOLD OF THE RESOURCE# CONTINUE AFTER RELEASING RESOURCE HERE# Resource is no longer in use, so put it back in the storeself.treatment_cubicles.put(treatment_resource)
30.5.1 Imports
We will need to add the following to our imports:
from vidigi.utils import populate_store
30.5.2 The Model Class
30.5.2.1 the init_resources method
Vidigi needs to know which resource a user made use of so that we can ensure it stays with the correct resource throughout its time in the animation.
The standard simpy Resource does not have a way of tracking that, so we need to do two things: - create a simpy Store that we will store our resources in - use the vidigi helper function populate_store() to generate a store full of special resources that each have a unique ID we can track when doing our event logging
Overall, the use of stores won’t generally change your code too much - and we cover exactly what needs to change a little later in this document.
If you are using priority resources, this step will be a little different - see Example 3 in the documents if you need to use Resources that prioritise some entities over others.
We will return to our attend_clinic method, where we previously added in event logging, and now
make use of this new resource ID attribute
make the required changes to how resources are requested and released for the next user
30.5.2.2.1 Original
def attend_clinic(self, patient):self.arrival =self.env.nowself.event_log.append( {'patient': patient.identifier,'pathway': 'Simplest','event_type': 'arrival_departure',# you must use this event name for arrival events'event': 'arrival','time': self.env.now} )# request examination resource start_wait =self.env.nowself.event_log.append( {'patient': patient.identifier,'pathway': 'Simplest',# for a queue, you can define your chosen event name'event': 'treatment_wait_begins','event_type': 'queue','time': self.env.now} )# Seize a treatment resource when availablewithself.treatment_cubicles.request() as req:# Seize a treatment resource when availableyield req# record the waiting time for registrationself.wait_treat =self.env.now - start_waitself.event_log.append( {'patient': patient.identifier,'pathway': 'Simplest','event': 'treatment_begins',# for a resource_use, you can define your chosen event name'event_type': 'resource_use','time': self.env.now } )# sample treatment durationself.treat_duration =self.treat_dist.sample()yieldself.env.timeout(self.treat_duration)self.event_log.append( {'patient': patient.identifier,'pathway': 'Simplest',# for a resource_use_end, you can define your chosen event name'event': 'treatment_complete','event_type': 'resource_use_end','time': self.env.now } )# total time in systemself.total_time =self.env.now -self.arrivalself.event_log.append( {'patient': patient.identifier,'pathway': 'Simplest','event': 'depart', # you must use this event name for departure events'event_type': 'arrival_departure','time': self.env.now} )
30.5.2.2.2 With Custom Resource Modifications
def attend_clinic(self, patient):self.arrival =self.env.nowself.event_log.append( {'patient': patient.identifier, 'pathway': 'Simplest', 'event_type': 'arrival_departure', # you must use this event name for arrival events'event': 'arrival', 'time': self.env.now} ) # request examination resource start_wait =self.env.nowself.event_log.append( {'patient': patient.identifier, 'pathway': 'Simplest', # for a queue, you can define your chosen event name'event': 'treatment_wait_begins', 'event_type': 'queue', 'time': self.env.now} ) # Seize a treatment resource when available treatment_resource =yieldself.treatment_cubicles.get() # record the waiting time for registrationself.wait_treat =self.env.now - start_waitself.event_log.append( {'patient': patient.identifier, 'pathway': 'Simplest', 'event': 'treatment_begins', # for a resource_use, you can define your chosen event name'event_type': 'resource_use', 'time': self.env.now, # grab the resource id from the treatment_resource requested'resource_id': treatment_resource.id_attribute } ) # sample treatment durationself.treat_duration =self.treat_dist.sample()yieldself.env.timeout(self.treat_duration)self.event_log.append( {'patient': patient.identifier, 'pathway': 'Simplest', # for a resource_use_end, you can define your chosen event name'event': 'treatment_complete', 'event_type': 'resource_use_end', 'time': self.env.now, 'resource_id': treatment_resource.id_attribute} ) # Resource is no longer in use, so put it back in the store self.treatment_cubicles.put(treatment_resource) # total time in systemself.total_time =self.env.now -self.arrivalself.event_log.append( {'patient': patient.identifier, 'pathway': 'Simplest', 'event': 'depart', # you must use this event name for departure events 'event_type': 'arrival_departure', 'time': self.env.now} )
30.6 Summary
In this chapter, we have demonstrated just a few of the things you can do with event logs, and how you can efficiently add them to an existing model.
In the rest of this section, we will dive deeper into some reproducible snippets and patterns to help you generate useful outputs from event logs, as well as a deeper dive into turning these into process maps with bupaR and animations with vidigi.