30  Event Logging

In this chapter and beyond, we will be using the term ‘event logging’ to describe the process of generating a step-by-step log of what happens to each entity as they pass through our system. The resulting file will be an ‘event log’.

But why bother with event logging? In previous chapters, we have managed to record a range of useful metrics from our simulation, and used these to display the performance of our model with different sets of parameters. In the debugging chapters, we have explored how to track down issues in our simulation with a combination of print statements and the logging module. So what else is there left to do?

Event logging can be a way to tackle both of the above in a robust, efficient and reusable way - while also opening up the use of a wide range of reusable boilerplate code for visualising, animating and formally testing our simulation outputs.

Tip

While this chapter - and the rest of the book - will make use of a particular structure for our event log, this isn’t the only way you can set up an event log. However, this structure has been tested to work well across a range of different models, and interfaces well with the vidigi package for animated simulation outputs.

30.1 A sample event log

First, it may be helpful to see what an event log looks like.

A sample log is displayed below:

import pandas as pd
event_log = pd.read_csv("resources/sample_event_log.csv")

event_log
entity_id pathway event_type event time resource_id run
0 1 Simplest arrival_departure arrival 0.000000 NaN 0
1 1 Simplest queue treatment_wait_begins 0.000000 NaN 0
2 1 Simplest resource_use treatment_begins 0.000000 1.0 0
3 2 Simplest arrival_departure arrival 3.399660 NaN 0
4 2 Simplest queue treatment_wait_begins 3.399660 NaN 0
5 2 Simplest resource_use treatment_begins 3.399660 2.0 0
6 3 Simplest arrival_departure arrival 8.497645 NaN 0
7 3 Simplest queue treatment_wait_begins 8.497645 NaN 0
8 3 Simplest resource_use treatment_begins 8.497645 3.0 0
9 4 Simplest arrival_departure arrival 8.596678 NaN 0
10 4 Simplest queue treatment_wait_begins 8.596678 NaN 0
11 4 Simplest resource_use treatment_begins 8.596678 4.0 0
12 5 Simplest arrival_departure arrival 8.608025 NaN 0
13 5 Simplest queue treatment_wait_begins 8.608025 NaN 0
14 6 Simplest arrival_departure arrival 11.359739 NaN 0
15 6 Simplest queue treatment_wait_begins 11.359739 NaN 0
16 7 Simplest arrival_departure arrival 19.509442 NaN 0
17 7 Simplest queue treatment_wait_begins 19.509442 NaN 0
18 8 Simplest arrival_departure arrival 22.877356 NaN 0
19 8 Simplest queue treatment_wait_begins 22.877356 NaN 0
20 9 Simplest arrival_departure arrival 26.653863 NaN 0
21 9 Simplest queue treatment_wait_begins 26.653863 NaN 0
22 1 Simplest resource_use_end treatment_complete 40.317385 1.0 0
23 1 Simplest arrival_departure depart 40.317385 NaN 0
24 5 Simplest resource_use treatment_begins 40.317385 1.0 0
25 10 Simplest arrival_departure arrival 40.737793 NaN 0
26 10 Simplest queue treatment_wait_begins 40.737793 NaN 0
27 2 Simplest resource_use_end treatment_complete 42.443230 2.0 0
28 2 Simplest arrival_departure depart 42.443230 NaN 0
29 6 Simplest resource_use treatment_begins 42.443230 2.0 0
30 4 Simplest resource_use_end treatment_complete 48.809628 4.0 0
31 4 Simplest arrival_departure depart 48.809628 NaN 0
32 7 Simplest resource_use treatment_begins 48.809628 4.0 0
33 3 Simplest resource_use_end treatment_complete 51.483457 3.0 0
34 3 Simplest arrival_departure depart 51.483457 NaN 0
35 8 Simplest resource_use treatment_begins 51.483457 3.0 0
36 11 Simplest arrival_departure arrival 71.026558 NaN 0
37 11 Simplest queue treatment_wait_begins 71.026558 NaN 0
38 5 Simplest resource_use_end treatment_complete 77.447488 1.0 0
39 5 Simplest arrival_departure depart 77.447488 NaN 0
40 9 Simplest resource_use treatment_begins 77.447488 1.0 0
41 6 Simplest resource_use_end treatment_complete 83.962251 2.0 0
42 6 Simplest arrival_departure depart 83.962251 NaN 0
43 10 Simplest resource_use treatment_begins 83.962251 2.0 0
44 12 Simplest arrival_departure arrival 87.458700 NaN 0
45 12 Simplest queue treatment_wait_begins 87.458700 NaN 0
46 13 Simplest arrival_departure arrival 87.465138 NaN 0
47 13 Simplest queue treatment_wait_begins 87.465138 NaN 0
48 7 Simplest resource_use_end treatment_complete 95.498040 4.0 0
49 7 Simplest arrival_departure depart 95.498040 NaN 0

Let’s break down the key components.

30.1.1 Mandatory Columns

We have several mandatory columns:

  • entity_id
  • event_type
  • event
  • time

entity_id: a unique identifider to allow us to follow a given entity through their journey

event_type: this column is used to distinguish between three key kinds of events:

  • arrival_departure: an entity first entering the system, or the entity leaving the system
  • queue: an entity beginning to queue for a resource
    • this can also be used to just generally record the movement of someone through points of a system that don’t necessarily have a traditional ‘queue’
  • resource_use: this relates to anything where someone starts or ends their time with a resource in the system

event: this column further breaks down what is happening during each event type, such as what stage of the system people are waiting to interact with

time: this can be an absolute timestamp in the form of a datetime (e.g. 2027-01-01 23:01:47), or a relative timestamp in time units from the start of the simulation.

30.1.3 Optional Columns

There are a few more recommended but optional columns:

  • resource_id
  • pathway

resource_id: While a traditional simpy resource does not have a concept of a resource ID, there are various ways we can tackle adding one into our system. Let’s assume we have 5 nurses - it’s very helpful to know which of these 5 nurses are in use at any given time for visualisation and auditing purposes. A discussion of approaches to this will be found in a later chapter (coming soon), or you can take a look at the populate_store function from vidigi for a discussion on one approach to this: click here

30.2 Exploring our log

Let’s explore what we can find out even from this small sample of logs.

30.2.1 Tracking the journey of a single entity/patient

We can easily filter down by entity ID to see how the journey of an individual looks

event_log[event_log["entity_id"] == 1]
entity_id pathway event_type event time resource_id run
0 1 Simplest arrival_departure arrival 0.000000 NaN 0
1 1 Simplest queue treatment_wait_begins 0.000000 NaN 0
2 1 Simplest resource_use treatment_begins 0.000000 1.0 0
22 1 Simplest resource_use_end treatment_complete 40.317385 1.0 0
23 1 Simplest arrival_departure depart 40.317385 NaN 0

We can see that this person arrived and was seen immediately.

30.2.2 Calculating waits

In this event log, patients who wait for treatment (an event of ‘treatment_wait_begins’) will have the next event of ‘treatment_wait’. We’ll start by just pulling out those instances.

treatment_waits_df = event_log[event_log["event"].isin(["treatment_wait_begins","treatment_begins"])]

treatment_waits_df
entity_id pathway event_type event time resource_id run
1 1 Simplest queue treatment_wait_begins 0.000000 NaN 0
2 1 Simplest resource_use treatment_begins 0.000000 1.0 0
4 2 Simplest queue treatment_wait_begins 3.399660 NaN 0
5 2 Simplest resource_use treatment_begins 3.399660 2.0 0
7 3 Simplest queue treatment_wait_begins 8.497645 NaN 0
8 3 Simplest resource_use treatment_begins 8.497645 3.0 0
10 4 Simplest queue treatment_wait_begins 8.596678 NaN 0
11 4 Simplest resource_use treatment_begins 8.596678 4.0 0
13 5 Simplest queue treatment_wait_begins 8.608025 NaN 0
15 6 Simplest queue treatment_wait_begins 11.359739 NaN 0
17 7 Simplest queue treatment_wait_begins 19.509442 NaN 0
19 8 Simplest queue treatment_wait_begins 22.877356 NaN 0
21 9 Simplest queue treatment_wait_begins 26.653863 NaN 0
24 5 Simplest resource_use treatment_begins 40.317385 1.0 0
26 10 Simplest queue treatment_wait_begins 40.737793 NaN 0
29 6 Simplest resource_use treatment_begins 42.443230 2.0 0
32 7 Simplest resource_use treatment_begins 48.809628 4.0 0
35 8 Simplest resource_use treatment_begins 51.483457 3.0 0
37 11 Simplest queue treatment_wait_begins 71.026558 NaN 0
40 9 Simplest resource_use treatment_begins 77.447488 1.0 0
43 10 Simplest resource_use treatment_begins 83.962251 2.0 0
45 12 Simplest queue treatment_wait_begins 87.458700 NaN 0
47 13 Simplest queue treatment_wait_begins 87.465138 NaN 0

Next, we’ll pivot these to a wide format.

treatment_waits_df_wide = treatment_waits_df.pivot(index="entity_id", columns="event", values="time")

treatment_waits_df_wide = treatment_waits_df_wide[['treatment_wait_begins', 'treatment_begins']]

treatment_waits_df_wide
event treatment_wait_begins treatment_begins
entity_id
1 0.000000 0.000000
2 3.399660 3.399660
3 8.497645 8.497645
4 8.596678 8.596678
5 8.608025 40.317385
6 11.359739 42.443230
7 19.509442 48.809628
8 22.877356 51.483457
9 26.653863 77.447488
10 40.737793 83.962251
11 71.026558 NaN
12 87.458700 NaN
13 87.465138 NaN

Finally, let’s work out the average wait for only those patients who were seen by someone before our simulation finished. We first need to calculate the difference between the two stamps, which in this simulation, represents a number of minutes

treatment_waits_df_wide["treatment_wait"] = treatment_waits_df_wide["treatment_begins"] - treatment_waits_df_wide["treatment_wait_begins"]

treatment_waits_df_wide
event treatment_wait_begins treatment_begins treatment_wait
entity_id
1 0.000000 0.000000 0.000000
2 3.399660 3.399660 0.000000
3 8.497645 8.497645 0.000000
4 8.596678 8.596678 0.000000
5 8.608025 40.317385 31.709360
6 11.359739 42.443230 31.083491
7 19.509442 48.809628 29.300186
8 22.877356 51.483457 28.606101
9 26.653863 77.447488 50.793625
10 40.737793 83.962251 43.224458
11 71.026558 NaN NaN
12 87.458700 NaN NaN
13 87.465138 NaN NaN

Now we can calculate the mean of that column:

treatment_waits_df_wide["treatment_wait"].mean()
np.float64(21.4717221)

Or the median:

treatment_waits_df_wide["treatment_wait"].median()
np.float64(28.9531435)

Or the max and min waits:

treatment_waits_df_wide["treatment_wait"].agg(['min', 'max'])
min     0.000000
max    50.793625
Name: treatment_wait, dtype: float64

Or plot the distribution of waits:

import plotly.express as px

px.histogram(treatment_waits_df_wide, x="treatment_wait")

Or a boxplot:

px.box(treatment_waits_df_wide, x="treatment_wait")

30.3 Exploring a bigger event log

This would be a lot more powerful if we weren’t just working with a small sample of 50 rows of an event log. Let’s instead load in the full event log from this model, which runs for a lot longer and includes multiple runs. All our existing code can be reused.

event_log = pd.read_csv("resources/sample_event_log_10_day_10_run.csv")

# We'll limit this to the first 24 hours of our clinic

event_log = event_log[event_log["time"] <= 60 * 24]

event_log.sample(10)
entity_id pathway event_type event time resource_id run
40922 238 Simplest queue treatment_wait_begins 1082.291238 NaN 4
70708 165 Simplest arrival_departure arrival 782.125020 NaN 7
70407 73 Simplest arrival_departure arrival 389.943492 NaN 7
50236 15 Simplest resource_use treatment_begins 132.662770 3.0 5
468 69 Simplest arrival_departure depart 716.403642 NaN 0
913 139 Simplest arrival_departure depart 1404.088281 NaN 0
60922 105 Simplest arrival_departure depart 1109.931131 NaN 6
20023 5 Simplest queue treatment_wait_begins 42.496097 NaN 2
41064 278 Simplest arrival_departure arrival 1287.351088 NaN 4
879 242 Simplest arrival_departure arrival 1342.820571 NaN 0

Let’s recalculate our waits. In fact - let’s write a function that could be used to make a wait table for any pair of events!

We’ll also make sure it takes account of the ‘run’ column.

def generate_difference_df(event_log, event_1, event_2, output_col_name):
  difference_df = event_log[event_log["event"].isin([event_1,event_2])]

  difference_df_wide = difference_df.pivot(index=["entity_id", "run"], columns="event", values="time")

  difference_df_wide = difference_df_wide[[event_1,event_2]]

  difference_df_wide[output_col_name] = difference_df_wide[event_2] - difference_df_wide[event_1]

  return difference_df_wide.reset_index()

Let’s run this and view the first 20 rows.

treatment_wait_df = generate_difference_df(event_log=event_log, event_1="treatment_wait_begins", event_2="treatment_begins", output_col_name="treatment_wait")

treatment_wait_df.head(20)
event entity_id run treatment_wait_begins treatment_begins treatment_wait
0 1 0 0.000000 0.000000 0.0
1 1 1 0.000000 0.000000 0.0
2 1 2 0.000000 0.000000 0.0
3 1 3 0.000000 0.000000 0.0
4 1 4 0.000000 0.000000 0.0
5 1 5 0.000000 0.000000 0.0
6 1 6 0.000000 0.000000 0.0
7 1 7 0.000000 0.000000 0.0
8 1 8 0.000000 0.000000 0.0
9 1 9 0.000000 0.000000 0.0
10 2 0 3.399660 3.399660 0.0
11 2 1 12.021043 12.021043 0.0
12 2 2 8.965271 8.965271 0.0
13 2 3 0.033294 0.033294 0.0
14 2 4 10.399752 10.399752 0.0
15 2 5 2.645552 2.645552 0.0
16 2 6 11.489413 11.489413 0.0
17 2 7 4.011208 4.011208 0.0
18 2 8 17.040648 17.040648 0.0
19 2 9 2.861402 2.861402 0.0

Here, we never have a wait because these people are entering an empty system.

Let’s instead look at a different patient.

treatment_wait_df[treatment_wait_df["entity_id"] == 40]
event entity_id run treatment_wait_begins treatment_begins treatment_wait
390 40 0 225.946145 374.475154 148.529008
391 40 1 166.195041 396.502222 230.307181
392 40 2 229.388022 384.693144 155.305122
393 40 3 198.590107 376.281301 177.691195
394 40 4 209.135743 370.569622 161.433878
395 40 5 227.132040 376.794591 149.662551
396 40 6 173.002070 383.515632 210.513562
397 40 7 223.578745 378.418622 154.839878
398 40 8 228.714086 385.852087 157.138001
399 40 9 176.753615 370.958861 194.205245

Let’s sample a range of other quick visualisations we can make.

30.3.1 Cumulative arrivals

Let’s first just take a look at the arrivals for a single run, with each dot representing a single person arriving in our system.

px.scatter(
  event_log[
    (event_log["run"]==1) &
    (event_log["event"]=="arrival")],
    x="time",
    y="entity_id"
    )

Here, we can see how cumulative arrivals varied by run.

px.line(
  event_log[event_log["event"]=="arrival"],
  x="time",
  y="entity_id",
  color="run"
  )

30.3.2 Event scatterplot

px.scatter(
  event_log[event_log["run"]==1],
  y="time",
  color="event",
  x="entity_id"
  )

We can tell that the treatment wait very quickly gets out of hand for our individuals as the simulation progresses.

30.3.3 Treatment duration

treatment_duration_df = generate_difference_df(
  event_log=event_log,
  event_1="treatment_begins",
  event_2="treatment_complete",
  output_col_name="treatment_duration")

treatment_duration_df.head(20)
event entity_id run treatment_begins treatment_complete treatment_duration
0 1 0 0.000000 40.317385 40.317385
1 1 1 0.000000 41.226014 41.226014
2 1 2 0.000000 32.732000 32.732000
3 1 3 0.000000 40.023184 40.023184
4 1 4 0.000000 37.026630 37.026630
5 1 5 0.000000 34.941718 34.941718
6 1 6 0.000000 36.785746 36.785746
7 1 7 0.000000 41.752122 41.752122
8 1 8 0.000000 38.779312 38.779312
9 1 9 0.000000 42.284465 42.284465
10 2 0 3.399660 42.443230 39.043571
11 2 1 12.021043 46.891254 34.870211
12 2 2 8.965271 46.903314 37.938043
13 2 3 0.033294 40.296613 40.263318
14 2 4 10.399752 53.054173 42.654422
15 2 5 2.645552 39.743828 37.098276
16 2 6 11.489413 49.827460 38.338047
17 2 7 4.011208 41.798441 37.787233
18 2 8 17.040648 51.984157 34.943509
19 2 9 2.861402 42.211944 39.350541
px.box(
  treatment_duration_df,
  x="treatment_duration",
  title="Variation in treatment duration"
  )
px.box(
  treatment_duration_df, x="treatment_duration", color="run",
  range_x=[
          0,
          max(treatment_duration_df["treatment_duration"]) * 1.1
          ],
  title="Variation in treatment duration by run"
        )

30.3.4 Throughput

patients_entering_system = event_log[event_log["event"] == "arrival"]

patients_entering_system = (
  patients_entering_system
  .groupby('run')[['entity_id']]
  .count()
  .rename(columns={"entity_id": "entering_system"})
  )

patients_entering_system
entering_system
run
0 259
1 303
2 276
3 285
4 302
5 270
6 292
7 294
8 258
9 286
patients_leaving_system = event_log[event_log["event"] == "depart"]

patients_leaving_system = (
  patients_leaving_system.groupby('run')[['entity_id']]
  .count()
  .rename(columns={"entity_id": "leaving_system"})
  )

patients_leaving_system
leaving_system
run
0 140
1 142
2 141
3 141
4 142
5 143
6 140
7 142
8 143
9 142
patient_throughput_df = patients_entering_system.merge(
  patients_leaving_system,
  left_index=True,
  right_index=True
  )

patient_throughput_df["throughput"] = patient_throughput_df["leaving_system"] / patient_throughput_df["entering_system"]

patient_throughput_df["throughput_display"] = patient_throughput_df["throughput"].apply(lambda x: f"{x:.1%}")

patient_throughput_df
entering_system leaving_system throughput throughput_display
run
0 259 140 0.540541 54.1%
1 303 142 0.468647 46.9%
2 276 141 0.510870 51.1%
3 285 141 0.494737 49.5%
4 302 142 0.470199 47.0%
5 270 143 0.529630 53.0%
6 292 140 0.479452 47.9%
7 294 142 0.482993 48.3%
8 258 143 0.554264 55.4%
9 286 142 0.496503 49.7%

30.3.5 Animation

Using the vidigi package, we can take our event log and - with only minor adjustments and additions - create a full animation of our system.

event_position_df = pd.DataFrame([
                    {'event': 'arrival',
                     'x':  50, 'y': 300,
                     'label': "Arrival" },

                    # Triage - minor and trauma
                    {'event': 'treatment_wait_begins',
                     'x':  205, 'y': 275,
                     'label': "Waiting for Treatment"},

                    {'event': 'treatment_begins',
                     'x':  205, 'y': 175,
                     'resource':'n_cubicles',
                     'label': "Being Treated"},

                    {'event': 'exit',
                     'x':  270, 'y': 70,
                     'label': "Exit"}

                ])
# Create a suitable class to pass in the resource numbers to the animation function
class model_params():
    def __init__(self):
        self.n_cubicles = 4 # In this case, I know that there were four resources available.

params = model_params()

print(f"Number of nurses: {params.n_cubicles}")
Number of nurses: 4
from vidigi.animation import animate_activity_log

animate_activity_log(
        # We need to ensure we only pass in a single run of the model
        # We also need to ensure that the column 'entity_id' is renamed to 'patient' to match
        # vidigi's expectations - though this will be made more flexible in a future version
        # of the package
        event_log=event_log[event_log['run']==1].rename(columns={"entity_id":"patient"}),
        event_position_df=event_position_df,
        scenario=params,
        debug_mode=True,
        setup_mode=False,
        every_x_time_units=5,
        include_play_button=True,
        icon_and_text_size=20,
        gap_between_entities=6,
        gap_between_rows=25,
        plotly_height=700,
        frame_duration=200,
        plotly_width=1200,
        override_x_max=300,
        override_y_max=500,
        limit_duration=event_log["time"].max().astype('int'),
        wrap_queues_at=25,
        step_snapshot_max=125,
        time_display_units="dhm",
        display_stage_labels=False,
        add_background_image="https://raw.githubusercontent.com/Bergam0t/vidigi/refs/heads/main/examples/example_1_simplest_case/Simplest%20Model%20Background%20Image%20-%20Horizontal%20Layout.drawio.png",
    )
Animation function called at 17:38:12
Iteration through minute-by-minute logs complete 17:38:13
Snapshot df concatenation complete at 17:38:13
Reshaped animation dataframe finished construction at 17:38:14
Placement dataframe finished construction at 17:38:14
C:\Users\Sammi\anaconda3\envs\des\Lib\site-packages\vidigi\animation.py:119: FutureWarning:

The 'unit' keyword in TimedeltaIndex construction is deprecated and will be removed in a future version. Use pd.to_timedelta instead.
Output animation generation complete at 17:38:15
Total Time Elapsed: 2.19 seconds

30.4 Adding event logging to your own model

Here, we’re going to work with the very basic model from the chapter ‘An Example SimPy Model’.

Tip

Changed code will be highlighted in the right-hand column, with unchanged column displayed less brightly.

30.4.1 the g Class

Our g class is unchanged.

30.4.2 the Entity Class

Our entity class - in this case, Patient - is unchanged.

30.4.3 the Model Class

30.4.3.1 The init method

To our init method for the Model class, we add an empty list that will store event logs throughout the model run for each patient.

30.4.3.1.1 Original
def __init__(self, run_number):
    # Create a SimPy environment in which everything will live
    self.env = simpy.Environment()

    # Create a patient counter (which we'll use as a patient ID)
    self.patient_counter = 0

    # Create an empty list to store patient objects in
    self.patients = []

    # Create our resources
    self.init_resources()

    # Store the passed in run number
    self.run_number = run_number

    # Create a new Pandas DataFrame that will store some results
    # against the patient ID (which we'll use as the index).
    self.results_df = pd.DataFrame()
    self.results_df["Patient ID"] = [1]
    self.results_df["Queue Time Cubicle"] = [0.0]
    self.results_df["Time with Nurse"] = [0.0]
    self.results_df.set_index("Patient ID", inplace=True)

    # Create an attribute to store the mean queuing times
    # across this run of the model
    self.mean_q_time_cubicle = 0

    self.patient_inter_arrival_dist = Exponential(
        mean = g.arrival_rate,
        random_seed = self.run_number*g.random_number_set
        )

    self.treat_dist = Lognormal(
        mean = g.trauma_treat_mean,
        stdev = g.trauma_treat_var,
        random_seed = self.run_number*g.random_number_set
        )
30.4.3.1.2 With Event Logging Modifications
def __init__(self, run_number):
    # Create a SimPy environment in which everything will live
    self.env = simpy.Environment()

    # Add an empty list to store our event logs in 
    self.event_log = [] 

    # Create a patient counter (which we'll use as a patient ID)
    self.patient_counter = 0

    # Create an empty list to store patient objects in
    self.patients = []

    # Create our resources
    self.init_resources()

    # Store the passed in run number
    self.run_number = run_number

    # Create a new Pandas DataFrame that will store some results
    # against the patient ID (which we'll use as the index)
    self.results_df = pd.DataFrame()
    self.results_df["Patient ID"] = [1]
    self.results_df["Queue Time Cubicle"] = [0.0]
    self.results_df["Time with Nurse"] = [0.0]
    self.results_df.set_index("Patient ID", inplace=True)

    # Create an attribute to store the mean queuing times
    # across this run of the model
    self.mean_q_time_cubicle = 0

    self.patient_inter_arrival_dist = Exponential(
        mean = g.arrival_rate,
        random_seed = self.run_number*g.random_number_set
        )

    self.treat_dist = Lognormal(
        mean = g.trauma_treat_mean,
        stdev = g.trauma_treat_var,
        random_seed = self.run_number*g.random_number_set
        )

30.4.3.2 the generator_patient_arrivals method

This method is unchanged.

30.4.3.3 the attend_clinic method

This is the key place in which we add our logging. The logs are what vidigi relies on to calculate who should be where, when, within the animation.

Tip

Appending rows to a dataframe is a very slow and memory-intensive operation and could negatively affect our simulation runtime.

A more efficient approach is to add dictionaries to a list. Each dictionary corresponds to one row of our eventual event log dataframe, and contains the keys that will form our column names, and the values that will be the values in each row.

At the end of the process, we will pass these dictionaries to pandas and it will convert them into a dataframe - so the end result is the same, but this approach is much faster overall.

Event logging takes the format below:

self.event_log.append(
        {'patient': entity_identifier,
            'pathway': 'My_Pathway_Name',
            'event_type': 'arrival_departure', # or 'queue', 'resource_use', or 'resource_use_end'
            'event': 'arrival', # or 'depart', or for 'queue' and 'resource_use' or 'resource_use_end' you can determine your own event name
            'time': self.env.now}
    )
Tip

While these columns are the bare minimum, you can take this opportunity to add as many additional fields as you see fit!

Bear in mind that the event log files can get quite unwieldy, and it may not be very efficient if you are adding in extra columns where the values are not always present.

This is another benefit of using dictionaries, however - you don’t always have to pass the same number of key-value pairs; the resulting dataframe will just deal with this and leave any missing ‘columns’ empty for that row.

Here, we will begin to add our logs.

In this case, we are adding logging at 5 points

  • arriving
  • beginning to queue (in this case, this will be the same as the arrival time)
  • beginning treatment
  • ending treatment
  • departing
30.4.3.3.1 Original
def attend_clinic(self, patient):
    self.arrival = self.env.now

    # request examination resource
    start_wait = self.env.now

    with self.treatment_cubicles.request() as req:
        # Seize a treatment resource when available
        yield req

        # record the waiting time for registration
        self.wait_treat = self.env.now - start_wait

        # sample treatment duration
        self.treat_duration = self.treat_dist.sample()
        yield self.env.timeout(self.treat_duration)

    # total time in system
    self.total_time = self.env.now - self.arrival
30.4.3.3.2 With Event Logging Modifications
def attend_clinic(self, patient):
    self.arrival = self.env.now
    self.event_log.append( 
        {'patient': patient.identifier, 
            'pathway': 'Simplest', 
            'event_type': 'arrival_departure', 
            # you must use this event name for arrival events
            'event': 'arrival',  
            'time': self.env.now} 
    ) 

    # request examination resource
    start_wait = self.env.now
    self.event_log.append( 
        {'patient': patient.identifier, 
            'pathway': 'Simplest', 
            # for a queue, you can define your chosen event name
            'event': 'treatment_wait_begins', 
            'event_type': 'queue', 
            'time': self.env.now} 
    ) 

    # Seize a treatment resource when available
    with self.treatment_cubicles.request() as req:
        # Seize a treatment resource when available
        yield req

        # record the waiting time for registration
        self.wait_treat = self.env.now - start_wait
        self.event_log.append( 
            {'patient': patient.identifier, 
                'pathway': 'Simplest', 
                'event': 'treatment_begins', 
                # for a resource_use, you can define your chosen event name
                'event_type': 'resource_use', 
                'time': self.env.now 
                } 
        ) 

        # sample treatment duration
        self.treat_duration = self.treat_dist.sample()
        yield self.env.timeout(self.treat_duration)

        self.event_log.append( 
            {'patient': patient.identifier, 
                'pathway': 'Simplest', 
                # for a resource_use_end, you can define your chosen event name
                'event': 'treatment_complete', 
                'event_type': 'resource_use_end', 
                'time': self.env.now 
                } 
        ) 

    # total time in system
    self.total_time = self.env.now - self.arrival
    self.event_log.append( 
        {'patient': patient.identifier, 
        'pathway': 'Simplest', 
        'event': 'depart', # you must use this event name for departure events 
        'event_type': 'arrival_departure', 
        'time': self.env.now} 
    ) 

30.4.3.4 the calculate_run_results method

This method is unchanged.

30.4.3.5 the run method

Our concern in the run method is adding a way to extract the event log.

It will also be important to augment the event log with a run number - we already pass a run number attribute when initialising our Model class, so we can just access that.

Finally, we change our model to output a dictionary of our results_df and our event log. You could approach this in different ways, like returning a list, or only returning the event log and calculating all results from the event log instead.

30.4.3.5.1 Original
def run(self):
    # Start up our DES entity generators that create new patients.  We've
    # only got one in this model, but we'd need to do this for each one if
    # we had multiple generators.
    self.env.process(self.generator_patient_arrivals())

    # Run the model for the duration specified in g class
    self.env.run(until=g.sim_duration)

    # Now the simulation run has finished, call the method that calculates
    # run results
    self.calculate_run_results()
30.4.3.5.2 With Event Logging Modifications
def run(self):
    # Start up our DES entity generators that create new patients.  We've
    # only got one in this model, but we'd need to do this for each one if
    # we had multiple generators.
    self.env.process(self.generator_patient_arrivals())

    # Run the model for the duration specified in g class
    self.env.run(until=g.sim_duration)

    # Now the simulation run has finished, call the method that calculates
    # run results
    self.calculate_run_results()

    self.event_log = pd.DataFrame(self.event_log) 

    self.event_log["run"] = self.run_number 

    return {'results': self.results_df, 'event_log': self.event_log}  

30.4.4 the Trial Class

30.4.4.1 the init method

In this, we just need to add an empty list which will hold all the event log dataframes from the individual runs.

30.4.4.1.1 Original
def  __init__(self):
    self.df_trial_results = pd.DataFrame()
    self.df_trial_results["Run Number"] = [0]
    self.df_trial_results["Arrivals"] = [0]
    self.df_trial_results["Mean Queue Time Cubicle"] = [0.0]
    self.df_trial_results.set_index("Run Number", inplace=True)
30.4.4.1.2 With Event Logging Modifications
def  __init__(self):
    self.df_trial_results = pd.DataFrame()
    self.df_trial_results["Run Number"] = [0]
    self.df_trial_results["Arrivals"] = [0]
    self.df_trial_results["Mean Queue Time Cubicle"] = [0.0]
    self.df_trial_results.set_index("Run Number", inplace=True)

    self.all_event_logs = [] 

30.4.4.2 the run_trial method

Finally, in every iteration, we split out the results from our model (the dictionary) into two objects - one of which is our event log.

We can then add this event log - remembering that it also has a column added identifying which run the event log relates to - into our list.

Once we have finished the for loop - i.e. we have completed every run of the model - we can join our event log into one large dataframe.

Finally, to make it easier to experiment with building visualisations, metrics, animations and more without having to rerun the model each time, we will save the resulting dataframe as a csv.

Tip

You may wish to explore a datatype-preserving format like pickle, arrow or parquet for this export so that you don’t run into issues with dates - but csv has the benefit of being cross-plaform, human readable and machine readable.

If you are using version control (Git/GitHub), it’s recommended to add this file to your .gitignore - it’s quite large and will change frequently!

30.4.4.2.1 Original
def run_trial(self):
    for run in range(g.number_of_runs):
        random.seed(run)

        my_model = Model(run)
        my_model.run()

        self.df_trial_results.loc[run] = [
            my_model.mean_q_time_cubicle
        ]

    return self.df_trial_results
30.4.4.2.2 With Event Logging Modifications
def run_trial(self):
    for run in range(g.number_of_runs):
        random.seed(run)

        my_model = Model(run)
        model_outputs = my_model.run()
        patient_level_results = model_outputs["results"] 
        event_log = model_outputs["event_log"] 

        self.df_trial_results.loc[run] = [
            my_model.mean_q_time_cubicle
        ]

        self.all_event_logs.append(event_log) 

    self.all_event_logs = pd.concat(self.all_event_logs) 

    self.all_event_logs.to_csv("event_logs.csv") 

30.5 Logging resource IDs

If we want to make use of our full ability to monitor resource utilisation at a detailed level, and to create animations with the vidigi package, we also need to change the way we request resources to account for the fact we are now using a simpy store instead of directly interacting with our simpy resources.

Where we would have previously used

with self.treatment_cubicles.request() as req:
    # Seize a treatment resource when available
    yield req

    # ALL CODE WHERE WE NEED TO KEEP HOLD OF THE RESOURCE

# CONTINUE AFTER RELEASING RESOURCE HERE

we instead now use

# Seize a treatment resource when available
treatment_resource = yield self.treatment_cubicles.get()

# ALL CODE WHERE WE NEED TO KEEP HOLD OF THE RESOURCE

# CONTINUE AFTER RELEASING RESOURCE HERE

# Resource is no longer in use, so put it back in the store
self.treatment_cubicles.put(treatment_resource)

30.5.1 Imports

We will need to add the following to our imports:

from vidigi.utils import populate_store

30.5.2 The Model Class

30.5.2.1 the init_resources method

Vidigi needs to know which resource a user made use of so that we can ensure it stays with the correct resource throughout its time in the animation.

The standard simpy Resource does not have a way of tracking that, so we need to do two things: - create a simpy Store that we will store our resources in - use the vidigi helper function populate_store() to generate a store full of special resources that each have a unique ID we can track when doing our event logging

Overall, the use of stores won’t generally change your code too much - and we cover exactly what needs to change a little later in this document.

If you are using priority resources, this step will be a little different - see Example 3 in the documents if you need to use Resources that prioritise some entities over others.

30.5.2.1.1 Original
def init_resources(self):
    self.treatment_cubicles = simpy.Resource(
        self.env,
        capacity=g.n_cubicles
        )
30.5.2.1.2 With Vidigi Modifications
def init_resources(self):
    self.treatment_cubicles = simpy.Store(self.env) 

    populate_store(  
        num_resources=g.n_cubicles, 
        simpy_store=self.treatment_cubicles, 
        sim_env=self.env
        ) 

30.5.2.2 the attend_clinic method

We will return to our attend_clinic method, where we previously added in event logging, and now

  • make use of this new resource ID attribute
  • make the required changes to how resources are requested and released for the next user
30.5.2.2.1 Original
def attend_clinic(self, patient):
    self.arrival = self.env.now
    self.event_log.append(
        {'patient': patient.identifier,
            'pathway': 'Simplest',
            'event_type': 'arrival_departure',
            # you must use this event name for arrival events
            'event': 'arrival',
            'time': self.env.now}
    )

    # request examination resource
    start_wait = self.env.now
    self.event_log.append(
        {'patient': patient.identifier,
            'pathway': 'Simplest',
            # for a queue, you can define your chosen event name
            'event': 'treatment_wait_begins',
            'event_type': 'queue',
            'time': self.env.now}
    )

    # Seize a treatment resource when available
    with self.treatment_cubicles.request() as req:
        # Seize a treatment resource when available
        yield req

        # record the waiting time for registration
        self.wait_treat = self.env.now - start_wait
        self.event_log.append(
            {'patient': patient.identifier,
                'pathway': 'Simplest',
                'event': 'treatment_begins',
                # for a resource_use, you can define your chosen event name
                'event_type': 'resource_use',
                'time': self.env.now
                }
        )

        # sample treatment duration
        self.treat_duration = self.treat_dist.sample()
        yield self.env.timeout(self.treat_duration)

        self.event_log.append(
            {'patient': patient.identifier,
                'pathway': 'Simplest',
                # for a resource_use_end, you can define your chosen event name
                'event': 'treatment_complete',
                'event_type': 'resource_use_end',
                'time': self.env.now
                }
        )

    # total time in system
    self.total_time = self.env.now - self.arrival
    self.event_log.append(
        {'patient': patient.identifier,
        'pathway': 'Simplest',
        'event': 'depart', # you must use this event name for departure events
        'event_type': 'arrival_departure',
        'time': self.env.now}
    )
30.5.2.2.2 With Custom Resource Modifications
def attend_clinic(self, patient):
    self.arrival = self.env.now
    self.event_log.append( 
        {'patient': patient.identifier, 
            'pathway': 'Simplest', 
            'event_type': 'arrival_departure', 
            # you must use this event name for arrival events
            'event': 'arrival',  
            'time': self.env.now} 
    ) 

    # request examination resource
    start_wait = self.env.now
    self.event_log.append( 
        {'patient': patient.identifier, 
            'pathway': 'Simplest', 
            # for a queue, you can define your chosen event name
            'event': 'treatment_wait_begins', 
            'event_type': 'queue', 
            'time': self.env.now} 
    ) 

    # Seize a treatment resource when available
    treatment_resource = yield self.treatment_cubicles.get() 

    # record the waiting time for registration
    self.wait_treat = self.env.now - start_wait
    self.event_log.append( 
        {'patient': patient.identifier, 
            'pathway': 'Simplest', 
            'event': 'treatment_begins', 
            # for a resource_use, you can define your chosen event name
            'event_type': 'resource_use', 
            'time': self.env.now, 
            # grab the resource id from the treatment_resource requested
            'resource_id': treatment_resource.id_attribute 
            } 
    ) 

    # sample treatment duration
    self.treat_duration = self.treat_dist.sample()
    yield self.env.timeout(self.treat_duration)

    self.event_log.append( 
        {'patient': patient.identifier, 
            'pathway': 'Simplest', 
            # for a resource_use_end, you can define your chosen event name
            'event': 'treatment_complete', 
            'event_type': 'resource_use_end', 
            'time': self.env.now, 
            'resource_id': treatment_resource.id_attribute} 
    ) 

    # Resource is no longer in use, so put it back in the store 
    self.treatment_cubicles.put(treatment_resource) 

    # total time in system
    self.total_time = self.env.now - self.arrival
    self.event_log.append( 
        {'patient': patient.identifier, 
        'pathway': 'Simplest', 
        'event': 'depart', # you must use this event name for departure events 
        'event_type': 'arrival_departure', 
        'time': self.env.now} 
    ) 

30.6 Summary

In this chapter, we have demonstrated just a few of the things you can do with event logs, and how you can efficiently add them to an existing model.

In the rest of this section, we will dive deeper into some reproducible snippets and patterns to help you generate useful outputs from event logs, as well as a deeper dive into turning these into process maps with bupaR and animations with vidigi.