30  Event Logging

In this chapter and beyond, we will be using the term ‘event logging’ to describe the process of generating a step-by-step log of what happens to each entity as they pass through our system. The resulting file will be an ‘event log’.

But why bother with event logging? In previous chapters, we have managed to record a range of useful metrics from our simulation, and used these to display the performance of our model with different sets of parameters. In the debugging chapters, we have explored how to track down issues in our simulation with a combination of print statements and the logging module. So what else is there left to do?

Event logging can be a way to tackle both of the above in a robust, efficient and reusable way - while also opening up the use of a wide range of reusable boilerplate code for visualising, animating and formally testing our simulation outputs.

Tip

While this chapter - and the rest of the book - will make use of a particular structure for our event log, this isn’t the only way you can set up an event log. However, this structure has been tested to work well across a range of different models, and interfaces well with the vidigi package for animated simulation outputs.

30.1 A sample event log

First, it may be helpful to see what an event log looks like.

A sample log is displayed below:

import pandas as pd
event_log = pd.read_csv("resources/sample_event_log.csv")

event_log
entity_id pathway event_type event time resource_id run
0 1 Simplest arrival_departure arrival 0.000000 NaN 0
1 1 Simplest queue treatment_wait_begins 0.000000 NaN 0
2 1 Simplest resource_use treatment_begins 0.000000 1.0 0
3 2 Simplest arrival_departure arrival 3.399660 NaN 0
4 2 Simplest queue treatment_wait_begins 3.399660 NaN 0
5 2 Simplest resource_use treatment_begins 3.399660 2.0 0
6 3 Simplest arrival_departure arrival 8.497645 NaN 0
7 3 Simplest queue treatment_wait_begins 8.497645 NaN 0
8 3 Simplest resource_use treatment_begins 8.497645 3.0 0
9 4 Simplest arrival_departure arrival 8.596678 NaN 0
10 4 Simplest queue treatment_wait_begins 8.596678 NaN 0
11 4 Simplest resource_use treatment_begins 8.596678 4.0 0
12 5 Simplest arrival_departure arrival 8.608025 NaN 0
13 5 Simplest queue treatment_wait_begins 8.608025 NaN 0
14 6 Simplest arrival_departure arrival 11.359739 NaN 0
15 6 Simplest queue treatment_wait_begins 11.359739 NaN 0
16 7 Simplest arrival_departure arrival 19.509442 NaN 0
17 7 Simplest queue treatment_wait_begins 19.509442 NaN 0
18 8 Simplest arrival_departure arrival 22.877356 NaN 0
19 8 Simplest queue treatment_wait_begins 22.877356 NaN 0
20 9 Simplest arrival_departure arrival 26.653863 NaN 0
21 9 Simplest queue treatment_wait_begins 26.653863 NaN 0
22 1 Simplest resource_use_end treatment_complete 40.317385 1.0 0
23 1 Simplest arrival_departure depart 40.317385 NaN 0
24 5 Simplest resource_use treatment_begins 40.317385 1.0 0
25 10 Simplest arrival_departure arrival 40.737793 NaN 0
26 10 Simplest queue treatment_wait_begins 40.737793 NaN 0
27 2 Simplest resource_use_end treatment_complete 42.443230 2.0 0
28 2 Simplest arrival_departure depart 42.443230 NaN 0
29 6 Simplest resource_use treatment_begins 42.443230 2.0 0
30 4 Simplest resource_use_end treatment_complete 48.809628 4.0 0
31 4 Simplest arrival_departure depart 48.809628 NaN 0
32 7 Simplest resource_use treatment_begins 48.809628 4.0 0
33 3 Simplest resource_use_end treatment_complete 51.483457 3.0 0
34 3 Simplest arrival_departure depart 51.483457 NaN 0
35 8 Simplest resource_use treatment_begins 51.483457 3.0 0
36 11 Simplest arrival_departure arrival 71.026558 NaN 0
37 11 Simplest queue treatment_wait_begins 71.026558 NaN 0
38 5 Simplest resource_use_end treatment_complete 77.447488 1.0 0
39 5 Simplest arrival_departure depart 77.447488 NaN 0
40 9 Simplest resource_use treatment_begins 77.447488 1.0 0
41 6 Simplest resource_use_end treatment_complete 83.962251 2.0 0
42 6 Simplest arrival_departure depart 83.962251 NaN 0
43 10 Simplest resource_use treatment_begins 83.962251 2.0 0
44 12 Simplest arrival_departure arrival 87.458700 NaN 0
45 12 Simplest queue treatment_wait_begins 87.458700 NaN 0
46 13 Simplest arrival_departure arrival 87.465138 NaN 0
47 13 Simplest queue treatment_wait_begins 87.465138 NaN 0
48 7 Simplest resource_use_end treatment_complete 95.498040 4.0 0
49 7 Simplest arrival_departure depart 95.498040 NaN 0

Let’s break down the key components.

30.1.1 Mandatory Columns

We have several mandatory columns:

  • entity_id
  • event_type
  • event
  • time

entity_id: a unique identifider to allow us to follow a given entity through their journey

event_type: this column is used to distinguish between three key kinds of events:

  • arrival_departure: an entity first entering the system, or the entity leaving the system
  • queue: an entity beginning to queue for a resource
    • this can also be used to just generally record the movement of someone through points of a system that don’t necessarily have a traditional ‘queue’
  • resource_use: this relates to anything where someone starts or ends their time with a resource in the system

event: this column further breaks down what is happening during each event type, such as what stage of the system people are waiting to interact with

time: this can be an absolute timestamp in the form of a datetime (e.g. 2027-01-01 23:01:47), or a relative timestamp in time units from the start of the simulation.

30.1.3 Optional Columns

There are a few more recommended but optional columns:

  • resource_id
  • pathway

resource_id: While a traditional simpy resource does not have a concept of a resource ID, there are various ways we can tackle adding one into our system. Let’s assume we have 5 nurses - it’s very helpful to know which of these 5 nurses are in use at any given time for visualisation and auditing purposes. A discussion of approaches to this will be found in a later chapter (coming soon), or you can take a look at the populate_store function from vidigi for a discussion on one approach to this: click here

30.2 Exploring our log

Let’s explore what we can find out even from this small sample of logs.

30.2.1 Tracking the journey of a single entity/patient

We can easily filter down by entity ID to see how the journey of an individual looks

event_log[event_log["entity_id"] == 1]
entity_id pathway event_type event time resource_id run
0 1 Simplest arrival_departure arrival 0.000000 NaN 0
1 1 Simplest queue treatment_wait_begins 0.000000 NaN 0
2 1 Simplest resource_use treatment_begins 0.000000 1.0 0
22 1 Simplest resource_use_end treatment_complete 40.317385 1.0 0
23 1 Simplest arrival_departure depart 40.317385 NaN 0

We can see that this person arrived and was seen immediately.

30.2.2 Calculating waits

In this event log, patients who wait for treatment (an event of ‘treatment_wait_begins’) will have the next event of ‘treatment_wait’. We’ll start by just pulling out those instances.

treatment_waits_df = event_log[event_log["event"].isin(["treatment_wait_begins","treatment_begins"])]

treatment_waits_df
entity_id pathway event_type event time resource_id run
1 1 Simplest queue treatment_wait_begins 0.000000 NaN 0
2 1 Simplest resource_use treatment_begins 0.000000 1.0 0
4 2 Simplest queue treatment_wait_begins 3.399660 NaN 0
5 2 Simplest resource_use treatment_begins 3.399660 2.0 0
7 3 Simplest queue treatment_wait_begins 8.497645 NaN 0
8 3 Simplest resource_use treatment_begins 8.497645 3.0 0
10 4 Simplest queue treatment_wait_begins 8.596678 NaN 0
11 4 Simplest resource_use treatment_begins 8.596678 4.0 0
13 5 Simplest queue treatment_wait_begins 8.608025 NaN 0
15 6 Simplest queue treatment_wait_begins 11.359739 NaN 0
17 7 Simplest queue treatment_wait_begins 19.509442 NaN 0
19 8 Simplest queue treatment_wait_begins 22.877356 NaN 0
21 9 Simplest queue treatment_wait_begins 26.653863 NaN 0
24 5 Simplest resource_use treatment_begins 40.317385 1.0 0
26 10 Simplest queue treatment_wait_begins 40.737793 NaN 0
29 6 Simplest resource_use treatment_begins 42.443230 2.0 0
32 7 Simplest resource_use treatment_begins 48.809628 4.0 0
35 8 Simplest resource_use treatment_begins 51.483457 3.0 0
37 11 Simplest queue treatment_wait_begins 71.026558 NaN 0
40 9 Simplest resource_use treatment_begins 77.447488 1.0 0
43 10 Simplest resource_use treatment_begins 83.962251 2.0 0
45 12 Simplest queue treatment_wait_begins 87.458700 NaN 0
47 13 Simplest queue treatment_wait_begins 87.465138 NaN 0

Next, we’ll pivot these to a wide format.

treatment_waits_df_wide = treatment_waits_df.pivot(index="entity_id", columns="event", values="time")

treatment_waits_df_wide = treatment_waits_df_wide[['treatment_wait_begins', 'treatment_begins']]

treatment_waits_df_wide
event treatment_wait_begins treatment_begins
entity_id
1 0.000000 0.000000
2 3.399660 3.399660
3 8.497645 8.497645
4 8.596678 8.596678
5 8.608025 40.317385
6 11.359739 42.443230
7 19.509442 48.809628
8 22.877356 51.483457
9 26.653863 77.447488
10 40.737793 83.962251
11 71.026558 NaN
12 87.458700 NaN
13 87.465138 NaN

Finally, let’s work out the average wait for only those patients who were seen by someone before our simulation finished. We first need to calculate the difference between the two stamps, which in this simulation, represents a number of minutes

treatment_waits_df_wide["treatment_wait"] = treatment_waits_df_wide["treatment_begins"] - treatment_waits_df_wide["treatment_wait_begins"]

treatment_waits_df_wide
event treatment_wait_begins treatment_begins treatment_wait
entity_id
1 0.000000 0.000000 0.000000
2 3.399660 3.399660 0.000000
3 8.497645 8.497645 0.000000
4 8.596678 8.596678 0.000000
5 8.608025 40.317385 31.709360
6 11.359739 42.443230 31.083491
7 19.509442 48.809628 29.300186
8 22.877356 51.483457 28.606101
9 26.653863 77.447488 50.793625
10 40.737793 83.962251 43.224458
11 71.026558 NaN NaN
12 87.458700 NaN NaN
13 87.465138 NaN NaN

Now we can calculate the mean of that column:

treatment_waits_df_wide["treatment_wait"].mean()
21.4717221

Or the median:

treatment_waits_df_wide["treatment_wait"].median()
28.9531435

Or the max and min waits:

treatment_waits_df_wide["treatment_wait"].agg(['min', 'max'])
min     0.000000
max    50.793625
Name: treatment_wait, dtype: float64

Or plot the distribution of waits:

import plotly.express as px

px.histogram(treatment_waits_df_wide, x="treatment_wait")

Or a boxplot:

px.box(treatment_waits_df_wide, x="treatment_wait")

30.3 Exploring a bigger event log

This would be a lot more powerful if we weren’t just working with a small sample of 50 rows of an event log. Let’s instead load in the full event log from this model, which runs for a lot longer and includes multiple runs. All our existing code can be reused.

event_log = pd.read_csv("resources/sample_event_log_10_day_10_run.csv")

# We'll limit this to the first 24 hours of our clinic

event_log = event_log[event_log["time"] <= 60 * 24]

event_log.sample(10)
entity_id pathway event_type event time resource_id run
30944 135 Simplest resource_use_end treatment_complete 1364.238468 4.0 3
10340 114 Simplest arrival_departure arrival 520.944947 NaN 1
20788 219 Simplest arrival_departure arrival 1159.087158 NaN 2
10497 75 Simplest resource_use treatment_begins 739.629092 2.0 1
61025 251 Simplest queue treatment_wait_begins 1245.354799 NaN 6
40725 174 Simplest queue treatment_wait_begins 832.687462 NaN 4
10649 93 Simplest resource_use_end treatment_complete 957.492104 1.0 1
60712 161 Simplest arrival_departure arrival 806.687920 NaN 6
40438 91 Simplest arrival_departure arrival 440.216636 NaN 4
70631 141 Simplest queue treatment_wait_begins 687.885873 NaN 7

Let’s recalculate our waits. In fact - let’s write a function that could be used to make a wait table for any pair of events!

We’ll also make sure it takes account of the ‘run’ column.

def generate_difference_df(event_log, event_1, event_2, output_col_name):
  difference_df = event_log[event_log["event"].isin([event_1,event_2])]

  difference_df_wide = difference_df.pivot(index=["entity_id", "run"], columns="event", values="time")

  difference_df_wide = difference_df_wide[[event_1,event_2]]

  difference_df_wide[output_col_name] = difference_df_wide[event_2] - difference_df_wide[event_1]

  return difference_df_wide.reset_index()

Let’s run this and view the first 20 rows.

treatment_wait_df = generate_difference_df(event_log=event_log, event_1="treatment_wait_begins", event_2="treatment_begins", output_col_name="treatment_wait")

treatment_wait_df.head(20)
event entity_id run treatment_wait_begins treatment_begins treatment_wait
0 1 0 0.000000 0.000000 0.0
1 1 1 0.000000 0.000000 0.0
2 1 2 0.000000 0.000000 0.0
3 1 3 0.000000 0.000000 0.0
4 1 4 0.000000 0.000000 0.0
5 1 5 0.000000 0.000000 0.0
6 1 6 0.000000 0.000000 0.0
7 1 7 0.000000 0.000000 0.0
8 1 8 0.000000 0.000000 0.0
9 1 9 0.000000 0.000000 0.0
10 2 0 3.399660 3.399660 0.0
11 2 1 12.021043 12.021043 0.0
12 2 2 8.965271 8.965271 0.0
13 2 3 0.033294 0.033294 0.0
14 2 4 10.399752 10.399752 0.0
15 2 5 2.645552 2.645552 0.0
16 2 6 11.489413 11.489413 0.0
17 2 7 4.011208 4.011208 0.0
18 2 8 17.040648 17.040648 0.0
19 2 9 2.861402 2.861402 0.0

Here, we never have a wait because these people are entering an empty system.

Let’s instead look at a different patient.

treatment_wait_df[treatment_wait_df["entity_id"] == 40]
event entity_id run treatment_wait_begins treatment_begins treatment_wait
390 40 0 225.946145 374.475154 148.529008
391 40 1 166.195041 396.502222 230.307181
392 40 2 229.388022 384.693144 155.305122
393 40 3 198.590107 376.281301 177.691195
394 40 4 209.135743 370.569622 161.433878
395 40 5 227.132040 376.794591 149.662551
396 40 6 173.002070 383.515632 210.513562
397 40 7 223.578745 378.418622 154.839878
398 40 8 228.714086 385.852087 157.138001
399 40 9 176.753615 370.958861 194.205245

Let’s sample a range of other quick visualisations we can make.

30.3.1 Cumulative arrivals

Let’s first just take a look at the arrivals for a single run, with each dot representing a single person arriving in our system.

px.scatter(
  event_log[
    (event_log["run"]==1) &
    (event_log["event"]=="arrival")],
    x="time",
    y="entity_id"
    )

Here, we can see how cumulative arrivals varied by run.

px.line(
  event_log[event_log["event"]=="arrival"],
  x="time",
  y="entity_id",
  color="run"
  )

30.3.2 Event scatterplot

px.scatter(
  event_log[event_log["run"]==1],
  y="time",
  color="event",
  x="entity_id"
  )

We can tell that the treatment wait very quickly gets out of hand for our individuals as the simulation progresses.

30.3.3 Treatment duration

treatment_duration_df = generate_difference_df(
  event_log=event_log,
  event_1="treatment_begins",
  event_2="treatment_complete",
  output_col_name="treatment_duration")

treatment_duration_df.head(20)
event entity_id run treatment_begins treatment_complete treatment_duration
0 1 0 0.000000 40.317385 40.317385
1 1 1 0.000000 41.226014 41.226014
2 1 2 0.000000 32.732000 32.732000
3 1 3 0.000000 40.023184 40.023184
4 1 4 0.000000 37.026630 37.026630
5 1 5 0.000000 34.941718 34.941718
6 1 6 0.000000 36.785746 36.785746
7 1 7 0.000000 41.752122 41.752122
8 1 8 0.000000 38.779312 38.779312
9 1 9 0.000000 42.284465 42.284465
10 2 0 3.399660 42.443230 39.043571
11 2 1 12.021043 46.891254 34.870211
12 2 2 8.965271 46.903314 37.938043
13 2 3 0.033294 40.296613 40.263318
14 2 4 10.399752 53.054173 42.654422
15 2 5 2.645552 39.743828 37.098276
16 2 6 11.489413 49.827460 38.338047
17 2 7 4.011208 41.798441 37.787233
18 2 8 17.040648 51.984157 34.943509
19 2 9 2.861402 42.211944 39.350541
px.box(
  treatment_duration_df,
  x="treatment_duration",
  title="Variation in treatment duration"
  )
px.box(
  treatment_duration_df, x="treatment_duration", color="run",
  range_x=[
          0,
          max(treatment_duration_df["treatment_duration"]) * 1.1
          ],
  title="Variation in treatment duration by run"
        )

30.3.4 Throughput

patients_entering_system = event_log[event_log["event"] == "arrival"]

patients_entering_system = (
  patients_entering_system
  .groupby('run')[['entity_id']]
  .count()
  .rename(columns={"entity_id": "entering_system"})
  )

patients_entering_system
entering_system
run
0 259
1 303
2 276
3 285
4 302
5 270
6 292
7 294
8 258
9 286
patients_leaving_system = event_log[event_log["event"] == "depart"]

patients_leaving_system = (
  patients_leaving_system.groupby('run')[['entity_id']]
  .count()
  .rename(columns={"entity_id": "leaving_system"})
  )

patients_leaving_system
leaving_system
run
0 140
1 142
2 141
3 141
4 142
5 143
6 140
7 142
8 143
9 142
patient_throughput_df = patients_entering_system.merge(
  patients_leaving_system,
  left_index=True,
  right_index=True
  )

patient_throughput_df["throughput"] = patient_throughput_df["leaving_system"] / patient_throughput_df["entering_system"]

patient_throughput_df["throughput_display"] = patient_throughput_df["throughput"].apply(lambda x: f"{x:.1%}")

patient_throughput_df
entering_system leaving_system throughput throughput_display
run
0 259 140 0.540541 54.1%
1 303 142 0.468647 46.9%
2 276 141 0.510870 51.1%
3 285 141 0.494737 49.5%
4 302 142 0.470199 47.0%
5 270 143 0.529630 53.0%
6 292 140 0.479452 47.9%
7 294 142 0.482993 48.3%
8 258 143 0.554264 55.4%
9 286 142 0.496503 49.7%

30.3.5 Animation

Using the vidigi package, we can take our event log and - with only minor adjustments and additions - create a full animation of our system.

event_position_df = pd.DataFrame([
                    {'event': 'arrival',
                     'x':  50, 'y': 300,
                     'label': "Arrival" },

                    # Triage - minor and trauma
                    {'event': 'treatment_wait_begins',
                     'x':  205, 'y': 275,
                     'label': "Waiting for Treatment"},

                    {'event': 'treatment_begins',
                     'x':  205, 'y': 175,
                     'resource':'n_cubicles',
                     'label': "Being Treated"},

                    {'event': 'exit',
                     'x':  270, 'y': 70,
                     'label': "Exit"}

                ])
# Create a suitable class to pass in the resource numbers to the animation function
class model_params():
    def __init__(self):
        self.n_cubicles = 4 # In this case, I know that there were four resources available.

params = model_params()

print(f"Number of nurses: {params.n_cubicles}")
Number of nurses: 4
from vidigi.animation import animate_activity_log

animate_activity_log(
        # We need to ensure we only pass in a single run of the model
        # We also need to ensure that the column 'entity_id' is renamed to 'patient' to match
        # vidigi's expectations - though this will be made more flexible in a future version
        # of the package
        event_log=event_log[event_log['run']==1],
        event_position_df=event_position_df,
        scenario=params,
        debug_mode=True,
        setup_mode=False,
        every_x_time_units=5,
        include_play_button=True,
        entity_icon_size=20,
        text_size=20,
        gap_between_entities=6,
        gap_between_queue_rows=25,
        plotly_height=700,
        frame_duration=200,
        plotly_width=1200,
        override_x_max=300,
        override_y_max=500,
        limit_duration=event_log["time"].max().astype('int'),
        wrap_queues_at=25,
        step_snapshot_max=125,
        time_display_units="dhm",
        display_stage_labels=False,
        add_background_image="https://raw.githubusercontent.com/Bergam0t/vidigi/refs/heads/main/examples/example_1_simplest_case/Simplest%20Model%20Background%20Image%20-%20Horizontal%20Layout.drawio.png",
    )
Animation function called at 11:44:12
C:\simviz\vidigi\vidigi\animation.py:1169: UserWarning:

`limit_duration` was provided as int32 (1439); rounding to nearest integer (1439).
Iteration through time-unit-by-time-unit logs complete 11:44:13
Snapshot df concatenation complete at 11:44:14
Reshaped animation dataframe finished construction at 11:44:14
Placement dataframe finished construction at 11:44:14
Output animation generation complete at 11:44:16
Total Time Elapsed: 4.17 seconds