30 Event Logging

Author

In this chapter and beyond, we will be using the term ‘event logging’ to describe the process of generating a step-by-step log of what happens to each entity as they pass through our system. The resulting file will be an ‘event log’.

But why bother with event logging? In previous chapters, we have managed to record a range of useful metrics from our simulation, and used these to display the performance of our model with different sets of parameters. In the debugging chapters, we have explored how to track down issues in our simulation with a combination of print statements and the logging module. So what else is there left to do?

Event logging can be a way to tackle both of the above in a robust, efficient and reusable way - while also opening up the use of a wide range of reusable boilerplate code for visualising, animating and formally testing our simulation outputs.

Tip

While this chapter - and the rest of the book - will make use of a particular structure for our event log, this isn’t the only way you can set up an event log. However, this structure has been tested to work well across a range of different models, and interfaces well with the vidigi package for animated simulation outputs.

30.1 A sample event log

First, it may be helpful to see what an event log looks like.

A sample log is displayed below:

import pandas as pd
event_log = pd.read_csv("resources/sample_event_log.csv")

event_log

	entity_id	pathway	event_type	event	time	resource_id
0	1	Simplest	arrival_departure	arrival	0.000000	NaN
1	1	Simplest	queue	treatment_wait_begins	0.000000	NaN
2	1	Simplest	resource_use	treatment_begins	0.000000	1.0
3	2	Simplest	arrival_departure	arrival	3.399660	NaN
4	2	Simplest	queue	treatment_wait_begins	3.399660	NaN
5	2	Simplest	resource_use	treatment_begins	3.399660	2.0
6	3	Simplest	arrival_departure	arrival	8.497645	NaN
7	3	Simplest	queue	treatment_wait_begins	8.497645	NaN
8	3	Simplest	resource_use	treatment_begins	8.497645	3.0
9	4	Simplest	arrival_departure	arrival	8.596678	NaN
10	4	Simplest	queue	treatment_wait_begins	8.596678	NaN
11	4	Simplest	resource_use	treatment_begins	8.596678	4.0
12	5	Simplest	arrival_departure	arrival	8.608025	NaN
13	5	Simplest	queue	treatment_wait_begins	8.608025	NaN
14	6	Simplest	arrival_departure	arrival	11.359739	NaN
15	6	Simplest	queue	treatment_wait_begins	11.359739	NaN
16	7	Simplest	arrival_departure	arrival	19.509442	NaN
17	7	Simplest	queue	treatment_wait_begins	19.509442	NaN
18	8	Simplest	arrival_departure	arrival	22.877356	NaN
19	8	Simplest	queue	treatment_wait_begins	22.877356	NaN
20	9	Simplest	arrival_departure	arrival	26.653863	NaN
21	9	Simplest	queue	treatment_wait_begins	26.653863	NaN
22	1	Simplest	resource_use_end	treatment_complete	40.317385	1.0
23	1	Simplest	arrival_departure	depart	40.317385	NaN
24	5	Simplest	resource_use	treatment_begins	40.317385	1.0
25	10	Simplest	arrival_departure	arrival	40.737793	NaN
26	10	Simplest	queue	treatment_wait_begins	40.737793	NaN
27	2	Simplest	resource_use_end	treatment_complete	42.443230	2.0
28	2	Simplest	arrival_departure	depart	42.443230	NaN
29	6	Simplest	resource_use	treatment_begins	42.443230	2.0
30	4	Simplest	resource_use_end	treatment_complete	48.809628	4.0
31	4	Simplest	arrival_departure	depart	48.809628	NaN
32	7	Simplest	resource_use	treatment_begins	48.809628	4.0
33	3	Simplest	resource_use_end	treatment_complete	51.483457	3.0
34	3	Simplest	arrival_departure	depart	51.483457	NaN
35	8	Simplest	resource_use	treatment_begins	51.483457	3.0
36	11	Simplest	arrival_departure	arrival	71.026558	NaN
37	11	Simplest	queue	treatment_wait_begins	71.026558	NaN
38	5	Simplest	resource_use_end	treatment_complete	77.447488	1.0
39	5	Simplest	arrival_departure	depart	77.447488	NaN
40	9	Simplest	resource_use	treatment_begins	77.447488	1.0
41	6	Simplest	resource_use_end	treatment_complete	83.962251	2.0
42	6	Simplest	arrival_departure	depart	83.962251	NaN
43	10	Simplest	resource_use	treatment_begins	83.962251	2.0
44	12	Simplest	arrival_departure	arrival	87.458700	NaN
45	12	Simplest	queue	treatment_wait_begins	87.458700	NaN
46	13	Simplest	arrival_departure	arrival	87.465138	NaN
47	13	Simplest	queue	treatment_wait_begins	87.465138	NaN
48	7	Simplest	resource_use_end	treatment_complete	95.498040	4.0
49	7	Simplest	arrival_departure	depart	95.498040	NaN

Let’s break down the key components.

30.1.1 Mandatory Columns

We have several mandatory columns:

entity_id
event_type
event
time

entity_id: a unique identifider to allow us to follow a given entity through their journey

event_type: this column is used to distinguish between three key kinds of events:

arrival_departure: an entity first entering the system, or the entity leaving the system
queue: an entity beginning to queue for a resource
- this can also be used to just generally record the movement of someone through points of a system that don’t necessarily have a traditional ‘queue’
resource_use: this relates to anything where someone starts or ends their time with a resource in the system

event: this column further breaks down what is happening during each event type, such as what stage of the system people are waiting to interact with

time: this can be an absolute timestamp in the form of a datetime (e.g. 2027-01-01 23:01:47), or a relative timestamp in time units from the start of the simulation.

30.1.2 Recommended Columns

In addition, the following columns are highly recommended to include:

run: this separates out results across multiple simulations, allowing for easy grouping of results by run

30.1.3 Optional Columns

There are a few more recommended but optional columns:

resource_id
pathway

resource_id: While a traditional simpy resource does not have a concept of a resource ID, there are various ways we can tackle adding one into our system. Let’s assume we have 5 nurses - it’s very helpful to know which of these 5 nurses are in use at any given time for visualisation and auditing purposes. A discussion of approaches to this will be found in a later chapter (coming soon), or you can take a look at the populate_store function from vidigi for a discussion on one approach to this: click here

30.2 Exploring our log

Let’s explore what we can find out even from this small sample of logs.

30.2.1 Tracking the journey of a single entity/patient

We can easily filter down by entity ID to see how the journey of an individual looks

event_log[event_log["entity_id"] == 1]

	entity_id	pathway	event_type	event	time	resource_id
0	1	Simplest	arrival_departure	arrival	0.000000	NaN
1	1	Simplest	queue	treatment_wait_begins	0.000000	NaN
2	1	Simplest	resource_use	treatment_begins	0.000000	1.0
22	1	Simplest	resource_use_end	treatment_complete	40.317385	1.0
23	1	Simplest	arrival_departure	depart	40.317385	NaN

We can see that this person arrived and was seen immediately.

30.2.2 Calculating waits

In this event log, patients who wait for treatment (an event of ‘treatment_wait_begins’) will have the next event of ‘treatment_wait’. We’ll start by just pulling out those instances.

treatment_waits_df = event_log[event_log["event"].isin(["treatment_wait_begins","treatment_begins"])]

treatment_waits_df

	entity_id	pathway	event_type	event	time	resource_id
1	1	Simplest	queue	treatment_wait_begins	0.000000	NaN
2	1	Simplest	resource_use	treatment_begins	0.000000	1.0
4	2	Simplest	queue	treatment_wait_begins	3.399660	NaN
5	2	Simplest	resource_use	treatment_begins	3.399660	2.0
7	3	Simplest	queue	treatment_wait_begins	8.497645	NaN
8	3	Simplest	resource_use	treatment_begins	8.497645	3.0
10	4	Simplest	queue	treatment_wait_begins	8.596678	NaN
11	4	Simplest	resource_use	treatment_begins	8.596678	4.0
13	5	Simplest	queue	treatment_wait_begins	8.608025	NaN
15	6	Simplest	queue	treatment_wait_begins	11.359739	NaN
17	7	Simplest	queue	treatment_wait_begins	19.509442	NaN
19	8	Simplest	queue	treatment_wait_begins	22.877356	NaN
21	9	Simplest	queue	treatment_wait_begins	26.653863	NaN
24	5	Simplest	resource_use	treatment_begins	40.317385	1.0
26	10	Simplest	queue	treatment_wait_begins	40.737793	NaN
29	6	Simplest	resource_use	treatment_begins	42.443230	2.0
32	7	Simplest	resource_use	treatment_begins	48.809628	4.0
35	8	Simplest	resource_use	treatment_begins	51.483457	3.0
37	11	Simplest	queue	treatment_wait_begins	71.026558	NaN
40	9	Simplest	resource_use	treatment_begins	77.447488	1.0
43	10	Simplest	resource_use	treatment_begins	83.962251	2.0
45	12	Simplest	queue	treatment_wait_begins	87.458700	NaN
47	13	Simplest	queue	treatment_wait_begins	87.465138	NaN

Next, we’ll pivot these to a wide format.

treatment_waits_df_wide = treatment_waits_df.pivot(index="entity_id", columns="event", values="time")

treatment_waits_df_wide = treatment_waits_df_wide[['treatment_wait_begins', 'treatment_begins']]

treatment_waits_df_wide

event	treatment_wait_begins	treatment_begins
entity_id
1	0.000000	0.000000
2	3.399660	3.399660
3	8.497645	8.497645
4	8.596678	8.596678
5	8.608025	40.317385
6	11.359739	42.443230
7	19.509442	48.809628
8	22.877356	51.483457
9	26.653863	77.447488
10	40.737793	83.962251
11	71.026558	NaN
12	87.458700	NaN
13	87.465138	NaN

Finally, let’s work out the average wait for only those patients who were seen by someone before our simulation finished. We first need to calculate the difference between the two stamps, which in this simulation, represents a number of minutes

treatment_waits_df_wide["treatment_wait"] = treatment_waits_df_wide["treatment_begins"] - treatment_waits_df_wide["treatment_wait_begins"]

treatment_waits_df_wide

event	treatment_wait_begins	treatment_begins	treatment_wait
entity_id
1	0.000000	0.000000	0.000000
2	3.399660	3.399660	0.000000
3	8.497645	8.497645	0.000000
4	8.596678	8.596678	0.000000
5	8.608025	40.317385	31.709360
6	11.359739	42.443230	31.083491
7	19.509442	48.809628	29.300186
8	22.877356	51.483457	28.606101
9	26.653863	77.447488	50.793625
10	40.737793	83.962251	43.224458
11	71.026558	NaN	NaN
12	87.458700	NaN	NaN
13	87.465138	NaN	NaN

Now we can calculate the mean of that column:

treatment_waits_df_wide["treatment_wait"].mean()

np.float64(21.4717221)

Or the median:

treatment_waits_df_wide["treatment_wait"].median()

np.float64(28.9531435)

Or the max and min waits:

treatment_waits_df_wide["treatment_wait"].agg(['min', 'max'])

min     0.000000
max    50.793625
Name: treatment_wait, dtype: float64

Or plot the distribution of waits:

import plotly.express as px

px.histogram(treatment_waits_df_wide, x="treatment_wait")

Or a boxplot:

px.box(treatment_waits_df_wide, x="treatment_wait")

30.3 Exploring a bigger event log

This would be a lot more powerful if we weren’t just working with a small sample of 50 rows of an event log. Let’s instead load in the full event log from this model, which runs for a lot longer and includes multiple runs. All our existing code can be reused.

event_log = pd.read_csv("resources/sample_event_log_10_day_10_run.csv")

# We'll limit this to the first 24 hours of our clinic

event_log = event_log[event_log["time"] <= 60 * 24]

event_log.sample(10)

	entity_id	pathway	event_type	event	time	resource_id	run
40922	238	Simplest	queue	treatment_wait_begins	1082.291238	NaN	4
70708	165	Simplest	arrival_departure	arrival	782.125020	NaN	7
70407	73	Simplest	arrival_departure	arrival	389.943492	NaN	7
50236	15	Simplest	resource_use	treatment_begins	132.662770	3.0	5
468	69	Simplest	arrival_departure	depart	716.403642	NaN	0
913	139	Simplest	arrival_departure	depart	1404.088281	NaN	0
60922	105	Simplest	arrival_departure	depart	1109.931131	NaN	6
20023	5	Simplest	queue	treatment_wait_begins	42.496097	NaN	2
41064	278	Simplest	arrival_departure	arrival	1287.351088	NaN	4
879	242	Simplest	arrival_departure	arrival	1342.820571	NaN	0

Let’s recalculate our waits. In fact - let’s write a function that could be used to make a wait table for any pair of events!

We’ll also make sure it takes account of the ‘run’ column.

def generate_difference_df(event_log, event_1, event_2, output_col_name):
  difference_df = event_log[event_log["event"].isin([event_1,event_2])]

  difference_df_wide = difference_df.pivot(index=["entity_id", "run"], columns="event", values="time")

  difference_df_wide = difference_df_wide[[event_1,event_2]]

  difference_df_wide[output_col_name] = difference_df_wide[event_2] - difference_df_wide[event_1]

  return difference_df_wide.reset_index()

Let’s run this and view the first 20 rows.

treatment_wait_df = generate_difference_df(event_log=event_log, event_1="treatment_wait_begins", event_2="treatment_begins", output_col_name="treatment_wait")

treatment_wait_df.head(20)

event	entity_id	run	treatment_wait_begins	treatment_begins
0	1	0	0.000000	0.000000
1	1	1	0.000000	0.000000
2	1	2	0.000000	0.000000
3	1	3	0.000000	0.000000
4	1	4	0.000000	0.000000
5	1	5	0.000000	0.000000
6	1	6	0.000000	0.000000
7	1	7	0.000000	0.000000
8	1	8	0.000000	0.000000
9	1	9	0.000000	0.000000
10	2	0	3.399660	3.399660
11	2	1	12.021043	12.021043
12	2	2	8.965271	8.965271
13	2	3	0.033294	0.033294
14	2	4	10.399752	10.399752
15	2	5	2.645552	2.645552
16	2	6	11.489413	11.489413
17	2	7	4.011208	4.011208
18	2	8	17.040648	17.040648
19	2	9	2.861402	2.861402

Here, we never have a wait because these people are entering an empty system.

Let’s instead look at a different patient.

treatment_wait_df[treatment_wait_df["entity_id"] == 40]

event	entity_id	run	treatment_wait_begins	treatment_begins	treatment_wait
390	40	0	225.946145	374.475154	148.529008
391	40	1	166.195041	396.502222	230.307181
392	40	2	229.388022	384.693144	155.305122
393	40	3	198.590107	376.281301	177.691195
394	40	4	209.135743	370.569622	161.433878
395	40	5	227.132040	376.794591	149.662551
396	40	6	173.002070	383.515632	210.513562
397	40	7	223.578745	378.418622	154.839878
398	40	8	228.714086	385.852087	157.138001
399	40	9	176.753615	370.958861	194.205245

Let’s sample a range of other quick visualisations we can make.

30.3.1 Cumulative arrivals

Let’s first just take a look at the arrivals for a single run, with each dot representing a single person arriving in our system.

px.scatter(
  event_log[
    (event_log["run"]==1) &
    (event_log["event"]=="arrival")],
    x="time",
    y="entity_id"
    )

Here, we can see how cumulative arrivals varied by run.

px.line(
  event_log[event_log["event"]=="arrival"],
  x="time",
  y="entity_id",
  color="run"
  )

30.3.2 Event scatterplot

px.scatter(
  event_log[event_log["run"]==1],
  y="time",
  color="event",
  x="entity_id"
  )

We can tell that the treatment wait very quickly gets out of hand for our individuals as the simulation progresses.

30.3.3 Treatment duration

treatment_duration_df = generate_difference_df(
  event_log=event_log,
  event_1="treatment_begins",
  event_2="treatment_complete",
  output_col_name="treatment_duration")

treatment_duration_df.head(20)

event	entity_id	run	treatment_begins	treatment_complete	treatment_duration
0	1	0	0.000000	40.317385	40.317385
1	1	1	0.000000	41.226014	41.226014
2	1	2	0.000000	32.732000	32.732000
3	1	3	0.000000	40.023184	40.023184
4	1	4	0.000000	37.026630	37.026630
5	1	5	0.000000	34.941718	34.941718
6	1	6	0.000000	36.785746	36.785746
7	1	7	0.000000	41.752122	41.752122
8	1	8	0.000000	38.779312	38.779312
9	1	9	0.000000	42.284465	42.284465
10	2	0	3.399660	42.443230	39.043571
11	2	1	12.021043	46.891254	34.870211
12	2	2	8.965271	46.903314	37.938043
13	2	3	0.033294	40.296613	40.263318
14	2	4	10.399752	53.054173	42.654422
15	2	5	2.645552	39.743828	37.098276
16	2	6	11.489413	49.827460	38.338047
17	2	7	4.011208	41.798441	37.787233
18	2	8	17.040648	51.984157	34.943509
19	2	9	2.861402	42.211944	39.350541

px.box(
  treatment_duration_df,
  x="treatment_duration",
  title="Variation in treatment duration"
  )

px.box(
  treatment_duration_df, x="treatment_duration", color="run",
  range_x=[
          0,
          max(treatment_duration_df["treatment_duration"]) * 1.1
          ],
  title="Variation in treatment duration by run"
        )

30.3.4 Throughput

patients_entering_system = event_log[event_log["event"] == "arrival"]

patients_entering_system = (
  patients_entering_system
  .groupby('run')[['entity_id']]
  .count()
  .rename(columns={"entity_id": "entering_system"})
  )

patients_entering_system

	entering_system
run
0	259
1	303
2	276
3	285
4	302
5	270
6	292
7	294
8	258
9	286

patients_leaving_system = event_log[event_log["event"] == "depart"]

patients_leaving_system = (
  patients_leaving_system.groupby('run')[['entity_id']]
  .count()
  .rename(columns={"entity_id": "leaving_system"})
  )

patients_leaving_system

	leaving_system
run
0	140
1	142
2	141
3	141
4	142
5	143
6	140
7	142
8	143
9	142

patient_throughput_df = patients_entering_system.merge(
  patients_leaving_system,
  left_index=True,
  right_index=True
  )

patient_throughput_df["throughput"] = patient_throughput_df["leaving_system"] / patient_throughput_df["entering_system"]

patient_throughput_df["throughput_display"] = patient_throughput_df["throughput"].apply(lambda x: f"{x:.1%}")

patient_throughput_df

	entering_system	leaving_system	throughput	throughput_display
run
0	259	140	0.540541	54.1%
1	303	142	0.468647	46.9%
2	276	141	0.510870	51.1%
3	285	141	0.494737	49.5%
4	302	142	0.470199	47.0%
5	270	143	0.529630	53.0%
6	292	140	0.479452	47.9%
7	294	142	0.482993	48.3%
8	258	143	0.554264	55.4%
9	286	142	0.496503	49.7%

30.3.5 Animation

Using the vidigi package, we can take our event log and - with only minor adjustments and additions - create a full animation of our system.

event_position_df = pd.DataFrame([
                    {'event': 'arrival',
                     'x':  50, 'y': 300,
                     'label': "Arrival" },

                    # Triage - minor and trauma
                    {'event': 'treatment_wait_begins',
                     'x':  205, 'y': 275,
                     'label': "Waiting for Treatment"},

                    {'event': 'treatment_begins',
                     'x':  205, 'y': 175,
                     'resource':'n_cubicles',
                     'label': "Being Treated"},

                    {'event': 'exit',
                     'x':  270, 'y': 70,
                     'label': "Exit"}

                ])

# Create a suitable class to pass in the resource numbers to the animation function
class model_params():
    def __init__(self):
        self.n_cubicles = 4 # In this case, I know that there were four resources available.

params = model_params()

print(f"Number of nurses: {params.n_cubicles}")

Number of nurses: 4

from vidigi.animation import animate_activity_log

animate_activity_log(
        # We need to ensure we only pass in a single run of the model
        # We also need to ensure that the column 'entity_id' is renamed to 'patient' to match
        # vidigi's expectations - though this will be made more flexible in a future version
        # of the package
        event_log=event_log[event_log['run']==1].rename(columns={"entity_id":"patient"}),
        event_position_df=event_position_df,
        scenario=params,
        debug_mode=True,
        setup_mode=False,
        every_x_time_units=5,
        include_play_button=True,
        icon_and_text_size=20,
        gap_between_entities=6,
        gap_between_rows=25,
        plotly_height=700,
        frame_duration=200,
        plotly_width=1200,
        override_x_max=300,
        override_y_max=500,
        limit_duration=event_log["time"].max().astype('int'),
        wrap_queues_at=25,
        step_snapshot_max=125,
        time_display_units="dhm",
        display_stage_labels=False,
        add_background_image="https://raw.githubusercontent.com/Bergam0t/vidigi/refs/heads/main/examples/example_1_simplest_case/Simplest%20Model%20Background%20Image%20-%20Horizontal%20Layout.drawio.png",
    )

Animation function called at 17:38:12
Iteration through minute-by-minute logs complete 17:38:13
Snapshot df concatenation complete at 17:38:13
Reshaped animation dataframe finished construction at 17:38:14
Placement dataframe finished construction at 17:38:14

C:\Users\Sammi\anaconda3\envs\des\Lib\site-packages\vidigi\animation.py:119: FutureWarning:

The 'unit' keyword in TimedeltaIndex construction is deprecated and will be removed in a future version. Use pd.to_timedelta instead.

Output animation generation complete at 17:38:15
Total Time Elapsed: 2.19 seconds

30.4 Adding event logging to your own model

Here, we’re going to work with the very basic model from the chapter ‘An Example SimPy Model’.

Tip

Changed code will be highlighted in the right-hand column, with unchanged column displayed less brightly.

30.4.1 the g Class

Our g class is unchanged.

30.4.2 the Entity Class

Our entity class - in this case, Patient - is unchanged.

30.4.3 the Model Class

30.4.3.1 The init method

To our init method for the Model class, we add an empty list that will store event logs throughout the model run for each patient.

30.4.3.1.1 Original

def __init__(self, run_number):
    # Create a SimPy environment in which everything will live
    self.env = simpy.Environment()

    # Create a patient counter (which we'll use as a patient ID)
    self.patient_counter = 0

    # Create an empty list to store patient objects in
    self.patients = []

    # Create our resources
    self.init_resources()

    # Store the passed in run number
    self.run_number = run_number

    # Create a new Pandas DataFrame that will store some results
    # against the patient ID (which we'll use as the index).
    self.results_df = pd.DataFrame()
    self.results_df["Patient ID"] = [1]
    self.results_df["Queue Time Cubicle"] = [0.0]
    self.results_df["Time with Nurse"] = [0.0]
    self.results_df.set_index("Patient ID", inplace=True)

    # Create an attribute to store the mean queuing times
    # across this run of the model
    self.mean_q_time_cubicle = 0

    self.patient_inter_arrival_dist = Exponential(
        mean = g.arrival_rate,
        random_seed = self.run_number*g.random_number_set
        )

    self.treat_dist = Lognormal(
        mean = g.trauma_treat_mean,
        stdev = g.trauma_treat_var,
        random_seed = self.run_number*g.random_number_set
        )

30.4.3.1.2 With Event Logging Modifications

def __init__(self, run_number):
    # Create a SimPy environment in which everything will live
    self.env = simpy.Environment()

    # Add an empty list to store our event logs in 
    self.event_log = [] 

    # Create a patient counter (which we'll use as a patient ID)
    self.patient_counter = 0

    # Create an empty list to store patient objects in
    self.patients = []

    # Create our resources
    self.init_resources()

    # Store the passed in run number
    self.run_number = run_number

    # Create a new Pandas DataFrame that will store some results
    # against the patient ID (which we'll use as the index)
    self.results_df = pd.DataFrame()
    self.results_df["Patient ID"] = [1]
    self.results_df["Queue Time Cubicle"] = [0.0]
    self.results_df["Time with Nurse"] = [0.0]
    self.results_df.set_index("Patient ID", inplace=True)

    # Create an attribute to store the mean queuing times
    # across this run of the model
    self.mean_q_time_cubicle = 0

    self.patient_inter_arrival_dist = Exponential(
        mean = g.arrival_rate,
        random_seed = self.run_number*g.random_number_set
        )

    self.treat_dist = Lognormal(
        mean = g.trauma_treat_mean,
        stdev = g.trauma_treat_var,
        random_seed = self.run_number*g.random_number_set
        )

30.4.3.2 the generator_patient_arrivals method

This method is unchanged.

30.4.3.3 the attend_clinic method

This is the key place in which we add our logging. The logs are what vidigi relies on to calculate who should be where, when, within the animation.

Tip

Appending rows to a dataframe is a very slow and memory-intensive operation and could negatively affect our simulation runtime.

A more efficient approach is to add dictionaries to a list. Each dictionary corresponds to one row of our eventual event log dataframe, and contains the keys that will form our column names, and the values that will be the values in each row.

At the end of the process, we will pass these dictionaries to pandas and it will convert them into a dataframe - so the end result is the same, but this approach is much faster overall.

Event logging takes the format below:

self.event_log.append(
        {'patient': entity_identifier,
            'pathway': 'My_Pathway_Name',
            'event_type': 'arrival_departure', # or 'queue', 'resource_use', or 'resource_use_end'
            'event': 'arrival', # or 'depart', or for 'queue' and 'resource_use' or 'resource_use_end' you can determine your own event name
            'time': self.env.now}
    )

Tip

While these columns are the bare minimum, you can take this opportunity to add as many additional fields as you see fit!

Bear in mind that the event log files can get quite unwieldy, and it may not be very efficient if you are adding in extra columns where the values are not always present.

This is another benefit of using dictionaries, however - you don’t always have to pass the same number of key-value pairs; the resulting dataframe will just deal with this and leave any missing ‘columns’ empty for that row.

Here, we will begin to add our logs.

In this case, we are adding logging at 5 points

arriving
beginning to queue (in this case, this will be the same as the arrival time)
beginning treatment
ending treatment
departing

30.4.3.3.1 Original

def attend_clinic(self, patient):
    self.arrival = self.env.now

    # request examination resource
    start_wait = self.env.now

    with self.treatment_cubicles.request() as req:
        # Seize a treatment resource when available
        yield req

        # record the waiting time for registration
        self.wait_treat = self.env.now - start_wait

        # sample treatment duration
        self.treat_duration = self.treat_dist.sample()
        yield self.env.timeout(self.treat_duration)

    # total time in system
    self.total_time = self.env.now - self.arrival

30.4.3.3.2 With Event Logging Modifications

def attend_clinic(self, patient):
    self.arrival = self.env.now
    self.event_log.append( 
        {'patient': patient.identifier, 
            'pathway': 'Simplest', 
            'event_type': 'arrival_departure', 
            # you must use this event name for arrival events
            'event': 'arrival',  
            'time': self.env.now} 
    ) 

    # request examination resource
    start_wait = self.env.now
    self.event_log.append( 
        {'patient': patient.identifier, 
            'pathway': 'Simplest', 
            # for a queue, you can define your chosen event name
            'event': 'treatment_wait_begins', 
            'event_type': 'queue', 
            'time': self.env.now} 
    ) 

    # Seize a treatment resource when available
    with self.treatment_cubicles.request() as req:
        # Seize a treatment resource when available
        yield req

        # record the waiting time for registration
        self.wait_treat = self.env.now - start_wait
        self.event_log.append( 
            {'patient': patient.identifier, 
                'pathway': 'Simplest', 
                'event': 'treatment_begins', 
                # for a resource_use, you can define your chosen event name
                'event_type': 'resource_use', 
                'time': self.env.now 
                } 
        ) 

        # sample treatment duration
        self.treat_duration = self.treat_dist.sample()
        yield self.env.timeout(self.treat_duration)

        self.event_log.append( 
            {'patient': patient.identifier, 
                'pathway': 'Simplest', 
                # for a resource_use_end, you can define your chosen event name
                'event': 'treatment_complete', 
                'event_type': 'resource_use_end', 
                'time': self.env.now 
                } 
        ) 

    # total time in system
    self.total_time = self.env.now - self.arrival
    self.event_log.append( 
        {'patient': patient.identifier, 
        'pathway': 'Simplest', 
        'event': 'depart', # you must use this event name for departure events 
        'event_type': 'arrival_departure', 
        'time': self.env.now} 
    )

30.4.3.4 the calculate_run_results method

This method is unchanged.

30.4.3.5 the run method

Our concern in the run method is adding a way to extract the event log.

It will also be important to augment the event log with a run number - we already pass a run number attribute when initialising our Model class, so we can just access that.

Finally, we change our model to output a dictionary of our results_df and our event log. You could approach this in different ways, like returning a list, or only returning the event log and calculating all results from the event log instead.

30.4.3.5.1 Original

def run(self):
    # Start up our DES entity generators that create new patients.  We've
    # only got one in this model, but we'd need to do this for each one if
    # we had multiple generators.
    self.env.process(self.generator_patient_arrivals())

    # Run the model for the duration specified in g class
    self.env.run(until=g.sim_duration)

    # Now the simulation run has finished, call the method that calculates
    # run results
    self.calculate_run_results()

30.4.3.5.2 With Event Logging Modifications

def run(self):
    # Start up our DES entity generators that create new patients.  We've
    # only got one in this model, but we'd need to do this for each one if
    # we had multiple generators.
    self.env.process(self.generator_patient_arrivals())

    # Run the model for the duration specified in g class
    self.env.run(until=g.sim_duration)

    # Now the simulation run has finished, call the method that calculates
    # run results
    self.calculate_run_results()

    self.event_log = pd.DataFrame(self.event_log) 

    self.event_log["run"] = self.run_number 

    return {'results': self.results_df, 'event_log': self.event_log}

30.4.4 the Trial Class

30.4.4.1 the init method

In this, we just need to add an empty list which will hold all the event log dataframes from the individual runs.

30.4.4.1.1 Original

def  __init__(self):
    self.df_trial_results = pd.DataFrame()
    self.df_trial_results["Run Number"] = [0]
    self.df_trial_results["Arrivals"] = [0]
    self.df_trial_results["Mean Queue Time Cubicle"] = [0.0]
    self.df_trial_results.set_index("Run Number", inplace=True)

30.4.4.1.2 With Event Logging Modifications

def  __init__(self):
    self.df_trial_results = pd.DataFrame()
    self.df_trial_results["Run Number"] = [0]
    self.df_trial_results["Arrivals"] = [0]
    self.df_trial_results["Mean Queue Time Cubicle"] = [0.0]
    self.df_trial_results.set_index("Run Number", inplace=True)

    self.all_event_logs = []

30.4.4.2 the run_trial method

Finally, in every iteration, we split out the results from our model (the dictionary) into two objects - one of which is our event log.

We can then add this event log - remembering that it also has a column added identifying which run the event log relates to - into our list.

Once we have finished the for loop - i.e. we have completed every run of the model - we can join our event log into one large dataframe.

Finally, to make it easier to experiment with building visualisations, metrics, animations and more without having to rerun the model each time, we will save the resulting dataframe as a csv.

Tip

You may wish to explore a datatype-preserving format like pickle, arrow or parquet for this export so that you don’t run into issues with dates - but csv has the benefit of being cross-plaform, human readable and machine readable.

If you are using version control (Git/GitHub), it’s recommended to add this file to your .gitignore - it’s quite large and will change frequently!

30.4.4.2.1 Original

def run_trial(self):
    for run in range(g.number_of_runs):
        random.seed(run)

        my_model = Model(run)
        my_model.run()

        self.df_trial_results.loc[run] = [
            my_model.mean_q_time_cubicle
        ]

    return self.df_trial_results

30.4.4.2.2 With Event Logging Modifications

def run_trial(self):
    for run in range(g.number_of_runs):
        random.seed(run)

        my_model = Model(run)
        model_outputs = my_model.run()
        patient_level_results = model_outputs["results"] 
        event_log = model_outputs["event_log"] 

        self.df_trial_results.loc[run] = [
            my_model.mean_q_time_cubicle
        ]

        self.all_event_logs.append(event_log) 

    self.all_event_logs = pd.concat(self.all_event_logs) 

    self.all_event_logs.to_csv("event_logs.csv")

30.5 Logging resource IDs

If we want to make use of our full ability to monitor resource utilisation at a detailed level, and to create animations with the vidigi package, we also need to change the way we request resources to account for the fact we are now using a simpy store instead of directly interacting with our simpy resources.

Where we would have previously used

with self.treatment_cubicles.request() as req:
    # Seize a treatment resource when available
    yield req

    # ALL CODE WHERE WE NEED TO KEEP HOLD OF THE RESOURCE

# CONTINUE AFTER RELEASING RESOURCE HERE

we instead now use

# Seize a treatment resource when available
treatment_resource = yield self.treatment_cubicles.get()

# ALL CODE WHERE WE NEED TO KEEP HOLD OF THE RESOURCE

# CONTINUE AFTER RELEASING RESOURCE HERE

# Resource is no longer in use, so put it back in the store
self.treatment_cubicles.put(treatment_resource)

30.5.1 Imports

We will need to add the following to our imports:

from vidigi.utils import populate_store

30.5.2 The Model Class

30.5.2.1 the init_resources method

Vidigi needs to know which resource a user made use of so that we can ensure it stays with the correct resource throughout its time in the animation.

The standard simpy Resource does not have a way of tracking that, so we need to do two things: - create a simpy Store that we will store our resources in - use the vidigi helper function populate_store() to generate a store full of special resources that each have a unique ID we can track when doing our event logging

Overall, the use of stores won’t generally change your code too much - and we cover exactly what needs to change a little later in this document.

If you are using priority resources, this step will be a little different - see Example 3 in the documents if you need to use Resources that prioritise some entities over others.

30.5.2.1.1 Original

def init_resources(self):
    self.treatment_cubicles = simpy.Resource(
        self.env,
        capacity=g.n_cubicles
        )

30.5.2.1.2 With Vidigi Modifications

def init_resources(self):
    self.treatment_cubicles = simpy.Store(self.env) 

    populate_store(  
        num_resources=g.n_cubicles, 
        simpy_store=self.treatment_cubicles, 
        sim_env=self.env
        )

30.5.2.2 the attend_clinic method

We will return to our attend_clinic method, where we previously added in event logging, and now

make use of this new resource ID attribute
make the required changes to how resources are requested and released for the next user

30.5.2.2.1 Original

def attend_clinic(self, patient):
    self.arrival = self.env.now
    self.event_log.append(
        {'patient': patient.identifier,
            'pathway': 'Simplest',
            'event_type': 'arrival_departure',
            # you must use this event name for arrival events
            'event': 'arrival',
            'time': self.env.now}
    )

    # request examination resource
    start_wait = self.env.now
    self.event_log.append(
        {'patient': patient.identifier,
            'pathway': 'Simplest',
            # for a queue, you can define your chosen event name
            'event': 'treatment_wait_begins',
            'event_type': 'queue',
            'time': self.env.now}
    )

    # Seize a treatment resource when available
    with self.treatment_cubicles.request() as req:
        # Seize a treatment resource when available
        yield req

        # record the waiting time for registration
        self.wait_treat = self.env.now - start_wait
        self.event_log.append(
            {'patient': patient.identifier,
                'pathway': 'Simplest',
                'event': 'treatment_begins',
                # for a resource_use, you can define your chosen event name
                'event_type': 'resource_use',
                'time': self.env.now
                }
        )

        # sample treatment duration
        self.treat_duration = self.treat_dist.sample()
        yield self.env.timeout(self.treat_duration)

        self.event_log.append(
            {'patient': patient.identifier,
                'pathway': 'Simplest',
                # for a resource_use_end, you can define your chosen event name
                'event': 'treatment_complete',
                'event_type': 'resource_use_end',
                'time': self.env.now
                }
        )

    # total time in system
    self.total_time = self.env.now - self.arrival
    self.event_log.append(
        {'patient': patient.identifier,
        'pathway': 'Simplest',
        'event': 'depart', # you must use this event name for departure events
        'event_type': 'arrival_departure',
        'time': self.env.now}
    )

30.5.2.2.2 With Custom Resource Modifications

def attend_clinic(self, patient):
    self.arrival = self.env.now
    self.event_log.append( 
        {'patient': patient.identifier, 
            'pathway': 'Simplest', 
            'event_type': 'arrival_departure', 
            # you must use this event name for arrival events
            'event': 'arrival',  
            'time': self.env.now} 
    ) 

    # request examination resource
    start_wait = self.env.now
    self.event_log.append( 
        {'patient': patient.identifier, 
            'pathway': 'Simplest', 
            # for a queue, you can define your chosen event name
            'event': 'treatment_wait_begins', 
            'event_type': 'queue', 
            'time': self.env.now} 
    ) 

    # Seize a treatment resource when available
    treatment_resource = yield self.treatment_cubicles.get() 

    # record the waiting time for registration
    self.wait_treat = self.env.now - start_wait
    self.event_log.append( 
        {'patient': patient.identifier, 
            'pathway': 'Simplest', 
            'event': 'treatment_begins', 
            # for a resource_use, you can define your chosen event name
            'event_type': 'resource_use', 
            'time': self.env.now, 
            # grab the resource id from the treatment_resource requested
            'resource_id': treatment_resource.id_attribute 
            } 
    ) 

    # sample treatment duration
    self.treat_duration = self.treat_dist.sample()
    yield self.env.timeout(self.treat_duration)

    self.event_log.append( 
        {'patient': patient.identifier, 
            'pathway': 'Simplest', 
            # for a resource_use_end, you can define your chosen event name
            'event': 'treatment_complete', 
            'event_type': 'resource_use_end', 
            'time': self.env.now, 
            'resource_id': treatment_resource.id_attribute} 
    ) 

    # Resource is no longer in use, so put it back in the store 
    self.treatment_cubicles.put(treatment_resource) 

    # total time in system
    self.total_time = self.env.now - self.arrival
    self.event_log.append( 
        {'patient': patient.identifier, 
        'pathway': 'Simplest', 
        'event': 'depart', # you must use this event name for departure events 
        'event_type': 'arrival_departure', 
        'time': self.env.now} 
    )

30.6 Summary

In this chapter, we have demonstrated just a few of the things you can do with event logs, and how you can efficiently add them to an existing model.

In the rest of this section, we will dive deeper into some reproducible snippets and patterns to help you generate useful outputs from event logs, as well as a deeper dive into turning these into process maps with bupaR and animations with vidigi.