Model Readiness Checklist

Deciding whether a model is ready to support real-world decisions is often subjective. The checklist below offers practical guidance to help you decide when your model is ready enough to be used.

We recognise that models are often developed in less-than-ideal contexts, where there isn’t enough time or resources to implement all the automated tests you might like. This checklist therefore sets out a meaningful minimum standard, giving you the best possible chance of delivering an output that is valid for use in decision-making.

The checklist is divided into three levels:

  • Bronze: For models used in small- to medium-scale decision-making within a single organisation. Focuses on harm reduction and catching critical errors using largely manual or visual checks.

  • Silver: For models that inform larger or costlier decisions, or are applied across multiple areas. Encourages stronger assurance measures (e.g. peer review, partial automation) where feasible.

  • Gold: For models that will be reused, adapted, or published (e.g. in scientific journals), where the original author may not be involved. Emphasises transparency, documentation, and robustness to long-term or external use.

Note that you don’t have to complete everything in one level before moving on to tasks in the next level. For example, your particular project may prioritise having an easy-to-use interface for stakeholders to interact with (a ‘gold’ recommendation) before it has automated tests (a ‘silver’ recommendation). This is fine! The levels are designed to help guide you, not act as a prescriptive set of rules.

Tip

You may wish to read more generally around verification and validation, testing, and quality assurance as well.


Checklist



⭐ = recommendation is part of the STARS reusability framework

Bronze

Visualisations

Creating simple visualisations of a few key areas is one of the best ways to spot unexpected behaviour and validate key parts of your model logic. They allow you to see at a glance whether values look unusually high or low, or whether patterns are intuitively ‘wrong’.

Visualise the following (a minimal plotting sketch follows this list):

    • Arrivals over time
        • This could be done as a dot/scatter plot with time on the x axis and each dot representing the arrival of an individual, with separate runs represented on the y axis, or as a heatmap.
        • This helps you to see whether there are any unexpectedly large gaps between arrivals, and whether the general pace of arrivals over different time periods matches your understanding of the system.
        • You could also do this at a relevant recurring timescale if there are recurrent patterns of arrivals, e.g. if you are using time-dependent arrival generators to reflect patterns within a day, or across days/weeks/months. You could take an existing dot plot, filter it to a single run, and make the y axis reflect the day of the week, for example.
    • Resource use over time
        • This can be done overall as a line plot with simulation or clock time on the x axis and the number or percentage of resources in use on the y axis.
        • You could also do this at a relevant recurring timescale if there are recurrent patterns of resource use, for example if resources are unavailable in the evenings or at weekends.
    • Sampled activity times
        • This could be done with a histogram, box plot, swarm plot or violin plot.
        • This helps to check that the distribution of generated times roughly matches the real-world pattern, as well as checking for any implausibly long or short times.
    • Queue lengths over time
        • This can be done with a line plot with simulation or clock time on the x axis and queue length on the y axis.
        • This is useful for checking whether queues do build up and, if so, whether they are implausibly large or small in comparison to the real system.
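
As an example, the sketch below shows two of these plots in Python, assuming you have already gathered results into pandas DataFrames. The DataFrames and column names used here are illustrative assumptions rather than a required format.

```python
# A minimal sketch (not a prescribed format): assumes results have been
# gathered into pandas DataFrames. The column names ('run',
# 'arrival_time', 'duration') are illustrative assumptions.
import pandas as pd
import matplotlib.pyplot as plt

arrivals = pd.DataFrame({
    "run": [0, 0, 0, 1, 1, 1],
    "arrival_time": [2.1, 5.4, 9.8, 1.7, 6.2, 8.9],
})
durations = pd.DataFrame({"duration": [12.3, 8.7, 15.1, 9.9, 30.2, 11.4]})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Dot plot of arrivals: time on the x axis, one row per run on the
# y axis. Unexpectedly large gaps between dots are easy to spot by eye.
ax1.scatter(arrivals["arrival_time"], arrivals["run"], marker="|", s=200)
ax1.set(xlabel="Simulation time", ylabel="Run", title="Arrivals per run")

# Histogram of sampled activity times: check the shape roughly matches
# the real-world pattern, with no implausibly long or short times.
ax2.hist(durations["duration"], bins=10)
ax2.set(xlabel="Duration", ylabel="Frequency", title="Sampled activity times")

fig.tight_layout()
plt.show()
```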

Core Logic Checks

When you are doing these checks, you are really just looking for anything that seems ‘off’.

    • Do the journeys look ‘right’? Do entities follow the paths you’d expect?

The following items can be robustly checked using event logs, though you may also be able to do some more basic checks with console logs.

    • For example, consider a model where patients arrive, all patients are triaged, then some patients are advised by a nurse while others are treated by a doctor, before all patients are discharged. You would want to confirm that entities are reaching each of these stages (e.g. through console logs), or that resources at each stage show at least some utilisation (e.g. via the plot you generate). This helps you identify whether there is an issue with the logic that decides which pathway entities follow.
    • This ensures that no entities are “lost”, for example stuck in a queue with no route onward and never reaching a sink (a minimal sketch of such a check follows below).
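
For instance, a check like this could be sketched in Python with pandas, assuming an event log with one row per event. The column names (‘entity_id’, ‘event’) and stage names are illustrative assumptions, not a required schema.

```python
# A minimal sketch: assumes an event log with one row per event.
# Column names ('entity_id', 'event') and stage names are illustrative.
import pandas as pd

event_log = pd.DataFrame({
    "entity_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "event": ["arrival", "triage", "discharge",
              "arrival", "triage", "discharge",
              "arrival", "triage", "discharge"],
})

# Every entity that arrives should eventually be discharged. An entity
# stuck in a queue with no route onward would appear in 'arrived' but
# not in 'discharged', and so be reported here.
arrived = set(event_log.loc[event_log["event"] == "arrival", "entity_id"])
discharged = set(event_log.loc[event_log["event"] == "discharge", "entity_id"])
lost = arrived - discharged
assert not lost, f"Entities lost in the model: {sorted(lost)}"

# Every expected stage should be reached by at least one entity, which
# helps catch broken pathway-selection logic.
expected_stages = {"arrival", "triage", "discharge"}
missing = expected_stages - set(event_log["event"])
assert not missing, f"No entity ever reached: {sorted(missing)}"
```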

Robustness for decision-making

Reproducibility

Key checks against the real system

Process and Stakeholder Checks

    • Note that stakeholder responses to outputs shouldn’t necessarily be taken as a definitive right/wrong judgment on the model, but they may help you to sense-check results or indicate areas that need more attention.

Documentation

Silver

Documentation

Code Review

Automated Testing

Define formal automated tests.
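
For example, such tests might look like the pytest sketch below. Here `run_simulation` and the fields it returns are hypothetical stand-ins for your own model’s entry point and outputs, used for illustration only.

```python
# A minimal pytest sketch. 'run_simulation' is a hypothetical stand-in:
# assumed to run one replication and return a dict of summary results.
from my_model import run_simulation  # hypothetical module and function


def test_reproducibility():
    # Running with the same seed should give identical results.
    assert run_simulation(seed=42) == run_simulation(seed=42)


def test_no_lost_entities():
    # Every arrival should either leave the model or still be in
    # progress at the end of the run; otherwise entities are 'lost'.
    result = run_simulation(seed=42)
    assert result["arrivals"] == result["departures"] + result["in_progress"]


def test_queues_are_plausible():
    # A negative queue length would indicate a logic error.
    result = run_simulation(seed=42)
    assert result["max_queue_length"] >= 0
```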

Reusability

Version Control

Model Robustness

Gold

Documentation

Reusability

Model Efficiency

Model Communication and Validation

Best Practice around Variability and Model Setup

Automated Testing