There is a shift taking place in safety practice, a BIG shift, significantly influenced by the collective term Resilience Engineering and the work of Erik Hollnagel.
Resilience Engineering, in this context, takes in a whole range of safety science ideas and combines them to improve an organisation.
Resilient systems have an inherent capability to adjust their functioning before and after changes and disturbances.
Importantly, resilient systems continue operations in the face of incidents, stress or pressures that are applied to them.
Resilience is a mode of preparation: assuming that trouble is always going to happen, anticipating the unexpected, exerting proactive control, eliminating unwanted variability, and emphasising more helpful variability. As the system stretches, it can absorb some stress and then return to its former shape, or potentially even improve. That's really what we're talking about when we say resilience!
The critical theme under this approach is that we maintain performance despite variability and uncertainty. Thus, when things go wrong, deviate from the plan or surprise us, we can cope with that, and more importantly, we can be successful.
Safety is not defined as the absence of incidents but as the ability to cope with current conditions.
We want to increase the organisation's ability to make correct adjustments, vary in helpful ways, and make sure that we successfully achieve our objectives.
There are four critical principles we need to understand as we design and implement our systems.
- Intractability means that the system is so complex that we lose an appreciation of what's going on under the hood. We don't understand enough about how the system works to describe its processes and their causes and effects. Resilience engineering suggests that adverse events arise when systems are intractable: we don't understand what's going on, so we adapt or vary based upon a best guess. Both failures and successes result from these adjustments in the face of intractability or complexity.
- Functional resonance means that the variability of individuals can combine and escalate unexpectedly. So you might have Worker 1 not following a procedure while Worker 2 takes a shortcut with her job, which introduces variability in how the work is meant to be done. Then Worker 3 starts to unload his loader in a high-risk zone. The combination of those different variabilities is what causes things to go wrong. On other days it might cause things to go right, and we wouldn't notice that there was a problem, but on this particular day, this functional resonance amplifies variability and could cause a problem.
- Emergence is a property where lots of small components combine into a whole that is greater than the sum of its parts. Something bigger comes from the system or organisation than we would find if we took it all apart and studied those elements individually. It means we can't pick apart our system, or do root cause analysis as we traditionally would, and expect to fully appreciate how everything works and fits together.
- The Efficiency-Thoroughness Trade-Off (ETTO) principle moves on from the dynamic risk modelling approach and suggests that people adjust to cope with current working conditions. Thus, they make compromises between efficiency and thoroughness. Similar to goal conflict, it appreciates that people have to make decisions that may compromise safety for greater productivity, or vice versa.
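The functional resonance example in the list above can be made a little more concrete with a toy simulation. This is only a minimal sketch, assuming each of the three workers varies independently with an invented per-shift probability; the names and numbers are hypothetical, and it is not Hollnagel's own method or a full model of resonance. It simply shows that each variation on its own is common and usually harmless, while the day on which all three coincide is rare.

```python
import random

# Hypothetical per-shift chance that each function varies from planned work.
# These probabilities are invented purely for illustration.
VARIATION_CHANCE = {
    "worker_1_skips_procedure": 0.30,
    "worker_2_takes_shortcut": 0.25,
    "worker_3_unloads_in_high_risk_zone": 0.10,
}


def simulate_shift(rng):
    """Return the set of functions that varied on one simulated shift."""
    return {name for name, p in VARIATION_CHANCE.items() if rng.random() < p}


def run(shifts=10_000, seed=1):
    """Count shifts with no variation, some variation, and all coinciding."""
    rng = random.Random(seed)
    counts = {"no_variation": 0, "some_variation": 0, "all_coincide": 0}
    for _ in range(shifts):
        varied = simulate_shift(rng)
        if len(varied) == len(VARIATION_CHANCE):
            # Every variability is present at once: the "particular day".
            counts["all_coincide"] += 1
        elif varied:
            # Ordinary day: something varied, nothing combined.
            counts["some_variation"] += 1
        else:
            counts["no_variation"] += 1
    return counts


if __name__ == "__main__":
    print(run())
```

Under these assumed numbers, most shifts contain some variation that goes unnoticed, which is the point the example makes: looking only at the rare coinciding shift hides how ordinary each contributing adjustment is.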
Systems are complex under this model. Work is generally underspecified, meaning we don't quite know everything that's going on. Designers can't possibly anticipate every condition or develop systems that are perfect.
Safety is about amplifying variability that leads to success and minimising variability that leads to failure. There's less focus on component failures (the little parts of the machine that break down) and on the deviations or violations across different levels of the organisation.
Resilience engineering attempts to be proactive by improving the ability to establish robust and flexible processes.
Monitoring and revisiting risk models with a systems view is crucial so that the organisation as a whole understands how everything functions together, as opposed to an individual operation.
Proactive use of resources aids in responding to and anticipating adverse events. One of the problems we have with traditional approaches to safety is that the number of errors to correct often drops to zero. Processes may then become uncontrollable: we don't know exactly what's happening because no data is coming in (no injuries, no incidents and no near misses). Do we, therefore, assume that everything's safe? No. Hence, safety is about the existence of measures that pre-empt failure, thus enabling an organisation to fail successfully.
Rules and procedures can manage interactions between people and variability. High uncertainty requires the flexible use of rules. Rules (and rules about rules) shouldn't prescribe one right way but should instead provide guidance around the level of flexibility permitted. For example, process rules guide how we know which rules to apply in different situations, giving people autonomy and decision-making power to be flexible when things go wrong.
Resilience requires improvisation. We absorb strain, preserve functioning, recover, learn and grow. Support and redundancy are also necessary when it comes to Resilience Engineering.
Rules shouldn't be thrown away, but they should be used sparingly and effectively. For example, a new starter may benefit greatly from a prescriptive step-by-step task list of how to do the job, while an experienced expert may benefit more from the flexibility and capacity to adapt and adjust within that rule framework.