RTO and RPO have had a good run. For decades, Recovery Time Objective and Recovery Point Objective have been the default language of resilience planning - the numbers organisations point to when they want to demonstrate they've thought seriously about what happens when things go wrong.
The problem is that they measure the wrong thing. Or rather, they measure one narrow slice of a much larger picture, and somewhere along the way that slice became a proxy for the whole.
Recovery Time Objective tells you how quickly you aim to restore a system. Recovery Point Objective tells you how much data loss you're prepared to accept. Both are useful. Neither tells you whether your organisation can actually function during the period between an incident starting and normal operations resuming.
That gap - what happens in the middle - is where most organisations are flying blind. And it's where the real cost of an incident accumulates.
Systems being offline is expensive. But poor decisions made under pressure, failed communications, regulatory deadlines missed, customers left without answers, leadership teams unable to coordinate - these are often where the lasting damage is done. None of them show up in your RTO. Five metrics come closer to capturing what does.
Mean Time to Detect (MTTD). How long does it take your organisation to identify that something is wrong? In many breaches, attackers have been present for weeks before detection. The recovery clock doesn't start at the moment of breach - it starts at the moment you know about it. MTTD is the metric that tells you how much of the incident you're dealing with retrospectively.
Communication Continuity Rate. Can your incident response team actually reach each other, and the right stakeholders, when primary systems are degraded or suspect? This rarely gets measured. It should. The ability to coordinate a response is the precondition for everything else - and it's frequently the thing that breaks first.
Decision Latency. How long does it take your organisation to move from identifying a problem to making a consequential decision about it? Slow decision-making during an incident isn't just frustrating - it's costly. Measuring it forces the uncomfortable conversation about who has authority, who needs to be consulted, and where the bottlenecks actually are.
Regulatory Deadline Adherence. Under DORA, NIS2, and FCA rules, notification timelines are specific and unforgiving. Whether your organisation meets them during a real incident is a direct measure of operational resilience - and one that has concrete consequences if the answer is no.
Post-Incident Capability Degradation. How long after an incident is resolved does your organisation operate below full capacity? Recovery time measures when systems come back. This measures when people, processes, and confidence do. They're rarely the same date.
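The timeline metrics above reduce to simple arithmetic on incident timestamps. As a minimal sketch, assuming a hypothetical incident record with these field names and an illustrative 72-hour notification window (real deadlines vary by regime and report type - DORA, NIS2, and FCA rules each set their own):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Incident:
    # Hypothetical timeline fields; real incident records will differ.
    breach_start: datetime        # when the incident actually began
    detected: datetime            # when the organisation became aware of it
    first_decision: datetime      # first consequential response decision
    regulator_notified: datetime  # when the regulatory notification went out

def mttd(incidents: list[Incident]) -> timedelta:
    """Mean Time to Detect: average gap between breach and awareness."""
    deltas = [i.detected - i.breach_start for i in incidents]
    return sum(deltas, timedelta()) / len(deltas)

def mean_decision_latency(incidents: list[Incident]) -> timedelta:
    """Average gap between detection and the first consequential decision."""
    deltas = [i.first_decision - i.detected for i in incidents]
    return sum(deltas, timedelta()) / len(deltas)

def deadline_adherence(incidents: list[Incident],
                       window: timedelta = timedelta(hours=72)) -> float:
    """Fraction of incidents where notification beat the regulatory window.
    The 72-hour default is illustrative, not a statement of any regime's rule."""
    met = sum(1 for i in incidents
              if i.regulator_notified - i.detected <= window)
    return met / len(incidents)
```

The point of the sketch is that none of these require new tooling - only that the four timestamps are actually recorded during incidents and exercises, which is the step most organisations skip.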
Resilience has moved up the agenda - regulators are asking harder questions, boards want more than a slide showing RTO targets, and the threat landscape has made it clear that incidents are a question of when, not if.
The organisations that handle incidents well aren't just the ones that recover fastest. They're the ones that communicate clearly, decide quickly, and maintain enough situational awareness to stay ahead of a situation that's actively trying to outpace them.
Measuring recovery time is easy. Measuring your ability to function under pressure is harder - but it's the question that actually matters.