June 14, 2026 · Ankit Mehta · 3 min read

After an incident, MTTR is the wrong thing to brag about

Every incident review I have sat in eventually arrives at the same slide. Mean time to recovery, down from last quarter, with a small green arrow. Everyone nods. The meeting ends. Three weeks later the same class of failure takes the site down again.

MTTR is easy to measure and easy to game. You can cut it by getting faster at the same fire drill, which means you are getting good at recovering from a problem you never fixed. A number that improves while the underlying weakness stays put is not progress. It is practice.

The question MTTR does not answer

A recovery time tells you how fast you stopped the bleeding. It says nothing about whether you understood why you were bleeding, or whether anything will stop it from happening again.

Two teams can both resolve an outage in twelve minutes. One reverted a deploy, shipped it again the next day, and went down for the same reason on Friday. The other found the missing timeout that let one slow dependency take everything down, fixed it, and never saw that failure mode again. Same MTTR. Completely different outcome.

The number you actually want is harder to put on a slide. Did this incident change anything. Did a runbook get written, a check get added, a limit get set, an assumption get killed. If the answer is no, the incident taught you nothing, no matter how fast you closed it.

Measure what changed

After an incident closes, I look for three things, and none of them is a duration.

What did we learn that we did not know before. Not the symptom, the cause. The thing that surprised us.

What did we change because of it. A concrete action with an owner and a date, not a vague intention to improve monitoring someday.

Did the change hold. Six weeks later, did this exact failure stay gone. That is the only proof the postmortem did its job.

Track those across a quarter and you get a picture MTTR will never give you. You can see whether your incidents are teaching you anything, or whether you are just getting faster at mopping the same floor.
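To make that concrete, here is a minimal sketch of what tracking those three questions could look like next to MTTR. Everything in it is an assumption for illustration: the `Incident` fields, the `failure_class` label, and the sample data are hypothetical and are not Vigiles' data model or API. The six-week window comes from the third question above, and the two sample teams mirror the twelve-minute example from earlier.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

HOLD_WINDOW = timedelta(weeks=6)  # the "six weeks later" check


@dataclass
class Incident:
    # All field names are hypothetical, chosen for this sketch.
    failure_class: str                    # label for the underlying cause
    closed_on: date
    minutes_to_recover: float
    cause_learned: Optional[str] = None   # the surprise, not the symptom
    action: Optional[str] = None          # the concrete change
    action_owner: Optional[str] = None
    action_due: Optional[date] = None

    def has_real_action(self) -> bool:
        # An action only counts if it has both an owner and a date.
        return bool(self.action and self.action_owner and self.action_due)


def recurred_within_window(incident, incidents) -> bool:
    """True if the same failure class closed again within six weeks."""
    window_end = incident.closed_on + HOLD_WINDOW
    return any(
        other is not incident
        and other.failure_class == incident.failure_class
        and incident.closed_on < other.closed_on <= window_end
        for other in incidents
    )


def quarter_summary(incidents, today):
    if not incidents:
        raise ValueError("need at least one incident to summarize")
    return {
        "mttr_minutes": sum(i.minutes_to_recover for i in incidents) / len(incidents),
        "learned": sum(1 for i in incidents if i.cause_learned),
        "changed": sum(1 for i in incidents if i.has_real_action()),
        # A change "held" only if there was a real change, the window
        # has fully elapsed, and the failure did not come back in it.
        "held": sum(
            1 for i in incidents
            if i.has_real_action()
            and today >= i.closed_on + HOLD_WINDOW
            and not recurred_within_window(i, incidents)
        ),
        "recurred": sum(1 for i in incidents if recurred_within_window(i, incidents)),
    }


# Team A reverted a deploy and went down again four days later.
team_a = [
    Incident("slow-dependency", date(2026, 3, 2), 12),
    Incident("slow-dependency", date(2026, 3, 6), 12),
]
# Team B found the missing timeout and fixed it.
team_b = [
    Incident(
        "slow-dependency", date(2026, 3, 2), 12,
        cause_learned="missing timeout on a dependency call",
        action="add a timeout to the dependency call",
        action_owner="example-owner",
        action_due=date(2026, 3, 9),
    ),
]

today = date(2026, 6, 1)
summary_a = quarter_summary(team_a, today)
summary_b = quarter_summary(team_b, today)
print(summary_a)
print(summary_b)
```

Both summaries report an MTTR of 12 minutes. The difference only shows up in the other columns: team A has nothing learned, nothing changed, and one recurrence, while team B has one cause, one owned action, and one change that held.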

A postmortem nobody reads is paperwork

The reason teams fall back on MTTR is that real learning is work, and the work usually lands at the worst time. The fire is out, everyone is tired, and writing an honest account of what went wrong feels optional.

That is the part we built Vigiles to take off your plate. When an incident closes, the timeline, the duration, and your resolution notes are already there, so the postmortem starts as a draft of what happened instead of a blank page two days later. You spend your energy on the part that matters: deciding what to change, not rebuilding the night from memory.

A postmortem is worth doing because it stops you paying for the same outage twice. Fast recovery is good. Learning something is what keeps the next incident from being a rerun.

If your incident reviews still end on an MTTR slide, try closing the loop properly. Vigiles builds the timeline for you and drafts the postmortem from what happened, and every workspace starts free. You can also see how incident response works in Vigiles.