Ankit Mehta · 3 min read

The engineer who knows everything is a risk, not a hero

Most teams have one. The person who always gets pulled into the bad incidents, who knows which service is held together with tape, who can fix in five minutes what would take anyone else two hours. Everyone is grateful for them. Nobody notices that the team has quietly become dependent on a single human staying reachable forever.

We call that person a hero. They are actually a risk, and usually an exhausted one.

Why the hero is a problem

When one person holds the knowledge that resolves incidents, a few things happen, none of them good.

That person never gets a real break, because every serious incident routes back to them. They burn out, and burnt-out people leave. When they do, the knowledge leaves with them, and the team discovers how much it was relying on someone's memory.

The team also stops learning. Why would anyone else dig into the gnarly service when the hero will just handle it? The dependency deepens with every incident, and the single point of failure is not a server. It is a person.

Get the knowledge out of one head

The fix is not to value that engineer less. It is to get what they know out of their head and into a place the whole team can reach.

Write the runbook for the service only they understand. Capture the steps they take during the incidents only they get paged for. After the next incident they solve, have them explain not just what they did but how they knew to do it. That last part is the knowledge that usually never gets written down.

This is slow, and it feels less efficient than letting the hero keep saving the day. It is the only thing that turns a fragile team into a resilient one.

Shared timelines beat shared memory

A lot of the hero's value is simply that they remember. They remember what happened last time this service misbehaved, and what fixed it. If that history lived somewhere the team could read, the gap between the hero and everyone else would close on its own.

That is part of why we keep every incident in Vigiles as a durable record, with its timeline, its resolution, and its postmortem. The next person who hits a similar failure does not have to find the one engineer who was awake last time. They can read what happened, what was done, and why.
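To make the idea concrete, here is a rough sketch of what a shared incident record could look like. It is illustrative only: the names `TimelineEntry`, `IncidentRecord`, and `past_incidents` are hypothetical, and this is not the Vigiles data model or API.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TimelineEntry:
    at: datetime
    note: str  # what was observed or done at this moment


@dataclass
class IncidentRecord:
    # Hypothetical shape, assumed for illustration only.
    service: str
    summary: str
    timeline: list[TimelineEntry] = field(default_factory=list)
    resolution: str = ""  # what fixed it
    postmortem: str = ""  # why it happened, and how the responder knew what to do


def past_incidents(records: list[IncidentRecord], service: str) -> list[IncidentRecord]:
    """Return recorded incidents for a service, most recent first."""
    # Records without any timeline entry are skipped because they have no start time.
    matches = [r for r in records if r.service == service and r.timeline]
    return sorted(matches, key=lambda r: r.timeline[0].at, reverse=True)
```

The point of the sketch is the lookup, not the schema: anyone on call can ask "what happened last time this service misbehaved?" and get an answer without waking the one person who remembers.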

Heroes are great in a story. On a team, you want a system where nobody has to be one.

If your incidents all route to one person, you have a single point of failure with a pulse. Vigiles keeps every incident and its resolution as a record the whole team can use.