·3 min read
Your first on-call shift, and how to get through it
Your first on-call shift is less about knowing everything and more about staying calm, reading before you touch, and knowing when to escalate.
Posts tagged
Product updates, incident management notes, and lessons from building Vigiles.
Your first on-call shift is less about knowing everything and more about staying calm, reading before you touch, and knowing when to escalate.
Here is a quick test for how fragile your incident response is. Imagine your best engineer is unreachable during the next outage. If that thought worries you, your bus factor is one.
Every team has the one engineer who fixes everything. That dependency is a single point of failure with a pulse. How to spread the knowledge before it walks out the door.
Most runbooks are written once, never opened, and useless by the time you need them. What separates a runbook that helps mid-incident from one that just exists.