Single-node monitoring is quietly lying to you
If your uptime checks run from one location, you are seeing one network path, not your users. Why single-node monitoring misses real outages and invents fake ones.
Posts tagged
Product updates, incident management notes, and lessons from building Vigiles.
An expired certificate takes your whole site down in a way no code change can fix fast, and it is entirely predictable. Why cert expiry is the outage you can see coming.
When DNS breaks, your servers are fine, your usual checks may be fine, and your users cannot reach you at all. Why DNS failures are so easy to miss and how to catch them.
Your app can be perfectly healthy and still be down because something it relies on failed. Why you should monitor your dependencies, not only yourself.
Knowing something broke is half the job. Knowing it came back, and how long it was down, is the other half. Why recovery notifications matter.
Everyone wants five nines until they see the bill. What each extra nine of uptime really costs, and how to pick an availability target that fits your business.
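To put rough numbers on what a nine means, here is a minimal sketch of the downtime budget behind each target. It assumes a 365-day year and treats availability as simple wall-clock uptime; the function name `downtime_budget_minutes` is made up for illustration and is not part of Vigiles.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year, in minutes, for an availability of `nines` nines.

    Two nines means 99%, three nines means 99.9%, and so on.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10 ** -nines  # e.g. 3 nines -> 0.001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        target = f"{100 - 100 * 10 ** -n:.{n - 2}f}%"
        print(f"{target:>8}  {downtime_budget_minutes(n):10.2f} minutes/year")
```

Each extra nine cuts the budget by a factor of ten: three nines leaves about 8.76 hours a year, five nines leaves a little over five minutes.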