Here’s something I’ve wanted to get into for a while now, but haven’t had the time for yet: switching the monitoring/alerting system from server-oriented to business-oriented. The gist of the story is:
If it’s not actionable and business critical, then it shouldn’t ring.
The article has some statistics and summaries as well. The reasoning behind the switch is obvious, but it’s good to have it formulated:
After a few months, I can tell reducing our alerting rate should have been a top priority before things got out of hand, for a few reasons.
- Constant alerts prevented the team from focusing on what was important. Being interrupted even for things that can wait for a few hours lowers our productivity when we work on things that can’t wait.
- Being awakened every night, several times a night, exhausts a team and makes people less productive during the day and more prone to errors.
- Too many off-hours interventions cost the company a lot of money that could be invested in hardening the infrastructure or hiring someone else instead.

