We’re learning so much about how to embrace chaos at Chaos Conf this week. But what about DevOps Cat? Last we saw him, he was preparing to respond to a 3 AM alert that the web service is down by running his human’s runbook.
Runbooks are sets of troubleshooting steps and tips that are valuable when responding to an incident or performing other operational tasks. Let’s see what DevOps Cat’s runbook says...
Oh no! As we can see, the action that the runbook suggested — copying a specific command — wasn’t as helpful as DevOps Cat expected it to be. Instead, he is paw-deep in an error message. Can chaos be the answer to a speedier response?
Of course! A runbook is only as useful as its content. Remember, runbooks should always be actionable, accessible, accurate, authoritative, and adaptable. In this case, the runbook was not accurate. If you validate and maintain your runbooks consistently through chaos engineering's rapid feedback loops, the improvements will already be in place the next time you face an incident.
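For the curious, here is one small way a runbook accuracy check might look in Python. It is only a sketch under a few assumptions: the runbook is plain text, each command sits on a line starting with `$ `, and the tool name `hypothetical-svc-tool` is made up for illustration. It only confirms that each command's executable exists on the host, so it is a complement to a real fire drill, not a replacement for one.

```python
import shlex
import shutil


def extract_commands(runbook_text):
    """Return the shell commands in a runbook.

    Assumes (for this sketch) that every command is on a line starting with '$ '.
    """
    commands = []
    for line in runbook_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("$ "):
            commands.append(stripped[2:].strip())
    return commands


def find_stale_commands(commands):
    """Return the commands whose executable cannot be found on PATH."""
    stale = []
    for command in commands:
        tokens = shlex.split(command)
        # The first token is the program name; shutil.which returns None if it is missing.
        if tokens and shutil.which(tokens[0]) is None:
            stale.append(command)
    return stale


# Hypothetical runbook; 'hypothetical-svc-tool' is an invented name.
RUNBOOK = """\
Web service down:
1. Check the service status.
   $ hypothetical-svc-tool status web
2. Restart if needed.
   $ hypothetical-svc-tool restart web
"""

if __name__ == "__main__":
    for command in find_stale_commands(extract_commands(RUNBOOK)):
        print(f"Runbook command not found on this host: {command}")
```

Running a check like this on a schedule could flag a renamed or uninstalled tool long before 3 AM, while a chaos experiment would still be needed to confirm that the steps actually resolve the incident.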
You know what they say — a fire drill a day keeps the alerts at bay! So — be kind to yourself and your on-call team by staying prepared.
You can download the full comic here.