Here’s the long and short of it: alerts are not enough. Simply knowing something is wrong is only the first of many steps towards remediation. Fire alarms get our attention, but without a set evacuation plan, there's chaos.
So, once an alert is fired, what’s next? In most organizations, engineers follow a runbook or a routine in case of system failure. But this is a high-pressure process that eats up precious time. Documentation is too often buried, out-of-date, or lacking context, and finding that documentation and acting on it is usually an entirely manual effort.
The answer is not just process, but codified process. To resolve incidents quickly, teams need to know which steps to take and have the data at hand to make the best decisions. Codifying and automating previously manual processes in chat platforms enables teams to turn alerts directly into action without hesitation. It also means the human data (conversations and decisions) can be captured and later used to evaluate how well the process and automation worked.
Transposit supports this exact scenario by using automated runbooks and integrations. Transposit’s DevOps process automation platform provides a unique, fully integrated, human-in-the-loop approach that empowers on-call teams to operate with more order and repeatability and fewer errors. Automating alert response with Transposit is an impactful way to uplevel your entire incident management process.
In this blog series, we will explore how Transposit works when integrated with PagerDuty. In this first installment, you'll see how to use Transposit to check the status of Amazon API Gateway in Europe in response to an incident triggered in PagerDuty.
Integrators are applications on Transposit that relay alerts and relevant data from external services through webhooks. We create Transposit triggers, each of which exposes a webhook endpoint that we can then add to PagerDuty as a service integrator.
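Transposit provides the trigger's webhook endpoint, so this is not code you have to write. Still, it helps to see what any endpoint receiving PagerDuty events should do before acting on them: confirm the request really came from PagerDuty. The sketch below assumes PagerDuty's V3 webhook signing scheme (an HMAC-SHA256 of the raw request body, hex-encoded and sent as one or more `v1=...` entries in the `X-PagerDuty-Signature` header). The function name is hypothetical, and this is not a description of Transposit's internals.

```python
import hashlib
import hmac


def verify_pagerduty_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Return True if any 'v1=' signature in the header matches the body.

    Assumption: PagerDuty V3 signing, i.e. hex(HMAC-SHA256(secret, raw_body)),
    sent as a comma-separated list such as "v1=abc...,v1=def...".
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    for candidate in signature_header.split(","):
        candidate = candidate.strip()
        if candidate.startswith("v1="):
            # compare_digest avoids leaking timing information to an attacker
            if hmac.compare_digest(candidate[len("v1="):], expected):
                return True
    return False
```

Multiple `v1=` entries are accepted so that a signing secret can be rotated without dropping events while both old and new secrets are in use.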
When a third-party monitoring tool sends an alert to PagerDuty, the PagerDuty integrator on Transposit is triggered. The integrator then routes the resulting incident to a Slack channel and invokes automated workflows in response.
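The routing step can be pictured as a small dispatcher: each incoming PagerDuty event is matched by type to the workflows subscribed to it, and each workflow's output becomes a Slack message. This is a hypothetical sketch; `register`, `route`, and `announce_incident` are illustrative names, not Transposit APIs, and the event type strings assume PagerDuty's V3 webhook events (for example `incident.triggered`).

```python
from typing import Callable, Dict, List

# Hypothetical workflow shape: takes a PagerDuty event dict, returns Slack text.
Workflow = Callable[[dict], str]

# Hypothetical registry mapping PagerDuty event types to subscribed workflows.
WORKFLOWS: Dict[str, List[Workflow]] = {}


def register(event_type: str):
    """Decorator that subscribes a workflow to one PagerDuty event type."""
    def decorator(fn: Workflow) -> Workflow:
        WORKFLOWS.setdefault(event_type, []).append(fn)
        return fn
    return decorator


def route(event: dict) -> List[str]:
    """Run every workflow registered for this event's type and collect
    the messages that would be posted to the Slack channel."""
    event_type = event.get("event_type", "")
    return [workflow(event) for workflow in WORKFLOWS.get(event_type, [])]


@register("incident.triggered")
def announce_incident(event: dict) -> str:
    # Read the title defensively; assumed to live under event["data"].
    title = event.get("data", {}).get("title", "untitled incident")
    return f":rotating_light: New incident: {title}"
```

Events with no subscribed workflow simply produce no messages, so new event types can arrive without breaking the trigger.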
Let’s start with a specific task: we will use integrators and workflows to check the status of Amazon API Gateway in Europe in response to a PagerDuty incident.
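How the workflow looks up service status is not spelled out here. One possible approach, sketched below, is to query the AWS Health API for open events affecting API Gateway in European Regions. The `APIGATEWAY` service code and the list of Regions are assumptions to verify against your account, and this is not necessarily what the Transposit workflow does. The function accepts any object with a boto3-style `describe_events` method so it can be exercised without AWS credentials.

```python
from typing import Any, Dict, List, Sequence

# Assumption: the European AWS Regions to check; adjust to the Regions you use.
EU_REGIONS = ("eu-west-1", "eu-west-2", "eu-west-3", "eu-central-1", "eu-north-1", "eu-south-1")


def open_api_gateway_events(health_client: Any, regions: Sequence[str] = EU_REGIONS) -> List[Dict[str, Any]]:
    """Collect open AWS Health events for API Gateway in the given Regions.

    `health_client` is anything with a boto3-style
    describe_events(filter=..., nextToken=...) method.
    """
    events: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {
        "filter": {
            "services": ["APIGATEWAY"],  # assumed AWS Health service code for API Gateway
            "regions": list(regions),
            "eventStatusCodes": ["open"],  # only issues that are still ongoing
        }
    }
    while True:
        page = health_client.describe_events(**kwargs)
        events.extend(page.get("events", []))
        token = page.get("nextToken")
        if not token:  # no more pages to fetch
            return events
        kwargs["nextToken"] = token


def summarize(events: List[Dict[str, Any]]) -> str:
    """Render the events as a short, Slack-friendly status message."""
    if not events:
        return "Amazon API Gateway: no open AWS Health events in Europe."
    lines = [f"- {e.get('region', '?')}: {e.get('eventTypeCode', 'unknown event')}" for e in events]
    return "Amazon API Gateway issues in Europe:\n" + "\n".join(lines)
```

In practice you would pass `boto3.client("health", region_name="us-east-1")`. Keep in mind that the AWS Health API is only available with a Business Support plan or higher.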
When PagerDuty receives an alert from the monitoring integration and opens an incident, it posts the incident to the trigger's webhook endpoint. The trigger then routes the incident alert to the Slack channel.
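To make that Slack alert useful, the trigger has to pull a few fields out of PagerDuty's payload. A minimal sketch, assuming the V3 webhook envelope `{"event": {"event_type": ..., "data": {...}}}` and a Slack incoming webhook that accepts a JSON body with a `text` field. The function name is hypothetical, and every field inside `data` is read defensively because its exact contents are an assumption here.

```python
import json
from typing import Any, Dict


def slack_alert_from_pagerduty(raw_body: bytes) -> Dict[str, Any]:
    """Turn a PagerDuty V3 webhook body into a Slack incoming-webhook payload.

    Assumption: the body is {"event": {"event_type": ..., "data": {...}}}.
    """
    event = json.loads(raw_body)["event"]
    incident = event.get("data", {})
    title = incident.get("title", "(no title)")
    url = incident.get("html_url")
    service = incident.get("service", {}).get("summary", "unknown service")
    text = f"[{event.get('event_type', 'unknown')}] {title} ({service})"
    if url:
        # Slack mrkdwn link syntax: <url|label>
        text += f" <{url}|Open in PagerDuty>"
    return {"text": text}
```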
The workflow receives the event from the trigger, runs, and posts its results to the designated Slack channel, as shown below.
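Delivering a workflow's results to Slack can be as simple as a JSON POST to an incoming-webhook URL. The standard-library sketch below separates building the request from sending it, so the request can be inspected in tests. The webhook URL is a placeholder you would obtain from your Slack workspace, and this is not necessarily how Transposit itself delivers messages.

```python
import json
import urllib.request


def build_slack_post(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST to a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_to_slack(webhook_url: str, text: str, timeout: float = 10.0) -> int:
    """Send the workflow's result text to Slack and return the HTTP status."""
    with urllib.request.urlopen(build_slack_post(webhook_url, text), timeout=timeout) as resp:
        return resp.status
```

For example, `post_to_slack(url, summary)` would send a status summary produced by an earlier workflow step; the timeout keeps a slow Slack response from stalling the workflow.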
From here, you can set up your runbooks to guide users through steps such as declaring an incident, updating users, creating a Jira ticket, and querying charts from Datadog. From the moment an alert is fired, your team is primed to tackle the incident.
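A codified runbook can be represented as an ordered list of steps, some of which pause for a human to approve before running, with every outcome logged so the team can review it afterwards. This is a hypothetical sketch: the step actions are placeholder lambdas standing in for real integrations such as Jira or Datadog, and the `approve` callback stands in for a prompt in Slack.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], str]  # placeholder for a real integration call
    needs_approval: bool = False  # human-in-the-loop gate


def run_runbook(steps: List[Step], approve: Callable[[str], bool]) -> List[str]:
    """Run steps in order, asking `approve` before any gated step.

    Returns a log of what happened, kept for post-incident review.
    """
    log: List[str] = []
    for step in steps:
        if step.needs_approval and not approve(step.name):
            log.append(f"skipped: {step.name}")
            continue
        log.append(f"done: {step.name} -> {step.action()}")
    return log
```

Recording skipped steps alongside completed ones captures the responder's decisions, not just the automation's output.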
Workflows give teams a path to consistent incident response. They also remove the monotony and duplication of incident-response work. On-call engineers spend a lot of time up front gathering context about an incident or alert. When that context is already in front of them, it informs their decisions and streamlines the resolution process.
Transposit's integration with PagerDuty adds a missing piece of the incident management equation. On-call engineers receive contextual information about incidents; actions that contribute to resolving these incidents are done in an automated and consistent way. This codification of process means less downtime, business impact, and stress for engineers...and ultimately, happier customers.