Upleveled Alerting With Transposit + PagerDuty

Transposit’s PagerDuty integration turns alerts into action with codified process

Jessica Abelson, Director of Product Marketing
Jan 11th, 2021

Here’s the long and short of it: alerts are not enough. Simply knowing something is wrong is only the first of many steps towards remediation. Fire alarms get our attention, but without a set evacuation plan, there’s chaos.

So, once an alert is fired, what’s next? In most organizations, engineers follow a runbook or a routine in case of system failure. But this process is stressful and eats up precious time. Documentation is too often buried, out of date, or lacking context. And finding and acting on this documentation is usually entirely manual.

The answer is not just process, but codified process. To quickly resolve incidents, teams need to know the steps to take and have data easily accessible to make the best decisions. Codifying and automating previously manual processes in chat platforms enable teams to turn alerts directly into action without hesitation. This also means human data — conversations and decisions — can be captured to later evaluate how well the process and automation worked.

Transposit supports this exact scenario by using automated runbooks and integrations. Transposit’s DevOps process automation platform provides a unique, fully integrated, human-in-the-loop approach that empowers on-call teams to operate with more order and repeatability and fewer errors. Automating alert response with Transposit is an impactful way to uplevel your entire incident management process.

In this blog series, we will explore how Transposit works when integrated with PagerDuty. In this first installment, you’ll see how to use Transposit to get Amazon API Gateway service status in the Europe region in response to a triggered alert incident on PagerDuty.

Integrators and Workflows

Integrators are applications on Transposit that relay alerts and relevant data from external services through webhooks. We create Transposit triggers that provide a webhook endpoint, which we can then use as service integrators on PagerDuty.

Once the third-party monitoring tool integrator sends alerts to PagerDuty, the PagerDuty integrator on Transposit is triggered. It then routes the incidents to a Slack channel and invokes automated workflows in response to these incidents.
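To make the relay concrete, here is a minimal Python sketch of what a webhook receiver might do with an incoming PagerDuty notification. The payload shape is loosely modeled on PagerDuty's webhook events and is an assumption here, as are the names `IncidentAlert` and `parse_webhook`. It is an illustration, not Transposit's integrator code.

```python
import json
from dataclasses import dataclass


@dataclass
class IncidentAlert:
    """The handful of fields a router needs from a webhook notification."""
    incident_id: str
    title: str
    service: str
    event_type: str


def parse_webhook(body: str) -> IncidentAlert:
    """Extract incident details from a JSON webhook body.

    Assumes a simplified shape: {"event": {"event_type": ..., "data": {...}}}.
    """
    event = json.loads(body)["event"]
    data = event["data"]
    return IncidentAlert(
        incident_id=data["id"],
        title=data.get("title", ""),
        service=data.get("service", {}).get("summary", ""),
        event_type=event["event_type"],
    )


# Hypothetical sample payload, for illustration only.
sample = json.dumps({
    "event": {
        "event_type": "incident.triggered",
        "data": {
            "id": "PABC123",
            "title": "API Gateway 5xx rate high",
            "service": {"summary": "api-gateway-eu"},
        },
    }
})

alert = parse_webhook(sample)
print(f"[{alert.event_type}] {alert.title} ({alert.service})")
```

Once the incident is reduced to a small, typed record like this, routing it to a chat channel or handing it to a workflow becomes a matter of passing that record along.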

We can set up workflows, each defined by one Transposit dev platform application, to implement functionality end to end (say, to get the AWS service status in a particular region). Python and JavaScript are currently supported. Workflow rules determine when a workflow runs: when an event from a trigger is received and matches a rule, the workflow is invoked.
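The idea of a workflow rule can be sketched in a few lines of Python. Everything below is hypothetical (the `WorkflowRule` class, the `dispatch` function, and the event fields are invented for illustration) and only shows the general pattern of matching an incoming event against conditions before running a workflow. Transposit's own rule configuration may look quite different.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class WorkflowRule:
    """Hypothetical rule: run `workflow` when every condition matches the event."""
    conditions: Dict[str, str]
    workflow: Callable[[dict], str]

    def matches(self, event: dict) -> bool:
        return all(event.get(key) == value for key, value in self.conditions.items())


def dispatch(event: dict, rules: List[WorkflowRule]) -> List[str]:
    """Run every workflow whose rule matches the event and collect the results."""
    return [rule.workflow(event) for rule in rules if rule.matches(event)]


def check_service_status(event: dict) -> str:
    # Placeholder standing in for a real status check.
    return f"checking status for {event['service']}"


rules = [
    WorkflowRule(
        conditions={"source": "pagerduty", "event_type": "incident.triggered"},
        workflow=check_service_status,
    )
]

triggered = {"source": "pagerduty", "event_type": "incident.triggered", "service": "api-gateway-eu"}
resolved = {"source": "pagerduty", "event_type": "incident.resolved", "service": "api-gateway-eu"}

print(dispatch(triggered, rules))  # the rule matches, so the workflow runs
print(dispatch(resolved, rules))   # no rule matches, so nothing runs
```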

How It Works

Let’s start with a specific task: We will use integrators and workflows to check the Amazon API Gateway service status in the Europe region in response to a PagerDuty alert incident.

  1. Create a Transposit workflow and, as the application to run, select the Transposit-provided AWS get service status application.

  2. Set the environment variables to the service you want to query (in this case, Amazon API Gateway) and the AWS region you want to run against.

  3. Create a Transposit trigger following these docs. Once configured, you can route alerts to a Slack channel and invoke Transposit workflows to remediate or resolve incidents tied to those alerts.

  4. Add rules to the workflow you created in the first step so it runs when the trigger is invoked by PagerDuty.

  5. Finally, trigger an alert from your PagerDuty third-party monitoring tool integration or web UI (as shown below).
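If you would rather script the final step than click through the web UI, one option is to send a test event through PagerDuty's Events API v2, which is how many monitoring tools raise alerts. The sketch below builds a trigger event and, optionally, posts it. The routing key, summary, and source values are placeholders, and the helper names `build_trigger_event` and `send_event` are made up for this example.

```python
import json
import urllib.request

EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"
ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(routing_key: str, summary: str, source: str,
                        severity: str = "critical") -> dict:
    """Build the JSON body for an Events API v2 'trigger' event."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity},
    }


def send_event(event: dict) -> int:
    """POST the event to PagerDuty and return the HTTP status code."""
    request = urllib.request.Request(
        EVENTS_API_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


event = build_trigger_event(
    routing_key="YOUR_INTEGRATION_KEY",  # placeholder, not a real key
    summary="Elevated 5xx rate on API Gateway (eu-west-1)",
    source="synthetic-monitor",
)
print(json.dumps(event, indent=2))
# send_event(event)  # uncomment once a real integration key is in place
```

Keeping the payload builder separate from the network call makes it easy to check the event body before anything is actually sent.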

What Happens Next?

When PagerDuty receives an alert incident from the monitoring and alerting integration, it posts to the trigger’s webhook endpoint. The trigger then routes the incident alert to the Slack channel.

The workflow will receive an event from the trigger and execute as a result, returning the results to the designated Slack channel, as shown below.
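As a rough illustration of that last hop, the following Python sketch turns a set of per-region status results into a Slack message body. The result format (a mapping of region name to a status string) and the function name `format_status_message` are assumptions made for this example. The actual output of the Transposit-provided application is not specified here.

```python
import json
from typing import Dict


def format_status_message(service: str, statuses: Dict[str, str]) -> dict:
    """Build a Slack message body ({"text": ...}) summarizing status per region."""
    lines = [f"*{service}* status by region:"]
    for region, status in sorted(statuses.items()):
        marker = ":white_check_mark:" if status == "ok" else ":warning:"
        lines.append(f"{marker} {region}: {status}")
    return {"text": "\n".join(lines)}


# Hypothetical results, for illustration only.
results = {"eu-west-1": "ok", "eu-central-1": "degraded"}
message = format_status_message("Amazon API Gateway", results)
print(json.dumps(message, indent=2))
```

A body of this form can be posted to a Slack incoming webhook, so the on-call engineer sees the service status next to the alert that prompted the check.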

From here, you can set up your runbooks to guide users through various steps like declaring an incident and updating users, creating a Jira ticket, querying charts from Datadog, and more. From the moment an alert is fired, your team is primed to tackle the incident.

Transposit + PagerDuty is the Missing Piece to Incident Response

Workflows provide teams with a path to consistent incident response. They also take away the monotony and duplication of incident-response work. On-call engineers spend a lot of time doing upfront work trying to gather context about an incident or alert. If the right contextual information already exists, it assists in decision making and leads to a more streamlined resolution process.

Transposit’s integration with PagerDuty adds a missing piece of the incident management equation. On-call engineers receive contextual information about incidents; actions that contribute to resolving these incidents are done in an automated and consistent way. This codification of process means less downtime, business impact, and stress for engineers…and ultimately, happier customers.
