Automate Incident Intake: Reduce from 15 Minutes to Instant

How to automate the “every time there’s an incident” tasks and jump straight to investigation and classification

Jessica Abelson, Director of Product Marketing
Mar 1st, 2022

Incident management is one of the biggest hurdles facing modern engineering teams. Legacy IT service management (ITSM) solutions are ill-equipped to handle modern speed and turnaround expectations. Oftentimes, incident reports can come through so many different channels or incident monitoring platforms that it is difficult to set automation parameters to handle them. Teams often also lack the resources necessary to automate processes and build in-house platforms tailored to their specific needs and infrastructure.

Transposit provides a modern approach to incident management, with connected workflows that integrate and automate incident management across people, platforms, processes, and APIs. Transposit helps teams accelerate response by surfacing knowledge and context, bringing the right people and teams together, and reducing toil through human-in-the-loop automation.

We break incident management down into five steps.

  1. Intake: Start the incident management process by bringing the right people together, creating tickets, a Slack channel, a Zoom bridge, etc.
  2. Classification: Understand level of customer impact, assign severity, and escalate.
  3. Engagement: Communicate with internal and external stakeholders and customers.
  4. Remediation: Further investigate incident and take action to stop customer impact.
  5. Report, Record, and Learn: Create post-incident report, hold retrospective, identify problems to solve, and drive continuous improvement based on learnings.

The first step, intake, can by itself take 15 to 30 minutes per incident, which leaves DevOps teams struggling to keep up with their manual daily operations while still responding to incidents in a timely manner. Transposit integrates the tasks that need to take place every time there’s an incident, using automated runbooks and more than 200 pre-built connectors to the tools in an organization’s cloud stack.

The status quo: manual toil

When an incident is reported, there are a number of things that need to happen in order to begin the process of getting it fully taken care of — and oftentimes, these steps are entirely manual. In a traditional DevOps environment, things start with an alert of some kind, either through an observability platform such as New Relic or Datadog, or reported by customer service or customers directly.

When this happens, it is generally then up to the incident manager to get the process started. This usually starts with the incident manager manually creating and assigning a ticket in Jira or Zendesk or any number of other similar platforms. The incident manager then will have to pull up the incident process in a wiki page, create and manually add all necessary stakeholders to a Slack channel, and create a Zoom bridge and initiate a Zoom call to get all of the stakeholders onto the same page and to begin delegating tasks. This is a tedious process that eats away at time.

How Transposit solves this problem

Transposit has the capability to turn that whole lift into an instant, automated process. Each incident has its own unique, nuanced needs in terms of response and remediation, and therefore still requires humans in the loop at various stages. But the intake phase is full of repetitive protocol that follows the same actions each time — creating a Jira ticket, logging a PagerDuty incident, inviting stakeholders to a Slack channel, scheduling a Zoom meeting, and/or any number of other organization-specific tasks.

With Transposit’s automated runbooks and its pre-built connectors to more than 200 productivity tools, the entire intake process takes place the instant an incident is detected. By the time anyone notices that an incident has been reported, the automated runbook has already handled the intake.

How it works

The process of setting up the automation begins by creating and setting up an incident runbook in the Transposit platform.

Set up runbook triggers

To fully automate incident intake, we need to begin by adding triggers to the runbook so that it executes automatically when the criteria we set are met. Signals coming in from multiple monitoring and observability tools and channels can be received through Transposit webhooks, which integrate with tools like Datadog, PagerDuty, and BigPanda to instantly kick off the runbook and set the intake process and its corresponding workflow in motion.

  1. Under Start runbook when, click Add trigger.
  2. On the right, choose the type of trigger. For a webhook, choose Webhook, and then choose which webhook. Learn how to set up a webhook in minutes.
  3. If you wish, click Add condition to create more specific criteria for this runbook to trigger based on this webhook.

Note that runbooks can also be triggered by the creation of an activity type, by an activity update (e.g. a severity has been set or changed), or by a change in a runbook’s state (in progress, closed, or error).
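
To make the idea of trigger conditions concrete, here is a minimal sketch in Python of how a set of conditions might be checked against an incoming webhook payload. It is not Transposit's implementation or API. The Condition class, the operator names, and the payload fields (alert.priority, alert.tags) are all assumptions made up for illustration; in Transposit, conditions are configured in the runbook UI.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Condition:
    """One trigger condition (hypothetical shape, for illustration only)."""
    field: str   # dotted path into the webhook payload, e.g. "alert.priority"
    op: str      # "equals" or "contains"
    value: Any


def lookup(payload: dict, path: str) -> Any:
    """Walk a dotted path through nested dicts; return None if any key is missing."""
    current: Any = payload
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(cond: Condition, payload: dict) -> bool:
    """Evaluate a single condition against the payload."""
    actual = lookup(payload, cond.field)
    if cond.op == "equals":
        return actual == cond.value
    if cond.op == "contains":
        if isinstance(actual, str):
            return isinstance(cond.value, str) and cond.value in actual
        return isinstance(actual, list) and cond.value in actual
    raise ValueError(f"unknown operator: {cond.op}")


def should_trigger(conditions: list, payload: dict) -> bool:
    """The runbook starts only if every condition holds."""
    return all(matches(c, payload) for c in conditions)


# Hypothetical conditions: only production P1 alerts start the intake runbook.
conditions = [
    Condition("alert.priority", "equals", "P1"),
    Condition("alert.tags", "contains", "env:prod"),
]

# Hypothetical webhook payload; real payloads depend on the sending tool.
payload = {"alert": {"priority": "P1", "tags": ["env:prod", "service:checkout"]}}

print(should_trigger(conditions, payload))
```

Requiring every condition to hold keeps the trigger narrow, so a noisy monitoring source does not start an intake runbook for every low-priority alert.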

Add automated actions to “When runbook starts” section

Now we can add each task that needs to be automated in the runbook category labeled When runbook starts. Every action in this section executes automatically when the runbook starts. Actions (like creating a Slack channel) or Conditions (like creating a waterfall task based on a previous action’s completion) can be created and customized with just a few clicks. In no time, the automated intake process is in place and ready to streamline the management of the incident. Learn more about creating and running runbooks here.

  1. Under When runbook starts, click Add action.
  2. Search for the action and add it in. Make sure to add in any data for “required” fields on the right-hand side. If the action is not available, you can create new actions in the developer platform.
  3. You can also Set activity field to automatically take the output from an action and pipe it into an activity field (like taking the output URL from the “Create a Jira issue” action and piping it into the “Related Links” activity field).
  4. Lastly, you can add Conditions. By adding conditions, you can automatically kick off further actions based on certain criteria.

Every automated action is recorded in the timeline, so your team has a full audit trail of what has happened.
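
The flow described above (actions that run in order, outputs piped into activity fields, conditions that gate further actions, and a timeline entry for each step) can be sketched roughly as follows. This is an illustrative model only, not Transposit code. The Activity and Action classes, the stub functions, the example.invalid URL, and the SEV1 rule are all hypothetical stand-ins for what you would configure in the runbook builder.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Activity:
    """The incident record: a severity, activity fields, and an audit timeline."""
    severity: str
    fields: dict = field(default_factory=dict)
    timeline: list = field(default_factory=list)


@dataclass
class Action:
    """One step in the 'When runbook starts' section (hypothetical shape)."""
    name: str
    run: Callable[[Activity], str]                          # returns an output, such as a URL
    set_field: Optional[str] = None                         # activity field that receives the output
    condition: Optional[Callable[[Activity], bool]] = None  # optional gate for this action


def start_runbook(actions: list, activity: Activity) -> Activity:
    """Run each action in order and record what happened in the timeline."""
    for action in actions:
        if action.condition is not None and not action.condition(activity):
            activity.timeline.append(f"skipped: {action.name}")
            continue
        output = action.run(activity)
        if action.set_field:
            # Pipe the action's output into the named activity field.
            activity.fields.setdefault(action.set_field, []).append(output)
        activity.timeline.append(f"ran: {action.name}")
    return activity


# Stub actions. A real runbook would call the connected tools instead.
def create_jira_issue(activity: Activity) -> str:
    return "https://jira.example.invalid/browse/INC-1"  # placeholder URL


def create_slack_channel(activity: Activity) -> str:
    return "#inc-1"


def page_on_call(activity: Activity) -> str:
    return "paged"


intake = [
    Action("Create a Jira issue", create_jira_issue, set_field="Related Links"),
    Action("Create a Slack channel", create_slack_channel),
    # Hypothetical rule: page on-call only for the highest severity.
    Action("Page on-call", page_on_call, condition=lambda a: a.severity == "SEV1"),
]

activity = start_runbook(intake, Activity(severity="SEV2"))
print(activity.fields)
print(activity.timeline)
```

Skipped actions are recorded in the timeline alongside completed ones in this sketch, so the audit trail shows what the runbook decided not to do as well as what it did.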

What’s next

Intake is just the first step in Transposit’s five-stage Incident Management Solution, which is followed by Classification, Engagement, Remediation, and Report, Record, and Learn. Continue on to learn how to automate incident classification.
