Communicate with on-call responders, stakeholders, and customers through a single human-in-the-loop automated workflow
Once you've been notified of an incident and done your due diligence to classify it by urgency and severity, it's time to contact other subject matter experts, impacted teams, and external stakeholders, as the situation dictates. Unfortunately, this typically involves many manual steps to track down the correct teams and stakeholders, contact them through the proper channels, and ultimately ensure an appropriate response. As crucial as this work is, it takes responders away from their specialized duties, and it very easily becomes a bottleneck.
During the engage phase of our five-phase incident management solution, Transposit’s connected workflow can notify all necessary parties with a single click. The Transposit engage phase is modeled after the way firefighters handle emergency response, breaking it down into three core steps: dispatch, inform, and notify.
Of course, incident management is unique to every organization and likely varies even within a single organization. For this reason, the Transposit platform lets you group different responses together as a series of automated actions, or separate them into individual actions that the DevOps team initiates one by one. This flexibility can reduce the whole tedious process to a single click.
After kicking off an incident process (incident intake) and classifying the incident, the next step in incident management is dispatching the people and teams who will be directly involved in the response (like on-call service teams), informing stakeholders who might be directly affected or need to take action (executives, legal, customer service), and notifying others who need visibility (like external customers).
This process creates a bottleneck because the incident manager then has to manually track down and notify each person or special interest group. This most likely means looking up the on-call person from the correct team and pinging them through Slack or Teams, calling them over the phone, video chatting them via Zoom or through Slack or Teams, or emailing them the details of the event.
Then the incident manager has to follow the same protocol for stakeholders. This is redundant and creates duplicate work, because the incident manager often has to repackage, rephrase, or repeat the same information multiple times or across multiple channels.
Transposit reduces escalation, and the internal communication with teams and stakeholders that comes with it, to single-click workflows. Three primary actions take place in the Transposit platform at this juncture.
The dispatch step, the one most crucial to actually resolving the incident, activates the group in charge of mobilizing and responding to it. Because this group plays the central role in solving the issue, it is the conduit to all of the personnel ultimately dispatched to take action, and it receives curated, incident-specific information. In most cases, this means using PagerDuty, Opsgenie, or a similar platform to send an automated page to the on-call person responsible for helping mitigate the problem. Across all three steps, Transposit supports severity-based escalation and grouping of incident-communication actions, all of which can be set in motion with a single click from a Transposit workflow.
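To make the dispatch payload concrete, here is a minimal sketch that builds (but does not send) a trigger event in the shape PagerDuty's Events API v2 documents: a `routing_key`, an `event_action`, and a `payload` with `summary`, `source`, and `severity`. The internal severity labels (`sev1` to `sev4`), their mapping, the helper name `build_pagerduty_trigger`, and the example values are hypothetical, and this is not a description of how Transposit itself integrates with PagerDuty.

```python
import json
from typing import Optional

# Hypothetical mapping from internal severity labels to the four severity
# values accepted by PagerDuty Events API v2.
SEVERITY_MAP = {"sev1": "critical", "sev2": "error", "sev3": "warning", "sev4": "info"}


def build_pagerduty_trigger(routing_key: str, summary: str, source: str,
                            severity: str, dedup_key: Optional[str] = None,
                            details: Optional[dict] = None) -> dict:
    """Build (but do not send) an Events API v2 'trigger' event."""
    event = {
        "routing_key": routing_key,  # integration key of the target service
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": SEVERITY_MAP[severity],
            "custom_details": details or {},  # curated, incident-specific info
        },
    }
    if dedup_key:
        # Reusing the same dedup_key lets later acknowledge/resolve events
        # refer to this alert.
        event["dedup_key"] = dedup_key
    return event


event = build_pagerduty_trigger(
    routing_key="YOUR_INTEGRATION_KEY",
    summary="Checkout API error rate above 5%",
    source="checkout-api",
    severity="sev1",
    dedup_key="incident-1234",
    details={"incident_channel": "#inc-1234"},
)
# This JSON body would be POSTed to https://events.pagerduty.com/v2/enqueue.
print(json.dumps(event, indent=2))
```

Mapping severity in one place keeps severity-based escalation consistent: a `sev1` always pages as `critical`, whichever runbook sends it.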
In the inform step, stakeholders outside engineering are told about the issue, both because it ripples into their roles and duties and because they may need to send their own communications to stay ahead of any crisis or public relations issue. The non-engineering people who need to be looped in typically include customer support, the public relations or communications team, senior leadership, major business partners, the legal department, and anyone else who can help get in front of the issue. This step usually involves sending a formal automated email and a few curated messages to the appropriate Slack channels.
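A rough sketch of the inform step's two outputs follows, assuming Python's standard `email` package for the formal email and payloads shaped for Slack's `chat.postMessage` Web API method (which takes `channel` and `text` arguments) for the channel messages. Both are only built here, not sent; the addresses, channel names, and helper names are made up for illustration.

```python
from email.message import EmailMessage
from typing import Dict, List


def build_inform_email(incident_id: str, summary: str,
                       recipients: List[str], sender: str) -> EmailMessage:
    """Build a formal stakeholder email (not sent; smtplib could deliver it)."""
    msg = EmailMessage()
    msg["Subject"] = f"[Incident {incident_id}] {summary}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        f"We are actively responding to incident {incident_id}: {summary}.\n"
        "The on-call team has been engaged. We will send updates as we learn more.\n"
    )
    return msg


def build_slack_messages(curated: Dict[str, str]) -> List[dict]:
    """One chat.postMessage-style payload per channel, each with its own text."""
    return [{"channel": channel, "text": text} for channel, text in curated.items()]


inform_email = build_inform_email(
    "INC-42", "Checkout errors for some customers",
    ["support-leads@example.com", "legal@example.com"], "incidents@example.com",
)
slack_payloads = build_slack_messages({
    "#customer-support": "Heads up: some customers may see checkout errors (INC-42).",
    "#leadership": "INC-42 (checkout errors) is being worked; next update in 30 minutes.",
})
print(inform_email["Subject"])
print(slack_payloads)
```

Keeping a separate text per channel reflects the "curated" part of the step: support and leadership hear about the same incident in terms relevant to them.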
The notify step makes sure that anyone else, internal or external, who wants to know about the situation can find information on it. Often taking the form of a more general company-wide notification, it creates necessary transparency throughout the organization. It also gives anyone who has additional resources or insight but is not on one of the on-call teams the information and pathway they need to step up with ad hoc support or other pertinent context. This is often a more general email about the issue, along with creating a Statuspage or Jira incident and inviting people to it.
The entire incident response, from detection to engagement, can be fully automated with Transposit, but it is often prudent to let the incident manager select and deploy the correct response. A human-in-the-loop approach improves both the oversight and the precision of the response.
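The human-in-the-loop idea can be sketched as an approval gate: the workflow proposes actions, and only those the incident manager approves are run. In this hypothetical sketch, `ProposedAction`, `execute_with_approval`, and the stand-in actions are illustrative names, and the `approve` callback stands in for whatever prompt or button a real tool would show.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ProposedAction:
    name: str
    run: Callable[[], str]  # hypothetical: performs the action, returns a status


def execute_with_approval(
    actions: List[ProposedAction],
    approve: Callable[[ProposedAction], bool],
) -> List[Tuple[str, str]]:
    """Run only the actions a human approves, and record the skipped ones too."""
    log = []
    for action in actions:
        if approve(action):
            log.append((action.name, action.run()))
        else:
            log.append((action.name, "skipped"))
    return log


proposed = [
    ProposedAction("Page on-call", lambda: "paged"),
    ProposedAction("Email executives", lambda: "emailed"),
    ProposedAction("Create Statuspage incident", lambda: "created"),
]

# Here the incident manager decides not to go public yet.
log = execute_with_approval(proposed, lambda a: a.name != "Create Statuspage incident")
print(log)
```

Logging skipped actions alongside executed ones keeps an audit trail of what the human chose not to do, which supports the oversight half of the argument.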
As with the intake and classify stages, all of the engage-phase automations are built as a chain of actions in a Transposit runbook. The actions can be set up to run in sequence from a single click, or subdivided into groups that give the incident manager more granular control.
Actions can take many different forms, such as “create PagerDuty incident,” “page on-call support,” “send X message to Y Slack channels,” “send general email to executives and public relations list,” “create a status update incident,” and countless other actions specific to the incident class or organization.
In the runbook body, create a new section for Engage. You can add one or more buttons to this section. You could run the entire engage section from a single button by grouping all the necessary dispatch, inform, and notify actions under it.
You could also give each step its own button.
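The difference between the two layouts can be modeled as a mapping from button labels to ordered lists of actions. The action functions below are hypothetical stubs that return a description instead of calling PagerDuty, Slack, or email, and the structure is an illustration, not Transposit's runbook format.

```python
from typing import Callable, Dict, List

# Hypothetical action signature: each action receives shared incident context
# and returns a short description of what it did.
Action = Callable[[dict], str]


def page_on_call(ctx: dict) -> str:
    return f"paged on-call for {ctx['service']}"


def notify_slack_channels(ctx: dict) -> str:
    return "posted to " + ", ".join(ctx["channels"])


def email_executives_and_pr(ctx: dict) -> str:
    return "emailed executive and PR list"


def create_status_incident(ctx: dict) -> str:
    return "created status incident"


# Actions grouped by engage step, in the order they should run.
STEPS: Dict[str, List[Action]] = {
    "dispatch": [page_on_call],
    "inform": [notify_slack_channels, email_executives_and_pr],
    "notify": [create_status_incident],
}

# Option 1: a single "Engage" button chains every action across all three steps.
single_button: Dict[str, List[Action]] = {
    "Engage": [action for actions in STEPS.values() for action in actions],
}

# Option 2: one button per step, triggered separately by the incident manager.
per_step_buttons: Dict[str, List[Action]] = {
    f"Engage: {step}": actions for step, actions in STEPS.items()
}


def press(buttons: Dict[str, List[Action]], label: str, ctx: dict) -> List[str]:
    """Run every action bound to a button, in order, and collect the results."""
    return [action(ctx) for action in buttons[label]]


ctx = {"service": "checkout-api", "channels": ["#eng-incidents", "#support"]}
print(press(single_button, "Engage", ctx))
print(press(per_step_buttons, "Engage: dispatch", ctx))
```

Both layouts run the same actions in the same order; the per-step layout simply lets the incident manager pause between dispatch, inform, and notify.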
Because the nature of the response is largely dictated by the type and severity of the incident, you can pre-configure a number of runbooks. When an incident occurs, the DevOps team can simply select the runbook that fits the situation from a menu of options and deploy its chain of actions with a single click. For more specifics on setting up Transposit runbooks, see our documentation.
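Selecting a pre-configured runbook by incident type and severity amounts to a lookup with a fallback, plus a menu of alternatives for the human to choose from. The incident types, severity labels, runbook names, and the `choose_runbook` and `menu_for` helpers below are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical catalog: (incident type, severity) -> pre-configured runbook name.
RUNBOOKS: Dict[Tuple[str, str], str] = {
    ("database", "sev1"): "db-major-outage",
    ("database", "sev3"): "db-degradation",
    ("payments", "sev1"): "payments-major-outage",
}
DEFAULT_RUNBOOK = "generic-engage"


def choose_runbook(incident_type: str, severity: str) -> str:
    """Suggest the runbook configured for this type and severity, else the generic one."""
    return RUNBOOKS.get((incident_type, severity), DEFAULT_RUNBOOK)


def menu_for(incident_type: str) -> List[str]:
    """All runbooks the team could pick for this incident type, plus the fallback."""
    options = [name for (kind, _), name in RUNBOOKS.items() if kind == incident_type]
    return options + [DEFAULT_RUNBOOK]


print(choose_runbook("payments", "sev1"))  # payments-major-outage
print(choose_runbook("network", "sev2"))   # generic-engage
print(menu_for("database"))                # ['db-major-outage', 'db-degradation', 'generic-engage']
```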
To make this stage even easier for the incident manager, pre-fill details for each action. For instance, for the "Notify Slack channels" action, you can pre-fill the correct Slack channels to notify and add a custom message. Keep the Prompt for user input box checked so that the text can be changed when the action is actually being used, if needed.
You can use the Data button to add information from the activity or from a previous action, as we've done in the custom message by inserting the Slack channel link. You can use the Data button in any action to ensure people get the right information, whether it's in a PagerDuty alert, an email, or a Statuspage update.
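The combination of pre-filled defaults, an optional prompt for user input, and values pulled in with the Data button can be approximated with a template. This sketch uses Python's `string.Template`; the field names (`incident_id`, `slack_channel_link`), the channel list, and `resolve_action` are assumptions for illustration, not Transposit's data model.

```python
from string import Template
from typing import Dict, Optional

# Hypothetical pre-filled configuration for a "Notify Slack channels" action.
PREFILLED = {
    "channels": ["#eng-incidents", "#customer-support"],
    "message": Template(
        "Incident $incident_id is under investigation. Follow along in $slack_channel_link."
    ),
    "prompt_for_user_input": True,
}


def resolve_action(prefilled: dict, data: Dict[str, str],
                   user_overrides: Optional[dict] = None) -> dict:
    """Fill the template from incident data, then apply edits if prompting is on."""
    resolved = {
        "channels": list(prefilled["channels"]),
        # substitute() raises KeyError if a referenced data field is missing.
        "message": prefilled["message"].substitute(data),
    }
    if prefilled["prompt_for_user_input"] and user_overrides:
        resolved.update(user_overrides)
    return resolved


data = {"incident_id": "INC-42",
        "slack_channel_link": "https://example.slack.com/archives/C0123"}
default_action = resolve_action(PREFILLED, data)
edited_action = resolve_action(
    PREFILLED, data, {"message": "INC-42: checkout is degraded; updates every 30 minutes."}
)
print(default_action["message"])
print(edited_action)
```

Using `substitute` rather than `safe_substitute` makes a missing data field fail loudly instead of sending a message with a literal `$placeholder` in it.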