Easier On-call Overrides

How to make the process of scheduling an override on PagerDuty more efficient, encouraged, and documented

Taylor Barnett · Nov 20, 2019

Photo of violinist performing by Larisa Birta on Unsplash

If you know any engineer who has ever been on-call, you know the situation. You are out to dinner or drinks with them and they are lugging around their backpack with laptop and hotspot in tow. They are trying to live a normal life while being on-call.

But what if they are sick in bed? Or going on vacation? Or want to attend their child’s school play? Life happens. On-call schedules can’t predict these life events months in advance.

Needing to schedule overrides is inevitable. I see it within the #oncall channel at Transposit all the time. It’s a very manual process right now. Someone asks, “Hey! Can someone cover me in two weeks on Thursday and Friday while I’m on vacation?” Someone replies in a timely (or not so timely) fashion, and then someone with the correct permissions needs to log in and change the on-call schedule.

Being able to comfortably ask for coverage is a sign of a healthy on-call rotation. Asking for a schedule override is a very human thing. You are asking for the support of your team while trying to balance supporting yourself.

It's a common, manual task, but easy to forget. This can cause a few problems, such as the wrong person getting paged, which disrupts on-call coverage. Also, since we are already asking teammates for coverage in Slack, updating the schedule from the same interface would be more efficient and mean less context switching. Lastly, it is helpful for historical data purposes to know who was on-call at any given time.

I wanted to explore what I could do to make the process of scheduling an override on PagerDuty more efficient, encouraged, and documented.

The solution

Naturally, I turned to Transposit to build out a Slack command that integrates with PagerDuty, since many on-call teams use PagerDuty as the single source of truth for their on-call schedules. For me, Transposit helps automate tasks that include a human in the loop. A human requests the override and a human accepts it, but everything else is automated.

In my sample app, I built a Slack command that lets you request an override with a start and end time. It then asks the channel, usually one where your whole on-call team is, whether anyone can cover it. Once someone accepts, it automatically creates an override in the PagerDuty schedule.
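To make the last step concrete, here is a minimal sketch of what creating an override could look like against PagerDuty's REST API. It is not the code from the sample app. The function name `build_override_request` and the placeholder IDs are my own for illustration, and the payload shape reflects my understanding of the `POST /schedules/{id}/overrides` endpoint, so check it against the current API reference before relying on it. The function only builds the request, so it can be inspected without touching a real account.

```python
import json
from datetime import datetime, timezone

PAGERDUTY_API = "https://api.pagerduty.com"


def build_override_request(api_token, schedule_id, user_id, start, end):
    """Return (url, headers, body) for creating a schedule override.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if end <= start:
        raise ValueError("override must end after it starts")

    url = f"{PAGERDUTY_API}/schedules/{schedule_id}/overrides"
    headers = {
        "Authorization": f"Token token={api_token}",
        "Accept": "application/vnd.pagerduty+json;version=2",
        "Content-Type": "application/json",
    }
    # Assumed payload shape: the user covering the shift plus the time window.
    body = json.dumps({
        "override": {
            "start": start.isoformat(),
            "end": end.isoformat(),
            "user": {"id": user_id, "type": "user_reference"},
        }
    })
    return url, headers, body


# Placeholder values, not real accounts or schedules.
url, headers, body = build_override_request(
    api_token="YOUR_TOKEN",
    schedule_id="PSCHED1",
    user_id="PUSER01",
    start=datetime(2019, 12, 5, 9, 0, tzinfo=timezone.utc),
    end=datetime(2019, 12, 6, 17, 0, tzinfo=timezone.utc),
)
```

The resulting `url`, `headers`, and `body` can be handed to whatever HTTP client you prefer. Keeping the request construction separate from the network call also makes the time-window checks easy to test.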

See it in action

First, you use the /request-override command:

Picture of /request-override command

Next, you are guided through a set of prompts where you choose the start and end times for the override:

Picture of UI within Slack to set the start date

Picture of UI within Slack showing a calendar view to set start date
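Behind prompts like these, the picked values have to become timestamps that a scheduling API can accept. The sketch below shows one way that conversion might look. It assumes the date arrives as a `YYYY-MM-DD` string (the format Slack's date picker uses for its selected date) and the time as an `HH:MM` string from a separate menu. The function name `to_iso8601` and the fixed UTC offset parameter are hypothetical simplifications, not part of the sample app.

```python
from datetime import datetime, timedelta, timezone


def to_iso8601(selected_date, selected_time, utc_offset_hours=0):
    """Combine a picked date and time into a timezone-aware ISO 8601 string.

    Hypothetical helper: a fixed UTC offset stands in for real time zone
    handling, which would also need to account for daylight saving time.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    naive = datetime.strptime(f"{selected_date} {selected_time}", "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=tz).isoformat()


start = to_iso8601("2019-12-05", "09:00", utc_offset_hours=-8)
print(start)  # 2019-12-05T09:00:00-08:00
```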

You then confirm the override:

Picture of Slack message to confirm override request details

Up to this point, everything has been visible only to you. Now the request is shared with the rest of the channel. Once another team member accepts it, the override is scheduled in PagerDuty and everyone is updated:

Picture of what everyone else in Slack sees
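The private-then-public flow maps onto how Slack lets a slash command respond: a message with `response_type` set to `ephemeral` is shown only to the person who ran the command, while `in_channel` posts to everyone. Here is a rough sketch of the two payloads, assuming Block Kit for the button. The function names, message wording, and the `accept_override` action ID are made up for illustration and are not taken from the sample app.

```python
def confirmation_message(start, end):
    """Private preview, visible only to the person requesting the override."""
    return {
        "response_type": "ephemeral",
        "text": f"Request an override from {start} to {end}?",
    }


def channel_request_message(requester_id, start, end):
    """Public request with an Accept button for the rest of the channel."""
    return {
        "response_type": "in_channel",
        "blocks": [
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    # <@USER_ID> renders as a mention of that Slack user.
                    "text": f"<@{requester_id}> needs on-call coverage "
                            f"from {start} to {end}.",
                },
            },
            {
                "type": "actions",
                "elements": [
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": "Accept"},
                        "style": "primary",
                        # Hypothetical ID the app would match on when clicked.
                        "action_id": "accept_override",
                    }
                ],
            },
        ],
    }


private = confirmation_message("Dec 5, 9:00 AM", "Dec 6, 5:00 PM")
public = channel_request_message("U123ABC", "Dec 5, 9:00 AM", "Dec 6, 5:00 PM")
```

When a teammate clicks the button, Slack sends the app an interaction payload that includes the action ID and the ID of the user who clicked, which is enough to know who should be written into the PagerDuty override.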

Going forward

This application could be expanded in a few different ways:

If you’d like to use it within your on-call team, get started now here!
