Organizations are looking to SRE and automation to combat a rise in service incidents.
Our findings are based on an independent survey we commissioned of more than 500 IT Operations, DevOps, and Site Reliability Engineering (SRE) professionals at U.S. organizations with more than 300 employees.
We’re thrilled to announce our State of DevOps Automation 2021 Report. In a year that was anything but normal, we wanted to understand how DevOps teams were affected by the sudden change in work environments, which coincided with added pressure on digital services as nearly everything moved online.
Our report reveals that the confluence of digital transformation initiatives and the remote/hybrid work policies brought on by the pandemic has exacerbated the challenges of managing a modern stack, with organizations overwhelmingly reporting higher rates of downtime (68.4%) and increased MTTR (93.6%).
The question, then, is: what now? The pandemic has accelerated reliance on digital services, and we don’t expect that trend to reverse anytime soon. Organizations are pressed to find scalable, long-term solutions.
Our survey surfaced two critical levers that organizations aim to pull to navigate this new terrain: SRE and automation.
More than ever before, organizations are investing in SRE to set up process controls and ensure the reliability of their services. Even in organizations without official SRE roles, ITOps teams are embracing SRE practices.
Almost all (98%) respondents in the “VP/Director/Manager IT Operations” role increased their organization’s focus on SRE practices in the past 12 months, and 62.4% of IT Operations respondents plan to expand SRE efforts in 2021.
SREs are critical contributors to incident resolution and help teams work with complex distributed systems at scale. However, nearly 80% of respondents said that the people responsible for reliability engineering face challenges when trying to resolve incidents as they occur.
While SRE adoption has helped organizations cope with growing complexity and reliability issues, SRE teams still struggle with an overwhelming amount of manual toil and a lack of shared process controls between engineering and operations teams. Organizations are turning to automation to make SRE as effective as possible.
Our report found that manual toil is the top challenge during remediation: 51.7% of respondents said a lack of automation prevents them from acting quickly to resolve an incident, and 52.3% plan to implement new automation tools in the coming year.
It’s clear that organizations see great promise in automation for reducing manual toil and improving service reliability. In fact, organizations are investing a tremendous amount of time, money, and resources in building custom automation: 40% of organizations have one or more full-time engineers working on custom in-house tools or bots for automating incident response.
Even with so much time and so many resources devoted to automation, organizations still struggle to automate processes in a meaningful way. Nearly half of respondents reported that their engineering operations are only 26-50% automated.
We found that the barriers to automation are not a lack of desire or time but a lack of understanding of how current processes work, because so much of that understanding exists only as institutional knowledge.
Organizations have reached an inflection point. To manage the complexity of today’s landscape, most are convinced that automation holds the key. But is the solution really as simple as ‘automate everything’? While many think of automation as something that removes humans from the process, the reality is that humans are inextricable from the equation; in fact, they’re necessary. In our report, 9 out of 10 respondents said they believe that, to be more reliable and effective, automation should let humans use their judgment at critical decision points.
So the goal is less about implementing end-to-end automation and more about building automated processes that more accurately replicate real-world scenarios, letting machines do what they’re great at, such as repetitive, predictable tasks, while humans step in to apply judgment and discernment. This solution, human-in-the-loop automation, will help organizations automate processes incrementally.
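To make the idea of human-in-the-loop automation more concrete, here is a minimal Python sketch, assuming a process has already been codified as an ordered list of steps. The `Step`, `RunResult`, and `run_runbook` names, the example steps, and the `checkout` service are hypothetical illustrations, not part of any product or of the report. Repetitive steps run automatically, while steps flagged `needs_approval` pause for a human decision supplied through the `approve` callback.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    """One step of a codified process (hypothetical structure)."""
    name: str
    action: Callable[[dict], None]
    needs_approval: bool = False  # True marks a human decision point


@dataclass
class RunResult:
    completed: List[str] = field(default_factory=list)
    halted_at: Optional[str] = None  # name of the step a human declined, if any


def run_runbook(steps: List[Step], context: dict,
                approve: Callable[[Step, dict], bool]) -> RunResult:
    """Run steps in order, asking a human only at flagged decision points."""
    result = RunResult()
    for step in steps:
        if step.needs_approval and not approve(step, context):
            # The human declined, so stop instead of guessing what to do next.
            result.halted_at = step.name
            return result
        step.action(context)
        result.completed.append(step.name)
    return result


# Hypothetical example steps for an incident on a "checkout" service.
def gather_diagnostics(ctx: dict) -> None:
    ctx["diagnostics"] = f"logs for {ctx['service']}"  # repetitive, safe to automate


def restart_service(ctx: dict) -> None:
    ctx["restarted"] = True  # disruptive, so it is gated on human approval below


def verify_health(ctx: dict) -> None:
    ctx["healthy"] = ctx.get("restarted", False)


STEPS = [
    Step("gather diagnostics", gather_diagnostics),
    Step("restart service", restart_service, needs_approval=True),
    Step("verify health", verify_health),
]

if __name__ == "__main__":
    ctx = {"service": "checkout"}
    # Simulate an on-call engineer who declines the restart.
    result = run_runbook(STEPS, ctx, approve=lambda step, c: False)
    print(result)  # RunResult(completed=['gather diagnostics'], halted_at='restart service')
```

Passing the approver in as a function keeps the decision point explicit: in practice it might be a chat prompt or a paging tool, while a test can supply a lambda. Halting when a human declines, rather than skipping ahead, is one reasonable design choice among several.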
The journey to using automation in a meaningful way starts with codifying processes: taking institutional knowledge out of experts’ heads and into dynamic documentation so that teams share knowledge and have clarity about which steps to automate. From there, we believe organizations everywhere can unleash the full potential of automation to reduce manual toil, streamline processes, and ultimately deliver value to customers faster.