What a Naval Aviation Squadron Incident Can Teach Us About IT Incident Management

Codified processes and SOPs are key to efficient incident response and resolution

Hugh Brien
Nov 11th, 2021

It’s been over 25 years since I was a junior officer and aviator in a US Navy helicopter squadron. It was a great job: fun, challenging, and at times a little terrifying. The CH-46D squadron (all since retired and rebooted) flew a variety of missions in all kinds of conditions. These included logistics missions, vertical replenishment (VERTREP), which supplied cargo to ships, search and rescue, and personnel transfers like the “holy helo”: flying the Battle Group priest around for Sunday service and even hoisting him down to smaller ships.

One of the many collateral duties in most military units is the role of the “Officer of the Day”. This is typically a 24-hour shift in which the officer is responsible for posting the flight schedule, touring the facility, reviewing security, and ensuring formations, acting as the real-time eyes and ears of the Commanding Officer and his staff. An important part of being Officer of the Day was serving as the first point of contact for any significant event involving aircraft, their cargo, or personnel.

One of the most dreaded incidents was being notified of a downed aircraft or any loss of personnel. When that notification came in, the Officer of the Day had to pull out the Binder and walk through a list of steps, typically referred to as Standard Operating Procedures (SOPs). In IT, this closely resembles what we refer to as a runbook.

The purpose of an SOP is to use organizational best practices to standardize how a unit operates and ensure maximum efficacy. SOPs provide each service member with everything they need to understand how a unit operates and how to accomplish specific tasks. The SOP itself takes various circumstances into account, acknowledging the limitations and challenges that a particular unit may face, in order to maximize its opportunity for success.

This article from the US Army outlines it well: “this process eliminates wasted effort from each Soldier trying to determine his/her own version of the best way to execute a task. Furthermore, as part of the assessment process, SOPs address associated safety challenges to prevent accidental loss, preserve Army assets, and maintain unit effectiveness. Finally, it eliminates the loss of best practices due to Soldier turnover in the unit. The standardized procedure is codified for continuity, safety, and efficiency in the SOP.”

Runbooks provide a similar function for engineers and operators — they are sets of steps and guidelines that are valuable when responding to an incident or performing other operational tasks. A good runbook is actionable, accessible, accurate, authoritative, and adaptable and will help any operator respond quickly and efficiently when something goes wrong or needs to be addressed.

A downed aircraft or any loss of personnel was classified using the following criteria:

No matter what, the first task was always to contact the Commanding Officer. You were to continue to call or page them until they responded. It was anathema to think that the Commanding Officer would not be the first notified. You would then notify the Executive Officer (second in command) and the Command Master Chief (the senior enlisted non-commissioned officer), followed by the National Military Command Center (NMCC). You can see the current requirements here.

For the call to the Command Center, you had to read a script and fill in the details. It was something like:

“This is the Watch Officer at HELSUPPRON ELEVEN.”

“I am reporting an “Alpha” Class Mishap: Aircraft Type, BuNo (Bureau Number), Side Number, Callsign, Location Lat/Long, Time in Zulu/GMT, Number of Souls onboard. Mishap summary.”
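In runbook terms, a script like this is a template with required fields. The sketch below is one way that idea could look in Python; it is illustrative only, not an actual Navy or Transposit format. The `MishapReport` class, its field names, and the `missing_fields` and `render_script` helpers are all hypothetical, and the wording of the rendered script only approximates the one recalled above.

```python
from dataclasses import dataclass, fields


@dataclass
class MishapReport:
    # Hypothetical field names that mirror the blanks in the script above.
    mishap_class: str
    aircraft_type: str
    buno: str
    side_number: str
    callsign: str
    location_lat_long: str
    time_zulu: str
    souls_onboard: str
    summary: str


def missing_fields(report: MishapReport) -> list[str]:
    """Return the names of any fields that are still blank."""
    return [
        f.name
        for f in fields(report)
        if not str(getattr(report, f.name)).strip()
    ]


def render_script(report: MishapReport, unit: str) -> str:
    """Build the text to read aloud, refusing to render an incomplete report."""
    missing = missing_fields(report)
    if missing:
        raise ValueError("Fill in before calling: " + ", ".join(missing))
    return (
        f"This is the Watch Officer at {unit}. "
        f"I am reporting a Class {report.mishap_class} mishap. "
        f"Aircraft type: {report.aircraft_type}. "
        f"BuNo: {report.buno}. "
        f"Side number: {report.side_number}. "
        f"Callsign: {report.callsign}. "
        f"Location: {report.location_lat_long}. "
        f"Time (Zulu): {report.time_zulu}. "
        f"Souls onboard: {report.souls_onboard}. "
        f"Summary: {report.summary}"
    )


# Placeholder values only; the callsign has deliberately been left blank.
draft = MishapReport(
    "Alpha", "<aircraft type>", "<buno>", "<side number>", "",
    "<lat/long>", "<time Z>", "<count>", "<summary>",
)
print(missing_fields(draft))  # ['callsign']
```

Refusing to render an incomplete report is a design choice in this sketch: under stress, it is easier to be told which blank is still empty than to notice it halfway through the call.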

After that, you would notify the Squadron Safety Officer and the Squadron Legal Officer.

Some steps are committed to memory. For example — your first call/contact is to the commanding officer. The CO is the most experienced aviator in your squadron. The CO has already been exposed to a certain level of stress and difficulty. They are going to demonstrate how to operate effectively during that type of event or incident. As a helicopter pilot, when operating an aircraft, the first step in almost any procedure was to lower the nose/collective to get airspeed. Airspeed is life in aviation — whether in a helicopter or an airplane.

Given the stressful nature of these situations, one cannot rely on memory alone. The standardized procedures mean one does not miss or incorrectly follow any steps. In addition, these procedures are well defined so that anyone, no matter how junior, is qualified to execute them. These are also the first steps in a checklist that might be used in a post-incident review debrief.
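The notification order described earlier maps naturally onto an ordered checklist that a program can walk through, so no step is skipped and each attempt is logged for the later review. Here is a minimal Python sketch under a few assumptions: `contact` is a hypothetical callable standing in for whatever paging or calling mechanism is actually used, `max_attempts` is an invented cap so the example terminates (the real rule was to keep trying the CO until they responded), and retrying every role, not just the CO, is a simplification.

```python
from typing import Callable, List, Tuple

# Order taken from the procedure described in this article.
NOTIFICATION_ORDER = [
    "Commanding Officer",
    "Executive Officer",
    "Command Master Chief",
    "National Military Command Center",
    "Squadron Safety Officer",
    "Squadron Legal Officer",
]


def run_notifications(
    contact: Callable[[str], bool],
    order: List[str] = NOTIFICATION_ORDER,
    max_attempts: int = 5,  # assumption: a cap so the sketch cannot loop forever
) -> List[Tuple[str, int, bool]]:
    """Contact each role in order, retrying until it acknowledges.

    `contact(role)` should return True once the role has responded.
    Returns a log of (role, attempt number, acknowledged) tuples.
    """
    log: List[Tuple[str, int, bool]] = []
    for role in order:
        for attempt in range(1, max_attempts + 1):
            acknowledged = contact(role)
            log.append((role, attempt, acknowledged))
            if acknowledged:
                break
        else:
            # No acknowledgement: stop rather than silently skip ahead.
            raise RuntimeError(f"No response from {role} after {max_attempts} attempts")
    return log
```

A caller would pass in its own `contact` function, for example one that pages a person and waits for a reply. The returned log doubles as a timeline for the post-incident review.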

When I was a brand new Aviator doing my first run as Officer of the Day in my very first squadron, an “automated checklist” that would initiate contacting the CO and other officials, and help me seamlessly move through those standardized tasks, would have been a game-changer. Still, the structured thinking that was instilled in me during my time in the military has continued to provide value throughout my career and life.

At Transposit, we’re excited to provide that structure, documentation, and automation for those who encounter stressful incidents while maintaining infrastructure and site reliability. Our fully integrated, data-driven approach brings the clarity, unity, and direction teams need to efficiently handle incidents, reduce mean time to resolution (MTTR), and meet service level objectives (SLOs). If you’d like to learn more, we’re happy to schedule time for a demo.