Throwback Thursday: Availability and Outage: How to Handle an Outage-ocalypse Like a Pro
Originally posted 1/6/2012
2011 was not a kind year for customers of businesses that deliver a large portion of their services via the Web. From movie streaming with Netflix to banking access through Bank of America, downtime was, at the very least, indiscriminate. In the worst cases, with loss of availability stretching over days, the only realistic description is an “outage-ocalypse”: an outage so big it spans days. Amazingly, most businesses, even those that made lists like this one from SmartBear, escaped with their image unscathed.
Of course, while these outages make great fodder for "Top Ten Lists," such lists fail to draw attention to the fact that organizations of any size more than likely experienced an outage-ocalypse of their own, just on a smaller scale. If yours is one of them, you are not alone. Even though prevention is a key component of running a modern IT Service Management organization, outages can still occur. The difference lies in how you handle them, and in how your customer base perceives you afterwards.
So, as we navigate 2012, we want to help you deal with outages of any size like a pro. Listed below are five key suggestions on how.
Communicate the Outage
When systems go down, incidents start pouring in instantly, often several for the same problem, because each customer tends to assume they are the only one affected. Communicating an outage quickly can reduce that volume, allowing your Service Desk to work on solving the issue instead of spending time responding to individuals one by one. The Service Catalog and automated user notifications are both great tools for this. If those aren’t part of your current ITSM solution, you should think about upgrading, but never forget manual communication methods, even calling managers directly if necessary.
Identify the Cause
This sounds a lot more like common sense than professional advice. However, even large organizations with major outages in 2011 found it necessary to devote as many resources as possible (see RIM SWAT Team) to getting to the cause of an outage. Since change is so often at the root of problems, change management needs to be part of your daily regimen. While testing is widely accepted as a necessary part of the process, keeping track of changes often is not. We’ve heard tales of organizations still using Excel, and some even pen and paper. While that’s a step ahead of no mechanism at all, look into real change and configuration management solutions. Because of the automation features and the low cost of modern ITSM solutions, ROI over manual methods can often be realized very quickly.
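To make the link between change tracking and finding a cause concrete, here is a minimal sketch of the kind of lookup a change log makes possible: given the time an outage started, list the changes applied shortly before it. The ChangeRecord fields, the recent_changes helper, the 24-hour window, and the sample entries are all illustrative assumptions, not features of any particular ITSM product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ChangeRecord:
    """One entry in a hypothetical change log."""
    system: str
    description: str
    applied_at: datetime


def recent_changes(changes, outage_start, window=timedelta(hours=24)):
    """Return changes applied within `window` before the outage, newest first."""
    earliest = outage_start - window
    candidates = [c for c in changes if earliest <= c.applied_at <= outage_start]
    return sorted(candidates, key=lambda c: c.applied_at, reverse=True)


# Made-up sample data for illustration only.
change_log = [
    ChangeRecord("mail", "Upgrade mail server", datetime(2012, 1, 2, 9, 0)),
    ChangeRecord("database", "Apply database patch", datetime(2012, 1, 4, 20, 0)),
    ChangeRecord("network", "Update firewall rule", datetime(2012, 1, 5, 13, 30)),
    ChangeRecord("dns", "Change DNS record", datetime(2012, 1, 5, 15, 0)),
]

outage_start = datetime(2012, 1, 5, 14, 0)
suspects = recent_changes(change_log, outage_start)

for change in suspects:
    print(change.applied_at, change.system, change.description)
```

With a spreadsheet or paper log, the same question means reading through entries by hand. A structured record turns it into a query, which is one reason dedicated change management tools can speed up root cause analysis.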
Communicate the Resolution
Communication comes up twice in our list, and that’s on purpose. Good communication goes a long way in helping customers feel like part of the process. If there are lessons to be learned or improvements to be made, don’t hesitate to use this as an opportunity for internal PR. Again, modern ITSM software solutions provide options to automate communication, but don’t look down on sending an open and honest email to your users letting them know the issue has been resolved and that you appreciate their understanding. Think of it as another chance to bring customers into your empathetic (versus apathetic) approach to solving outages. In addition, don’t forget to include the IT team. In many cases, communication is most lacking with those closest to us within the organization.
Most organizations have the best intentions when it comes to processes. In many cases, though, process design becomes a create-and-then-leave-alone endeavor, and no further consideration is given until an outage occurs. In contrast, true process-driven organizations are agile and understand that processes need continual review and adjustment. At a minimum, if you experience an outage, be sure to review your current processes and make changes based on lessons learned. Remember, the end game is prevention, not reaction. Sound processes, reviewed consistently and regularly, will help you reach this objective.
Modern ITSM software solutions should create a solid link between your skilled staff and your well-developed processes. We have found that tools can often be the weakest link, forcing IT organizations to hire staff or adjust processes based purely on the limitations of their current tool set. This is a big warning sign, and it will only become more apparent during an outage. Spend some time evaluating what is missing from your current ITSM system, and develop a strategy for overcoming those limitations. While cost is often the biggest factor, there are many modern options that provide wide gains in functionality at substantial cost savings. If you're not sure whether now is the right time to upgrade your expensive legacy system, check out the guide below for extra help.