Thanks to the highly complex nature of modern IT infrastructure, it’s inevitable that outages will occur. That’s why it’s important that steps are taken to minimise downtime, although according to a new study by LogicMonitor, a SaaS-based performance monitoring platform for enterprise IT and service providers, employees aren’t always doing enough to prevent outages.
The 2019 IT Outage Impact Study examines the impact that infrastructure and software brownouts and outages have on organisations, and whether such events are preventable. The survey found that although performance and availability (the state in which an organisation’s IT infrastructure is functioning properly) are the top two concerns of IT teams worldwide, organisations are still plagued by frequent brownouts (where infrastructure or software performs at a degraded level) or outright outages.
80% of global survey respondents report that the performance and availability of their IT infrastructure tops their list of concerns. In fact, availability was considered more important than security and cost-effectiveness, which ranked third and fourth, respectively.
A DevOps engineer for a technology integration and management company said, “We support finance clients that deal with microtransactions against the open market, so an outage or even a loss of connectivity to the stock exchange can quickly equate to lost dollars, and they hold us accountable for that.”
Despite the high level of concern, many firms are still missing simple issues that could prevent these outages. In fact, the survey says that as many as 53% of all outages reported could have been prevented had employees been more proactive.
The top two missed opportunities in preventing downtime were as follows:
- Failing to notice when usage is trending towards a danger level. For example, this might be more traffic than the network can efficiently handle, or it might be a primary storage share running out of space.
- Failing to notice that critical hardware (or software) performance is trending steadily downward.
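To make the first of these missed opportunities concrete, here is a minimal sketch of how a team might estimate when a usage metric will reach a danger level. It is an illustration only and is not taken from the LogicMonitor study: the function name `hours_until_threshold`, the sample figures and the 90% danger level are all assumptions, and it fits a simple straight line to the samples, which real monitoring tools may well not do.

```python
from typing import Optional, Sequence


def hours_until_threshold(
    hours: Sequence[float], usage: Sequence[float], threshold: float
) -> Optional[float]:
    """Estimate hours from the last sample until usage reaches threshold.

    Hypothetical helper: fits a least-squares straight line to the samples.
    Returns 0.0 if the latest sample already meets the threshold, and None
    if usage is flat or falling (no crossing is predicted).
    """
    if len(hours) != len(usage) or len(hours) < 2:
        raise ValueError("need at least two paired samples")
    if usage[-1] >= threshold:
        return 0.0

    n = len(hours)
    mean_t = sum(hours) / n
    mean_u = sum(usage) / n
    var_t = sum((t - mean_t) ** 2 for t in hours)
    if var_t == 0:
        raise ValueError("timestamps must not all be identical")

    # Slope of the fitted line: how fast usage grows per hour.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in zip(hours, usage)) / var_t
    if slope <= 0:
        return None

    # Solve threshold = intercept + slope * t for t, then measure from the last sample.
    intercept = mean_u - slope * mean_t
    crossing = (threshold - intercept) / slope
    return max(crossing - hours[-1], 0.0)


# Assumed example: a storage share sampled once a day, in percent full.
sample_hours = [0, 24, 48, 72]
sample_usage = [70, 72, 74, 76]
remaining = hours_until_threshold(sample_hours, sample_usage, threshold=90)
print(f"Estimated hours until 90% full: {remaining:.0f}")
```

With these assumed samples the share grows by two percentage points a day, so the estimate comes out at roughly 168 hours, or about a week of warning. The second missed opportunity, steadily declining performance, is the mirror case: the same fitted slope would be negative and could be checked against a lower limit instead.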
The high stakes of IT outages
It’s vitally important that companies keep their IT infrastructures online, as the cost of even an hour of downtime can be staggeringly high. Global companies that have frequent outages and brownouts experience up to 16x higher costs when mitigating and recovering from downtime than companies that have fewer instances of downtime. The ‘big six’ costs identified by respondents included:
- Lost revenue
- Lost productivity
- Compliance costs
- Mitigation costs
- Damage to the brand
- Lowered stock price
It’s not just the financial cost to the company that needs to be taken into account with IT outages, as there’s also a significant human cost. 35% of UK IT leaders have reported being worried about someone losing their job due to an IT outage, with many believing that they could be on the chopping block.
“IT availability has become one of the business world’s most valuable commodities, but also the most difficult to maintain. Organisations today are increasingly dependent on the availability of their IT infrastructure,” said Gadi Oren, vice president of technology evangelism at LogicMonitor.
“A single IT outage can have huge negative business impacts including lost revenue and compliance failure, as well as decreased customer satisfaction and a tarnished brand reputation. Comprehensively monitoring IT infrastructure is key in detecting the early warning signs of impending IT outages and acting in real-time to course-correct before it’s too late.”
Why outages occur
Survey participants report that the most common causes of disruptive downtime, which pose a threat to their key priorities of performance and availability, include:
- Network failure
- Software malfunction
- Usage spikes/surges
- Third-party provider outages
- Human error
- Configuration error
The potential consequences of an IT outage are why it’s vitally important that IT leaders understand why outages occur and do their best to prevent them. What’s more, LogicMonitor recommends that IT leaders set up monitoring tools to ensure that in the case of any outage, the team can respond quickly to minimise downtime.