Paul Mercina, head of innovation at Park Place Technologies, examines the role of artificial intelligence and predictive maintenance as part of a smart distributed future.
The evolution of information technology since the 1940s-era ENIAC computer has been swift and literally world-changing. The approaches applied to maintaining our now vast technology assets, on the other hand, have been slower to develop.
But today, we’re finally standing on the cusp of the most transformative innovation in IT support — the application of artificial intelligence to this once almost exclusively human field — and it couldn’t have come soon enough.
A great leap at a perfect time
The data centre as we know it isn’t going away, but it is experiencing a radical reinvention. Businesses are turning to hybrid models to simultaneously take advantage of public cloud and owned IT assets, and Gartner predicts 90% of organisations will follow this route by 2020.
Even owned assets are shifting, with private clouds hosted on premises and in colocation facilities, and modular data centre pods taking compute ever-farther afield.
In other words, it’s all getting very complex. And that’s before we start talking about remote facilities, the emergence of IoT sensors all over the place, and the push toward the edge to handle the more distributed nature of our “smarter” lives.
Combining such disparate elements into a unified infrastructure is an architectural challenge. Fortunately, any number of consultancies will help build this out. Keeping it running is another challenge, one that often goes unnoticed until a major outage makes headlines.
Keeping up in the race for reliability, in a tightening labour market and under increasing budget pressure, is already difficult. Figuring out where to find more engineers to check on all those micro-mobile data centres, which could be processing 75% or more of enterprise data by 2022, is nigh on impossible.
New support methods are required for this new era. Fortunately, they’re here, as machine learning technologies are being applied to enterprise IT hardware maintenance.
Robots to the rescue?
We’re not talking about C3PO here, but learning computers are coming to understand our networks and interdependent IT systems in more holistic ways than our sniffers, diagnostic modules, and monitoring dashboards — or even the humans working with these tools — ever have. The details of data centre maintenance, it turns out, make for a great use case for machine learning.
Already, there is a plethora of performance information generated and logged by data centre hardware, all day, every day.
The biggest problem for administrators has always been to see through the chaos to identify problems in time, in order to minimise business impacts, but that’s a tall order. It’s all too easy to overlook a minor error report or be called away to deal with a different issue, only to have a serious outage erupt.
In remote facilities, there is the additional complication of sending engineers out to check on the equipment. This is time-consuming and costly, and will only become more so as edge computing proliferates.
AI-based monitoring systems offer a significant advantage. They can keep track of every bit and byte of performance information.
Moreover, using machine learning techniques, AI-based remote monitoring takes the next step, gleaning insights from disorganised and seemingly unconnected data points.
In relatively short periods, these systems come to identify what types of issues precede outages and diagnose failures in the earliest stages.
With that capability, we’re starting to achieve the data centre manager’s dream — being able to solve a problem before it starts. In other words, predictive maintenance.
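As a rough illustration of the idea, and not a description of how any particular product works, the sketch below flags a sensor reading that departs sharply from its own recent history. The function name, window size, threshold, and temperature values are all assumptions made for this example; a real system would learn from far richer data than a single series.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate from the mean of the
    preceding `window` readings by more than `threshold` standard deviations.

    Hypothetical helper for illustration only.
    """
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to compare against,
            # so this simple sketch skips the point.
            continue
        if abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Made-up inlet temperatures (degrees C) from one hypothetical server.
temperatures = [40, 41, 40, 42, 41, 40, 41, 55, 41, 40]
anomalies = flag_anomalies(temperatures)
print(anomalies)
```

With these sample values, only the jump to 55 stands out against its trailing window. A simple statistical rule like this is far cruder than the machine learning described above, but it shows the basic shift from reading every log line to being alerted only when behaviour changes.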
Predictive maintenance in action
Park Place Technologies assessed the status of machine learning and decided to take the plunge. We deployed a machine learning-based predictive maintenance capability for our clients and called it ParkView.
As just one example, we’re working with Cincinnati Bell in the US, a major regional telecommunications company with numerous unmanned data processing outposts.
These facilities are mostly far from the corporate headquarters, so the company was racking up expenditures to have employees log thousands upon thousands of miles to physically check that servers and other equipment in their multi-state territory were operating.
This changed with the predictive maintenance capability we introduced. Remote monitoring not only spared them the human resource allocation wasted behind the wheel, but it also improved the diagnostics.
Engineers no longer have to guess whether a controller, a battery, or a failed memory stick was the root cause of a problem, and risk being dispatched with the wrong gear to implement the wrong fix. Instead, the AI conducts a more complete and accurate analysis.
The system then automatically generates a trouble ticket with all the information — including spare parts numbers — that our engineers need in order to enact a solution.
Consider that a single administrator's salary in the US averages over $120,000 and that such an employee can support, on average, 64 Windows O/S instances today, according to Gartner. Being able to stretch that number into the hundreds of instances per administrator, and eventually the thousands and beyond, will be the source of exceptional savings.
Additionally, the predictive nature of the diagnostics and the improvement in first-time fix rates help minimise downtime, thereby delivering further business value.
Looking forward, this is how enterprises around the world will solve the emerging maintenance dilemma: handling the burst in edge computing, increasing uptime to match ever-increasing customer expectations, and doing exponentially more without a similar budget expansion.
Machine learning, automation, and integrated engineering response are combining to deliver a different form of IT maintenance: the one our digital transformation demands.
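To make the arithmetic concrete, the short calculation below works out the administrator salary cost per instance using the two figures quoted above. The 500-instance scenario is an assumed value chosen only to illustrate "hundreds of instances"; it is not a figure from Gartner or Park Place Technologies, and $120,000 is treated as a round number although the quoted average is "over" that amount.

```python
def salary_cost_per_instance(annual_salary, instances_per_admin):
    """Annual administrator salary spread across the instances supported."""
    return annual_salary / instances_per_admin


ANNUAL_SALARY = 120_000      # US average quoted in the article (lower bound)
CURRENT_INSTANCES = 64       # instances per administrator, per Gartner
ASSUMED_INSTANCES = 500      # hypothetical future ratio, for illustration

current_cost = salary_cost_per_instance(ANNUAL_SALARY, CURRENT_INSTANCES)
assumed_cost = salary_cost_per_instance(ANNUAL_SALARY, ASSUMED_INSTANCES)

print(f"Today: ${current_cost:,.2f} per instance per year")
print(f"At {ASSUMED_INSTANCES} instances: ${assumed_cost:,.2f} per instance per year")
```

On these numbers, the salary cost falls from $1,875 to $240 per instance per year, which is where the savings would come from if the ratio can be stretched that far.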
Where we go from here
The IT maintenance industry has moved the markers. We started with reactive maintenance, fixing systems as they broke down, and worked through scheduled maintenance, when downtime could at least be forecast and IT teams put on the clock in the middle of the night.
We’ve strived for predictability. Many data centre leaders who have worked in a facility long enough develop an eerie sense of how to hold all the pieces together when trouble is brewing. The introduction of hardware maintenance outsourcing solutions helped ease their burden.
Now, AI is packing a decade’s learning curve into days, empowering data centre managers to “keep the lights on” in less time so that they can spend more of it advancing strategic objectives.
As analytics capabilities improve, the powerful field of deep learning takes hold, software fully defines our data centres, and DCIM tools are further integrated with maintenance solutions, predictive maintenance will start to look a lot more like self-maintenance.
AI-based systems will diagnose and often heal problems, calling in humans only when a pair of hands is needed to swap out a part. This will free up human resource to allow for further innovation and personalised services.
How will we be running our data centres when that day comes? It’s hard to say now, but we’ll learn.