AIOps use cases and imperatives
AIOps, simply put, is applying the power of analytics and machine learning to automate and improve the efficiency of IT Operations.
With the increased use of IT systems, driven by digital transformation and more recently by COVID, IT leaders are expected to do more with the same or fewer resources. According to a recent IDG study, "2020 State of CIO Executive Summary", increasing operational efficiency registered as the most prominent business initiative, cited by 37% of both IT executives and line-of-business (LOB) respondents.
AIOps promises to improve operational efficiency by:
- Helping identify, predict and isolate problems faster. Without AIOps, IT Operators spend hours or days isolating problems.
- Rapidly analyzing terabytes to petabytes of operational data to identify root causes. Today, IT Operators and SMEs spend multiple person-days in war rooms.
- Prioritizing problems and recommending solutions. Without AIOps, IT Operators spend hours trying different solutions to address a problem.
- Forecasting capacity needs and optimizing resource utilization. Without AIOps, IT leaders spend countless hours every budget cycle estimating capacity needs.
- Autonomous IT Operations - closed-loop automated detection and mitigation is a bit further out, and depends on how quickly ML algorithms can prove their accuracy and gain the trust of IT Operations.
Most commercially available AIOps solutions can help IT organizations achieve some of these use cases, with varying degrees of success. How much operational efficiency an organization gains from AIOps is determined by the following factors, which I like to call imperatives:
- Monitor every IT resource - Is your IT organization monitoring every IT resource involved in a transaction or customer interaction, and tracking the operational metrics, events, traces and logs that your AIOps solution can use to identify and predict issues?
- Analyze every outage - Does your IT organization have well-established practices for gathering outage data and performing root cause analysis? This information can be leveraged by your AIOps solution to accelerate root cause analysis.
- Automate every repeatable task - After any incident or maintenance activity, do you have practices to review the actions taken and improve existing automation? Well-maintained automation can be leveraged by your AIOps solution to provide recommendations for future incidents.
- Measure operational efficiency - How are you measuring operational efficiency, and do you have clear goals for areas of improvement, e.g. the number of outages or long incident resolution times? KPIs like mean time to resolve (MTTR), mean time to acknowledge (MTTA) and mean time between failures (MTBF) are commonly used by IT organizations to measure operational efficiency.
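To make the KPIs in the last imperative concrete, here is a minimal sketch of how MTTA, MTTR and MTBF could be computed from incident records. The `Incident` fields, the sample data and the observation window are all hypothetical, and the definitions used (MTTA and MTTR measured from the moment the incident opened, MTBF as uptime in the window divided by the number of failures) are one common convention; your ITSM tool may define them differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Incident:
    # Hypothetical record shape; real ITSM tools use their own field names.
    opened: datetime        # when the failure started / the alert fired
    acknowledged: datetime  # when an operator acknowledged the incident
    resolved: datetime      # when service was restored


def _mean(deltas: List[timedelta]) -> timedelta:
    """Average a non-empty list of durations."""
    return sum(deltas, timedelta()) / len(deltas)


def mtta(incidents: List[Incident]) -> timedelta:
    """Mean time to acknowledge: opened -> acknowledged."""
    return _mean([i.acknowledged - i.opened for i in incidents])


def mttr(incidents: List[Incident]) -> timedelta:
    """Mean time to resolve: opened -> resolved."""
    return _mean([i.resolved - i.opened for i in incidents])


def mtbf(incidents: List[Incident], start: datetime, end: datetime) -> timedelta:
    """Mean time between failures: uptime in the window / number of failures."""
    downtime = sum((i.resolved - i.opened for i in incidents), timedelta())
    return (end - start - downtime) / len(incidents)


# Made-up sample data: two incidents in a 30-day window.
window_start = datetime(2021, 1, 1)
window_end = datetime(2021, 1, 31)
incidents = [
    Incident(datetime(2021, 1, 5, 10, 0),
             datetime(2021, 1, 5, 10, 10),
             datetime(2021, 1, 5, 12, 0)),
    Incident(datetime(2021, 1, 20, 8, 0),
             datetime(2021, 1, 20, 8, 20),
             datetime(2021, 1, 20, 9, 0)),
]

print("MTTA:", mtta(incidents))                            # 0:15:00
print("MTTR:", mttr(incidents))                            # 1:30:00
print("MTBF:", mtbf(incidents, window_start, window_end))  # 14 days, 22:30:00
```

Tracking these numbers over successive windows, rather than as one-off values, is what lets you set improvement goals and see whether an AIOps rollout is actually moving them.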
The ideas, opinions and research presented in this article are my personal views on this subject. Please stay tuned to read more about these success factors and use cases. Leave a comment to share your thoughts and join the conversation.
Thanks for sharing your thoughts. Very good general write-up of the AIOps capabilities. As a large company in the middle of a transformation journey, I fully agree on the massive improvements AIOps can bring to an operational organisation.
AIOps integrated into DevOps continuous delivery can help improve agility https://www.garudax.id/pulse/devops-next-frontier-its-pursuit-agility-maneesh-goyal-1e
Part 3 of the series focuses on the AIOps use case of faster root cause analysis, including the challenges addressed, key success factors and metrics to measure success. Here is the link to the article if you missed it: https://www.garudax.id/pulse/analyze-root-cause-business-impact-faster-maneesh-goyal
Kim Letkeman Thanks, and I hope that you found it useful. Are you still writing a lot? I still remember your passion for writing.
Part 2 of this series focuses on the AIOps use case of identifying, predicting and isolating problems faster. This is the key use case, as it prepares the organization to leverage AI to identify problems and incidents before they happen, a key factor for Autonomous Operations. https://www.garudax.id/pulse/aiops-identify-predict-isolate-problems-faster-maneesh-goyal/