Splunk IT Service Intelligence

One of the Engineering Teams I manage supports Splunk. Prior to starting my job a few years ago I had no exposure to Splunk at all. As always, I wanted to learn more about the tools I manage, so I started reading and asking questions. Our primary objectives in using Splunk were to retain security logs for FISMA compliance and to feed our Security Operations Center (SOC) Splunk Enterprise Security (ES) Application.

Splunk Iceberg 

I soon realized that most of Splunk's potential was untapped. Like an iceberg, most of this massive monster remained hidden below the waterline. The FISMA compliance and SOC requirements were only the tip of what Splunk could do.

I attended many root cause analysis (RCA) meetings where various silos of information were tapped in order to identify potential problems. I quickly noticed the amount of time and resources involved in hunting down the culprit. Each team followed similar investigative processes but used different tools. What's more, some teams did not have access to all of the tools other teams had. 

I began to ask why nobody was using Splunk to correlate machine data in order to view a much larger pool of information. The answer was that nobody knew how to do it and that the Splunk search interface was useless. I chalked it up to a lack of training, something nobody had time for. Splunk was powerful and useful; we just were not taking advantage of it. I was determined to solve this problem.

Root Cause Investigations 

I started asking my Splunk Engineers to assist with RCAs. I wanted to see if Splunk knowledge could make a difference in RCAs. I found that the Splunk Engineers could, without specific service knowledge, find RCA information much faster than the Engineers who supported that service. In fact, they were able to find root causes when these other Engineers could not. They did all of this using only the information already in Splunk for FISMA and SOC requirements, none of which had been collected for the purpose of RCAs.

I knew that my Splunk Engineers could not research every RCA and that the other Engineers would not learn Splunk well enough to do it themselves. I also knew there was another option: we could build a Splunk App for RCAs.

My RCA Tool Idea 

My idea was to have an App where an Engineer could start looking at data from various sources in a single dashboard. Instead of using various other tools to look at information, we could present it to them without worrying about specific data access issues or the time required to collect it all. 

My Splunk Application requirements: consolidate information from the various tools used by various services; provide uniform access through Splunk to all Engineers; compose a single dashboard that visualizes the collected information together to provide an intelligent view around the event time; and allow for building correlated searches to identify potential problems based on past experience and service dependencies.

Here is the exact mock interface I used as an example of what I wanted. At the time, I did not even know its images were from the ITSI development program.

I presented my requirements to my Splunk Team, who assured me it could be done. I built a mock interface as a visual example of what I wanted and began to socialize my idea. It was very well received and I started receiving more input from other services.

Splunk Solution – IT Service Intelligence (ITSI) 

My Senior Splunk Engineer informed me about a new Premium Splunk Application that seemed to do everything we wanted and much more. It was called IT Service Intelligence (ITSI). It was not available for purchase yet but we could evaluate it. 

My experience with ITSI changed my use case for it. It quickly became obvious that we could use it to proactively monitor services in addition to supporting reactive investigations. I liked how ITSI viewed the environment as a collection of dependent components that compose a service. Instead of the traditional approach of looking at each component individually, ITSI provided a service-level view where all components are monitored as a composite service.

It may not matter much that one component in the service offering is underperforming, but it may matter when several components, or specific ones, are degraded. Individually, these degradations may not trigger any alarms, but together they may amount to a serious degradation of the service. Additionally, a component can degrade because another component it depends on has degraded. That other component may not even trigger an alarm, yet it can set off a cascade effect that degrades the service.

ITSI solves this problem by allowing you to define a service composed of multiple KPIs from these dependent components. This allows you to see the health of the service as a whole and the dependent component KPIs contributing to the health score. 
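To make the idea concrete, here is a minimal Python sketch of a composite health score. It is not how ITSI calculates its health score; the KPI names, the 0 to 100 scale, the importance weights, and both thresholds are assumptions chosen for illustration. It only shows how several KPIs that each stay above their own alarm threshold can still add up to a degraded service.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    score: float     # hypothetical scale: 0 (critical) to 100 (healthy)
    importance: int  # hypothetical relative weight within the service


def service_health(kpis):
    """Importance-weighted average of the KPI scores."""
    total_weight = sum(k.importance for k in kpis)
    if total_weight == 0:
        raise ValueError("at least one KPI needs a non-zero importance")
    return sum(k.score * k.importance for k in kpis) / total_weight


def is_degraded(kpis, service_threshold=80.0):
    """True when the composite score falls below the service threshold."""
    return service_health(kpis) < service_threshold


# Every KPI is above an assumed per-component alarm threshold of 70,
# so no single component would raise an alarm on its own.
kpis = [
    Kpi("web_response_time", 75.0, 5),
    Kpi("db_query_latency", 72.0, 5),
    Kpi("queue_depth", 78.0, 3),
]

print(round(service_health(kpis), 1), is_degraded(kpis))
```

In this example none of the three components would alert individually, but the composite score falls below the assumed service threshold of 80, which is the kind of situation a service-level view is meant to surface.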

ITSI is completely customizable in how you define the service and its dependencies. It also has anomaly detection, which can spot irregular behavior in a KPI and generate a notable event. This may seem problematic if you perform weekly patching on specific nights and drive KPIs beyond their normal behavior, but ITSI can be configured to account for this and prevent false alarms.
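The sketch below illustrates the general idea of flagging irregular KPI values while ignoring a known patching window. It is an assumption-laden illustration, not ITSI's anomaly detection: it uses a simple z-score against a baseline, and the function name, sample data, maintenance hours, and the limit of 3.0 are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(baseline, samples, maintenance_hours=frozenset(), z_limit=3.0):
    """Return (hour, value) pairs that deviate strongly from the baseline.

    Samples that fall inside a maintenance hour are skipped, so planned
    patching does not generate alerts.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    anomalies = []
    for hour, value in samples:
        if hour in maintenance_hours:
            continue  # planned work, suppress
        if sigma > 0 and abs(value - mu) / sigma > z_limit:
            anomalies.append((hour, value))
    return anomalies


# Hypothetical CPU utilization history and a night of new samples.
baseline = [40, 42, 38, 41, 39, 40, 43, 37]
samples = [(1, 41), (2, 95), (3, 90), (14, 88)]

# Patching runs during hours 2 and 3, so only the 14:00 spike is reported.
print(find_anomalies(baseline, samples, maintenance_hours={2, 3}))
```

Without the maintenance window, the two patching spikes would be reported alongside the genuine afternoon spike, which is exactly the false alarm the configuration is meant to avoid.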

ITSI Use Case 

Back to my original objective before diving into some of the cool things in ITSI. The Deep Dive feature in ITSI suits my needs for RCAs. I can have all the relevant information graphed out in one view. Not only can I look at what happened during an outage, but I can also compare each graph to a historical period in time. That way I can see how everything normally looks compared with how things looked in the time leading up to the outage. I can drill down into the timelines to uncover the log events of any component.
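The comparison against a historical period can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the Deep Dive feature itself; the KPI names and values are made up, and ranking by relative change in the window average is my own simplifying assumption.

```python
from statistics import mean


def compare_to_baseline(outage, baseline):
    """Rank KPIs by how far the outage window average moved from normal."""
    changes = {}
    for name, values in outage.items():
        base = mean(baseline[name])
        if base == 0:
            continue  # relative change is undefined for a zero baseline
        changes[name] = (mean(values) - base) / base
    return sorted(changes.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical KPI samples: the outage window and the same window a week earlier.
outage = {
    "cpu_pct": [80, 90, 85],
    "disk_latency_ms": [30, 50, 40],
    "logins_per_min": [100, 98, 102],
}
baseline = {
    "cpu_pct": [40, 45, 35],
    "disk_latency_ms": [5, 5, 5],
    "logins_per_min": [100, 100, 100],
}

for name, change in compare_to_baseline(outage, baseline):
    print(f"{name}: {change:+.0%}")
```

The KPI that moved furthest from its normal level appears first, which suggests where to start drilling down into the underlying log events.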

Conclusion 

ITSI is another tool in the Splunk tool-shed allowing you to index data from any source and derive intelligence. While each vendor offers their own product management and health visualization that administrators should continue to use, Splunk's power is the ability to aggregate that data and make it available to a wider user base. 

If you are already using Splunk, ITSI is worth a look because you are probably already indexing a lot of valuable data that ITSI uses. If you are not using Splunk at all, ITSI is a good reason to start. 

This is my unbiased opinion based on my experience and admiration for Splunk. 
