Getting Started with Splunk
Splunk Enterprise: Bringing Data-to-everything


As technologies evolve, new tools emerge, and it becomes imperative to learn them to stay up to date. I recently came across one such tool: Splunk, a data-to-everything platform.

Machine Data

Roughly 90% of the data generated by organisations is machine data, and it is mostly unstructured: log files, digital exhaust, and time-series sensor data from IoT devices. It is usually the most underused and undervalued data, yet it can provide some of the most important insights. Splunk uses machine data to address an organisation’s IT operations, big data, security, and analytics needs.

Search Processing Language

I got started by learning the Splunk platform and its language, the Search Processing Language (SPL). Splunk uses SPL to transform massive amounts of unstructured machine data into time-series events that can answer business and operational questions in real time.

Benefits of Splunk

The capabilities of the platform are staggering, and it truly brings data to every question, decision, and action. SPL combines the best capabilities of SQL with Unix pipeline syntax, allowing us to access data in its original raw, unstructured format, optimise it as time-series data, and use the same language to visualise it. SPL lets us search (almost literally finding a needle in a haystack), correlate, analyse, and visualise any data.

Another refreshing feature is the ability to use machine learning and anomaly detection inside the search language itself, without needing a separate tool or software to bring in these much-needed capabilities, unlike most other platforms.

Features of Splunk Enterprise

There are many areas where Splunk Enterprise knocks other such tools out of the park. Its main features are:

1.   Indexing: Splunk indexes any data that is fed into it. Indexing allows Splunk to store and retrieve data much faster and more efficiently.

2.   Search: The most important part of the Splunk platform. The searching tool can literally find a needle in a haystack using minimal commands.

3.   Alerts: One of the most important features of Splunk is to provide real-time alerts when certain events occur. This helps in detecting anomalies as soon as they occur.

4.   Dashboards: Dashboards contain customizable panels that display the results of searches, including real-time searches, in visually appealing ways.

5.   Pivot: Pivots refer to tables, charts, and visualisations. They can be saved as reports, and instant pivots are available for users who are not experienced in SPL.

6.   Reports: We can save searches and pivots as Reports to enable easy sharing, add them to dashboards for visualisations and schedule reports to run at specified times.

7.   Data Model: Data models encode specialized domain knowledge about one or more sets of indexed data. They enable Pivot Editor users to create reports and dashboards without designing the searches that generate them.

Splunk 7.x Fundamentals Part 1

I just completed the Splunk 7.x Fundamentals Part 1 course. The course is well designed, with bite-sized modules, quizzes at the end of each module, and hands-on labs on the Splunk trial software provided with this free course. The course starts by introducing Splunk and its components.


Components

The Splunk platform is based on three components:

1.   Forwarders: Used to forward data from source machines to the indexers; they are the primary source of data. They require minimal resources and have little impact on performance.

2.   Indexers: Index the data to enable faster, more efficient searching. Indexing allows us to retrieve data faster, limit access to data, and apply multiple data retention policies.

3.   Search Heads: Used to query the data; a search head distributes user queries to the indexers and consolidates their results. It also provides tools to visualise this data with reports, pivots, and dashboards.

Applications

The Splunk platform gives us access to numerous Splunk apps that help us meet our analytics demands, opening Splunk up to a wide variety of use cases and extending its power. 1000+ apps are available on Splunkbase, and we can even add our own.

Users and Roles

Splunk has a concept of users and roles to limit access to data. There are three main roles: admin, power, and user. Splunk admins can create more roles as they see fit by granting varying permissions.

Ingesting Data

Splunk provides three main options to ingest data: the upload option, monitoring, and forwarders.

Splunk automatically determines the source type for major data types; however, we can also create our own source types.

Searching

SPL is built around a search pipeline: commands are delimited by the pipe character ( | ), which feeds the results of one command into the next, chaining SPL commands together.

For example, a search for 403 errors outputs the events in which they occurred; those results can then be piped to a stats function that works on that input.
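The original screenshot of that command is not reproduced here, but a search along those lines might look like the following sketch (the index, sourcetype, and field names are illustrative assumptions):

```
index=web sourcetype=access_combined status=403
| stats count by clientip
```

The first clause retrieves every event with a 403 status; the stats command then counts those events per client IP.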

Hovering the mouse over results allows us to add more constraints to the search or remove them. We can select the time range of our results using the time-range picker or specify it directly in the SPL command for easier referencing.

The timeline shows a distribution of results in the specified time range.


We can narrow the time range or concentrate on a particular interval simply by selecting it from this timeline with the mouse.


Each such search is a job, and we can share these searches with a single click or export the results as CSV, XML, or JSON files.

The fields sidebar shows the different fields found in the events, and we can control which fields are displayed for each result by adding them to or removing them from the selected fields.

We can even use wildcards in our searches, matching words like “admin”, “administrator”, and “administrators” with simply “admin*”.

We can use operators like NOT to make our searches better suited to our needs, and with numeric fields we can use the comparison operators greater than or equal to (>=), less than or equal to (<=), greater than (>), and less than (<).
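Combining these, a sketch of such a search might look like this (the field names are assumptions for illustration):

```
index=web status>=500 NOT user=admin*
```

This returns only server-error events (status 500 and above) that were not generated by any admin account.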

Splunk also has three search modes: Fast, which favours speed over completeness; Verbose, which favours completeness over speed; and Smart, which balances the two.

Search Language Syntax Components

Searches are made up of 5 components:

1.   Search Terms: To specify what we are searching for.

2.   Commands: To specify what we want to do with the results.

3.   Functions: To specify how we want to execute this command.

4.   Arguments: To specify the variables on which the function will work.

5.   Clauses: To specify how we want to group the results, or to rename them.

At each stage of the pipeline, between each pair of pipes, a table of results is created and passed on to the next command.
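Putting the five components together, a single pipeline might look like this sketch (the field names are illustrative):

```
sourcetype=access_combined | stats sum(bytes) as total_bytes by host
```

Here sourcetype=access_combined is the search term, stats is the command, sum is the function, bytes is its argument, and as total_bytes and by host are clauses.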

Tables

The table command allows us to build a table of results from specific fields selected from a search, and we can give the column headers custom names.


Specifying fields in the search improves search speed, because only those fields are extracted.
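A tabulating search along these lines (with hypothetical field names) could be:

```
index=web status=404
| table clientip, uri, status
| rename clientip as "Client IP", uri as "Requested Page"
```

The rename command is one way to give the column headers custom names.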

Dedup, Sort, Top and Rare

The dedup command removes events with duplicate values of the selected fields from the results.

The sort command sorts the results by the selected fields.

The top command returns the most common values of the selected fields. It also adds two new columns, a count of events and the corresponding percentage, both of which can be toggled off.

The rare command returns the least common values of the selected fields, with attributes similar to the top command.

We can group results using the by clause and limit the number of results returned.
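Illustrative one-liners for each of these commands (the field names are assumed):

```
index=web | dedup clientip
index=web | sort -bytes
index=web | top limit=5 uri by host
index=web | rare useragent
```

The minus sign in the sort command sorts in descending order, and limit=5 keeps only the five most common values per group.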


Stats Command

The stats command allows us to calculate statistics on our data, such as a count, distinct count, sum, average, set of unique values, or list of all values.
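For example, a sketch combining several of these statistical functions (field names assumed):

```
index=web
| stats count, dc(clientip) as unique_clients, avg(bytes) as avg_bytes by host
```

This produces one row per host with the event count, the number of distinct client IPs, and the average bytes transferred.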


Reports and Dashboards

One of the most important aspects of Splunk is that searching events, computing statistics on them, visualising the results, and creating search reports and dashboard panels can all be done in one command pipeline using SPL. Its capabilities have been extended to cover all these aspects together, forming an end-to-end product that applies data to everything.

A report is a saved search that generates fresh results each time it is run. Reports can be shared and added to dashboards as visual panels. We can easily edit the searches in these reports and, most importantly, schedule them to run at defined times.

A dashboard consists of one or more panels of visual data that give us instant insight into our data. Dashboards can also be printed and exported as PDFs, making Splunk a really handy tool for all things analytics.

Pivots and Datasets

Splunk also allows users who do not have experience with SPL to work with tables and visualizations using Pivots.

Pivots provide an easy-to-use interface for creating tables using filters (similar to including and excluding fields) and split columns (similar to a group-by clause). These tables can be instantly visualised using the Pivot UI.

Pivots can be saved as reports and displayed in the dashboards, with time-range pickers.

Instant pivots let us create pivots without first building a data model for them to work on.

Lookups

Splunk allows us to use lookups to pull data from standalone files that contain information useful to our event searches, and to join the results as we see fit. We simply upload the lookup file, create a lookup definition specifying the kind of matching we need, such as whether text matches should be case-sensitive, and get started.


We can create an automatic lookup by specifying our lookup table and choosing the input fields for matching and the output fields to add. This lets us use lookup fields without referencing the lookup file in our SPL command every time.
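An explicit lookup in a search might look like this sketch (the lookup name and field names are hypothetical):

```
index=web
| lookup http_status_codes status OUTPUT status_description
| stats count by status_description
```

The lookup command matches each event’s status field against the lookup table and adds the corresponding status_description; with an automatic lookup configured, the explicit lookup stage would be unnecessary.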

Scheduled Reports and Alerts

We can save a search as a report and schedule it to run at pre-defined times or continuously in real time. On each run we can trigger actions such as saving the results as a CSV, logging events, outputting results to a telemetry endpoint, running a script, sending an email, or calling a webhook. Similarly, alerts can trigger these actions when defined trigger conditions are met by searches running on a schedule or in real time.

Next Steps

As a next step, I am exploring the Splunk Machine Learning Toolkit (MLTK), which lets us apply machine learning from within Splunk Enterprise itself, using SPL in the search commands. The MLTK app ships with 300 open-source Python algorithms from scikit-learn, pandas, statsmodels, NumPy, and SciPy. We can add our own algorithms via GitHub and work entirely within the Splunk environment. Splunk also offers 1000+ apps and add-ons hosted on its Splunkbase platform.

All these features extend Splunk from being merely a search app to an end-to-end product that brings data to everything. The future prospects of Splunk look bright, and it seems like a good time to learn this technology. As of May 2019, 90 of the Fortune 100 companies used Splunk.

"The Splunk approach to data at scale is to ingest data from anywhere and be able to ask it any question at any time. Splunk aims to make machine data accessible, usable and valuable to everyone."
