Design Considerations for Splunk

Having used Splunk for a while now, I thought I'd jot down some of the gems I've learnt along the way. I probably won't be using Splunk much, if at all, in my forthcoming new job :(

This is mostly about what you need to consider and ask when designing any Splunk solution. Lots of it is common sense and about covering all the bases, but it mainly falls into two camps: data provision/management and the sexier reporting/analytics!

I present my Splunk butterfly!

The Data Questions

Get this bit wrong, or don't pay it enough attention, and you're in for a world of pain! Most of my target end audience are non-technical or don't want the detail. They just want the numbers and sexy graphs.

This is at odds with what you need to deploy a long-lived service with minimal maintenance and disruption.

TIP NUMBER 1: Try to get sensible data guesstimates for the amount of data that will be ingested.

This is imperative for licensing and storage. As Splunk is licensed per GB per day indexed, you need a pretty good initial idea of the volumes you're looking at.
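To make that concrete, here's a back-of-an-envelope sizing sketch. The numbers are purely illustrative, and the ~50% storage figure is a commonly quoted Splunk rule of thumb (compressed raw data plus index files), not a guarantee:

    500 hosts x ~20 MB/day each    = ~10 GB/day to license
    10 GB/day x 90 days retained   = ~900 GB of raw data
    ~900 GB x ~50% (compressed raw + index files) = ~450 GB of disk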

TIP NUMBER 2: Use a heavy forwarder to front up data ingestion.

As long as you only forward data and don't index or store it, the forwarder license is free. By using a heavy forwarder, you can strip out redundant data before you send it across to an indexer. I've used this several times now, where we were only interested in some of the data in certain logs, for a certain purpose.
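As a minimal sketch of how that stripping works, Splunk's documented nullQueue routing lets a heavy forwarder discard events before they ever reach an indexer. The sourcetype, stanza name and regex below are made up for illustration:

    # props.conf (on the heavy forwarder)
    [my:applog]
    TRANSFORMS-drop_debug = drop_debug_events

    # transforms.conf
    [drop_debug_events]
    REGEX = \sDEBUG\s
    DEST_KEY = queue
    FORMAT = nullQueue

Anything matching the regex gets routed to the null queue and is never indexed, so it never counts against your license.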

This may not be important to you, or you may not have any control over it, but you should pay some consideration to where you keep the data and where it's coming from. You don't want to be sending hundreds of GBs over a WAN if you can help it. Store it locally, avoid duplication where possible, and use search heads for distributed multi-site capability.

TIP NUMBER 3: Set a data retention policy and do some form of data management.

You can set up sensible per-index defaults using indexes.conf, with either size or ageing limits. Do you really need to keep ALL that data forever? You'll be asked for it, but in my experience "near real-time" means 2-3 months at most. You might also have a legal or contractual obligation to adhere to. Together, these make for a sensible retention policy!
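As a sketch, a per-index stanza along these lines (the index name and limits are illustrative) caps data by age and by size, whichever bites first:

    # indexes.conf
    [my_app_index]
    homePath   = $SPLUNK_DB/my_app_index/db
    coldPath   = $SPLUNK_DB/my_app_index/colddb
    thawedPath = $SPLUNK_DB/my_app_index/thaweddb
    # roll buckets to frozen (deleted by default) after ~90 days
    frozenTimePeriodInSecs = 7776000
    # ...or once the whole index exceeds ~100 GB
    maxTotalDataSizeMB = 102400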

I don't rely on the source systems to re-send the data. I don't trust them. They are rarely sized to keep large amounts of logs, and they often only do logging because they have to.

I use logrotate on my heavy forwarders to manage the raw data logs. It's an excellent little data management utility; check it out. If you ever have to re-ingest data into Splunk, you can use something like the splunk add oneshot CLI command to get back to where you want to be. It's saved my bacon several times!
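For illustration, a rotation policy and a replay command might look like this (the paths, index and sourcetype are hypothetical):

    # /etc/logrotate.d/myfeed
    /var/log/myfeed/*.log {
        daily
        rotate 90
        compress
        delaycompress
        missingok
        notifempty
    }

    # re-ingest a rotated file into Splunk if you ever need to replay it
    $SPLUNK_HOME/bin/splunk add oneshot /var/log/myfeed/app.log.1 \
        -index my_app_index -sourcetype my:applog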

The Infrastructure Questions

I started out running Splunk on small/medium-sized RHEL VMs. Splunk uses a lot of CPU and disk I/O, so make sure it gets enough of both, else performance will suffer.

If you're ingesting GBs of real-time data, then go physical, with fast local [SSD] disks backing off to 'as fast as you can afford' SAN for the rest. You can then arrange your buckets to take advantage of this, letting data move to slower spindles as it ages. Also expect to have to tune the OS to keep up (disabling Linux transparent huge pages, for example, as Splunk recommends).
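One sketch of that bucket arrangement uses indexes.conf volumes (the paths and sizes here are illustrative; note that thawedPath can't use a volume reference):

    # indexes.conf
    [volume:fast_ssd]
    path = /splunk/hot
    maxVolumeDataSizeMB = 500000

    [volume:slow_san]
    path = /san/splunk/cold

    [my_app_index]
    homePath   = volume:fast_ssd/my_app_index/db
    coldPath   = volume:slow_san/my_app_index/colddb
    thawedPath = /san/splunk/thawed/my_app_index/thaweddb

Hot/warm buckets live on the SSDs; as buckets roll to cold, they migrate to the SAN automatically.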

TIP NUMBER 4: You will scale out, so build that into the designs from day one.

As soon as you have a number of indexers, put a search head or two across the top of them. They are really easy to set up and use. This is especially true if you need to report on data across physical sites, which would otherwise be managed independently.
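Pointing a search head at its indexers is only a few lines of distsearch.conf (the peer names here are made up); you'll still need to exchange credentials with each peer, which the splunk add search-server CLI command or the UI handles for you:

    # distsearch.conf on the search head
    [distributedSearch]
    servers = idx-siteA.example.com:8089,idx-siteB.example.com:8089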

Use license pools to manage your license capacity. You can assign 'slices' of your overall license amount to individual indexers. It's a nice way to manage licensing and protect your design and investment!
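You can create pools through the license master UI, or script it against the licenser REST endpoint. A hedged sketch, where the host, credentials, pool name and quota (in bytes) are all placeholders:

    # carve a ~5 GB/day pool out of the 'enterprise' stack
    curl -k -u admin:changeme \
        https://license-master:8089/services/licenser/pools \
        -d name=web_team_pool \
        -d quota=5368709120 \
        -d stack_id=enterprise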

The Analytics Piece

TIP NUMBER 5: Invest some time in learning the SPL, reporting and dashboarding.

Especially if you want the ear of senior management! Telling them how many widgets you've sold in the last week is good; showing those sales split across segments, in real time, on a Google Map works wonders!
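For example, a couple of SPL searches along these lines (the index, sourcetype and field names are invented) would drive exactly that kind of dashboard panel:

    index=sales sourcetype=orders earliest=-7d
    | timechart span=1h count by segment

    index=sales sourcetype=orders earliest=-7d
    | geostats latfield=lat longfield=lon count by segment

The first feeds a trend chart split by segment; the second feeds a map panel.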

Once you have useful data in Splunk and start drilling in, the end user questions will never stop. The "what does", "what if", "can you tell me about" scenarios will flow. This shows that you've done something useful, as that insight probably didn't exist until now. You have the power, learn to use it!

TIP NUMBER 6: Lookup, down and all around.

Once you've used a lookup table in Splunk, like air con in a car, there's no going back. Enriching your data with new fields derived from existing ones is pure magic! For instance, I've created physical site details based on hostnames or data filenames; I can then search across 'sites' and across data types. I've also added brand details to plain telephone number fields, to satisfy the needs of the red pen pushers.
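A quick sketch of the pattern (the lookup file and field names are hypothetical; the CSV would have host, site and brand columns):

    index=telephony sourcetype=cdr
    | lookup sites.csv host OUTPUT site brand
    | stats count by site, brand

One small lookup file, and suddenly every search can slice by site or brand.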

TIP NUMBER 7: Teach people to self serve.

Pass on your knowledge of the SPL, reports, dashboards and so on. People around you will soon pick up the pieces and start delving and developing themselves. The end-user front-end capabilities of Splunk are what set it apart from the rest, and there's a reason for that: it just works, and it has rich functionality out of the box!

Well, there you have it. That's probably enough to be going on with. If you have any useful gems yourself, please do leave them at the comments door, for all to explore!

