Design Considerations for Splunk
Having used Splunk for a while now, I thought I'd jot down some of the gems I've learnt along the way. I probably won't be using Splunk much, if at all, in my forthcoming new job :(
This is mostly about what you need to consider and ask when designing any Splunk solution. Lots of this is common sense and about covering all the bases, but it mainly falls into two camps: data provision/management and the sexier reporting/analytics!
I present my Splunk butterfly!
The Data Questions
Get this bit wrong, or don't pay enough attention to it, and you're in for a world of pain! Most of my target end audience are non-technical or don't want the detail. They just want the numbers and the sexy graphs.
This is in juxtaposition to what you need to deploy a long-lived service with minimal maintenance and disruption.
TIP NUMBER 1: Try to get sensible data guesstimates for the amount of data that will be ingested.
This is imperative for licensing and storage. As Splunk is licensed per GB of data indexed per day, you need a pretty good initial idea of what volumes you're looking at.
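Once data is actually flowing, you can sanity-check your guesstimates against reality. Splunk records its own license consumption in the `_internal` index, so a search along these lines (the span and field names are just how I'd sketch it) charts daily indexed volume in GB:

```
index=_internal source=*license_usage.log* type="Usage"
| eval GB = b/1024/1024/1024
| timechart span=1d sum(GB) AS daily_GB
```

Run that over a few weeks and you'll have real numbers to take into any licensing conversation.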
TIP NUMBER 2: Use a heavy forwarder to front up data ingestion.
As long as you just forward and not store data, the forwarder license is free. By using a heavy forwarder, you can strip out redundant data before you send it across to an indexer. I've used this several times now, as we're only interested in some data in certain logs, for a certain purpose.
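As a sketch of that stripping on the heavy forwarder (the sourcetype name and regex here are hypothetical), you can route unwanted events to Splunk's nullQueue before they ever count against your license:

```
# props.conf
[my_app_logs]
TRANSFORMS-null = drop_debug

# transforms.conf
[drop_debug]
# discard any event matching this pattern
REGEX = level=DEBUG
DEST_KEY = queue
FORMAT = nullQueue
```

Events matching the regex are dropped at the forwarder; everything else travels on to the indexer as normal.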
This may not be important to you, or you may not have any control over it, but you should give some consideration to where you keep the data and where it's coming from. You don't want to be sending hundreds of GBs over a WAN if you can help it. Store it locally, avoid duplication where possible, and use search heads for distributed multi-site capabilities.
TIP NUMBER 3: Set a data retention policy and do some form of data management.
You can set up sensible per-index defaults using indexes.conf, with either size or ageing limits. Do you really need to keep ALL that data forever? You'll be asked for it, but in my experience 'near real-time' means 2-3 months at most. You might also have a legal or contractual obligation to adhere to. Together, these make for a sensible retention policy!
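For example, a per-index stanza in indexes.conf can cap an index at roughly 90 days or a fixed size, whichever bites first (the index name and numbers are purely illustrative):

```
# indexes.conf
[web_logs]
# roll buckets to frozen (deleted by default) after ~90 days
frozenTimePeriodInSecs = 7776000
# or cap the whole index at ~500 GB
maxTotalDataSizeMB = 512000
```

Whichever limit is hit first wins, so set both deliberately rather than leaving the defaults.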
I don't rely on the source systems sending over the data. I don't trust them: they are rarely sized to keep large amounts of logs, and they only do logging because they have to.
I use logrotate on my heavy forwarders to manage the raw data logs. Excellent little data management utility; check it out. If you have to re-ingest data into Splunk, you can use something like the splunk add oneshot command to get back to where you want to be. Saved my bacon several times!
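A minimal logrotate stanza for the raw logs on the forwarder might look like this (the path and retention are just examples):

```
# /etc/logrotate.d/myapp (hypothetical path)
/var/log/myapp/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
}
```

And if you do need to replay a rotated file back into Splunk, a oneshot along these lines (index and sourcetype names are illustrative) does the trick:

```
splunk add oneshot /var/log/myapp/app.log.1 -index web_logs -sourcetype my_app_logs
```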
The Infrastructure Questions
I started running Splunk on simple small/medium sized RHEL VMs. Splunk uses a lot of CPU and disk I/O, so make sure it gets enough, else performance will suffer.
If you're ingesting GBs of real-time data, then go physical with fast local [SSD] disks, backing off to 'as fast as you can' SAN for the rest. You can then arrange your buckets to take advantage of this, using slower spindles as the data ages. Also expect to have to tune your OS to keep up (disabling Linux transparent huge pages, for example).
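You steer that bucket tiering with the storage paths in indexes.conf: keep hot/warm buckets on the fast local disk and let cold buckets age off to the SAN (the paths here are illustrative):

```
# indexes.conf
[web_logs]
# hot/warm buckets on fast local SSD
homePath = /ssd/splunk/web_logs/db
# cold buckets on slower SAN storage
coldPath = /san/splunk/web_logs/colddb
thawedPath = /san/splunk/web_logs/thaweddb
```

As buckets roll from warm to cold, Splunk moves them from homePath to coldPath for you, so recent (most-searched) data stays on the fastest disk.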
TIP NUMBER 4: You will scale out, so build that into the designs from day one.
As soon as you have a number of indexers, put a search head or two across the top of them. They are really easy to set up and use. This is especially true if you need to report on data across physical sites, which would otherwise be managed independently.
Use license pools to manage your license capacities. You can assign 'slices' of your overall license amount to individual indexers. It's a nice way to manage licensing and protect your design and investment!
The Analytics Piece
TIP NUMBER 5: Invest some time in learning the SPL, reporting and dashboarding.
Especially if you want the ear of senior management! Tell them how many widgets you've sold in the last week, but showing it split across segments in real time on a Google Map works wonders!
Once you have useful data in Splunk and start drilling in, the end user questions will never stop. The "what does", "what if", "can you tell me about" scenarios will flow. This shows that you've done something useful, as that insight probably didn't exist until now. You have the power, learn to use it!
TIP NUMBER 6: Lookup, down and all around.
Once you've used a lookup table in Splunk, like air con in a car, there's no going back. Enriching your data with new fields derived from existing ones is pure magic! For instance, I've created physical site details based on hostnames or data filenames; I can then search across 'Sites' and across data types. I've added brand details to simple telephone number fields to satisfy the needs of the red pen pushers.
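As a sketch of the idea (the lookup name and field names are hypothetical), a CSV lookup mapping hostnames to sites lets you enrich events at search time and then report by site across any data type:

```
index=web_logs
| lookup site_lookup host OUTPUT site brand
| stats count BY site
```

The lookup itself is just a CSV uploaded via Settings > Lookups, with a `host` column plus whatever enrichment columns you want, such as `site` and `brand`.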
TIP NUMBER 7: Teach people to self serve.
Pass on your knowledge of SPL, reports, dashboards etc. People around you will soon pick up the pieces and start delving and developing themselves. The end-user front-end capabilities of Splunk are what set it apart from the rest, and there's a reason for that: it just works, and has rich functionality out of the box!
Well, there you have it. That's probably enough to be going on with. If you have any useful gems yourself, please do leave them at the comments door for all to explore!
I know this comment is delayed, but thank you for writing and sharing this. These are very good thoughts to ponder early in the design phase.
I'm approaching our first deployment. Thanks for the heads up Phil. Capacity and performance will always be at the back of my mind!
If only I'd had this while I was working on Splunk!
Excellent article Phil. Thank you!
great article Phil, as ever written with real insight