Some random tips for Elasticsearch

I've been running Graylog and Elasticsearch for several years now. Here are a few random tips about things I've learned the hard way over the years:

1) Make sure your shards fit in memory. I have 16 GB of memory on my Elasticsearch instances, so I cap my shards at around 10 GB in size.

2) Make sure your hardware is *identical*. This makes it easy to spawn new Elasticsearch instances via Puppet or whatever, and it keeps performance consistent: if one node is slower than the rest, your cluster will inexplicably slow down whenever your data ingester starts writing to that node.

3) Accept that you're going to have to restart the cluster from time to time, whether because a critical mass of nodes crashed or because of a software update, and create some automated method to do so (such as a script on the Puppet server that has SSH keys set up for each Elasticsearch instance and does a bulk ssh to each to run 'service elasticsearch start'). Elasticsearch occasionally decides it's not going to behave; that's just the nature of the beast. I don't know why, but what is, is. Also, when you upgrade Elasticsearch, in some cases you can do rolling upgrades via Puppet, but the only reliable way I've found is to update the software on each node and then do a bulk restart. Elasticsearch otherwise tends to lose its cookies.

4) Assume that Elasticsearch has imperfect availability. Your ingestion mechanism must have some way to cache data destined for Elasticsearch until the cluster is available again. This is why Graylog has a caching mechanism that buffers incoming data until Elasticsearch comes back.

5) Monitor, monitor, monitor! My Elasticsearch cluster went down this morning when two-thirds of its nodes decided they weren't gonna march no more. Nagios promptly notified me, and one software update (to fix the issue that caused the crash) and one cluster restart later, it was up and going again.
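
To make tip 1 concrete, one quick way to audit shard sizes is to parse the output of Elasticsearch's `_cat/shards` API. Here's a minimal Python sketch; the column layout assumed (index, shard, prirep, state, docs, store, ip, node) is the default for `_cat/shards`, and the sample output and 10 GB threshold are illustrative:

```python
# Flag shards whose on-disk size exceeds a threshold, given the
# whitespace-separated output of GET _cat/shards.

UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}

def parse_size(store: str) -> float:
    """Convert a _cat size string like '9.8gb' to bytes."""
    for suffix in ("tb", "gb", "mb", "kb", "b"):
        if store.endswith(suffix):
            return float(store[: -len(suffix)]) * UNITS[suffix]
    raise ValueError(f"unrecognized size: {store}")

def oversized_shards(cat_shards_output: str, limit_gb: float = 10.0):
    """Return (index, shard, size_gb) tuples for shards over limit_gb."""
    flagged = []
    for line in cat_shards_output.strip().splitlines():
        fields = line.split()
        # Assumed default columns: index shard prirep state docs store ip node
        if len(fields) < 6 or fields[3] != "STARTED":
            continue
        index, shard, store = fields[0], fields[1], fields[5]
        size_gb = parse_size(store) / 1024**3
        if size_gb > limit_gb:
            flagged.append((index, shard, size_gb))
    return flagged

# Illustrative sample output, not real cluster data:
sample = """\
graylog_42 0 p STARTED 9000000 11.2gb 10.0.0.1 es-node-1
graylog_42 0 r STARTED 9000000 11.2gb 10.0.0.2 es-node-2
graylog_43 0 p STARTED 4000000 6.1gb  10.0.0.3 es-node-3
"""
print(oversized_shards(sample))
```

Wiring this into a cron job or Nagios check gives you early warning before a shard outgrows what the node's memory can comfortably serve.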
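
The bulk-restart script from tip 3 can be sketched in a few lines of Python. The host names and the exact service command here are assumptions, not a prescription; the command builder is split out from the runner so the logic can be tested without real SSH access:

```python
# Sketch of a bulk restart: ssh to every node and restart the service.
# NODES and the service command are illustrative placeholders.

import subprocess

NODES = ["es-node-1", "es-node-2", "es-node-3"]  # hypothetical hosts

def restart_command(host: str, service: str = "elasticsearch"):
    """Build the ssh command line for one node."""
    return ["ssh", host, f"sudo service {service} restart"]

def bulk_restart(nodes=NODES, runner=subprocess.run):
    """Run the restart command on each node in turn.

    `runner` defaults to subprocess.run but can be swapped out
    (e.g. for a dry run or a test double).
    """
    for host in nodes:
        runner(restart_command(host), check=True)
```

In practice you'd run this from the machine that already holds SSH keys for every node (the Puppet server, in my setup), and `check=True` makes the script stop at the first node that fails to restart rather than silently continuing.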
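
The write-side caching idea from tip 4 boils down to: buffer documents locally when the cluster rejects writes, and drain the buffer once a send succeeds again. A minimal in-memory sketch in Python; `send_fn` is a hypothetical callable standing in for a real bulk-index call, and a production version would journal to disk so the buffer survives a restart:

```python
# Buffer documents while the cluster is unavailable; drain on recovery.

from collections import deque

class BufferedIngester:
    def __init__(self, send_fn, max_buffered=100_000):
        self.send_fn = send_fn  # assumed wrapper around a real index call
        # deque with maxlen silently drops the oldest entries on overflow
        self.buffer = deque(maxlen=max_buffered)

    def ingest(self, doc):
        """Queue a document and opportunistically try to flush."""
        self.buffer.append(doc)
        self.flush()

    def flush(self):
        """Send everything buffered, oldest first; stop at first failure.

        Returns True if the buffer was fully drained.
        """
        while self.buffer:
            doc = self.buffer[0]
            try:
                self.send_fn(doc)
            except ConnectionError:
                return False  # cluster still down; keep buffering
            self.buffer.popleft()  # only discard after a successful send
        return True
```

The key design point is that a document is only removed from the buffer *after* the send succeeds, so a mid-flush outage loses nothing; you'd call `flush()` periodically from a timer to retry while the cluster is down.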

As long as you keep all this in mind, Elasticsearch does a pretty good job of ingesting large amounts of time-sequenced data onto fairly modest hardware and querying it rapidly. My current Elasticsearch cluster is ingesting log data from Graylog onto st1 EBS volumes (spinning rust) and it works fine. No need to pay for SSDs when you don't need the performance.
