My Journey to 100-percent-fully-automated-GraylogZilla-enabled™ Logging

In a world where you are only as strong as your log analyzing skills, who could resist a tool that ties all your logs together in a comprehensive and searchable fashion?

I know, it sounds like the intro to a terrible sci-fi flick, but in reality this is what drove me to Graylog in the first place: I needed to centralize my logs, and I needed tools to help me make sense of all the logged data. I was done with adding heap after heap of raw syslog messages to a flat file and using grep and awk to wring some structure out of whatever data I was trying to extract information from, let alone using some wonky old SQL-driven syslog framework that lets you store data indefinitely (but just don't try to query it!).

Though I must admit it was fairly easy to push all my syslog data into Graylog, I also had some scaling issues when I first started out (this was in 2014, mind you). This was mainly down to my own naïveté: a three-node Elasticsearch cluster running the Graylog server processes and the web interface on the same nodes, while processing the syslog, application and auditing data fed to it from 500+ concurrent sources, isn't exactly the best possible configuration, or so I found out.

Fortunately the guys and gals at Graylog made their stuff easy enough to reorganize and scale, so after relocating my graylog2_ indices to a dedicated Elasticsearch cluster (and adding more nodes), things were running quite smoothly.

Then I ran into the 20–80 issue. You know: 80 percent of the work takes 20 percent of the effort to get done. The other 20 percent takes… well, you do the math. While the greater part of the environment and applications were flawlessly logging everything to the Graylog nodes, there was still a small subset of applications that were either logging incoherent data, storing stuff locally in their own format, or simply logging way too much junk to easily make sense of. One of the '20 percent applications' that had my special attention was Foreman, for which I put together a modest content pack, available on the Graylog Marketplace.

The question that got me rolling in this case was 'can you tell me how many hosts we built this month?', a question to which I had to answer either 'no' or 'let me grab my grep and awk toolkit'. While doing the latter, I set out to find a way to grab the orchestration data from Foreman.

I enabled Foreman to send syslog data using this snippet from the folks at Foreman, which solved my primary issue. At the same time, while sifting through the backlog of Foreman production logfiles, I ran into a nifty little plugin named foreman-hooks, which allows you to run any kind of script or binary on orchestration events in Foreman. In short, you write a parameterized script that handles the mandatory parameter $1 as the hook that's called (e.g. create, update, delete) and $2 as the hostname you're performing the action on.

The script I wrote, located on GitHub, uses curl to post a message containing the hostname, the orchestration action and a simple counter (a constant 1) to a global GELF listener I have running on the Graylog cluster.
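My actual script lives on GitHub and uses curl; as an illustration of the idea, here's a minimal Python stand-in. The listener URL and the field names prefixed with an underscore are placeholders and illustrative assumptions, not copied from my script:

```python
#!/usr/bin/env python
"""Sketch of a foreman-hooks handler that posts a GELF message to a
Graylog HTTP GELF input. Python stand-in for the curl-based script
described above; the listener address is a placeholder."""
import json
import sys
import urllib.request

GELF_URL = "http://graylog.example.com:12201/gelf"  # placeholder GELF HTTP input

def build_gelf_message(action, hostname):
    """Build a GELF 1.1 payload carrying the orchestration action,
    the hostname it applies to, and a constant counter of 1."""
    return {
        "version": "1.1",
        "host": hostname,
        "short_message": "foreman {0} {1}".format(action, hostname),
        # Custom GELF fields must start with an underscore.
        "_orchestration_action": action,
        "_counter": 1,
    }

def main(argv):
    action, hostname = argv[1], argv[2]  # foreman-hooks passes $1 = hook, $2 = hostname
    payload = json.dumps(build_gelf_message(action, hostname)).encode()
    req = urllib.request.Request(
        GELF_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)

if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv)
```

Foreman would invoke this with something like `create web01.example.com`, and the resulting message lands in Graylog with the action and counter as searchable fields.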

So functionally speaking, whenever a machine is created, updated, deleted or put into build mode, Foreman pushes a GELF message to Graylog and makes the action traceable and easy to find.

In order to display the data, I made a dashboard using the counter field in the orchestration messages and grouped the messages by their respective actions and timeframe.

While I'm still not at 100-percent-fully-automated-GraylogZilla-enabled™ logging, I'm way closer to 100 percent than the 80 percent I started out with. The main reason is the agility and speed with which you can whip up an input, stream and extractor. To me, the dashboard is an added bonus that keeps management at bay.

In addition to the Foreman content pack, I've also created a VMware content pack, a Puppet extractor, a NetApp extractor, a Graylog to Graphite add-on, and an MGE UPS extractor. All of these can be found on the Graylog Marketplace, the central repository for any add-on that extends the functionality of Graylog. It's pretty easy to contribute: I just signed in with my GitHub account and followed these contribution guidelines.

Even now, while I'm still adding data streams to Graylog on a weekly basis, I often encounter new applications that can't handle any form of external logging, which is a terrible shame. Of course, it's always possible to get the data you need, whether by wrapping, trapping or reverse engineering; it's just less aesthetically pleasing, and eventually harder to maintain (imagine malfunctioning wrapper scripts between major version updates), than using a mechanism that's already present and functional.

Being a Python guy myself, I always include flexible logging (to file, to GELF using gravpy, or to stdout) in everything I build. These days my focus is definitely on gravpy, because Graylog is my main log facility.
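What that flexibility looks like in practice can be sketched with the standard library's logging module. This is a minimal sketch under my own naming assumptions; the GELF branch would attach a gravpy handler instead, which I've omitted here to keep the snippet self-contained:

```python
"""Sketch of a destination-switchable logger: file or stdout via the
standard library; a 'gelf' destination would plug in a GELF handler
(e.g. from gravpy) instead -- omitted to keep this self-contained."""
import logging
import sys

def build_logger(name, destination="stdout", path="app.log"):
    """Return a logger wired to the requested destination."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if destination == "file":
        handler = logging.FileHandler(path)
    else:
        # Default: log to stdout.
        handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Switching an application between local debugging (stdout), plain files and centralized GELF logging then becomes a one-argument change rather than a code change.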

As it should be I might add, because it’s one hell of a tool.

Daniel
