Running top as a systemd service or through Splunk scripted inputs

You are busy with your assignments - preparing reports for an audit, writing automation scripts to make life easier for your team, attending a meeting for a new application being on-boarded, working outstanding issues, or just taking your dog out or getting yourself a coffee refill - when you get the dreaded "code-red" call!

You are contacted by your Crit Sit team to get engaged on a production performance issue. It's a heated call where nobody is off the hook, and invoking your Miranda rights ("You have the right to remain silent") would only exacerbate the pain while everyone looks to you for the silver bullet. Nothing would make you happier than getting the folks up and running with acceptable operational throughput, yet as you scurry through your servers and logs looking for the "smoking gun", all you see are "red herrings" - and you have a tough time explaining that to the folks on the call.

The database is often guilty until proven innocent, and exonerating it takes a lot of facts strung together: identifying the false alarms and then eliminating them one by one.

Tracking over time:

You start off with the simplistic approach of tracking your observations: if something occurred on a "good" day and is also seen on a "bad" day, you flag it as a "false alarm".

An example would be a one-off bad query that doesn't really contribute to a CPU spike on the database server and occurs once in a while even on days when the application hasn't reported any performance issues. Too many of them and, yes, that could become a glaring issue - but is that indeed the case here?

To get that right, you need a tracking mechanism over time.

What would be the best parameters to track over time? For a DBA, obviously, the queries hitting the database server are a good start.

For MongoDB databases, Cloud Manager (formerly MMS), with its agent installed on your MongoDB replica-set hosts, provides a good set of metrics: queries per second, inserts per second, connections over the reported time slice, and query targeting (documents scanned / documents returned).
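If you want a quick, ad-hoc look at some of the same counters without Cloud Manager, the mongo shell exposes them through serverStatus. A minimal sketch, assuming a local mongod listening on the default port:

# Opcounters are cumulative since startup - diff two samples to get per-second rates.
mongo --quiet --eval 'printjson(db.serverStatus().opcounters)'
# Current and available connection counts.
mongo --quiet --eval 'printjson(db.serverStatus().connections)'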

You may also use mtools as a good means to track your query trends over time (although it's not officially supported by MongoDB), and it can generate a plethora of information, with mlogvis my favorite. Feed it the mongod.log file and lo and behold!
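For instance, a couple of handy invocations (mtools installs via pip; flags as shipped with the tool):

pip install mtools              # community-maintained, not officially supported by MongoDB
mloginfo mongod.log --queries   # aggregate query statistics grouped by pattern
mlogvis mongod.log              # self-contained HTML visualization of the log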

And sure, all other databases have their own slick tools for tracking query performance over time.

However, are these poor queries indeed the cause of the problem at hand?

It is an important question, because it brings forth a scenario that is often overlooked while the focus rests solely on the one-off "bad" queries.

Application teams may be unaware of the queries that get spawned on their threads and of their impact on CPU and memory on the database server, and may rely totally on the verdict from the DBA.

The ultimate corroboration that a query is part of the problem is its contribution to CPU or memory consumption on the database server.

Too many bad queries and, yes, "Houston, we have a problem".

This is where top as a service comes in handy for any database you might be running on a Linux host.

Run the top command to stream the load average, CPU, and memory consumption over time. Your analysis should begin at this point; then check whether you see a corresponding spike in your database metrics. This saves a tremendous amount of time when your load average, CPU, and memory trends stay low but you still see an occasional bad query.

Don't get me wrong - this article is by no means trivializing the effects of bad queries on your server. It intends to unravel seldom-visited territory and present a more complete picture, without jumping to immediate conclusions just by spotting a bad query and calling it the root cause of the problem.

To create a top service, you would:

1. Run top in batch mode. For example, call this top.sh:

#!/bin/bash
# The first top sample reports CPU percentages averaged since boot; keep the
# second sample so the %Cpu(s) line reflects the current interval.
top -b -n 2 -d 1 | awk '/^top -/{block++} block==2' | head -5
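Make it executable and give it a quick spin; it should print the five summary lines (uptime/load, tasks, CPU, memory, swap):

chmod +x top.sh
./top.sh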

2. Run a tracker script. For example, call this track_top.sh:

#!/bin/bash
# Emit timestamped load, CPU, memory and swap lines at a fixed interval.
# Resolve top.sh relative to this script so it also works under systemd,
# where the working directory is not the script's directory.
script_dir=$(cd "$(dirname "$0")" && pwd)
while true; do
   track_tm=$(date +%FT%T)
   snapshot=$("$script_dir"/top.sh)   # one sample per interval so all four lines agree
   echo "$track_tm $(grep -i 'load'     <<< "$snapshot")"
   echo "$track_tm $(grep -i 'cpu'      <<< "$snapshot")"
   echo "$track_tm $(grep -i 'KiB Mem'  <<< "$snapshot")"
   echo "$track_tm $(grep -i 'KiB Swap' <<< "$snapshot")"
   sleep 10   # tweak this value depending on how aggressive you want your tracking
done
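To try the tracker by hand before wiring it into systemd (assuming top.sh sits alongside it):

chmod +x top.sh track_top.sh
nohup ./track_top.sh > trackout 2>&1 &
tail -f trackout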

3. Get sudoers access to run sudo /bin/vi /usr/lib/systemd/system/*** so that you can manage your services on the servers you are responsible for.

$ sudo /bin/vi /usr/lib/systemd/system/mongotrack.service

[Unit]
Description=tracktop

[Service]
Type=simple
User=root
# systemd does not perform shell redirection in ExecStart, so wrap the
# command in a shell to capture the output into a file.
ExecStart=/bin/bash -c '/<your script location>/track_top.sh > /<your script location>/trackout'

[Install]
WantedBy=multi-user.target

4. Reload systemd so it picks up the new unit, then start the service:

sudo systemctl daemon-reload
sudo systemctl start mongotrack
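Once it's running, verify the service and watch the samples arrive:

sudo systemctl status mongotrack
tail -f /<your script location>/trackout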

You could then track the trackout file - or simply avoid installing this as a systemd service altogether if you have Splunk in your enterprise.

The way to do that would be to modify track_top.sh as follows (basically, take the loop out and control the execution interval with a scripted input in the Splunk forwarder's inputs.conf):

#!/bin/bash
# One-shot version: Splunk's scripted input handles the scheduling, so no loop.
script_dir=$(cd "$(dirname "$0")" && pwd)
track_tm=$(date +%FT%T)
snapshot=$("$script_dir"/top.sh)
echo "$track_tm $(grep -i 'load'     <<< "$snapshot")"
echo "$track_tm $(grep -i 'cpu'      <<< "$snapshot")"
echo "$track_tm $(grep -i 'KiB Mem'  <<< "$snapshot")"
echo "$track_tm $(grep -i 'KiB Swap' <<< "$snapshot")"

The inputs.conf, track_top.sh, and top.sh would then need to be pushed via the Splunk Enterprise deployment server to the Splunk forwarder's app bin directory on the database server in question ($SPLUNK_HOME/etc/apps/<app-name>/bin).
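For reference, a minimal scripted-input stanza could look like the following; the sourcetype and index names are illustrative stand-ins, so substitute whatever fits your Splunk conventions:

[script://$SPLUNK_HOME/etc/apps/<app-name>/bin/track_top.sh]
interval = 10
sourcetype = top_track
index = os_metrics
disabled = 0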

This will allow you to create a timechart like the one below, demonstrating your CPU utilization over time and verifying whether the bad queries you observed are indeed the culprits, or whether lighter but more voluminous ones are escaping your eye - silently creating these spikes in CPU.
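A search along these lines would drive such a timechart; the index and sourcetype match the illustrative stanza above, and the rex pulls the user and system percentages out of top's %Cpu(s) line:

index=os_metrics sourcetype=top_track "Cpu(s)"
| rex "(?<cpu_us>[\d\.]+)\s+us,\s+(?<cpu_sy>[\d\.]+)\s+sy"
| timechart span=1m avg(cpu_us) AS user_cpu avg(cpu_sy) AS system_cpu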

[Image: CPU utilization timechart]
