Monitoring - The Devil's in the Details
In the world of IT, when we think about monitoring, most people ask: is it up or down?
A few more ask: is it fast enough or not? And a small group asks: why is it down or slow?
Not knowing why your critical application gave up the ghost, and ending up in an emergency meeting with senior management because of it, is no longer something that should happen.
Today it's safe to say that poor performance is just as likely to kill your business application as any other kind of outage. With many critical systems importing from and exporting to other systems, outages often simply cannot be afforded, and daily jobs that overrun can and do cost millions in lost time and effort.
Traditionally, someone (normally at management level) would report a performance problem, by which time performance could already have been poor for hours or even days. Depending on the monitoring in place, and on whether the application is built in-house or supplied by an external vendor, working out how slow it is and why can become a lengthy process taking days. I'm sure some of you are imagining a chain-smoking developer hunched over a computer, but that's not how it works, I can assure you. That guy you're imagining is your tax consultant.
Now, back to the point: a far more effective way of monitoring is to have the application stack and database monitored from the outset, letting you establish positive and negative performance trends during a normal day.
It's important to have baselines and targets, so you have something to compare against.
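To make the idea of a baseline and a target concrete, here is a minimal sketch in Python. It assumes you already collect response times in milliseconds from somewhere; the function names, the 1.5x tolerance and the sample numbers are made up for illustration and are not taken from any particular monitoring tool.

```python
from statistics import median, quantiles


def build_baseline(samples_ms):
    """Summarise response times (ms) recorded during a normal day."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = quantiles(samples_ms, n=100)[94]
    return {"median_ms": median(samples_ms), "p95_ms": p95}


def check(current_ms, baseline, target_ms, tolerance=1.5):
    """Compare one measurement against the baseline and an agreed target.

    The tolerance is an assumed multiplier: anything slower than
    tolerance * baseline p95 counts as degraded even if the target is still met.
    """
    if current_ms > target_ms:
        return "breach"
    if current_ms > baseline["p95_ms"] * tolerance:
        return "degraded"
    return "ok"


# Hypothetical samples standing in for a normal day's measurements.
normal_day = list(range(100, 200))
baseline = build_baseline(normal_day)
status = check(400, baseline, target_ms=500)
```

The point of keeping both numbers is that the target tells you when the business is hurting, while the baseline tells you that something has changed well before you get there.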
This proactive approach means you know where the problem is and how much things have slowed down, which makes troubleshooting a task of minutes, not days. It also lets you compare current data against historical monthly or yearly trends and better forecast your needs. And if you are super proactive and review this data regularly, you might be able to avoid performance outages altogether.
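As a rough illustration of forecasting from historical data, the sketch below fits a straight line through past measurements (for example, the monthly runtime of a daily job) and estimates how many more periods remain before it crosses a limit. A straight line is an assumption made here for simplicity; real workloads are often seasonal, and the function names and figures are hypothetical.

```python
def fit_trend(values):
    """Least-squares line through equally spaced observations (x = 0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def periods_until(values, limit):
    """Estimate periods left after the last observation before the trend hits the limit.

    Returns None when the trend is flat or improving, so no crossing is predicted.
    """
    slope, intercept = fit_trend(values)
    if slope <= 0:
        return None
    last_x = len(values) - 1
    crossing_x = (limit - intercept) / slope
    return max(0.0, crossing_x - last_x)


# Hypothetical job runtimes in minutes over four months, with a 160-minute window.
runtimes = [100, 110, 120, 130]
months_left = periods_until(runtimes, limit=160)
```

With these made-up figures the job grows by about ten minutes a month, so the estimate gives you roughly three months of warning before it overruns its window.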
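Some of the tools out today, such as Dynatrace, Stagemonitor and Prometheus, are very good at going into the details. The open-source ones among them, Stagemonitor and Prometheus, cost you only storage and some CPU cycles to run, while Dynatrace is a commercial product.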