Problem Solving and Debugging: The Art and Skill that is Vanishing in the Ranks of Tech Experts
Over the past few months my company and I have been pulled into some very interesting emergency problem-solving situations, where an organization has been battling a nagging tech issue for hours, days, even months. In pretty much every case, we've been able to help the org zero in on the problem within about an hour of info gathering, and in most cases within 5-10 minutes.
Now I'm not advocating for everyone to start calling me to solve their months-long problems. In fact, I've steered my business AWAY from this problem-solving type of work because, quite frankly, how does one pay the bills from 5 minutes or even 60 minutes of work?
The Need for Problem Solving and Debugging
In some cases we knew the solution because we had seen the problem before; in others, our years of knowledge and experience led us quickly to it. But in all cases, we started with just basic Problem Solving and Debugging 101.
Example 1 - Email Problem
In the first case example, an organization was having a problem where emails stopped being sent and received. They use Microsoft Office 365, and they immediately dove into assessing whether Microsoft was having a problem with its cloud email system, focusing a couple of hours of time and attention JUST on Microsoft. However, users reported that they could send and receive emails internally to one another, so Microsoft seemed to be transferring messages between users just fine; it was messages going in/out to other organizations that were not flowing.
So the WHO factor kicked in for our consultant: internal emails were fine, but external emails were the problem. The next question was an architecture question: do emails go in/out through a 3rd party provider? The answer was yes; the organization uses ZIX as an email filter and security solution that all inbound and outbound emails flow through. A quick phone call to ZIX confirmed they were down, and thus all messages going into or out of Microsoft's email cloud were stuck.
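For readers who like to see the WHO question written down, here is a minimal sketch of that internal-versus-external split. It assumes you can export recent message results as simple (sender, recipient, delivered) records; the record format, the function names, and the domains are all made up for illustration and are not something Office 365 or ZIX produces in this form.

```python
from collections import Counter

def split_by_scope(messages, internal_domain):
    """Count delivered/failed messages, split into internal vs external traffic.

    `messages` is a hypothetical list of (sender, recipient, delivered) tuples.
    A message counts as internal only if both ends are on `internal_domain`.
    """
    suffix = "@" + internal_domain.lower()
    counts = Counter()
    for sender, recipient, delivered in messages:
        both_internal = (sender.lower().endswith(suffix)
                         and recipient.lower().endswith(suffix))
        scope = "internal" if both_internal else "external"
        outcome = "delivered" if delivered else "failed"
        counts[(scope, outcome)] += 1
    return counts

def failure_rates(counts):
    """Failure rate per scope, or None if that scope had no traffic."""
    rates = {}
    for scope in ("internal", "external"):
        failed = counts[(scope, "failed")]
        total = failed + counts[(scope, "delivered")]
        rates[scope] = failed / total if total else None
    return rates

# Made-up sample: internal mail flows, anything crossing the boundary fails.
sample = [
    ("ann@example.org", "bob@example.org", True),
    ("bob@example.org", "ann@example.org", True),
    ("ann@example.org", "vendor@partner.test", False),
    ("client@customer.test", "bob@example.org", False),
]
rates = failure_rates(split_by_scope(sample, "example.org"))
print(rates)  # {'internal': 0.0, 'external': 1.0}
```

When the failures all land on one side of the boundary like this, the next question is what sits on that boundary, which is exactly the architecture question our consultant asked.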
Example 2 - Cloud Virtual Machines Running Slowly
In this second case example, an organization migrated a series of applications (Web + SQL database servers) from on-prem to the cloud. The apps worked fine initially, but then during a busy production day, the apps slowed down to a crawl. What usually took 3-4 seconds to process was taking 3-4 minutes.
The organization's solution was to upgrade the cloud VMs from good systems ($750/mo) to the top of the line cloud systems ($5000/mo). Of course the performance increase eliminated the slowdown at peak times, but the cost was killing the organization. After 3 months of blowing through the organization's budget, we were pulled in.
We enabled logging and monitoring on the systems (a free service included with the cloud subscription), and after a couple of days of monitoring we identified that 99% of what the organization was doing could be serviced fine by a very basic $350/mo system, but there was ONE process that ran every 3 hours and sucked all capacity out of the production system.
It was a report that was generated every 3 hours. In talking with the business unit, we learned that the report really only needed to be generated once a day, and off hours was fine! So we had the report processing scheduled to run at midnight. It still killed all other process capacity for about an hour, but at a time when no one was on the system. This WHAT and WHEN factor was uncovered by simple logging, and instead of $5000/mo, the org is now running on less than $350/mo in cloud hosted services.
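To make the WHAT and WHEN idea concrete, here is a rough sketch of the kind of question we asked of the monitoring data: which hours of the day are saturated, and do they repeat on a regular interval? The sample format of (hour of day, CPU percent), the 90% threshold, and the data itself are assumptions for illustration, not the output of any particular cloud monitoring service.

```python
from collections import defaultdict
from statistics import mean

def busy_hours(samples, threshold=90.0):
    """Return the hours of the day whose average CPU is at or above `threshold`.

    `samples` is a hypothetical list of (hour_of_day, cpu_percent) readings.
    """
    by_hour = defaultdict(list)
    for hour, cpu in samples:
        by_hour[hour].append(cpu)
    return sorted(h for h, values in by_hour.items() if mean(values) >= threshold)

def spacing(hours):
    """Gaps between consecutive busy hours; a constant gap hints at a scheduled job."""
    return [later - earlier for earlier, later in zip(hours, hours[1:])]

# Made-up readings for two days: a quiet baseline with a spike every 3 hours.
samples = []
for day in range(2):
    for hour in range(24):
        cpu = 98.0 if hour % 3 == 0 else 20.0
        samples.append((hour, cpu))

hours = busy_hours(samples)
print(hours)           # [0, 3, 6, 9, 12, 15, 18, 21]
print(spacing(hours))  # [3, 3, 3, 3, 3, 3, 3]
```

A perfectly even gap like this rarely comes from real users. It points at something on a schedule, and that is the cue to go ask the business what runs every 3 hours and whether it really needs to.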
Playing Detective, Instead of Playing (Just) Tech
These two cases, and dozens of others like them that we solved, had very little to do with "tech work". It was just simple detective work: get the facts on the table (who, what, where, when, how, etc.) and, with an open mind, jot down 3-4 possible causes, or even try for 10, before diving too deep into ONE.
Good detective work means not coming to a conclusion too quickly, because you could be heading down a rabbit hole, and once you are too far in, it's hard to get back out, dust off, and start again. So start with a lot of options to consider. Don't be too quick to dismiss a path, either. Take good notes on why you think one is better than another.
Don't Go for a Quick Fix
The other advice is don't do a quick end around and put in a SOLUTION without spending a cycle or two confirming the PROBLEM first. Take the case where the org upgraded from a $750/mo to a $5000/mo solution: sure, that made the problem go away, but if they had only looked closer at the problem first, they would have realized that the huge spend masked the real problem, a report that was easily rescheduled to process off hours.
Wrap-up
I'm seeing a pattern these days where people have gotten so used to "googling" for a solution, to find a quick fix, that they get stuck down a rabbit hole. Use that mind of ours and ask the who, what, where, when, how questions. Before you jump into one path, jot down several potential problems and solutions.
Don't jump to a big solution too quickly; try to understand the problem so that the solution is appropriately sized for the fix. It's like painkillers: you can take pills to ease the pain, but if you know what the problem is (maybe a pinched nerve) and can fix the root problem (through acupuncture or something), then you don't need the workaround (painkiller) solution forever.
I call problem solving and debugging an art, one in which patient, broad-minded thinkers can assess situations and come up with solutions. I hope it is a skill that others will take the time to get good at! Don't let "googling" something distract you from the common sense and innate detective skills we all have, but haven't used much of lately...
Rand great post. The art of curious inquiry into problems is a dwindling art presumably because people think they have seen things before so jump to conclusions they 'know the answer'.
Rand, I really enjoyed this article, thank you. Your points about “playing detective” and not going for the quick fix are right on!
Great advice!
Love this, so true!