The High Cost of Technical Debt: A Case Study

Technical debt: we all have it. Yet the phenomenon remains poorly understood by product managers. Unlike financial debt, its costs are often hidden and difficult to measure. The most dangerous aspect, though, is that "Technical Debt items are contagious, causing other parts of the system to be contaminated with the same problem, which may lead to nonlinear growth of interest." [1]

Here's a case study of one such event, in which unmanaged tech debt caused interest costs to spiral catastrophically out of control.

To set the scene, we have an online shopping system with the following requirements:

  • A catalog of 30,000 total products
  • 500 peak concurrent users
  • An index page that displays the top 24 products in a chosen category (for example, a list of Blu-ray titles).

How much would you expect to pay for the SQL cluster running this catalog? Would you believe $360,000 per year in raw IaaS expenses? How could this happen?

Ten years ago, the system's creators built the catalog on a custom data access layer. At the time, it worked pretty well, but the architecture had a fatal flaw.

As the years passed, the system grew. New discount coupons, special promotions, pricing tiers, inventory management, wishlists, and fulfillment features piled on. What was once a nimble system grew to 4 million lines of source code. The fatal flaw spread, making the code nearly unmaintainable. This in turn created a feedback loop: the code was so hard to understand that it could no longer be refined during regular iteration cycles, and obsolete or unused features could not be removed. The system got bigger still.

Along the way, system crashes were common. Downtime averaged about 1 day per month. Each time, the only option for bringing the system back online quickly was to add more hardware. Bit by bit, the SQL cluster grew into a $360,000-per-year monstrosity.

The total cost of this tech debt will never be calculated, but it probably runs in the millions: lost revenue, customer service calls, lost employee productivity, and tarnished brand equity. Had minor investments been made in correcting this flaw early, these massive losses would have been entirely preventable.

If you're a product manager, you urgently need to track your tech debt. Maintain a backlog, and dedicate a portion of your iteration budget to paying down the debt. Make a realistic economic estimate of the cost of delaying this paydown. Understand how dangerous the problem gets once it becomes contagious. Don't be my next case study!


[1] Martini, Antonio & Bosch, Jan. (2015). The Danger of Architectural Technical Debt: Contagious Debt and Vicious Circles. Proceedings - 12th Working IEEE/IFIP Conference on Software Architecture, WICSA 2015. doi:10.1109/WICSA.2015.31.

Dylan Tack love the article! Curious if you think that in the future businesses will be able to utilize blockchain tech to curb these costs?

I don't think there's a simple answer. Lately I've been influenced a lot by "The Principles of Product Development Flow", by Donald Reinertsen. This book claims traditional ROI calculations are counterproductive when applied to product development, and instead advocates a scheduling metric based on the cost of delay divided by the task duration. In other words, the highest economic priority is always the job that generates the most cost-of-delay savings per unit of bottlenecked resource. Figuring out how to untangle that, and apply it to software engineering, is perhaps a topic for a future post. I think if an organization can track its debt (via backlog tasks), and make even crude guesstimates of the economic cost of delaying each task, then it is ahead of 95% of orgs out there.
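
To make the cost-of-delay-divided-by-duration idea concrete, here is a minimal Python sketch of ranking a debt backlog that way. The task names, dollar figures, and durations are all hypothetical, invented purely for illustration; they are not taken from the case study, and the `DebtItem` class and `prioritize` function are assumptions of this sketch rather than anything from Reinertsen's book.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    name: str              # hypothetical backlog task
    cost_of_delay: float   # guesstimated dollars lost per week the task waits
    duration_weeks: float  # guesstimated weeks of the bottlenecked team's time

    @property
    def score(self) -> float:
        """Cost of delay divided by task duration."""
        return self.cost_of_delay / self.duration_weeks


def prioritize(backlog):
    """Return the backlog ordered from highest to lowest score."""
    return sorted(backlog, key=lambda item: item.score, reverse=True)


# Hypothetical backlog: every number below is an invented guesstimate.
backlog = [
    DebtItem("Replace custom data access layer", 20_000, 10),
    DebtItem("Remove unused promotion code", 3_000, 1),
    DebtItem("Add query caching to index page", 8_000, 2),
]

ranked = prioritize(backlog)
for item in ranked:
    print(f"{item.name}: {item.score:,.0f} saved per week of effort")
```

With these made-up numbers, the task with the largest cost of delay (replacing the data access layer) ranks last, because it also ties up the team the longest. That is the kind of result a crude estimate can surface, even when the inputs are rough.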

Great work putting some (shocking) numbers to this. Do you have any ballpark recommendations for what portion of the budget to dedicate to tech debt type issues to be more sustainable?
