Improving Reliability using Context

Anowar I.

Published Mar 4, 2021

A few of our applications need to query the central monitoring server to augment their real-time decisions. We kept finding that queries passed or failed for no apparent reason, which hurt our applications' ability to make decisions. At first glance this is not unusual in a cloud environment, which is why we use patterns such as Retry and Circuit Breaker. Even with these in place, we were not able to guarantee the quality of service.

Digging through the metrics and logs and running some queries, we figured out that our enterprise observability platform was often churning through queries whose time value had long passed. In other words, the server was working on zombie requests. This observation led us to think about what could be done: decrease the server timeout to reduce load, build more automation to deprioritize or kill long-running queries, add a proxy caching layer, or simply request that the platform be scaled horizontally or vertically.

All of these options can easily be automated, but none of them was efficient enough: the server would still be churning through queries that nobody needed. What we wanted was a way to give the server a hint, "I need the data within a certain timeframe," so that the server could accept the hint and kill the query once that time had passed, freeing the associated IO and CPU resources and driving up efficiency.

This was quite an interesting question, and we started asking whether our own apps do the same. Are we exposing any interface that lets clients of our application tell us their sense of urgency when they call our API?

Since most of our applications are written in Go, I started by looking at its language constructs. In this article, I want to go over the power of context and how it can be leveraged to make our HTTP resources more efficient and less wasteful. I will demonstrate how easy it is to set up cancellation and cascade its propagation so that we minimize waste.

A regular server implementation of a hello service in Go
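
The original snippet is not reproduced here, so what follows is only a minimal sketch of what such a service might look like. The names `slowGreeting` and `workDuration` are hypothetical, the sleep stands in for a slow monitoring query, and `httptest.NewServer` is used purely so the sketch can run and exit on its own; a real service would call `http.ListenAndServe`.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// workDuration is a hypothetical stand-in for the cost of a slow query.
const workDuration = 200 * time.Millisecond

// slowGreeting simulates the query. It always runs to completion,
// whether or not anyone is still waiting for the answer.
func slowGreeting() string {
	time.Sleep(workDuration)
	return "hello\n"
}

// hello ignores the request context entirely, so the work continues
// even if the client has already given up.
func hello(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, slowGreeting())
}

func main() {
	// httptest.NewServer listens on a free local port; a real service
	// would register the handler and call http.ListenAndServe instead.
	srv := httptest.NewServer(http.HandlerFunc(hello))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		fmt.Println("client:", err)
		return
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Printf("client: %d %s", resp.StatusCode, body)
}
```

The point to notice is that nothing in `hello` can stop `slowGreeting` early, which is exactly how zombie requests come about.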

Refactoring the hello function exposes a way to receive the context value

A client implementation that requests data and passes a hint about the time value of the request.

The output shows the cancellation and its propagation

From the above snippets and output, we see that context gives us the power to cancel operations as soon as they are no longer needed. From here we can extend it to create new deadlines, timeouts, and cancellations at any step and pass them around. If the context at the top gets canceled, the cancellation propagates all the way down to all child contexts, and those operations are stopped too.

This implies that by adding a little extra functionality, a few channels in this case, we can become much more efficient. That is quite an important checklist item for building reliable, scalable, and secure applications in the cloud.
