Improving Reliability using Context

A few of our applications need to query the central monitoring server to augment their real-time decisions. We kept finding that queries passed or failed for no apparent reason, which hurt our applications' ability to make decisions. At first glance this is not unusual in a cloud environment; it is the reason we use patterns such as Retry and Circuit Breaker. Even with these in place, we were not able to guarantee the quality of service.

Digging through the metrics and logs and running some queries of our own, we figured out that our enterprise observability platform was often churning through queries whose requests had long since lost their time value. In other words, the server was working on zombie requests. This observation led us to consider what could be done: decrease the server timeout to reduce load, add automation to deprioritize or kill long-running queries, put a caching proxy layer in front, or simply request that the server be scaled horizontally or vertically.

All of these options can easily be automated, but none of them was efficient enough: the server would still be churning through queries that nobody needed. What we wanted was a way to hint to the server, "I need the data within a certain timeframe," so that the server could accept the hint and kill the query once that timeframe had passed, freeing the associated IO and CPU resources and driving up efficiency.

This raised an interesting question, and we started asking whether our own apps do the same. Do we expose any interface that lets clients of our application tell us their sense of urgency when they call our API?

Since most of our applications are written in Go, I started by looking at what the language itself offers. In this article, I want to go over the power of context and how it can be leveraged to make our HTTP resources more efficient and less wasteful. I will demonstrate how easy it is to set up cancellation and cascade it to downstream operations so that we minimize waste.

A regular server implementation of the hello service in Go
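
As a rough illustration of such a service, here is a minimal sketch. The handler name `hello` comes from the caption; the 200 ms `workDuration`, the `callHello` helper, and the use of `httptest` to serve the handler are assumptions made so that the example terminates on its own. A real service would call `http.ListenAndServe` instead.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// workDuration is an assumed stand-in for how long the real query takes.
const workDuration = 200 * time.Millisecond

// hello does its work unconditionally. It never checks whether the caller
// is still waiting, so an abandoned request still costs the full duration.
func hello(w http.ResponseWriter, r *http.Request) {
	time.Sleep(workDuration)
	fmt.Fprint(w, "hello")
}

// callHello (hypothetical helper) starts a throwaway test server,
// calls it once, and returns the response body.
func callHello() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(hello))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := callHello()
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(body)
}
```

The point to notice is what is missing: nothing in `hello` looks at the request's context, so the work continues even if the client has already given up.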

Refactoring the hello function to expose a way to receive a context value
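
One way such a refactoring could look is sketched below. It assumes the slow work can be modelled as a timer; `doWork`, the 100 ms delay, the 503 status, and the `statusWithin` helper are hypothetical names and values chosen for illustration. The handler is driven through `httptest` so the example runs without a network.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// doWork simulates a slow query that gives up as soon as ctx is done.
func doWork(ctx context.Context, d time.Duration) (string, error) {
	select {
	case <-time.After(d):
		return "hello", nil
	case <-ctx.Done():
		// The caller no longer needs the result: stop and report why.
		return "", ctx.Err()
	}
}

// hello passes the request's own context down to the work, so a cancelled
// or expired request stops the work instead of running to completion.
func hello(w http.ResponseWriter, r *http.Request) {
	msg, err := doWork(r.Context(), 100*time.Millisecond)
	if err != nil {
		// The caller has most likely gone; the status is only for completeness.
		http.Error(w, err.Error(), http.StatusServiceUnavailable)
		return
	}
	fmt.Fprint(w, msg)
}

// statusWithin (hypothetical helper) calls hello with a request whose
// context expires after ms milliseconds and returns the recorded status.
func statusWithin(ms int) int {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(ms)*time.Millisecond)
	defer cancel()

	req := httptest.NewRequest(http.MethodGet, "/hello", nil).WithContext(ctx)
	rec := httptest.NewRecorder()
	hello(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("10ms budget:", statusWithin(10))
	fmt.Println("1s budget:", statusWithin(1000))
}
```

The `select` on `ctx.Done()` is the whole trick: the channel closes when the request is cancelled or its deadline passes, and the work returns at that moment instead of finishing a result nobody will read.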

A client implementation that requests the data and passes along a hint of the request's time value.
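
A client of this kind might be sketched as follows, assuming the hint is expressed as a context timeout attached to the request. The `fetch` and `demo` helpers, the 100 ms budget, and the slow in-process server are all assumptions for illustration; the server is included only so the sketch is self-contained.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetch calls url but waits no longer than timeout for the answer.
func fetch(url string, timeout time.Duration) error {
	// The timeout is the hint: once it expires the request is cancelled.
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}

// demo runs a slow stand-in server and a client with a short budget. It
// reports whether the client hit its deadline and whether the server
// observed the cancellation through the request context.
func demo() (clientTimedOut, serverCancelled bool) {
	seen := make(chan struct{}, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(5 * time.Second): // assumed slow query
			fmt.Fprint(w, "hello")
		case <-r.Context().Done():
			seen <- struct{}{} // the client gave up, so stop working
		}
	}))
	defer srv.Close()

	err := fetch(srv.URL, 100*time.Millisecond)
	clientTimedOut = errors.Is(err, context.DeadlineExceeded)

	// Give the server a moment to notice that the connection was dropped.
	select {
	case <-seen:
		serverCancelled = true
	case <-time.After(2 * time.Second):
	}
	return
}

func main() {
	client, server := demo()
	fmt.Println("client timed out:", client)
	fmt.Println("server saw cancellation:", server)
}
```

In this sketch the deadline itself is not sent over the wire. The client drops the request when its context expires, and the server learns of that through `r.Context()`.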

The output shows the cancellation and its propagation

From the snippets and output above, we see that context gives us the power to cancel operations as soon as they are no longer needed. From here we can extend the idea to create new deadlines, timeouts, and cancellations at any step and pass them around. If the context at the top gets canceled, the cancellation propagates to all of its child contexts, and those operations are stopped too.
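
The cascade described here can be sketched with the standard `context` package alone. The `worker` and `cancelledChildren` names and the one-minute durations are hypothetical; the sketch assumes each child operation selects on its own context.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// worker waits for its work to finish or for its context to be done.
// It returns true when it was stopped by the context.
func worker(ctx context.Context, d time.Duration) bool {
	select {
	case <-ctx.Done():
		return true
	case <-time.After(d):
		return false
	}
}

// cancelledChildren starts n workers, each on a child context with its own
// generous timeout, cancels only the parent, and counts how many stopped.
func cancelledChildren(n int) int {
	parent, cancel := context.WithCancel(context.Background())

	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		stopped int
	)
	for i := 0; i < n; i++ {
		// The child adds its own deadline but still inherits the
		// parent's cancellation.
		child, childCancel := context.WithTimeout(parent, time.Minute)
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer childCancel()
			if worker(child, time.Minute) {
				mu.Lock()
				stopped++
				mu.Unlock()
			}
		}()
	}

	cancel() // cancelling the parent reaches every child
	wg.Wait()
	return stopped
}

func main() {
	fmt.Println("children stopped:", cancelledChildren(3))
}
```

None of the children is cancelled directly. They stop because a context derived with `context.WithTimeout` or `context.WithCancel` is done as soon as its parent is.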

This implies that by adding a little extra functionality (a few channels, in this case) we can become much more efficient. That makes it an important checklist item for building reliable, scalable, and secure applications in the cloud.

Nice pattern! Have any data for how much you saved by using it?
