Diving deep with the problem statement

I came across a situation where a customer experienced an increased API request rate caused by a race condition in a custom code implementation. The race condition was itself the result of decomposing a monolithic application into multi-tenant, API-based services.

With the above statement, how do you think we should solve this?

The first step is to analyze the problem statement.

1. Identify the nouns, noun forms, and phrases. [Identify what to analyze]

In the above statement, these are: API rate, race condition, custom code implementation, decomposition of a monolithic application, and multi-tenant API-based services.

2. Pair the cause and effect [Analyze]

#1 The race condition in the custom code implementation increased the API rate.

#2 Decomposing the monolithic application into multi-tenant API-based services led to the race condition.

3. Identify the root cause [Analyze]

It is the “Race condition” that we need to focus on.

Dive deep

Do we know what it is? First, what is a “race condition”?

Say your system is made of multiple components that interact with each other. To produce a desired outcome, these components must work in synchronization. Let me call their units of work events. For visualization, Event 1 depends on Event 2, and Event 2 depends on Event 3. Note that each event executes independently of the others and can run asynchronously. It is therefore the responsibility of the application designer to enforce the strict dependency order during execution, because out-of-sync execution may leave the system in an inconsistent state.

In an environment where concurrent requests enter the system, execution can fall out of sync and events can run out of sequence. The system then ends up in inconsistent states, and the outcome depends on which of several possible execution paths happened to occur. This condition is called a “race condition”.
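To visualize the dependency chain, here is a minimal Python sketch using asyncio. The function names, event names, and delays are my own assumptions for illustration, not taken from the customer's code. Since Event 1 depends on Event 2 and Event 2 depends on Event 3, the required completion order is Event 3, then Event 2, then Event 1.

```python
import asyncio


async def run_event(name, delay, log):
    # Simulate work that takes `delay` seconds, then record completion.
    await asyncio.sleep(delay)
    log.append(name)


async def unordered():
    log = []
    # All three events start at once, so completion order follows the
    # (hypothetical) delays and not the dependency chain.
    await asyncio.gather(
        run_event("event3", 0.15, log),
        run_event("event2", 0.10, log),
        run_event("event1", 0.05, log),
    )
    return log


async def ordered():
    log = []
    # Each await completes before the next event starts, so the
    # dependency chain event3 -> event2 -> event1 is enforced.
    await run_event("event3", 0.15, log)
    await run_event("event2", 0.10, log)
    await run_event("event1", 0.05, log)
    return log


unordered_log = asyncio.run(unordered())
ordered_log = asyncio.run(ordered())
print("unordered:", unordered_log)
print("ordered:  ", ordered_log)
```

With these delays, the unordered run should finish Event 1 first, which is exactly the out-of-sequence execution described above, while the ordered run respects the dependencies at the cost of running serially.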

Example Scenario: Imagine an e-commerce platform where product inventory is managed through an API. A race condition occurs when two customers try to purchase the last item simultaneously:

Say the sequence runs as follows:

1. Both customers' requests read the inventory as 1.
2. Both requests attempt to decrement the inventory and create an order.
3. One request succeeds, but the other fails due to the race condition.
4. The client of the failed request automatically retries.
5. The retry fails because the item is now out of stock.
6. The client makes additional API calls to check for inventory updates.
7. The user, seeing an error, manually retries the purchase, generating more API calls.
8. The system detects the inconsistency and makes API calls to reconcile the inventory.

In this scenario, what should have been two API calls (one for each purchase attempt) has turned into many more because of the race condition, significantly increasing API traffic.
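To make the scenario concrete, here is a small Python sketch of the underlying "check, then act" problem. It replays one possible interleaving deterministically instead of using real threads. The `Inventory` class, the `interleaved_purchases` function, and the customer names are hypothetical stand-ins, not a real inventory API.

```python
class Inventory:
    """In-memory stand-in for the inventory API (hypothetical)."""

    def __init__(self, stock):
        self.stock = stock
        self.orders = []


def interleaved_purchases(inventory, customers):
    # Step 1: every request reads the stock before any request writes.
    seen = {customer: inventory.stock for customer in customers}
    # Step 2: each request acts on its own, now stale, read.
    for customer in customers:
        if seen[customer] > 0:
            inventory.stock = seen[customer] - 1
            inventory.orders.append(customer)
    return inventory


inv = interleaved_purchases(Inventory(stock=1), ["alice", "bob"])
print(inv.stock, inv.orders)  # prints: 0 ['alice', 'bob']
```

The stock ends at 0, yet two orders exist for a single item. Depending on where the real system validates the write, the second request either oversells like this or fails late, and that late failure is what sets off the retries, polling, and reconciliation calls in the scenario.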

Next, we need to analyze how the race condition resulted in an increased API rate. [Expand your analysis dimensions and think deeply: which aspects could lead to an increased API rate because of a race condition?]

I am listing a few. Feel free to expand the analysis.

Uncontrolled Retries: When race conditions occur, they often result in errors or unexpected outcomes. Applications implement retry mechanisms to handle these failures. If multiple clients experience race conditions and start retrying their requests at the same time, the retries themselves multiply the API traffic.

Self-correcting mechanisms: Race conditions can cause data inconsistencies. When the application detects these inconsistencies, it may make additional API calls to verify or correct the data, increasing overall traffic.

Compensating flows: The system might need to perform compensating actions to correct the state. These actions often involve additional API calls, increasing traffic.

Keep Polling: Race conditions can delay data updates, so applications might start polling the API more frequently to check for changes.

Duplicate Requests: Race conditions can make operations appear as if they have not completed, so clients might resend requests, increasing API traffic.

Cascading Effects: Race conditions in one part of the system can trigger a ripple effect, causing multiple components to make additional API calls to recover from errors.

Debugging and Monitoring: When race conditions occur frequently, developers and operations teams might increase logging and monitoring to understand the internal state of the system. This leads to more API calls for diagnostic purposes.

User-Triggered Retries: When race conditions cause visible errors for end users, they might manually retry operations hoping to recover from the inconsistent situation, leading to increased API traffic.
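To get a feel for how quickly retries alone can inflate traffic, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that every attempt fails independently with the same probability and that the client retries immediately up to a fixed limit. Real systems are messier, and the function name and the numbers are mine.

```python
def expected_calls_per_request(failure_probability, max_retries):
    """Expected number of API calls for one logical request.

    Attempt k+1 only happens if the first k attempts all failed, which
    has probability failure_probability ** k under the independence
    assumption, so the expectation is a finite geometric series.
    """
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be between 0 and 1")
    if max_retries < 0:
        raise ValueError("max_retries must not be negative")
    return sum(failure_probability ** k for k in range(max_retries + 1))


# Hypothetical numbers: half of all attempts hit the race, 3 retries allowed.
print(expected_calls_per_request(0.5, 3))  # prints: 1.875
```

Under these assumptions the automatic retries alone nearly double the call volume, before polling, compensating flows, and manual retries are counted.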

How do we address this problem?

[Categorize your problem] Is it a configuration problem or a design problem? It is a design problem. Throttling at the API gateway only treats the symptom. The problem is related neither to multi-tenancy nor to the decomposition of the monolithic application into microservices. The devil lies in the “flow design”. One needs to enforce strict component dependencies throughout the flow execution to keep the system consistent. The solution lies in implementing concurrency controls that ensure consistent state, optimizing retry strategies, and designing APIs to handle high concurrency efficiently.
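As one possible shape for those three ideas, here is a minimal in-process Python sketch. It is not the customer's implementation. The `InventoryService` class, the `call_with_backoff` helper, the exception names, and the idempotency keys are all hypothetical, and a real multi-tenant service would need the equivalent guarantees from its datastore (for example a conditional update) because a process-local lock does not span instances.

```python
import random
import threading
import time


class OutOfStock(Exception):
    """Permanent outcome: retrying cannot help."""


class TransientError(Exception):
    """Temporary failure: a bounded retry may help."""


class InventoryService:
    def __init__(self, stock):
        self._stock = stock
        self._lock = threading.Lock()
        self._results = {}  # idempotency key -> order id

    def purchase(self, idempotency_key):
        # The check and the decrement happen under one lock, so no
        # request can act on a stale read.
        with self._lock:
            if idempotency_key in self._results:
                # A repeated request returns the original result.
                return self._results[idempotency_key]
            if self._stock <= 0:
                raise OutOfStock()
            self._stock -= 1
            order_id = f"order-{len(self._results) + 1}"
            self._results[idempotency_key] = order_id
            return order_id

    @property
    def stock(self):
        with self._lock:
            return self._stock


def call_with_backoff(operation, max_attempts=4, base_delay=0.01, max_delay=0.1):
    # Retry only transient failures, a bounded number of times, with
    # capped exponential backoff and random jitter between attempts.
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))


service = InventoryService(stock=1)
outcomes = []


def buy(key):
    try:
        outcomes.append(call_with_backoff(lambda: service.purchase(key)))
    except OutOfStock:
        # A definitive answer, so the client does not retry or poll.
        outcomes.append("out-of-stock")


threads = [threading.Thread(target=buy, args=(key,)) for key in ("req-a", "req-b")]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(sorted(outcomes))  # prints: ['order-1', 'out-of-stock']
```

Whichever request wins, exactly one order is created and the other client gets a clear out-of-stock answer on its first call, so the two purchase attempts stay at two API calls. Resending a request with the same key returns the original order instead of creating a new one.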

#Modernization #Microservices #AWS #Migration


More articles by Raghavendra Prakash
