The two hammers of software engineering !


In this article I discuss two very simple common techniques which are used to solve a wide variety of problems in software engineering as well as outside. These are simple, intuitive, powerful and applicable in a variety of situations as you'll see below, but I have not seen them mentioned in any algorithm book. I have come across them being used in day to day life as well as some of the most complex software I got a chance to work on and every time I came across them, I couldn't help but admire their ingenuity. Hopefully one of them can come to rescue when next time you are stuck with a problem.

Indirection

"All problems in software engineering can be solved with a level of indirection, except the performance problems" - TUSD (The Unknown Software Dude)

Problem: How would you efficiently manage bi-directional mappings (m:1) when the entities are changing ? To state the previous sentence in simple terms, consider you are the owner of a famous restaurant (pre-phone era) in the city and have lots of customers. These customers know your restaurant address and come over directly when they want to dine. Now your restaurant is moving to a new location and hence the address is going to change. How would you make sure all those customers have the new address ?

Solution: In this case, many people (m) have a mapping to your restaurant (1). For any change in your restaurant attributes, you also need to update the customers about that change. Theoretically it is possible to keep a reverse mapping to all the customers and then we can go to each customer to update the new address. But it is costly to maintain and doesn't scale with the growing number of customers. It is also impossible to maintain the reverse mapping if it wasn't done from start ie.. you can't introduce this new reverse mapping scheme without already knowing who all your customers are.

Another alternative is to leave your new address at the previous location, so that anyone arriving can follow the new address and reach your restaurant. This seems ugly and unmaintainable, if your location changes often. It soon becomes traversing a linked list and confusing multiple branches if another restaurant moves at your previous location and then moves too.

An elegant way to address is to introduce a layer of indirection. If you ever find yourself in a situation where you need to change some information or metadata in the underlying layers, which upper layers rely on or are aware of, you should be thinking about indirection. In the above example, instead of customers directly coming to the restaurant with a known address, they consult some other entity (say city business directory) to get the address. This way, you only need to update the business directory when moving to a new location and everything would just work.

The downside of indirection is that now every time someone wants to come to your restaurant they need to spend extra time and resources to get the address even if it has not changed. If this extra lookup is cheap (just looking up in the address book), it is probably ok, but if it requires them to drive to another location to consult before reaching the final destination, it is costly. Indirection often introduces some performance penalty which may or may not matter to the application.

To solve performance problems like above, one common scheme is to cache the information and only pay the penalty when the cached information is no longer valid. This way, you as a customer still remember the old address and keep going to the restaurant using that, till you find that the address that you know is wrong. In that case, follow the indirection, get the new address, update your cached address and proceed to the new restaurant location. There is an implicit assumption in this caching scheme that you can always identify a mismatch if your relationship/destination changes. If not, then there is a potential to access incorrect information. Say, if your software only cared about 'Order food from this given address' and if the old restaurant moved out and a new restaurant moved in at that place, your software would be unable to detect that and still be able to order food (only with the wrong food choices) !!

The parallel problem in the storage industry is where the file metadata keeps a mapping of disk blocks to be able to read the data. This prevents disk firmware from moving the disk blocks around to create efficient free space and creating a level of indirection (See section 3.3 for an example) can help there. If you have ever stored a pointer to a pointer in your code, you have already used indirection subconsciously.

Generation numbers

I particularly love this one because of its simplicity and ease of implementation. Here is a simple problem outside software engineering.

Problem: "If there are 100 cars parked in a lot, police officers need a quick way to identify which of the cars have invalid/expired registration ie.. were registered last year, but not registered in current year. How would you efficiently do this ?"

Take a pause and think about it before proceeding....

Solution: Obvious solution for the police officers is to lookup each car's number in their database and check if it is registered for the current year. This is painful and slow. The Department of Motor Vehicles (DMV) uses a simple scheme to invalidate and identify any such cars by issuing the registration sticker of every year in a different color, so the 2020 sticker could be in red, while 2021 is yellow. It makes it easy to just glance and identify that all cars with red stickers have expired registration.

The equivalent problem in the software world is that if you have large number of objects which needs to be invalidated based on certain conditions (say some data structures in a cache, and the cache becomes stale because of some reason, objects shouldn't be served from it) you should be thinking about "generations" as a solution. Walking each entry in the cache and invalidating them individually could be expensive. Instead you can augment each object with the current generation number (say an integer value 100) when objects are inserted in cache. Now to invalidate them, just change the current live generation to be anything other than 100 (typically generation++). Every time you need to decide whether the object is valid and can be served from cache, compare the object's generation with the current generation. If it is not the same, it is an invalid entry and should not be used.

This invalidation scheme is O(1) irrespective of how many objects need to be invalidated. Generations are widely used to solve such class of invalidation problems efficiently or to check the validity of an object.

Another example of using generations is to validate something in a stateless world. Say you, as a server, gave a token to a client which is only valid if the server state hasn't changed (eg..restart) since the token was given to the client. If the client comes back after a few days and presents the token back, how do you know it is valid or not ie.. whether you gave the token now, or in your previous life before the restart ? We can't afford to remember all the clients to whom a token was given since the server started. To solve this, just add an additional field (generation or restart count) in the token while handing it out. When the client presents the token back, it can be compared with the restart count that server has right now. If they are not the same, then the client's token is invalid.

__________________________________________________________________________

Thanks for reading this far.....if you know of other such common and simple useful patterns, let me know in comments below.

Nice article....One more example of indirection which we always used subconsciously when we stopped using hard coded values.

Like
Reply

Very knowledge full. 👍👍 waiting for more such article buddy.

Very thoughtful Manish bhai...

Like
Reply

To view or add a comment, sign in

More articles by Manish Katiyar

  • What is in a name ? - (Part 2 of 2)

    In part1 of this article we looked at what kind of naming convention is better for files stored on a local machine. In…

  • What is in a name ? - (Part 1 of 2)

    "What's in a name ?" ..

    1 Comment
  • The curious case of deleted data!

    If you follow any Indian news channel, you are aware of the pandemic that has grappled the nation and how everyone is…

    1 Comment

Others also viewed

Explore content categories