To DRY, or not to DRY
I recently re-read the section of The Pragmatic Programmer, by Andrew Hunt and David Thomas, on the evils of duplication and the DRY principle. I also read several other articles on the topic and wanted to consolidate what I learned from the book, those articles, and my own experience in one place. I hope you find it useful.
First, the authors argue that it is important not to duplicate knowledge in multiple places. According to them, programmers primarily collect, organize, maintain, and harness knowledge, and make it come alive in the form of code. Because knowledge is not stable, and changes due to both internal factors (we learn something new) and external ones (a regulation changes), we have to spend a good amount of energy reorganizing and re-expressing what we previously coded. This puts us in “maintenance mode”. That mode becomes a nightmare when we discover the sheer number of places (specifications, processes, documentation) where knowledge has inadvertently been duplicated and must be corrected.
In my experience, it is extremely difficult to motivate teams and developers to take on maintenance work. In an agile world where feature delivery is paramount, refactoring old code and fixing old bugs always get weighed against the team’s ability to deliver new features. Product Owners often try to postpone a bug fix in order to deliver a particular feature, which tends to make the problem worse. This is one more argument for following the DRY principle (Don’t Repeat Yourself), which is stated as:
Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.
Following the DRY principle lets us make a change in one place instead of repeating it over and over, which makes it possible to develop software reliably and predictably.
The authors then describe the ways duplication arises in software, along with remedies for each, and group them into the following broad categories.
Imposed Duplication - Developers feel they have no choice: the environment seems to require them to duplicate information or implementation.
- Combat multiple representations of information by building code generators from a common source:- If you find yourself coding the same information in several representations (for example, in both the front end and the back end), see if you can write a filter or code generator that reliably produces all of them from the same source (for example, building class definitions automatically from the database schema).
- Combat the disconnect between comments and code by clearly separating their responsibilities:- use code to express low-level knowledge, and reserve comments for high-level explanations.
- Combat the disconnect between documentation and code by generating documentation from code:- Generate documentation programmatically so that it always stays up to date with the code.
- Combat language-imposed duplication as best you can:- there is little to be done if the programming language itself imposes duplication. Be mindful of it and avoid duplication beyond what the language requires.
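To make the code-generator and documentation remedies concrete, here is a minimal Python sketch, assuming a hypothetical `DRIVER_SCHEMA` list that acts as the single source of truth for a "drivers" table. The same description produces the SQL DDL, a Python class (via the standard library's `dataclasses.make_dataclass`), and a plain-text field reference, so the three cannot drift apart. A real project would more likely generate source files at build time or read the schema from the database itself; the table, columns, and helper functions here are illustrative only.

```python
from dataclasses import make_dataclass

# Hypothetical single source of truth for a "drivers" table.
# Each entry: (column name, Python type, SQL type, description).
DRIVER_SCHEMA = [
    ("driver_id", int, "INTEGER PRIMARY KEY", "Unique driver identifier"),
    ("name", str, "TEXT NOT NULL", "Driver's full name"),
    ("license_number", str, "TEXT NOT NULL", "Commercial license number"),
]


def generate_ddl(table, schema):
    """Back end: build a CREATE TABLE statement from the schema."""
    columns = ",\n  ".join(f"{name} {sql_type}" for name, _, sql_type, _ in schema)
    return f"CREATE TABLE {table} (\n  {columns}\n);"


def generate_class(class_name, schema):
    """Application code: build a dataclass whose fields mirror the schema."""
    return make_dataclass(class_name, [(name, py_type) for name, py_type, _, _ in schema])


def generate_docs(schema):
    """Documentation: one line per column, generated rather than hand-written."""
    return "\n".join(
        f"{name} ({py_type.__name__}): {desc}" for name, py_type, _, desc in schema
    )


Driver = generate_class("Driver", DRIVER_SCHEMA)

print(generate_ddl("drivers", DRIVER_SCHEMA))
print(generate_docs(DRIVER_SCHEMA))
print(Driver(driver_id=1, name="Ada", license_number="X123"))
```

With this setup, adding a column means editing one entry in `DRIVER_SCHEMA`; the DDL, the class, and the field reference all pick up the change.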
Inadvertent Duplication - Developers do not realize that they are duplicating information.
- Normalize the design:- don’t duplicate attributes in multiple places. The book gives two examples. In a trucking application, both a truck and a delivery route have a driver, but the driver is an entity in its own right, independent of both, so create a separate Driver class that captures the driver’s attributes. Similarly, don’t store what can be derived: the length of a line can be calculated from its start and end points, so length should be a simple calculation rather than a stored attribute.
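The two normalization examples might look like this in Python. This is a sketch rather than the book's code, and the class and field names (`Driver`, `Truck`, `Route`, `Point`, `Line`) are hypothetical. Driver attributes are defined once and referenced from both the truck and the route, and a line's length is a computed property rather than a stored field, so it can never disagree with the endpoints.

```python
import math
from dataclasses import dataclass


@dataclass
class Driver:
    """Driver attributes are defined here, and only here."""
    name: str
    license_number: str


@dataclass
class Truck:
    plate: str
    driver: Driver  # a reference to the Driver, not a copy of its attributes


@dataclass
class Route:
    origin: str
    destination: str
    driver: Driver  # the same Driver object can be shared with a Truck


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Line:
    start: Point
    end: Point

    @property
    def length(self) -> float:
        # Derived on demand from the endpoints instead of being stored,
        # so there is no second copy of this knowledge to keep in sync.
        return math.hypot(self.end.x - self.start.x, self.end.y - self.start.y)


ada = Driver("Ada", "X123")
truck = Truck("TRK-1", ada)
route = Route("Austin", "Dallas", ada)

line = Line(Point(0, 0), Point(3, 4))
print(line.length)  # 5.0
line.end = Point(6, 8)
print(line.length)  # 10.0, updated automatically
```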
Impatient Duplication - Developers duplicate because it seems faster.
- Under time pressure, copy and paste can look like a good idea, but remembering that “short cuts make for long delays” helps us resist the temptation.
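As an illustration of the alternative to copy and paste, here is a small, hypothetical pricing example; the function names and the 8% tax rate are made up for this sketch. Instead of pasting the subtotal and tax arithmetic into every function that needs a total, each rule lives in exactly one helper, so changing the tax rate is a one-line edit. Amounts are kept in integer cents to avoid floating-point rounding surprises.

```python
TAX_RATE_PERCENT = 8  # hypothetical rate, defined exactly once


def subtotal(line_items):
    """Sum of unit price (in cents) times quantity for (price, qty) pairs."""
    return sum(price * qty for price, qty in line_items)


def add_tax(amount_cents):
    """Apply tax to a non-negative amount in cents, rounding half up."""
    return (amount_cents * (100 + TAX_RATE_PERCENT) + 50) // 100


def invoice_total(line_items):
    """Final invoice total: reuses the shared subtotal and tax rules."""
    return add_tax(subtotal(line_items))


def quote_total(line_items, discount_cents=0):
    """Quote total with an optional pre-tax discount, using the same rules."""
    return add_tax(subtotal(line_items) - discount_cents)


items = [(1000, 2), (250, 4)]  # two items at $10.00, four at $2.50
print(invoice_total(items))                    # 3240
print(quote_total(items, discount_cents=500))  # 2700
```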
And finally:
Inter-developer Duplication - Different developers unknowingly duplicate the same information.
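- Clear design:- have a clear design, a strong technical lead, and a well-understood division of responsibilities, so that two developers are less likely to build the same thing.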
- Communication:- Set up forums to discuss common problems, and set up utility libraries where common code can live. Read other people’s code and be open to feed-forward.
- Foster an environment where reuse is easy; if finding and reusing existing code is hard, people will write their own instead.
A slightly different way to think about duplication is to ask how each of the categories above can be addressed within the major activities of design, coding, testing, and documentation.
In summary, there is only one place in software development where DRY does not apply: communicating with your stakeholders, team members, and others. There, repeat information often to make sure everyone is on the same page. Everywhere else, apply DRY to develop and deliver reliable software.
Source: The Pragmatic Programmer - Andrew Hunt and David Thomas.