Continuous Refactoring
I'm republishing a series of articles I wrote a long time ago on Blogger. This is article #3
My code is becoming a mess!
Every piece of software started nice and simple. It's straightforward. People had no problem understanding it or making changes to it. Over the time, as more and more features got added, the code became longer and longer, the logic became more and more complex. At some point, the software started to lose its clarity. Fewer and fewer people can get a grip of what the code is doing, especially for the new comers. No one can understand the code well enough to anticipate the consequences of a change. "If it ain't broke, don't touch it". So what do people do when they are asked to add a new feature? Even though there's existing code that achieves 70% of the feature to be added, instead of trying to reuse it, the developers will copy and paste existing code and tweak it for their needs. The logic gets duplicated, spread all over the place. There are multiple implementations of the same thing and you don't know which one is correct or more current. The code just keeps snowballing and we eventually get ourselves a spaghetti ball that's unreadable, unmaintainable and untestable.
Untangling the spaghetti ball is extremely difficult and people often prefer to just throw it out and start fresh again. But if great discipline is not exercised, we will just end up with another spaghetti ball pretty soon.
What do I do?
So how do we prevent this from happening? By designing a highly modular, high coherent and low coupling architecture upfront? That doesn't always work because you can't possibly anticipate all future requirements and design your system to accommodate.
A more practical approach is: Don't over engineer. Keep your system simple so that it meets your current needs. But keep your eyes open and watch out for problematic spots. When you see such spots, refactor them to keep your code high-coherent and low-coupling. Never allow the trouble spots to grow out of control and become a monster that no one dares to touch. Of course while keeping it simple, you also need to keep scalability in mind and don't make obvious poor architecture choices. If you and your team can continue this practice of refactoring whenever needed, you'll be able to keep your code clean without unnecessary overhead.
Another benefit of this approach is that the cost of refactoring is relatively low because the scale of the refactoring is small. This means the refactoring can be done with minimal impact on the project delivery. Refactoring code usually doesn't have any immediate visible business benefits. So if the cost becomes too high, there will be pressure from business and management to defer it and focus on feature implementation first. By delaying it further, the cost will just continue to grow exponentially, which makes it even less likely to be ever put on the agenda.
When to do it?
So How do I know a refactoring is needed? There's no definite answer, but here are some scenarios that usually call for a refactoring:
1. When you have a reasonably big monitor and you don't see the full body of the method on your screen. Extra warning if you have to do page down twice or more. This is probably the most frequent offender in your day to day life dealing with legacy code. I have seen a method that's over 600 lines and a source file with over 10,000 lines. Despite my best effort, I was completely lost. On the contrary, take a look at the implementation of JDK and see how often you see methods that don't fit into one screen. Human brain is better with processing hierarchical information. Having to process hundreds of lines of code all at once causes mental overload. An analogy is to memorize driving directions from one city to another. Most people can't remember all the turns and exits all at once (unless you're my wife who has the supernatural power of remembering directions). It's much easier to break it down into stages. First figure out how to get out of town, then the highway section, then the local section in the destination city. You only need to focus on the current stage you're in and you can happily forget the details of the other two.
2. If you see lots of if-then-else, especially nested. This is another cause of mental overload. Every level of if-then-else causes the branches to grow exponentially. There usually is a way to externalize some of the logic and reduce the number of if-then-else.
3. If you can carve out portions of the existing code and reuse them. Or if you can find a better and cleaner way to achieve the same thing. This is a no-brainer.
4. If you find the code awkward to use or it doesn't fit with the domain/business concepts very well. This usually calls for a re-architecture to reflect a better understanding of the domain. This isn't straightforward and probably won't come up very often either.
How to do it?
So how should I refactor code that looks unwieldy? Putting each page of code into a separate method obviously doesn't work. The goal is not just to make each method short, but also to make the methods doing a coherent unit of work, to hide the complexity with purpose revealing method names so that I can ignore the details if I want to.
What I normally do is to think at a high level about what needs to be done to accomplish the task. Take the driving direction example, a logical break down maybe
getOutOfTown()
cruiseTheHighway()
exitHighway()
driveLocalUntilDestination()
In order for this approach to work, you have to understand the existing code. Otherwise it becomes risky as your abstraction may leave out critical pieces. It will be disastrous if the step to exit highway is missing from the abstraction above. That's why it's crucial not to let the code grow so big that it's beyond understandable.
Once that is done, next step is to find relevant code to put into each method. Don't be surprised if you find some code that doesn't fit into any of the methods. When that happens, either it belongs to somewhere else (people often put code where it's convenient, but that's not necessarily the right place), or the code is no longer needed (it's not uncommon that requirements, situations have changed and some old code became obsolete). This is also the time to think about the scope of each method. Most likely these method should be private. In case it makes sense to make a method public so it can be used by other files, also think about whether it should be moved to another file or a new file of its own.
Once you've found home for all the code, you can repeat the process again to see if the code in these new methods can be refactored further with purpose revealing method names. Comments can become outdated, but code never does. So if you can work hard and keep the method names always reflect what the code does, that's how the code can serve as a live documentation.
Take the driving directions example again. Maybe at one point there was code added to calculate the driving duration based on the speed limit. Some time later, the traffic condition was also taken into account. It's not difficult to imagine that the part to calculate the duration became so complex that it has buried the core purpose of the method, which is to find a route between the origin and the destination. It's better to make the duration computation logic separate and explicit so the routing algorithm stays compact and clean. Then some time later, there was requirement to get directions for public transportation, bicycles and pedestrians. Now it's time to externalize the routing algorithm as well as the duration computation logic so that they are reusable across different transportation types.
Reminder:
Never Let Your Code Grow Out Of Control!