This Code Cannot Be Refactored!
I heard so much about refactoring and its benefits. But, I also heard this sentence multiple times during my career from different programmers:
This code cannot be refactored.
I'm not refactoring expert, at all, but I hardly remember times that I looked at some code and didn't know how to refactor it. I often had some ideas to make it a better code, i.e. more readable, more maintainable and more extensible. But, sometimes the refactoring needs more effort. In my opinion, there is no truly un-refactor-able code. Yes, I know that there are pieces of code that are hard to refactor for sure, and surprisingly, they are the best candidates to start refactoring.
Here I am going to write about some heuristics, ideas and techniques I earned after more than five years of working with multiple code bases, including tons of legacy code, writing a lot of dirty and clean code pieces, and also studying some articles and books about clean code, refactoring and so on. I borrowed some ideas from the Uncle Bob's "Clean Code" and tried to add my experiences.
Working, 10 Years Old, Unchangeable Software
This code is working for 10 years without any big problem. Do you really want to modify it, i.e. "clean" it? Are you sure!?
When you tell me or show me with your gesture that it's not a good idea to modify a piece of code, I become sure that it needs to be cleaned! Modifying code is our job. When there is some obstacle in front of us that makes it hard or impossible to modify code, it's a clue about bad code, and bad code should be cleaned to be good code or we have to put it aside and created something from scratch.
If and only if you tell me with confidence that I am free and welcomed to modify and, much better, to extend it, I'll be convinced that it's clean and I promise to think twice before changing a line in it, because, after reading the code, that is always easy, I'll realize that there were some good guys who did everything in the best possible way. Good code calls you to praise its authors.
Minimum Possible Change, Just That
This software is working perfectly and it passes all the tests. Just edit it for bug fixes and add new features with minimum possible change.
If you do that, soon or late, you will end up with a big mess. Each time you fix a bug or implement a new feature by adding one special case (e.g. adding on if-else), without refactoring the old code, redesigning if required and adding new unit test cases, you are going to make your code one step closer to the end of its life. When it's not possible to read, refactor and extend a piece of code, that is dead for sure; even if it passes all the tests. Like each living thing that grows, living code must grows.
Each bug fix, each new feature, each touch of code is an opportunity to add some new tests, influence code coverage, refactor it, found hard to read spots, simplification and so on. It's the only way to prevent code rot. Creating special tasks for refactoring with lower priority is the best way to kill this opportunity. Later means never.
Myth of Unit Testing
It's impossible to unit test this code. Unit test is for books.
Or, maybe, if we think in another way, you designed a highly coupled module, that is very hard to unit test in isolation. We are used to create big classes that do a lot of things, full-feature classes! To do that, we add dependency after dependency. We satisfy ourselves by making our dependencies, i.e. our attributes, to be private. Tens of private pointers and references to tens of other giant classes. And after some months or years, when we are tired of bug fixing in a non-systematic way and testing in highest level, we start to think about unit testing. And, often it's too late.
On the other hand, when you try to create mock classes to test your particular class, other giant classes are absolutely hard to mock. It seems that everything is against unit testing. It seems that unit testing is just possible in theory.
Don't fool yourself. Making members private is not enough. You must avoid coupling by any means possible. Use design patterns for it if you have no idea. Create simple, small and clean classes that do one thing, and do it in best possible way. Keep SRP in mind in all levels of design and implementation. Use test-driven development.
Unbreakable Function, in the Core of Code
This very very long function or method is not breakable because it does a specific thing and breaking it to pieces makes them seems incomplete.
Yes, functions and methods are supposed to do one specific thing but it doesn't mean that they have to do all of that by themselves. They definitely can do the operation by calling other functions and methods. It's just the abstraction level that matters. It's desirable for a function or method to do its operation in just one level of abstraction and not to mix different levels of abstraction. Going deep into details breaks this rule.
I found the following procedure useful when implementing a complex algorithm in a function or method. Think about a function in highest possible level of abstraction and try to describe the operation in a list of simple English sentences. Then make each sentence a function name and implement it in one abstract level, just one level below the original function. If a sentence is not convertible to a function name, you have mixed multiple levels of abstraction. Conditional sentences are a good example of mixed levels of abstraction. Do it repeatedly to have a bunch of small functions or methods, each in just one level of abstraction, with good descriptive names and small body length.
If you already have a long function, you should refactor it in another way. If the function is a list of steps, each step contains one or multiple line of code (and maybe, has a inline comment before each step), it is easy to make each step a function. The inline comment is often a good name and it could be removed after making the step a separate function. Just look at function "_construct_records" in the source code of TensorFlow source code in this link to get the idea (I spent some time finding that function. I don't mean the code of TensorFlow is a bad code, at all).
If the function is complex because of interleaved if-else, loops and so on, it is a sign that it mixed multiple levels of abstraction. So, a good way is to break function to smaller functions reflecting these levels of abstraction. So, each smaller function, have just one control structure, a loop, an if-else statement, a switch-case statement, an exception handling clause or some other programming construct. It's an iterative process and you can do it top-down. Continue until you have bunch of small, simple, well-named function that implement just one level of abstraction.
Stupid Ugly Descriptive Names, a Train of Words
It's almost impossible to rename this function or method or class or module because a good and descriptive name will be more than 80 characters.
It's just another sign of unnecessarily big and complicated things. For example, long functions or methods that mix layers of abstraction, a class doing a lot of things but lacks cohesion, a module that is overloaded with unrelated functionalities or something like that. It's a red sign though. Stop talking about renaming it or not and start breaking it until you can name all the pieces with less characters to fit in one line!
We Need All of Them to Be There
This function or method need 10 argument. You cannot reduce the number of arguments, it need all of them.
Another sign; the function or method is big. It needs a lot of data to do something complicated. If it was designed to do something simple in one abstract level, it would need 1, 2 or 3 arguments and that is it. You might be able to encapsulate some related arguments in an object and pass it to the function or method instead of multiple arguments. If you have trouble doing that, it may be your design problem. You should move some functionalities between classes to be able to just act in one abstract level inside your original function or methods. This process can ends up with some new classes that are nothing to afraid of.
Don't Forget Function Comments, Forget DRY, Repeat Yourself
We are very keen to add comment to every method, even private ones. Repeat argument names and argument types if you have no idea. Our comments are flawless!
Repeating sucks, even in documentation. When there is simple way to talk to audience by choosing a descriptive and good names, why I have to add comments to it like a monkey!
def add(num, num_list):
""" Add all num to all the element in l.
:param num: a number
:param num_list: a list of numbers
:return: nothing
"""
for i in range(len(num_list)):
num_list[i] = num + num_list[i]
Looks perfect, huh? But it's not, in my opinion. This one is telling all the story in less lines and less opportunities for inconsistency.
def add_num_to_elements_in_list(num, num_list):
for i in range(len(num_list)):
num_list[i] = num + num_list[i]
Use descriptive names every where possible to reduce comments. Leave formal and mandatory comments to public APIs only and to just rare cases. Comment is better to be exceptions that add value to code rather than mandatory fixed-format duplicated things to convince our doc generator tools. Nobody read documents generated by that tools to know how a piece of code works internally and how to fix or improve it. I never did that in my 5+ years of career.
Public API is another beast, though. Remember that public API must be minimized and simplified as much as possible. Well-designed, small and clean public API needs a small amount of documentation. Sometimes 10 lines of sample code using the API is just what the API user needs to start working with a good API and he/she could refer to you API doc, generated with automatic tool, for more details. At least, for the public API of a class it's enough.
Conclusion
If a code is hard to to read, maintain, extend, unit test and refactor, there is something wrong in its organization. Sometimes it is helpful to stand far from that complicated module and think bigger. Maybe you have to move some pieces of code or create new classes or new data types to be able to continue refactoring. Sometimes things must be broken. Almost always, good naming is the best help. Replace needs for unnecessary comments with well-named functions and classes. Replace needs for inline comments with making newer level of function calls. Avoid unnecessary coupling from scratch. Add unit testing to you development as soon as possible.
There is always a way to make the code better, for sure. Just don't give up.
Awesome, thanks!
We need to face, now, the situation of machine-generated-code. It appears to be a fact that this kind of code "cannot be refactored". I have studied this case some time ago, and the point is that the refactorization is not impossible by itself. Instead, the perspective is different: since it is (normally) a huge amount of machine-generated-code it is, in practice, "not refactorizable". However, I had not the opportunity, so far, to test what happens if you feed some machine-generator-of-code with the code generated by another tool, and order the second machine to refactor it.
This is very good!