The Three Ways of DevOps – Part 2 of 3
In my previous article I introduced the concept of the three ways of DevOps, focusing on the first way, the way of flow. Now it is time to focus on the second way, feedback.
The Second Way – The Principles of Feedback
Feedback, as defined by the Oxford dictionary, is “Information about reactions to a product, a person's performance of a task, etc. which is used as a basis for improvement.”
While in the first way we focused on creating flow for the activities of the value stream, the second way is all about providing feedback in a fast and constant way.
The flow of our value stream goes from left to right, while feedback travels in the opposite direction, from right to left.
This is key to identifying and addressing problems quickly, while they are still small and easy to fix, instead of discovering them late, when fixing them is costly and takes longer.
But it is not only a matter of discovering and identifying the problem as early as possible. When a problem occurs, our entire value stream can suffer delays while it waits for the problem to be solved.
Swarm the problem
To keep the work flowing, it is important to address the problem as soon as possible and swarm it. This means mobilizing whoever is needed to fix the problem and get the work back into the value stream.
By swarming the problem and fixing it as soon as possible, we not only get back on track early, but also prevent the problem from spreading and avoid creating technical debt.
If the problem is not addressed when it is discovered, or we plan to fix it later because “there is no time to fix it now”, it turns into technical debt, because we will never find the time to fix it. And the cost will be much higher. Furthermore, the same type of problem may occur again in other tasks.
Learn from problems
That’s why it is important not only to swarm the problem and have it fixed, but also to identify what caused it, document it and learn from it, improving processes and procedures so that it does not happen again.
This also requires a safe environment where errors are not punished but addressed and used as a source of learning. We need to encourage people to raise their hands when a problem is found, no matter who caused it. The focus must be on solving the problem and learning from it, not on punishing.
Quality close to source
Another key point is to keep quality control close to the source. For example, a peer review of development work can be much more useful than an approval from a manager two or three levels above who, in fact, has no idea what they are approving.
Techniques such as pull request approval can help here, but they must not become a blocker. It is important to keep batches as small as possible, as described in the first way. Remember: someone may take 30 minutes to analyze a 50-line code change and provide feedback on it, but “feedback” on a 1000-line change can be done in 2 minutes, because nobody really reviews a change that large!
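As a rough illustration of keeping review batches small, the sketch below sums the lines touched by a change and flags it when it grows beyond a limit. It reads the text produced by `git diff --numstat` (one tab-separated "added, deleted, path" row per file). The function names and the 200-line limit are assumptions made for this example, not values taken from the article.

```python
def changed_lines(numstat: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` text."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed rows
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        total += int(added) + int(deleted)
    return total


def review_batch_ok(numstat: str, max_lines: int = 200) -> bool:
    """Return True when the change is small enough for a real review.

    The default limit is an arbitrary, illustrative threshold.
    """
    return changed_lines(numstat) <= max_lines


if __name__ == "__main__":
    sample = "10\t5\tapp.py\n-\t-\tlogo.png\n30\t5\ttests/test_app.py\n"
    print(changed_lines(sample), review_batch_ok(sample))
```

A check like this works best as a prompt to split the work into smaller pull requests, not as one more gate that holds up the value stream.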
Automate controls
We must be careful with the controls we put in place and make sure they don’t create unnecessary delays in the value stream. Controls such as manual tasks, multiple approvals from busy line managers, large volumes of documentation that will soon be outdated, and waiting for large batches to be approved by special committees can easily become a trap, creating constraints and holding up work in the value stream.
This doesn’t mean that we don’t need quality controls or approval gates; quite the opposite. Controls must be in place and constantly updated. However, they must be automated as much as possible so that they do not create unnecessary delays and constraints in the value stream. And the earlier a control sits in the value stream, the sooner we get feedback and identify problems.
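One way to picture automated controls that give early feedback is a small runner that executes checks in order and stops at the first failure, so the person who made the change hears about the problem right away. This is a minimal sketch: the `Control` type, the `run_controls` function and the check names are hypothetical, and in practice a CI tool would play this role.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Control:
    """A named automated check; `check` returns True when it passes."""
    name: str
    check: Callable[[], bool]


def run_controls(controls: List[Control]) -> Tuple[bool, List[str]]:
    """Run controls in order and stop at the first failure.

    Placing the cheapest checks first means feedback arrives as early
    as possible and later checks are not run on a known-bad change.
    """
    report: List[str] = []
    for control in controls:
        passed = control.check()
        report.append(f"{control.name}: {'passed' if passed else 'FAILED'}")
        if not passed:
            return False, report
    return True, report


if __name__ == "__main__":
    # Placeholder checks; real ones would call a linter, test runner, etc.
    pipeline = [
        Control("lint", lambda: True),
        Control("unit tests", lambda: False),
        Control("integration tests", lambda: True),
    ]
    ok, report = run_controls(pipeline)
    print(ok)
    print("\n".join(report))
```

In this example the integration tests never run, because the unit tests already reported the problem earlier in the stream.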
Blameless
And keep in mind that everyone involved in the value stream has their own responsibilities. For example, developers must be responsible for the code they create, making sure it works as expected.
And remember, when an issue is found in production, it is not fair to blame the user for not having tested a specific scenario and having accepted the UAT. Nor is it fair to blame the manager who approved the change without knowing what code was affected. This will not solve the issue; it will only create conflict and make people hide problems when they happen.
In the value stream, in fact, everyone is responsible for both the successes and the failures, so don’t waste time trying to find someone to blame; instead, swarm the team to focus on solving the issue and learning from it.