The Three ways of DevOps — Part 3 of 3
So here we are at the third and final part of the series “The three ways of DevOps”. We have covered the ways of flow and feedback. Now it’s time for continuous improvement.
The Third Way — The Principles of Continual Experimentation
While in the first way we define the value stream and in the second way we gather feedback, in the third way we focus on how to implement the insights produced and how to learn from them.
Feedback has no purpose if the information acquired is not used to improve the process and implement changes. But implementing changes can be very challenging, depending on the type of organizational culture we work in.
Organizational culture types
Organizational culture, as observed by sociologist Dr. Ron Westrum, can be classified into the following types:
Pathological: Characterized by fear and threat. People hold back information for political reasons and often distort it to make themselves look better. Problems are hidden.
Bureaucratic: Driven by a lot of processes and controls, highly siloed, with little flow of information between departments. Failures are punished as a breach of process.
Generative: Responsibility is shared, and failures are discussed openly and lead to reflection. People seek and share information so that the organization can achieve its goals.
If you find that you are in the generative type of culture, you are halfway there! For the other types the change is more challenging, but it is possible, and needed if the organization really wants to become a top performer.
To reach a greater level of maturity in the process, after implementing the techniques of the first and second ways, it is important to observe the following key points to successfully enable improvement and learning in the organization.
Enable safety
People must feel comfortable raising their hands and saying that they have a problem, caused an issue or made a wrong decision.
If the environment doesn’t provide this level of comfort, people will hold back information, hiding it as much as possible and trying to fix the issue by themselves. This not only puts stress on the person handling it alone, but also wastes the opportunity to investigate why it happened and, most importantly, how to prevent it from happening again.
When the issue cannot be hidden and spreads throughout the organization, management tends, implicitly or explicitly, to hunt down whoever is responsible and make sure they are punished. Remember Sarah Moulton from The Phoenix Project and The Unicorn Project? That’s exactly what she does: hunt and blame.
It then gets even worse when new controls, policies and approvals are created to keep the issue from happening again. What they really do is add more complexity and delay to the value stream.
Instead, enabling safety in the environment allows people to share problems as soon as they happen, mobilize whoever is needed to solve them, and conduct a postmortem analysis once the problem is fixed. The problem is then documented, and the solution is shared throughout the organization so that others can learn from this type of failure.
“If you are not failing, you are not innovating enough” (Elon Musk)
Improvement of daily work
We improve our daily work by analyzing our tasks and finding better ways to achieve results. We must also reserve time to pay down technical debt, as doing so makes our work easier: we can stop relying on workarounds, fix problems for good, and focus our time on finding solutions that really bring value to our customers.
These improvements can be made gradually, in small tasks. Big bang changes often fail because they introduce so many new concepts and processes at once that it becomes hard to adapt to and follow them.
Local discoveries to global improvements
When something good is discovered and successfully implemented, there must be a mechanism to spread the word and share the findings across the whole company.
Shared source code repositories and postmortem reports that are available to everyone help keep the information consistent and useful, transforming tacit knowledge into explicit knowledge.
Sharing experience can save other teams time when they are looking for a solution to a problem that someone else has already faced, giving them a heads-up and insights for upcoming solutions.
Inject Resilience
Resilience is the ability to recover from an unexpected situation, from difficulties and failures. And we know that systems can fail.
In fact, it’s not a matter of “IF” a system fails, but “WHEN”, because it surely will. So, we must be prepared for failures when they happen.
That means analyzing every piece of the puzzle and asking yourself “what happens when this component fails?”. By doing this, it is possible to create a safe recovery plan and be ready when it happens.
High performers also inject problems into their environments to test and improve the resilience of their systems. A good example is Netflix’s well-known Simian Army, a set of tools (Chaos Monkey being the most famous) that randomly kills processes and disables production instances, forcing the resilience plan to kick in and prove that everything keeps working.
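To make the idea of injecting failures more concrete, here is a minimal sketch in Python. It is only an illustration and not how Netflix’s tooling works: FlakyService, fetch_price_resilient and run_experiment are hypothetical names invented for this example, and the “recovery plan” is assumed to be a simple retry followed by a degraded fallback answer.

```python
import random


class FlakyService:
    """Hypothetical dependency that fails on purpose at a configurable rate."""

    def __init__(self, failure_rate, rng):
        self.failure_rate = failure_rate
        self.rng = rng

    def fetch_price(self, item):
        # Fault injection: raise an error for a random share of the calls.
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected failure")
        return {"item": item, "price": 10.0, "source": "live"}


def fetch_price_resilient(service, item, retries=2):
    """Assumed recovery plan: retry a few times, then return a degraded answer."""
    for _ in range(retries + 1):
        try:
            return service.fetch_price(item)
        except ConnectionError:
            continue
    # Every attempt failed, so answer without a price instead of crashing.
    return {"item": item, "price": None, "source": "fallback"}


def run_experiment(failure_rate, calls=1000, seed=42):
    """Count how many calls were answered live versus by the fallback."""
    service = FlakyService(failure_rate, random.Random(seed))
    counts = {"live": 0, "fallback": 0}
    for _ in range(calls):
        result = fetch_price_resilient(service, "book")
        counts[result["source"]] += 1
    return counts


if __name__ == "__main__":
    print(run_experiment(failure_rate=0.3))
```

Running the experiment with different failure rates answers the question “what happens when this component fails?” with numbers instead of guesses: the caller never sees an exception, and the counts show how often the fallback had to step in.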
Leaders that reinforce the experimentation culture
Team leaders, managers and coordinators all have a critical role in the experimentation and continual learning process. They must push their teams down the experimentation path, eliminate blame and encourage the search for new solutions.
“That’s the way I’ve done it for the last 15 years, and it works.”
Yeah, I heard this argument once. But if we always do things the same way, we cannot expect a different result. We will have the same problems and the same delivery time.
There is no improvement without change. There is no change without experimenting. And there is no learning without failure.
I hope you enjoyed this article, as well as the first and second parts. I also highly recommend reading The DevOps Handbook, by Gene Kim and co-authors, which inspired this series and where you can go deeper into the concepts introduced.