How can Lean support DevOps Continuous Flow?
Lean is the core tool for optimising flow in mass production manufacturing. From bottles to cars to chocolate bars, huge numbers of finished goods continuously flow in near perfect processes. What can people in IT learn from this approach? Specifically how can Lean support DevOps Continuous Flow?
This article is about how to design a system of work that speeds up delivery: essentially, how Lean can help the software delivery life-cycle run faster. I'm going to cover only a small part of the Lean approach, picking the parts I think are most relevant. If it's of interest, you can dive into an ocean of further knowledge that you might well find of value.
Let's look at a simplified IT scenario that follows the journey a new product feature takes from idea to production. Look out for the amount of time nothing productive actually happens:
It took 20 Days to go from the initial idea to go-live. Where did all the time go? Of the 160 hours available, only 16 hours were spent by people working on the feature. What about the other 144 hours when nothing happened? Does this mean people are lazy and should be made to work harder and do longer hours? Of course not. They were working on other things and had full-on days juggling different priorities.
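The arithmetic behind these figures can be sketched in a few lines. This is only an illustration: the 8-hour working day is an assumption (it is what turns 20 days into 160 hours), and `flow_efficiency` is a name chosen here for the ratio of hands-on time to elapsed time, not a term used in the article.

```python
HOURS_PER_DAY = 8  # assumption: 8 working hours per day, so 20 days = 160 hours


def flow_efficiency(touch_hours, elapsed_days, hours_per_day=HOURS_PER_DAY):
    """Share of the elapsed working time in which someone actually worked on the item."""
    return touch_hours / (elapsed_days * hours_per_day)


elapsed_hours = 20 * HOURS_PER_DAY   # total working hours between idea and go-live
waiting_hours = elapsed_hours - 16   # hours in which nobody touched the feature
efficiency = flow_efficiency(touch_hours=16, elapsed_days=20)

print(f"Elapsed: {elapsed_hours} h, worked: 16 h, waiting: {waiting_hours} h")
print(f"Flow efficiency: {efficiency:.0%}")
```

Under that assumption the feature was being worked on for roughly a tenth of the elapsed time and was waiting for the rest.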
Let's look at the business process. It's got six steps with different people handling each step, e.g. business analysts, developers, testers and DevOps engineers.
Now let's add the time each step takes just underneath. There will be a lot of variation here depending on the feature being developed. But for now let's plug the numbers in to understand the core concepts. Here we see the core process takes 16hrs start to finish to implement the feature.
This means if you booked a meeting room for 2-days and got the whole team involved so they were 100% focussed on this one feature you could deliver it in 2-days. Each person would immediately pass the baton to the next person who would immediately do their work. There would be no "dead time" waiting for the next person to be free. If all restrictions were removed, this process could deliver in 16 hours. But it doesn't. It took 20 Days.
It doesn't because there are lots of other features being worked on and this is just one of many. Each team has their own Queue of work waiting for them. There can be a lot of work in people's Queues and they need to clear the backlog before they can look at this new feature request.
Imagine you are at the airport and waiting in line for passport control. It takes but a moment when you are at the desk. But you could be waiting for hours and hours in the Queue for your turn.
Let's add these Queues into the business process diagram. Let's put the number of items in the Queue just to the left of the process step. Also we can calculate how long it will take to clear the backlog. It's just the Number in the Queue * Process Step time. Here we start to see where the Dead Time is in the Queues before the feature can be worked on.
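The backlog calculation can be written down directly. The sketch below assumes each step handles one item at a time and that every item in the Queue takes the same Process Step time; the function name is made up for illustration. The example numbers are the deployment figures used in this article (70 items waiting, a 70hr delay), from which a step time of 1 hour is inferred.

```python
def queue_delay_hours(items_in_queue, step_hours):
    """Hours needed to clear a backlog: Number in the Queue * Process Step time."""
    return items_in_queue * step_hours


# Deployment: 70 items waiting, 1 hour each (step time inferred from the 70hr delay)
deployment_delay = queue_delay_hours(items_in_queue=70, step_hours=1)
print(f"Deployment backlog clears in {deployment_delay} hours")

# An empty Queue means the new feature is picked up straight away
print(f"Empty queue delay: {queue_delay_hours(0, 6)} hours")
```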
To make a process run faster, you don't start by looking at improving the work people do and asking people to work harder. To dramatically speed up a process, you look at where time is lost while nothing happens: the time spent waiting in a Queue. About 90% of the time was lost waiting in a Queue before the feature got looked at.
The biggest backlog here is at the deployment step, with 70 items waiting to be configured for deployment, resulting in a 70hr delay, just over a week's wait. Both Development and Testing have about 3 days of work to complete before they will be able to get to work on the new feature. But it's going to take over 4 days for the feature to join the Development Queue, as it needs to sit in the earlier Queues before it's processed.
The first radical idea is you look to have empty Queues. Nothing in your backlog anywhere. Does this mean people are going to be paid and won't have a permanent Queue of days' worth of work in backlog? Yes. The whole team will be poised waiting for something to appear in their Queue. When a new work item appears they are immediately on it and it's done.
SLAs go out the window. Committing to reply within 3 working days becomes a thing of the past. Work arrives and is processed immediately. Is this really possible? Yes. But it's not easy.
So you stopped pouring new features into the process and after a couple of weeks everyone's been able to clear their backlog in the Queues. Let's start feeding new features into the process at the rate of one every two hours. That's the time it takes to complete the first step of the process. What does it look like after 10 new features have been put into the process and 20 hours have elapsed?
The good news is two features have been deployed. That is great! But look at the Queues, they've just filled-up again. There is a load of work waiting at Refinement and Development but the testers have got nothing to do! We've climbed the ladder and slid straight back down the snake. What's gone wrong? Empty Queues was a mistake, back to the old ways of working!
The second radical idea is that all the process steps should take the same amount of time. Prioritisation is running 4 times faster than Refinement, so the work immediately stacks up again. The rule is the process can only run as fast as the slowest step. Here it's Development, which takes 6 Hours. How do you synchronise all the steps to take the same amount of time?
You could hire more developers. Doubling capacity could take the step time from 6 to 3 hours. This would make Refinement the slowest step at 4 hours and would be the new blocker. This approach helps but there will always be a slowest step somewhere in the process. You should look to synchronise but it's difficult to get just right.
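Finding the slowest step, and seeing how it moves when capacity is added, can be sketched as follows. The step times are assumptions pieced together from the figures in this article: Development at 6 hours and Refinement at 4 hours are stated, the 2-hour first step and the 1-hour Prioritisation and Deployment steps are inferred, and Testing is set to 2 hours so that the total is 16. The name "Idea" for the first step is hypothetical.

```python
STEP_HOURS = {
    "Idea": 2,            # hypothetical name; 2 h is the first-step time mentioned above
    "Prioritisation": 1,  # inferred: 4 times faster than Refinement
    "Refinement": 4,
    "Development": 6,
    "Testing": 2,         # inferred so that the steps add up to 16 hours
    "Deployment": 1,      # inferred from 70 queued items giving a 70hr delay
}


def bottleneck(step_hours):
    """Return (name, hours) of the slowest step, which caps the pace of the whole process."""
    name = max(step_hours, key=step_hours.get)
    return name, step_hours[name]


before = bottleneck(STEP_HOURS)

# Doubling developer capacity is modelled here as halving the Development step time
after_hiring = bottleneck({**STEP_HOURS, "Development": STEP_HOURS["Development"] / 2})

print(before)        # ('Development', 6)
print(after_hiring)  # ('Refinement', 4)
```

The blocker does not disappear when Development speeds up; it simply moves to the next slowest step.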
The third radical idea is to change the rate at which you put new features into the process. This rate should match the slowest process step time, here Development. So add new feature requests at the rate of one every six hours maximum. That is the capacity of the process. If you want it to run faster, look at the slowest running step. Because all the Queues are empty, nothing is ever blocked. So if you put 10 new features into the process, one every six hours, after 16 hours the first feature will appear in Production. Then a new feature will appear in Production every 6 hours after that, like clockwork: continuous flow.
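A small simulation can illustrate this clockwork behaviour. It is a simplified sketch, not a model of any real team: each step works on one feature at a time in arrival order, all Queues start empty, and the step times (2, 1, 4, 6, 2 and 1 hours, in process order) are assumptions inferred from the figures in this article that add up to the 16-hour total.

```python
STEP_HOURS = [2, 1, 4, 6, 2, 1]  # assumed step times in process order; Development is the 6


def completion_times(step_hours, feed_interval, n_features):
    """Hour at which each feature reaches Production when one is fed in every feed_interval hours."""
    step_free_at = [0] * len(step_hours)  # when each step next becomes available
    finished = []
    for i in range(n_features):
        t = i * feed_interval             # hour the feature enters the process
        for s, hours in enumerate(step_hours):
            start = max(t, step_free_at[s])  # wait if the step is still busy
            t = start + hours
            step_free_at[s] = t
        finished.append(t)
    return finished


times = completion_times(STEP_HOURS, feed_interval=6, n_features=10)
gaps = [later - earlier for earlier, later in zip(times, times[1:])]

print(times[:4])  # [16, 22, 28, 34]
print(set(gaps))  # {6}
```

In this simplified model, a faster feed such as one every two hours does not bring any deployment forward: the completion times stay the same, and the extra features only wait in the Queues ahead of Development.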
To summarise
Thanks, this all makes perfect sense - referring to my original question about tracking 'time taken' I'd still suggest that this can be achieved by monitoring queue length and adjusting appropriately. But other than that - absolutely - changing pace to that of the slowest step is the key to throughput velocity. Once you've done that you can look at improving the pace of that slowest stage.