DevOps Stories – An Interview with John Weers of Micron
John Weers is Senior Manager of DevOps and Software Quality at Micron. He works to build highly capable teams that trust each other, build high quality software, deliver value with each sprint and realize there’s more to life than work.
Note – these and other interviews and case studies will form the backbone of our upcoming book “Achieving DevOps” from Apress, due out in mid 2019 and available now for pre-order!
Some initial background – I lead a team of passionate DevOps engineers/managers who are tasked with making our DevOps transformation work. While our group is only officially about 5 months old, we’ve all been working on this separately for quite a while.
Kickstarting a DevOps Culture: About every two weeks we have a group of about 15 DevOps experts that get together and talk – we call them the “design team”. That’s a critical touch point for us – we identify some problems in the organization, talk about what might be the best practice for them, and then use that as a base in making recommendations. So that’s how we set up a common direction and coordinate; but we each speak for and report to a different piece of the org. That’s a very good thing – I’d be worried if we were a separate group of architects, because then we’d get tuned out as “those DevOps guys”. It’s a different thing altogether if a recommendation is coming from someone working for the same person you do!
We’ve definitely made huge strides when it comes to being more of a learning-type organization – which means, are we risk-friendly, do we favor experimentation? When there’s a problem, we’re starting to focus less on root cause and ‘how do we prevent this disaster from happening again’ – and more on, what did we learn from this? I see teams out there trying new things, experimenting with a new tool for automation – and senior management has been really friendly towards this.
Our movement didn’t kick off with a bang but more of a whimper. About 3 years ago, we came to the realization that our quality in my area of IT was very poor. Everything we were deploying was incredibly buggy – we’re talking hundreds of defects. In another area, the issue wasn’t quality but time – the manual test cycle was so long, we’re talking weeks for any release.
You can tell we’re making progress with people’s conversations – it’s no longer about testing dates or coverage percentages or how many bugs we found this month, but “how soon can we get this into production?” – most of the fear is gone of a buggy release as we’ve moved up that quality curve. But it has been a gradual thing. I talked to everyone I could think of at conferences, including what was done at Microsoft and REI. But it took a lot of trial and error to find out what works with our organization. No one that I know of has hit on the magical formula right off the bat; it takes patience and a lot of experimentation.
Start With Testing: Our first effort was to target testing – automated testing, in our case using HP’s UFT and Quality Center platform. There was no all-hands-on-deck clarion call from on high to “Do DevOps!” at the start – that did happen, but it came two years later. We had to lay the groundwork by focusing first on quality, specifically testing.
We’re three years along now and we are making progress, but don’t kid yourself that growth or a change in mindset happens overnight. Take the phrase “Shift Left” – yes, we used that approach, for example with our manufacturing shop floor software, where we shifted the emphasis to unit testing and really shrank the amount of UI testing we do. We found that it decreased our bugs in production by 90%.
We went through a few phases – one where we had a small army of contractors doing test automation and regression testing against the UI layer. Well, the quality didn’t budge, because of the he-said/she-said type interactions between the developers and QA teams in their different silos. We tried to address issues with the interaction between different applications and systems with integration testing, and that was miserable. Then we reached a point where we realized the whole dynamic needed to be rethought.
So, we broke up the QA org in its entirety, assigned QA testers to each of our agile teams, and said – you guys will sink or swim as a team. Miraculously, our success with regression testing went up dramatically once we could write tests along with the software as it was being developed. Once the team is accountable for their quality, they find a way of making it happen.
We got a lot of resistance and pushback from the developers, which was a little surprising. There were a lot of complaints when we first started requiring developers to write unit tests along with their code, on the grounds that it wasn’t a “value added” activity. But we knew this was something that couldn’t budge – without unit tests, by the time we knew there was a problem in integration or functional testing, it would often be too late to fix it before it went out the door.
So, we held the line. Six months later, those teams that had a comprehensive unit testing suite were seeing very few errors being released to production. At this point, those teams won’t give up unit testing because it’s so valuable to them.
We found that “Shift Left” doesn’t mean throwing out all your integration and regression testing. You still need to do a little testing to make sure the user experience isn’t broken.
Culture and Energy are the Limiting Points: If you want to “Do DevOps” as a solo individual, you’ll fail. You need other experts around you to share the load and provide ideas and help. You’re only as strong as the group.
Can I start by saying, the tool is not the problem, ever? It’s always culture and energy. What I seem to find is, we can make progress in any area that I or another DevOps expert can personally inject some energy into. If I’m visible, if I talk to people, if I can build a compelling storyline – they make rapid progress. Without it, there’s nothing. It’s almost like starting a fire – you can’t just crumple up some newspaper, dump some kindling on it, light a match and walk away. You’ve got to tend it, constantly add material or blow on it to get something going.
So we’re spread very thin; energy and time are really limited, and without injecting energy things just don’t happen. That’s a very common story – it’s not that we’re lazy, or bad, or stupid – no we’re working really hard, but there’s so much work to be done we can’t spare the cycles to look at how we’re going about things. Sometimes, you need an outside perspective to provide that new idea, show a different way.
Lead By Listening: One of the base principles of DevOps is to find your area of pain and devote cycles into automating it. That removes a lot of waste, human defects, errors when you’re running a deployment. But that doesn’t resonate when I work with a team that’s new to DevOps. I don’t walk in there with a stone tablet of commandments, “here’s what you should do to do DevOps”. That’s a huge turn-off.
Instead, I start by listening. I talk to each team and ask them how they go about their work, what they do, how they do it. Once we find out how things are working, we can also identify some problems – then we can come in and talk about how automation can address that problem in a way that’s specific to that team, how DevOps can make their world better. They see a better future and they can go after it.
Tools as Bait: I just said the tool isn’t the problem, but that doesn’t mean it’s not a critical part of the solution. Since we’re often dealing with software people and techies here, use it like we do – as bait to get the changes you want rolling. It’s a tough sell to walk into a meeting and pitch small and frequent releases, for example. But what if you talk about using Charm to track changes, LiveCompare to show if your changes are covered with testing, and UFT to handle your automation? What if we can show how these tools can shrink 6-8 weeks of testing into half that time? Now, you’ve got some attention!
About a year ago, our CIO set a mandate for the entire organization to excel at both DevOps and Agile. But the architecture wasn’t defined, and no tools were specified. Which is terrific – DevOps and Agile are just a way of improving what we can do for the business. So at Micron you’ll see each team having a different tech stack and some variation in the tools based on what their pain point is and what the customers are needing.
The rule is that each main group in IT should favor a toolchain, but should choose software architecture that fits their business needs. For R&D, for example, the focus is on getting changes into production as fast as possible. This is the cutting edge of the blade, so automation and fast turnaround cycles are everything. For them, microservices are a terrific option and the way that their development happens – it fits the business outcomes they want.
There’s a misconception out there that DevOps and COTS are in different worlds. We’ve found them completely compatible. Just for example, with one business team we found that they commonly spent months testing packages of changes for SAP. We were able to shrink their release cycle by weeks, once we automated their pain points – testing with UFT, automating their stepforms, and using LiveCompare.
Do You Need the Cloud? They’ll tell you that DevOps means the cloud; you can’t do it without rapid provisioning and that means scalable architecture and that means massive cloud-based datacenters. But we’re almost 100% on-prem. For us, not just from a legal perspective but from an operating standpoint, we must keep our software, especially R&D, privately hosted. That hasn’t slowed us down much. It would certainly be more convenient to have cloud-based data centers and rapid provisioning, but it’s not required by any means.
Metrics We Care About: We focus on three things – lead time and cycle time, the two standards, and then a third metric we watch, production impact. We want to know the impact in terms of lost opportunity when a fab plant has to slow down or stop because of a change or problem. That resonates very well with management; it’s something everyone can understand.
But I tell people to be careful about metrics. There’s never been a metric that we haven’t fallen in love with and pushed to the point of absurdity! We’ve dabbled in tracking defects, bug caps, code coverage, volume of unit testing, number of regression tests – and that’s inevitably led to bad behavior. Just for example, let’s say we are tracking and displaying volume of regression tests. Suddenly, rather than creating a single test that makes sense, you start to see tests getting chopped up into dozens of tests with one step in them so the team can hit a volume metric point. With bug counts – developers would classify them as misunderstood requirements rather than admitting something was an actual bug. When we went after code coverage, people would write unit tests that would bring the entire module of code under test and run that as one gigantic block to hit their numbers.
So, we decided to keep it simple – we’re only going to track these 3 things – lead time, cycle time, production impact – and talk with teams individually to find out what their true quality is.
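To make the three metrics concrete, here is a minimal Python sketch of how they might be computed from tracker data. Everything in it is an assumption for illustration, not Micron's actual tooling: the `WorkItem` fields are hypothetical, lead time is taken as request-to-production, cycle time as work-started-to-production, and production impact is approximated as minutes of fab downtime attributed to a change.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class WorkItem:
    """One change, with hypothetical timestamps exported from a tracker."""
    requested: datetime            # when the request was raised
    started: datetime              # when a team began work on it
    deployed: datetime             # when it reached production
    downtime_minutes: float = 0.0  # assumed proxy for production impact


def lead_time(item: WorkItem) -> timedelta:
    """Assumed definition: time from request to production."""
    return item.deployed - item.requested


def cycle_time(item: WorkItem) -> timedelta:
    """Assumed definition: time from start of work to production."""
    return item.deployed - item.started


def summarize(items: list[WorkItem]) -> dict[str, float]:
    """Median lead and cycle time in days, plus total downtime in minutes."""
    seconds_per_day = 86400
    return {
        "median_lead_days": median(
            lead_time(i).total_seconds() for i in items) / seconds_per_day,
        "median_cycle_days": median(
            cycle_time(i).total_seconds() for i in items) / seconds_per_day,
        "total_downtime_minutes": sum(i.downtime_minutes for i in items),
    }
```

Medians are used here rather than averages so that one unusually slow change does not dominate the picture; that is a design choice of this sketch, not something the interview prescribes.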
I’ve learned a lot about metrics over the years from Bob Lewis’ IS Survivor columns. Chief among those lessons is to be very, very careful about the conversation you have with every metric. You should determine what success looks like, and then generate a metric that gives you a view of how your team is working. All subsequent conversations should be around “are we being successful” and not “are we achieving the metric.” The worst thing that can happen is that I get what I measured.
PMO Resistance: Sometimes we see some resistance from the BSA/PM layer. That’s usually because we’re leading with our left foot – the right way is to talk about outcomes. What if we could get code out the door faster, with a happier team, with less time testing, with fewer bugs? When we lead with the desired outcome, that middle layer no longer tells us to get lost, because we’re proposing changes that will make their lives easier.
I can’t stress this enough – focus on the business outcomes you’re looking for and eliminate everything else. Only pursue a change if the outcome fits one of those business needs.
When we started this quality initiative, our release cycle was – I wish I was exaggerating – about 300 days. We would invest a huge amount of testing at every individual fab plant before we would deploy – and there are at least 15 of these deployment locations. Today, once a feature is complete it goes into beta at one site for a short period, and then it gets deployed at all our sites – in total, about 3 weeks. But that speed couldn’t happen unless our quality had gone up. We had to beef up our communication loop with the fab centers so that if there’s a problem we can stop it before it gets replicated.
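The beta-then-everywhere release flow described above can be sketched roughly as follows. This is a hypothetical illustration only: the `deploy` and `is_healthy` callables stand in for whatever deployment tooling and fab feedback loop a team actually has, and the health check compresses the "short beta period" into a single call.

```python
from typing import Callable, Sequence


def staged_rollout(
    sites: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> list[str]:
    """Deploy to the first site as a beta, then to the rest while sites stay healthy.

    Returns the sites that received the release, in order.
    """
    if not sites:
        return []

    beta, rest = sites[0], sites[1:]
    deploy(beta)
    deployed = [beta]
    if not is_healthy(beta):
        # Stop here so the problem is not replicated to other sites.
        return deployed

    for site in rest:
        deploy(site)
        deployed.append(site)
        if not is_healthy(site):
            break  # halt the rollout at the first unhealthy site
    return deployed
```

With made-up site names, `staged_rollout(["fab-a", "fab-b", "fab-c"], deploy, is_healthy)` would touch only `fab-a` if the beta reports a problem.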
The Role of Communication: You can’t overstate credibility. As we create less and less impact with changes we deploy, our relationship with our customers – the fab plant managers and IT staff – gets better and better. Just for example, three years ago we had just gone through a disastrous communication tool patch that had grounded an entire fab plant for nearly an entire shift, 8 hours. I came to a plant IT director a year later and told them that we thought the quality issues were taken care of and enlisted their help.
Our next deployment required 2 minutes of downtime. And that’s been the last real impact we’ve had on them during deployment for almost 3 years – now our deployments are automated and invisible to our users there. Slowly building up that credibility and a good reputation for caring about the people you’re impacting downstream has been a big part of our turnaround.
Cross-functional Teams: It’s commonly accepted that for DevOps to work you have to be cross-functional. Well, Micron is like many other companies in that we use a Shared Services model – we have several agile teams that include development, Ops and QA roles, an infrastructure team, and Operations which handles trouble tickets from the fabs – each with their own director. This might be a pain point in many companies, but for us it’s just how we work. We’ve learned to collaborate and share the pain so that we’re not hucking work over the fence.
For example, in my area every week we have a recap meeting which Ops leads, where they talk about what’s been happening in production and work out solutions with the dev managers in the room. We believe very strongly in that saying, “let people pull the organization, don’t push” – so we haven’t pushed any major reorg since our CIO initially set up the organization, and we haven’t had to break up the company into fully cross-functional groups.
Purists might object to this – we haven’t combined Development and Operations, so can we really say that we are “doing DevOps”? If it would help us drive better business outcomes, that org reshuffling would have happened. But for us, since the focus is on business outcomes, not on who we report to, our cross-team collaboration is excellent. We’re all talking the same language, and we didn’t have to reshuffle – which would have been very disruptive and risky. The point is to focus on the business outcomes; if you need to reorg, it will be apparent when teams talk about their pain points.
If It Comes Easy, It Doesn’t Stick: Circling back to energy – sometimes I sit in my office and wish that culture was easier to change. It’d be so great if there was a single metric we could align on, or a magical technique where I could flip a switch and everyone would get it and catch fire with enthusiasm. Unfortunately, that silver bullet doesn’t exist.
Sometimes I listen to Dave Ramsey on my way in to work – he talks about changing the family tree and getting out of debt. Something he said though resonated with me – “If it comes easy, it doesn’t stick.” If DevOps came easy for us, it wouldn’t really have the impact on our organization that we need. There’s a lot of effort, thought, suffering – pain, really – to get any kind of outcome that’s worth having.
As long as you focus on the outcome, I believe DevOps is a fantastic thing for just about any organization. But, if you view it as a recipe that you need to follow, or a checklist – you’re on the wrong track already, because you’re not thinking about outcomes. If you build from an outcome that will help your business and think backwards to the best way of reaching that outcome – then DevOps is almost guaranteed to work.
References:
Bob Lewis’ IS Survivor is a great site that we enjoy as well, especially on process and change management. See http://issurvivor.com/