Constructing a DevOps Tactical View using Balanced Scorecard
If you are responsible for justifying investment in implementing DevOps best practices, this article will be of interest to you.
One of the first and most important questions every CEO must deal with is an economic one: why should the company invest?
I was asked exactly that question a few weeks ago. My immediate answer was "The main goal of DevOps is to improve velocity and decrease lead time". But to my surprise, my CEO told me: "We're satisfied with our current velocity and lead time. There is no urgent need to deliver faster". And now? What could I tell him? Why DevOps?
I froze for a few seconds and then told him something about quality improvement, but he cut me off and said that I needed stronger and more structured arguments, without which he couldn't approve my project. But, by the Lord of Light, he gave me a tip: figure out how to use BSC (Balanced Scorecard) in your project.
After reviewing my old Balanced Scorecard material from the MBA I took 7 years ago, I remembered that it typically consists of four high-level perspectives that highlight the key practices and objectives needed to build a strategy.
http://www.balancedscorecard.org
OK. I thought, "This will be easy, I just need to put some indicators inside each perspective". And then I started thinking, thinking and thinking. But for a while I couldn't come up with any DevOps KPIs (Key Performance Indicators) that were not related to improving velocity and lead time (deployment frequency, for example).
After a lot of searching on Google, I unfortunately didn't find much, so I started working out the answer myself.
First, I started thinking about making customers happy and how DevOps practices could contribute to that. I found some answers in the classic project management triangle: Scope, Time and Cost, with Quality at its center.
Typical Triangle of Project Management
With these concepts as my main line of thinking, I defined some questions to answer:
- How could DevOps contribute to Quality?
- How could DevOps contribute to delivering Scope on Time?
- How could DevOps contribute to decreasing cost?
When I think about Quality, I think of a stable system without bugs, or with fewer bugs. To get a system with fewer bugs we need to test it a lot, and to do that we need production-like environments and automated integration tests. Thinking with a DevOps mindset, I figured out my first KPI:
EAI - Environment (DEV and QA) availability index
This index defines the availability of production-like environments to our DEV and QA teams. At least three basic factors influence environment availability, so I decided to make it a composite index made up of:
- TTT - Time To Trunk
Continuous Integration says you must integrate as early as possible, so TTT measures how long a feature created on a feature branch takes to reach the trunk branch.
- TPEPC - Time provisioning environment per cycle index
TPEPC defines how much time per "lead time" cycle we spend provisioning environments for developers and for the QA team. Suppose your "lead time" is 2 months and your QA team spends 3 weeks provisioning environments. I would guess your QA team doesn't have enough time left to test properly.
- AADI - Application Automatically Deployable index
AADI measures how many of your applications you can deploy with a click. Imagine you have 200 applications and can deploy only 10 of them that way: your AADI is 10/200 = 0.05.
- AADT - Average Application Deployment Time
AADT is the average time it takes to deploy an application: after you click the "deploy" button or your Jenkins schedule is triggered, how long does a deployment take to finish?
So, EAI is now composed of TTT, TPEPC, AADI and AADT. But EAI doesn't cover automated tests, so we need another KPI:
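The EAI components listed above can be sketched in Python. This is only a minimal sketch under assumptions: the article does not say how the components are combined into a single EAI number, so the code computes each one separately. The function names, the input shapes and the choice of days and minutes as units are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Sequence


def time_to_trunk(branch_created: datetime, merged_to_trunk: datetime) -> timedelta:
    """TTT: elapsed time between creating a feature branch and merging it to trunk."""
    return merged_to_trunk - branch_created


def tpepc(provisioning_days: float, lead_time_days: float) -> float:
    """TPEPC: fraction of one lead-time cycle spent provisioning environments."""
    if lead_time_days <= 0:
        raise ValueError("lead_time_days must be positive")
    return provisioning_days / lead_time_days


def aadi(one_click_apps: int, total_apps: int) -> float:
    """AADI: fraction of applications that can be deployed with a click."""
    if total_apps <= 0:
        raise ValueError("total_apps must be positive")
    return one_click_apps / total_apps


def aadt(deploy_minutes: Sequence[float]) -> float:
    """AADT: average deployment duration, in minutes (unit is an assumption)."""
    if not deploy_minutes:
        raise ValueError("at least one deployment is required")
    return sum(deploy_minutes) / len(deploy_minutes)


# The article's own examples: 10 of 200 applications, and 3 weeks of
# provisioning in a 2-month lead time (assumed here to be 21 and 60 days).
print(aadi(10, 200))   # 0.05
print(tpepc(21, 60))   # 0.35
```

Under that 60-day assumption, more than a third of the cycle goes to provisioning, which is the kind of number that makes the QA time squeeze visible.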
PATI - Performing Automated Tests Index
PATI is about whether automated tests are really being executed. If our automated tests must run in every nightly continuous integration build, and over 30 days I found only 4 executions, PATI will be (4/30) * 100 = 13.33%.
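The PATI arithmetic can be written as a small function. This is a sketch, and the function and parameter names are hypothetical; the only thing taken from the article is the formula, executed runs divided by expected runs, times 100.

```python
def pati(executed_runs: int, expected_runs: int) -> float:
    """PATI: percentage of expected automated test runs that actually ran."""
    if expected_runs <= 0:
        raise ValueError("expected_runs must be positive")
    return executed_runs / expected_runs * 100


# 4 executions found over 30 nightly builds, as in the example above.
print(f"{pati(4, 30):.2f}%")  # 13.33%
```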
OK. These KPIs are all about Quality. But I need to answer the "Scope on Time" question too.
DOI - Development Operational Index
DOI measures how DevOps affects the day-to-day operational work of our developers. Since it is directly related to productivity, it should be composed of other indexes:
- ACT - Average Construction Time (build)
It's how long a build takes to complete. (You can get it from the Jenkins average build time.)
- TSIVC - Time spent on issues with version control
It's how long our developers are blocked by version control issues. (You can get it from your DevOps/Tools Support Team; don't ask your developers to fill in yet another timesheet entry.)
- TSICI - Time spent on issues with continuous integration
It's how long our developers are blocked by continuous integration issues. (You can get it from your DevOps/Tools Support Team.)
- NNCDOP - Number of non-compliance in DevOps practices
It's gathered from an audit process. Obviously, you first need some DevOps procedures defined, such as source control branching patterns, merge procedures, CI patterns, and release and delivery procedures.
So, with these 4 metrics we now have our DOI KPI.
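One way to keep the four DOI inputs together is a small record type. This is a sketch under assumptions: the article does not give a formula for combining them, so the code only collects the raw components. The class name, field names and units (minutes for builds, hours for blocked time) are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class DoiComponents:
    act_minutes: float   # ACT: average build duration
    tsivc_hours: float   # TSIVC: hours blocked by version control issues
    tsici_hours: float   # TSICI: hours blocked by continuous integration issues
    nncdop: int          # NNCDOP: non-compliances found by the audit


def collect_doi(
    build_minutes: Sequence[float],
    vc_issue_hours: Sequence[float],
    ci_issue_hours: Sequence[float],
    audit_findings: Sequence[str],
) -> DoiComponents:
    """Summarize raw measurements for one period into the four DOI components."""
    return DoiComponents(
        act_minutes=mean(build_minutes),
        tsivc_hours=sum(vc_issue_hours),
        tsici_hours=sum(ci_issue_hours),
        nncdop=len(audit_findings),
    )


# Made-up sample data, for illustration only.
doi = collect_doi([10, 14], [1.5, 0.5], [3.0], ["branch naming", "manual release"])
print(doi)
```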
Looking at the Balanced Scorecard, I could quickly see that EAI, PATI and DOI belong in the "Internal Business Process perspective". All of them are linked to improving "Quality" or "Scope on Time" in the "Customer perspective".
My next step was to go back to customer happiness and ask whether there is another DevOps KPI that directly matches the "Customer perspective" of BSC.
OK, I thought our customers would be happy if our downtime during version upgrades was as low as possible. So, I defined:
ASDDI - Average system downtime during upgrade
This KPI is not directly linked to quality or developer productivity; instead, it's related to the Ops side. How effective is our Ops team?
ASDDI is very important, because if the system is down, stakeholders cannot use it. Another point: if ASDDI is high and an unexpected issue occurs, the downtime could multiply. Imagine that your ASDDI is 4 hours and an unexpected issue such as a deployment problem comes up. How long will it take to fix? And how long will the upgrade retry take? So, you must keep your ASDDI as low as possible.
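ASDDI is an average, so it can be computed from a log of upgrade windows. This is a minimal sketch, assuming each upgrade is recorded as a pair of timestamps (when the system went down, when it came back); that record format and the function name are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple


def asddi(upgrade_windows: Sequence[Tuple[datetime, datetime]]) -> timedelta:
    """ASDDI: average downtime per upgrade, given (down_at, up_at) pairs."""
    if not upgrade_windows:
        raise ValueError("at least one upgrade window is required")
    total = sum((up - down for down, up in upgrade_windows), timedelta())
    return total / len(upgrade_windows)


# Made-up sample data: one 4-hour upgrade and one 2-hour upgrade.
windows = [
    (datetime(2024, 1, 6, 22, 0), datetime(2024, 1, 7, 2, 0)),
    (datetime(2024, 2, 3, 22, 0), datetime(2024, 2, 4, 0, 0)),
]
print(asddi(windows))  # 3:00:00
```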
Finance perspective
I believe the "Finance perspective" of BSC should be defined by the Board, so DevOps would not have a direct KPI in that perspective, but all DevOps KPIs must be aligned with its key objectives. I defined a hypothetical and simplified view in which the financial objectives are "Decreasing costs" and "Revenue Growth".
Learning and Growth perspective
Now, only the "Learning and Growth perspective" is missing. The typical objectives of this area are "Employee capabilities" and "Employee Satisfaction". How could DevOps contribute to that? If DevOps practices are really improving our Developers, QA and Ops teams, I would expect them to be happy with our collaborative DevOps approach and best practices. If there is a direct relation between DevOps and happiness, we can measure it using a TSI - Technical Satisfaction Index.
TSI could be gathered using an anonymous questionnaire; something like a "Health Team Score" could be used too.
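The article does not say how questionnaire answers become a TSI value. One simple possibility, shown here purely as an assumption, is to average all answers on a 1 to 5 scale. The scale, the function name and the sample answers are hypothetical.

```python
from statistics import mean
from typing import Sequence


def tsi(responses: Sequence[Sequence[int]], scale_min: int = 1, scale_max: int = 5) -> float:
    """Average answer across all anonymous responses, on the survey's own scale."""
    answers = [answer for response in responses for answer in response]
    if not answers:
        raise ValueError("at least one answer is required")
    if any(a < scale_min or a > scale_max for a in answers):
        raise ValueError("answer outside the survey scale")
    return float(mean(answers))


# Two anonymous respondents, two questions each (made-up data).
print(tsi([[4, 5], [3, 4]]))  # 4.0
```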
And finally, after defining all these KPIs, I put each one into its perspective.
Now, the DevOps Balanced Scorecard should look like this:
And it's finished. At least as a first draft. Now the challenge is to figure out which improvement actions will keep each of them in a green state.
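The placement of KPIs described above, together with a simple green-state check, can be sketched as data. The mapping of KPIs to perspectives follows the article; everything else is an assumption for illustration: the Kpi class, the target values, and the rule that a KPI is green when it meets its target.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# KPI placement as described in the article. The Finance perspective holds
# Board-level objectives only, so it has no direct DevOps KPI.
SCORECARD_LAYOUT: Dict[str, List[str]] = {
    "Finance": [],
    "Customer": ["ASDDI"],
    "Internal Business Process": ["EAI", "PATI", "DOI"],
    "Learning and Growth": ["TSI"],
}


@dataclass(frozen=True)
class Kpi:
    name: str
    value: float
    target: float            # hypothetical target, not from the article
    higher_is_better: bool

    def is_green(self) -> bool:
        """Green when the value meets the target in the right direction."""
        if self.higher_is_better:
            return self.value >= self.target
        return self.value <= self.target


def needs_action(kpis: Sequence[Kpi]) -> List[str]:
    """Names of KPIs that are not green and need an improvement action."""
    return [kpi.name for kpi in kpis if not kpi.is_green()]


# Made-up values and targets: PATI in percent, ASDDI in hours.
kpis = [
    Kpi("PATI", value=13.33, target=90.0, higher_is_better=True),
    Kpi("ASDDI", value=2.0, target=4.0, higher_is_better=False),
]
print(needs_action(kpis))  # ['PATI']
```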
I hope this article has inspired you to take another look at which KPIs you are using to measure DevOps from a tactical point of view and how you could use them with BSC.
So, make your DevOps awesome, use BSC and shape your culture.
Very informative post (y)
Hi there Denny, That's really a great article. I think the way you proposed the KPIs is interesting since most of them are related to time. Therefore, all of those that have a time component can be directly translated to a financial perspective if we use the Cost of Delay concept. Of course, defining the cost of delay itself is very hard. It can be measured from the customer's perspective (ideally), but it can also be measured or have its cost compounded from the internal company's perspective, like loss of revenue due to delays, for example. Maybe a shallow analysis can be made; however, a more comprehensive analysis should include the variation of cost over time, like depreciation of value, interest and others. My main concern about your proposition, though, is regarding the automation to gather this data. It's essential to have the latest perspective up to date at all times. Congrats on the job. All the best, Samuel Crescêncio
Great article !