Evaluate Software Quality through Analytics
Author: Pulkit Saxena
Introduction
Defects have been a buzzword across the IT industry since its inception. In this article, I evaluate key performance indicators for measuring software quality. How can we borrow concepts from business analytics to gain more insight from software quality metrics collected in past cycles? Is there any significance in using analytics? How can we convert that insight into action? Can we predict the number of defects from past behavior? Can we capture human behavior through data? Can software KPIs be linked to performance evaluation? I will try to answer these questions in this article.
Strategic Assumption
The article presents inferences based on data samples created to illustrate the theory. I encourage readers to take the findings and concepts in this article and validate them against their own real data sets. I used RapidMiner to analyze the sample data; readers can use the tool of their choice, such as R, Python, or SPSS.
Business Objectives
Management continuously tracks key metrics to monitor software quality and takes corrective action based on them. “Defect count” is the primary parameter that is collected and monitored. Reports focus on the defect count at a point in time; if the count crosses a management-defined threshold, attention shifts to it and developers are asked to fix defects as soon as possible so that the count drops back below the threshold.
Management thresholds are primarily based on expert judgment and intuition from past projects. Below are the primary business objectives covered in my analysis:
• Define the optimal threshold for defect count.
• Resource optimization. Allocate resources effectively, as resources are always constrained by time and cost.
• Identify quality problems and pinpoint the affected sub-system.
Agile Methodology
Agile is the current trend in software development, and it is organized around sprints. Once a sprint completes, code from the sub-system is released to integration.
The integration branch is the central code repository where different teams push the code for their developed features. Teams usually work on child branches of the parent repository; once a feature is developed and tested in the sprint cycle, its code is pushed to the central repository and becomes available to other teams as part of the entire application.
These sprints continue until the final release is completed. At the end of the release, you have working software. Once the release is rolled out to production, work on the next release begins.
Testing is done during the sprint itself, and any defect opened in the sprint cycle is known as a “Sprint Defect.” Once the code is pushed to the integration branch, any additional bug found at a later stage is called a “Defect.” These are the bugs that were not caught in the sprint cycle and are part of defect leakage. In an ideal world, the defect count would be zero, but that is far from reality.
Hence management sets a limit on the number of open defects. This limit acts as a gatekeeper: developers can continue to develop new features as long as the count is below the limit. Once the threshold is crossed, defect fixing becomes the priority and resources are diverted from development to defect fixing.
Management's target for achieving software quality is to reduce the number of defects opened at later stages.
Defects opened at a later stage are COSTLIER to fix than those found at the beginning.
Fig : Desired Defect Pyramid
The entire focus should be on catching as many defects as possible during the sprint cycle. Defects opened at later stages attract attention and are costly to fix. Bugs in production may even impact the company's bottom line.
Fig : Undesired Defect Pyramid
Quality Metrics
Defect count has always been the one software metric that managers favor. But does the defect count at a point in time show the entire picture? By looking at this metric alone, we constrain ourselves to fixing only the symptom and may lose the opportunity to identify an “undesired defect pyramid.”
Business Analytics
Analytics techniques can help us gather insight from metrics collected over a period, and predictive models can help us anticipate future behavior. For my analysis, I have used linear regression to define a model that helps me achieve the business objectives. As per the workflow, defects not caught during the sprint stage surface at the integration stage, when the code is pushed from the team branch to the integration branch.
So, our linear model equation is
y = mx + b
y = defect count at integration stage
x = sprint defect count
Applying regression analysis to past data comprising integration defect counts and sprint defect counts, we can estimate the slope m and the intercept b.
Coming back to our business objective: define the optimal threshold for defect count.
Our regression model gives us the optimal threshold, i.e., the value of b. If there are no sprint defects, i.e., x is zero, then the number of defects at the integration stage is given by b. This is the value management can use to redefine its threshold defect count.
For my analysis, I have sample data for three teams, and the regression models for the teams are as follows. Say management defines a threshold of 15 defects at the integration stage.
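To make the fitting step concrete, here is a minimal Python sketch of ordinary least squares for a single predictor. The function name fit_line and the sample numbers are illustrative assumptions; they are not the sample data analyzed in this article, and any statistics tool should give the same m and b for the same input.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + b for one predictor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean; zero means every x is identical.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    # Co-variation of x and y around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Hypothetical history for one team (made-up values for illustration):
sprint_defects = [2, 4, 6, 8, 10]        # x, one value per sprint
integration_defects = [5, 4, 4, 2, 2]    # y, one value per sprint

m, b = fit_line(sprint_defects, integration_defects)
print(f"slope m = {m:.2f}, intercept b = {b:.2f}")
```

The intercept b returned here is the quantity the article proposes as the data-driven threshold, and the sign of m indicates whether integration defects fall or rise as sprint defects increase.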
Analysis for Team A
The regression model shows that for Team A the intercept is below the management-defined threshold of 15. As the slope (regression coefficient) is negative, i.e., -0.44, we can infer that the team is working toward the desired pyramid.
y = -0.44 * x + 6
So if 0.44 * x = 6, i.e., x is about 13.6, the model predicts no defects at the integration stage. This is an ideal condition; but say there are 10 defects opened in the sprint: then y = -0.44 * 10 + 6 = 1.6. Rounding, the predicted defect count at the integration stage is 2, and hence the team is on its way to achieving the desired pyramid.
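The Team A arithmetic can be checked with a few lines of Python. This is only a sketch: the slope and intercept are the Team A values quoted above, while the function and variable names are hypothetical.

```python
TEAM_A_SLOPE = -0.44     # m, from the Team A model
TEAM_A_INTERCEPT = 6.0   # b, from the Team A model


def predict_integration_defects(sprint_defects):
    """Predicted integration-stage defects: y = m*x + b."""
    return TEAM_A_SLOPE * sprint_defects + TEAM_A_INTERCEPT


# Prediction for 10 sprint defects, rounded to a whole defect count.
predicted = predict_integration_defects(10)
predicted_count = round(predicted)

# Sprint defect count at which the model predicts zero integration defects:
# 0 = m*x + b  =>  x = -b / m
zero_crossing = -TEAM_A_INTERCEPT / TEAM_A_SLOPE

print(f"predicted y at x=10: {predicted:.1f} (about {predicted_count} defects)")
print(f"y reaches zero at x = {zero_crossing:.1f}")
```

Note that the model is a straight line, so beyond the zero crossing it would predict negative counts; in practice the prediction should be read as zero there.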
Analysis for Team B
The data shows an intercept of 30, which means the defect count is above the threshold, requiring management to re-align resources. The prescriptive action would be to call a retrospective for root cause analysis of the defects, or to deploy more developers. And that is how we achieve our second business objective:
• Resource optimization. Allocate resources effectively, as resources are always constrained by time and cost.
Analysis for Team C
As the slope is positive, the number of defects at the integration stage grows with the number of sprint defects. This is a red alert, and management should deploy its best developers and testers to the identified sub-system. If the trend continues, the problem may be more complex. An investigation is needed into why defect leakage is increasing and why more defects are being opened at the integration stage instead of the sprint stage. Hence, we achieve our third business objective:
• Identify quality problems and pinpoint the affected sub-system.
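The three team readings follow the same two checks: compare the intercept with the management threshold, and look at the sign of the slope. The sketch below encodes that reasoning in Python. The function name and the wording of the findings are assumptions, and so are the Team B slope and both Team C coefficients, since the article only gives Team B's intercept and the sign of Team C's slope.

```python
from typing import List

MANAGEMENT_THRESHOLD = 15  # integration-stage defect limit used in the article


def assess_team(slope: float, intercept: float,
                threshold: float = MANAGEMENT_THRESHOLD) -> List[str]:
    """Turn a fitted slope and intercept into the article's prescriptions."""
    findings = []
    if intercept > threshold:
        # Team B case: baseline integration defects exceed the limit.
        findings.append("intercept above threshold: re-align resources")
    if slope > 0:
        # Team C case: integration defects grow with sprint defects.
        findings.append("positive slope: investigate defect leakage")
    if not findings:
        # Team A case: low baseline and a falling trend.
        findings.append("on track toward the desired pyramid")
    return findings


teams = {
    "Team A": (-0.44, 6.0),  # both values from the article
    "Team B": (0.0, 30.0),   # intercept from the article; slope is a placeholder
    "Team C": (0.5, 10.0),   # placeholders; only the positive sign is from the article
}

for name, (slope, intercept) in teams.items():
    print(name, "->", "; ".join(assess_team(slope, intercept)))
```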
Conclusion
So I ask: would you rather fight blind, or have a weapon to win the battle? Regression modeling gives us the power to analyze and predict impact. The IT industry is dynamic, and using analytics to back your decisions can bring higher profits and credibility. Data speaks for itself.
Usually, when a project begins and the team is new, the number of defects is higher, but as the team gains experience the number of defects should go down. If you want to capture this in your analysis as well, you can define a performance variable, with values from, say, 1 for new hires to 5 for experienced staff, and add it to your regression equation. You can also clean the data, for example by removing outliers, before fitting the model that defines the threshold.
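Adding the experience variable turns the model into y = m1*x + m2*e + b, where e is the 1 to 5 experience score. A minimal Python sketch of that fit is shown below, solving the least-squares normal equations with plain Gaussian elimination. All names and all data values are hypothetical; the sample points were constructed to lie exactly on y = 0.5*x - 1.0*e + 8 so that the result is easy to verify.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_two_predictors(sprint_defects, experience, integration_defects):
    """Least-squares fit of y = m1*x + m2*e + b; returns [m1, m2, b]."""
    rows = [[float(x), float(e), 1.0]
            for x, e in zip(sprint_defects, experience)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, integration_defects))
           for i in range(3)]
    return solve_linear_system(xtx, xty)


# Hypothetical observations (made-up values for illustration):
sprint_defects = [4, 10, 6, 12, 8, 2]
experience = [1, 1, 3, 5, 4, 5]          # 1 = new hires, 5 = experienced
integration_defects = [9, 12, 8, 9, 8, 4]

m1, m2, b = fit_two_predictors(sprint_defects, experience, integration_defects)
print(f"m1 = {m1:.2f}, m2 = {m2:.2f}, b = {b:.2f}")
```

A negative m2 in such a fit would indicate that more experienced teams leak fewer defects to integration, which is the effect described above.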