Understanding Software Code Stability using ODC (Orthogonal Defect Classification) and Defect Severity

In an earlier post here on LinkedIn, I advocated for a defect classification schema that included a category called “Requirements/Design/Code” (hereafter “Code” defects), which I defined as any defect where a code change was made to fix the defect, even if a requirement and/or design element was also changed, whether as a formal change request or not.

In a future post, I’ll talk about how Orthogonal Defect Classification can be used to defuse SDLC tensions between teams over why and how Code defects occur, and to productively identify the right corrective actions for continuous improvement across the entire life cycle.

As a first step in that direction, we start by classifying each Code defect fix using the Orthogonal Defect Classification categories for “Defect Type”. This classification provides the basis for a number of follow-on analyses that tell us useful things about what is going well (and not so well) in our SDLC processes.

The first of these analyses uses the “Defect Type” information to gain insight into the functional stability of the system code being tested. This information, along with a similar analysis of Defect Severity, gives teams an empirical way to demonstrate to project stakeholders that a software product or system is, or is not, ready to exit the test phase and move to production.

One of the major advantages of this approach is the common-sense logic underpinning it. While data scientists may have much more sophisticated techniques to answer the question “Is our software really ready for prime time?”, this method is very easy to model for the non-data scientist working in testing (requiring no special tooling beyond moderate Excel skills), and very easy for potentially more math-challenged business stakeholders to comprehend when making important Go/No Go determinations about releasing software into production.
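To make that concrete: while an Excel pivot table is all this method really requires, the same tabulation is only a few lines in Python/pandas for teams that prefer scripts. This is a minimal sketch; the column names and sample records are illustrative assumptions, not real project data:

```python
# A minimal pandas equivalent of the Excel pivot-table tabulation.
# Column names ("week", "defect_type", "severity") and the sample
# records are illustrative assumptions, not real project data.
import pandas as pd

defects = pd.DataFrame(
    [
        {"week": 1, "defect_type": "Function/Class/Object", "severity": 1},
        {"week": 1, "defect_type": "Assignment/Initialization", "severity": 2},
        {"week": 2, "defect_type": "Function/Class/Object", "severity": 1},
        {"week": 2, "defect_type": "Checking", "severity": 3},
        {"week": 3, "defect_type": "Algorithm/Method", "severity": 2},
    ]
)

# Defect arrivals per week, per ODC Defect Type -- the same crosstab a
# tester would build with an Excel pivot table.
arrivals = pd.crosstab(defects["week"], defects["defect_type"])
print(arrivals)

# The identical layout works for Severity-based trending.
print(pd.crosstab(defects["week"], defects["severity"]))
```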

How to Classify Code Defects using ODC:

Below is a summarized version of the ODC “Defect Type” classification schema (available for free download in its original form, which also includes examples, from IBM Research here). I’ve also augmented the definition information to include some additional important characteristics of these defects:

- Assignment/Initialization: a value was assigned incorrectly or not assigned at all.
- Checking: missing or incorrect validation of parameters or data in conditional statements.
- Algorithm/Method: an efficiency or correctness problem fixable by reimplementing an algorithm or local data structure, without a design change.
- Function/Class/Object: a significant capability is missing or wrong; the fix requires a formal design change.
- Timing/Serialization: necessary serialization of a shared resource was missing or incorrect.
- Interface/O-O Messages: an error in communication between modules, components, device drivers, or objects.
- Relationship: a problem in the association among procedures, data structures, or objects.
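For teams that keep the schema alongside their analysis scripts rather than in a spreadsheet, a minimal sketch might encode the summarized categories as a lookup table. The one-line definitions below are paraphrased from the standard IBM schema; the helper name is my own illustrative choice:

```python
# A sketch encoding the summarized ODC Defect Type categories as a
# lookup table for use in analysis scripts. The one-line definitions
# are paraphrased from the standard IBM Research schema; the helper
# name and fallback text are illustrative choices.
ODC_DEFECT_TYPES = {
    "Assignment/Initialization": "Value assigned incorrectly or not assigned at all.",
    "Checking": "Missing or incorrect validation of parameters or data.",
    "Algorithm/Method": "Fixable by reimplementing an algorithm or local "
                        "data structure, without a design change.",
    "Function/Class/Object": "Significant capability affected; requires a "
                             "formal design change.",
    "Timing/Serialization": "Necessary serialization of a shared resource "
                            "missing or incorrect.",
    "Interface/O-O Messages": "Error communicating between modules, "
                              "components, device drivers, or objects.",
    "Relationship": "Problem in the association among procedures, data "
                    "structures, or objects.",
}

def describe(defect_type: str) -> str:
    """Return the summarized definition for a recorded Defect Type."""
    return ODC_DEFECT_TYPES.get(defect_type, "Unclassified -- review the fix.")
```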

An Example of a Very Unstable System (Using ODC “Defect Type”):

Below is a fictional representation of a test phase that has the wrong kinds of proportions and trends with respect to system stability. It is not realistic, in that it would be extremely unlikely for any project to survive to the end of the phase without a significant replan if these patterns and trends were materializing, if only because the impact on the defect backlog and the blocking impacts on test progress would almost certainly be too significant to absorb with normal contingency planning.

In a real project, the trends measured (shown here by week as an example, though they can also be measured in days, months, or any other period that provides a meaningful way to observe trending over time relative to volumes for the project) would almost certainly not be this extreme or uniform; I created the data this way to make the interpretive information you should look for as easy as possible to see.

Additionally, this example assumes the functionality to be tested is NOT delivered incrementally over the timeframe measured, but rather is available from Week 1 forward.

Why These Results are Alarming:

1) Overall defect volumes, as well as Code Only defect volumes, are consistently increasing over time. In a stable system (with respect to both code, and defects overall), the rate of defect arrival from the beginning of the test period to the end should steadily decrease over time, not increase as it does here. Logically, as more and more defects are uncovered and the code and overall system become less and less error-prone over time, defect volumes, both overall and in terms of code defects specifically, should decrease. But here, the opposite trend is occurring, suggesting that as the test plan progresses from the testing of individual functions early in the test phase, to more integrated functions, and finally, end-to-end tests towards the end of the phase, the system as a whole is not stabilizing.

2) Function/Class/Object defects (highlighted and shown in red above) occur persistently, are steadily increasing over time, and occur in volumes that are too high overall (16% of the code-only defects). This means that even though the functions planned for testing in this example were delivered in Week 1, significant rework of those functions was required as testing progressed (and/or some functions that should have been present weren’t). This kind of pattern may also occur where requirements were poorly defined and/or significant changes were introduced to the requirements/design up to and throughout the testing period measured. This metric can be very useful in illustrating to stakeholders the real impact of constant requirement/design churn in the form of defect rates and volumes, which in turn tend to produce an unstable system that would be very high risk to exit test and/or move to production as-is, without further actions taken to mitigate risk.

3) More “sophisticated” defects are not uncovered at all. As the basic functions of a system stabilize over time with respect to defects, Interface/O-O Messages, Timing/Serialization, and Relationship defects are then able to surface (they are typically difficult to uncover until the basic functions are largely performing error-free). The complete absence of these kinds of “sophisticated” defects, as shown in this example, strongly suggests additional code defects likely still remain upon exit. A small sketch of all three checks follows below.
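A minimal sketch of these three alarm checks, assuming the weekly Defect Type crosstab layout from the earlier tabulation sketch; the function name and the 10% threshold are illustrative choices (the unstable example above sits at 16%), not fixed rules:

```python
# A sketch of the three alarm checks above, applied to a weekly crosstab
# of defect counts by ODC Defect Type (rows = weeks, columns = types).
import pandas as pd

def stability_alarms(arrivals: pd.DataFrame) -> list[str]:
    alarms = []
    weekly_totals = arrivals.sum(axis=1)
    total = arrivals.to_numpy().sum()

    # 1) Overall arrivals should fall, not rise, across the test phase.
    if weekly_totals.iloc[-1] > weekly_totals.iloc[0]:
        alarms.append("Defect arrivals are rising over time.")

    # 2) Function/Class/Object defects should be rare and die out early.
    fco = arrivals.get("Function/Class/Object", pd.Series(0, index=arrivals.index))
    if total and fco.sum() / total > 0.10:  # illustrative threshold
        alarms.append(f"Function/Class/Object share is high ({fco.sum() / total:.0%}).")

    # 3) "Sophisticated" defect types should eventually surface.
    sophisticated = ["Interface/O-O Messages", "Timing/Serialization", "Relationship"]
    if not any(c in arrivals.columns and arrivals[c].sum() > 0 for c in sophisticated):
        alarms.append("No sophisticated defect types have surfaced.")

    return alarms
```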

An Example of a System That Has Stabilized (Using ODC “Defect Type”):

Below is a fictional representation of a test phase that has the right kinds of proportions and trends with respect to system stability. Like the first example, it is not realistic in the uniformity of the trends over time, but again, I created the data this way to make the interpretive information you should look for as easy as possible to see.

Note: this example also assumes functionality to be tested is NOT delivered incrementally over the time frame measured, but rather is available from Week 1 forward.

Why These Results Are Desirable:

1) Overall defect volumes, as well as Code Only defect volumes, are consistently decreasing over time. Logically, as more and more defects are uncovered and the code (and overall system) become less and less error-prone over time, defect volumes decrease, indicating that as the test plan progresses from the testing of individual functions early in the test phase, to more integrated functions, and finally, end-to-end tests towards the end of the phase, the system as a whole is stabilizing.

2) Function/Class/Object defects are steadily decreasing over time, occur in volumes that are small overall (4% of the code-only defects), and do not occur at all after about the midpoint of the test phase. This means very little rework of entire functions was required as testing progressed, and the rework that was needed was discovered and fixed early in the test phase.

3) As the code becomes more and more stable over time, as evidenced by 1 & 2 above, more and more “sophisticated” defects are able to surface. In this example, we see the three kinds of more “sophisticated” defects (Interface/O-O Messages, Timing/Serialization, and Relationship) occur in increasing numbers as the Function/Class/Object defects disappear over time. While we would not expect these defects to occur in significant frequencies overall in most systems (with the exception of Interface/O-O Messages defects in more complex integration settings), finding at least a small number of them prior to moving to production is a good sign that the code has been thoroughly and successfully exercised functionally, and is therefore reasonably stable.
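To illustrate this late-phase signal, here is a small standalone sketch; the weekly counts are my own illustrative numbers that loosely follow this example, not the exact figures charted above:

```python
# An illustrative check of the healthy late-phase signal: basic
# Function/Class/Object rework dies out while "sophisticated" defect
# types begin to surface. The weekly counts are illustrative.
import pandas as pd

arrivals = pd.DataFrame(
    {
        "Function/Class/Object": [2, 1, 0, 0, 0, 0],
        "Assignment/Initialization": [14, 12, 10, 7, 6, 4],
        "Interface/O-O Messages": [0, 0, 1, 2, 2, 2],
        "Timing/Serialization": [0, 0, 0, 1, 1, 1],
        "Relationship": [0, 0, 0, 0, 1, 1],
    },
    index=pd.Index(range(1, 7), name="week"),
)

sophisticated = ["Interface/O-O Messages", "Timing/Serialization", "Relationship"]
late = arrivals.iloc[len(arrivals) // 2 :]  # second half of the test phase

healthy = (
    late["Function/Class/Object"].sum() == 0      # basic rework has died out
    and late[sophisticated].to_numpy().sum() > 0  # deeper defects now surfacing
)
print("Stabilizing pattern" if healthy else "Pattern not yet stabilizing")
```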

An Example of a Very Unstable System (Using Defect Severity):

Using the same fictional data from the first example, below is how a very unstable system might look from a Defect Severity perspective. Again, the patterns are not meant to be realistic; they are too extreme and uniform, but I created the data this way to make the key characteristics as easy as possible to see:

Why These Results are Alarming:

1) Overall defect volumes, as well as Code Only defect volumes, are consistently increasing over time. In a stable system, the rate of defect arrival from the beginning of the test period to the end should steadily decrease over time, not increase as it does here, as previously explained in the first example using ODC “Defect Type”.

2) Severity 1 defects are climbing consistently over time (and jump significantly during the last week measured), and their overall proportion is quite high at more than 50% of all defects. Obviously, it would be extremely risky to attempt to exit a test phase and/or move to production with a defect arrival pattern like this one, and it should be expected that significant defects (of all severities) would escape without additional testing.

3) Severity 3 & 4 defects never surface at all. In a functionally stable system, we would expect at least some number of low severity defects to be uncovered, particularly by the last half of the test period measured. But in a situation where Severity 1 defects are so high in overall proportion, and only climbing over time, testing is effectively overwhelmed with more serious problems, such that lower severity defects are unable to surface. A small sketch of all three severity checks follows below.
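A minimal sketch of these three severity-based checks, assuming a weekly crosstab with severities 1 through 4 as columns; the function name and the 50% threshold (which mirrors this example) are illustrative:

```python
# A sketch of the three severity-based alarm signals, applied to a
# weekly crosstab (rows = weeks, columns = severities 1-4).
import pandas as pd

def severity_alarms(by_week: pd.DataFrame) -> list[str]:
    alarms = []
    sev1 = by_week.get(1, pd.Series(0, index=by_week.index))
    total = by_week.to_numpy().sum()

    # 1) Severity 1 arrivals should fall, not climb, over the phase.
    if sev1.iloc[-1] > sev1.iloc[0]:
        alarms.append("Severity 1 arrivals are climbing over time.")

    # 2) The overall Severity 1 share should not be extreme.
    if total and sev1.sum() / total > 0.50:  # illustrative threshold
        alarms.append("Severity 1 share is extreme (>50%).")

    # 3) Low-severity defects should eventually surface.
    low = [c for c in (3, 4) if c in by_week.columns]
    if not low or by_week[low].to_numpy().sum() == 0:
        alarms.append("No Severity 3/4 defects are surfacing.")

    return alarms
```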

An Example of a System That Has Stabilized (Using Defect Severity):

Using the same fictional data from the second example, below is how a system that has stabilized by the end of testing might look from a Defect Severity perspective. Again, the patterns are not meant to be realistic; they are too extreme and uniform, but I created the data this way to make the key characteristics as easy as possible to see:

Why These Results Are Desirable:

1) Overall defect volumes, as well as Code Only defect volumes, are consistently decreasing over time, indicating that the system as a whole is stabilizing as testing progresses from individual functions, to more integrated functions, and finally to end-to-end tests, as explained in the second example using ODC “Defect Type”.

2) Severity 1 defects are steadily decreasing over time and occur in volumes that are moderate rather than extreme (27% of the code-only defects), and they do not occur at all in the last two weeks of the test phase. While the overall proportion of Severity 1 defects can vary widely from project to project for a variety of reasons, here it is not extreme (<50%). More important than the proportion (as long as it is not extreme) is the consistent decreasing trend over time, and the fact that they disappear entirely prior to the end of the test phase.

3) As the code becomes more and more stable over time, as evidenced by 1 & 2 above, more and more low severity defects are able to surface. In this example, we see Severity 3 & 4 defects begin to surface near the midpoint of the test phase and continue to grow in proportion over time, until in the last two weeks measured they collectively comprise more than 50% of the defects uncovered. While we would not expect these defects to occur in significant frequencies in most systems under most circumstances, finding at least some of them prior to moving to production is a good sign that the code has been thoroughly and successfully exercised functionally, and is therefore reasonably stable.
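To illustrate this pattern, here is a small standalone sketch with my own illustrative counts that loosely follow this example, not the exact figures charted above:

```python
# An illustrative check of the healthy severity pattern: Severity 1
# defects decline and vanish before phase end, while Severity 3/4
# finds grow to dominate the late weeks. The counts are illustrative.
import pandas as pd

by_week = pd.DataFrame(
    {1: [6, 4, 2, 1, 0, 0],   # Severity 1
     2: [10, 9, 6, 4, 3, 2],  # Severity 2
     3: [0, 0, 2, 3, 3, 3],   # Severity 3
     4: [0, 0, 1, 1, 2, 2]},  # Severity 4
    index=pd.Index(range(1, 7), name="week"),
)

last_two = by_week.iloc[-2:]
low_share = last_two[[3, 4]].to_numpy().sum() / last_two.to_numpy().sum()

healthy = (
    last_two[1].sum() == 0  # Severity 1 has disappeared
    and low_share > 0.50    # low-severity finds now dominate
)
print(f"Late-phase Sev 3/4 share: {low_share:.0%} ->",
      "stabilized" if healthy else "not yet stabilized")
```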

Need help transforming your SDLC processes and/or analytics? Contact me today at SESmithConsulting@outlook.com
