Are you testing your test?

In System on Chip (SoC) verification it is important to determine when verification is done and the design is ready to tape out. There are many ways to measure when SoC verification is complete, and among the most popular are the various forms of structural and functional coverage. Coverage has been used in simulation and Formal Property Verification (FPV) for many years and is what many teams base their decision to tape out a chip on. While coverage is useful, it doesn't answer the real question: "Are there any more bugs in the design?" and, if there are, will my test or set of assertions find them? This is especially important in FPV, because when all assertions are proven it looks like verification is complete.

Covered != Verified

The problem with coverage is that 100% coverage doesn't mean 100% correct, or even 100% verified. Achieving 100% RTL line coverage in simulation, for example, means that all lines in the RTL have been executed during the test. But executed does not mean correct. In FPV, the corresponding metric is structural Cone of Influence (COI) coverage. Any logic that is not in the COI of an assertion is clearly not tested. As with simulation coverage, the opposite is not true: just because some part of the design is in the COI of an assertion doesn't mean it is correct. Logic that is in the COI may have been optimized away by the formal engines and is thus not verified.
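To make COI coverage concrete, here is a minimal sketch (a hypothetical example, not from the original article): the dbg register feeds no assertion, so it lies outside every assertion's COI and is flagged as untested, while y is inside the COI of check_y, yet that alone does not prove it correct.

module coi_example (input clk,
                    input a, b,
                    output reg y,
                    output reg dbg);

always @(posedge clk) begin
  y   <= a & b;  // inside the COI of check_y
  dbg <= a ^ b;  // inside no assertion's COI: flagged by COI coverage
end

check_y: assert property (@(posedge clk) (a && b) |=> y);

endmodule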

A better metric for determining what is verified in Formal Property Verification is Formal Core coverage. The Formal Core of a property is the union of all the registers, inputs and constraints that are necessary to prove the assertion. This is much more accurate than structural coverage metrics: the logic is not just executed; it impacts the proof or falsification of the assertion. When you achieve 100% Formal Core coverage, does it mean you are done and there are no bugs in your design? Unfortunately not.

For example, the design below has a counter that can be loaded with a value from input pins when the load signal is asserted; otherwise it counts up. An assertion verifying that the counter register takes the correct value when the load signal is asserted causes the counter to be covered. However, this leaves many other aspects of the counter's behavior unverified, for example: is the reset value correct, and does it roll over or stop when reaching its maximum value? (A sketch of assertions addressing these follows the code.)

module counter (input clk, reset,
                input load,
                input [7:0] load_data,
                output reg [7:0] cnt_value);

always @(posedge clk) begin
  if (reset)
    cnt_value <= 0;
  else if (load)
    cnt_value <= load_data;
  else
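    // Deliberate bug, discussed below: should be cnt_value + 1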
    cnt_value <= cnt_value + 2;
end

check_load: assert property (@(posedge clk) load |=> cnt_value == $past(load_data));

endmodule
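As a sketch of how to close those holes (assuming the intended behavior is reset to 0, increment by 1 and free-running rollover in 8 bits), assertions like the following would catch both the reset behavior and the incorrect +2 increment:

check_reset: assert property (@(posedge clk)
  reset |=> cnt_value == 0);

check_incr: assert property (@(posedge clk) disable iff (reset)
  !load |=> cnt_value == $past(cnt_value) + 8'd1);

Because the comparison is done in 8 bits, check_incr also verifies the rollover from 8'hFF back to 0.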

Common to all coverage metrics is that they measure whether the stimulus reaches all logic in the design (in simulation) or whether all logic in the design affects the outcome of the assertions (in FPV). This says nothing about the correctness of the design's behavior, and it doesn't answer the question: "Are there any more bugs in the design?" To do that you must analyze the effectiveness of the assertions. You must test your test.

Testing the test

Another way to look at verification coverage is that 100% coverage should mean that any bug will be found by the test or verification environment (the assertions). You want to know that if there are bugs in the design, they will be found. How do you know this is the case?

One way, of course, is to insert a bug, also known as a mutation, yourself and see if any of the assertions fail. If one does, great! The test found the bug. If no assertion fails, you have a hole in your formal test environment or set of assertions, which means you must add assertions or improve the existing ones.
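As a sketch using the counter above, one hand-inserted mutation is caught while another is missed, exposing the hole:

// Mutation 1: caught. check_load now fails, since the loaded
// value no longer matches the sampled load_data.
cnt_value <= load_data + 1;   // was: cnt_value <= load_data;

// Mutation 2: missed. No assertion constrains the increment,
// so this change fails nothing: a hole in the assertion set.
cnt_value <= cnt_value - 2;   // was: cnt_value <= cnt_value + 2;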

Manually inserting bugs to assess the quality of the test environment works, but quickly runs into problems. Where do you insert bugs? What types of bugs do you insert? You will most likely insert bugs in the parts of the design you are familiar with, i.e. where the assertions already are, so you are still likely to miss bugs, and you will probably only insert a limited range of bug types.

Automatic Mutation Testing

A much better method is to use a tool such as Formal Testbench Analyzer (FTA) to insert hundreds or thousands of artificial bugs (mutations) in your design and see if any of the assertions fail. Examples of mutations inserted in the design are: changing an operator, e.g. a + to a -; changing a for loop to start at a different value or increment by a different amount; or making an if-statement branch dead. The design in the example above incorrectly increments by 2 instead of 1, but the verification environment does not detect it. By inserting a mutation on the line "cnt_value <= cnt_value + 2;" we easily find this unverified area of the design. When manually inserting mutations, you need to re-verify all assertions for every mutation inserted to ensure that it causes at least one assertion to fail. FTA, on the other hand, tracks the effect of every mutation, so you can check all assertions against all faults in one run, resulting in orders-of-magnitude faster run-time.
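For illustration, here is roughly what such mutations look like in RTL (hypothetical examples of common mutation operators, not actual FTA output):

// Operator mutation: replace + with -
sum <= a - b;                    // was: sum <= a + b;

// Loop mutation: start at a different value
for (i = 1; i < N; i = i + 1)    // was: for (i = 0; i < N; i = i + 1)

// Dead-branch mutation: make an if branch unreachable
if (valid && 1'b0)               // was: if (valid)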

Summary

Coverage is a necessary but not sufficient metric for determining whether SoC verification is complete. It determines whether there is logic in the design that is not tested, but it doesn't determine whether bugs in the design will be found by the test environment. You also need to test your test. Only by using a thorough, automated bug-insertion (mutation) methodology can you be sure that your formal testbench (assertions) tests your entire design and that you are ready for tape-out.
