Baselines for Verification – Creating and Using them

Introduction

In the previous post we talked about how we can reduce the churn of bugs being introduced into the repository by using a continuous integration flow.  This involved having a release flow, where each release goes out only after the code has passed a sanity test.  When a release goes out, we create something called a 'baseline'. A baseline is simply a recipe for setting up a workspace to match a release. In this post, we’ll discuss our process for creating baselines and how syncing to baselines also greatly improves your ability to reproduce regression failures.

Creating a Baseline

There are many ways to implement baseline creation, but I’ll describe what our team does.  Instead of calling repository commands directly to sync out a workspace, we have a script that does the syncing for us.  This script also lives in our repository and is managed by Jenkins (which runs our sanity tests).  Each time a sanity run completes successfully, Jenkins updates and submits the script with the exact file versions used in that run.  When a team member wants to sync to the latest baseline (called the ‘head’ baseline), they just sync the latest version of the sync script and run it – at that point they are guaranteed to have the same file versions that passed the most recent sanity.
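As a rough sketch of the Jenkins-side step, the generated sync script just pins every file to the revision the sanity ran against. The depot paths, the `p4 sync path#rev` syntax, and the snapshot dictionary below are illustrative assumptions; a real flow would query the version control system for the synced revision of every file in the workspace.

```python
def make_sync_script(file_revisions):
    """Render a baseline sync script that pins every file to an exact revision.

    file_revisions: mapping of depot path -> revision number, as captured
    from the workspace the sanity just passed in.
    """
    lines = ["#!/bin/sh", "# Auto-generated baseline - do not edit by hand."]
    for path, rev in sorted(file_revisions.items()):
        # Perforce-style "file#rev" pins the file to one exact revision.
        lines.append(f"p4 sync {path}#{rev}")
    return "\n".join(lines) + "\n"

# Toy snapshot of what the sanity run was tested against (hypothetical paths).
snapshot = {
    "//depot/rtl/top.sv": 41,
    "//depot/tb/env.sv": 17,
    "//depot/tb/test_pkg.sv": 9,
}

script = make_sync_script(snapshot)
print(script)
```

Jenkins would then submit this regenerated script back to the repository, and that submission itself becomes the new head baseline.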

Other environment setup, such as Linux modules to load (tool/library versions) and environment variables, should live in a separate script that is part of the baseline. The reason is that those items need to be repeated for every terminal that gets opened. Ideally the baseline script contains only operations that need to be done once to set up the workspace, which can then be reused by any terminal you open.

Give each Baseline a Unique Number (or name)

We’ve talked about syncing to the head baseline, but it should always be possible to sync to any baseline that has ever been created.  This is done by giving each baseline a unique number or name.  In our implementation, since each baseline is just a revision of the sync script, we use the changelist number of each submission of the script as the baseline number.  To sync to a non-head baseline, we simply sync the version of the script as it existed at that changelist and run it.
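The resolution step can be sketched as follows: given a baseline number, take the newest submission of the sync script at or before that changelist. The in-memory `submissions` history below stands in for the repository’s revision history and is purely illustrative.

```python
import bisect

# Hypothetical history: each submission of the sync script recorded as
# (changelist, script_text). The changelist of a submission IS the
# baseline number.
submissions = [
    (1001, "p4 sync //depot/...@1001\n"),
    (1057, "p4 sync //depot/...@1057\n"),
    (1142, "p4 sync //depot/...@1142\n"),
]

def script_for_baseline(baseline, history):
    """Return the sync script as of the given baseline (changelist) number."""
    changelists = [cl for cl, _ in history]
    # Newest submission at or before the requested changelist.
    i = bisect.bisect_right(changelists, baseline) - 1
    if i < 0:
        raise ValueError(f"no baseline exists at or before changelist {baseline}")
    return history[i][1]

print(script_for_baseline(1100, submissions))
```

This mirrors how version control systems already resolve `@changelist` syntax, which is why reusing the changelist number as the baseline name is so convenient: no separate bookkeeping is required.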

Baselines Improve Regression Reliability

One problem that can occur when launching regressions, especially automatically launched ones, is that they can be taken down by a bad check-in.  If a change that breaks the compile or introduces a catastrophic failure lands too close to the regression launch, you can lose the whole regression.  If you launch regressions nightly, for instance, that means losing a day’s worth of results.

In contrast, if you always launch your regressions on a baseline, the code is guaranteed to have passed a sanity, which means it will at least compile and pass basic testing.  If a catastrophic bug is introduced right before a regression launch, the sanity will fail and the bug will never make it into the baseline, so your regression has some level of protection against bad submissions.

Baselines for Reproducibility

Since we normally work in constrained-random environments, reproducing failures can be a challenge.  Simply knowing the command to run a test isn’t always enough, even if you know what seed to use.  This is because changes in the code base can affect randomization results or the behavior of your stimulus even when using the same command.  This is particularly challenging when trying to reproduce hard-to-hit corner cases.

This is generally not a problem if you don’t need to recompile and can simply reuse the simulation binary, but that means the binary must be kept until all failures have been debugged.  It also has the severe limitation that you cannot change any files and rerun: changing a file requires a recompile, which you cannot do without access to the original runner’s workspace.  This is problematic because debugging often requires changing files to add debug code or to test a fix for a given failure.

Another strategy is to reuse the workspace that launched the regression when debugging.  This approach works but has the same limitation: you cannot modify files unless you own the workspace.  Additionally, if the workspace is inadvertently changed (e.g., the owner syncs it), the ability to reproduce is lost.  The same happens if the owner decides to add debug code or test a fix.

Baselines provide an elegant solution to the problems above.  Since all regressions are launched on a baseline, all team members can sync their workspace to the baseline that was used to run any regression.  Even if the regression results are deleted and the launching workspace has changed, as long as you know the baseline to use and the command to run, you’re able to reproduce failures indefinitely.
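The full reproduction recipe, then, is just three pieces of information: the baseline number, the test command, and the seed. A minimal sketch, where the script path, the `run_test` command, and the seed-passing plusarg are all hypothetical names used only for illustration:

```python
def repro_steps(baseline, test_cmd, seed):
    """List the shell steps any team member would run to reproduce a failure.

    baseline: changelist number of the baseline the regression ran on.
    test_cmd: the test command recorded with the failure (hypothetical).
    seed: the random seed recorded with the failure.
    """
    return [
        # Fetch the sync script as it existed at the baseline changelist.
        f"p4 sync //depot/scripts/sync_baseline.sh@{baseline}",
        # Rebuild the workspace to the exact file versions of that baseline.
        "./sync_baseline.sh",
        # Rerun the failing test with the original seed.
        f"{test_cmd} +ntb_random_seed={seed}",
    ]

steps = repro_steps(1142, "run_test smoke_test", 3735928559)
for step in steps:
    print(step)
```

Because every ingredient is a small, durable identifier rather than a live artifact like a binary or a workspace, the recipe keeps working long after the original regression results are gone.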

Baselines for Sharing Workspace State

Although less common, from time to time it is helpful for team members to share their workspace state with each other.  Baselines provide a convenient way to do that.  If a workspace is synced to a baseline, a team member can pass along that workspace state by simply providing the baseline number.  This can be helpful if you need input from another team member on some issue (or on a local change you have) but must continue making changes in your workspace while waiting.

Conclusion

In this post, we’ve talked about how to create a baseline and why it matters.  Baselines not only protect our regressions and our team from bugs being introduced into the code base; they also let us sync a workspace to the exact baseline used to run a regression or test, easing failure reproduction.

In the subsequent blog posts, we’ll talk about regression reporting for nightly regressions, and a strategy that can be used to allow for indefinite failure reproduction with only the regression report.
