Tests for randomness - Frequency based
In the previous few posts we talked about various flavours of random number generators particularly those that are built of hardware and those of software. In this post we will discuss various ways to test for randomness or rather test for entropy generated by these random number generators using frequency analysis.
There are a large suite of statistical tests to evaluate randomness. NIST provides a very comprehensive overview of these. These tests can be used to give you a certain level of confidence with the quality of a random number generator. Note however statistical confidence cannot be used as a replacement for cryptanalysis.
Most of the statistical tests use the following terminology
- Null Hypothesis: For the purpose of randomness testing, the null hypothesis under a test would state the sequence being tested is random.
- Alternative Hypotheses: As the name would suggest, for randomness testing the alternative hypothesis would state that the sequence being tested is not random.
- p-value: The probability (under the null hypothesis of randomness) that the chosen test statistic will assume values that are equal to or worse than the observed test statistic value when considering the null hypothesis. The p-value is frequently called the “tail probability.”
Given the above terms, let us explore how frequency based tests would work. Frequency tests usually measure the frequency of 0's and 1's in a random bit stream. One such test is the monobit test. This purpose of this test is to determine the closeness of number of 1's in the bit sequence to 1/2 the length of the sequence. So the number of 0's should be approximately equal to the number of 1's for the stream to be considered random. The python snippet below shows how one would test for the same.
def monobit(self, bitstream: str): """ :param bitstream: a binary string :return: the p-value from the test """ count = 0 # If the char is 0 minus 1, else add 1 for char in bitstream: if char == '0': count -= 1 else: count += 1 # Calculate the p value observed = count / math.sqrt(len(bitstream)) p_val = spc.erfc(math.fabs(observed) / math.sqrt(2)) return p_val
This test typically is the first amongst most tests applied for random number generation. If this test fails then other randomness tests need not be applied. A variant of this test is the block based frequency test where the test is applied to blocks of size M (with M = 1, this is simply the monobit test).
Food for thought: Is it possible to craft a bit stream which will pass the frequency tests despite of not being randomly generated ?
What about a string that uses 0 for even indices and 1 for odd? Maintains the right distribution but generated via an algorithm with the next number always predictable.