What are random numbers ?

*Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.*

*John Von Neumann, 1951*

Although it may look simple at first sight to give a definition of what a random number is, it proves to be quite difficult in practice.

A random number is a number generated by a process, whose outcome is unpredictable, and which cannot be subsequentially reliably reproduced. This definition works fine provided that one has some kind of a black box – such a black box is usually called a random number generator – that fulfills this task.

However, if one were to be given a number, it is simply impossible to verify whether it was produced by a random number generator or not. In order to study the randomness of the output of such a generator, it is hence absolutely essential to consider sequences of numbers.

It is quite straightforward to define whether a sequence of infinite length is random or not. This sequence is random if the quantity of information it contains – in the sense of Shannon’s information theory – is also infinite. In other words, it must not be possible for a computer program, whose length is finite, to produce this sequence. Interestingly, an infinite random sequence contains all possible finite sequences. Such an infinite sequence does for example contain the Microsoft Windows source code or the text of the Geneva conventions. Unfortunately, this definition is not very useful, as it is not possible in practice to produce and process infinite sequences.

In the case of a finite sequence of numbers, it is formally impossible to verify whether it is random or not. It is only possible to check that it shares the statistical properties of a random sequence – like the equiprobability of all numbers – but this a difficult and tricky task. To illustrate this, let us for example consider a binary random number generator producing sequences of ten bits. Although it is exactly as likely as any other ten bits sequences, 1 1 1 1 1 1 1 1 1 1 does look less random than 0 1 1 0 1 0 1 0 0 0.

In order to cope with this difficulty, definitions have been proposed to characterize “practical” random number sequences. According to Knuth [1], a sequence of random numbers is a sequence of independent numbers with a specified distribution and a specified probability of falling in any given range of values. For Schneier [2], it is a sequence that has the same statistical properties as random bits, is unpredictable and cannot be reliably reproduced. A concept that is present in both of these definition and that must be emphasized is the fact that numbers in a random sequence must not be correlated. Knowing one of the numbers of a sequence must not help predicting the other ones. Whenever random numbers are mentioned in the rest of this paper, it will be assumed that they fulfill these “practical” definitions.

Testing randomness

Statistical randomness tests aim at determining whether a particular sequence of numbers was produced by a random number generator. The approach is to calculate certain statistical quantities and compare them with average values that would be obtained in the case of a random sequence. These average values are obtained from calculations performed on the model of an ideal random number generator. Testing randomness is an empirical task. There exists numerous tests, each one of them revealing a particular type of imperfection in a sequence.

One example is the frequency test. In the case of binary sequences, it focuses on the relative frequency of 1′s with respect to 0′s. The autocorrelation test, which investigates correlations between adjacent bits, is another example. A good reference on randomness testing can be found in [1]. Maurer demonstrated that all tests can be derived from trying to compress a sequence [3]. If a given sequence can be compressed, then it is not random. When put into the perspective of the definition given above for an infinite random sequence, this observation is natural.

Because of the difficulty of defining what a random number is, it is essential to choose an adequate generator to produce these numbers. Moreover, it is safer to have a good understanding of the underlying randomness generating process.

[1] Knuth, D., The Art of Computer Programming, Vol. 2, 2nd ed., Addison-Wesley, Reading, (1981).

[2] Schneier, B., Applied Cryptography: Protocols, Algorithms, and Source Code in C, John Wiley & Sons, New York, (1996).

[3] Maurer, U., “A universal statistical test for random bits generators”, Journ