Uncertainty and Randomness

Introduction

Uncertainty and randomness are concepts used in many fields, such as statistics, risk management and information theory. Although we all have an intuitive understanding of their meaning, we often fail to articulate their differences clearly and sometimes use the terms interchangeably. This article explores the roots of these concepts and presents some of their unique properties.

Random numbers

No number is truly random by itself; a set of numbers is said to be random when each number in the set is produced by chance and is not dependent in any way on previous or future members of the set. The idea of randomness is characterised by a combination of local irregularity and statistical regularity.

This idea is not easy to formalise. In a uniformly distributed sequence of digits, each digit should appear 1/10 of the time, but the probability of a random sequence having this exact configuration is rather small; nevertheless, we expect it to be the average case.
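As a quick illustration (a minimal sketch using Python's pseudo-random generator, with an arbitrary fixed seed so the run is repeatable), we can draw a large sample of digits and observe statistical regularity without exact uniformity:

```python
import random
from collections import Counter

random.seed(42)  # fixed seed, chosen arbitrarily, so the run is repeatable

# Draw 100,000 digits uniformly at random and tally them.
digits = [random.randrange(10) for _ in range(100_000)]
counts = Counter(digits)

for d in range(10):
    freq = counts[d] / len(digits)
    # Each frequency is close to, but almost never exactly, 0.1.
    print(f"digit {d}: frequency {freq:.4f}")
```

Locally the sequence is irregular (runs, repeats, gaps), yet globally each digit's frequency hovers near 1/10.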

The first attempt at formalising the definition of a random sequence was proposed by Von Mises (1936):

An infinite sequence $a_1, a_2 \dots$ of 0’s and 1’s is a random sequence if the two following conditions are satisfied:

1. If $f(r)$ is the number of 1’s among the first r terms of $a_1, a_2 \dots$ , then $f(r)/r$ approaches a limit $p$ as $r$ approaches infinity.
2. If $a_{n_1}, a_{n_2} \dots$ is any infinite sub-sequence of $a_1, a_2 \dots$, formed by deleting some of the terms of the latter sequence according to a rule which makes the deletion or retention of $a_n$ depend only on $n$ and $a_1, a_2, \dots, a_{n-1}$, and if $g(r)$ is the number of 1’s among the first $r$ terms of $a_{n_1}, a_{n_2}, \dots$, then $g(r)/r$ approaches the same limit $p$ as $r$ approaches infinity.
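Von Mises' second condition can be illustrated numerically. In this sketch (the selection rule, keeping every third index, and the probability $p = 0.3$ are arbitrary choices for illustration), a place-selection rule that depends only on the index leaves the limiting frequency unchanged:

```python
import random

random.seed(0)  # arbitrary fixed seed for repeatability
p = 0.3
seq = [1 if random.random() < p else 0 for _ in range(200_000)]

# f(r)/r over the full sequence.
full_freq = sum(seq) / len(seq)

# Place-selection rule: keep a_n only when the index is divisible by 3
# (the rule depends only on the index, as Von Mises requires).
sub = seq[::3]
sub_freq = sum(sub) / len(sub)

print(full_freq, sub_freq)  # both close to p = 0.3
```

Any admissible selection rule should behave this way; a rule allowed to peek at $a_n$ itself (e.g. "keep only the 1's") would trivially break the property, which is why Von Mises excludes it.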

There are a number of complications with this definition, in particular its reliance on infinite sequences, which raises fundamental problems for the concept of randomness. For example, for any given random natural number, the probability that another random natural number is greater than it must approach certainty.

Another problem can be exemplified with the decimal representation of $\pi$. The distribution of its digits is very close to uniform, at least up to six million digits, but it is not a true random sequence: each digit occupies a precisely determined place in the sequence and we can always reproduce it, so we know the sequence a priori.

These and other problems led Kolmogorov to propose an alternative definition (Chaitin, Martin-Löf and others have proposed equivalent formalisations).

The Kolmogorov complexity $K(s)$ of a binary sequence $s$ is defined as follows: $K(s)$ is the length of the smallest program $p$ such that $C(p) = s$, where $C(p)$ is the output of program $p$ executed on a computer $C$.

An upper bound to $K(s)$ is given by the length of $s$ plus some fixed constant, because we can always take our program $p$ to be “print s”.

A sequence is called complex or Kolmogorov random if its Kolmogorov complexity roughly equals its length.
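Kolmogorov complexity is uncomputable in general, but compressed length gives a crude upper bound in the same spirit. The following sketch (using zlib purely as an illustrative proxy, not a true measure of $K(s)$) contrasts a patterned string with random bytes:

```python
import os
import zlib

regular = b"01" * 5000              # 10,000 bytes with an obvious pattern
random_bytes = os.urandom(10_000)   # 10,000 bytes from the OS entropy source

# Compressed length acts like the "smallest program" bound: it can never
# do much worse than length-plus-a-constant (the "print s" program),
# and it does far better when the sequence has structure.
print(len(zlib.compress(regular)))       # far smaller than 10,000
print(len(zlib.compress(random_bytes)))  # close to (or slightly above) 10,000
```

The patterned string admits a short description ("repeat 01 five thousand times"), so it is not Kolmogorov random; the incompressible bytes have no description much shorter than themselves.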

Another approach to the problem is given by the definitions proposed by Eagle (2005) for the concepts of unpredictability and randomness.

1. An event E (at some temporal distance t) is unpredictable for a predictor P if and only if P’s posterior credence in E after conditioning on current evidence and the best prediction function available to P is not 1, that is, if the prediction function yields a posterior probability distribution that does not assign probability 1 to E.
2. An event E is random for a predictor P using theory T if and only if E is maximally unpredictable. An event E is maximally unpredictable for P and T if and only if the posterior probability of E yielded by the prediction functions that T makes available, conditional on current evidence, is equal to the prior probability of E.

All these definitions partially satisfy operational requirements but are only concerned with observed randomness; none of the definitions determine whether randomness constitutes an intrinsic feature of reality or is merely the product of epistemic limitations.

We can determine the mean and the variance of a set of random numbers, and these could be identical to those of a set of pre-determined numbers. The difference is that in one case we know the members of the set a priori, and in the other we do not. A set of random numbers has a variance, but a set with a variance is not necessarily random. Carrying this idea to its conclusion, the difference between two identical sequences, one produced randomly and the other following a design, lies in the information we have about the way in which they are constructed. Information is the key element we need to explore.

Information

Let $\eta$ be an event with a probability of occurrence $\Pr(\eta)$. When $\eta$ takes place we say that we have obtained $I(\eta) =\log\displaystyle\frac{1}{\Pr(\eta)}$

units of information. The choice of log base determines the units of measure; for log base 2 the unit is the familiar “bit” of information. We can extend this concept to a memoryless information source that produces a sequence of symbols from a finite set $S = \{s_1, s_2, \dots s_n\}$ with a given probability distribution (a random sequence). The probability associated with each symbol will be given by $\Pr(s_1), \Pr(s_2) \dots \Pr(s_n)$.
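The definition of $I(\eta)$ translates directly into code; this minimal sketch uses base-2 logarithms, so the unit is the bit:

```python
import math

def information_bits(p: float) -> float:
    """Self-information I = log2(1/p) of an event with probability p."""
    return math.log2(1 / p)

# A fair coin flip carries one bit; rarer events carry more.
print(information_bits(0.5))   # 1.0
print(information_bits(1 / 8)) # 3.0
```

Note the behaviour at the extremes: a certain event ($p = 1$) yields zero information, and $I(\eta)$ grows without bound as $p$ approaches 0.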

The mean information provided by the information source is the expected value of the information, calculated in the usual way: $\sum\limits_S\Pr(s_i)I(s_i)$

This quantity is defined as the entropy $H(S)$ of the memoryless source of information. $H(S) = \sum\limits_S \Pr(s_i)\log \displaystyle\frac{1}{\Pr(s_i)}$

The entropy can be interpreted in two ways: it represents the mean information provided by the source, or equivalently the mean uncertainty of an observer about the information yet to be received.
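The entropy $H(S)$ of a memoryless source can be computed directly from its probability distribution; a minimal sketch:

```python
import math

def entropy(probs):
    """H(S) = sum of p_i * log2(1/p_i) over symbols with p_i > 0."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Uniform 4-symbol source: 2 bits per symbol.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
# A skewed source is more predictable, hence lower entropy.
print(entropy([0.7, 0.1, 0.1, 0.1]))
```

Terms with $p_i = 0$ are skipped, matching the usual convention that $0 \log(1/0) = 0$.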

The entropy has some interesting properties. Let’s consider two sets of probabilities, $x_1, x_2 \dots x_n$ and $y_1, y_2 \dots y_n$. Then we can write the following expression: $\displaystyle\sum\limits_{i=1}^n x_i\log\frac{y_i}{x_i} = \frac{1}{\ln 2}\sum\limits_{i=1}^n x_i\ln\frac{y_i}{x_i}$

Applying the inequality $\ln x \le x-1$ we arrive at the following inequality (known as Gibbs’ inequality), which becomes an equality only when $x_i = y_i$ for every $i$: $\displaystyle\sum\limits_{i=1}^n x_i \log\frac{1}{x_i} \le \sum\limits_{i=1}^n x_i \log\frac{1}{y_i}$.
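We can check this inequality numerically for a pair of example distributions (the particular values are arbitrary):

```python
import math

def cross_entropy(x, y):
    """Sum of x_i * log2(1/y_i); equals H(x) when y == x."""
    return sum(xi * math.log2(1 / yi) for xi, yi in zip(x, y))

x = [0.5, 0.3, 0.2]
y = [0.2, 0.5, 0.3]

h_x  = cross_entropy(x, x)  # entropy of x
h_xy = cross_entropy(x, y)  # cross-entropy of x against y

# Gibbs' inequality: H(x) <= cross-entropy, strict here since x != y.
print(h_x, h_xy)
```

The gap $\sum x_i \log(x_i/y_i)$ is the relative entropy (Kullback-Leibler divergence) between the two distributions, which is why it is non-negative and vanishes only when they coincide.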

Using this inequality we can determine the maximum quantity of information provided by a memoryless source. Let $S = \{s_i\}$ with associated probabilities $P_i = P(s_i)$; the entropy is then $H(S) = \displaystyle\sum\limits_{i=1}^n P_i \log\frac{1}{P_i}$

Now we consider the expression $\log n - H(S) = \displaystyle\sum\limits_{i=1}^n P_i \log n - \sum\limits_{i=1}^n P_i \log \frac{1}{P_i} = \sum\limits_{i=1}^n P_i \log (n P_i) = \log e \sum\limits_{i=1}^n P_i \ln (n P_i)$

Applying the inequality $\ln x \le x - 1$ with $x = \frac{1}{nP_i}$, which gives $\ln (nP_i) \ge 1 - \frac{1}{nP_i}$, we can write $\log n - H(S) \ge \log e \sum\limits_{i=1}^n P_i \displaystyle\left(1-\frac{1}{nP_i}\right) = \log e \left(\sum\limits_{i=1}^n P_i - \frac{1}{n} \sum\limits_{i=1}^n 1\right) = 0$

From this we can see that the entropy is always less than or equal to $\log n$. Equality holds when $P_i = \displaystyle\frac{1}{n}$ for every $i$, that is, for the uniform distribution.

We can plot the entropy function for an alphabet of two symbols as the probability $p$ of one symbol varies continuously from 0 to 1. The maximum entropy (maximum uncertainty) occurs when the outcomes are equiprobable, in other words for a uniform distribution. For any other distribution the uncertainty is lower, and at the extremes, when the probability is either 0 or 1, there is no uncertainty at all: the event is no longer random, as we can predict the exact outcome.
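Instead of a plot, we can tabulate the two-symbol entropy function and observe the same behaviour; a minimal sketch:

```python
import math

def binary_entropy(p):
    """H(p) for a two-symbol source, with H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

# Entropy rises from 0, peaks at 1 bit when p = 0.5, and falls back to 0.
for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p = {p:.1f}  H = {binary_entropy(p):.4f}")
```

The table also shows the symmetry $H(p) = H(1-p)$: it makes no difference which of the two symbols carries probability $p$.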

Conclusion

Uncertainty is an attribute of randomness. A random sequence produced by identically distributed random variables with, say, a Poisson distribution has less uncertainty than one produced by uniformly distributed variables. Knowing the probability distribution allows us to calculate the uncertainty; otherwise we need to consider the extreme case and apply the principle of indifference. In Bayesian probability this is also known as the principle of insufficient reason, and consists in assigning equal probabilities to all possible outcomes in the absence of evidence favouring any of them. In frequency-based probability this concept is meaningless.
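The claim about the Poisson distribution can be checked numerically. This sketch (with an arbitrary rate $\lambda = 3$, truncating the infinite support where the tail is negligible) compares its entropy with that of a uniform distribution over ten outcomes:

```python
import math

def poisson_pmf(k, lam):
    """P(K = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3.0
# Sum enough terms of the support for the remaining tail to be negligible.
probs = [poisson_pmf(k, lam) for k in range(60)]
h_poisson = sum(p * math.log2(1 / p) for p in probs if p > 0)

h_uniform = math.log2(10)  # uniform distribution over ten outcomes

print(h_poisson, h_uniform)  # the Poisson source is the less uncertain one
```

The Poisson mass concentrates around its mean, so an observer can predict its outcomes better than those of a uniform source, and the entropy is correspondingly lower.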

The uncertainty principle

An interesting result from quantum mechanics is the uncertainty principle. It states that there is a limit to the precision with which the values of a pair of conjugate variables can be simultaneously known. For instance, if $p$ is the momentum of a particle and $x$ its position, then $\sigma_p\sigma_x \ge \displaystyle\frac{\hbar}{2}$

where $\hbar \approx 1.05 \times 10^{-34}$ J·s is the reduced Planck constant.

What does this mean in practice? Let’s take the case where both standard deviations are equal: $\sigma_p = \sigma_x \approx 10^{-17}$ (in SI units).

We can get an idea of the significance by considering the position. An uncertainty of about $10^{-17}$ m stands to one metre roughly as one micron stands to the distance between the Earth and the Sun. That is beyond any significant error that can be measured, and for practical purposes it has no effect on our macroscopic world.
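The arithmetic behind this comparison can be checked in a few lines (using the CODATA value of $\hbar$ and taking the Earth-Sun distance as one astronomical unit):

```python
import math

hbar = 1.054_571_817e-34  # reduced Planck constant, J·s

# Equal standard deviations: sigma^2 >= hbar / 2.
sigma = math.sqrt(hbar / 2)
print(f"sigma ~ {sigma:.1e}")  # about 7e-18, i.e. of order 1e-17

# Compare a ~1e-17 relative uncertainty with an everyday scale:
micron_over_au = 1e-6 / 1.496e11  # one micron vs the Earth-Sun distance
print(f"micron / AU ~ {micron_over_au:.1e}")  # also of order 1e-17
```

Both ratios land on the same order of magnitude, which is the point of the analogy in the text.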

References

Eagle, A. (2005) “Randomness is Unpredictability”
Knuth, D. (1998) “The Art of Computer Programming, Vol. 2: Seminumerical Algorithms”
Kolmogorov, A.N. (1965) “Three Approaches to the Quantitative Definition of Information”
Shannon, C. (1948) “A Mathematical Theory of Communication”
Venn, J. (1962) “The Logic of Chance”
Von Mises, R. (1936) “Wahrscheinlichkeit, Statistik und Wahrheit”