Consider an experiment of flipping a fair coin and counting how many times we observe \(H\) and how many times we observe \(T\). We aim to observe the number of \(H\) and \(T\) counts (actually, the relative frequency of these counts) in a sample of size \(N\). Let us first consider a small number of repetitions of the experiment (that is, a small sample), say of size \(N = 10\).
N <- 10
set.seed(123)
outcomes <- sample(c(1, 2), N, replace = TRUE, prob = c(0.5, 0.5))
sorting <- data.frame(table(outcomes))
sorting$rel.freq <- sorting$Freq / N
barplot(sorting$rel.freq, main = "Flipping a fair coin 10 times - relative frequencies",
        names.arg = c("H", "T"), col = c("red", "blue"), xlab = "Figure 1")
The output (Figure 1) shows that out of 10 trials, 6 were \(H\) and 4 were \(T\). So, what will happen to the relative frequency if we increase the number of trials (the random sample size)? Next, observe the output when the number of trials is increased to 45. What does the output show? What do we expect to see?
For \(N=45\) we still observe (Figure 2) more \(H\) outcomes than \(T\) outcomes. Because we are flipping a fair coin, we expect to see almost the same number of observed \(H\) and \(T\); these two outcomes should have equal relative frequencies, each approximately equal to \(0.5\).
Increase the sample size to \(N=100\) and observe that the relative frequencies get closer to 0.5, but are still different. Also, run the code to observe that as the sample size \(N\) becomes very large, these two relative frequencies become approximately equal to each other and to \(0.5\); for example, for \(N=1000000\) the result is given in Figure 3.
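As a sketch of this convergence (re-using the sampling code above; the loop and the particular sequence of sample sizes are ours), one can print the relative frequency of \(H\) for increasing sample sizes:

```r
# Relative frequency of H (outcome 1) for increasing sample sizes
set.seed(123)
for (N in c(10, 45, 100, 1000000)) {
  outcomes <- sample(c(1, 2), N, replace = TRUE, prob = c(0.5, 0.5))
  cat("N =", N, " relative frequency of H =", mean(outcomes == 1), "\n")
}
```

For large \(N\) the printed value settles near 0.5, illustrating the idea of a limiting relative frequency.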
Summary of our findings:
for an experiment with only two equally likely outcomes (say \(H\) and \(T\)), after many trials we should see approximately equal numbers of occurrences of the two outcomes, i.e., the relative frequencies of the outcomes should be the same and equal to \(0.5\);
we note that the relative frequencies add up to 1.
We define the limiting relative frequency of an outcome as the statistical probability of that outcome and write \[P(H) = 0.5 \,\,\, \mbox{and}\,\,\, P(T) = 0.5, \]
and \[P(H) + P(T) = 1.\]
With the results of this experiment in mind, we introduce the idea of a discrete random variable and a discrete probability distribution. We introduce a random variable \(X\) (its value is unknown before the experiment is performed), with \(X = 1\) corresponding to \(H\) and \(X = 2\) to \(T\), as follows:
| X | 1 | 2 |
|---|---|---|
| P | 0.5 | 0.5 |
The probability mass distribution function allocates probabilities to all possible values of the random variable.
Consider the same experiment of flipping a fair coin and counting how many times we observe \(H\) and how many times we observe \(T\). We aim to observe the number of \(H\) and \(T\) counts (actually, the relative frequency of these counts) in a sample of size \(N\) and present our observations using an output based on cumulative relative frequencies. Because we already have an intuition about how the sample size affects the relative frequencies, we go straight to a large-sample-size scenario, say \(N = 100000\).
The R code that “performs” this experiment is given below.
N <- 100000
set.seed(123)
outcomes <- sample(c(1, 2), N, replace = TRUE, prob = c(0.5, 0.5))
sorting <- as.data.frame(table(outcomes))
K <- nrow(sorting)                 # number of distinct outcomes
m <- matrix(0, ncol = K, nrow = K)
# Column i of m holds the relative frequencies that make up the i-th stacked bar,
# so the bar heights display the cumulative relative frequencies
for (i in 1:K) {
  tail <- c()
  for (j in (K - (i - 1)):K) {
    tail <- c(tail, sorting[[2]][K + 1 - j] / N)
  }
  tail <- rev(tail)
  m[, i] <- c(rep(0, K - i), tail)
}
barplot(m, col = c(rep("red", K - 1), "darkblue"),
        main = "Flipping a fair coin 100000 times - cumulative relative frequencies",
        space = 0, xlab = "Figure 4")
Let us see what information we can extract from Figure 4. We can think of the entire range of relative frequency values, (0, 1), as split into two non-overlapping subsets, (0, 0.5) and (0.5, 1), where the subset (0, 0.5) is associated with the value \(X = 1\) (observing \(H\)) and the subset (0.5, 1) is associated with the value \(X = 2\) (observing \(T\)). So, if we have a mechanism for uniformly and randomly selecting a number \(r\) such that \(0 \le r \le 1\), then the value of \(r\) contains enough information to identify the outcome of the experiment, i.e.,
\begin{align*} \mbox{} \hspace{2ex} Outcome &= \left\lbrace \begin{array}{lll} & H \hspace{2ex}&\mbox{if} \hspace{2ex} 0 \le r \le \frac{1}{2} \\ & T \hspace{2ex}&\mbox{if} \hspace{2ex} \frac{1}{2} < r \le 1. \\ \end{array} \right. \end{align*}In other words, if, for example, we have generated the value \(r = 0.866\), then the simulated outcome of the experiment is \(T\) (we say that we have observed \(T\)).
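A minimal sketch of this rule in R (the function name `simulate_flip` is ours):

```r
# Map a uniform random number r in (0, 1) to a coin-flip outcome
simulate_flip <- function(r) if (r <= 0.5) "H" else "T"

simulate_flip(0.866)   # the example value above: returns "T"
```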
We repeat the same experiment with two non-equally likely outcomes. The values of the parameters are: number of trials 100 000; possible observations (1, 2); and distribution (0.3, 0.7). The output is given in Figure 5.
N <- 100000
set.seed(123)
outcomes <- sample(c(1, 2), N, replace = TRUE, prob = c(0.3, 0.7))
sorting <- data.frame(table(outcomes))
sorting$rel.freq <- sorting$Freq / N
barplot(sorting$rel.freq, main = "Flipping an unfair coin 100000 times - relative frequencies",
        names.arg = c("H", "T"), col = c("red", "blue"), xlab = "Figure 5")
So, to simulate an outcome of this experiment, we do the following:
generate a uniform random number \(r\) in \((0, 1)\);
simulate the outcome using the rule below (the threshold 0.3 follows from \(P(H) = 0.3\)): \begin{align*} \mbox{} \hspace{2ex} Outcome &= \left\lbrace \begin{array}{lll} & H \hspace{2ex}&\mbox{if} \hspace{2ex} 0 \le r \le 0.3 \\ & T \hspace{2ex}&\mbox{if} \hspace{2ex} 0.3 < r \le 1. \\ \end{array} \right. \end{align*}
In \(R\), the command to generate, for example, \(n = 3\) uniform random numbers in \((0,1)\) is:
n <- 3
r <- runif(n)
print(r)
## [1] 0.60240988 0.02285169 0.82056246
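Putting the two steps together for the unfair coin of Figure 5 (the function name is ours; the threshold 0.3 comes from \(P(H) = 0.3\)):

```r
# Simulate one flip of the unfair coin with P(H) = 0.3, P(T) = 0.7
simulate_unfair_flip <- function(r) if (r <= 0.3) "H" else "T"

set.seed(123)
r <- runif(1)            # uniform random number in (0, 1)
simulate_unfair_flip(r)
```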
Let us consider rolling a fair die, with six equally likely outcomes \(1, 2, \ldots, 6\), each occurring with probability \(\frac{1}{6}\).
So, to simulate an outcome of this experiment, we do the following:
generate a uniform random number \(r\) in \((0, 1)\);
simulate the outcome using the rule below: \begin{align*} \mbox{} \hspace{2ex} Outcome &= k \hspace{2ex} \mbox{if} \hspace{2ex} \frac{k-1}{6} < r \le \frac{k}{6}, \hspace{2ex} k = 1, \ldots, 6. \end{align*}
For example, if the generated random number is \(r = 0.56777\), then the generated/simulated outcome from this experiment is 4.
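Since outcome \(k\) corresponds to \(\frac{k-1}{6} < r \le \frac{k}{6}\), the rule amounts to taking \(\lceil 6r \rceil\); a sketch (function name ours):

```r
# Simulate a fair-die roll: outcome k when (k-1)/6 < r <= k/6
simulate_die <- function(r) ceiling(6 * r)

simulate_die(0.56777)   # returns 4, as in the example above
```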
Consider an experiment with four possible outcomes, say \(\{A1, A2, A3, A4\}\), and assume that their chances of occurrence (their probabilities) are
\[P(A1) = 0.2, \,\,\, P(A2) = 0.3,\,\,\, P(A3) = 0.1,\,\,\, P(A4) = 0.4.\] Note that all probabilities are positive and they add up to 1, i.e., \[P(A1) + P(A2) + P(A3) + P(A4) = 1.\]
So, to simulate an outcome of this experiment, we do the following:
generate a uniform random number \(r\) in \((0, 1)\);
simulate the outcome using the rule below: \begin{align*} \mbox{} \hspace{2ex} Outcome &= \left\lbrace \begin{array}{lll} & A1 \hspace{2ex}&\mbox{if} \hspace{2ex} 0 \le r \le 0.2 \\ & A2 \hspace{2ex}&\mbox{if} \hspace{2ex} 0.2 < r \le 0.5 \\ & A3 \hspace{2ex}&\mbox{if} \hspace{2ex} 0.5 < r \le 0.6 \\ & A4 \hspace{2ex}&\mbox{if} \hspace{2ex} 0.6 < r \le 1. \\ \end{array} \right. \end{align*}
Similarly to the ideas in Section 1.1.1., we introduce a random variable \(Y\) (its value is unknown before the experiment is performed) that will represent the outcome of this experiment as follows:
| Y | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| P | 0.2 | 0.3 | 0.1 | 0.4 |
Now, we can simulate the value of this random variable as follows:
generate a uniform random number \(r\) in \((0, 1)\);
simulate the value of \(Y\) using the rule below: \begin{align*} \mbox{} \hspace{2ex} Y &= \left\lbrace \begin{array}{lll} & 1 \hspace{2ex}&\mbox{if} \hspace{2ex} 0 \le r \le 0.2 \\ & 2 \hspace{2ex}&\mbox{if} \hspace{2ex} 0.2 < r \le 0.5 \\ & 3 \hspace{2ex}&\mbox{if} \hspace{2ex} 0.5 < r \le 0.6 \\ & 4 \hspace{2ex}&\mbox{if} \hspace{2ex} 0.6 < r \le 1. \\ \end{array} \right. \end{align*}
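This rule can be coded compactly by comparing \(r\) with the cumulative probabilities (variable and function names ours):

```r
p <- c(0.2, 0.3, 0.1, 0.4)   # P(Y = 1), ..., P(Y = 4)
cp <- cumsum(p)              # cumulative probabilities: 0.2, 0.5, 0.6, 1.0
# Y = (number of cumulative probabilities strictly below r) + 1
simulate_Y <- function(r) sum(r > cp) + 1

simulate_Y(0.55)   # 0.5 < 0.55 <= 0.6, so Y = 3
```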
In addition to the probability mass distribution function, the random variable \(Y\) can be described by its so-called cumulative distribution function, defined as \[F(y) = P(Y \le y).\] The cumulative distribution function \(F(y)\) measures the probability that the value of the random variable is to the left of or at \(y\). For example, for the random variable defined above:
F(0.7) = 0.0 – there are no possible values of \(Y\) less than or equal to 0.7;
F(1.8) = 0.2 – only the value 1 (with probability 0.2) is less than or equal to 1.8;
F(3.0) = 0.6 – the values 1 (probability 0.2), 2 (probability 0.3), and 3 (probability 0.1) are less than or equal to 3, so we add up these probabilities.
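These values can be checked directly in R (variable and function names ours):

```r
y.values <- c(1, 2, 3, 4)
p <- c(0.2, 0.3, 0.1, 0.4)
# F(y) = P(Y <= y): sum the probabilities of all values at or below y
F <- function(y) sum(p[y.values <= y])

F(0.7)   # 0
F(1.8)   # 0.2
F(3.0)   # 0.6
```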
Now consider an experiment with infinitely many possible outcomes. The random variable associated with this type of experiment will have infinitely many possible values. Random variables of this type are called continuous random variables. It is not possible to describe a continuous random variable using its probability mass distribution function - the table would be infinitely long and the probabilities in the second row would be vanishingly small. On the other hand, the cumulative distribution function (or, equivalently, the cumulative relative frequencies) provides a useful tool to generate/simulate values of a continuous random variable.
N <- 10000
set.seed(123)
outcomes <- sample(1:90, N, replace = TRUE)
sorting <- as.data.frame(table(outcomes))
K <- nrow(sorting)                 # number of distinct outcomes
m <- matrix(0, ncol = K, nrow = K)
# Column i of m holds the relative frequencies that make up the i-th stacked bar,
# so the bar heights display the cumulative relative frequencies
for (i in 1:K) {
  tail <- c()
  for (j in (K - (i - 1)):K) {
    tail <- c(tail, sorting[[2]][K + 1 - j] / N)
  }
  tail <- rev(tail)
  m[, i] <- c(rep(0, K - i), tail)
}
barplot(m, xlab = "Figure 11", col = c(rep("red", K - 1), "darkblue"), space = 0)
abline(v = 54, col = "blue")
text(54, 0.7, "54")
abline(h = 0.6, col = "blue")
If the number of outcomes of an experiment is very large, the top blue line of the plot in Figure 11 will be a very close approximation of the cumulative distribution function \(F(x)\) of the random variable \(X\) related to this experiment. The cumulative distribution function \(F(x)\) of a continuous random variable is an increasing continuous function, approaching 1 for very large values of \(x\) and approaching 0 for very small values of \(x\). So, how do we simulate a value of the random variable \(X\) using the cumulative relative frequency plot (or its equivalent for a large number of outcomes - the cumulative distribution function)? The algorithm is as follows:
generate a uniform random number \(r\) in \((0, 1)\);
find the value \(x^*\) of the random variable \(X\) such that \(F(x^*) = r.\)
The geometric interpretation of the above algorithm is:
Draw a horizontal line \(y = r\) and find its intersection point \(P(u, r)\) with the graph of \(F(x)\) (or the top blue line).
The value \(x^*\) is given by the intersection \(Q(x^*, 0)\) of the vertical line through point \(P(u, r)\) with the x-axis.
For example, if the generated uniform random number is \(r = 0.6\), then (see Figure 11) \(x^* = 54\).
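For the experiment behind this figure (90 equally likely outcomes, so \(F(x) = x/90\) for \(x = 1, \ldots, 90\)), the algorithm reduces to \(x^* = \lceil 90\,r \rceil\); a sketch (function name ours):

```r
# Inverse of the CDF F(x) = x/90 of the 90-outcome uniform experiment
simulate_X <- function(r) ceiling(90 * r)

simulate_X(0.6)   # returns 54, as in the example above
```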