The real world is full of uncertainty. Sources of uncertainty include true uncertainty (e.g., the outcome of flipping a coin is unknown in advance), theoretical ignorance (e.g., lack of knowledge in medical diagnosis), laziness (e.g., not wanting to collect useful information, such as running a survey to understand customers' opinions about a product), practical ignorance (e.g., missing information due to security/privacy), etc.
This lecture will introduce the basics of reasoning under uncertainty.
We can describe uncertainty using random events/propositions. The following are some examples of propositions:
A basic proposition can be represented by a random variable, which takes values from its domain. In example 1 above, the random variable is weather, whose domain is {rainy, sunny, cloudy}. In example 2, the random variable is US_stock_return, whose domain is the set of all real values $\mathbb{R}$.
For the sake of simplicity, here we only consider finite discrete domains. We convert continuous domains into finite discrete ones by discretisation. For example, in example 2 above, we can discretise the continuous domain $\mathbb{R}$ into {very poor, poor, ok, good, very good}, where
In other words, a basic proposition $A$ can be represented as
$$A: X = x, $$where $X$ is a random variable, and $x$ is a value from its domain. Examples include weather = rainy and US_stock_return = good.
A composite event/proposition combines one or more basic events/propositions with logic operators. Given two propositions $A$ and $B$,
In example 2 above, the proposition "$A$: The US stock market return will be at least 10% in 2022" is a composite proposition that can be written as US_stock_return = good $\mathsf{OR}$ US_stock_return = very good.

Probability measures how likely an event is to occur, or how likely a proposition is to be true. Given an event/proposition $A$, the probability of $A$ being true is denoted as $P(A)$. Probability satisfies the following axioms.
If we roll a fair die, where all six numbers occur with the same probability, the probability of each number is as follows:
Die | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
Prob | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |
QUESTION: If we roll two fair dice, what is the probability that the total of the two dice is 11?
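Since each of the $6 \times 6 = 36$ outcomes is equally likely, this can be answered by simple counting. Below is a minimal Python sketch (the variable names are my own) that enumerates all outcomes and counts those summing to 11:

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes whose total is 11: (5, 6) and (6, 5).
favourable = [o for o in outcomes if sum(o) == 11]

print(Fraction(len(favourable), len(outcomes)))  # 2/36 = 1/18
```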
For a composite proposition with the $\mathsf{AND}$ logic, e.g., $A\ \mathsf{AND}\ B$, the probability is also called the joint probability $P(A,B)$.
$$ P(A,B) = P(A\ \mathsf{AND}\ B). $$In other words, the joint probability $P(A,B)$ means the probability that $A$ and $B$ are both true.
The unconditional/prior probability reflects the belief in propositions in the absence of any other information/evidence. Examples include:
The conditional probability is the updated belief in propositions given some additional information/evidence. Below are some examples:
We can see that the conditional probability of a proposition can be dramatically different from its prior probability. In other words, conditions/evidence can greatly change our belief in propositions.
In the above example, there are 18 objects in a bag with different colours and shapes. The numbers of objects with each colour and shape are given below.
Colour | Circle | Square | Triangle | Total |
---|---|---|---|---|
Blue | 4 | 2 | 3 | 9 |
White | 3 | 3 | 3 | 9 |
Total | 7 | 5 | 6 | 18 |
If we randomly pick an object from the bag, the colour and shape of the picked object are random variables.

We can calculate the probabilities by counting the numbers (see the code sketch after this list). For example,

- The probability of picking a blue object is 9/18 (there are 9 blue objects out of the 18 objects in total), i.e., P(blue) = 9/18.
- The joint probability of picking a white and square object is 3/18 (there are 3 objects with white colour and square shape), i.e., P(white, square) = 3/18.
- The conditional probability that the picked object has a circle shape, given that we have seen its colour is blue, is 4/9 (there are 4 objects with circle shape among the 9 objects with blue colour), i.e., P(circle | blue) = 4/9.
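These counting arguments translate directly into code. Here is a minimal Python sketch (the dictionary of counts mirrors the table above; the helper function p is my own) that recovers the same three probabilities:

```python
from fractions import Fraction

# Counts from the table above: (colour, shape) -> number of objects.
counts = {
    ("blue", "circle"): 4, ("blue", "square"): 2, ("blue", "triangle"): 3,
    ("white", "circle"): 3, ("white", "square"): 3, ("white", "triangle"): 3,
}
total = sum(counts.values())  # 18

def p(colour=None, shape=None):
    """Probability that a randomly picked object matches the given colour and/or shape."""
    matching = sum(n for (c, s), n in counts.items()
                   if (colour is None or c == colour) and (shape is None or s == shape))
    return Fraction(matching, total)

print(p(colour="blue"))                                     # P(blue) = 9/18 = 1/2
print(p(colour="white", shape="square"))                    # P(white, square) = 3/18 = 1/6
print(p(colour="blue", shape="circle") / p(colour="blue"))  # P(circle | blue) = 4/9
```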
Below are the three commonly used probability rules.
Given two propositions $A$ and $B$, we have
$$ P(A,B) = P(A) * P(B | A). $$The product rule can be explained by establishing the beliefs of $A$ and $B$ in two different ways: parallel and sequential.
Since we will reach the same belief through either parallel or sequential way, we have $P(A,B) = P(A) * P(B | A)$.
In the sequential way, we can also establish our belief for $B$ first, and then for $A$ given $B$. Then we obtain $P(A,B) = P(B) * P(A | B)$. In other words, the product rule holds for any order of the propositions.
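As a sanity check, the bag-of-objects example above satisfies the product rule in either order:

$$ P(white, square) = P(white) * P(square | white) = \frac{9}{18} \times \frac{3}{9} = \frac{3}{18}, $$

$$ P(white, square) = P(square) * P(white | square) = \frac{5}{18} \times \frac{3}{5} = \frac{3}{18}, $$

which matches the direct count of 3 white square objects out of 18.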
We can further extend the product rule from two propositions to any number of propositions. Given a list of propositions $A_1, A_2, \dots, A_n$, we have
$$ P(A_1, A_2, \dots, A_n) = P(A_1) * P(A_2 | A_1) * \dots * P(A_n | A_1, A_2, \dots, A_{n-1}). $$This is the famous chain rule for probability.
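To make the chain rule concrete, here is a short Python sketch that checks the factorisation against a full joint distribution over three binary propositions; the joint probabilities are made-up numbers, purely for illustration:

```python
from itertools import product

# A made-up joint distribution P(A1, A2, A3) over three binary propositions
# (the eight probabilities are illustrative and sum to 1).
joint = {
    (True, True, True): 0.10, (True, True, False): 0.15,
    (True, False, True): 0.05, (True, False, False): 0.20,
    (False, True, True): 0.10, (False, True, False): 0.05,
    (False, False, True): 0.20, (False, False, False): 0.15,
}

def prob(**fixed):
    """Marginal probability that the named variables (a1, a2, a3) take the given values."""
    idx = {"a1": 0, "a2": 1, "a3": 2}
    return sum(p for outcome, p in joint.items()
               if all(outcome[idx[name]] == value for name, value in fixed.items()))

# Chain rule: P(A1, A2, A3) = P(A1) * P(A2 | A1) * P(A3 | A1, A2).
for a1, a2, a3 in product([True, False], repeat=3):
    factorised = (prob(a1=a1)
                  * prob(a1=a1, a2=a2) / prob(a1=a1)
                  * prob(a1=a1, a2=a2, a3=a3) / prob(a1=a1, a2=a2))
    assert abs(factorised - joint[(a1, a2, a3)]) < 1e-12

print("Chain rule holds for all 8 outcomes.")
```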
Given two discrete random variables $X$ and $Y$, with their domains as $\Omega(X)$ and $\Omega(Y)$, then for each $x \in \Omega(X)$, we have
$$ P(X = x) = \sum_{y \in \Omega(Y)} P(X = x, Y = y). $$In other words, if we jointly consider a random variable $X$ with another random variable $Y$, then the marginal probability $P(X = x)$ is equivalent to the sum of its joint probabilities over all the possible $Y$ values.
For example, if we consider tomorrow's weather $X_{t+1}$ jointly with today's weather $X_t$, where the weather can take three possible values $\{sunny, rainy, cloudy\}$, then we have
$$ \begin{aligned} P(X_{t+1} = sunny) & = P(X_{t+1} = sunny, X_t = sunny) \\ & + P(X_{t+1} = sunny, X_t = rainy) \\ & + P(X_{t+1} = sunny, X_t = cloudy) \end{aligned} $$$$ \begin{aligned} P(X_{t+1} = rainy) & = P(X_{t+1} = rainy, X_t = sunny) \\ & + P(X_{t+1} = rainy, X_t = rainy) \\ & + P(X_{t+1} = rainy, X_t = cloudy) \end{aligned} $$$$ \begin{aligned} P(X_{t+1} = cloudy) & = P(X_{t+1} = cloudy, X_t = sunny) \\ & + P(X_{t+1} = cloudy, X_t = rainy) \\ & + P(X_{t+1} = cloudy, X_t = cloudy) \end{aligned} $$We can further extend the sum rule from two random variables to any number of variables. Given random variables $X_1, X_2, \dots, X_m$ and $Y_1, Y_2, \dots, Y_n$, then for each $x_1 \in \Omega(X_1), x_2 \in \Omega(X_2), \dots x_m \in \Omega(X_m)$, we have
$$ P(X_1 = x_1, \dots, X_m = x_m) = \sum_{y_i \in \Omega(Y_i), i = 1, \dots, n} P(X_1 = x_1, \dots, X_m = x_m, Y_1 = y_1, \dots, Y_n = y_n). $$Given a discrete random variable $X$ with its domain $\Omega(X)$, we have
$$ \sum_{x \in \Omega(X)} P(X = x) = 1. $$It is obvious that the random variable must take one of the values in its domain. Therefore, the probabilities of all the possible values should add up to 1 (100%).
The normalisation rule also holds under any condition. Given any proposition $A$ as the condition, we have
$$ \sum_{x \in \Omega(X)} P(X = x | A) = 1. $$In the above weather forecast example, the normalisation rule without any condition can be written as:
$$ P(X_{t+1} = sunny) + P(X_{t+1} = rainy) + P(X_{t+1} = cloudy) = 1. $$Similarly, the normalisation rule under the condition $X_t = sunny$ can be written as:
$$ P(X_{t+1} = sunny | X_t = sunny) + P(X_{t+1} = rainy | X_t = sunny) + P(X_{t+1} = cloudy | X_t = sunny) = 1. $$Given two random variables $X$ and $Y$, if for any possible value $x \in \Omega(X)$, the probability $P(X = x)$ does not depend on the value of $Y$, then we say that $X$ and $Y$ are independent of each other, denoted as $X \perp Y$.
If two random variables $X$ and $Y$ are independent, then for each $x \in \Omega(X)$ and $y \in \Omega(Y)$, we have
$$ P(X = x | Y = y) = P(X = x). $$This is obvious by definition, since our belief $P(X = x)$ does not change, regardless of whether we have information about $Y$.
Similarly, we have
$$ P(Y = y | X = x) = P(Y = y). $$Further, combined with the product rule, we have
$$ P(X = x, Y = y) = P(X = x) * P(Y = y). $$For example, if we flip a fair coin twice, where the two outcomes head and tail each have a probability of 0.5, then the outcomes of the two flips are independent of each other. We have
$$ P(X_1 = head, X_2 = head) = P(X_1 = head) * P(X_2 = head) = 0.5 \times 0.5 = 0.25, $$where $X_1$ and $X_2$ denote the outcomes of the first and second flips, respectively.
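A short Python sketch (the names are my own) confirms this by enumerating the four equally likely outcomes of the two flips:

```python
from itertools import product
from fractions import Fraction

# The four equally likely outcomes (first flip, second flip) of a fair coin flipped twice.
outcomes = list(product(["head", "tail"], repeat=2))

def p(event):
    """Probability of an event, given as a predicate over (flip1, flip2)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_joint = p(lambda o: o == ("head", "head"))   # P(X1 = head, X2 = head)
p_first = p(lambda o: o[0] == "head")          # P(X1 = head)
p_second = p(lambda o: o[1] == "head")         # P(X2 = head)

print(p_joint, p_first * p_second)  # 1/4 1/4 -> the joint equals the product
```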
For more variables, we say that $X_1, \dots, X_n$ are independent of each other if
$$ P(X_1, \dots, X_n) = P(X_1) * \dots * P(X_n). $$Independence can also be conditional. Given random variables $X$, $Y$ and $Z$, we say that $X$ and $Y$ are conditionally independent given $Z$, if for each $x \in \Omega(X)$, $y \in \Omega(Y)$ and $z \in \Omega(Z)$, we have
$$ P(X = x, Y = y | Z = z) = P(X = x | Z = z) * P(Y = y | Z = z). $$We can see that the formula for conditional independence is obtained by adding the condition (here, $Z = z$) to each probability in the unconditional formula.
For more variables, we say that $X_1, \dots, X_n$ are independent given $Y_1, \dots, Y_m$, if
$$ P(X_1, \dots, X_n | Y_1, \dots, Y_m) = P(X_1 | Y_1, \dots, Y_m) * \dots * P(X_n | Y_1, \dots, Y_m). $$In this case study, we aim to forecast the weather in the future. The weather is denoted by the random variable $X$, and its domain (the possible weather conditions) is $\{sunny, rainy\}$.
For the weather tomorrow, denoted as $X_t$, the forecast says that it will be sunny with a probability of 20%, and rainy with a probability of 80% (they sum up to 100% due to the normalisation rule). That is,
$$ P(X_t = sunny) = 0.2,\\ P(X_t = rainy) = 0.8. $$We also know that the weather tends to change smoothly over time. Thus, the weather for the day after tomorrow, denoted as $X_{t+1}$, depends on the weather tomorrow as follows.
In other words, we have
$$ P(X_{t+1} = sunny | X_t = sunny) = 0.6, \\ P(X_{t+1} = rainy | X_t = sunny) = 0.4, $$$$ P(X_{t+1} = sunny | X_t = rainy) = 0.3, \\ P(X_{t+1} = rainy | X_t = rainy) = 0.7. $$What is the probability that the day after tomorrow will be sunny?
The probability that the day after tomorrow will be sunny can be written as $P(X_{t+1} = sunny)$.
First, we can use the sum rule to take the weather tomorrow into consideration as follows:
$$ P(X_{t+1} = sunny) = P(X_{t+1} = sunny, X_t = sunny) + P(X_{t+1} = sunny, X_t = rainy). $$Then, for each term on the right hand side, we can use the product rule that considers $X_t$ first, and then $X_{t+1}$ given $X_t$. Specifically, we have
$$ P(X_{t+1} = sunny, X_t = sunny) = P(X_t = sunny) * P(X_{t+1} = sunny | X_t = sunny), \\ P(X_{t+1} = sunny, X_t = rainy) = P(X_t = rainy) * P(X_{t+1} = sunny | X_t = rainy). $$All the above probabilities on the right hand side are known from the forecast. Therefore, we have
$$ \begin{aligned} P(X_{t+1} = sunny, X_t = sunny) & = P(X_t = sunny) * P(X_{t+1} = sunny | X_t = sunny) \\ & = 0.2 \times 0.6 = 0.12, \end{aligned} $$$$ \begin{aligned} P(X_{t+1} = sunny, X_t = rainy) & = P(X_t = rainy) * P(X_{t+1} = sunny | X_t = rainy) \\ & = 0.8 \times 0.3 = 0.24. \end{aligned} $$Therefore, we have
$$ \begin{aligned} P(X_{t+1} = sunny) & = P(X_{t+1} = sunny, X_t = sunny) + P(X_{t+1} = sunny, X_t = rainy) \\ & = 0.12 + 0.24 = 0.36. \end{aligned} $$
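The same calculation takes only a few lines of Python. This is a minimal sketch, with the dictionary-based representation of the distributions being my own choice:

```python
# Forecast for tomorrow: P(X_t).
p_tomorrow = {"sunny": 0.2, "rainy": 0.8}

# Transition probabilities P(X_{t+1} | X_t), indexed as [today's weather][next day's weather].
p_transition = {
    "sunny": {"sunny": 0.6, "rainy": 0.4},
    "rainy": {"sunny": 0.3, "rainy": 0.7},
}

# Sum rule over X_t, with each joint term expanded by the product rule:
# P(X_{t+1} = w) = sum over t of P(X_t = t) * P(X_{t+1} = w | X_t = t).
p_day_after = {
    w: sum(p_tomorrow[t] * p_transition[t][w] for t in p_tomorrow)
    for w in ("sunny", "rainy")
}

print(round(p_day_after["sunny"], 2))  # 0.2*0.6 + 0.8*0.3 = 0.36
print(round(p_day_after["rainy"], 2))  # 0.2*0.4 + 0.8*0.7 = 0.64
```

As a check, the two probabilities sum to 1, as the normalisation rule requires.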