{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Reasoning Under Uncertainty: Basics\n", "\n", "The real world is full of *uncertainties*. Uncertainty can come from *true uncertainty* (e.g., the outcome of flipping a coin is unknown in advance), *theoretical ignorance* (e.g., lack of knowledge in medical diagnosis), *laziness* (e.g., not bothering to collect useful information, such as running a survey to understand customers' opinions about a product), *practical ignorance* (e.g., missing information due to security/privacy), etc.\n", "\n", "This lecture will introduce the basics of reasoning under uncertainty.\n", "\n", "# Table of Contents\n", "1. [Random Event, Proposition, Variable, Domain](#epvd)\n", "2. [Probability](#prob)\n", "3. [Probability Rules](#probrules)\n", "4. [Independence](#independence)\n", "5. [Case Study on Weather Forecast](#case-study)\n", "\n", "------------" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Random Event, Proposition, Variable, Domain \n", "\n", "We can describe uncertainty with **random events/propositions**. The following are some examples of propositions:\n", "\n", "1. Tomorrow will be rainy (the forecast is not always correct, and is only a guess).\n", "2. The US stock market return will be at least 10% in 2022 (again, this prediction cannot be guaranteed).\n", "\n", "A *basic* proposition can be represented by a **random variable**, which can take values from its **domain**. In example 1 above, the random variable is `weather`, whose domain is `{rainy, sunny, cloudy}`. In example 2, the random variable is `US_stock_return`, whose domain is the set of all real values $\\mathbb{R}$.\n", "\n", "For the sake of simplicity, here we only consider *finite discrete* domains. We convert continuous domains to finite discrete ones by *discretisation*.\n", "\n", "
For example, in example 2 above, we can discretise the continuous domain $\\mathbb{R}$ into `{very poor, poor, ok, good, very good}`, where\n", "\n", "- very poor $ = (-\\infty, -20\\%]$,\n", "- poor $ = (-20\\%, -10\\%]$,\n", "- ok $ = (-10\\%, 10\\%)$,\n", "- good $ = [10\\%, 20\\%)$,\n", "- very good $ = [20\\%, \\infty)$.\n", "\n", "In other words, a basic proposition $A$ can be represented as \n", "\n", "$$A: X = x,\n", "$$\n", "\n", "where $X$ is a random variable in $A$, and $x$ is a value from its domain. Examples include `weather = rainy` and `US_stock_return = good`.\n", "\n", "A *composite* event/proposition combines one or more *basic* events/propositions with logic operators. Given two propositions $A$ and $B$,\n", "\n", "- $A\\ \\mathsf{AND}\\ B$ (denoted as $A \\wedge B$): both $A$ and $B$ are true.\n", "- $A\\ \\mathsf{OR}\\ B$ (denoted as $A \\vee B$): $A$ is true or $B$ is true.\n", "- $\\mathsf{NOT}\\ A$ (denoted as $\\neg A$): $A$ is not true ($A$ is false).\n", "\n", "In example 2 above, the proposition \"$A$: The US stock market return will be at least 10% in 2022\" is a composite proposition that can be written as \n", "- $A$: `US_stock_return = good` $\\mathsf{OR}$ `US_stock_return = very good`.\n", "\n", "------------" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Probability \n", "\n", "**Probability** measures how likely an event is to occur, or how likely a proposition is to be true. Given an event/proposition $A$, the probability that $A$ is true is denoted as $P(A)$.\n", "\n", "
Probability satisfies the following axioms.\n", "\n", "- $0 \\leq P(A) \\leq 1$,\n", "- $P(\\neg A) = 1-P(A)$,\n", "- If $A$ is always true, then $P(A) = 1$; if $A$ is always false, then $P(A) = 0$,\n", "- $\\sum_{x \\in \\Omega(X)}P(X = x) = 1$, where $\\Omega(X)$ is the domain of the random variable $X$.\n", "\n", "If we roll a *fair* die, where *all six numbers occur with the same probability*, then the probability of each number is as follows:\n", "\n", "| Die | 1 | 2 | 3 | 4 | 5 | 6 |\n", "|------|-----|-----|-----|-----|-----|-----|\n", "| Prob | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |\n", "\n", "> **QUESTION**: If we roll two fair dice, what is the probability that the total number of the two dice is 11?\n", "\n", "---------\n", "
\n", "**ANSWER**: $1/18$.\n", " \n", "**Explanation**: Let random variables $X_1$ and $X_2$ denote the outcome numbers of the two dice, both with domain $\\{1,2,3,4,5,6\\}$. Then, there are $6 \\times 6 = 36$ equally likely outcomes $X_1 = x_1$, $X_2 = x_2$, where $x_1, x_2 \\in \\{1,2,3,4,5,6\\}$. \n", " \n", "Among the 36 outcomes, exactly two have a total of 11, i.e., (1) $X_1 = 5$, $X_2 = 6$ and (2) $X_1 = 6$, $X_2 = 5$. Since all the outcomes have the same probability, we have $P(total = 11) = 2/36 = 1/18$.\n", "
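\n", " \n", "This count can be double-checked by brute-force enumeration in Python (a quick sketch; the variable names are our own, not from the lecture):\n", " \n", "```python\n", "# Enumerate all 36 equally likely outcomes of rolling two fair dice\n", "outcomes = [(x1, x2) for x1 in range(1, 7) for x2 in range(1, 7)]\n", "\n", "# Keep only the outcomes whose total is 11\n", "favourable = [o for o in outcomes if sum(o) == 11]\n", "\n", "prob = len(favourable) / len(outcomes)\n", "print(favourable)  # [(5, 6), (6, 5)]\n", "print(prob)        # 0.05555... = 1/18\n", "```\n", "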
\n", "\n", "---------\n", "\n", "### 2.1 Joint Probability\n", "\n", "For a *composite* proposition with the $\\mathsf{AND}$ logic, e.g., $A\\ \\mathsf{AND}\\ B$, the probability is also called the **joint probability** $P(A,B)$.\n", "\n", "$$\n", "P(A,B) = P(A\\ \\mathsf{AND}\\ B).\n", "$$\n", "\n", "In other words, the joint probability $P(A,B)$ means the probability that $A$ and $B$ are both true.\n", "\n", "### 2.2 Unconditional/Conditional Probabilities\n", "\n", "The **unconditional/prior** probability reflects the belief in propositions *in the absence of any other information/evidence*. Examples include:\n", "- The prior probability that \"**a person gets lung cancer**\" is 0.01%. Let $L$ be the proposition that a person gets lung cancer; then the prior probability can be written as $P(L) = 0.01\\%$.\n", "- The prior probability that \"**tomorrow will rain**\" is 30%. Let $R$ be the proposition that tomorrow will rain; then the prior probability can be written as $P(R) = 30\\%$.\n", "\n", "The **conditional** probability is the updated belief in propositions *given some more information/evidence*. Below are possible examples:\n", "- The conditional probability that \"**a person gets lung cancer**\" given that \"**the person is a regular smoker**\" is 20%. Let $L$ be the proposition that a person gets lung cancer, and $S$ be the proposition that a person is a regular smoker; then the conditional probability can be written as $P(L | S) = 20\\%$.\n", "- The conditional probability that \"**tomorrow will rain**\" given that \"**today is cloudy**\" is 60%. Let $R$ be the proposition that tomorrow will rain, and $C$ be the proposition that today is cloudy; then the conditional probability can be written as $P(R | C) = 60\\%$.\n", "\n", "We can see that the conditional probability of a proposition can be dramatically different from its prior probability. In other words, conditions/evidence can greatly change our belief in propositions.
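\n", "\n", "As a small numeric sketch (the counts below are made up for illustration; they are not from this lecture), a conditional probability can be computed from a table of joint counts:\n", "\n", "```python\n", "# Hypothetical joint counts over (smoking, cancer) for 10,000 people\n", "counts = {\n", "    (\"smoker\", \"cancer\"): 20,\n", "    (\"smoker\", \"no_cancer\"): 1980,\n", "    (\"non_smoker\", \"cancer\"): 4,\n", "    (\"non_smoker\", \"no_cancer\"): 7996,\n", "}\n", "total = sum(counts.values())\n", "\n", "# Prior P(cancer): everyone with cancer, over all people\n", "p_cancer = (counts[(\"smoker\", \"cancer\")] + counts[(\"non_smoker\", \"cancer\")]) / total\n", "\n", "# Conditional P(cancer | smoker): restrict the counting to smokers only\n", "smokers = counts[(\"smoker\", \"cancer\")] + counts[(\"smoker\", \"no_cancer\")]\n", "p_cancer_given_smoker = counts[(\"smoker\", \"cancer\")] / smokers\n", "\n", "print(p_cancer)               # 0.0024\n", "print(p_cancer_given_smoker)  # 0.01\n", "```\n", "\n", "Under these made-up counts, the evidence \"smoker\" raises the believed probability of cancer from 0.0024 to 0.01, illustrating how conditioning updates a belief.\n", "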
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Picking Objects from a Bag\n", "\n", "In this example, there are 18 objects in a bag with different **colours** and **shapes**. The numbers of objects with each colour and shape are given below.\n", "\n", " | | Circle | Square | Triangle | Total |\n", " | ---| ---| ---| --- | --- |\n", " | Blue | 4 | 2 | 3 | 9 |\n", " | White | 3 | 3 | 3 | 9 |\n", " | Total | 7 | 5 | 6 | 18 |\n", "\n", "If we randomly pick an object from the bag, the `colour` and `shape` of the picked object are random variables.\n", "\n", "We can calculate the probabilities by counting objects. For example,\n", "- The probability of picking a `blue` object is 9/18 (there are 9 blue objects out of the 18 objects in total), i.e., `P(blue) = 9/18`.\n", "- The probability of picking a `white` and `square` object is 3/18 (there are 3 objects with white colour and square shape), i.e., `P(white, square) = 3/18`.\n", "- The (conditional) probability that the picked object has a `circle` shape, given that we have seen that its colour is `blue`, is 4/9 (there are 4 objects with circle shape among the 9 objects with blue colour), i.e., `P(circle | blue) = 4/9`.\n", "\n", "------------" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Probability Rules \n", "\n", "Below are the three commonly used probability rules.\n", "\n", "### Product Rule\n", "\n", "Given two propositions $A$ and $B$, we have\n", "\n", "$$\n", "P(A,B) = P(A) * P(B | A).\n", "$$\n", "\n", "The product rule can be explained by establishing the beliefs of $A$ and $B$ in two different ways: **parallel** and **sequential**.\n", "\n", "- **Parallel**: The probability that both $A$ and $B$ are true is directly denoted as $P(A,B)$ by definition.\n", "- **Sequential**: First, we establish our belief for $A$ alone. The probability that $A$ is true is denoted as $P(A)$. Then, we establish our belief for $B$ given that $A$ is true. This conditional probability is $P(B|A)$. Combining these two beliefs, the probability that both $A$ and $B$ are true is equivalent to \"$A$ is true\" $\\mathsf{AND}$ \"$B$ is true given $A$ is true\", which is $P(A) * P(B|A)$. \n", "\n", "Since we reach the same belief through either the parallel or the sequential way, we have $P(A,B) = P(A) * P(B | A)$.\n", "\n", "In the sequential way, we can also establish our belief for $B$ first, and then for $A$ given $B$. Then we obtain $P(A,B) = P(B) * P(A | B)$. In other words, **the product rule works for any order of the random variables**.\n", "\n", "We can further extend the product rule from two propositions to any number of propositions. Given a list of propositions $A_1, A_2, \\dots, A_n$, we have\n", "\n", "$$\n", "P(A_1, A_2, \\dots, A_n) = P(A_1) * P(A_2 | A_1) * \\dots * P(A_n | A_1, A_2, \\dots, A_{n-1}).\n", "$$\n", "\n", "This is the famous **[chain rule](https://en.wikipedia.org/wiki/Chain_rule_(probability))** for probability.\n", "\n", "### Sum Rule\n", "\n", "Given two discrete random variables $X$ and $Y$ with domains $\\Omega(X)$ and $\\Omega(Y)$, for each $x \\in \\Omega(X)$ we have\n", "\n", "$$\n", "P(X = x) = \\sum_{y \\in \\Omega(Y)} P(X = x, Y = y).\n", "$$\n", "\n", "In other words, if we jointly consider a random variable $X$ with another random variable $Y$, then the *marginal* probability $P(X = x)$ is equivalent to the sum of its joint probabilities over all the possible $Y$ values.\n", "\n", "For example, if we consider tomorrow's weather $X_{t+1}$ jointly with today's weather $X_t$, where there are three possible weather states $\\{sunny, rainy, cloudy\\}$, then we have \n", "\n", "$$\n", "\\begin{aligned}\n", "P(X_{t+1} = sunny) & = P(X_{t+1} = sunny, X_t = sunny) \\\\\n", "& + P(X_{t+1} = sunny, X_t = rainy) \\\\\n", "& + P(X_{t+1} = sunny, X_t = cloudy)\n", "\\end{aligned}\n", "$$\n", "\n", "$$\n", "\\begin{aligned}\n", "P(X_{t+1} = rainy) & = P(X_{t+1} = rainy, X_t = sunny)
\\\\\n", "& + P(X_{t+1} = rainy, X_t = rainy) \\\\\n", "& + P(X_{t+1} = rainy, X_t = cloudy)\n", "\\end{aligned}\n", "$$\n", "\n", "$$\n", "\\begin{aligned}\n", "P(X_{t+1} = cloudy) & = P(X_{t+1} = cloudy, X_t = sunny) \\\\\n", "& + P(X_{t+1} = cloudy, X_t = rainy) \\\\\n", "& + P(X_{t+1} = cloudy, X_t = cloudy)\n", "\\end{aligned}\n", "$$\n", "\n", "We can further extend the sum rule from two random variables to any number of variables. Given random variables $X_1, X_2, \\dots, X_m$ and $Y_1, Y_2, \\dots, Y_n$, for each $x_1 \\in \\Omega(X_1), x_2 \\in \\Omega(X_2), \\dots, x_m \\in \\Omega(X_m)$, we have\n", "\n", "$$\n", "P(X_1 = x_1, \\dots, X_m = x_m) = \\sum_{y_i \\in \\Omega(Y_i), i = 1, \\dots, n} P(X_1 = x_1, \\dots, X_m = x_m, Y_1 = y_1, \\dots, Y_n = y_n).\n", "$$\n", "\n", "### Normalisation Rule\n", "\n", "Given a discrete random variable $X$ with its domain $\\Omega(X)$, we have\n", "\n", "$$\n", "\\sum_{x \\in \\Omega(X)} P(X = x) = 1.\n", "$$\n", "\n", "It is obvious that the random variable must take one of the values in its domain. Therefore, the probabilities of all the possible values should add up to 1 (100%).\n", "\n", "The normalisation rule holds under any condition as well. Given any proposition $A$ as the condition, we have\n", "\n", "$$\n", "\\sum_{x \\in \\Omega(X)} P(X = x | A) = 1.\n", "$$\n", "\n", "In the above weather forecast example, the normalisation rule without condition can be written as:\n", "\n", "$$\n", "P(X_{t+1} = sunny) + P(X_{t+1} = rainy) + P(X_{t+1} = cloudy) = 1.\n", "$$\n", "\n", "Similarly, the normalisation rule under the condition $X_t = sunny$ can be written as:\n", "\n", "$$\n", "P(X_{t+1} = sunny | X_t = sunny) + P(X_{t+1} = rainy | X_t = sunny) + P(X_{t+1} = cloudy | X_t = sunny) = 1.\n", "$$\n", "\n", "---------" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Independence \n", "\n", "Given two random variables $X$ and $Y$, if for any possible value $x \\in \\Omega(X)$ the probability $P(X = x)$ does not depend on the value of $Y$, then we say that $X$ and $Y$ are **independent** of each other, denoted as $X \\perp Y$.\n", "\n", "If two random variables $X$ and $Y$ are independent, then for each $x \\in \\Omega(X)$ and $y \\in \\Omega(Y)$, we have\n", "\n", "$$\n", "P(X = x | Y = y) = P(X = x).\n", "$$\n", "\n", "This is obvious by definition, since our belief $P(X = x)$ does not change at all, no matter whether we have information on $Y$ or not.\n", "\n", "Similarly, we have \n", "\n", "$$\n", "P(Y = y | X = x) = P(Y = y).\n", "$$\n", "\n", "Further, combined with the product rule, we have\n", "\n", "$$\n", "P(X = x, Y = y) = P(X = x) * P(Y = y).\n", "$$ \n", "\n", "For example, if we flip a fair coin twice, where the two outcomes `head` and `tail` have the same probability of 0.5, the outcomes of the two flips are independent of each other. We have\n", "\n", "$$\n", "\\begin{aligned}\n", "P(flip_2 = head | flip_1 = head) & = P(flip_2 = head) \\\\\n", "& = 0.5,\n", "\\end{aligned}\n", "$$\n", "\n", "$$\n", "\\begin{aligned}\n", "P(flip_1 = head, flip_2 = head) & = P(flip_1 = head) * P(flip_2 = head) \\\\\n", "& = 0.5 * 0.5 = 0.25.\n", "\\end{aligned}\n", "$$\n", "\n", "For more variables, we say that $X_1, \\dots, X_n$ are independent of each other if \n", "\n", "$$\n", "P(X_1, \\dots, X_n) = P(X_1) * \\dots * P(X_n).\n", "$$\n", "\n", "### Conditional Independence\n", "\n", "Independence can also be conditional.\n", "\n", "
Given random variables $X$, $Y$ and $Z$, we say that $X$ and $Y$ are **conditionally independent** given $Z$, if for each $x \\in \\Omega(X)$, $y \\in \\Omega(Y)$ and $z \\in \\Omega(Z)$, we have\n", "\n", "$$\n", "P(X = x, Y = y | Z = z) = P(X = x | Z = z) * P(Y = y | Z = z).\n", "$$\n", "\n", "> We can see that the formula of conditional independence is obtained by adding the condition (e.g., $| Z = z$) into each probability of the unconditional formula.\n", "\n", "For more variables, we say that $X_1, \\dots, X_n$ are independent given $Y_1, \\dots, Y_m$, if \n", "\n", "$$\n", "P(X_1, \\dots, X_n | Y_1, \\dots, Y_m) = P(X_1 | Y_1, \\dots, Y_m) * \\dots * P(X_n | Y_1, \\dots, Y_m).\n", "$$\n", "\n", "--------------" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Case Study on Weather Forecast \n", "\n", "In this case study, we aim to forecast the weather in the future. The weather is denoted as the random variable $X$, and its domain (the possible weather states) is $\\{sunny, rainy\\}$.\n", "\n", "For the weather tomorrow, denoted as $X_t$, the forecast says that it will be sunny with probability 20%, and rainy with probability 80% (obviously, they sum up to 100% due to the normalisation rule). That is,\n", "\n", "$$\n", "P(X_t = sunny) = 0.2,\\\\\n", "P(X_t = rainy) = 0.8.\n", "$$\n", "\n", "We also know that the weather tends to change smoothly over time.\n", "\n", "
Thus, the weather for the day after tomorrow, denoted as $X_{t+1}$, depends on the weather tomorrow as follows.\n", "\n", "- If tomorrow is sunny, then the day after tomorrow will be sunny with probability of 60%, and rainy with probability of 40%.\n", "- If tomorrow is rainy, then the day after tomorrow will be sunny with probability of 30%, and rainy with probability of 70%.\n", "\n", "In other words, we have\n", "\n", "$$\n", "P(X_{t+1} = sunny | X_t = sunny) = 0.6, \\\\\n", "P(X_{t+1} = rainy | X_t = sunny) = 0.4,\n", "$$\n", "\n", "$$\n", "P(X_{t+1} = sunny | X_t = rainy) = 0.3, \\\\\n", "P(X_{t+1} = rainy | X_t = rainy) = 0.7.\n", "$$\n", "\n", "### QUESTION\n", "\n", "What is the probability that the day after tomorrow will be sunny?\n", "\n", "### Answer\n", "\n", "The probability that the day after tomorrow will be sunny can be written as $P(X_{t+1} = sunny)$.\n", "\n", "First, we can use the **sum rule** to take the weather tomorrow into consideration as follows:\n", "\n", "$$\n", "P(X_{t+1} = sunny) = P(X_{t+1} = sunny, X_t = sunny) + P(X_{t+1} = sunny, X_t = rainy).\n", "$$\n", "\n", "Then, for each term on the right hand side, we can use the **product rule** that considers $X_t$ first, and then $X_{t+1}$ given $X_t$. Specifically, we have\n", "\n", "$$\n", "P(X_{t+1} = sunny, X_t = sunny) = P(X_t = sunny) * P(X_{t+1} = sunny | X_t = sunny), \\\\\n", "P(X_{t+1} = sunny, X_t = rainy) = P(X_t = rainy) * P(X_{t+1} = sunny | X_t = rainy).\n", "$$\n", "\n", "All the above probabilities on the right hand side are known from the forecast.\n", "\n", "
Therefore, we have\n", "\n", "$$\n", "\\begin{aligned}\n", "P(X_{t+1} = sunny, X_t = sunny) & = P(X_t = sunny) * P(X_{t+1} = sunny | X_t = sunny) \\\\\n", "& = 0.2 \\times 0.6 = 0.12,\n", "\\end{aligned}\n", "$$\n", "\n", "$$\n", "\\begin{aligned}\n", "P(X_{t+1} = sunny, X_t = rainy) & = P(X_t = rainy) * P(X_{t+1} = sunny | X_t = rainy) \\\\\n", "& = 0.8 \\times 0.3 = 0.24.\n", "\\end{aligned}\n", "$$\n", "\n", "Putting these together, we have \n", "\n", "$$\n", "\\begin{aligned}\n", "P(X_{t+1} = sunny) & = P(X_{t+1} = sunny, X_t = sunny) + P(X_{t+1} = sunny, X_t = rainy) \\\\\n", "& = 0.12 + 0.24 = 0.36.\n", "\\end{aligned}\n", "$$\n", "\n", "--------" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The original Jupyter Notebook can be downloaded [here](https://homepages.ecs.vuw.ac.nz/~yimei/tutorials/reasoning-under-uncertainty-basics.ipynb).\n", "\n", "More tutorials can be found [here](https://meiyi1986.github.io/tutorials/).\n", "\n", "[Yi Mei's homepage](https://meiyi1986.github.io/)" ] } ], "metadata": { "colab": { "authorship_tag": "ABX9TyOLYiX4MNVN+YKTRqYwBPrZ", "collapsed_sections": [], "name": "probability_product_rule.ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 1 }