4. Continuous distributions: An introduction and simulation

4.1 Introduction of continuous distributions

4.1.1 Probability density function (pdf)

Intuition for probability density function

Let us choose a number, say \(X\), with \(0 \le X \le 10\). Assuming that we do not have any preference for any particular number, we are dealing with the discrete uniform distribution \(X \sim U[0, 10]\) and \(P(X=i)=\frac{1}{11}\) for \(i=0,1,2,\ldots,10.\)

This solution rests on the assumption that the probability that \(X\) equals any non-integer number \(x \in [0,10]\) is zero. Here we make a distinction between the discrete random variable \(X\) and the continuous random variable \(Y\), which can assume any value in \([0, 10]\). We can write this implicit assumption as \(P(X=x)=0\) if \(x\) is not one of \(\{0,1,2,\ldots,10\}.\)

What if \(0 \le Y \le 1?\) Since we are not favoring any particular number, we should insist that the probability is the same for each number. In other words, the probability that the selected random number \(Y\) is any particular number \(y \in [0,1]\) should be some constant value, say \(c\), i.e., \(P(Y = y) = c\). But now we run into trouble, because there are infinitely many possibilities. If each possibility has the same probability \(c\), and the probabilities must add up to 1, what could \(c\) possibly be? If \(c\) were any positive number, adding up infinitely many copies of \(c\) would give infinity, which is certainly larger than the required sum of 1. To prevent the sum from blowing up to infinity, we must insist that \(c = 0\), i.e., \(P(Y = y) = 0\) for every real number \(y.\)

But we also know that the total probability must add up to one: \(Y\) is somewhere in the interval \([0,1]\) with probability one, and the probability that \(Y\) lies outside that interval is zero.

The probability density

It turns out that, when we allow \(y\) to be any real number, we need to think about the probability that \(Y\) is close to a single number \(y\), not equal to a single number. We capture the notion of being close to a number with a probability density function, often denoted by \(f(y)\). If the probability density around a point \(y\) is large, the random variable \(Y\) is likely to be close to \(y\). If, on the other hand, \(f(y) = 0\) on some interval, then \(Y\) won’t be in that interval.

To translate the probability density \(f(y)\) into a probability, imagine that \(I_y\) is some small interval around the point \(y\). Then, assuming \(f(y)\) is continuous, the probability that \(Y\) is in that interval will depend both on the density \(f(y)\) and the length of the interval: \(P(Y \in I_y) \approx f(y) \times\) length of \(I_y.\)
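This approximation is easy to check numerically in R. Here is a small sketch using the \(N(0,4)\) density that appears later in this chapter; the point \(y = 1\) and the interval length \(h = 0.01\) are arbitrary choices:

```r
# f(y) * length(I_y) approximates P(Y in I_y) for a small interval around y
y <- 1
h <- 0.01                                    # length of the interval I_y
approx <- dnorm(y, 0, 2) * h                 # density times interval length
exact  <- pnorm(y + h/2, 0, 2) - pnorm(y - h/2, 0, 2)
c(approx, exact)                             # the two values agree closely
```

Shrinking \(h\) makes the two values agree to more and more digits, which is exactly the sense in which the approximation improves as \(I_y\) shrinks.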

We don’t have a true equality here, because the density \(f\) may vary over the interval \(I_y\). But the approximation becomes better and better as the interval \(I_y\) shrinks around the point \(y\), since \(f\) becomes closer and closer to a constant inside that small interval. The probability \(P(Y \in I_y)\) approaches zero as \(I_y\) shrinks down to the point \(y\) (consistent with our result above for single numbers), but the information about the density at \(y\) is contained in the rate at which this probability goes to zero as \(I_y\) shrinks. In general, to determine the probability that \(Y\) is in any subset \(A\) of the real numbers, we simply add up the values of \(f(y)\) over the subset. By “add up,” we mean integrate the function \(f(y)\) over the set \(A\), i.e.,

\[P(Y \in A) = \int_A f(y)\,dy.\]

This integral represents the area under the graph of \(f(y)\) over the set \(A\), e.g., if \(A = (a, b)\) then the \(P(Y \in (a, b))\) is equal to the shaded area.

Probability as an Area

We will not be solving integrals in SCIE201, but understand them as an appropriate replacement for summation.
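For illustration only, R can evaluate such integrals numerically with integrate; as a sketch, integrating the \(N(0,4)\) density over \((-2, 2)\) gives the same probability as the cdf difference:

```r
# P(-2 < Y < 2) for Y ~ N(0,4): numerical integration vs the cdf
f <- function(x) dnorm(x, 0, 2)              # the N(0,4) density, sd = 2
integrate(f, -2, 2)$value                    # area under f over (-2, 2)
pnorm(2, 0, 2) - pnorm(-2, 0, 2)             # the same probability via the cdf
```

Both values are about 0.683, the probability of falling within one standard deviation of the mean.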

So, to summarise:

  1. For a continuous random variable \(Y\), the probability density function \(f(y)\) mimics the probability mass function (pmf) \(p(x)\) of a discrete random variable \(X\), with the following major difference:

    • \(p(x)\) is equal to the probability assigned to a particular value (a value at a point) of \(X\), i.e., \[P(X = x) = p(x)\]

    • \(f(y)\) is approximately equal to the probability the random variable \(Y\) assumes a value within a very small interval around the point \(y\) (a value within \(I_y\)), i.e., \[ P(Y \in I_y) \approx f(y) \times \mbox{length} \hspace{0.5ex} \mbox{of} \hspace{0.5ex} I_y.\]

4.1.2 Cumulative distribution function (cdf)

The definition of the cumulative distribution function \(F(y)\) is the same as for the discrete case: \[F(y) = P(Y \le y).\] In other words, this is the probability that the random variable \(Y\) assumes a value less than or equal to \(y\), i.e., that \(Y\) lies in the interval \((-\infty, y]\). Therefore, using the previous idea of a continuous random variable belonging to a given interval, we get

\[F(y) = P(Y \le y) = \int_{-\infty}^{y} f(x) dx,\] which we illustrate below with the pdf and the cdf of the \(N(0,4)\) distribution (mean 0, standard deviation 2).

x <- seq(-4, 4, length = 100)
hx <- dnorm(x, 0, 2)   # pdf values of N(0,4), sd = 2
px <- pnorm(x, 0, 2)   # cdf values

plot(x, hx, type = "l", lty = 1, col = "red", xlim = c(-4, 4), ylim = c(0, 1),
     xlab = "x value", ylab = "probability func", main = "Normal pdf(red) and cdf(blue)")
lines(x, px, type = "l", lty = 1, col = "blue")

4.2 Expectation and Variance of a continuous random variable (optional)

4.2.1 Expected value of \(Y\)

The expected value (or mean) of \(Y\), with possible values \(a \le y \le b\), is denoted by \(E(Y)\) and

\[E(Y) = \int_{a}^{b} y f(y) dy .\]

The expected value of a continuous random variable \(Y\) can also be seen as a probability-weighted average of all of its possible values \(y\), \(a \le y \le b\). In other words, each possible value \(y\) is multiplied by its approximate probability of occurring, \(f(y)\times dy\), where \(dy\) is the length of a small interval \(I_y\) around \(y\), and the resulting products are summed (represented by the integral) to produce the expected value.
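As a sketch of this definition in R (using the \(U(1,3)\) distribution introduced in the next section, whose density is dunif):

```r
# E(Y) for Y ~ U(1,3): integrate y * f(y) over [1, 3]
integrate(function(y) y * dunif(y, 1, 3), 1, 3)$value   # the midpoint, 2
```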

4.2.2 Variance of \(Y\)

The variance of a continuous random variable is a measure of the spread for a distribution of a random variable. It determines the degree to which the values of a random variable differ from the expected value. The square root of the variance is equal to the standard deviation.

Taking the mean as the ‘center’ of a random variable’s probability distribution, the variance is a measure of how much the probability mass is spread out around this center. So,

\[Var(Y) = E((Y − EY)^2),\]

and the standard deviation \(\sigma\) of \(Y\) is defined by \[ \sigma = \sqrt{Var(Y)}.\] If the relevant random variable is clear from context, then the variance and standard deviation are often denoted by \(\sigma^2\) and \(\sigma\) (‘sigma’), just as the mean is \(\mu\) (‘mu’). \[Var(Y) = E((Y − \mu)^2) = \int_{a}^{b} (y − \mu)^2 f(y)dy.\]

In words, the formula for \(Var(Y)\) says to take a weighted average of the squared distance to the mean. By squaring, we make sure we are averaging only non-negative values, so that the spread to the right of the mean won’t cancel that to the left. By using expectation, we are weighting high probability values more than low probability values.
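The same numerical-integration idea applies to the variance; here is a sketch for \(Y \sim U(1,3)\), whose mean is \(\mu = 2\):

```r
# Var(Y) = E((Y - mu)^2) for Y ~ U(1,3), with mu = 2
mu <- 2
integrate(function(y) (y - mu)^2 * dunif(y, 1, 3), 1, 3)$value   # = 1/3
```

This matches the closed-form value \((b-a)^2/12 = 4/12\) given in the next section.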

4.3 Continuous Uniform distribution \(U(a,b)\)

The simplest continuous distribution is the uniform continuous distribution, also known as the “equally likely outcomes” distribution.

The pdf of \(U(a,b)\)

A random variable \(Y\) with pdf \(f(y)\) given by:

\[\begin{align*} \mbox{} \hspace{2ex} f(y) &= \left\lbrace \begin{array}{lll} & \frac{1}{b-a} \hspace{1ex}&\mbox{if}\hspace{1ex} a \le y \le b \\ & 0 \hspace{1ex}&\mbox{otherwise,} \\ \end{array} \right. \end{align*}\]

where \(a \le b\) are constants, is said to be uniformly distributed \(U(a,b)\), i.e., \(Y \sim U(a,b)\).

Example: The density of \(U(1,3)\)

curve(dunif(x, 1, 3), 0, 5, ylim = c(0,0.7), xlab ="y", ylab ="f(y)", main="PDF for U(1,3)")

The cdf of \(U(a,b)\)

\[\begin{align*} \mbox{} \hspace{2ex} F(y) &= \left\lbrace \begin{array}{lll} & 0 \hspace{1ex}&\mbox{if}\hspace{1ex} y < a \\ & \frac{y-a}{b-a} \hspace{1ex}&\mbox{if}\hspace{1ex} a \le y < b \\ & 1 \hspace{1ex}&\mbox{if}\hspace{1ex} y \ge b \\ \end{array} \right. \end{align*}\]
curve(punif(x, 1, 3), 0, 5, ylim = c(0,1.1), xlab ="y", ylab ="F(y)", main=" CDF for  U(1,3)")

The expectation of \(Y \sim U(a,b)\) is

\[E(Y) = \frac{a+b}{2}\]

and the variance is

\[ V(Y) = \frac{(b-a)^2}{12}.\]
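A quick simulation check of these two formulas for \(U(1,3)\) (the sample size \(10^5\) is an arbitrary choice):

```r
# Check E(Y) = (1+3)/2 = 2 and V(Y) = (3-1)^2/12 = 1/3 by simulation
set.seed(1)                 # for reproducibility
y <- runif(100000, 1, 3)
mean(y)                     # close to 2
var(y)                      # close to 1/3
```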

Simulating an observation from \(U(a,b)\) for \(a = 1\) and \(b = 3\).

# pdf
#    dunif(x, min = 0, max = 1, log = FALSE)
# cumulative distribution (cdf)
#    punif(q, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)
# quantiles
#    qunif(p, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)
# random generation:
#    runif(n, min = 0, max = 1)


# pdf - the value of f(1.5)
dunif(1.5, min=1, max=3)
## [1] 0.5
# cumulative distribution F(1.5) at 1.5
punif(1.5, min=1, max=3)
## [1] 0.25
# quantiles 50% (the median)
qunif(0.5, min=1, max=3) 
## [1] 2
# one random observation
runif(1,1,3)
## [1] 1.948901

4.4 The exponential distribution \(Y \sim exp(\lambda)\)

Usually the exponential distribution is used to model waiting times. We will use it in modeling and simulating queues.

The pdf of \(exp(\lambda)\)

A random variable \(Y\) with pdf \(f(y)\) given by:

\[\begin{align*} \mbox{} \hspace{2ex} f(y) &= \left\lbrace \begin{array}{lll} & \lambda e^{-\lambda y} \hspace{1ex}&\mbox{if}\hspace{1ex} y \ge 0\\ & 0 \hspace{1ex}&\mbox{otherwise,} \\ \end{array} \right. \end{align*}\]

where \(\lambda > 0\) is a constant called the parameter of the exponential distribution, is said to be exponentially distributed, \(Y \sim exp(\lambda)\).

Example: The density of \(exp(2)\)

curve(dexp(x, 2), 0, 2.5, ylim = c(0,2.1), xlab ="y", ylab ="f(y)", main="PDF for exp(2)")

The cdf of \(exp(\lambda)\)

\[\begin{align*} \mbox{} \hspace{2ex} F(y) &= \left\lbrace \begin{array}{lll} & 0 \hspace{1ex}&\mbox{if}\hspace{1ex} y < 0 \\ & 1 - e^{-\lambda y} \hspace{1ex}&\mbox{if}\hspace{1ex} y \ge 0 \\ \end{array} \right. \end{align*}\]
curve(pexp(x, 2), 0, 2.5, ylim = c(0,1.1), xlab ="y", ylab ="F(y)", main=" CDF for  exp(2)")

The expectation of \(Y \sim exp(\lambda)\) is

\[E(Y) = \frac{1}{\lambda}\]

and the variance is

\[ V(Y) = \frac{1}{\lambda^2}.\]

Simulating an observation from \(exp(2)\).

# pdf
#    dexp(x, rate = 1, log = FALSE)
# cumulative distribution (cdf)
#    pexp(q, rate = 1, lower.tail = TRUE, log.p = FALSE)
# quantiles
#    qexp(p, rate = 1, lower.tail = TRUE, log.p = FALSE)
# random generation:
#    rexp(n, rate = 1)


# pdf - the value of f(1.5) 
dexp(1.5, 2)
## [1] 0.09957414
# cumulative distribution F(1.5) at 1.5
pexp(1.5, 2)
## [1] 0.9502129
# quantiles 50% (the median)
qexp(0.5, 2) 
## [1] 0.3465736
# one random observation
rexp(1,2)
## [1] 1.462556

Special property of the exponential distribution

The exponential distribution is the only continuous distribution that has the so called “memoryless” (“forgetfulness”) property. It means the following:

If \(Y \sim exp(\lambda),\) then

\[P(Y > s + t | Y > t) = P(Y > s), \] which says that if, for example, we think of \(Y\) as our waiting time at a bus stop until the arrival of the next bus, it does not matter how long we have already been waiting: the remaining waiting time \(Y_r\) until the bus arrives has the same distribution, \(Y_r \sim exp(\lambda).\)
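We can check the memoryless property directly from the cdf in R; the values \(\lambda = 2\), \(s = 0.4\), and \(t = 1.3\) below are arbitrary choices:

```r
# Memoryless property: P(Y > s + t | Y > t) = P(Y > s) for Y ~ exp(lam)
lam <- 2; s <- 0.4; t <- 1.3
lhs <- pexp(s + t, lam, lower.tail = FALSE) / pexp(t, lam, lower.tail = FALSE)
rhs <- pexp(s, lam, lower.tail = FALSE)
c(lhs, rhs)                 # equal up to rounding: both are exp(-lam * s)
```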

Erlang Distribution (based on the exponential distribution)

Definition A random variable \(X\) is said to have an Erlang distribution with parameters \((k, \lambda)\) if it can be presented as a sum of \(k\) iid \(exp(\lambda)\) random variables, i.e., \(X \sim erlang(k, \lambda).\) Therefore

\[E(X) = \frac{k}{\lambda} \hspace{5ex} \mbox{and} \hspace{5ex} V(X) = \frac{k}{\lambda^2}.\]

Next we simulate from \(erlang(k, \lambda)\) distribution.

erlang <- function(k, lam){
  components <- rexp(k, lam)   # k iid exp(lam) observations
  sum(components)
}

erlang(3, 2)
## [1] 1.511003

Check the mean and the variance of \(erlang(k, \lambda)\) distribution.

erlang_mean <- function(k, lam, n){
  mean(replicate(n, erlang(k, lam)))   # average of n simulated Erlang values
}

erlang_mean(3,2,1000)
## [1] 1.492368

Exercise: Modify the code to get the variance of \(erlang(k, \lambda)\).

4.5 The normal distribution \(Y \sim N(\mu, \sigma^2)\)

The normal distribution is the most widely used continuous distribution, largely because of the central limit theorem. Roughly, the central limit theorem states that the distribution of the sum (or average) of a large number of independent, identically distributed random variables will be approximately normal, regardless of the underlying distribution. The importance of the central limit theorem is hard to overstate; indeed, it is the reason that many statistical procedures work.

The pdf of \(N(\mu, \sigma^2)\)

A random variable \(Y\) with pdf \(f(y\;|\;\mu,\sigma^2)\) given by:

\[{\displaystyle f(y\;|\;\mu ,\sigma ^{2})={\frac {1}{\sqrt {2\pi \sigma ^{2}}}}\;e^{-{\frac {(y-\mu )^{2}}{2\sigma ^{2}}}}}\]

where \(-\infty < \mu < \infty\) and \(\sigma^2 > 0\) are constants called the parameters of the normal distribution, is said to be normally distributed, \(Y \sim N(\mu, \sigma^2)\).

Example: The density of \(N(0,1)\)

curve(dnorm(x, 0, 1), -3, 3, ylim = c(0,0.5), xlab ="y", ylab ="f(y)", main="PDF for N(0,1)")

The cdf of \(N(\mu, \sigma^2)\)

The cdf of a \(N(\mu, \sigma^2)\)-distributed random variable has no closed-form expression, because the integral of its pdf cannot be written in terms of elementary functions. So we need computers (in the past, tables with tabulated values of \(N(0,1)\)) to compute required values of the cumulative distribution function \(F(y\;|\;\mu,\sigma^2).\)

curve(pnorm(x, 0, 1), -3, 4, ylim = c(0,1.1), xlab ="y", ylab ="F(y)", main=" CDF for  N(0,1)")

The expectation of \(Y \sim N(\mu, \sigma^2)\) is

\[E(Y) = \mu\]

and the variance is

\[ V(Y) = \sigma^2.\]
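A useful consequence of these parameters is the 68–95–99.7 rule: for any normal distribution, roughly 68% of the probability lies within one standard deviation of the mean, 95% within two, and 99.7% within three. The cdf confirms this for \(N(0,1)\):

```r
# Probability of being within 1, 2, and 3 standard deviations of the mean
pnorm(1:3) - pnorm(-(1:3))   # approximately 0.683, 0.954, 0.997
```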

Simulating an observation from \(Y \sim N(\mu, \sigma^2)\) for \(\mu = 0\) and \(\sigma^2 = 1\).

# pdf
#    dnorm(x, mean = 0, sd = 1, log = FALSE)
# cumulative distribution (cdf)
#    pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
# quantiles
#    qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
# random generation:
#    rnorm(n, mean = 0, sd = 1)

# pdf - the value of f(1.5) 
dnorm(1.5, 0, 1)
## [1] 0.1295176
# cumulative distribution F(1.5) at 1.5
pnorm(1.5, 0, 1)
## [1] 0.9331928
# quantiles 50% (the median)
qnorm(0.5, 0, 1) 
## [1] 0
# one random observation
rnorm(1, 0, 1)
## [1] -0.9425147

Chi-squared distribution with k degrees of freedom (based on the normal distribution)

Definition A random variable \(X\) is said to have a chi-squared distribution with k degrees of freedom if it can be presented as a sum of the squares of k iid \(N(0,1)\) random variables, i.e., \(X \sim \chi _{k}^{2}.\) Therefore

\[E(X) = k \hspace{5ex} \mbox{and} \hspace{5ex} V(X) = 2k.\]

Next we simulate from \(\chi_k^2\) distribution.

chi <- function(k){
  components <- rnorm(k)^2   # squares of k iid N(0,1) observations
  sum(components)
}

chi(3)
## [1] 1.103959

Check the mean and the variance of \(\chi_k^2\) distribution.

chi_mean <- function(k, n){
  mean(replicate(n, chi(k)))   # average of n simulated chi-squared values
}

chi_mean(3,1000)
## [1] 2.889024

Exercise: Modify the code to get the variance of \(\chi_k^2.\)

Other continuous distributions

Gamma distribution \(Gamma(\alpha, \beta)\)

dgamma(x, shape, rate = 1, scale = 1/rate, log = FALSE)

pgamma(q, shape, rate = 1, scale = 1/rate, lower.tail = TRUE, log.p = FALSE)

qgamma(p, shape, rate = 1, scale = 1/rate, lower.tail = TRUE, log.p = FALSE)

rgamma(n, shape, rate = 1, scale = 1/rate)

If scale is omitted, it assumes the default value 1/rate, and rate defaults to 1. The shape and scale parameters must be positive. The Gamma distribution with parameters shape = \(\alpha\) and scale = \(\beta\) has

\[E(Y) = \alpha \beta \hspace{1ex} \mbox{and} \hspace{1ex} V(Y) = \alpha\beta^2.\]
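Since the functions accept either rate or scale (with scale = 1/rate), care is needed to match the formulas above, which use the scale parameterization. A quick simulation sketch (the values \(\alpha = 2.7\) and \(\beta = 0.2\) are arbitrary choices):

```r
# Check E(Y) = alpha*beta and V(Y) = alpha*beta^2 by simulation
alpha <- 2.7; beta <- 0.2
set.seed(1)                  # for reproducibility
y <- rgamma(100000, shape = alpha, scale = beta)
c(mean(y), alpha * beta)     # both close to 0.54
c(var(y), alpha * beta^2)    # both close to 0.108
```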

pgamma(1.8, 2.7, rate=5.0, lower.tail = TRUE,
       log.p = FALSE)
## [1] 0.9959725
pgamma(0.8, 2.7, rate=5.0, lower.tail = TRUE,
       log.p = FALSE)
## [1] 0.8128771
curve(pgamma(x, 2.7,rate=5), 0, 2, ylim = c(0,1.1), xlab ="y", ylab ="F(2.7,5)", main=" CDF for  Gamma(2.7,5) ")

Weibull Distribution \(Weibull(\alpha, \beta)\)

dweibull(x, shape, scale = 1, log = FALSE)

pweibull(q, shape, scale = 1, lower.tail = TRUE, log.p = FALSE)

qweibull(p, shape, scale = 1, lower.tail = TRUE, log.p = FALSE)

rweibull(n, shape, scale = 1)

As usual dweibull gives the density, pweibull gives the distribution function, qweibull gives the quantile function, and rweibull generates random observations.

The Weibull distribution with shape parameter \(\alpha\) and scale parameter \(\beta\) has

\[E(X) = \beta\, \Gamma(1 + 1/\alpha) \hspace{1ex} \mbox{and} \hspace{1ex} V(X) = \beta^2 \left(\Gamma(1 + 2/\alpha) - (\Gamma(1 + 1/\alpha))^2\right),\]

where the R function gamma(x) returns the value of the gamma function \(\Gamma(x)\) above, e.g.,

gamma(5)
## [1] 24
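As a sketch, for shape \(\alpha = 2\) and scale \(\beta = 1\) the mean formula gives \(\beta\,\Gamma(1.5) \approx 0.886\), which a simulation confirms:

```r
# Check E(X) = beta * Gamma(1 + 1/alpha) for Weibull(shape = 2, scale = 1)
alpha <- 2; beta <- 1
beta * gamma(1 + 1/alpha)                              # about 0.8862
set.seed(1)                                            # for reproducibility
mean(rweibull(100000, shape = alpha, scale = beta))    # close to the same value
```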