Chapter 13 Variance Estimation in Complex Sample Designs
\(\DeclareMathOperator*{\argmin}{argmin}\) \(\newcommand{\var}{\mathrm{Var}}\) \(\newcommand{\bfa}[2]{{\rm\bf #1}[#2]}\) \(\newcommand{\rma}[2]{{\rm #1}[#2]}\) \(\newcommand{\estm}{\widehat}\)
In this course we look at a number of straightforward sample designs (e.g. SRSWOR, stratified SRSWOR, and 1- and 2-stage cluster sampling). These designs have well-defined estimators and variance formulae. For example, in stratified SRSWOR (STSRS) the estimates of a total and of its variance are given by: \[\begin{eqnarray*} \widehat{Y} &=& \sum_h N_h \frac{1}{n_h}\sum_{k\in s_h}y_{hk}\\ \bfa{\widehat{Var}}{\widehat{Y}} &=& \sum_h N_h^2\left(1-\frac{n_h}{N_h}\right)\frac{s_h^2}{n_h} \end{eqnarray*}\] These formulae assume that the sample weight of each unit within a stratum is the same, namely \(\frac{N_h}{n_h}\). However, where a post-stratification or nonresponse weighting-class adjustment is made, the weights will differ between units after the adjustment. In such circumstances the analytical formulae become at best approximations to the actual variances, or they may fail altogether.
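As a quick illustration of these formulae (not part of the course data), the following Python sketch computes the STSRS total and its estimated variance for a small made-up two-stratum sample:

```python
import numpy as np

# Made-up two-stratum example: population sizes N_h and sampled y-values per stratum.
strata = {
    "A": {"N": 300, "y": np.array([2.0, 4.0, 3.0])},
    "B": {"N": 200, "y": np.array([7.0, 9.0])},
}

Y_hat = 0.0    # estimated population total
V_hat = 0.0    # estimated variance of the total
for h in strata.values():
    N_h, y_h = h["N"], h["y"]
    n_h = len(y_h)
    Y_hat += N_h * y_h.mean()                                     # N_h * stratum sample mean
    V_hat += N_h**2 * (1 - n_h / N_h) * y_h.var(ddof=1) / n_h     # includes the fpc

print(Y_hat, V_hat)   # total and its estimated variance
```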
Taylor Series Linearisation seeks analytical approximations to variance formulae, and this is the approach used when calculating expressions for the variance of the ratio estimator.
A feasible alternative is to use numerical resampling techniques to estimate the variance. There are various ways to do this, the most common being the jackknife and bootstrap methods. These methods, as well as the Random Group method and Balanced Repeated Replication (BRR), share the property that \(R\) subsets or replicates of the sample are taken. For each subset a separate estimate \(\widehat{T}_{(r)}\) is calculated, and the variability of these estimates is used to estimate the variance of the estimator \(\widehat{T}\) without the need for analytical formulae.
Example
We’ll take the following data as an example: \(n=5\) observations of \(y_k\) the number of times each person exercises per week in a simple random sample from a population of \(N=500\) people. The sample weight of each person is \(N/n=100\).
Unit | Sex | \(y_k\) | Weight, \(w_k\) | \(w_k^\ast\) |
---|---|---|---|---|
1 | Male | 2 | 100 | 83.33 |
2 | Male | 0 | 100 | 83.33 |
3 | Male | 3 | 100 | 83.33 |
4 | Female | 5 | 100 | 125.00 |
5 | Female | 9 | 100 | 125.00 |
Total | | | 500 | 500.00 |
Also assume that we know there are 250 males and 250 females in this population. The post-stratified version of the estimate requires us to adjust the weights so that they add up correctly to these population benchmarks. The adjusted weights are given above as \(w_k^\ast\).
We want to estimate \(\bar{Y}\), the mean number of times per week that people in this population exercise.
The theoretical result for SRSWOR (without poststratification) is: \[\begin{eqnarray*} \widehat{\bar{Y}} &=& \frac{\sum_k w_ky_k}{\sum_k w_k} = \bar{y} = 3.8\\ \bfa{\widehat{Var}}{\widehat{\bar{Y}}} &=& \left(1-\frac{n}{N}\right) \frac{s_y^2}{n}\\ &\simeq& \frac{s_y^2}{n} = \frac{11.7}{5} = 2.34 \end{eqnarray*}\] If we use the poststratified weights then the estimate is: \[\begin{eqnarray*} \widehat{\bar{Y}}_{\text{post}} &=& \frac{\sum_k w_k^\ast y_k}{\sum_k w_k^\ast} = 4.3 \end{eqnarray*}\] However, there is no exact variance formula for this estimate, since the poststrata (sex) do not correspond to selection strata (there were no selection strata in this case: it was just a SRSWOR of the whole population). Approximate analytic expressions do exist for SRSWOR, but for complex designs, especially those involving clustering, no formula is available. We will use numerical methods to estimate the variance in this simple example to demonstrate the procedure that can be applied in those complex cases.
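The calculations above can be reproduced directly. The Python sketch below assumes sex benchmarks of 250 males and 250 females (consistent with the \(w_k^\ast\) column) and computes \(\bar{y}\), the approximate SRSWOR variance, and the poststratified mean:

```python
import numpy as np

# The five sample observations from the example, each with design weight N/n = 100.
y   = np.array([2, 0, 3, 5, 9], dtype=float)
sex = np.array(["M", "M", "M", "F", "F"])
w   = np.full(5, 100.0)
N, n = 500, 5

# SRSWOR estimates of the mean and its variance
ybar = np.sum(w * y) / np.sum(w)                 # 3.8
var_ybar = (1 - n / N) * y.var(ddof=1) / n       # ~2.32 (about s_y^2/n = 2.34 without the fpc)

# Post-stratified weights: rescale within each sex to the assumed benchmarks.
bench = {"M": 250.0, "F": 250.0}
w_star = w.copy()
for g, total in bench.items():
    mask = sex == g
    w_star[mask] *= total / w[mask].sum()

ybar_post = np.sum(w_star * y) / np.sum(w_star)  # ~4.33
print(ybar, var_ybar, ybar_post)
```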
13.1 Jackknife
When applied to SRSWOR the Jackknife method requires us to drop out a unit from the sample and recalculate the estimate from the reduced sample (adjusting the weights as necessary to match population benchmarks). Since there are \(n\) sample members there are \(n\) possible Jackknife replicate samples.
Ignoring the weights for the moment, there are 5 possible replicate samples in our example:

Replicate, \(r\) | Remaining \(y_k\) values | | | | \(\bar{y}_{(r)}\) |
---|---|---|---|---|---|
1 | 0 | 3 | 5 | 9 | 4.25 |
2 | 2 | 3 | 5 | 9 | 4.75 |
3 | 2 | 0 | 5 | 9 | 4.00 |
4 | 2 | 0 | 3 | 9 | 3.50 |
5 | 2 | 0 | 3 | 5 | 2.50 |
Each replicate mean \(\bar{y}_{(r)}\) is an estimate of the population mean \(\bar{Y}\). We can calculate the overall mean of these replicate means: \[ \widetilde{\bar{Y}}_{\text{JK}} = \frac{1}{n}\sum_{r=1}^n \widehat{\bar{Y}}_{(r)} = \frac{1}{n}\sum_{r=1}^n \bar{y}_{(r)} = 3.8 \] In this simple case \(\widetilde{\bar{Y}}\) is identical to \(\widehat{\bar{Y}}=\bar{y}\).
Our variance estimate comes from the variability of the \(n\) replicate means: \[ \bfa{\widehat{Var}}{\widehat{\bar{Y}}_{\text{JK}}} = \frac{n-1}{n}\sum_{r=1}^n \left(\widehat{\bar{Y}}_{(r)}-\widetilde{\bar{Y}} \right)^2 = \frac{n-1}{n}\sum_{r=1}^n \left(\bar{y}_{(r)}-\widetilde{\bar{Y}} \right)^2 = 2.34 = (1.53)^2 \] (Note that this is not just the variance of the replicate means, but rather a scaled up version of it: there is an extra factor of \(n-1\).) Again this is identical to the theoretical result in this simple case.
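A short Python sketch of the delete-one jackknife for this sample, reproducing the replicate means, their average of 3.8, and the variance of 2.34:

```python
import numpy as np

y = np.array([2, 0, 3, 5, 9], dtype=float)
n = len(y)

# Replicate r omits unit r and takes the mean of the remaining four observations.
rep_means = np.array([np.delete(y, r).mean() for r in range(n)])  # 4.25, 4.75, 4.00, 3.50, 2.50

ytilde = rep_means.mean()                                 # 3.8, equal to the full-sample mean here
var_jk = (n - 1) / n * np.sum((rep_means - ytilde) ** 2)  # 2.34
print(rep_means, ytilde, var_jk)
```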
However we see a difference when we poststratify each of the replicates. Dropping out a unit is equivalent to setting its weight to zero, and we adjust the weights within the poststrata (sex) so that the weights correctly add up to the benchmarks. Each replicate sample thus has its own set of replicate weights.
Unit | Sex | \(y_k\) | \(w_k\) | \(w_k^\ast\) | \(w_k^{(1)}\) | \(w_k^{(2)}\) | \(w_k^{(3)}\) | \(w_k^{(4)}\) | \(w_k^{(5)}\) |
---|---|---|---|---|---|---|---|---|---|
1 | Male | 2 | 100 | 83.33 | 0.00 | 125.00 | 125.00 | 83.33 | 83.33 |
2 | Male | 0 | 100 | 83.33 | 125.00 | 0.00 | 125.00 | 83.33 | 83.33 |
3 | Male | 3 | 100 | 83.33 | 125.00 | 125.00 | 0.00 | 83.33 | 83.33 |
4 | Female | 5 | 100 | 125.00 | 125.00 | 125.00 | 125.00 | 0.00 | 250.00 |
5 | Female | 9 | 100 | 125.00 | 125.00 | 125.00 | 125.00 | 250.00 | 0.00 |
Estimates: \(\widehat{\bar{Y}}_{(r)}\) | | | | | 4.25 | 4.75 | 4.00 | 5.33 | 3.33 |
We compute each replicate estimate \(\widehat{\bar{Y}}_{(r)}\) using its own weights: \[\begin{eqnarray*} \widehat{\bar{Y}}_{(r)} &=& \frac{\sum_k w_k^{(r)} y_k}{\sum_k w_k^{(r)}} \end{eqnarray*}\] and then calculate the overall mean of these replicate means: \[ \widetilde{\bar{Y}}_{\text{JK,post}} = \frac{1}{n}\sum_{r=1}^n \widehat{\bar{Y}}_{(r)} = 4.3 \] with a variance estimate that comes from the variability of the \(n\) replicate means: \[ \bfa{\widehat{Var}}{\widehat{\bar{Y}}_{\text{JK,post}}} = \frac{n-1}{n}\sum_{r=1}^n \left(\widehat{\bar{Y}}_{(r)}-\widetilde{\bar{Y}}_{\text{JK,post}} \right)^2 = 1.83 = (1.35)^2 \] The variance is reduced here because males and females (in this population) differ in how frequently they exercise. Post-stratification controls for and removes this source of variability in the final estimate.
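The same calculation can be sketched in Python with replicate weights; the benchmarks of 250 males and 250 females are assumed as above, and each replicate zeroes one weight and re-poststratifies before computing the weighted mean:

```python
import numpy as np

y     = np.array([2, 0, 3, 5, 9], dtype=float)
sex   = np.array(["M", "M", "M", "F", "F"])
w     = np.full(5, 100.0)
bench = {"M": 250.0, "F": 250.0}   # assumed sex benchmarks, as above
n = len(y)

def poststratify(weights):
    """Rescale the weights within each sex so they sum to the benchmarks."""
    out = weights.copy()
    for g, total in bench.items():
        mask = sex == g
        out[mask] *= total / out[mask].sum()
    return out

# Replicate r: zero out unit r's weight, re-poststratify, then take the weighted mean.
rep_means = []
for r in range(n):
    w_r = w.copy()
    w_r[r] = 0.0
    w_r = poststratify(w_r)
    rep_means.append(np.sum(w_r * y) / np.sum(w_r))
rep_means = np.array(rep_means)                    # 4.25, 4.75, 4.00, 5.33, 3.33

ytilde = rep_means.mean()                          # ~4.33
var_jk_post = (n - 1) / n * np.sum((rep_means - ytilde) ** 2)   # ~1.83
print(rep_means, ytilde, var_jk_post)
```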
Note that the Jackknife variance formula lacks any finite population correction (i.e. terms like \((1-n/N)\)), and moreover it effectively treats the sample as having been drawn with replacement. These effects are unimportant when sampling from large populations.
In principle the Jackknife can be applied to any estimator at all – we simply treat each replicate as if it were a real sample, evaluate the estimator for that replicate, and then look at the variance of the resulting set of estimates.
The Jackknife can be used in complex sample designs. In cluster designs we drop out whole PSUs rather than individual units, and in stratified designs the Jackknife can be applied separately to each stratum.
The number of delete-one replicates equals the sample size, so the computational load grows with the sample size. For very large samples it is sufficient to take a sample of the replicates when calculating the variance. Also note that the Jackknife does not perform well for some statistics (in particular quantiles).
In addition to an estimate of the variance of an estimator, the Jackknife also provides an estimate of its bias: i.e. the rescaled difference between the whole sample estimate \(\widehat{T}\) and the mean of the replicates \(\widetilde{T}\): \[ \bfa{\widehat{Bias}}{\widehat{T}}_{\text{JK}} = (n-1)\left[\frac{1}{n}\sum_{r=1}^n\widehat{T}_{(r)} - \widehat{T}\right] = (n-1)\left[\widetilde{T}_{\text{JK}} - \widehat{T}\right] \]
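For a linear estimator such as the sample mean, the replicate average equals the whole-sample estimate, so the jackknife bias estimate is exactly zero; the sketch below only illustrates the mechanics of the formula:

```python
import numpy as np

y = np.array([2, 0, 3, 5, 9], dtype=float)
n = len(y)

T_hat  = y.mean()                                               # whole-sample estimate
T_reps = np.array([np.delete(y, r).mean() for r in range(n)])   # delete-one replicate estimates

bias_jk = (n - 1) * (T_reps.mean() - T_hat)   # 0.0 here, since the mean is a linear estimator
print(bias_jk)
```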
13.2 Bootstrap
The bootstrap differs from the Jackknife in that it treats the sample as a mini-population, and each replicate is drawn with replacement from the sample. As with the Jackknife, the method of resampling depends on the sample design.
In our example, where the original sample was a SRSWOR, we select each replicate by simple random sampling with replacement, of the same size \(n\) as the original sample. We draw \(R\) replicate samples, where \(R\) is a large number (usually hundreds or thousands), and recalculate the estimate each time.
For example, if we ignore the poststratification benchmarks we obtain the following samples, each with its estimate:
Replicate, \(r\) | Resampled \(y\) values | | | | | \(\bar{y}_{(r)}\) |
---|---|---|---|---|---|---|
1 | 2 | 5 | 2 | 2 | 2 | 2.6 |
2 | 9 | 2 | 2 | 3 | 5 | 4.2 |
3 | 3 | 9 | 9 | 5 | 2 | 5.6 |
4 | 0 | 3 | 9 | 3 | 5 | 4.0 |
5 | 3 | 9 | 5 | 0 | 3 | 4.0 |
6 | 5 | 9 | 5 | 5 | 2 | 5.2 |
7 | 0 | 3 | 9 | 9 | 3 | 4.8 |
8 | 3 | 0 | 3 | 0 | 2 | 1.6 |
9 | 0 | 2 | 9 | 2 | 2 | 3.0 |
10 | 0 | 2 | 5 | 2 | 2 | 2.2 |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
Note that a unit can appear in the same replicate sample multiple times. This is intentional.
The overall bootstrap estimate after \(R=1000\) replicates is \[ \widetilde{\bar{Y}}_{\text{BS}} = \frac{1}{R}\sum_{r=1}^R \widehat{\bar{Y}}_{(r)} = \frac{1}{R}\sum_{r=1}^R \bar{y}_{(r)} = 3.8 \] with variance \[ \bfa{\widehat{Var}}{\widehat{\bar{Y}}_{\text{BS}}} = \frac{1}{R-1}\sum_{r=1}^R \left(\widehat{\bar{Y}}_{(r)}-\widetilde{\bar{Y}}_{\text{BS}} \right)^2 = 1.92 \] The bootstrap also provides a bias estimate \[ \bfa{\widehat{Bias}}{\widehat{T}}_{\text{BS}} = \frac{1}{R}\sum_{r=1}^R\widehat{T}_{(r)} - \widehat{T} = \widetilde{T}_{\text{BS}} - \widehat{T} \]
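A Python sketch of the procedure (the seed is arbitrary, so the variance and bias values will differ slightly from run to run and from the 1.92 quoted above):

```python
import numpy as np

rng = np.random.default_rng(12345)   # arbitrary seed, for reproducibility only
y = np.array([2, 0, 3, 5, 9], dtype=float)
n, R = len(y), 1000

# Each replicate is a with-replacement resample of size n from the observed sample.
boot_means = np.array([rng.choice(y, size=n, replace=True).mean() for _ in range(R)])

ytilde_bs = boot_means.mean()                                  # ~3.8
var_bs    = np.sum((boot_means - ytilde_bs) ** 2) / (R - 1)    # ~1.9, varies with the draws
bias_bs   = ytilde_bs - y.mean()                               # ~0
print(ytilde_bs, var_bs, bias_bs)
```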
Poststratifying each replicate gives an estimate of the mean of 4.3 and a bootstrap variance of 0.73. (Note, though, that for such a small sample this variance is a poor estimate, since there are many bootstrap replicate samples in which no males, or no females, are selected. These have to be deleted from the calculation – since we can’t poststratify them – and this artificially reduces the variance.)
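A sketch of the poststratified bootstrap just described, in which replicates containing only one sex are discarded because they cannot be poststratified (the 250/250 benchmarks are assumed as before, and the results vary with the random draws):

```python
import numpy as np

rng = np.random.default_rng(12345)   # arbitrary seed
y     = np.array([2, 0, 3, 5, 9], dtype=float)
sex   = np.array(["M", "M", "M", "F", "F"])
bench = {"M": 250.0, "F": 250.0}     # assumed sex benchmarks, as above
n, R = len(y), 1000

post_means = []
for _ in range(R):
    idx = rng.choice(n, size=n, replace=True)
    y_r, sex_r = y[idx], sex[idx]
    if not all((sex_r == g).any() for g in bench):
        continue    # no males or no females drawn: cannot poststratify, so drop this replicate
    w_r = np.zeros(n)
    for g, total in bench.items():
        mask = sex_r == g
        w_r[mask] = total / mask.sum()   # equal weights within a sex, summing to its benchmark
    post_means.append(np.sum(w_r * y_r) / np.sum(w_r))

post_means = np.array(post_means)
var_bs_post = np.sum((post_means - post_means.mean()) ** 2) / (len(post_means) - 1)
print(post_means.mean(), var_bs_post)   # roughly 4.3 and 0.7, varying with the draws
```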
The bootstrap can be more computationally intensive than the Jackknife, but it is even more widely applicable.
13.3 Complex Designs
The jackknife and bootstrap resampling always operate at the first stage of selection. So if we have a cluster sample, the jackknife drops out whole PSUs, but we do not then go on to drop out individual units from those clusters – even if the sample design selects a subsample from each selected cluster. In a stratified design, we would ideally run the jackknife separately in each stratum – i.e. we would drop out a single cluster from one stratum at a time, leaving all the other strata complete.
However, in practice a single cluster may be dropped from each stratum simultaneously when constructing each set of replicate weights.
Once a cluster is deleted, the remaining clusters are reweighted so that the weights add up to the correct population totals, or to the benchmarks if poststratification is being done.
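As a sketch of the delete-a-PSU jackknife for a hypothetical unstratified one-stage cluster sample (all data, weights, and benchmarks below are made up; a stratified design would instead drop one PSU per stratum and use a stratum-specific scale factor such as \((m_h-1)/m_h\)):

```python
import numpy as np

# Hypothetical one-stage cluster sample: y-values grouped by sampled PSU,
# with a single common design weight per person for simplicity.
clusters = {
    "psu1": np.array([2.0, 4.0]),
    "psu2": np.array([0.0, 3.0, 5.0]),
    "psu3": np.array([9.0, 7.0]),
}
w_unit  = 50.0     # assumed design weight per person
N_bench = 500.0    # assumed population benchmark

rep_means = []
for dropped in clusters:
    # Delete the whole PSU, then rescale the remaining weights to the benchmark.
    y_r = np.concatenate([v for k, v in clusters.items() if k != dropped])
    w_r = np.full(len(y_r), w_unit)
    w_r *= N_bench / w_r.sum()
    rep_means.append(np.sum(w_r * y_r) / np.sum(w_r))

rep_means = np.array(rep_means)
m = len(clusters)   # number of sampled PSUs
var_jk = (m - 1) / m * np.sum((rep_means - rep_means.mean()) ** 2)
print(rep_means, var_jk)
```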
It is common for unit record datasets to be supplied with a set of 100 or more replicate weight columns, which are to be used to calculate the variances of all estimates. These are a subset of the full set of possible replicate weights, but a sufficiently large one to give good variance estimates. They are calculated by the data provider, who may be unwilling to release the information about the sample frame that a practitioner would need to calculate such weights themselves. Moreover, releasing a single set of replicate weights means that all users will calculate the same estimates from the dataset.
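A sketch of how such a file might be used (the file name, column names, and the \((R-1)/R\) scale factor are all hypothetical; the correct scale factor depends on how the provider constructed the replicates and should be taken from their documentation):

```python
import numpy as np
import pandas as pd

# Hypothetical unit-record file with a main weight column "weight", an analysis
# variable "y", and replicate weight columns named "repwt_1", "repwt_2", ...
df = pd.read_csv("survey_file.csv")
rep_cols = [c for c in df.columns if c.startswith("repwt_")]
R = len(rep_cols)

def wmean(weights):
    """Weighted mean of y under a given weight column."""
    return np.sum(weights * df["y"]) / np.sum(weights)

T_hat  = wmean(df["weight"])                         # full-sample estimate
T_reps = np.array([wmean(df[c]) for c in rep_cols])  # one estimate per replicate weight column

# Scale factor is provider-specific; (R - 1) / R is a common jackknife-type choice.
var_hat = (R - 1) / R * np.sum((T_reps - T_hat) ** 2)
print(T_hat, var_hat)
```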
Sets of replicate weights provided in this way are usually jackknife (rather than bootstrap) replicates, since a smaller number of jackknife replicates is usually required to achieve the same reliability.