Chapter 2 Sample Surveys
A sample survey is an exercise in which data are collected from a sample (a subset) of a population. The data collected are used to create estimates of the characteristics or parameters of the population.
Some examples:
- A sample of voters is telephoned to ask their voting intentions, with the aim of predicting the outcome of the election;
- A sample of sites in a river catchment is tested for the presence of algae to determine the spread of the algae throughout the entire catchment;
- A sample of households is selected, and the residents asked about their employment status, in order to estimate the national unemployment rate.
A census is a special case of a sample survey in which every member of the population is surveyed.
2.1 The Survey Process
The different steps in the survey process are shown in Figure 2.1.

Figure 2.1: The Survey Process
- Every survey has a set of objectives – the particular population characteristics (parameters) which are of interest;
- These parameters are properties of the target population: the population about which estimates are to be made;
- Not all members of the target population can be identified: those that can be surveyed make up the survey population;
- The sample frame is a listing of all members of the survey population;
- The sample design is the method of selecting the sample from the frame. The sample size is usually decided as a compromise between the required accuracy of estimates and the survey costs and other constraints (e.g. the time available for the survey);
- The survey instrument is the means of data collection. It is usually a paper or electronic questionnaire, completed by the respondent or by the interviewer/observer. The instrument aims to measure those properties of the sample members that allow the desired population characteristics to be estimated. The instrument must be valid (it must actually measure what it intends to measure) and reliable (repeated measurements of the same sample member under identical circumstances should always yield similar results);
- The sample members are contacted and recruited into the sample. Not all selected members will be able to be contacted (even after strenuous efforts), and some will not respond even if contacted;
- The data are collected from the sample members in some mode (e.g. face-to-face, telephone, web, observational, …);
- The data collected are captured (stored on a computer); coded (converted into standard classification systems); edited (checks for data consistency); and stored in a final dataset;
- The data may be adjusted, e.g. imputation or weight adjustments for nonresponse may be done;
- Estimates of the parameters of interest are constructed, and other analysis of the data is carried out (e.g. regression estimation, comparison with other data, etc.) – a minimal sketch of this estimation step is given after this list;
- The results are summarised in a report;
- The original data may be archived in some appropriate form, or destroyed;
- A post-survey evaluation may be made to determine how well the survey met its goals.
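As a minimal illustration of the design and estimation steps above, the sketch below draws a simple random sample from a hypothetical frame and estimates a population proportion together with an approximate standard error. The frame, the roughly 4% prevalence and the sample size of 1,000 are invented for illustration and do not correspond to any real survey.

```python
import random

# Hypothetical frame: one True/False value per population member
# (e.g. True = unemployed). All numbers here are illustrative only.
random.seed(1)
frame = [random.random() < 0.04 for _ in range(100_000)]   # roughly 4% prevalence

n = 1_000                                  # chosen sample size
sample = random.sample(frame, n)           # simple random sample without replacement

p_hat = sum(sample) / n                    # estimate of the population proportion
se = (p_hat * (1 - p_hat) / n) ** 0.5      # approximate standard error (ignoring the fpc)

print(f"estimate = {p_hat:.3f}, approximate standard error = {se:.3f}")
```

Real surveys rarely use simple random sampling directly, but the same pattern – select according to the design, then estimate with an appropriate measure of uncertainty – carries over to more complex designs.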
For example: The Household Labour Force Survey (HLFS).
- One of the main objectives of the HLFS is to estimate the unemployment rate every quarter;
- The target population is the working age population of New Zealand;
- The survey population is the civilian, non-institutionalised usually resident population of adults aged 15+ living in permanent, private dwellings on the main islands of New Zealand (North Island, South Island and Waiheke Island);
- The sample frame is a list of dwellings created at the most recent census and regularly updated to reflect changes;
- The sample design is a stratified cluster design. Within each local government area a sample of small areas (PSUs) is selected, and a sample of households is selected from within each PSU. Every adult in the selected households is surveyed. In total, 15,000 households and 30,000 adults are surveyed every three months, in order to produce unemployment estimates accurate to within \(0.5\%\).
- The survey instrument is an electronic questionnaire.
- The interviewers contact the households in person or by telephone, making up to 10 call backs to ensure contact is made. Proxy responses are permitted (i.e. one household member can respond on behalf of another). The interview takes place by computer assisted personal or telephone interviewing (CAPI or CATI).
- The results are post-stratified to match the current population estimates in each local government area (a sketch of post-stratification weighting is given after this example);
- Estimates of the unemployment rate are published within about 6 weeks of the end of the quarter.
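The post-stratification step mentioned above can be sketched as a simple weight calculation: each respondent receives a weight equal to the known population count for their post-stratum divided by the number of respondents in that post-stratum, so that the weighted sample reproduces the known population totals. The age groups and counts below are hypothetical, not HLFS figures.

```python
# Post-stratification sketch with hypothetical counts (not HLFS figures).
# Each respondent receives a weight so that the weighted sample reproduces
# known population totals within each post-stratum (here, an age group).

population_counts = {"15-29": 900_000, "30-59": 1_800_000, "60+": 800_000}  # assumed known totals
sample_counts     = {"15-29": 250,     "30-59": 550,       "60+": 200}      # respondents per group

weights = {g: population_counts[g] / sample_counts[g] for g in population_counts}

# A respondent in the 15-29 group then "represents" weights["15-29"] people;
# weighted estimates multiply each respondent's value by their weight and sum.
for group, w in weights.items():
    print(f"{group}: weight = {w:,.0f}")
```

The HLFS post-stratifies to current population estimates in each local government area; the toy example above shows only the basic weighting idea.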
The following tables summarise the properties of three very different surveys (following a template by Groves et al., 2004).
2.1.1 Household Labour Force Survey
Title | Household Labour Force Survey (HLFS) |
Country | New Zealand |
Sponsor | Statistics New Zealand |
Collector | Statistics New Zealand |
Purpose | To produce, each quarter, a comprehensive range of statistics relating to the employed, the unemployed and those not in the labour force who comprise New Zealand’s working-age population. |
Year started | 1985 |
Target Population | The civilian non-institutionalised usually resident New Zealand population aged 15 and over |
Sample Frame | Dwellings enumerated at the previous census and grouped into areas |
Sample Design | Multistage, stratified clustered area probability sample of primary sampling units (PSUs); sample of dwellings within PSU drawn, all eligible adults within selected households |
Coverage | Excludes households on offshore islands |
Sample Size | 15,000 households and 30,000 adults |
Use of Interviewer | Interviewer administered |
Mode of Administration | Face-to-Face (first) and telephone (subsequent) interviews for each household and each person |
Computer Assistance | Computer assisted personal interview (CAPI) or telephone interview (CATI) |
Selection Unit | Household |
Reporting Unit | Household, person |
Time Dimension | Ongoing rotating panel survey of dwellings |
Frequency | Conducted quarterly |
Interviews per Round of Survey | Each household is surveyed every three months over two years (8 times in all) |
Levels of Observation | Household, person |
Response Rate | Usually 90% |
Web link | http://www.stats.govt.nz/ |
Source | http://www.stats.govt.nz/datasets/work-income/household-labour-force-survey.htm |
2.1.2 Quality of Life Survey
Title | Quality of Life Survey |
Country | New Zealand |
Sponsor | City Councils of NZ’s largest cities; Ministry of Social Development |
Collector | TNS |
Purpose | To provide information to decision-makers to improve the Quality of Life in major New Zealand urban areas |
Year started | 1999 |
Target Population | Residents of the largest New Zealand cities |
Sampling Frame | Electoral Roll |
Sample Design | People were selected from the electoral roll and their addresses matched to telephone numbers; interviewers rang those numbers and asked for the person with the next birthday. Quota sampling within electoral wards (quotas by age, sex, ethnicity) |
Coverage | Excludes households with no landline telephone |
Sample Size | 7720 achieved interviews |
Use of Interviewer | Interviewer administered |
Mode of Administration | Telephone interview |
Computer Assistance | Computer assisted telephone interview (CATI) |
Selection Unit | Household |
Reporting Unit | Person |
Time Dimension | Repeated cross-sectional survey (most recent in 2006) |
Frequency | Once |
Interviews per Round of Survey | One |
Levels of Observation | Person |
Response Rate | 22% |
Web link | http://www.bigcities.govt.nz/ |
Source | http://www.bigcities.govt.nz/ |
2.1.3 Survey of Hector’s Dolphins between Motunau and Timaru
Title | Survey of Hector’s Dolphins between Motunau and Timaru |
Country | New Zealand |
Sponsor | Department of Conservation |
Collector | Department of Conservation |
Purpose | To measure the abundance of Hector’s Dolphin (Cephalorhynchus hectori) between Motunau and Timaru in 1998. |
Year started | 1998 |
Target Population | Hector’s Dolphin between Motunau and Timaru |
Sampling Frame | Line transects taken within 4 nautical miles of the coast between Motunau and Timaru |
Sample Design | Line transects in four strata (Akaroa Harbour; other large bays on Banks Peninsula; Inshore zone (<4nm from shore); offshore zone (4-10nm)) |
Coverage | Excludes dolphins far offshore |
Sample Size | Transects 1 nm apart within harbours and bays; 2 nm apart in the Marine Mammal Sanctuary; 4 nm elsewhere. Four replicate surveys. |
Use of Interviewer | Left and Right Observers on 15m catamaran |
Mode of Administration | Observational, using seven-power binoculars |
Computer Assistance | A third observer enters the data into a palmtop computer as they are collected |
Selection Unit | Transects; Dolphin Groups |
Reporting Unit | Dolphin Groups |
Time Dimension | Two month observation period |
Frequency | Once (may be repeated in future) |
Interviews per Round of Survey | One |
Levels of Observation | Dolphin Groups |
Response Rate | Not applicable (observational survey) |
Web link | http://www.doc.govt.nz/ |
Source | Dawson et al. (2000) ‘Line-transect survey of Hector’s dolphin abundance between Motunau and Timaru’; DoC report. |
2.2 Survey Error
At the end of a sample survey analysis we will have an estimate \(\widehat{T}\) of a population parameter of interest \(T\). For example, \(T\) might be the unemployment rate in the December quarter, and we find from the HLFS the estimate \(\widehat{T}=3.8\%\).
We can be pretty sure that the true unemployment rate isn’t exactly 3.8%, but we expect it to be close to this. The difference between our survey estimate and the (unknown) truth is called the survey error: \[ \text{Survey Error} = \text{Estimate} - \text{Truth} = \widehat{T}-T \] The value of the error is unknown to us (because the truth \(T\) is unknown), but it is useful to think about what factors contribute to the error, and what effects those factors have. The points at which the different survey errors enter the survey process are shown in Figure 2.2.

Figure 2.2: Sources of Survey Error
The error sources in the diagram are usually divided into Sampling Errors and everything else, i.e. Non-sampling Errors. The reason for this division is that only sampling error can be properly quantified and allowed for using statistical theory. The other types of error are generally unquantifiable and must be controlled and minimised as far as possible. There are methods for reducing the effect of some of these errors, but most such adjustments rely on assumptions that are untestable.
2.2.1 Sampling Error
Sampling error is the error which results from collecting information from only a subset of the population, rather than from the whole population. Thus censuses have zero sampling error by definition. Sampling error is caused by the variability in the responses across the set of possible samples from the population.
The extent of the sampling error depends on many factors, including:
- Sample size: increasing the sample size reduces the sampling error, although there is a point beyond which little practical gain is made by further increasing the sample size (see the simulation sketch after this list).
- Variability of the characteristic of interest: the greater the variation in the population, the greater the sampling error.
- Sample design: designs which use known population characteristics may reduce the sampling error by targeting the sampling more efficiently.
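These effects of sample size and population variability can be seen in a small simulation: repeatedly draw samples of different sizes from a fixed artificial population and look at how much the sample mean varies from sample to sample. The population (normal with standard deviation 10), the sample sizes and the number of replications are all assumptions made purely for illustration.

```python
import random
import statistics

# Illustrative simulation (artificial population, not survey data): the spread
# of the sample mean across repeated samples shrinks as the sample size grows.
random.seed(2)
population = [random.gauss(50, 10) for _ in range(50_000)]   # assumed population

for n in (25, 100, 400, 1600):
    estimates = [statistics.mean(random.sample(population, n)) for _ in range(500)]
    spread = statistics.stdev(estimates)   # empirical standard error of the estimator
    print(f"n = {n:5d}: standard deviation of sample means = {spread:.2f}")
```

Quadrupling the sample size roughly halves the spread of the estimates, which is why very large samples bring diminishing returns; a more variable population would give a proportionally larger spread at every sample size.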
2.2.2 Non-Sampling Error
Non-sampling error includes all other sources of error. Almost every step in the survey process is a potential source of non-sampling error; the size of the error is often difficult or impossible to measure, and may be larger than the sampling error.
Non-sampling errors may be related to:
- Frame bias/coverage error: a sample frame which does not match the target population
- Non-random sample selection
- Non-response or false response
- Poor questionnaire design, leading questions, measurement error
- Interviewer error
- Data entry, processing, coding, editing errors
- Post-survey adjustment errors
- Model misspecification in analysis
- Incorrect treatment of data from a survey with a complex design
2.2.3 Examples
Invalid Instrument
Australian National Referendum question:
Do you approve the proposed law to alter the Constitution to establish the Commonwealth of Australia as a republic with the Queen and Governor-General being replaced by a President appointed by a two-thirds majority of the Members of the Commonwealth Parliament?
This is two questions in one – the question does not address support for a republic in Australia, but only support for a particular model. 45% said yes to this question.
Invalid Instrument
Referendum question at the 1999 New Zealand general election:
Should there be a reform of the justice system placing greater emphasis on the needs of victims, providing restitution and compensation for them and imposing minimum sentences and hard labour for all serious violent offences?
Almost 92% of the population answered yes. But what question were they answering? There are 5 questions here! In fact they were probably just answering the question ‘Are you worried about violent crime?’
Coverage Error and Non-response Bias
The Literary Digest magazine ran a postal poll of 10 million people selected from phone books and car registration lists before the 1936 US election. It received a response rate of 23% (2.3 million responses), and incorrectly predicted 55% support for Alf Landon (Rep.) over the incumbent F. D. Roosevelt (Dem.) (41%).
The actual result was 37% for Landon, and 61% for Roosevelt.
George Gallup polled 5,000 people, also by post, but balanced the demographics of his sample. He predicted 54% for Roosevelt, and also predicted that the Literary Digest would get the result wrong.
The Literary Digest made two mistakes:
- Coverage Error - the sampling frame (car owners and phone users) was more affluent than the general population (more likely to vote Republican);
- Non-response Bias/Non-random selection - people wanting change (i.e. a Republican victory) were more likely to respond.
The Literary Digest, whose polls had previously been highly successful, went bankrupt the next year.
In 1948 the Gallup organisation overconfidently stopped polling two weeks before the US presidential election, missed a dramatic late shift in public opinion, and called the election incorrectly, predicting that Harry S. Truman would be defeated.