Chapter 12 Excel

Excel is a powerful spreadsheet program which is useful for data management, summary and display. We’ll need it for a few operations that iNZight can’t do.

Here is a document from STAT193 which gives a good summary of some common operations in Excel.

12.1 Simple functions

  • =AVERAGE(cell range) compute the mean \[ \bar{y}=(1/n)\sum_{k=1}^n y_k \]
  • =STDEV(cell range) - compute the standard deviation \[ s_y = \sqrt{\sum_{k=1}^n (y_k-\bar{y})^2/(n-1)} \]
  • =RAND() - generate a random number distributed uniformly between 0 and 1

The cell range in AVERAGE() and STDEV() specifies the data \(y_1,\ldots,y_n\) that the functions will work on.

The cells in rows 2 to 100 in column B are specified as B2:B100: e.g. =AVERAGE(B2:B100) will give the mean of those values.

12.2 Sorting datasets

To sort an entire data set, assuming it has a header row with column/variable names:

  • Select the Data tab, then click Sort
  • The Sort dialog box will appear.
  • Make sure My data has headers is ticked
  • Choose the variable you want to sort by first, and choose the sorting order (either ascending [A-Z] or descending [Z-A])
  • Click Add Level to add a further variable sort by, and its order
  • You can add or delete levels as you wish
  • click OK to run the sort

12.3 Selecting a SRSWOR

If you have a dataset of \(N\) rows and want to select an SRSWOR of \(n\) rows from it:

  1. Insert a new column, and give it a name (e.g. rannum)
  2. Fill each cell in the column with random numbers using the =RAND() function. (To do this type =RAND() in the first cell of the column, and then copy this into all of the cells in the column.)
  3. Make the values permanent by selecting the whole column, going Right-Click>>Copy, then Right-Click>>Paste Special. Select Paste Values and paste into the same column.
  4. Sort the whole spreadsheet by this new column (using Data>>Sort)
  5. Select the first \(n\) rows of the sorted spreadsheet and copy into a new spreadsheet.

This results in a SRSRWOR of size \(n\) from the population.

12.4 Selecting a Stratified SRSWOR

To selected a stratified SRSWOR, using one or more stratification variables, carry out all of the above steps, but at the sorting step sort by the stratification variable(s) as well as the random number column.

Then copy the required number of rows from the top of each block of sorted rows, grouped by the stratification variable.

12.4.1 Example

The file titanic.csv contains a list of passengers from the Titanic. To select a stratified SRSWOR of 10 passengers from each of the passenger classes (stratification variable pclass) open the spreadsheet, add the random number column, and then sort by pclass and rannum specified in that order. The spreadsheet will be ordered into blocks by passenger class. Take 10 rows from the top of the block where pclass is 1st, then scroll down and find the top of the second class passengers: take the top 10 of those, and finally take the top 10 of the block of third class passengers.

Lumley, Thomas. 2010. Complex Surveys: A Guide to Analysis Using R. Hoboken, NJ: Wiley.