Chapter 10 Excel
Excel is a powerful spreadsheet program which is useful for data management, summary and display. We’ll need it for a few operations that iNZight can’t do.
Here is a document from STAT193 which gives a good summary of some common operations in Excel.
10.1 Simple functions
=AVERAGE(
cell range)
compute the mean \[ \bar{y}=(1/n)\sum_{k=1}^n y_k \]=STDEV(
cell range)
- compute the standard deviation \[ s_y = \sqrt{\sum_{k=1}^n (y_k-\bar{y})^2/(n-1)} \]=RAND()
- generate a random number distributed uniformly between 0 and 1
The cell range in AVERAGE()
and STDEV()
specifies the data \(y_1,\ldots,y_n\) that the functions will work on.
The cells in rows 2 to 100 in column B
are specified as B2:B100
: e.g. =AVERAGE(B2:B100)
will give the mean of those values.
10.2 Sorting datasets
To sort an entire data set, assuming it has a header row with column/variable names:
- Select the Data tab, then click Sort
- The Sort dialog box will appear.
- Make sure
My data has headers
is ticked - Choose the variable you want to sort by first, and choose the sorting order (either ascending [A-Z] or descending [Z-A])
- Click
Add Level
to add a further variable sort by, and its order - You can add or delete levels as you wish
- click
OK
to run the sort
10.3 Selecting a SRSWOR
If you have a dataset of \(N\) rows and want to select an SRSWOR of \(n\) rows from it:
- Insert a new column, and give it a name (e.g.
rannum
) - Fill each cell in the column with random numbers using the
=RAND()
function. (To do this type=RAND()
in the first cell of the column, and then copy this into all of the cells in the column.) - Make the values permanent by selecting the whole column, going
Right-Click>>Copy
, thenRight-Click>>Paste Special
. SelectPaste Values
and paste into the same column. - Sort the whole spreadsheet by this new column (using
Data>>Sort
) - Select the first \(n\) rows of the sorted spreadsheet and copy into a new spreadsheet.
This results in a SRSRWOR of size \(n\) from the population.
10.4 Selecting a Stratified SRSWOR
To selected a stratified SRSWOR, using one or more stratification variables, carry out all of the above steps, but at the sorting step sort by the stratification variable(s) as well as the random number column.
Then copy the required number of rows from the top of each block of sorted rows, grouped by the stratification variable.
10.4.1 Example
The file titanic.csv contains a list of passengers from the Titanic. To select a stratified SRSWOR of 10 passengers from each of the passenger classes (stratification variable pclass
) open the spreadsheet, add the random number column, and then sort by pclass
and rannum
specified in that order. The spreadsheet will be ordered into blocks by passenger class. Take 10 rows from the top of the block where pclass
is 1st
, then scroll down and find the top of the second class passengers: take the top 10 of those, and finally take the top 10 of the block of third class passengers.