R and RStudio

R and RStudio are free programs, which are straightforward to install on Mac, Windows and Linux machines.

R is a computing language which is very easy to use, and it has become the standard language for statisticians and for increasing numbers of applied scientists. Python is also growing in popularity, and some courses at Victoria use it. However, once you’ve learned one of these languages, you’ll find it easy to learn another.

There is a community of R developers (who build and extend R’s underlying functionality) as well as R users. One nice property of R, and of this community, is that users can write their own software, bundle it up, and share it as a package. This means that we don’t all have to learn how to solve every problem - we can use solutions that other people have found and implemented.

In this course we’ll be using many packages which have been written by others. These are add-ons that we’ll have to load into R at various times, depending on the task in hand.

R has its own simple programming interface. If you install R and open it up you’ll get a window with subwindows for typing commands (the console), for editing programs (the script editor), and for graphics output.

RStudio is a program that wraps around R, and makes it even easier to use by providing a programming interface with more functionality. It’s how we’re going to work with R in this course.

You can run RStudio and R on the ITS machines in the Cybercommons rooms at Victoria, and on the ITS machines in the library. They are also installed on machines in the School of Mathematics and Statistics, and the School of Engineering and Computer Science.

If you want, you can install R and RStudio on your own device, and do everything in the course on your own machine - to do this follow the instructions below.

Installation of R and RStudio

If you’re installing on your own machine, then go to the R and RStudio websites and get the latest version of each for your operating system.

Starting with RStudio

Make sure when you open RStudio that you have a cheat sheet next to you summarising the commands. Here’s one.

Find RStudio on the Start menu (in Windows) or wherever else it appears after it was installed on your machine.

When you open RStudio for the first time you’ll find the screen divided into three panes.

When we start working with script files later, the left hand side of the page will split into the script editing area at top left, and the console will shrink down to bottom left.

Typing at the console

The most important window by far is the console on the left side. This is where R works.

Click in this window and hit the Enter/Return key a few times. You’ll see the prompt “>” appear each time you do. In this window you type your command next to the prompt, then hit Enter.

Try this: type the number 4 at the prompt and hit Enter.

>
>
> 4

R responds with

[1] 4

So that’s the first thing to know about R: it prints out whatever you type in. Or to be more exact, it evaluates what you type in and prints the result. To calculate the answer to 2+2:

> 2+2
[1] 4

Or something slightly more complex: \(\frac{4}{20+3} + \frac{2.2}{34}\):

4/(20+3) + 2.2/34
## [1] 0.2386189

We have the regular operations of arithmetic: addition +, subtraction -, multiplication * and division /. To raise a number to a power go 2^3 to get \(2^3\). In addition there are mathematical functions such as sqrt(4) for \(\sqrt{4}\). Other common functions are log(), exp(), sin() and cos().

Make use of brackets (...) to ensure that expressions are evaluated correctly.
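As a small illustration of why the brackets matter, here are a few expressions you could try at the console (the numbers are made up for this example). R follows the usual order of operations: powers are done before multiplication and division, which are done before addition and subtraction.

```r
2 + 3 * 4        # multiplication is done first: 2 + 12 = 14
(2 + 3) * 4      # the brackets are done first: 5 * 4 = 20
-2^2             # the power is applied before the minus sign: -4
(-2)^2           # bracketing the -2 gives 4
exp(log(10))     # log() is the natural logarithm, so exp() undoes it: 10
```

If you’re ever unsure which part of an expression R will evaluate first, adding brackets costs nothing and removes the doubt.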

Exercises

Questions

Evaluate the following:

  1. \(\sqrt{4}\)
  2. \(\frac{5.1}{3} + 6\times 5\)
  3. \(\frac{5.1}{3+6\times 5}\)
  4. \(\sqrt{\frac{4^2}{20} + \frac{2.2^2}{34}}\)

Answers

sqrt(4)
## [1] 2
5.1/3 + 6*5
## [1] 31.7
5.1/(3+6*5)
## [1] 0.1545455
sqrt(4^2/20 + 2.2^2/34)
## [1] 0.9707486

Comments

One exception to R evaluating whatever you type is a comment. If you type a hash symbol #, then R will ignore everything on the line from that point onwards. This allows us to put comments into code, to make it more interpretable.

3+7 # Here's some code to evaluate the sum of 3 and 7
## [1] 10

Comments are going to save you from yourself: when you write code you can write notes that remind you what you’re doing at each step.

Continuation lines

If you’re typing a long expression you may want to split it across several lines. R will let you do this, but you need to take care. As soon as R sees something it can evaluate, it will evaluate it - not knowing that you want to add more to your expression.

Let’s say you want to multiply the numbers 1, 2, 3, …, 10. You could type

1*2*3*4*5*6*7*8*9*10
## [1] 3628800

(There are way more efficient ways of doing this in R, but bear with me for this example. The easiest thing to do is to evaluate prod(1:10), or even more simply factorial(10).)

Compare what happens if I type Enter/Return after the * after 4:

1*2*3*4*
5*6*7*8*9*10

with what happens if I type Enter straight after the 4:

1*2*3*4
*5*6*7*8*9*10

In the first case R knows that I want to type more: the * sign that I typed just before hitting Enter indicates that there is more of the expression to come. So R prints the continuation prompt + at the beginning of the line to indicate that it’s waiting for more. (It’s always a + sign, meaning “add some more” - it has nothing to do with the kind of expression it’s expecting you to type.)

> 1*2*3*4*
+ 5*6*7*8*9*10
[1] 3628800

In the second case 1*2*3*4 is a perfectly fine expression, and R evaluates it. Then the next line *5*6*7*8*9*10 doesn’t make sense: I can’t start a mathematical expression with a multiplication sign. (You can of course start with a + or a -, but not a * or /). The consequence of this is that I get an error message. R complains that it can’t interpret what I’ve asked it to do.

> 1*2*3*4
[1] 24
> *5*6*7*8*9*10
Error: unexpected '*' in "*"

You’re going to see lots of error messages as you learn to program. Some are more helpful than others, but you’ll get better and better at working out what the problem is.

To avoid this particular kind of problem occurring, a sure-fire way to make sure R knows when your expression begins and ends is to bracket the whole thing:

> (1*2*3*4
+ *5*6*7*8*9*10)

Here it’s the initial unclosed bracket ( that tells R to wait until it sees the matching closing bracket right at the end of the expression. Doing this is going to save you hours of searching for errors one day.
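Bracketing matters even more when the second line starts with a + or a -, because then R gives no error message at all: it quietly evaluates two separate expressions. Here is a minimal sketch of that situation, using a made-up sum, written as it would appear in a file of code rather than with the console prompts:

```r
# Bracketed: R waits for the closing bracket, so this is one expression
(1 + 2 + 3
 + 4 + 5)        # 15

# Not bracketed: R evaluates each line on its own
1 + 2 + 3        # 6
+ 4 + 5          # a separate expression, 9
```

Neither line of the second version is wrong as far as R is concerned, so this kind of mistake can go unnoticed.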

By the way, if you’re in the middle of typing a long expression with continuation characters, and you decide that you want to abandon the expression and get back to the console prompt - hit Esc (the Escape key).

Getting Help

You can find out information about functions in R (such as sqrt()) by typing ? and the function name.

?sqrt

You’ll see a help file appear in the bottom right Help window. These help files have helpful examples at the end: code that you can immediately copy and execute to see how things work.

You can search all the help files, and also within any particular help file once you’ve opened it.
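The same things can be done by typing commands, which is handy if you’d rather stay at the console. This is a short sketch using functions that come with R; the search phrase "square root" is just an example.

```r
help("sqrt")                 # the same as typing ?sqrt
example(sqrt)                # run the examples from the end of the help file
help.search("square root")   # search all the help files for a phrase
```

The results of help.search() are a list of help pages that mention the phrase, which you can then open with ? in the usual way.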

Script files

We will want to keep a record of what we’ve done, and won’t want to keep retyping commands over and over again. So instead of typing directly at the console, we’ll write code into a file that we can save and keep for later, which we can edit and change, and to which we can add explanatory comments.

Files containing R code and comments are called script files.

To create a new script file go File >> New File >> R Script and the left hand pane will split into the script editor at top left, and console at bottom left.

You can type R code into this new file, but it won’t be noticed by R and executed until you push it into the Console window.

In a new file (which you’ll notice is called ‘Untitled1’ or similar) type a comment to indicate what is going to be in the file. It’s good practice too to write the date, just in case you’re looking for work you did on a particular day.

## Testing out some R code - 5 March 2018

Then under that on a new line type the R code for evaluating \(\sqrt{5}\):

sqrt(5)

To evaluate an R expression, put the cursor anywhere on the same line, and go Ctrl-R or Ctrl-Enter to run the expression. It gets copied into the console window and evaluated.

sqrt(5)
## [1] 2.236068

You can run several lines of code at once by highlighting them with the mouse, then going Ctrl-R. All the code will be submitted at once to the Console.

If you want to keep a record of what you’ve just calculated, you can copy and paste the results from the console back into your script file. If you do so, it’s good practice to insert a comment symbol # at the start of each line of output you copy back. That means that later on if you push that line of output into the console with Ctrl-R, R will just ignore it.

You can copy and paste by highlighting a block of text in the console, then going Ctrl-C to copy and Ctrl-V to paste it - just like in many programs. Alternatively use the Edit menu if you’d rather point and click.

If we do this then the file we’ve created will look like this:

## Testing out some R code - 5 March 2018

sqrt(5)
# [1] 2.236068

Note that it’s good practice to put blank lines here and there to space things out. A line of hash (comment) marks is a good way to break up blocks of work within a script file.

##########################################################
## Testing out some R code - 5 March 2018

##########################################################
# Working with square roots
sqrt(5)
# [1] 2.236068

You should save your script file regularly. Give it a sensible and descriptive name - in this case firststeps.R would be a good one. Always give your script files a .R extension - so that RStudio and R can recognise them as files containing R code.

Creating objects in R

We can obviously do a lot more with R than simply use it as a glorified calculator as we’ve done so far. The first thing we want to be able to do is to store the results of calculations in objects. The assignment operator <-, which is a leftward pointing arrow, is the way to do this.

To create an object x with the value 4 type

x <- 4

R doesn’t respond at the console when you do this, but if you look at top right in the Environment window you’ll see x has appeared next to its value 4.

Typing x at the console makes R show the value of x

x
## [1] 4

We can make a copy of x and store it as y by

y <- x

and we can delete x by removing it with the rm() function:

rm(x)

Note that you can achieve the same thing as x <- 4 by typing x = 4. However, the = sign is used at other times in R in a slightly different way, so it’s strongly advisable to use <-.
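Once a result is stored in an object, it can be used in later calculations. The sketch below uses made-up object names (radius and area) to show this, along with ls(), which lists the objects in the workspace. The last line shows one of the other uses of the = sign: naming the arguments of a function, which does not create any objects.

```r
radius <- 3                      # a made-up value for illustration
area <- pi * radius^2            # pi is built in to R
area                             # 28.27433

ls()                             # lists the objects currently in the workspace
rm(radius)                       # remove radius; area keeps its value
exists("radius")                 # FALSE

round(x = 3.14159, digits = 2)   # = names the arguments here; the result is 3.14
```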

Exiting RStudio

When you exit RStudio, R asks you whether you want to save your workspace. The workspace is the set of all the objects in the Environment window, and if you do save your workspace then those objects will all be there the next time you open R.

There are three ways to exit:

  1. Choose File >> Quit Session... from the menu
  2. Go Ctrl+Q (Hold the control key and the q key together)
  3. Type q() in the console prompt - asking R to execute its quit function. (Note that this function q(), unlike a function like sqrt(4), doesn’t need any information from us to execute, so the brackets are empty.)

Working with files with R

Whenever we open RStudio, it always starts in one of the folders on your computer. R calls folders directories, and so will we. This particular directory, the working directory, is the place that R creates, saves and looks for script files, and also where it saves other files (such as the workspace). It is the first place it looks when you ask it to load data files (more about that later).

You can find out what the current working directory is by typing

getwd()
## [1] "C:/Users/richard/Documents/usr/courses/scie201/2018/simplepages"

that’s “get working directory”.

The Files tab in the bottom right pane of RStudio shows the files that are in the current working directory.

RStudio saves the workspace to a file called .RData and the history of commands that you’ve typed at the console is stored in .Rhistory.

When working with R we often want to keep sets of files separate from one another. So it’s good practice to make a separate folder for each piece of work you’re doing. So you might have a folder for each lab or assignment in this course - and you can maintain a separate workspace for each one. Researchers keep their various projects separate from one another in this way.

You can make RStudio look at a particular folder, or create a new one, by going Session >> Set Working Directory >> Choose Directory... in the menu, or using the keyboard shortcut Ctrl-Shift-H. Use the New Folder option to create a new folder.
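The same steps can also be done with commands. This is a minimal sketch: the folder name lab1 is made up, and it is created inside R’s temporary directory here only so that the example doesn’t leave anything behind on your machine. In practice you would give the path to a folder of your own.

```r
old <- getwd()                            # remember where we started

labdir <- file.path(tempdir(), "lab1")    # hypothetical folder for one lab
dir.create(labdir, showWarnings = FALSE)  # create the folder
setwd(labdir)                             # make it the working directory
getwd()                                   # confirm the change
list.files()                              # files in the working directory (none yet)

setwd(old)                                # go back to where we started
```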

R Packages

The functionality of R is broken up into pieces called packages. The most important of these is called base which contains the key components of the R language and what it does.

The base package is already installed on your system (it’s there on your computer), and it is already loaded when you open RStudio (its functionality is available to use). To see which packages are loaded on your system, go

(.packages())
## [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"  
## [7] "base"

(All the brackets in this expression are required.)

The stats package contains important functions for statistics, graphics is for making plots etc.

To see which packages are installed on your system take a look at the Packages window at bottom right, or go

installed.packages()

This is likely to be a very long list.
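Because the list is so long, it’s often easier to ask about one package at a time. Here is a small sketch: installed.packages() returns a table with one row per package, and the row names are the package names. The object name pkgs is made up for this example.

```r
pkgs <- rownames(installed.packages())   # just the names of the installed packages
length(pkgs)                             # how many packages are installed
"stats" %in% pkgs                        # TRUE: stats comes with R
"maps" %in% pkgs                         # TRUE or FALSE, depending on your system
```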

Loading an installed package

The MASS package provides additional statistical functions beyond those present in the stats package. This package is always installed, but isn’t loaded by default. That means we have to load it ourselves before we can use it.

Try viewing the data set Animals from the MASS package by typing

Animals

You’ll get the response

Error: object 'Animals' not found

But if we load the MASS library

library(MASS)

and then try typing Animals again, we see a small dataset of body weights (in kg) and brain weights (in g) for 28 species of animal.
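Once the package is loaded, a few built-in functions give a quick look at a data set without printing all of it. A short sketch, assuming MASS is installed as described above:

```r
library(MASS)      # load the package (does no harm if it is already loaded)

head(Animals)      # the first six rows
nrow(Animals)      # the number of rows: 28, one per species
names(Animals)     # the column names: "body" and "brain"
?Animals           # the help file describing the data set
```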

Installing a package you don’t already have on your system

The members of the community of R users are writing R code all the time, and they share it as packages through the CRAN website. R has a direct interface to this website, and if you know of a package that someone else has written, you can download and install it from CRAN.

Say you want to use the maps package. You try to load it

library(maps)

but get the response

Error in library(maps) : there is no package called ‘maps’

which means it’s not installed.

So you go

install.packages("maps")

Note that the double quote marks are needed here. If all is well you’ll get messages from R saying that the package has been successfully downloaded, unpacked, and checked. R will tell you where on your system it’s storing the package.

If you’re on a University computer the package will be downloaded and installed only on the machine you’re sitting at. If you move to another machine you may need to install it again.

To use the maps package we still need to load it:

library(maps)

and then we can draw a simple map of New Zealand, marking Wellington:

map("nz")
points(174.7762, -41.2865, pch=22, col="black", bg="red", cex=1.8)
text(174.7762, -41.2865, labels="Wellington", pos=4, xpd=TRUE)

The map appears in the Plots window at lower right.

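The data from the Gapminder website is in the gapminder package. The code below plots life expectancy against GDP per capita for 1952 and for 2007, with one circle per country, sized by population and coloured by continent: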

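## Install the package (only needed once), then load it
install.packages("gapminder")
library(gapminder)
## One colour for each of the five continents
colvec <- c("green","blue","red","black","purple")
## Draw one plot for each of the two years
for(year in c(1952,2007)) {
   ## Keep only the rows for this year
   dsub <- gapminder[gapminder$year==year,]
   ## Draw the largest countries first so they don't hide the smaller ones
   dsub <- dsub[order(dsub$pop,decreasing=TRUE),]
   ## Circle size grows with the square root of the population
   plot(dsub$gdpPercap, dsub$lifeExp, pch=21,
        cex=pmax(0.5,sqrt(dsub$pop)/10000),
        col="black", bg=colvec[factor(dsub$continent)],
        xlab="GDP per Capita", ylab="Life expectancy", main=year)
   legend("bottomright", pch=16, col=colvec,
         legend=levels(factor(dsub$continent)))
}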

Kuwait is the outlier in 1952.

Writing an assignment or report

If you’re writing a document in which you are including R code, output, tables and graphs you need to copy material out of RStudio.

Options are:

  • Copy and paste into Word - this is probably the easiest: most students will want to do this.
  • Use a document preparation system like LaTeX - a bit more work
  • Use Rmarkdown - the package that was used to create these lecture notes. It’s a little bit limited, but very easy to integrate with R

Some notes for people using Word

Including R Code

Make sure you

  • Eliminate spaces between lines of code (choose Home > No Spacing)
  • Change to Courier New font
  • Indent everything rightwards to distinguish it from other text

This will make your code much more readable.

## Load required packages
library(httr)
library(XML)

## A user defined function
mynrow <- function(x) ifelse(is.null(nrow(x)),NA,nrow(x))

## Get the flight arrivals
url <- "https://www.wellingtonairport.co.nz/flights"
doc <- htmlParse(rawToChar(GET(url)$content))
tabs <- readHTMLTable(doc)
n.rows <- sapply(tabs, mynrow)
atab <- tabs[[which.max(n.rows)]]
atab

Including graphics

Once you’ve created a graph in the RStudio Plots window:

  • From the Plots window choose “Export”
  • Choose the PNG image format, give the image a sensible name, and save it somewhere you can find it again. It’s good practice to keep your images and graphs all together in a subdirectory called “images” or something similar - that will keep your workspace tidy.
  • In Word you can then insert the plot into your document “Insert > Pictures > [then choose the file]”

Inserting tables

There are two ways to do this. To export an object called fred, either find it in the Environment window and left click it, or alternatively type

View(fred)

Then, in the window that opens, select the cells you want using the mouse from bottom right to top left, right click and choose “copy”, then paste them into your document.

However a more robust way, especially if you want to avoid a lot of manual formatting, is to save the output to a .csv file, open it in Excel, and then copy and paste the cells you want into a Word table from there.

write.csv(fred, file="fred.csv")
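To try this out without any data of your own, here is a minimal sketch with a made-up data frame standing in for fred (the species and weights are invented for the example). It writes the file to R’s temporary directory and reads it back in to check the contents. In your own work you would normally just give a file name, as above, so that the file lands in your working directory.

```r
## A small made-up data frame, standing in for fred
fred <- data.frame(species = c("Cat", "Dog", "Sheep"),
                   weight_kg = c(4.2, 18.5, 55.0))

## Write it out; row.names = FALSE leaves out the column of row numbers
outfile <- file.path(tempdir(), "fred.csv")
write.csv(fred, file = outfile, row.names = FALSE)

## Read the file back in to check what was written
check <- read.csv(outfile)
check
```

Leaving out the row numbers with row.names = FALSE usually gives a tidier table once the file is opened in Excel.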