Introduction to R

Last updated on 2025-05-06

Estimated time: 40 minutes

Authors: Maria Doyle, Jessica Chung, Vicky Perreau, Kim-Anh Lê Cao, Saritha Kodikara, Eva Hamrud.

Overview

Questions

  • Why might you prefer to run code in a script rather than directly from the console?
  • What is the assignment operator and how can you use it to store objects?

Objectives

  • Introduce the R programming language and RStudio
  • Explain functions, objects, and packages
  • Demonstrate how to view help pages and documentation

R for Biologists course


R takes time to learn, like a spoken language. No one can expect to be an R expert after learning R for a few hours. This course has been designed to introduce biologists to R, showing some basics, and also some powerful things R can do (things that would be more difficult to do with Excel). The aim is to give beginners the confidence to continue learning R, so the focus here is on tidyverse and visualisation of biological data, as we believe this is a productive and engaging way to start learning R. After this short introduction you could use this book to dive a bit deeper.

Most R programmers do not remember all the commands we share in this document. R is continuously evolving, so they use Google extensively to look things up and pick up new tricks. Do not hesitate to do the same!

Intro to R and RStudio


RStudio is an interface that makes it easier to use R. There are four panels in RStudio. The screenshot below shows an analogy linking the different RStudio panels to cooking.

R script vs console


There are two ways to work in RStudio: in the console or in a script.

Let’s start by running a command in the console.

Your turn 1.1

Run the command below in the console.

R

1 + 1

Once you’ve typed the command into your console, just press Enter. The output should be printed in the console.

Alternatively, we can use an R script. An R script allows us to keep a record (recipe) of what we have done, whereas commands we type into the console are not saved. Keeping a record speeds up our analysis because we can re-use code, and it’s also helpful for remembering what we have done before!

Your turn 1.2

Create a script from the top menu in RStudio:
File > New File > R Script, then type the command below in your new script.

R

2 + 2

To run a command in a script, place the cursor on the line you want to run, and then either:

  • Click on the run button on top of the panel
  • Use Ctrl + Enter (Windows/Linux) or Cmd + Enter (macOS).

You can also highlight multiple lines and run them all at once - have a go!


Commenting


Comments are notes to ourselves or others about the commands in the script. They are especially useful when you share code with others. Comments start with a #, which tells R not to run them as commands.

R

# testing R
2 + 2

Keeping an accurate record of how you have manipulated your data is important for reproducible research. Writing detailed comments and documenting your work are useful reminders to your future self (and anyone else reading your scripts) on what your code does.

Commenting code is good practice. Why not try commenting the code you write in this session to get into the habit? It will also make your R script more informative when you come back to it in the future.
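
As a small illustrative sketch (the numbers are made up for this example), a comment can sit on its own line or follow code on the same line:

```r
# Add up the number of samples from two experiments
12 + 8  # everything after the # on this line is ignored by R
```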


Working directory


Opening an RStudio session launches it from a specific location. This is the ‘working directory’.

Understanding the Working Directory

The working directory is the folder where R reads and saves files on your computer. When working in R, you’ll often read data files and write outputs like analysis results or plots. Knowing where your working directory is set helps ensure R finds your files and saves outputs in the right place.

You can find out where your current working directory is set to by using two different approaches:

  • Run the command getwd(). This will print out the path to your working directory in the console. It will be in this format: /path/to/working/directory (Mac) or C:\path\to\working\directory (Windows), or
  • In the bottom-right panel, click the blue cog icon on the menu at the top, then click Go To Working Directory. This will show you the location and files in your working directory in the files window.
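
The first approach might look like this in practice. This is only a sketch: the path printed on your computer will differ, and list.files() is an extra base R function, not covered in the text above, that lists what is inside the working directory.

```r
# Print the path to the current working directory
getwd()

# List the files and folders inside the working directory
list.files()
```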

Your turn 1.3

Where is your working directory set to at the moment? Is this a useful place to have it set?

By default, the working directory is often your home directory. To keep data and scripts organised, it’s good practice to set your working directory to a specific folder.

Your turn 1.4

Create a folder for this course somewhere on your computer. Name the folder something meaningful, for example, intro_r_course or Introduction_to_R. Then, to set this folder as your working directory, you can do this in multiple ways, e.g.:

  • Click in the menu at the top on Session > Set Working Directory > Choose directory and choose your folder, or
  • In the bottom-right panel, navigate to the folder that you want to be your working directory. You can also do this by clicking on the three dots icon on the top-right of the panel. Then, once you’re in a suitable directory, click the blue cog icon in the menu bar of the bottom-right panel and select Set As Working Directory.

You will see that once you have set your working directory, the files inside your new folder will appear in the ‘Files’ window in RStudio.

Your turn 1.5

Save the script you created in the previous section as intro.R in this directory. You can do this by clicking on File > Save and the default location should be the current working directory (e.g. intro_r_course).

Multiple Ways to Achieve the Same Goal in R

You might have noticed by now that in R, there are often several ways to accomplish the same task. You might find one method more intuitive or easier to use than others — and that’s okay! Experiment, explore, and choose the approach that works best for you.

You might have noticed that when you set your working directory in the previous step, a line appeared in your console saying something like setwd("~/Desktop/intro_r_course"). As well as the point-and-click methods described above, you can also set your working directory using the setwd() command in the console or in a script.

Your turn 1.6

What might be an advantage of using the command line option (i.e. setwd()) over point-and-click methods to set your working directory?

There is no easy way to record what you point and click on (unless you write it all down!). Putting a command at the top of the script means you are less likely to forget where you have your working directory, and when you come back to it another day you can quickly re-run it.

Your turn 1.6 (continued)

Add a line at the top of your newly created script intro.R so that the working directory is set to your newly made folder (e.g. intro_r_course).
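
One possible way to write this line is sketched below. The path is only a placeholder based on the example shown earlier, so replace it with the location of your own folder. The dir.exists() check is an optional safeguard added for this sketch so that the code does not stop with an error if the folder is missing; a plain setwd() call on its own line works too.

```r
# Placeholder path: change this to wherever you created your course folder
course_dir <- "~/Desktop/intro_r_course"

# Only change the working directory if the folder actually exists
if (dir.exists(course_dir)) {
  setwd(course_dir)
}

# Check where the working directory is now
getwd()
```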

You can also use RStudio projects as described here to automatically keep track of and set the working directory.


Functions


In mathematics, a function defines a relation between inputs and output. In R (and other coding languages) it is the same. A function (also called a command) takes inputs called arguments inside parentheses, and outputs some results.

We have actually already used two functions in this workshop - getwd() and setwd(). getwd() does not take an input, but outputs your working directory. setwd() takes a path as its input, and sets it as your working directory.
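
Arguments can also be given by name. As a small sketch using round(), a base R function not otherwise covered in this section, the digits argument controls how many decimal places are kept:

```r
# Round to the nearest whole number (digits defaults to 0)
round(3.14159)

# Name the digits argument to keep two decimal places
round(3.14159, digits = 2)
```

The first call prints 3 and the second prints 3.14.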

Let’s take a look at some more functions below.

Your turn 1.7

Compare these two outputs. In the second line we use the function sum().

R

2 + 2
sum(2, 2)

Your turn 1.8

Try using the function below with different inputs. What does it do?

R

sqrt(9)
sqrt(81)

Tab completion

A very useful feature is Tab completion. You can start typing and use Tab to autocomplete code, for example, a function name.


Objects


It is useful to store data or results so that we can use them later on for other parts of the analysis. To do this, we can store data as objects. We can use the assignment operator <-, where the name of the object (also called a variable) is on the left side of the arrow, and the data you want to store is on the right side.

For example, the code below assigns the number 5 to the object x using the <- operator. You can print out what the x object is by just typing it into the console or running it in your script.

Your turn 1.9

Play around and create some objects. Then print out the objects using their names.

R

x <- 5
x

result_1 <- 2 + 2
result_1

Assignment operator shortcut

In RStudio, typing Alt + - (holding down Alt at the same time as the - key) will write <- in a single keystroke on Windows, while typing Option + - (holding down Option at the same time as the - key) does the same on a Mac.

Once you have assigned objects, you can perform manipulations on them using functions.

Your turn 1.10

Compare the two outputs.

R

sum(1, 2)

x <- 1
y <- 2
sum(x, y)

Remember, if you use the same object name multiple times, R will overwrite the previous object you had created.

Your turn 1.11

What is the value of x after running this code?

R

x <- 5
x <- 10

x is 10. The previous value of 5 has been overwritten.

Your turn 1.12

Can you write some code to calculate the sum of the square root of 9 and the square root of 16?

R

sum(sqrt(9), sqrt(16))

OUTPUT

[1] 7

So far we have looked at objects which are numbers. However, objects can also be made of characters; these are called strings. To create a string, you need to use quotation marks "".

R

my_string <- "Hello!"
my_string

OUTPUT

[1] "Hello!"

There is a whole host of different objects you can make in R, too many to cover in this session! Later on, when we do some data wrangling, we will work with objects which are dataframes (i.e. tables) and vectors (a series of values of the same type, such as numbers or strings). Let’s make a simple vector now to get familiar. To make a vector, you need to use the command c().

R

my_vector <- c(1, 2, 3)
my_vector

OUTPUT

[1] 1 2 3

R

my_new_vector <- c("Hello", "World")
my_new_vector

OUTPUT

[1] "Hello" "World"
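
Functions can take a whole vector as input, not just single numbers. The short sketch below reuses sum() from earlier and also uses length(), a base R function not introduced above, which counts the elements of a vector.

```r
my_vector <- c(1, 2, 3)

# Add up all the elements of the vector
sum(my_vector)     # 6

# Count how many elements the vector contains
length(my_vector)  # 3
```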

Your turn 1.13

Try making an object and setting it as 1:5. What does this object look like?

R

x <- 1:5
x

OUTPUT

[1] 1 2 3 4 5

1:5 creates a vector with a sequence of numbers from 1 to 5.

Once you have a vector, you can subset it. We will cover this further when we do some data wrangling, but let’s try a simple example here.

R

my_vector <- c("A", "B", "C")
# extract the first element from the vector
my_vector[1]

OUTPUT

[1] "A"

R

# extract the last element from the vector
my_vector[3]

OUTPUT

[1] "C"
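
You can also extract several elements at once by putting a vector of positions inside the square brackets. A minimal sketch, using the same vector as above:

```r
my_vector <- c("A", "B", "C")

# Extract the first and third elements
my_vector[c(1, 3)]

# Extract the first two elements using a sequence
my_vector[1:2]
```

The first line of output shows "A" and "C", and the second shows "A" and "B".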

Your turn 1.14

Create a vector from 1 to 10 and print the 9th element of the vector.

R

my_vector <- 1:10
my_vector[9]

OUTPUT

[1] 9

Packages


We have seen that functions are really useful tools which can be used to manipulate data. Although some basic functions, like sum() and setwd(), are available by default when you install R, some more exciting functions are not. There are thousands of R functions available for you to use, and functions are organised into groups called packages or libraries. An R package contains a collection of functions (usually ones that perform related tasks), as well as documentation to explain how to use the functions. Packages are made by R developers who wish to share their methods with others.

Once we have identified a package we want to use, we can install and load it so we can use it. Here we will use the tidyverse package, which includes lots of useful functions for managing data. We will use the package later in this session.

If it’s not already installed on your computer, you can use the install.packages() function to install a package. As well as functions and documentation, a package can include tests and example data.

R

install.packages("tidyverse")
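
Installing a package only needs to be done once, but the package has to be loaded with library() in each new R session before its functions can be used. The sketch below shows one way to do this. The requireNamespace() check and the tidyverse_installed name are additions for this example so that the code runs whether or not the package is installed; in most scripts you would simply write library(tidyverse).

```r
# Check whether tidyverse is installed, without loading it yet
tidyverse_installed <- requireNamespace("tidyverse", quietly = TRUE)

if (tidyverse_installed) {
  # Load the package so that its functions can be used in this session
  library(tidyverse)
} else {
  # Print a reminder instead of stopping with an error
  message('tidyverse is not installed yet; run install.packages("tidyverse") first')
}
```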

Packages on CRAN or Bioconductor


Packages are hosted in different locations. Packages hosted on CRAN (which stands for Comprehensive R Archive Network) are often generic packages for all sorts of data and analysis. Bioconductor is an ecosystem that hosts packages specifically dedicated to biological data.

The installation of packages from Bioconductor is a bit different, e.g. to install the mixOmics package we type:

R

# You don't need to run this codeblock for this workshop

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("mixOmics")

You don’t need to remember this command, as it is featured on the package’s Bioconductor page (see here for example).

One advantage of Bioconductor packages is that they are well documented and actively maintained, with a new Bioconductor release every six months.


Getting help


As described above, every R package includes documentation to explain how to use functions. For example, to find out what a function in R does, type a ? before the name and help information will appear in the Help panel on the right in RStudio.

Your turn 1.15

Find out what the sum() command does.

R

?sum
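
If you only remember part of a function name, or just want a quick reminder of a function's arguments, two other base R helpers can be useful. Neither is covered in the text above, so treat this as an optional sketch: apropos() lists objects whose names contain a piece of text, and args() shows the arguments a function accepts.

```r
# List objects whose names contain "sum" (for example sum, cumsum, rowSums)
matches <- apropos("sum")
matches

# Show the arguments that sum() accepts
args(sum)
```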

It is really important to scroll down to the examples on the help page to understand how the function can be used in practice. You can use the example() function to run them:

Your turn 1.16

Run some examples of the sum() command.

R

example(sum)

Packages also come with more comprehensive documentation called vignettes. These are really helpful to get you started with the package and identify which functions you might want to use.

Your turn 1.17

Have a look at the tidyverse package vignette.

R

browseVignettes("tidyverse")

Common R errors


R error messages are common and often cryptic. You most likely will encounter at least one error message during this tutorial. Some common reasons for errors are:

  • Case sensitivity. In R, as in other programming languages, case sensitivity is important. ?install.packages is different to ?Install.packages.
  • Missing commas
  • Mismatched or unclosed parentheses, brackets or quotation marks
  • Not quoting file paths ("")
  • When a command line is unfinished, the “+” in the console will indicate it is awaiting further instructions. Press ESC to cancel the command.
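
The case sensitivity point can be checked with a short sketch. The object name my_value is made up for this example, and exists() is a base R function, not covered above, that reports whether an object with a given name has been created.

```r
my_value <- 10

# The object exists under the exact name we gave it
exists("my_value")  # TRUE

# Changing the case of one letter refers to a different object that was never created
exists("My_value")  # FALSE
```

Typing My_value on its own in the console would therefore give an "object not found" error, even though my_value works.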

To see examples of some R error messages with explanations see here

More information for when you get stuck

As well as using package vignettes and documentation, Google and Stack Overflow are also useful resources for getting help.

Key Points

  • Use scripts for analyses over typing commands in the console. This allows you to keep an accurate record of what you did, which is important for reproducible research
  • You can store data as objects using the assignment operator <-
  • Installing packages from CRAN is done with the install.packages() function
  • You can view the help page of a function by typing a ? before the name (e.g. ?sum)