Introduction to R


  • Use scripts for analyses rather than typing commands directly into the console. A script keeps an accurate record of what you did, which is important for reproducible research.
  • You can store data as objects using the assignment operator <-
  • Installing packages from CRAN is done with the install.packages() function
  • You can view the help page of a function by typing a ? before the name (e.g. ?sum)
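
The points above can be sketched in a few lines of R. The object names and values (weights, total) are made up for illustration, and the install and help commands are commented out so the snippet runs without a network connection or an interactive session.

```r
# Store values as objects with the assignment operator <-
weights <- c(2.5, 3.1, 4.0)   # hypothetical example values
total <- sum(weights)         # the result of a function call can be stored too
print(total)

# Install a package from CRAN (run once; needs an internet connection)
# install.packages("ggplot2")

# Open the help page for a function by putting ? before its name
# ?sum
```

Saving these lines in a script file (for example one ending in .R) rather than typing them into the console keeps the record that the first point refers to.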

Exploring the Data


  • The read_csv() and read_tsv() functions from the readr package can be used to load CSV and TSV files into R
  • The head() and tail() functions can print the first and last parts of an object and the dim() function prints the dimensions of an object
  • Subsetting can be done with the $ operator and a column name, or with square brackets [ ]
  • The str() and summary() functions give an overview or summary of the data
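
As a minimal sketch of these functions working together, the example below writes a small CSV file to a temporary location and reads it back. It assumes the readr package is installed; the file contents and the column names (sample, condition, count) are invented for illustration.

```r
library(readr)  # assumes the readr package is installed

# Write a small hypothetical CSV file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("sample,condition,count",
             "s1,control,10",
             "s2,control,12",
             "s3,treated,30",
             "s4,treated,28"),
           csv_path)

# Load the file into R as a data frame (a tibble)
counts <- read_csv(csv_path)

# Look at the first and last rows, and at the dimensions (rows, columns)
head(counts, 2)
tail(counts, 2)
dim(counts)

# Subset a single column with $ and a column name
counts$condition

# Subset with square brackets: rows 1 to 2, columns "sample" and "count"
counts[1:2, c("sample", "count")]

# Get an overview of the structure and a per-column summary
str(counts)
summary(counts)
```

read_tsv() is called the same way for tab-separated files.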

Formatting the Data


  • The pivot_longer() function can convert data frames from wide to long format. The columns to pivot can be selected in multiple ways, for example by name or by excluding the columns to keep
  • The full_join() function can merge two data frames. You can specify which columns to join on by using the join_by() function.
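
A short sketch of reshaping and then joining is shown below. It assumes the tidyr and dplyr packages are installed (join_by() needs dplyr 1.1.0 or later). The data frames and their column names (gene, sample1, sample2, sample_id, condition) are hypothetical.

```r
library(tidyr)  # pivot_longer()
library(dplyr)  # full_join() and join_by(); assumes dplyr >= 1.1.0

# Hypothetical wide table: one row per gene, one column per sample
wide <- data.frame(gene    = c("geneA", "geneB"),
                   sample1 = c(5, 8),
                   sample2 = c(7, 2))

# Convert to long format: one row per gene/sample combination
long <- pivot_longer(wide,
                     cols      = c(sample1, sample2),
                     names_to  = "sample",
                     values_to = "count")

# The same columns could also be selected with cols = -gene
# or cols = starts_with("sample")

# Hypothetical sample annotation, with a differently named key column
sample_info <- data.frame(sample_id = c("sample1", "sample2", "sample3"),
                          condition = c("control", "treated", "treated"))

# Merge the two tables, matching long$sample to sample_info$sample_id
merged <- full_join(long, sample_info, by = join_by(sample == sample_id))
print(merged)
```

Because full_join() keeps rows from both tables, sample3 appears in the result even though it has no counts; its gene and count values are NA.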

Introduction to ggplot2


  • A ggplot has 3 components: data (dataset), mapping (columns to plot) and geom (type of plot). Different types of plots include geom_point(), geom_jitter(), geom_line(), geom_boxplot(), geom_violin().
  • facet_wrap() can be used to make subplots of the data
  • The aesthetics of a ggplot can be modified, such as colouring by different columns in the dataset
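
The three components can be seen in a small sketch using the iris dataset that ships with R. It assumes the ggplot2 package is installed; the choice of columns and geoms is only an illustration.

```r
library(ggplot2)  # assumes the ggplot2 package is installed

# data = iris, mapping = which columns go on x, y and colour,
# geom = boxplots with the individual points jittered on top
p <- ggplot(data = iris,
            mapping = aes(x = Species, y = Sepal.Length, colour = Species)) +
  geom_boxplot() +
  geom_jitter(width = 0.2)

# A scatter plot split into one subplot per species with facet_wrap()
p_facets <- ggplot(data = iris,
                   mapping = aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  facet_wrap(~ Species)

# print() draws the plots; inside a script this call makes the drawing explicit
print(p)
print(p_facets)
```

Swapping geom_boxplot() for geom_violin(), or geom_point() for geom_line(), changes the type of plot while the data and mapping stay the same.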

Extra ggplot2 Customisation


  • In ggplot2, you can specify colours manually using the scale_colour_manual() function or use a predefined palette using the scale_colour_brewer() function
  • The labs() function allows you to set a plot title and change axis labels
  • Complete themes such as theme_bw() and theme_classic() can be used to change the appearance of a plot
  • Using the theme() function allows you to tweak components of a theme
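
The customisations above can be combined on one plot, as in the sketch below. It assumes ggplot2 is installed and uses the built-in iris dataset; the specific colours, labels and theme tweaks are arbitrary choices for illustration.

```r
library(ggplot2)  # assumes the ggplot2 package is installed

base_plot <- ggplot(iris,
                    aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point()

# Manual colours, custom labels, a complete theme, then small theme tweaks
p_manual <- base_plot +
  scale_colour_manual(values = c(setosa     = "darkorange",
                                 versicolor = "purple",
                                 virginica  = "steelblue")) +
  labs(title = "Iris sepal measurements",
       x = "Sepal length (cm)",
       y = "Sepal width (cm)") +
  theme_bw() +
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold"))

# Alternatively, use a predefined ColorBrewer palette
p_brewer <- base_plot +
  scale_colour_brewer(palette = "Set1") +
  theme_classic()

print(p_manual)
print(p_brewer)
```

The call to theme() is placed after theme_bw() because a complete theme added later would override the earlier tweaks.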

Wrapping Up


  • You can use the pdf() function to open a PDF file, draw your plots into it, and finalise the file by calling dev.off()
  • The sessionInfo() function prints information about your R environment, which is useful for reproducibility
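
A minimal sketch of saving a plot and recording the session is shown below. The output file name and the plot dimensions are hypothetical, and a base R plot is used so that the example needs no extra packages.

```r
# Hypothetical output path inside the session's temporary directory
out_file <- file.path(tempdir(), "example_plot.pdf")

# Open a PDF device; everything drawn from here goes into the file
pdf(out_file, width = 6, height = 4)

plot(iris$Sepal.Length, iris$Sepal.Width,
     xlab = "Sepal length (cm)", ylab = "Sepal width (cm)")

# Close the device to finish writing the file
dev.off()

file.exists(out_file)

# Record the R version and loaded packages for reproducibility
sessionInfo()
```

When saving a ggplot this way from inside a script or loop, wrap the plot object in print() between pdf() and dev.off() so that it is actually drawn.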