Introduction to R
- Use scripts for analyses rather than typing commands in the console. A script keeps an accurate record of what you did, which is important for reproducible research
- You can store data as objects using the assignment operator <-
- Installing packages from CRAN is done with the install.packages() function
- You can view the help page of a function by typing a ? before the name (e.g. ?sum)
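
The points above can be sketched in a few lines of a script. This is only an illustration: the object name total and the package name "tidyverse" are example choices, and the install and help lines are commented out so that the script runs without a network connection or an interactive session.

```r
# Store the result of a calculation in an object with the assignment operator
total <- sum(1, 2, 3)

# Typing the object's name prints its value in the console
total

# Install a package from CRAN (run once; "tidyverse" is only an example name)
# install.packages("tidyverse")

# Open the help page for sum() (most useful in an interactive session)
# ?sum
```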
Exploring the Data
- The read_csv() and read_tsv() functions can be used to load in CSV and TSV files in R
- The head() and tail() functions can print the first and last parts of an object and the dim() function prints the dimensions of an object
- Subsetting can be done with the $ operator using column names or using square brackets [ ]
- The str() and summary() functions are useful functions to get an overview or summary of the data
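
A minimal sketch of these functions, assuming the readr package is installed. The file contents and column names (sample, group, value) are made up for illustration and written to a temporary file so the example is self-contained.

```r
library(readr)

# Hypothetical example data, written to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("sample,group,value",
    "s1,a,1.5",
    "s2,a,2.5",
    "s3,b,4.0"),
  csv_path
)

# Load the CSV file (read_tsv() works the same way for tab-separated files)
dat <- read_csv(csv_path, show_col_types = FALSE)

head(dat, 2)   # first two rows
tail(dat, 1)   # last row
dim(dat)       # number of rows and columns

# Subsetting: a single column with $, rows and columns with [ ]
dat$value
dat[dat$group == "a", ]
dat[, c("sample", "value")]

# Overview of the structure and a per-column summary
str(dat)
summary(dat)
```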
Formatting the Data
- The pivot_longer() function can convert data frames from wide to long format and there are multiple ways to do this
- The full_join() function can merge two data frames. You can specify which column names should be joined by using the join_by() function.
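
The reshaping and merging steps might look like the sketch below. It assumes tidyr and dplyr (version 1.1.0 or later, which provides join_by()) are installed. The data frames wide and meta and their column names are hypothetical.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide table: one row per gene, one column per sample
wide <- tibble(
  gene = c("g1", "g2"),
  s1   = c(10, 20),
  s2   = c(30, 40)
)

# Wide to long: name the columns to pivot explicitly...
long <- pivot_longer(wide, cols = c(s1, s2),
                     names_to = "sample", values_to = "count")

# ...or select them by excluding the identifier column
long_alt <- pivot_longer(wide, cols = -gene,
                         names_to = "sample", values_to = "count")

# Hypothetical sample metadata whose key column has a different name
meta <- tibble(
  sample_id = c("s1", "s2"),
  condition = c("control", "treated")
)

# Merge the two data frames, matching sample in long to sample_id in meta
merged <- full_join(long, meta, by = join_by(sample == sample_id))
merged
```

Both pivot_longer() calls give the same long table here; excluding the identifier column is often more convenient when there are many sample columns.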
Introduction to ggplot2
- A ggplot has 3 components: data (dataset), mapping (columns to plot) and geom (type of plot). Different types of plots include geom_point(), geom_jitter(), geom_line(), geom_boxplot(), geom_violin().
- facet_wrap() can be used to make subplots of the data
- The aesthetics of a ggplot can be modified, such as colouring by different columns in the dataset
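
As an illustration of the three components, the sketch below uses the mpg dataset that ships with ggplot2. The choice of dataset and columns is an assumption made for the example, not something taken from the lesson data.

```r
library(ggplot2)

# data: mpg; mapping: engine size vs highway mileage, coloured by vehicle class
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, colour = class)) +
  geom_point() +        # geom: scatter plot
  facet_wrap(~ drv)     # one subplot per drive type

p
```

Swapping geom_point() for another geom, such as geom_boxplot() with a categorical x column, changes the plot type while the data and mapping stay in the same place.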
Extra ggplot2 Customisation
- In ggplot2, you can specify colours manually using the scale_colour_manual() function or use a predefined palette using the scale_colour_brewer() function
- The labs() function allows you to set a plot title and change axis labels
- Complete themes such as theme_bw() and theme_classic() can be used to change the appearance of a plot
- Using the theme() function allows you to tweak components of a theme
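
A short sketch of these customisations, again using the built-in mpg dataset as a stand-in. The colours, labels and palette name are arbitrary example values.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point()

# Manual colours, custom labels, a complete theme, and one tweaked theme component
p_manual <- base +
  scale_colour_manual(values = c("4" = "darkorange",
                                 "f" = "steelblue",
                                 "r" = "forestgreen")) +
  labs(title = "Engine size and fuel economy",
       x = "Engine displacement (litres)",
       y = "Highway miles per gallon",
       colour = "Drive") +
  theme_bw() +
  theme(legend.position = "bottom")

# Alternative: a predefined ColorBrewer palette instead of manual colours
p_brewer <- base +
  scale_colour_brewer(palette = "Set2") +
  theme_classic()
```

Calling theme() after the complete theme matters: a complete theme such as theme_bw() resets theme components, so tweaks placed before it would be overwritten.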
Wrapping Up
- You can use the pdf() function to save plots, and finalize the file by calling dev.off()
- The sessionInfo() function prints information about your R environment which is useful for reproducibility
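
Saving a plot and recording the environment could look like this. The output file name and dimensions are hypothetical, and the plot is a placeholder built from the mpg dataset bundled with ggplot2.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Hypothetical output path in the session's temporary directory
out_file <- file.path(tempdir(), "scatter.pdf")

pdf(out_file, width = 6, height = 4)  # open the PDF device (size in inches)
print(p)                              # draw the plot into the device
dev.off()                             # close the device and finalize the file

# Record R version, platform and loaded package versions
sessionInfo()
```

The explicit print(p) is a deliberate choice: inside scripts run with source() or inside functions, a ggplot object is not drawn automatically.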