Introduction to R
- Use scripts for analyses rather than typing commands directly into the console. This keeps an accurate record of what you did, which is important for reproducible research.
- You can store data as objects using the assignment operator <-
- Packages are installed from CRAN with the install.packages() function
- You can view the help page of a function by typing a ? before its name (e.g. ?sum)
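
The points above can be sketched in a short script. The object names and values here are illustrative assumptions, not taken from the lesson, and the install and help calls are commented out so the script runs without network access or an interactive session.

```r
# Store values in an object with the assignment operator <-
# (weights_kg and its values are hypothetical example data)
weights_kg <- c(61.5, 72.0, 58.3)

# Pass the object to a function and store the result in a new object
total_weight <- sum(weights_kg)
print(total_weight)

# Install a package from CRAN (only needed once per R installation)
# install.packages("ggplot2")

# Open the help page for a function
# ?sum
```

Saving these lines in a script file, rather than typing them at the console, keeps a record of each step that can be re-run later.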
Exploring the Data
- The read_csv() and read_tsv() functions can be used to load CSV and TSV files into R
- The head() and tail() functions print the first and last parts of an object, and the dim() function prints its dimensions
- Subsetting can be done with the $ operator using column names, or with square brackets [ ]
- The str() and summary() functions are useful for getting an overview or summary of the data
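
A minimal sketch of these functions, assuming the readr package is installed. The file contents and column names (sample, group, count) are made up for illustration, and the CSV is written to a temporary file so the example is self-contained.

```r
library(readr)

# Write a small hypothetical CSV to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("sample,group,count",
    "s1,control,10",
    "s2,control,14",
    "s3,treated,25"),
  csv_path
)

# Load the file; show_col_types = FALSE suppresses the column type message
dat <- read_csv(csv_path, show_col_types = FALSE)

head(dat, 2)   # first two rows
tail(dat, 1)   # last row
dim(dat)       # number of rows and columns

# Subset a single column with $
counts <- dat$count

# Subset rows with square brackets: keep rows where group is "control"
controls <- dat[dat$group == "control", ]

# Overview of the structure and a per-column summary
str(dat)
summary(dat)
```

For a tab-separated file, read_tsv() is called the same way.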
Formatting the Data
- The pivot_longer() function can convert data frames from wide to long format and there are multiple ways to do this
- The full_join() function can merge two data frames. You can specify which column names should be joined by using the join_by() function.
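
The reshaping and merging steps might look like the sketch below, assuming tidyr and a version of dplyr that provides join_by() (1.1.0 or later). The data frames and column names are hypothetical.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide table: one row per gene, one column per sample
wide <- tibble(
  gene = c("geneA", "geneB"),
  sample1 = c(5, 8),
  sample2 = c(7, 2)
)

# Wide to long: one row per gene and sample combination.
# Another way to select the same columns would be cols = -gene.
long <- pivot_longer(
  wide,
  cols = c(sample1, sample2),
  names_to = "sample",
  values_to = "count"
)

# Hypothetical sample annotation with a differently named key column
sample_info <- tibble(
  sample_id = c("sample1", "sample2", "sample3"),
  condition = c("control", "treated", "treated")
)

# Full join keeps rows from both tables; join_by() pairs up the key columns
merged <- full_join(long, sample_info, by = join_by(sample == sample_id))
print(merged)
```

Because sample3 has no counts in the long table, a full join keeps it with missing values for gene and count instead of dropping it.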
Introduction to ggplot2
- A ggplot has 3 components: data (dataset), mapping (columns to plot) and geom (type of plot). Different types of plots include geom_point(), geom_jitter(), geom_line(), geom_boxplot(), geom_violin().
- facet_wrap() can be used to make subplots of the data
- The aesthetics of a ggplot can be modified, such as colouring by different columns in the dataset
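
As a rough illustration of the three components, the sketch below uses R's built-in iris dataset, which is an assumption made here for convenience and not necessarily the data used in the lesson.

```r
library(ggplot2)

# data = iris, mapping = columns to plot, geom = boxplot
# colour = Species colours the boxes by a column in the dataset
p_box <- ggplot(
  data = iris,
  mapping = aes(x = Species, y = Sepal.Length, colour = Species)
) +
  geom_boxplot()

# A scatter plot split into one subplot per species
p_facets <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  facet_wrap(~ Species)

# In a script, print() draws the plot explicitly
print(p_box)
print(p_facets)
```

Swapping geom_boxplot() for geom_violin() or geom_jitter() changes the plot type while the data and mapping stay the same.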
Extra ggplot2 Customisation
- In ggplot2, you can specify colours manually using the scale_colour_manual() function or use a predefined palette using the scale_colour_brewer() function
- The labs() function allows you to set a plot title and change axis labels
- Complete themes such as theme_bw() and theme_classic() can be used to change the appearance of a plot
- Using the theme() function allows you to tweak components of a theme
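
These customisations can be combined on one plot, as in this sketch. The iris dataset, the colour choices, the title and the "Set2" palette are illustrative assumptions.

```r
library(ggplot2)

base_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point()

# Manual colours: a named vector maps each species to a colour
p_manual <- base_plot +
  scale_colour_manual(
    values = c(setosa = "darkorange", versicolor = "purple", virginica = "cyan4")
  ) +
  labs(
    title = "Sepal dimensions by species",
    x = "Sepal length (cm)",
    y = "Sepal width (cm)"
  ) +
  theme_bw() +                        # apply a complete theme first
  theme(legend.position = "bottom")   # then tweak one component of it

# Alternative: a predefined ColorBrewer palette
p_brewer <- base_plot +
  scale_colour_brewer(palette = "Set2") +
  theme_classic()

print(p_manual)
print(p_brewer)
```

The theme() call is placed after theme_bw() because a complete theme added later would override earlier tweaks.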
Wrapping Up
- You can use the pdf() function to save plots, and finalize the file by calling dev.off()
- The sessionInfo() function prints information about your R environment, which is useful for reproducibility
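
A minimal sketch of saving a plot and recording the session, assuming ggplot2 is installed. The output file name and the page size are arbitrary choices for illustration, and the file is written to a temporary directory.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# Hypothetical output path in the session's temporary directory
out_file <- file.path(tempdir(), "example_plot.pdf")

# Open the PDF device, draw the plot, then close the device.
# The file is only complete once dev.off() has been called.
pdf(out_file, width = 6, height = 4)
print(p)
dev.off()

# Record the R version and loaded packages for reproducibility
sessionInfo()
```

Inside a script, print(p) is needed for the ggplot to be drawn to the open device.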