Introduction
- RNA-seq measures gene expression across the transcriptome, producing count data per gene per sample.
- The main goal of DE analysis is to identify genes with expression differences between conditions.
- Biological replicates are required for valid statistical inference; technical replicates are not independent samples, so they should be combined (e.g. by summing their counts) or modelled explicitly.
- A “counts table” is the starting point for most RNA-seq analyses in R.
- Preparing a DGEList object and sample metadata correctly is essential for downstream analyses with limma and edgeR.
- Understanding the data structure and design factors early helps ensure trustworthy DE results later.
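
The points above about the counts table and sample metadata can be illustrated with a minimal, language-neutral sketch. It is written in Python rather than R purely for illustration and does not use the edgeR DGEList class; the gene names, sample names, and helper functions are all hypothetical. It checks that every gene row covers exactly the samples described in the metadata, and counts replicates per group.

```python
# Hypothetical toy counts table: one row per gene, one column per sample.
counts = {
    "geneA": {"s1": 120, "s2": 98, "s3": 310, "s4": 295},
    "geneB": {"s1": 0, "s2": 3, "s3": 1, "s4": 0},
    "geneC": {"s1": 560, "s2": 610, "s3": 540, "s4": 590},
}

# Hypothetical sample metadata: one record per sample, keyed by sample name.
samples = {
    "s1": {"group": "control"},
    "s2": {"group": "control"},
    "s3": {"group": "treated"},
    "s4": {"group": "treated"},
}


def check_alignment(counts, samples):
    """Return the sample order if every gene row matches the metadata exactly."""
    expected = list(samples)
    for gene, row in counts.items():
        if set(row) != set(expected):
            raise ValueError(f"{gene}: count columns do not match sample metadata")
    return expected


def replicates_per_group(samples):
    """Count how many samples (replicates) fall into each group."""
    tally = {}
    for info in samples.values():
        tally[info["group"]] = tally.get(info["group"], 0) + 1
    return tally


sample_order = check_alignment(counts, samples)
group_sizes = replicates_per_group(samples)
```

A mismatch between count columns and metadata rows is one of the easiest mistakes to make silently, which is why the check raises an error instead of reordering anything.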
First steps: filtering, visualisation, and basic normalisation
- Filtering lowly expressed genes improves signal-to-noise and reliability of DE results.
- Library size influences count comparisons; larger libraries generally provide more statistical power.
- Visualisation (PCA, boxplots, RLE) is essential to detect technical variation and assess data quality.
- Log transformation of counts (typically log2 counts per million with a small offset) reduces the dependence of variance on the mean and makes visual patterns easier to interpret.
- Normalisation corrects for library size and compositional biases, allowing fair comparison across samples.
- TMM (trimmed mean of M-values) is a common method for normalising RNA-seq data using edgeR.
- Good normalisation removes unwanted technical variation while preserving biological signal.
- Persistent technical effects (e.g. batch effects) require more advanced correction strategies beyond basic normalisation.
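
The filtering, library-size, and log-transformation points above come down to simple arithmetic, sketched here under stated assumptions. The toy counts, the CPM threshold, and the minimum-sample rule are hypothetical values chosen for illustration, and the log2 offset is a simplification, not edgeR's exact prior-count scheme. TMM scaling factors are not computed here.

```python
import math

# Hypothetical toy counts: each gene maps to counts for four samples.
counts = {
    "geneA": [100, 200, 150, 300],
    "geneB": [0, 1, 0, 2],
    "geneC": [900, 1799, 1350, 2698],
}
n_samples = 4

# Library size: total counts per sample (column sums).
library_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]

# Counts per million (CPM): rescale each count by its sample's library size.
cpm = {
    gene: [row[j] * 1e6 / library_sizes[j] for j in range(n_samples)]
    for gene, row in counts.items()
}


def filter_low_expression(cpm, min_cpm, min_samples):
    """Keep genes whose CPM reaches min_cpm in at least min_samples samples."""
    return [
        gene for gene, values in cpm.items()
        if sum(v >= min_cpm for v in values) >= min_samples
    ]


# Thresholds are illustrative only; the tiny toy libraries inflate CPM values.
kept = filter_low_expression(cpm, min_cpm=1000.0, min_samples=2)

# log2-CPM with a simple offset of 1 so that zero counts stay finite.
log_cpm = {gene: [math.log2(v + 1.0) for v in cpm[gene]] for gene in kept}
```

In this toy example geneA has raw counts that differ threefold between samples, yet its CPM is identical in all four, which is the sense in which scaling by library size makes samples comparable.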
Differential expression with limma
- RNA-seq data can be modelled with linear models after log transformation.
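- The limma workflow fits a linear model to each gene to estimate group means, forms contrasts between those means, then tests each contrast for DE.
- Empirical Bayes moderation shrinks gene-wise variance estimates towards a value shared across genes, which stabilises the test statistics when replicates are few.
- Adjusted p-values (FDR) control the expected proportion of false discoveries among the genes called significant across many tests.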
- limma-trend and limma-voom give similar results unless library sizes differ greatly.
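
The FDR adjustment mentioned above is most often the Benjamini-Hochberg (BH) procedure. The following is a minimal sketch of that arithmetic, assuming a plain list of raw p-values; the function name `bh_adjust` and the example p-values are hypothetical, and this is not limma's own code.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is never
    # larger than the adjusted value of any bigger raw p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = bh_adjust(raw_p)
```

Calling genes significant at an adjusted p-value below 0.05 then means accepting that, on average, about 5% of the genes on that list are expected to be false discoveries.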
Summary and exercise, Visualisation of results, Getting data
- The full DE workflow combines filtering, normalisation, modelling, and testing.
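- limma-trend models the mean-variance trend at the gene level, whereas limma-voom assigns precision weights to individual observations, so voom copes better when library sizes vary widely between samples.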
- Visualisations such as volcano plots, MD plots, and heatmaps summarise DE results clearly.
- Public repositories like GEO and GREIN provide accessible RNA-seq count data.
- A reproducible, well-structured analysis script makes RNA-seq results easier to check, rerun, and share.
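
As a small illustration of the volcano plot mentioned above, the sketch below turns a table of DE results into plot coordinates (log2 fold change on the x-axis, -log10 adjusted p-value on the y-axis) and labels each gene. The result values, the cutoffs, and the function name `volcano_points` are hypothetical, and adjusted p-values are assumed to be strictly greater than zero.

```python
import math


def volcano_points(results, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Map gene -> (log2FC, adjusted p) onto volcano coordinates with a status label."""
    points = []
    for gene, (log_fc, adj_p) in results.items():
        if adj_p < fdr_cutoff and log_fc >= lfc_cutoff:
            status = "up"
        elif adj_p < fdr_cutoff and log_fc <= -lfc_cutoff:
            status = "down"
        else:
            status = "ns"  # not significant at the chosen cutoffs
        # y-axis value: smaller adjusted p-values plot higher.
        points.append((gene, log_fc, -math.log10(adj_p), status))
    return points


# Hypothetical DE results: gene -> (log2 fold change, adjusted p-value).
de_results = {
    "geneA": (2.0, 0.001),
    "geneB": (-1.5, 0.01),
    "geneC": (0.2, 0.6),
}
points = volcano_points(de_results)
```

The resulting tuples can be passed to any plotting library; the cutoffs only affect the colouring, not the positions of the points.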