Introduction


  • RNA-seq measures gene expression across the transcriptome, producing count data per gene per sample.
  • The main goal of DE analysis is to identify genes with expression differences between conditions.
  • Biological replicates are required for valid statistical inference; technical replicates should be combined (for example, by summing their counts) or otherwise handled explicitly, not treated as independent samples.
  • A “counts table” is the starting point for most RNA-seq analyses in R.
  • Preparing a DGEList object and sample metadata correctly is essential for downstream analyses with limma and edgeR.
  • Understanding the data structure and design factors early helps ensure trustworthy DE results later.
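
As a minimal sketch of the points above, the following R code builds a DGEList from a counts table and sample metadata. The counts are simulated, and the names `meta`, `group` and `batch` are hypothetical placeholders; a real analysis would read its own counts and metadata.

```r
library(edgeR)  # edgeR depends on limma, so limma is attached as well

set.seed(1)
# Simulated counts, purely illustrative: 500 genes (rows) x 6 samples (columns)
counts <- matrix(
  rnbinom(500 * 6, mu = 50, size = 5), nrow = 500,
  dimnames = list(paste0("gene", 1:500), paste0("sample", 1:6))
)

# Hypothetical sample metadata: one row per sample, same order as the count columns
meta <- data.frame(
  sample = colnames(counts),
  group  = factor(rep(c("control", "treated"), each = 3)),
  batch  = factor(rep(c("b1", "b2", "b3"), times = 2))
)
# Guard against mismatched sample order before combining counts and metadata
stopifnot(identical(meta$sample, colnames(counts)))

y <- DGEList(counts = counts, group = meta$group)
y$samples$batch <- meta$batch  # keep extra design factors alongside the counts

y$samples  # group, lib.size, norm.factors and batch for each sample
```

Checking that the metadata rows match the count columns before building the object is a cheap way to catch sample mix-ups early.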

First steps: filtering, visualisation, and basic normalisation


  • Filtering lowly expressed genes improves signal-to-noise and reliability of DE results.
  • Library size influences count comparisons: raw counts from samples sequenced to different depths are not directly comparable, and larger libraries generally provide more statistical power.
  • Visualisation (PCA, boxplots, RLE) is essential to detect technical variation and assess data quality.
  • Log transformation of counts (for example, to log-CPM) reduces the dependence of variance on expression level and makes visual patterns easier to interpret.
  • Normalisation corrects for library size and compositional biases, allowing fair comparison across samples.
  • TMM (trimmed mean of M-values) is a common method for normalising RNA-seq data using edgeR.
  • Good normalisation removes unwanted technical variation while preserving biological signal.
  • Persistent technical effects (e.g. batch effects) require more advanced correction strategies beyond basic normalisation.
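
The filtering, normalisation and visualisation steps above might look roughly like this in edgeR. This is a sketch on simulated counts with deliberately unequal sequencing depth; the default `filterByExpr()` thresholds and the `prior.count` value are assumptions and may need tuning for real data.

```r
library(edgeR)

set.seed(1)
group <- factor(rep(c("control", "treated"), each = 3))

# Simulated data: 1000 lowly expressed and 1000 well expressed genes,
# with hypothetical per-sample depth factors so library sizes differ
gene_mu <- rep(c(2, 100), each = 1000)
depth   <- c(0.5, 1, 1.5, 0.5, 1, 1.5)
mu      <- outer(gene_mu, depth)
counts  <- matrix(rnbinom(length(mu), mu = mu, size = 5), nrow = 2000,
                  dimnames = list(paste0("gene", 1:2000), paste0("sample", 1:6)))

y <- DGEList(counts = counts, group = group)

# Filtering: drop genes with too few counts to be informative
keep <- filterByExpr(y, group = group)
y <- y[keep, , keep.lib.sizes = FALSE]  # recompute library sizes after filtering

# Normalisation: TMM scaling factors, stored in y$samples$norm.factors
y <- calcNormFactors(y, method = "TMM")

# Log transformation: log2 counts per million, using the normalised library sizes
lcpm <- cpm(y, log = TRUE, prior.count = 2)

# Visualisation: per-sample distributions and a PCA of the samples
boxplot(lcpm, las = 2, ylab = "log2 CPM")
pca <- prcomp(t(lcpm))
plot(pca$x[, 1:2], col = as.integer(group), pch = 19)
```

If samples still separate by a technical factor in the PCA after this step, that points to the persistent effects mentioned in the last bullet, which basic normalisation does not remove.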

Differential expression with limma


  • RNA-seq data can be modelled with linear models after log transformation.
  • The limma workflow estimates group means and contrasts, then tests for DE.
  • Empirical Bayes moderation stabilises variances, improving reliability.
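  • Adjusted p-values control the false discovery rate (FDR) across many tests.
  • limma-trend and limma-voom give similar results unless library sizes differ greatly (as a rule of thumb, more than about 3-fold between the smallest and largest library), in which case voom is preferred.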
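
The two limma pipelines can be sketched side by side as follows. The data are simulated, with the first 50 genes set to a 4-fold increase in the treated group; the group labels, the contrast name `TvsC` and the 0.05 cutoff are illustrative assumptions.

```r
library(edgeR)  # attaches limma as well

set.seed(1)
group <- factor(rep(c("control", "treated"), each = 3))

# Simulated counts: 1000 genes, the first 50 are 4-fold up in treated samples
mu <- matrix(100, nrow = 1000, ncol = 6)
mu[1:50, group == "treated"] <- 400
counts <- matrix(rnbinom(length(mu), mu = mu, size = 20), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:6)))

y <- DGEList(counts = counts, group = group)
y <- y[filterByExpr(y, group = group), , keep.lib.sizes = FALSE]
y <- calcNormFactors(y, method = "TMM")

# Group-means design, with the comparison of interest defined as a contrast
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
contr <- makeContrasts(TvsC = treated - control, levels = design)

# limma-trend: linear model on log-CPM, mean-variance trend used in eBayes
lcpm <- cpm(y, log = TRUE, prior.count = 2)
fit_trend <- eBayes(contrasts.fit(lmFit(lcpm, design), contr), trend = TRUE)

# limma-voom: observation-level precision weights estimated from the counts
v <- voom(y, design)
fit_voom <- eBayes(contrasts.fit(lmFit(v, design), contr))

# Full result tables in original gene order; adj.P.Val is the BH-adjusted p-value
top_trend <- topTable(fit_trend, coef = 1, number = Inf, sort.by = "none")
top_voom  <- topTable(fit_voom,  coef = 1, number = Inf, sort.by = "none")

# Cross-tabulate the calls made by the two approaches at 5% FDR
table(trend = top_trend$adj.P.Val < 0.05, voom = top_voom$adj.P.Val < 0.05)
```

Because the simulated library sizes here are similar, the two approaches would be expected to agree closely.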

Summary and exercise; Visualisation of results; Getting data


  • The full DE workflow combines filtering, normalisation, modelling, and testing.
  • limma-trend and limma-voom differ mainly in handling library size variability.
  • Visualisations such as volcano plots, MD plots, and heatmaps summarise DE results clearly.
  • Public repositories like GEO and GREIN provide accessible RNA-seq count data.
  • A reproducible, well-structured code base supports robust and transparent RNA-seq analysis.
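
To illustrate the visual summaries listed above, here is a compact sketch that runs a voom fit on simulated counts and then draws an MD plot, a volcano plot and a heatmap of the top genes. The simulation settings and the choice of 20 genes for the heatmap are arbitrary assumptions; data downloaded from a repository such as GEO would replace the simulated `counts` matrix.

```r
library(edgeR)  # attaches limma as well

set.seed(1)
group <- factor(rep(c("control", "treated"), each = 3))

# Simulated counts: the first 50 of 1000 genes are 4-fold up in treated samples
mu <- matrix(100, nrow = 1000, ncol = 6)
mu[1:50, group == "treated"] <- 400
counts <- matrix(rnbinom(length(mu), mu = mu, size = 20), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:6)))

# Filtering, normalisation, modelling and testing in one pass
y <- DGEList(counts = counts, group = group)
y <- y[filterByExpr(y, group = group), , keep.lib.sizes = FALSE]
y <- calcNormFactors(y, method = "TMM")

design <- model.matrix(~ group)  # coefficient 2 is treated vs control
v   <- voom(y, design)
fit <- eBayes(lmFit(v, design))

# Classify genes as up (1), down (-1) or not significant (0) at 5% FDR
dt <- decideTests(fit)
summary(dt)

# MD plot: log fold change against average log expression, DE genes highlighted
plotMD(fit, column = 2, status = dt[, 2], main = "treated vs control")

# Volcano plot: significance against log fold change, top 10 genes labelled
volcanoplot(fit, coef = 2, highlight = 10, names = rownames(fit$coefficients))

# Heatmap of the 20 top-ranked genes, scaled by row
top <- rownames(topTable(fit, coef = 2, number = 20))
heatmap(v$E[top, ], scale = "row")
```

Keeping the whole workflow in a single script like this, with a fixed seed where randomness is involved, is one simple way to keep the analysis reproducible.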