GSEA Explained: A Beginner’s Guide to Gene Set Enrichment Analysis

Gene Set Enrichment Analysis (GSEA) determines whether a defined set of genes shows statistically significant, coordinated differences between two biological states. Instead of evaluating genes one at a time, GSEA ranks every measured gene by its association with the phenotype and checks whether the genes in your target pathway cluster at the top or bottom of that ranked list.

To pull meaningful biological insights from a GSEA output, you must decode four interconnected metrics: Enrichment Scores (ES), Normalized Enrichment Scores (NES), Nominal P-values, and False Discovery Rates (FDR).

1. Enrichment Score (ES)

The Enrichment Score (ES) represents the degree to which a gene set is overrepresented at the extremes of your ranked gene list.

The Mechanism: The GSEA algorithm steps down your ranked list of genes one by one. It adds to a running sum when it hits a gene in your gene set and subtracts when it hits a gene outside of it.

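The Score: The ES is the maximum deviation from zero encountered during this walk. It is positive when the running sum peaks near the top of the list and negative when it bottoms out near the end.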
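The walk described above can be sketched in a few lines of Python. This is a minimal, unweighted illustration and not the GSEA software itself: the function name `enrichment_score` and the toy gene labels are made up for this example, and every hit counts equally here, whereas standard GSEA by default weights each hit by the gene's ranking metric.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (illustrative sketch only)."""
    members = set(gene_set)
    n_hits = sum(1 for gene in ranked_genes if gene in members)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")

    # Step sizes are chosen so the running sum returns to zero at the end of the list.
    hit_step = 1.0 / n_hits
    miss_step = 1.0 / n_miss

    running = 0.0
    es = 0.0
    for gene in ranked_genes:
        running += hit_step if gene in members else -miss_step
        # Keep the signed value with the largest distance from zero.
        if abs(running) > abs(es):
            es = running
    return es


# Toy ranked list: g1 is the most upregulated gene, g10 the most downregulated.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10"]
top_es = enrichment_score(ranked, {"g1", "g2", "g4"})      # peaks early, positive ES
bottom_es = enrichment_score(ranked, {"g8", "g9", "g10"})  # dips late, negative ES
print(round(top_es, 3), round(bottom_es, 3))
```

In this toy case the first set reaches its peak after the fourth gene (three hits and one miss), giving an ES of 6/7, while the second set drives the running sum down to -1 before its members are reached.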

The Limitation: Raw ES values cannot be compared directly across pathways, because the score a gene set can reach by chance depends on how many genes it contains.

2. Normalized Enrichment Score (NES)

The Normalized Enrichment Score (NES) is the primary metric used to evaluate the strength and direction of a pathway’s enrichment.

The Mechanism: GSEA accounts for differences in gene set size by dividing the raw ES by the mean of the ES values that the same gene set produces across random permutations, using only the permutation scores that share the sign of the observed ES.

Interpretation:

Positive NES (+): The gene set clusters at the top of the ranked list. These genes are coordinately upregulated or highly correlated with your target phenotype.

Negative NES (-): The gene set clusters at the bottom of the ranked list. These genes are coordinately downregulated or correlated with your control/comparison group.

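Strength: The absolute value indicates the strength of enrichment. As a rule of thumb, an absolute NES above 1.5 is often treated as robust enrichment, but it should always be read alongside the nominal p-value and FDR.

3. Nominal P-Value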

The Nominal P-Value estimates the statistical significance of the enrichment score for a single gene set, considered in isolation and without correction for multiple testing. (Bader Lab @ The University of Toronto)
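Both the NES and the nominal p-value come from comparing the observed ES against a null distribution built by permutation. The sketch below illustrates that idea under stated assumptions: it permutes gene set membership (random gene sets of the same size), whereas GSEA can also permute phenotype labels; it uses an unweighted ES; and the helper names `enrichment_score` and `permutation_summary`, the seed, and the toy data are all hypothetical.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (illustrative sketch only)."""
    members = set(gene_set)
    n_hits = sum(1 for gene in ranked_genes if gene in members)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")
    running, es = 0.0, 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hits if gene in members else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


def permutation_summary(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Return (ES, NES, nominal p) using random gene sets of matching size."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    size = len(set(gene_set) & set(ranked_genes))

    # Null distribution: ES of randomly drawn gene sets with the same size.
    null = [
        enrichment_score(ranked_genes, rng.sample(ranked_genes, size))
        for _ in range(n_perm)
    ]

    # Only null scores with the same sign as the observed ES are used.
    same_sign = [es for es in null if (es >= 0) == (observed >= 0)]
    if not same_sign:
        return observed, float("nan"), float("nan")

    mean_abs = sum(abs(es) for es in same_sign) / len(same_sign)
    nes = observed / mean_abs
    # Fraction of same-sign null scores at least as extreme as the observed ES.
    p_value = sum(abs(es) >= abs(observed) for es in same_sign) / len(same_sign)
    return observed, nes, p_value


# Toy data: 100 ranked genes, with the gene set occupying the top 10 positions.
ranked = [f"g{i}" for i in range(1, 101)]
gene_set = ranked[:10]
es, nes, p = permutation_summary(ranked, gene_set)
print(f"ES={es:.3f}  NES={nes:.2f}  nominal p={p:.4f}")
```

A p-value of zero from a procedure like this only means that no permuted score was as extreme as the observed one, so it is better read as "less than 1 / number of permutations" than as exactly zero.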