The human genome contains roughly 20,000 protein-coding genes, yet the human proteome (the complete set of proteins) contains well over 100,000 distinct protein forms. The genome does not encode 100,000 genes, so where does the extra diversity come from? The discrepancy is resolved largely by alternative splicing: the ability of a single pre-mRNA to be spliced in multiple ways, producing different combinations of exons and therefore different protein isoforms.
Alternative splicing is not an exception or an edge case. It's the rule: approximately 95% of human multi-exon genes undergo alternative splicing. It's one of the primary mechanisms through which eukaryotic complexity arises from a surprisingly small genome.
Splicing Recap
As established in the genes chapter, after transcription, the pre-mRNA contains all introns. The spliceosome — a large complex of snRNAs (U1, U2, U4, U5, U6) and ~150 proteins — identifies intron-exon boundaries by recognizing consensus sequences (5' splice site GU, branch point, polypyrimidine tract, 3' splice site AG) and catalyzes intron removal.
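As a rough illustration of the consensus signals just described, the following Python sketch checks an intron for the GU and AG dinucleotides at its ends and measures how pyrimidine-rich the region just upstream of the 3' AG is. The function names, the 20-nucleotide window, and the toy sequence are assumptions made for illustration; real splice-site recognition depends on far more than these dinucleotides.

```python
def has_canonical_splice_sites(intron: str) -> bool:
    """Return True if the intron starts with GU and ends with AG."""
    seq = intron.upper().replace("T", "U")  # accept DNA or RNA letters
    return len(seq) >= 4 and seq.startswith("GU") and seq.endswith("AG")


def polypyrimidine_fraction(intron: str, window: int = 20) -> float:
    """Fraction of C/U bases in the window just upstream of the terminal AG."""
    seq = intron.upper().replace("T", "U")
    tract = seq[-(window + 2):-2]  # the `window` bases immediately before AG
    if not tract:
        return 0.0
    return sum(base in "CU" for base in tract) / len(tract)


# Toy intron (not from a real gene): 5' end, filler, pyrimidine tract, 3' AG.
toy_intron = "GUAAGU" + "ACGUACUAAC" + "UUUCUUUCCUUUUCUCUUCC" + "AG"

print(has_canonical_splice_sites(toy_intron))  # True
print(polypyrimidine_fraction(toy_intron))     # 1.0
```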
Constitutive splicing removes every intron and joins all exons — the same outcome every time. Alternative splicing uses different combinations of splice sites to produce distinct mRNA isoforms.
Modes of Alternative Splicing
There are five main patterns:
Exon Skipping
The most common mode (~40% of alternative splicing events). An exon is included in some transcripts and excluded from others. Inclusion/skipping is controlled by the relative strength of the splice sites flanking the exon and by regulatory proteins.
Pre-mRNA: [Exon 1]—[Exon 2]—[Exon 3]—[Exon 4]
Isoform A: [Exon 1]—[Exon 2]—[Exon 3]—[Exon 4] (include exon 3)
Isoform B: [Exon 1]—[Exon 2]—[Exon 4] (skip exon 3)
Alternative 5' Splice Site
Different 5' splice sites are used, changing the 5' boundary of an intron — thus including or excluding a portion of the upstream exon.
Alternative 3' Splice Site
Different 3' splice sites are used, changing the 3' boundary of an intron — including or excluding a portion of the downstream exon.
Intron Retention
An intron is retained in the mature mRNA rather than being spliced out. This often produces a non-functional transcript, because the retained intron introduces a premature stop codon that triggers nonsense-mediated decay (NMD), but it can also produce functional isoforms. Intron retention is more common in plants than in animals, though it is prevalent in neurons.
Mutually Exclusive Exons
Two or more exons are never included in the same transcript: each transcript includes exactly one of them.
Imagine a codebase where certain modules can be compiled in or out depending on build flags. The source is the same; the compiled binary differs. Alternative splicing is this mechanism operating at the RNA level. The DNA (source) is fixed. Different cells, at different times or in different conditions, produce different mRNA "builds" by including or excluding exons.
The downstream consequence: two cells with identical genomes can express functionally distinct proteins from the same gene.
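To make the build-flag analogy concrete, here is a small Python sketch that enumerates the mRNA "builds" available to a hypothetical gene. The exon names and the slot representation are invented for illustration: each slot lists the options at one position, with `None` standing for a skipped exon.

```python
from itertools import product


def enumerate_isoforms(slots):
    """Yield every exon combination allowed by the per-position options."""
    for choice in product(*slots):
        yield [exon for exon in choice if exon is not None]


# Hypothetical gene layout; one entry per position along the pre-mRNA.
slots = [
    ["E1"],          # constitutive exon: always included
    ["E2", None],    # cassette exon: included or skipped
    ["E3a", "E3b"],  # mutually exclusive exons: exactly one is used
    ["E4"],          # constitutive exon: always included
]

isoforms = list(enumerate_isoforms(slots))
for isoform in isoforms:
    print("-".join(isoform))
print(len(isoforms))  # 4 isoforms from one gene
```

Each additional independent cassette exon doubles the count, which is one way to see how a modest number of genes can yield a much larger number of transcripts.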
Regulation: Splicing Enhancers and Silencers
Splice sites alone don't fully determine which splicing pattern occurs. Local sequences in the pre-mRNA regulate spliceosome assembly:
- Exonic Splicing Enhancers (ESEs): sequences within exons that promote inclusion
- Exonic Splicing Silencers (ESSs): sequences within exons that promote skipping
- Intronic Splicing Enhancers (ISEs): promote exon inclusion when in adjacent intron
- Intronic Splicing Silencers (ISSs): promote exon skipping
These sequences are bound by RNA-binding proteins (RBPs), especially SR proteins (serine/arginine-rich — splicing activators) and hnRNPs (heterogeneous nuclear ribonucleoproteins — often splicing repressors). The balance of these RBPs determines which isoform is produced.
Key RBPs:
- SRSF1 (ASF/SF2): canonical SR splicing activator
- hnRNP A1: often antagonizes SR proteins; promotes exon skipping
- NOVA1/NOVA2: neuron-specific splicing regulators; control alternative splicing of many neuronal genes
- PTBP1: represses inclusion of neuron-specific exons in non-neural cells; PTBP1 downregulation during neuronal differentiation allows inclusion of neural-specific exons
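The idea that inclusion depends on a balance of opposing signals can be caricatured in a few lines of Python. This is a deliberately crude toy, not a published model: it gives every bound element the same weight and simply tallies enhancers against silencers, whereas real regulation depends on position, protein concentration, and context. The names `EFFECT` and `inclusion_tendency` are hypothetical.

```python
# +1 for element classes that promote exon inclusion, -1 for those that promote skipping.
EFFECT = {"ESE": +1, "ISE": +1, "ESS": -1, "ISS": -1}


def inclusion_tendency(bound_elements):
    """Tally enhancer and silencer elements currently occupied by RBPs."""
    score = sum(EFFECT[element] for element in bound_elements)
    if score > 0:
        return "inclusion favored"
    if score < 0:
        return "skipping favored"
    return "balanced"


# Same exon, two hypothetical cell types with different RBP occupancy.
print(inclusion_tendency(["ESE", "ESE", "ISS"]))  # inclusion favored
print(inclusion_tendency(["ESS", "ISS", "ESE"]))  # skipping favored
```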
Functional Consequences of Isoforms
Alternative splicing can change:
- Protein domain composition: including or excluding a domain changes the protein's interactions and functions
- Subcellular localization: a localization signal in an alternatively spliced exon can redirect the protein
- Protein stability: some isoforms are more stable; others have shorter half-lives
- Enzymatic activity: active site residues can be affected
- Dimerization: isoforms can differ in their ability to form homo- or heterodimers
Classic examples:
BRCA1 produces multiple isoforms through alternative splicing. Isoforms lacking functional BRCT domains can have altered DNA repair and tumor suppressor activity.
BCL-X (BCL2L1 gene): the long isoform BCL-XL is anti-apoptotic (prevents programmed cell death); the short isoform BCL-XS is pro-apoptotic. These are produced from the same gene by alternative 5' splice site usage. The balance between the two isoforms helps determine whether a cell lives or dies.
VEGF-A (vascular endothelial growth factor) has multiple isoforms with different binding affinities and diffusion properties — controlling whether the angiogenic signal is local or diffuses widely.
Tau (MAPT gene): multiple exons are alternatively spliced, producing 6 isoforms with different microtubule-binding properties. An imbalance in tau isoforms is implicated in tauopathies including Alzheimer's disease.
Disease-Causing Splicing Mutations
Mutations that disrupt splice sites are a major class of pathogenic variants. They can cause:
- Exon skipping: the exon next to the disrupted splice site is left out → in-frame deletion or, if the exon length is not a multiple of three, a frameshift and truncated protein
- Intron retention: intron retained → premature stop codon → NMD
- Cryptic splice site activation: a nearby sequence that resembles a splice site gets used instead → aberrant isoform
Estimates vary widely, but roughly 15–50% of pathogenic single-nucleotide variants are thought to affect splicing, either at canonical splice sites or in ESEs/ESSs. Many "missense" mutations in coding sequence actually disrupt splicing by eliminating an ESE rather than (only) changing the amino acid.
This has important implications for variant interpretation: a variant in the middle of an exon, with no predicted amino acid change, can still be pathogenic if it destroys an ESE. Standard variant annotation pipelines that only consider amino acid effects miss this class.
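The gap between the two annotation strategies can be sketched in Python. Everything here is a simplified assumption: coordinates are 1-based and inclusive on the forward strand, a single exon is considered, the consequence labels are illustrative strings rather than the vocabulary of any particular annotator, and "flagged" only means "worth passing to a splicing predictor", not "pathogenic".

```python
PROTEIN_CHANGING = {"missense", "nonsense", "frameshift"}


def classify_position(pos: int, exon_start: int, exon_end: int) -> str:
    """Locate a variant relative to one exon and its flanking introns."""
    if exon_start <= pos <= exon_end:
        return "exonic"
    distance = exon_start - pos if pos < exon_start else pos - exon_end
    # The two intronic bases next to the exon hold the AG / GU dinucleotides.
    return "canonical_splice_site" if distance <= 2 else "intronic"


def protein_only_flag(consequence: str) -> bool:
    """Flag a variant only if it changes the encoded amino acid sequence."""
    return consequence in PROTEIN_CHANGING


def splice_aware_flag(consequence: str, location: str) -> bool:
    """Also keep exonic and splice-site variants that might alter splicing."""
    return location in {"exonic", "canonical_splice_site"} or protein_only_flag(consequence)


# Hypothetical synonymous variant in the middle of an exon spanning 100-200.
location = classify_position(150, exon_start=100, exon_end=200)
print(location)                                   # exonic
print(protein_only_flag("synonymous"))            # False: missed
print(splice_aware_flag("synonymous", location))  # True: kept for review
```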
The ability to manipulate alternative splicing has become therapeutically useful. Antisense oligonucleotides (ASOs) can be designed to bind pre-mRNA and mask splice sites or nearby regulatory elements, steering the spliceosome toward a different splicing outcome:
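- Nusinersen (Spinraza): treats spinal muscular atrophy (SMA) by blocking an ISS in intron 7 of the SMN2 pre-mRNA, promoting inclusion of exon 7 and producing functional SMN protein
- Eteplirsen (Exondys 51): treats Duchenne muscular dystrophy in patients whose deletions are amenable to exon 51 skipping; inducing the spliceosome to skip exon 51 restores the reading frame of the dystrophin transcript, yielding a shortened but partially functional protein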
Splice-switching is now a validated therapeutic mechanism, with multiple approved drugs.
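Why skipping an extra exon can rescue a transcript comes down to arithmetic on the reading frame: if the total number of nucleotides removed is a multiple of three, the downstream codons are read correctly. The Python sketch below shows that check; the exon lengths are illustrative values chosen for the example, not figures taken from a gene annotation, and `restores_frame` is a hypothetical helper name.

```python
def restores_frame(deleted_lengths, skipped_lengths=()):
    """True if the total number of removed nucleotides is a multiple of three."""
    return (sum(deleted_lengths) + sum(skipped_lengths)) % 3 == 0


# Illustrative exon lengths in nucleotides (assumed for this example).
deleted_by_mutation = [102, 109]  # two adjacent exons lost in a genomic deletion
skipped_by_aso = [233]            # neighbouring exon that the ASO causes to be skipped

print(restores_frame(deleted_by_mutation))                  # False: 211 nt removed
print(restores_frame(deleted_by_mutation, skipped_by_aso))  # True: 444 nt removed
```

The deletion alone shifts the frame; removing one more exon brings the total back to a multiple of three, at the cost of a shorter protein.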
Measuring Alternative Splicing with RNA-seq
Standard RNA-seq workflows count reads per gene, collapsing all isoforms into a single number. Detecting alternative splicing requires:
Isoform-level quantification tools: Kallisto, Salmon, and RSEM directly quantify transcript isoforms (not just genes) by probabilistically assigning reads to known transcripts. Output: TPM/estimated counts per transcript.
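Once transcript-level TPM values are available, a common first step is to express each isoform as a fraction of its gene's total. The sketch below assumes the quantification has already been loaded into plain dictionaries; the transcript and gene identifiers and the TPM values are made up, and `isoform_fractions` is a hypothetical helper rather than part of any of the tools named above.

```python
from collections import defaultdict


def isoform_fractions(transcript_tpm, transcript_to_gene):
    """Return (gene-level TPM totals, each transcript's share of its gene)."""
    gene_total = defaultdict(float)
    for transcript, tpm in transcript_tpm.items():
        gene_total[transcript_to_gene[transcript]] += tpm

    fractions = {}
    for transcript, tpm in transcript_tpm.items():
        total = gene_total[transcript_to_gene[transcript]]
        # Genes with no expression get a fraction of 0 rather than dividing by zero.
        fractions[transcript] = tpm / total if total > 0 else 0.0
    return dict(gene_total), fractions


# Made-up example values.
tpm = {"GENE1-tx1": 30.0, "GENE1-tx2": 10.0, "GENE2-tx1": 5.0}
tx2gene = {"GENE1-tx1": "GENE1", "GENE1-tx2": "GENE1", "GENE2-tx1": "GENE2"}

gene_tpm, usage = isoform_fractions(tpm, tx2gene)
print(gene_tpm)  # {'GENE1': 40.0, 'GENE2': 5.0}
print(usage)     # {'GENE1-tx1': 0.75, 'GENE1-tx2': 0.25, 'GENE2-tx1': 1.0}
```

A gene-level analysis sees only the 40.0 for GENE1; the isoform fractions are what reveal a switch between transcripts.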
Differential splicing analysis: rMATS, SUPPA2, and DEXSeq identify which splicing events change between conditions. rMATS and SUPPA2 quantify Percent Spliced In (PSI, ψ), the fraction of transcripts that include a given exon, and test for differences; DEXSeq instead tests for differential usage of individual exons.
PSI ranges from 0 (exon always skipped) to 1 (exon always included). A PSI change of 0.2 between conditions means the exon shifts from, say, 40% to 60% included — often biologically meaningful.
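In its simplest form PSI is just a ratio of read counts. The following Python sketch computes it from the number of reads supporting inclusion and skipping of an exon and reproduces the 40% to 60% example. It assumes raw counts are directly comparable; dedicated tools apply further corrections (for instance for how many positions can support each form), which are omitted here, and the read counts are invented.

```python
def psi(inclusion_reads: int, skipping_reads: int):
    """Percent Spliced In: share of reads supporting exon inclusion."""
    total = inclusion_reads + skipping_reads
    if total == 0:
        return None  # no coverage, PSI is undefined
    return inclusion_reads / total


# Invented counts for one exon in two conditions.
psi_control = psi(inclusion_reads=40, skipping_reads=60)
psi_treated = psi(inclusion_reads=60, skipping_reads=40)
delta_psi = psi_treated - psi_control

print(psi_control, psi_treated)  # 0.4 0.6
print(round(delta_psi, 2))       # 0.2
```

Returning `None` for uncovered events keeps them from being mistaken for exons that are always skipped.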
Long-read sequencing (Oxford Nanopore, PacBio) reads full-length transcripts, directly revealing isoform structures without inference from short reads. Increasingly used for isoform discovery in tissues with complex splicing patterns (especially brain).
Alternative Splicing and the Proteome
Alternative splicing dramatically expands protein diversity beyond the ~20,000 gene count:
- Multiple isoforms per gene
- Isoforms with distinct interaction partners, localization, stability
- Isoform ratios that change during differentiation, disease, and aging
This means that the same variant (DNA mutation) can have different effects depending on which isoforms are expressed in a given cell type. A mutation might affect a domain present in the ubiquitous isoform but absent in the brain-specific isoform, in which case the brain may be spared even though other tissues are affected.
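The isoform dependence of a variant's effect can be expressed as a simple lookup. In the sketch below, the isoform names, exon labels, and tissue expression table are all hypothetical; the point is only that a variant matters in a tissue if at least one isoform expressed there contains the affected exon.

```python
# Hypothetical annotation: which exons each isoform contains.
isoform_exons = {
    "ubiquitous": {"E1", "E2", "E3", "E4"},
    "brain_specific": {"E1", "E2", "E4"},  # lacks E3
}

# Hypothetical expression table: which isoforms each tissue expresses.
expressed_isoforms = {
    "liver": ["ubiquitous"],
    "muscle": ["ubiquitous"],
    "brain": ["brain_specific"],
}


def tissues_affected(variant_exon: str):
    """Tissues expressing at least one isoform that contains the variant's exon."""
    return sorted(
        tissue
        for tissue, isoforms in expressed_isoforms.items()
        if any(variant_exon in isoform_exons[name] for name in isoforms)
    )


print(tissues_affected("E3"))  # ['liver', 'muscle']
print(tissues_affected("E1"))  # ['brain', 'liver', 'muscle']
```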
For anyone working in genomics, splicing is not an advanced topic. It's part of the baseline: every variant call, every differential expression analysis, every gene annotation query involves decisions about which isoforms count and how to handle them.