Part 2 · 2.4 · 10 min read

RNA: The Bytecode

RNA is the working intermediate between stored source code and running executables — more diverse and dynamic than its supporting role suggests.

RNA · transcription · non-coding RNA

In any compiled language, there's a step between source code and execution: compilation produces an intermediate representation — bytecode, IR, assembly. You don't run Java source files; you run .class files. The intermediate form is more convenient for execution, shorter-lived than the source, and can be optimized, cached, or discarded.

RNA plays exactly this role in biology. DNA is the persistent, protected source of truth. Proteins are the executing programs. RNA is the transient intermediate that bridges them — produced on demand, modified, transported, translated, and degraded. But "intermediate" undersells it: RNA turns out to have rich functionality of its own, including catalytic activity, regulatory roles, and structural functions.

Transcription: Synthesizing RNA from DNA

Transcription is the process by which RNA polymerase reads a DNA template and synthesizes a complementary RNA strand. In eukaryotes, this happens in the nucleus.

The steps:

  1. Initiation: Transcription factors bind the promoter, recruit RNA polymerase II (for protein-coding genes), and open the double helix at the transcription start site.

  2. Elongation: RNA pol II moves 3'→5' along the template strand, synthesizing RNA 5'→3'. Unlike DNA polymerase, it doesn't need a primer. Speed: ~20–50 nucleotides/second.

  3. Termination: RNA pol II encounters a termination signal downstream of the gene and releases the RNA transcript.

The product is the pre-mRNA (primary transcript) — a complete copy of the genomic region between start and termination sites, including all introns.
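The base-pairing rule behind elongation can be sketched in a few lines of Python. This is a toy model only: the function names and the example sequences are made up for illustration, and it ignores promoters, termination signals, and everything else the polymerase actually deals with.

```python
# Watson-Crick pairing from a DNA template base to the RNA base it specifies.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the RNA (written 5'->3') made from a template strand read 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


def coding_to_rna(coding_5to3: str) -> str:
    """Shortcut: the RNA matches the coding strand with T replaced by U."""
    return coding_5to3.upper().replace("T", "U")


template = "TACGGCATT"  # hypothetical template strand, written 3'->5'
coding = "ATGCCGTAA"    # its complementary coding strand, written 5'->3'

rna = transcribe(template)
print(rna)                           # AUGCCGUAA
print(rna == coding_to_rna(coding))  # True
```

The second function is why sequence databases usually show the coding strand: the transcript reads the same, with U in place of T.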

The three RNA polymerases

Eukaryotes have three distinct RNA polymerases:

  • RNA Pol I — transcribes ribosomal RNA (rRNA) genes
  • RNA Pol II — transcribes protein-coding genes → mRNA, plus most non-coding RNAs
  • RNA Pol III — transcribes tRNA, 5S rRNA, and other small structural RNAs

Bacteria have just one RNA polymerase that does everything. This is why antibiotics that target bacterial RNA polymerase (like rifampicin) are selective — the bacterial enzyme is structurally distinct from all three eukaryotic ones.

mRNA Processing: From Pre-mRNA to Mature mRNA

The pre-mRNA is extensively processed before it leaves the nucleus:

5' Capping

Shortly after transcription begins, the 5' end of the pre-mRNA receives a 7-methylguanosine cap. This cap:

  • Protects the mRNA from degradation by 5'→3' exonucleases
  • Is recognized by the cap-binding factor eIF4E, which recruits the ribosome to initiate translation
  • Is recognized by export machinery to transport mRNA out of the nucleus

The cap is not a standard nucleotide — it's added in a 5'→5' linkage (reversed compared to the rest of the strand) by specific capping enzymes.

Polyadenylation

The 3' end of the pre-mRNA is cleaved ~10–30 nucleotides downstream of a consensus signal (AATAAA in the DNA, AAUAAA in the RNA). Then poly(A) polymerase adds a tail of ~150–250 adenosine residues: the poly-A tail. The tail is not encoded in the genome; it is added without a template.
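As a rough sketch of the cleave-then-extend logic, the following Python treats the pre-mRNA as a string. The function name, the default offset of 20 nt, the choice of the last AAUAAA in the transcript, and the example sequence are all assumptions made for illustration; real cleavage-site selection depends on additional sequence elements and protein factors.

```python
def polyadenylate(pre_mrna: str, offset: int = 20, tail_length: int = 200) -> str:
    """Cleave `offset` nt downstream of the last AAUAAA, then append a poly-A tail."""
    signal = "AAUAAA"
    pos = pre_mrna.rfind(signal)  # assumption: use the last signal in the transcript
    if pos == -1:
        raise ValueError("no AAUAAA polyadenylation signal found")
    cleavage_site = pos + len(signal) + offset
    if cleavage_site > len(pre_mrna):
        raise ValueError("transcript ends before the assumed cleavage site")
    # Everything downstream of the cleavage site is discarded.
    return pre_mrna[:cleavage_site] + "A" * tail_length


# Hypothetical pre-mRNA: body, signal, 20 nt spacer, then downstream sequence.
pre = "GGCUAGC" + "AAUAAA" + "C" * 20 + "GUGUGUGU"
mature = polyadenylate(pre)

print(len(pre), len(mature))   # 41 233
print(mature.endswith("A" * 200))
```

Note that the tail comes from the `"A" * tail_length` step, not from the input: nothing in the template specifies it.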

The poly-A tail:

  • Protects from 3'→5' exonuclease degradation
  • Promotes export from the nucleus
  • Is recognized by poly-A binding protein (PABP), which promotes efficient translation
  • Can be shortened over time; shorter tails correlate with faster mRNA degradation (this is a major post-transcriptional regulation mechanism)

Why poly-A tails matter for sequencing

Most mRNA-seq (RNA-seq) protocols capture mRNAs using oligo-dT beads: short stretches of deoxythymidine that hybridize to poly-A tails. This means standard RNA-seq specifically selects for polyadenylated transcripts, which includes most but not all mRNAs, and excludes many non-coding RNAs. If you want to capture non-polyadenylated RNAs, you need rRNA depletion (ribo-depletion) rather than poly-A selection: it removes the abundant rRNA and keeps everything else.
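The selection bias is easy to see in a toy filter. In this sketch the sample, the transcript names, the sequences, and the 20-nt tail threshold are all invented for illustration; a real capture depends on hybridization conditions, not on an exact count of trailing A residues.

```python
def captured_by_poly_a_selection(rna: str, min_tail: int = 20) -> bool:
    """True if the RNA ends in at least `min_tail` consecutive A residues."""
    tail_length = len(rna) - len(rna.rstrip("A"))
    return tail_length >= min_tail


# Hypothetical sample; sequences are placeholders, not real transcripts.
sample = {
    "typical_mRNA": "AUGGAUGAUGAU" + "A" * 150,            # polyadenylated
    "histone_like_mRNA": "AUGUCUGGACGUGGCUCUUUUCAGAGCC",  # no poly-A tail
    "rRNA_fragment": "UACCUGGUUGAUCCUGCCAG",               # no poly-A tail
}

captured = [name for name, seq in sample.items() if captured_by_poly_a_selection(seq)]
missed = [name for name in sample if name not in captured]

print(captured)  # ['typical_mRNA']
print(missed)    # ['histone_like_mRNA', 'rRNA_fragment']
```

Anything in the `missed` list is invisible to a poly-A-selected library, which matters when a gene of interest produces a non-polyadenylated transcript.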

Splicing

As discussed in the genes chapter, introns are removed and exons joined by the spliceosome. Alternative splicing of the same pre-mRNA can produce different mRNA isoforms encoding different proteins. We'll cover this in depth in Chapter 3.3.

The RNA Classes: More Than Just Messengers

When biologists say "RNA," most people think mRNA. But mRNA is a small fraction of total cellular RNA by mass. Here's the full landscape:

mRNA (Messenger RNA)

The working copies of protein-coding genes. Constitutes only ~2–5% of total RNA by mass despite being the most diverse class. Lifetime: minutes to hours.

rRNA (Ribosomal RNA)

The structural and catalytic core of ribosomes. Makes up ~80% of total cellular RNA by mass. Extremely stable. The 16S rRNA (prokaryotes) and 18S rRNA (eukaryotes) are frequently used as phylogenetic markers for species identification — their sequences evolve slowly enough to be conserved, but fast enough to distinguish species. 16S rRNA sequencing is the standard method for characterizing gut microbiome composition.

tRNA (Transfer RNA)

Small RNAs (~70–90 nucleotides) that carry amino acids to the ribosome. Each tRNA has an anticodon loop that base-pairs with an mRNA codon, and a 3' CCA end that gets aminoacylated (loaded with the correct amino acid) by aminoacyl-tRNA synthetases. There's one synthetase per amino acid, and they are among the most accurate molecular machines in the cell — error rate ~1 in 10⁴.
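Codon-anticodon pairing is antiparallel, which trips people up when both sequences are written 5'→3'. Here is a minimal sketch, assuming strict Watson-Crick pairing only (real decoding also tolerates wobble pairs at the third codon position); the function names are illustrative.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the perfectly complementary anticodon, written 5'->3'."""
    # Reverse first: the two strands pair in opposite orientations.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon.upper()))


def pairs_with(codon: str, anticodon: str) -> bool:
    """True if the anticodon (5'->3') is an exact Watson-Crick match for the codon."""
    return anticodon_for(codon) == anticodon.upper()


print(anticodon_for("AUG"))      # CAU
print(pairs_with("AUG", "CAU"))  # True
print(pairs_with("AUG", "UAC"))  # False: UAC is the complement without the reversal
```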

miRNA (MicroRNA)

Small (~22 nucleotide) non-coding RNAs that regulate gene expression post-transcriptionally. Loaded into the RNA-induced silencing complex (RISC), they base-pair (often imperfectly) with sequences in the 3' UTR of target mRNAs and guide RISC to degrade the target or repress its translation. Each miRNA can regulate hundreds of targets; each mRNA can be regulated by dozens of miRNAs.

miRNAs as configuration flags

A miRNA functions like a feature flag system that acts on the RNA level. The flag (miRNA) checks whether a specific pattern is present in the transcript (3' UTR sequence), and if so, reduces expression — not by editing the DNA, but by suppressing the working copy. Different cell types express different miRNA sets, tuning the same genome to different expression profiles.
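To make the "pattern check" concrete, here is a hedged sketch of seed matching. It assumes the common simplification that targeting is driven by the seed (nucleotides 2–8 of the miRNA) pairing perfectly with the 3' UTR; real target prediction also weighs site context, conservation, and pairing outside the seed. The miRNA and UTR sequences below are illustrative inputs, and the function names are made up.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions in `utr` that match the miRNA seed."""
    seed = mirna[1:8]                # nucleotides 2-8 of the miRNA
    site = reverse_complement(seed)  # what the seed looks like on the mRNA
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions


mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative 22-nt miRNA
utr = "GCAU" + "CUACCUC" + "AGGAUUU" + "CUACCUC" + "UU"  # toy 3' UTR with two sites

print(seed_sites(mirna, utr))         # [4, 18]
print(seed_sites(mirna, "GGGGGGGG"))  # []
```

A seven-letter pattern occurs by chance fairly often in a long UTR, which is one intuition for why a single miRNA can have hundreds of candidate targets.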

lncRNA (Long Non-Coding RNA)

RNAs >200 nucleotides that don't encode protein. Some catalogs list over 50,000 human lncRNA genes, more than the roughly 20,000 protein-coding genes, although counts vary widely between annotations. Functions include chromatin remodeling, transcriptional regulation, splicing regulation, and acting as decoys for miRNAs (competing endogenous RNAs). Many lncRNAs are tissue-specific and dysregulated in disease. The XIST lncRNA is a canonical example: it coats the inactive X chromosome and silences it in female cells.

snRNA and snoRNA

Small nuclear RNAs (snRNA) are core components of the spliceosome (U1, U2, U4, U5, U6). Small nucleolar RNAs (snoRNA) guide chemical modifications of rRNA and tRNA. Both are required for normal RNA processing and are constitutively expressed.

siRNA (Small Interfering RNA)

Double-stranded RNAs that trigger sequence-specific mRNA degradation through the RNAi pathway — the same pathway as miRNA but with perfect complementarity to the target. In nature, siRNAs defend against viruses and transposons. In the lab, synthetic siRNAs are a standard tool for knocking down gene expression. RNAi therapeutics (like Patisiran and Inclisiran) use modified siRNAs as drugs.

RNA Structure and the RNA World

Unlike double-stranded DNA, RNA is usually single-stranded, so it can fold back on itself into complex secondary and tertiary structures; its 2'-OH group also allows more diverse hydrogen bonding patterns. RNA structures include:

  • Stem-loops (hairpins): helical double-stranded regions with single-stranded loops
  • Pseudoknots: more complex topologies where two stem-loops interlock
  • Riboswitches: mRNA structures in the 5' UTR that change conformation in response to metabolite binding, controlling gene expression without proteins
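A stem-loop is the simplest of these to check by hand. The sketch below is a toy: it only tests whether the two ends of a sequence can zip together into a single uninterrupted stem, allowing G-U wobble pairs and requiring a minimum loop of 3 unpaired bases (a common rule of thumb, treated here as an assumption). Real structure prediction minimizes free energy over all possible foldings.

```python
# Allowed base pairs: Watson-Crick plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def hairpin_stem_length(rna: str, min_loop: int = 3) -> int:
    """Count base pairs formed by pairing the ends inward, keeping >= min_loop unpaired."""
    i, j = 0, len(rna) - 1
    stem = 0
    # j - i - 1 is the number of bases that would sit between the pair (i, j).
    while j - i - 1 >= min_loop and (rna[i], rna[j]) in PAIRS:
        stem += 1
        i += 1
        j -= 1
    return stem


print(hairpin_stem_length("GGGCAAAAGCCC"))  # 4: a 4-bp stem closing an AAAA loop
print(hairpin_stem_length("AAAAAAAAAAAA"))  # 0: A cannot pair with A
```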

Ribozymes are catalytic RNAs. The ribosome itself — the machine that synthesizes all proteins — is fundamentally an RNA machine. The peptidyl transferase center that makes peptide bonds is composed of rRNA, not protein. This supports the RNA world hypothesis: early life may have relied on RNA for both information storage and catalysis, before DNA and protein took over their respective roles.

RNA Stability and Degradation

mRNA is deliberately unstable — typical half-lives range from minutes (for rapidly regulated genes like immediate early response genes) to hours (for housekeeping genes). This instability allows the cell to rapidly change its protein output in response to signals.

Degradation pathways:

  1. Deadenylation: Poly-A tail is shortened by deadenylases. When the tail is gone, decapping enzymes remove the 5' cap.
  2. 5'→3' decay: Once the cap is gone, the exoribonuclease Xrn1 degrades the mRNA rapidly.
  3. 3'→5' decay: The exosome complex can degrade from the 3' end.
  4. NMD (Nonsense-Mediated Decay): Special pathway that detects and destroys mRNAs with premature stop codons — a quality control mechanism that prevents synthesis of truncated, potentially harmful proteins.

Understanding RNA stability is important for RNA-seq analysis: more stable transcripts accumulate to higher steady-state levels, so mRNA abundance reflects both transcription rate and degradation rate. A gene that's transcribed rapidly but degraded quickly can have lower steady-state mRNA than a gene transcribed slowly but very stably.
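That last claim can be checked with a little arithmetic. The sketch assumes the simplest possible model: constant synthesis and first-order decay, so the steady-state level is the synthesis rate divided by the decay constant, with the decay constant equal to ln(2) divided by the half-life. The rates and half-lives are invented for illustration.

```python
import math


def steady_state_level(transcription_rate: float, half_life_min: float) -> float:
    """Steady-state copies per cell for constant synthesis and first-order decay.

    transcription_rate: new transcripts per minute (assumed constant)
    half_life_min: mRNA half-life in minutes
    """
    decay_constant = math.log(2) / half_life_min  # per minute
    return transcription_rate / decay_constant


# Hypothetical genes: one made fast but unstable, one made slowly but stable.
fast_unstable = steady_state_level(transcription_rate=10.0, half_life_min=10)
slow_stable = steady_state_level(transcription_rate=1.0, half_life_min=600)

print(round(fast_unstable))  # 144
print(round(slow_stable))    # 866
```

Despite a tenfold lower transcription rate, the stable transcript settles at about six times the abundance, so a difference in RNA-seq counts alone cannot tell you which rate changed.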

The RNA Biology Revolution

Before the 2000s, RNA was considered primarily a passive intermediate. The discovery of RNA interference (RNAi; reported in 1998, Nobel Prize in 2006), the non-coding RNA explosion from ENCODE (2012), and the application of RNA as medicine (mRNA vaccines; Nobel Prize in 2023) have completely rewritten that view.

Today, RNA biology intersects with:

  • Transcriptomics (RNA-seq, single-cell RNA-seq)
  • Epitranscriptomics (RNA modifications like m6A methylation)
  • Structural biology (cryo-EM structures of ribosomes, spliceosomes)
  • Therapeutics (mRNA vaccines, siRNA drugs, ASO drugs)
  • Diagnostics (liquid biopsy via circulating RNA)

RNA is not a secondary player in the central dogma. It's where most of the regulation actually happens.