The Central Dogma

Central Dogma: DNA → RNA → Protein

In 1958, Francis Crick articulated what he called the "central dogma of molecular biology": information flows from DNA to RNA to protein, and not in reverse. This framework is as fundamental to biology as the OSI model is to networking: a layered abstraction that clarifies how information moves through a system.

But like the OSI model, the central dogma is a simplification that becomes more nuanced the deeper you go. Understanding both the rule and its exceptions is essential for making sense of modern genomics, epigenetics, and the RNA biology revolution.

The Core Flow

The canonical flow is:

DNA → RNA → Protein

This encodes two processes:

Transcription: DNA is copied into RNA
Translation: RNA is decoded into protein

These are the two steps every molecular biologist learns first, and they underlie essentially all of gene expression analysis.

The logic is straightforward: DNA is the stable, heritable store. It's precious: errors are permanent. So the cell doesn't expose DNA to the translation machinery directly. Instead, it makes a temporary working copy (mRNA) and translates that. The mRNA can be adjusted, destroyed, or regulated without touching the source.

Biology

DNA → RNA → Protein. The cell never executes DNA directly. It transcribes a working copy (mRNA), which ribosomes then translate into proteins that do the actual work.

For Developers

Source code → build artifact → running process. DNA is the repo. mRNA is the compiled binary. Protein is the running process. You never execute source directly — you build from it, deploy the artifact, and throw it away when done.

Central dogma as the read-only master branch policy

The central dogma enforces a read-only policy on the source of truth. DNA is the master branch: you don't execute from it directly. You check out a working copy (mRNA), build from that, and let the build artifact (protein) do the actual work. If a build is bad, you delete the artifact. The master branch stays intact.

This design decouples transcription rate (how many mRNA copies are made) from translation rate (how many proteins are made per mRNA) from protein stability (how long each protein lasts). Three independent dials, compounding into enormous regulatory range.
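The three dials can be made concrete with a minimal two-stage kinetic sketch. It assumes constant production rates and first-order decay, which is a common textbook simplification and not something the text above specifies. The function name, units, and every parameter value are hypothetical, chosen only to show how the dials multiply.

```python
import math


def steady_state_levels(transcription_rate, mrna_half_life,
                        translation_rate, protein_half_life):
    """Return (mRNA copies, protein copies) per cell at steady state.

    Assumed model (first-order decay):
        d(mRNA)/dt    = transcription_rate - mrna_decay * mRNA
        d(protein)/dt = translation_rate * mRNA - protein_decay * protein
    Setting both derivatives to zero gives the expressions below.
    Rates are per hour and half-lives are in hours (hypothetical units).
    """
    mrna_decay = math.log(2) / mrna_half_life
    protein_decay = math.log(2) / protein_half_life
    mrna = transcription_rate / mrna_decay
    protein = translation_rate * mrna / protein_decay
    return mrna, protein


# Hypothetical settings: turn one dial at a time and compare protein output.
settings = {
    "baseline": (2.0, 1.0, 10.0, 10.0),
    "2x transcription": (4.0, 1.0, 10.0, 10.0),
    "2x translation": (2.0, 1.0, 20.0, 10.0),
    "2x protein half-life": (2.0, 1.0, 10.0, 20.0),
}
for label, params in settings.items():
    mrna, protein = steady_state_levels(*params)
    print(f"{label:22s} mRNA={mrna:8.1f}  protein={protein:10.1f}")
```

Under these assumptions, each dial scales steady-state protein abundance independently, and only the transcription dial changes the mRNA level. That is one way to see why mRNA measurements alone can miss regulation.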

What Crick Actually Said

Crick's original formulation distinguished between "general" information transfers (which can occur in nature) and "special" transfers (which would require unusual mechanisms):

General (can occur):

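DNA → DNA (replication)
DNA → RNA (transcription)
RNA → protein (translation)

Special (require unusual mechanisms):

RNA → DNA (reverse transcription)
RNA → RNA (RNA replication)
Protein → DNA or protein → RNA (these have never been observed)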

The key constraint is the last one: protein sequence information does not flow back to nucleic acids. Once translated, the sequence of a protein cannot feed back to modify the gene that encoded it. This is why acquired traits are not heritable through the germline: the sequence information in proteins cannot write back to DNA.
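The classification above can be sketched as a small lookup table. The constant names and the `classify_transfer` function are illustrative assumptions, and the table encodes only the transfers this section lists.

```python
# Each transfer is a (source, destination) pair, grouped as in the text above.
GENERAL = {
    ("DNA", "DNA"): "replication",
    ("DNA", "RNA"): "transcription",
    ("RNA", "protein"): "translation",
}
SPECIAL = {
    ("RNA", "DNA"): "reverse transcription",
    ("RNA", "RNA"): "RNA replication",
}
NEVER_OBSERVED = {("protein", "DNA"), ("protein", "RNA")}


def classify_transfer(source, destination):
    """Classify an information transfer using the lists in this section."""
    pair = (source, destination)
    if pair in GENERAL:
        return f"general ({GENERAL[pair]})"
    if pair in SPECIAL:
        return f"special ({SPECIAL[pair]})"
    if pair in NEVER_OBSERVED:
        return "never observed"
    return "not listed in this section"


for pair in [("DNA", "RNA"), ("RNA", "DNA"), ("protein", "DNA")]:
    print(f"{pair[0]} -> {pair[1]}: {classify_transfer(*pair)}")
```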

The Exceptions: Where the Dogma Gets Interesting

The "special" transfers are real and biologically important:

Reverse Transcriptase: RNA → DNA

Retroviruses (HIV, HTLV) carry and use reverse transcriptase, an RNA-dependent DNA polymerase, to convert their RNA into DNA after infecting a cell. This DNA integrates into the host genome as a provirus, where it can persist indefinitely.

Reverse transcriptase is also responsible for retrotransposons: transposable elements that amplify themselves through an RNA intermediate. About 40% of the human genome consists of retrotransposon-derived sequences. Many of our "junk" DNA sequences are fossil retrotransposons.

In the lab, reverse transcriptase is essential for RNA-seq: because most sequencers read DNA, RNA is first converted to cDNA (complementary DNA) using reverse transcriptase, then sequenced.
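At the sequence level, the cDNA step amounts to a reverse complement with U replaced by T. The sketch below is a string-level illustration only. It ignores primers, strand chemistry, and enzyme behavior, and the function names are hypothetical.

```python
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """Return first-strand cDNA (5'->3') for an RNA written 5'->3'."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def second_strand(cdna):
    """Return the strand complementary to the first-strand cDNA (5'->3')."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(cdna.upper()))


mrna = "AUGCCUGAGCUGGAGUAA"
first = reverse_transcribe(mrna)
print("mRNA        :", mrna)
print("1st strand  :", first)
print("2nd strand  :", second_strand(first))
```

The second strand carries the same sequence as the original mRNA with T in place of U, which is why reads from a cDNA library can be mapped back to the transcripts they came from.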

RNA-dependent RNA Polymerase: RNA → RNA

RNA viruses (influenza, SARS-CoV-2, polio) replicate their genomes using RNA-dependent RNA polymerases (RdRp). No such enzyme exists in normal human cells, which is why RdRp inhibitors (like remdesivir) are selective antivirals.

The lack of an inherent proofreading mechanism in most RdRps means RNA viruses mutate rapidly, orders of magnitude faster than DNA-based organisms. This high mutation rate enables rapid evolution and immune evasion but also produces many defective genomes.
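The trade-off between fast evolution and defective copies can be illustrated with a rough calculation. This sketch assumes copying errors are independent and equally likely at every base, so the number of new mutations per copy is approximately Poisson distributed. The genome length and error rates are hypothetical placeholders, not measured values for any virus.

```python
import math


def expected_mutations(error_rate_per_base, genome_length):
    """Mean number of new mutations introduced per genome copy."""
    return error_rate_per_base * genome_length


def fraction_error_free(error_rate_per_base, genome_length):
    """Poisson probability that a copy carries zero new mutations."""
    return math.exp(-expected_mutations(error_rate_per_base, genome_length))


GENOME_LENGTH = 10_000  # hypothetical genome size in nucleotides

for rate in (1e-6, 1e-5, 1e-4):  # hypothetical per-base error rates
    mean = expected_mutations(rate, GENOME_LENGTH)
    clean = fraction_error_free(rate, GENOME_LENGTH)
    print(f"error rate {rate:.0e}: {mean:.2f} mutations/copy, "
          f"{clean:.1%} of copies unchanged")
```

As the per-base error rate rises, more copies carry at least one change. Some of those changes help the virus escape immunity, and many break it.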

Prions: Protein → Protein (Structural)

This one is the most philosophically uncomfortable. Prions are misfolded proteins that can induce normal copies of the same protein to misfold. The misfolded form is self-propagating without any nucleic acid template.

The prion protein PrP^Sc (found in Creutzfeldt-Jakob disease, kuru, and bovine spongiform encephalopathy) converts normal PrP^C to the pathological form through direct protein-protein contact. This is not a sequence change: same amino acid sequence, different fold. Information (the abnormal fold) propagates from protein to protein.

Strictly speaking, this doesn't violate the central dogma because no sequence information is flowing backward. But it does mean that heritable information can be transmitted without nucleic acids, a deep exception to the intuitive picture.

Gene Expression: Reading the Dogma Dynamically

The central dogma describes potential information flow. Gene expression describes which flows are active at any moment in a given cell.

Nearly every cell in your body carries the same genome (~20,000 protein-coding genes). But different cell types express different subsets of those genes. A liver cell expresses albumin and coagulation factors. A pancreatic β-cell expresses insulin. A retinal photoreceptor expresses opsins. The same genome, radically different outputs.

This cell-type specificity is controlled at multiple levels:

Level	Mechanism	Chapter
Transcriptional	Transcription factors, enhancers, chromatin state	3.1
Epigenetic	DNA methylation, histone modification	3.2
Post-transcriptional	Alternative splicing, RNA stability	3.3
Translational	miRNA regulation, ribosome occupancy	3.1, 2.4
Post-translational	Phosphorylation, ubiquitination	2.5

Why mRNA abundance ≠ protein abundance

RNA-seq measures mRNA levels, but protein levels are what drive cell behavior. The correlation is real but imperfect, typically r ≈ 0.4–0.6 in matched samples. A gene can have high mRNA but low protein (translational repression by miRNAs), or low mRNA but abundant stable protein (long half-life). For clinical biomarkers and drug target validation, protein measurements (proteomics, immunoassays) are often more relevant.
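The r quoted above is a Pearson correlation coefficient. A minimal sketch of how it is computed follows. The abundance values are made-up numbers for six hypothetical genes, included only so the function has something to run on, and they are not real measurements.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Made-up log-scale abundances for six hypothetical genes (not real data).
mrna_log = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
protein_log = [2.5, 1.0, 4.0, 2.0, 5.5, 3.5]

print("r =", round(pearson_r(mrna_log, protein_log), 2))
```

An r of 1 would mean mRNA level predicts protein level perfectly. Values in the middle of the range leave much of the variation in protein abundance unexplained by mRNA.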

Measuring Gene Expression

The central dogma suggests three ways to measure what a cell is doing, and epigenomics adds a fourth:

Genomics: the source code. Variant calling, structural variants, copy number. Stable across time and cell types (mostly). Doesn't tell you what's active.

Transcriptomics (RNA-seq): the active transcripts. Dynamic, varies by cell type and condition. Tells you which genes are being transcribed. The dominant approach in molecular biology today.

Proteomics: the executing programs. Mass spectrometry-based. More directly functional but technically harder and less deep than RNA-seq.

Epigenomics (ATAC-seq, ChIP-seq, bisulfite sequencing): the regulatory state. Which regions of DNA are accessible? Which histones are modified? This layer controls which genes can be transcribed.

Modern multi-omics integrates all four layers to get a full picture of cell state. Single-cell technologies (scRNA-seq, single-cell ATAC-seq) apply these measurements to individual cells, revealing heterogeneity that bulk measurements average away.

The Central Dogma as a Framework

The central dogma is most useful as a framework for asking questions:

If a gene is overexpressed in cancer, where in the dogma did it go wrong? More gene copies (genomic)? Increased transcription (transcriptional)? mRNA stabilization (post-transcriptional)? Reduced protein degradation (post-translational)?
If a drug targets a protein, what happens when cells become resistant? They can mutate the target gene (DNA level), amplify the gene (genomic), upregulate a bypass pathway (transcriptional), or activate post-translational modifications that block drug binding.
If you're designing a diagnostic test, which layer are you measuring? cfDNA (genomic), circulating RNA (transcriptional), protein biomarkers (proteomics)?

The power of the central dogma is not that it's a complete description. It isn't. The power is that it gives you a map of where to look when something goes wrong, and where to intervene when you want to change what a cell does.

LAB · Transcription: DNA to RNA

Python · Pyodide
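A minimal sketch of the transcription step, written to pair with the translation lab that follows. The function names and the example DNA are illustrative assumptions. The coding strand was chosen so that its transcript equals the mRNA used in the translation lab.

```python
# Transcription builds an RNA complementary to the DNA template strand.
# The result matches the coding strand, with U in place of T.

TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_coding(coding_strand):
    """Shortcut: the mRNA equals the coding strand (5'->3') with T -> U."""
    return coding_strand.upper().replace("T", "U")


def transcribe_template(template_3to5):
    """Pair each template base (written 3'->5') with its RNA complement."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3to5.upper())


coding = "ATGCCTGAGCTGGAGTAA"    # 5'->3'
template = "TACGGACTCGACCTCATT"  # 3'->5', complementary to the coding strand

print("Coding  :", coding)
print("Template:", template)
print("mRNA    :", transcribe_template(template))
print("Same as T->U shortcut:", transcribe_template(template) == transcribe_coding(coding))
```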

LAB · Translation: mRNA to Protein

Python · Pyodide

# Translation reads mRNA in triplets (codons), each encoding one amino acid.

CODON_TABLE = {
  "AUG": "Met", "UUU": "Phe", "UUC": "Phe", "UUA": "Leu", "UUG": "Leu",
  "UCU": "Ser", "UCC": "Ser", "UCA": "Ser", "UCG": "Ser",
  "UAU": "Tyr", "UAC": "Tyr", "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
  "UGU": "Cys", "UGC": "Cys", "UGG": "Trp",
  "CUU": "Leu", "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
  "CCU": "Pro", "CCC": "Pro", "CCA": "Pro", "CCG": "Pro",
  "CAU": "His", "CAC": "His", "CAA": "Gln", "CAG": "Gln",
  "CGU": "Arg", "CGC": "Arg", "CGA": "Arg", "CGG": "Arg",
  "AUU": "Ile", "AUC": "Ile", "AUA": "Ile",
  "ACU": "Thr", "ACC": "Thr", "ACA": "Thr", "ACG": "Thr",
  "AAU": "Asn", "AAC": "Asn", "AAA": "Lys", "AAG": "Lys",
  "AGU": "Ser", "AGC": "Ser", "AGA": "Arg", "AGG": "Arg",
  "GUU": "Val", "GUC": "Val", "GUA": "Val", "GUG": "Val",
  "GCU": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
  "GAU": "Asp", "GAC": "Asp", "GAA": "Glu", "GAG": "Glu",
  "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
}

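def translate(mrna):
    """Translate an mRNA string into a list of amino acids.

    Reads in frame from position 0 and stops at the first stop codon.
    Unknown codons become "?"; a trailing partial codon is ignored.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i+3]
        aa = CODON_TABLE.get(codon, "?")
        if aa == "Stop":
            break
        protein.append(aa)
    return protein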

mrna = "AUGCCUGAGCUGGAGUAA"
protein = translate(mrna)

print("mRNA   :", mrna)
print("Codons :", " | ".join(mrna[i:i+3] for i in range(0, len(mrna), 3)))
print("Protein:", " - ".join(protein))
print("Length :", len(protein), "amino acids")