Part 2·2.6·8 min read

The Central Dogma

The central dogma describes how biological information flows from DNA to RNA to protein — and where the real complexity lies in the exceptions.

Central Dogma: DNA → RNA → Protein

In 1958, Francis Crick articulated what he called the "central dogma of molecular biology": information flows from DNA to RNA to protein, and not in reverse. This framework is as fundamental to biology as the OSI model is to networking: a layered abstraction that clarifies how information moves through a system.

But like the OSI model, the central dogma is a simplification that becomes more nuanced the deeper you go. Understanding both the rule and its exceptions is essential for making sense of modern genomics, epigenetics, and the biology revolution.

The Core Flow

The canonical flow is:

DNA → RNA → Protein

This encodes two processes:

  • Transcription: DNA is copied into RNA
  • Translation: RNA is decoded into protein

These are the two steps every molecular biologist learns first, and they underlie essentially all of gene expression analysis.
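The two steps can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: the function names are hypothetical, the input is assumed to be the coding (sense) strand, and the codon table is deliberately partial, covering only the codons used in the example.

```python
# Partial codon table: only the codons needed for the example below.
# A complete table would cover all 64 codons.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the start codon
    "GCU": "A", "GCC": "A",  # alanine
    "UGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Decode an mRNA from the first AUG until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        # "X" marks a codon missing from our partial table
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCTGGAAATAA"
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)     # AUGGCCUGGAAAUAA
print(protein)  # MAWK
```

Note that the DNA string is never modified: each step produces a new object, which mirrors the point that the cell works from copies rather than from the source.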

The logic is straightforward: DNA is the stable, heritable store. It's precious, and errors in it are permanent. So the cell doesn't expose DNA to the translation machinery directly. Instead, it makes a temporary working copy (mRNA) and translates that. The mRNA can be adjusted, destroyed, or regulated without touching the source.

DECODER
Biology

DNA → RNA → Protein. The cell never executes DNA directly. It transcribes a working copy (mRNA), which ribosomes then translate into proteins that do the actual work.

For Developers

Source code → build artifact → running process. DNA is the repo. mRNA is the compiled binary. Protein is the running process. You never execute source directly — you build from it, deploy the artifact, and throw it away when done.

Central dogma as the read-only master branch policy

The central dogma enforces a read-only policy on the source of truth. DNA is the master branch: you don't execute from it directly. You check out a working copy (mRNA), build from that, and let the build artifact (protein) do the actual work. If a build is bad, you delete the artifact. The master branch stays intact.

This design decouples transcription rate (how many mRNA copies are made) from translation rate (how many proteins are made per mRNA) from protein stability (how long each protein lasts). Three independent dials, compounding into enormous regulatory range.
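A rough way to see how the dials compound is a steady-state calculation. The sketch below assumes the simplest possible model (constant production, first-order decay), in which steady-state abundance is production rate times mean lifetime. The function name, the parameter values, and the extra mRNA-lifetime parameter are all illustrative assumptions, not measured quantities.

```python
def steady_state_protein(k_tx, k_tl, protein_lifetime, mrna_lifetime=1.0):
    """Steady-state protein count under constant production and first-order decay.

    k_tx: mRNAs made per unit time (transcription rate)
    k_tl: proteins made per mRNA per unit time (translation rate)
    protein_lifetime: mean lifetime of one protein (stability)
    mrna_lifetime: mean lifetime of one mRNA (held fixed in this example)
    """
    mrna = k_tx * mrna_lifetime             # steady-state mRNA count
    return mrna * k_tl * protein_lifetime   # steady-state protein count


baseline = steady_state_protein(k_tx=2.0, k_tl=10.0, protein_lifetime=5.0)
all_up = steady_state_protein(k_tx=20.0, k_tl=100.0, protein_lifetime=50.0)

print(baseline)           # 100.0
print(all_up / baseline)  # 1000.0
```

Because the dials multiply, a tenfold change on each of the three gives a thousandfold change in output in this toy model.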

What Crick Actually Said

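Crick's formulation, as he restated it in 1970, distinguished between "general" information transfers (which can occur in nature), "special" transfers (which would require unusual mechanisms), and transfers that have never been observed:

General (can occur):

  • DNA → DNA (replication)
  • DNA → RNA (transcription)
  • RNA → Protein (translation)

Special (require unusual enzymes):

  • RNA → DNA (reverse transcription)
  • RNA → RNA (RNA replication)

Never observed:

  • Protein → DNA or Protein → RNA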

The key constraint is the last one: sequence information does not flow back from protein to nucleic acids. Once translated, the sequence of a protein cannot feed back to modify the DNA that encoded it. This is one reason acquired traits are generally not heritable through the germline: the sequence information in proteins cannot write back to DNA.
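The transfers listed above fit naturally into a small lookup table. The sketch below is only a restatement of the lists as data. The names `TRANSFERS` and `classify_transfer` are hypothetical, and any pair not listed in the text is reported as "not listed" rather than guessed at.

```python
# (source, target) -> class, copied from the lists in the text
TRANSFERS = {
    ("DNA", "DNA"): "general",         # replication
    ("DNA", "RNA"): "general",         # transcription
    ("RNA", "protein"): "general",     # translation
    ("RNA", "DNA"): "special",         # reverse transcription
    ("RNA", "RNA"): "special",         # RNA replication
    ("protein", "DNA"): "never observed",
    ("protein", "RNA"): "never observed",
}


def classify_transfer(source: str, target: str) -> str:
    """Look up how the text classifies a sequence-information transfer."""
    return TRANSFERS.get((source, target), "not listed")


print(classify_transfer("DNA", "RNA"))      # general
print(classify_transfer("RNA", "DNA"))      # special
print(classify_transfer("protein", "DNA"))  # never observed
```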

The Exceptions: Where the Dogma Gets Interesting

The "special" transfers are real and biologically important:

Reverse Transcriptase: RNA → DNA

Retroviruses (HIV, HTLV) carry RNA genomes and use reverse transcriptase, an RNA-dependent DNA polymerase, to convert their RNA into DNA after infecting a cell. This DNA integrates into the host genome as a provirus, where it can persist indefinitely.

Reverse transcriptase is also responsible for retrotransposons: transposable elements that amplify themselves through an RNA intermediate. About 40% of the human genome consists of retrotransposon-derived sequences. Many of our "junk" DNA sequences are fossil retrotransposons.

In the lab, reverse transcriptase is essential for RNA-seq: because sequencers read DNA, RNA is first converted to cDNA (complementary DNA) using reverse transcriptase, then sequenced.
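At the sequence level, the RNA-to-cDNA step amounts to taking the reverse complement of the RNA in the DNA alphabet. The sketch below shows only that string operation; the function name is hypothetical, and real library preparation (priming, second-strand synthesis, adapters) is not modeled.

```python
# RNA base -> complementary DNA base
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna: str) -> str:
    """Return the first-strand cDNA for an RNA, written 5' to 3'."""
    # The new strand is antiparallel to its template, so read the RNA backwards.
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


print(first_strand_cdna("AUGGCC"))  # GGCCAT
```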

RNA-dependent RNA Polymerase: RNA → RNA

RNA viruses (influenza, SARS-CoV-2, polio) replicate their genomes using RNA-dependent RNA polymerases (RdRp). No such enzyme exists in normal human cells, which is why RdRp inhibitors (like remdesivir) are selective antivirals.

The lack of an inherent proofreading mechanism in most RdRps means RNA viruses mutate rapidly, orders of magnitude faster than DNA-based organisms. This high mutation rate enables rapid evolution and immune evasion but also produces many defective genomes.
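The trade-off can be made concrete with a little arithmetic. The sketch below assumes each base is copied independently with a fixed per-base error rate. The genome length and both rates are illustrative order-of-magnitude placeholders chosen for the example, not measurements for any particular virus or organism.

```python
def expected_mutations(rate_per_base: float, genome_length: int) -> float:
    """Mean number of new mutations per genome copy."""
    return rate_per_base * genome_length


def error_free_fraction(rate_per_base: float, genome_length: int) -> float:
    """Probability that one genome copy contains no new mutation."""
    return (1.0 - rate_per_base) ** genome_length


GENOME_LENGTH = 10_000      # hypothetical genome of 10 kb
RNA_VIRUS_RATE = 1e-5       # illustrative: no proofreading
DNA_ORGANISM_RATE = 1e-9    # illustrative: proofreading and repair

for label, rate in [("RNA virus", RNA_VIRUS_RATE), ("DNA-based", DNA_ORGANISM_RATE)]:
    mean = expected_mutations(rate, GENOME_LENGTH)
    clean = error_free_fraction(rate, GENOME_LENGTH)
    print(f"{label}: {mean:.2g} mutations per copy, {clean:.4%} copies error-free")
```

Under these assumed numbers, roughly one in ten copies of the RNA genome carries a new mutation, while essentially every copy made at the lower rate is clean.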

Prions: Protein → Protein (Structural)

This one is the most philosophically uncomfortable. Prions are misfolded proteins that can induce normal copies of the same protein to misfold. The misfolded form is self-propagating without any nucleic acid template.

The prion protein PrP^Sc (found in Creutzfeldt-Jakob disease, kuru, and bovine spongiform encephalopathy) converts normal PrP^C to the pathological form through direct protein-protein contact. This is not a sequence change: the amino acid sequence is the same, only the fold differs. Information (the abnormal fold) propagates from protein to protein.

Strictly speaking, this doesn't violate the central dogma because no sequence information is flowing backward. But it does mean that heritable information can be transmitted without nucleic acids, a deep exception to the intuitive picture.

Gene Expression: Reading the Dogma Dynamically

The central dogma describes potential information flow. Gene expression describes which flows are active at any moment in a given cell.

Nearly every cell in your body carries the same genome (~20,000 protein-coding genes). But different cell types express different subsets of those genes. A liver cell expresses albumin and coagulation factors. A pancreatic β-cell expresses insulin. A retinal photoreceptor expresses opsins. The same genome, radically different outputs.

This cell-type specificity is controlled at multiple levels:

| Level | Mechanism | Chapter |
|---|---|---|
| Transcriptional | Transcription factors, enhancers, chromatin state | 3.1 |
| Epigenetic | DNA methylation, histone modification | 3.2 |
| Post-transcriptional | Alternative splicing, RNA stability | 3.3 |
| Translational | miRNA regulation, ribosome occupancy | 3.1, 2.4 |
| Post-translational | Phosphorylation, ubiquitination | 2.5 |

Why mRNA abundance ≠ protein abundance

RNA-seq measures mRNA levels, but protein levels are what drive cell behavior. The correlation is real but imperfect, typically r ≈ 0.4–0.6 in matched samples. Genes can have high mRNA but low protein (translational repression by miRNAs), or low mRNA but abundant stable protein (high half-life). For clinical biomarkers and drug target validation, protein measurements (proteomics, immunoassays) are often more relevant.
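One way to read that correlation range: in a simple linear fit, the squared correlation is the fraction of variance in one variable explained by the other. The helper below is a hypothetical name introduced for illustration, and it assumes the quoted r is a Pearson correlation.

```python
def variance_explained(r: float) -> float:
    """Fraction of variance explained by a linear fit with correlation r."""
    return r * r


for r in (0.4, 0.6):
    print(f"r = {r}: r^2 = {variance_explained(r):.2f}")
# r = 0.4: r^2 = 0.16
# r = 0.6: r^2 = 0.36
```

On that reading, mRNA levels account for only about 16% to 36% of the variation in protein levels, which is why the two measurements are not interchangeable.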

Measuring Gene Expression

The central dogma suggests three ways to measure what a cell is doing, one per layer, and epigenomics adds a fourth:

Genomics reads the source code. Variant calling, structural variants, copy number. Stable across time and cell types (mostly). Doesn't tell you what's active.

Transcriptomics (RNA-seq) reads the active transcripts. Dynamic, varies by cell type and condition. Tells you which genes are being transcribed. The dominant approach in molecular biology today.

Proteomics reads the executing programs. Mass spectrometry-based. More directly functional but technically harder and less deep than RNA-seq.

Epigenomics (ATAC-seq, ChIP-seq, bisulfite sequencing) reads the regulatory state. Which regions of DNA are accessible? Which histones are modified? This layer controls which genes can be transcribed.

Modern multi-omics integrates all four layers to get a full picture of cell state. Single-cell technologies (scRNA-seq, single-cell ATAC-seq) apply these measurements to individual cells, revealing heterogeneity that bulk measurements average away.

The Central Dogma as a Framework

The central dogma is most useful as a framework for asking questions:

  • If a gene is overexpressed in cancer, where in the dogma did it go wrong? More gene copies (genomic)? More transcription (transcriptional)? mRNA stabilization (post-transcriptional)? Reduced protein degradation (post-translational)?

  • If a drug targets a protein, what happens when cells become resistant? They can mutate the target (protein level), amplify the gene (genomic), upregulate a bypass pathway (transcriptional), or activate post-translational modifications that block drug binding.

  • If you're designing a diagnostic test, which layer are you measuring? cfDNA (genomic), circulating RNA (transcriptional), protein biomarkers (proteomics)?

The power of the central dogma is not that it's a complete description. It isn't. The power is that it gives you a map of where to look when something goes wrong, and where to intervene when you want to change what a cell does.

LAB · Transcription: DNA to RNA
Python · Pyodide
LAB · Translation: mRNA to Protein
Python · Pyodide