Map of the Territory

Before you start reading, it helps to know the shape of what you're reading. This chapter maps the full curriculum — what each part covers, how the parts depend on each other, and which paths are most relevant depending on what kind of work you're doing.

The Nine Parts

Part 0 — Why This Matters is what you're reading now. It explains the gap between software and biology, how this site is structured, and how to use it. You can read it in under thirty minutes.

Part 1 — The Infrastructure of Life builds the foundation. The cell as a system, the key molecular players, the membrane as a boundary. This is the hardware layer: before you can understand what the software does, you need to know what it runs on. The In Practice chapter introduces NCBI and the major biological databases you'll query constantly.

Part 2 — The Genetic Code is the heart of the curriculum. DNA as source code, genes as functions, RNA as bytecode, proteins as executables. This part culminates in the central dogma — the fundamental information flow of biology — and then shows you how to work with it using Biopython.
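As a rough preview of the information flow Part 2 builds toward, here is a minimal pure-Python sketch that turns a DNA coding sequence into RNA and then into a protein string. It is not the Biopython workflow the In Practice chapter uses. The function names and the example sequence are illustrative assumptions, and the sketch ignores real-world details such as reading frames, introns, and alternative genetic codes.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G codon order.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Turn a DNA coding-strand sequence into mRNA by swapping T for U."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons three bases at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 12-base sequence, chosen only for illustration.
dna = "ATGGCCTTTTAA"
mrna = transcribe(dna)
protein = translate(mrna)
print(dna, "->", mrna, "->", protein)
```

The point of the sketch is the shape of the pipeline: one string is rewritten into another, then decoded three symbols at a time through a fixed lookup table.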

Part 3 — Control and Regulation is where biology gets interesting. Gene expression is not a static read of the source code — it's a dynamic, context-dependent process. Epigenetics and regulatory networks. This part explains how the same genome can produce hundreds of different cell types. The In Practice chapter builds a regulatory network using NetworkX and the STRING interaction database.

Part 4 — Communication and Signaling explains how cells talk to each other and how they respond to their environment. Signaling cascades and the cell cycle. This is the event-driven architecture of biology.

Part 5 — Virology and Immunology covers a topic most developers care about but few understand mechanistically: how viruses work, how the immune system responds, and how vaccines and therapies exploit these mechanisms. The In Practice chapter puts these mechanisms into working code.

Part 6 — Variation, Evolution and Disease connects the molecular machinery to population-level phenomena. Mutations, cancer, evolutionary optimization, genetic diseases. The In Practice chapter introduces VCF files — the standard format for genomic variant data.
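To give a feel for what a VCF file holds, the sketch below parses the eight fixed, tab-separated columns of a VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) using only the standard library. The record shown is made up for illustration, the names `Variant` and `parse_vcf` are assumptions of this sketch, and real pipelines would normally rely on a dedicated parser rather than hand-rolled string splitting.

```python
from typing import Iterable, Iterator, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int              # 1-based position on the chromosome
    ref: str              # reference allele
    alts: list[str]       # one or more alternate alleles
    info: dict[str, str]  # INFO key/value pairs; flag-only keys map to ""


def parse_vcf(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per data line, skipping '#' header lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
        info_dict = {}
        if info != ".":
            for entry in info.split(";"):
                key, _, value = entry.partition("=")
                info_dict[key] = value
        yield Variant(chrom, int(pos), ref, alt.split(","), info_dict)


# Hypothetical two-line file: one header line and one made-up record.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t12345\t.\tA\tG,T\t50\tPASS\tDP=20;DB\n",
]
variants = list(parse_vcf(example))
print(variants[0])
```

Each data line describes one position where a sample differs from the reference genome, which is why the format reads naturally as a list of diffs.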

Part 7 — Computational Neuroscience covers the neuron as a computational unit, biological neural networks, plasticity, brain signals as data, and brain-computer interfaces. This part has the most direct connection to ML and AI. The In Practice chapter analyzes EEG signals using MNE-Python.

Part 8 — Biostatistics and ML Applied to Biology is the capstone. It explains why biostatistics is different from general statistics, covers the essential tests and methods, and walks through a full analysis pipeline. This is the part that turns domain knowledge into working analyses.

How the Parts Connect

The curriculum has a loose dependency graph. Some parts require earlier parts; others can be read more independently.

Part 0 (orientation)
    └── Part 1 (cell infrastructure)
            └── Part 2 (genetic code)    ← central hub
                    ├── Part 3 (regulation)
                    ├── Part 4 (signaling)
                    ├── Part 5 (virology)
                    ├── Part 6 (variation & disease)
                    └── Part 7 (neuroscience)
                                └── Part 8 (stats & ML)

Parts 1 and 2 are prerequisites for everything else. You don't need to have memorized them, but you need to have read them. The rest of the curriculum assumes you know what DNA is, what a protein does, and what the central dogma says.

Parts 3–7 are relatively independent of each other, though they share vocabulary. You can read them in any order after Part 2. The exception: Part 7 is much easier after Part 3, because regulation in neurons is the same concept as regulation everywhere else.

Part 8 requires all of the above. The statistical methods only make sense if you understand what you're measuring and why. The ML applications require knowing what the features represent biologically.
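The dependencies described above can be sketched as a small graph and sorted into a valid reading order. This is one possible encoding, not part of the curriculum itself: the dictionary layout, the `reading_order` helper, and the idea of treating "Part 7 is easier after Part 3" as an optional soft edge are assumptions of this sketch. It uses `graphlib` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter

# Each part maps to the parts it requires, following the text above:
# Parts 3-7 need Part 2, and Part 8 needs all of Parts 3-7.
PREREQS = {
    0: set(),
    1: {0},
    2: {1},
    3: {2},
    4: {2},
    5: {2},
    6: {2},
    7: {2},
    8: {3, 4, 5, 6, 7},
}


def reading_order(prereqs, soft_edges=()):
    """Return one valid order; soft_edges are (later, earlier) recommendations."""
    graph = {part: set(deps) for part, deps in prereqs.items()}
    for later, earlier in soft_edges:
        graph[later].add(earlier)
    return list(TopologicalSorter(graph).static_order())


# Treat "Part 7 is much easier after Part 3" as an extra ordering constraint.
order = reading_order(PREREQS, soft_edges=[(7, 3)])
print(order)
```

Any order the sorter returns starts with Parts 0, 1, 2 and ends with Part 8. The middle parts can appear in several valid arrangements, which is what "relatively independent" means in practice.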

For Different Audiences

If you're a software engineer moving into biotech or genomics, read Parts 0–3 first. That covers the vocabulary you'll encounter most, including expression and regulation. Then read Part 6 for variation and Part 8 for the analysis methods.

If you're a data scientist working with omics data, start with Parts 1–2 for the biological context, then go straight to Part 8. Come back to Parts 3–6 as specific topics come up in your work.

If you're an ML engineer working on protein structure, drug discovery, or genomics models, Parts 2 and 3 are essential. Understanding what proteins are at the molecular level — not just as sequences or 3D structures — will make your feature engineering much more principled. Part 8 is directly useful for evaluation methodology.

If you're a researcher from biology who wants to understand the computational side, you can skim Parts 0–3 quickly (you already know this material) and focus on the In Practice chapters, which explain the tools in biological terms.

What You'll Be Able to Do

After Part 1: You can read papers that describe cell-level experiments and understand what's being measured and why.

After Part 2: You can understand what bioinformatics tools are actually computing — what it means to align sequences, call variants, or quantify expression.

After Part 3: You can read about regulatory mechanisms and networks without losing the thread. You understand why the same gene can behave differently in different cell types.

After Part 8: You can design and critique biological data analyses. You know what the statistical assumptions are, why they matter in biology specifically, and what "good enough to publish" looks like.

The In Practice Chapters

Each In Practice chapter ends a part with working code. These chapters are self-contained — you can run the code without having done the theoretical chapters, and the theoretical chapters don't require you to have run the code. But the two together are more than the sum of their parts.

The map is not the territory. You'll encounter concepts in your work that this curriculum doesn't cover in depth. That's expected. The goal is to give you the conceptual foundation from which you can navigate the territory yourself.

Start reading. The gap closes faster than you think.