Identical twins share the same DNA sequence. Yet they can develop different diseases, respond differently to drugs, and show measurable physiological differences as they age. They are more concordant for disease than fraternal twins, but well short of 100% concordant, which means something beyond DNA sequence helps determine phenotype.
That something is epigenetics: chemical modifications to DNA and to the proteins packaged around it that change gene expression without altering the underlying sequence. Epigenetic marks are heritable through cell division, reversible in response to the environment, and increasingly recognized as key drivers of development, aging, and disease.
The Chromatin Layer
To understand epigenetics, you first need to understand chromatin: the complex of DNA, histone proteins, and associated molecules that makes up chromosomes.
As described in the chapter, DNA is wrapped around histones: small, positively charged proteins that compact the negatively charged DNA. The basic unit is the nucleosome: 147 bp of DNA wrapped ~1.65 times around a histone octamer containing two copies each of the four core histones (H2A, H2B, H3, H4).
The critical insight: whether DNA is accessible to transcription factors and RNA polymerase depends on how tightly it is packaged.
Two states of chromatin:
- Euchromatin: loosely packed, accessible, actively transcribed regions. Appears lighter in microscopy.
- Heterochromatin: densely packed, inaccessible, transcriptionally silent regions. Appears darker. Includes constitutive heterochromatin (e.g., centromeres and telomeres, silenced in all cell types) and facultative heterochromatin (genes silenced in a given cell type but active in others).
Think of chromatin state as access control. The same file (the gene) exists in every cell, but in some cells it's chmod 000 (heterochromatin: no access) and in others chmod 644 (euchromatin: readable). The code hasn't changed. The permissions have.
Unlike typical filesystem permissions, chromatin states change dynamically in response to signals (environmental cues can "unlock" or "lock" genes), and, importantly, daughter cells inherit these states through cell division.
DNA Methylation: The Primary Epigenetic Mark
DNA methylation is the addition of a methyl group (–CH₃) to carbon 5 of the cytosine ring, forming 5-methylcytosine (5mC). In mammals it occurs almost exclusively at CpG dinucleotides (a cytosine followed by a guanine).
Distribution
The mammalian genome is globally methylated: about 70–80% of CpGs are methylated in typical somatic cells. Methylation is especially important at repetitive elements (transposons, satellite DNA), where it keeps potentially harmful mobile elements and other repeats silent.
CpG islands (regions with high CpG density, typically 200–3,000 bp long) are located at ~60% of gene promoters and are generally unmethylated when the gene is active. When a promoter CpG island becomes methylated, the associated gene is silenced.
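The CpG-island definition can be made concrete with the classic Gardiner-Garden and Frommer style thresholds: a window of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6. The sketch below is a simplified window scan using those assumed thresholds. It is not the exact procedure behind any particular annotation track, and the function names are illustrative.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_frac = (c + g) / n
    # Expected CpGs if C and G were placed independently: C * G / length
    expected = c * g / n
    obs_exp = cpg / expected if expected else 0.0
    return gc_frac, obs_exp


def find_cpg_islands(seq, window=200, min_gc=0.5, min_oe=0.6):
    """Slide a fixed window one base at a time and merge qualifying windows.

    Returns a list of 0-based, half-open (start, end) intervals.
    """
    islands = []
    for start in range(len(seq) - window + 1):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc > min_gc and oe > min_oe:
            end = start + window
            if islands and start <= islands[-1][1]:
                # Overlaps the previous island: extend it
                islands[-1] = (islands[-1][0], end)
            else:
                islands.append((start, end))
    return islands


seq = "AT" * 150 + "CG" * 150 + "AT" * 150  # CpG-rich block at positions 300-600
print(find_cpg_islands(seq))  # [(201, 699)]
```

Because whole 200-bp windows are merged, the reported boundaries can extend past the CpG-rich core by up to roughly half a window. Published tools refine the edges and add further filters.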
Mechanism of Silencing
Methylated CpGs are recognized by methyl-CpG-binding domain proteins (MBDs, including MeCP2), which recruit histone deacetylases (HDACs) and other repressive machinery. This compacts the chromatin and blocks transcription factor (TF) access.
Methylated cytosine can also directly impair TF binding: some TFs bind only when the CpGs in their recognition sites are unmethylated.
Writers, Readers, and Erasers
Every epigenetic mark has proteins that write, read, and erase it:
- Writers (DNMTs): DNMT1 maintains methylation during replication (copies the methylation pattern to the newly synthesized strand); DNMT3A/3B establish new methylation patterns (de novo methylation)
- Readers: MBD proteins (e.g., MeCP2), Kaiso
- Erasers: TET enzymes oxidize 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC) and further intermediates (5fC, 5caC), which are removed by base excision repair, resulting in demethylation
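The split between maintenance and de novo writers is easiest to see in a toy model of replication. In the sketch below, each CpG is a single flag for symmetric methylation. Replication leaves methylated sites hemimethylated, and a DNMT1-like step restores them with an assumed per-site fidelity. The function names and the fidelity parameter are hypothetical, and the model ignores everything except copying.

```python
import random


def divide(pattern, fidelity, rng):
    """One replication cycle followed by DNMT1-style maintenance.

    pattern: list of bools, one per CpG; True = methylated on both strands.
    After replication each methylated CpG is hemimethylated (old strand only);
    the new strand is restored with probability `fidelity` (assumed parameter).
    A missed site is treated as lost in the single lineage we follow.
    Unmethylated sites stay unmethylated: nothing here acts de novo.
    """
    return [m and rng.random() < fidelity for m in pattern]


def de_novo(pattern, targets):
    """DNMT3A/3B-style de novo methylation at (hypothetical) target indices."""
    return [m or i in targets for i, m in enumerate(pattern)]


def simulate(pattern, divisions, fidelity, seed=0):
    """Follow one cell lineage through repeated divisions."""
    rng = random.Random(seed)
    for _ in range(divisions):
        pattern = divide(pattern, fidelity, rng)
    return pattern


start = [True, False, True, True, False]
print(simulate(start, divisions=50, fidelity=1.0))  # pattern is preserved
print(de_novo([False] * 5, targets={1, 3}))         # [False, True, False, True, False]
```

With a fidelity below 1.0, methylated sites in this toy model erode over many divisions unless a de novo step re-establishes them.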
Cancer and DNA Methylation
Cancer shows dramatic methylation changes:
- Global hypomethylation: repetitive elements become unmethylated, potentially reactivating transposons and destabilizing the genome
- Promoter hypermethylation: tumor suppressor genes are silenced by CpG island methylation. This inactivates a gene as effectively as a mutation and can serve as one of the two hits in the "two-hit" model of tumor suppressor loss.
Epigenetic clocks (like Horvath's clock, built from 353 CpGs) use methylation levels at specific CpGs to estimate age with high accuracy. The gap between this epigenetic age and chronological age serves as a measure of biological aging, and accelerated epigenetic aging predicts disease risk and mortality.
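Most epigenetic clocks are penalized linear regressions on methylation beta values. Horvath's clock also fits a transformed age (logarithmic up to an adult age of 20, linear after) and inverts that transform to report years. The sketch below follows that structure under stated assumptions: the CpG IDs, intercept, and weights are made-up placeholders, not Horvath's published coefficients.

```python
import math

ADULT_AGE = 20  # the transform is logarithmic up to this age, linear after


def transform_age(age):
    """Horvath-style age transform F(age)."""
    if age <= ADULT_AGE:
        return math.log(age + 1) - math.log(ADULT_AGE + 1)
    return (age - ADULT_AGE) / (ADULT_AGE + 1)


def inverse_transform_age(x):
    """Map the model's linear predictor back to years (inverse of F)."""
    if x <= 0:
        return (ADULT_AGE + 1) * math.exp(x) - 1
    return x * (ADULT_AGE + 1) + ADULT_AGE


def predict_age(betas, intercept, coefs):
    """Weighted sum of CpG beta values, then inverse-transformed to years.

    CpGs absent from `betas` are skipped for simplicity; real pipelines
    impute missing probes instead.
    """
    linear = intercept + sum(w * betas[cpg] for cpg, w in coefs.items() if cpg in betas)
    return inverse_transform_age(linear)


# Placeholder model: made-up CpG IDs and weights, NOT Horvath's coefficients.
COEFS = {"cpg_a": 1.2, "cpg_b": -0.8}
INTERCEPT = 0.1

predicted = predict_age({"cpg_a": 0.5, "cpg_b": 0.5}, INTERCEPT, COEFS)  # ≈ 26.3
acceleration = predicted - 24  # epigenetic age minus chronological age
```

A positive `acceleration` means the sample looks epigenetically older than its chronological age. This difference is the quantity that is associated with disease risk and mortality.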
The standard method for measuring methylation genome-wide is bisulfite sequencing. Treating DNA with sodium bisulfite converts unmethylated cytosines to uracil (which reads as thymine after PCR), while methylated cytosines are protected. After sequencing and alignment, a C in a read at a reference CpG indicates a methylated site; a T indicates an unmethylated site.
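The read-out logic fits in a few lines. First, simulate conversion of a reference fragment given a set of methylated positions. Then call per-CpG methylation from aligned reads as C / (C + T) at each reference CpG cytosine. This toy handles the top strand only and assumes gap-free reads whose start positions are already known. Real aligners such as Bismark also handle both strands, mismatches, and base quality.

```python
def bisulfite_convert(seq, methylated):
    """Simulate bisulfite + PCR on the top strand.

    Unmethylated C reads as T; a C at a position in `methylated` (0-based)
    is protected and still reads as C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def cpg_positions(ref):
    """0-based position of the C in every CpG of the reference."""
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def call_methylation(ref, reads):
    """Per-CpG methylation from gap-free reads aligned to `ref`.

    reads: list of (start, sequence) pairs on the top strand.
    Returns {cpg_position: (c_count, c_plus_t_count, level)}; level = C / (C + T).
    """
    calls = {}
    for pos in cpg_positions(ref):
        c = t = 0
        for start, read in reads:
            offset = pos - start
            if 0 <= offset < len(read):
                if read[offset] == "C":
                    c += 1
                elif read[offset] == "T":
                    t += 1
        calls[pos] = (c, c + t, c / (c + t) if c + t else None)
    return calls


ref = "ACGTTCGACG"                      # CpGs start at positions 1, 5, 8
read1 = bisulfite_convert(ref, {1, 8})  # "ACGTTTGACG"
read2 = bisulfite_convert(ref, {1})     # "ACGTTTGATG"
print(call_methylation(ref, [(0, read1), (0, read2)]))
# {1: (2, 2, 1.0), 5: (0, 2, 0.0), 8: (1, 2, 0.5)}
```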
Whole-genome bisulfite sequencing (WGBS) provides single-CpG resolution across the genome. RRBS (reduced representation bisulfite sequencing) is cheaper but covers mainly CpG-dense regions. Methylation arrays (Illumina 450k and EPIC, measuring ~450,000 and ~850,000 CpG sites respectively) are standard for clinical and large-scale studies.
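Both arrays and sequencing usually summarize each CpG as a beta value, the fraction methylated. For arrays, Illumina's convention adds an offset of 100 to the denominator of the intensity ratio so that low-signal probes do not produce extreme values. Many statistical analyses switch to the logit-scale M-value, whose variance behaves better than beta's near 0 and 1. In this minimal sketch, the clamp inside `m_value` is an assumed choice to keep results finite; packages differ in how they handle fully methylated or unmethylated sites.

```python
import math


def beta_value(meth, unmeth, offset=100.0):
    """Illumina-style beta value from methylated/unmethylated probe intensities.

    Negative intensities are floored at 0; the offset damps low-signal probes.
    """
    m, u = max(meth, 0.0), max(unmeth, 0.0)
    return m / (m + u + offset)


def m_value(beta, eps=1e-6):
    """M-value = log2(beta / (1 - beta)), with beta clamped to [eps, 1 - eps]."""
    b = min(max(beta, eps), 1 - eps)
    return math.log2(b / (1 - b))


# For bisulfite sequencing, beta is simply C / (C + T) at a CpG.
print(beta_value(900, 0))  # 0.9
print(m_value(0.5))        # 0.0
```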
Histone Modifications: The Histone Code
Histone proteins have "tails": unstructured N-terminal extensions that protrude from the nucleosome core. These tails are extensively modified by enzymes that add or remove chemical groups:
| Modification | Mark | Effect | Typical location |
|---|---|---|---|
| Acetylation | H3K27ac, H3K9ac | Active; neutralizes positive charge, loosens chromatin | Active enhancers, active promoters |
| Methylation (me1, me2, me3) | H3K4me3 | Active | Active promoters, around the TSS |
| Methylation | H3K4me1 | Enhancer mark (active or poised) | Enhancers |
| Methylation | H3K27me3 | Repressive | Polycomb-silenced regions |
| Methylation | H3K9me3 | Repressive | Constitutive heterochromatin |
| Methylation | H3K36me3 | Active transcription elongation | Gene bodies |
| Ubiquitination | H2AK119ub1 | Repressive (Polycomb) | Polycomb-silenced genes |
| Phosphorylation | H3S10ph | Active; chromosome condensation | Mitotic chromosomes, active genes |
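This combinatorial code, called the histone code hypothesis, proposes that multiple marks together specify chromatin state more precisely than any single mark alone. For example, H3K4me1 together with H3K27ac marks an active enhancer, whereas H3K4me1 alone marks a poised or primed enhancer.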
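A rule-based classifier over presence/absence calls for a handful of marks gives a feel for this combinatorial logic. It includes the "bivalent" H3K4me3 + H3K27me3 combination described in embryonic stem cells. Tools such as ChromHMM instead learn states from data with a hidden Markov model. The hand-written rules and state names below are a simplified, hypothetical stand-in, not a trained model.

```python
def chromatin_state(marks):
    """Coarse chromatin state from a collection of histone-mark names.

    Rules are checked in order and the first match wins.
    """
    m = set(marks)
    if {"H3K4me3", "H3K27me3"} <= m:
        return "bivalent promoter"
    if "H3K4me3" in m:
        return "active promoter" if "H3K27ac" in m else "weak promoter"
    if "H3K4me1" in m:
        return "active enhancer" if "H3K27ac" in m else "poised/primed enhancer"
    if "H3K27ac" in m:
        return "active regulatory element"
    if "H3K36me3" in m:
        return "transcribed gene body"
    if "H3K27me3" in m:
        return "Polycomb-repressed"
    if "H3K9me3" in m:
        return "constitutive heterochromatin"
    return "unmarked / quiescent"


print(chromatin_state(["H3K4me1", "H3K27ac"]))  # active enhancer
print(chromatin_state(["H3K4me1"]))             # poised/primed enhancer
```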
Writers, Readers, and Erasers (Histones)
- Histone acetyltransferases (HATs): write acetyl marks. CBP/p300 write H3K27ac at active enhancers and promoters.
- Histone deacetylases (HDACs): erase acetyl marks. HDAC inhibitors (vorinostat, romidepsin) are approved cancer drugs.
- Histone methyltransferases (HMTs): write methyl marks. EZH2 writes H3K27me3 (repressive); DOT1L writes H3K79me (active).
- Histone demethylases (KDMs): erase methyl marks. LSD1/KDM1A demethylates H3K4me1/2 and H3K9me1/2.
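- Bromodomains: protein domains that "read" acetyl marks. BET proteins (BRD2, BRD3, BRD4) bind acetylated histones at promoters and enhancers; BET inhibitors (the research compound JQ1 and clinical candidates such as I-BET762) have been evaluated in clinical trials for cancer.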
Chromatin Accessibility: The Open/Closed Switch
Measuring which regions of the genome are accessible (i.e., not occluded by nucleosomes) is commonly done with ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing). A hyperactive Tn5 transposase preferentially inserts sequencing adapters into open chromatin; sequencing the tagged fragments reveals which regions are accessible.
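Downstream ATAC-seq analysis usually works with Tn5 insertion sites rather than raw read coverage. Tn5 makes a staggered cut that duplicates 9 bp of target sequence. A widely used convention therefore shifts read 5' ends by +4 bp on the plus strand and -5 bp on the minus strand to center them on the insertion. The sketch below applies that shift to hypothetical 0-based, half-open alignments and counts insertions per bin. Exact off-by-one handling differs between tools.

```python
from collections import Counter


def tn5_insertion(start, end, strand):
    """Tn5-centered insertion coordinate for one aligned read.

    start/end are 0-based, half-open alignment coordinates. A plus-strand
    read's 5' end (`start`) is shifted +4 bp; a minus-strand read's 5' end
    (`end - 1`) is shifted -5 bp.
    """
    if strand == "+":
        return start + 4
    if strand == "-":
        return end - 1 - 5
    raise ValueError(f"unknown strand: {strand!r}")


def insertion_counts(reads, bin_size=100):
    """Count Tn5 insertions per (chrom, bin) from (chrom, start, end, strand) reads."""
    counts = Counter()
    for chrom, start, end, strand in reads:
        counts[(chrom, tn5_insertion(start, end, strand) // bin_size)] += 1
    return counts


reads = [("chr1", 100, 150, "+"), ("chr1", 100, 150, "-"), ("chr1", 250, 300, "+")]
print(insertion_counts(reads))
```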
ATAC-seq identifies:
- Active promoters
- Active enhancers
- Transcription factor binding sites (TF footprinting)
- Cell-type-specific regulatory elements
Combined with RNA-seq data, ATAC-seq helps identify which regulatory elements drive gene expression changes in a given condition.
Polycomb and Trithorax: The Two-State System
Two groups of protein complexes maintain gene silencing and activation states through development:
Polycomb Repressive Complexes (PRC1, PRC2) deposit repressive marks: PRC2 writes H3K27me3 and PRC1 writes H2AK119ub1. These marks keep genes silenced through cell division, which is essential for stable cell identity. PRC2's catalytic subunit EZH2 is frequently mutated or overexpressed in cancer.
Trithorax/COMPASS complexes maintain active states through H3K4 methylation and H3K36 methylation. They antagonize Polycomb and keep developmental genes (such as HOX genes) active.
The interplay between Polycomb and Trithorax creates a bistable system: genes are either "on" (Trithorax-dominated) or "off" (Polycomb-dominated), with sharp transitions between the two. This bistability contributes to the robustness of cell identity: once a cell type is established, it maintains its gene expression program stably through many rounds of cell division.
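A generic mutual-repression ("toggle switch") model illustrates this kind of bistability. In it, an active-state activity A and a repressive-state activity P each inhibit the other's production. This is a textbook dynamical-systems abstraction with arbitrary assumed parameters (alpha = 4, Hill coefficient 2), not a published model of Polycomb or Trithorax. It shows only that mutual antagonism yields two stable states, and that the starting condition decides which one the system reaches.

```python
def simulate_toggle(a0, p0, alpha=4.0, n=2, dt=0.01, steps=5000):
    """Euler integration of a symmetric mutual-repression switch.

    dA/dt = alpha / (1 + P**n) - A   (A: active, Trithorax-like activity)
    dP/dt = alpha / (1 + A**n) - P   (P: repressive, Polycomb-like activity)
    Each side represses the other's production; both decay at unit rate.
    """
    a, p = a0, p0
    for _ in range(steps):
        da = alpha / (1 + p ** n) - a
        dp = alpha / (1 + a ** n) - p
        a, p = a + dt * da, p + dt * dp
    return a, p


def gene_state(a, p):
    return "on (Trithorax-dominated)" if a > p else "off (Polycomb-dominated)"


# Same parameters, slightly different starting points, opposite stable outcomes
print(gene_state(*simulate_toggle(1.5, 1.2)))  # on (Trithorax-dominated)
print(gene_state(*simulate_toggle(1.2, 1.5)))  # off (Polycomb-dominated)
```

With these parameters the two stable states sit at A = 2 + √3, P = 2 - √3 and the mirror image. A small initial bias is amplified rather than averaged away.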
Imprinting: Parent-of-Origin Epigenetics
Some genes are expressed exclusively from either the maternal or the paternal allele, as determined by epigenetic marks established in the germline. This is genomic imprinting; about 100 human genes are imprinted.
Imprinted genes are regulated by imprinting control regions (ICRs): differentially methylated CpG regions that carry methylation on only one parental allele. This allele-specific methylation is established during gametogenesis and maintained throughout development.
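In sequencing data, an ICR appears as allele-specific methylation. Reads carrying one allele of a nearby heterozygous SNP are methylated while reads carrying the other are not, so pooled methylation sits near 50%. The sketch below assumes that reads have already been assigned to a parental allele; the input format is hypothetical, and in practice the assignment comes from SNP phasing. It flags regions where the two alleles differ strongly, using an arbitrary illustrative threshold.

```python
def allele_methylation(reads):
    """Methylation level per parental allele.

    reads: iterable of (allele, methylated_cpgs, total_cpgs) tuples, with
    allele labels assumed to come from an upstream phasing step.
    """
    meth, total = {}, {}
    for allele, m, t in reads:
        meth[allele] = meth.get(allele, 0) + m
        total[allele] = total.get(allele, 0) + t
    return {a: meth[a] / total[a] for a in total if total[a]}


def looks_imprinted(levels, min_diff=0.6):
    """True if one allele is mostly methylated and the other mostly not."""
    if len(levels) != 2:
        return False
    low, high = sorted(levels.values())
    return high - low >= min_diff


icr_reads = [("maternal", 9, 10), ("maternal", 10, 10),
             ("paternal", 0, 10), ("paternal", 1, 10)]
levels = allele_methylation(icr_reads)  # maternal 0.95, paternal 0.05
print(looks_imprinted(levels))          # True, although pooled methylation is 50%
```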
Disorders of imprinting are medically important:
- Prader-Willi syndrome: loss of paternal 15q11-q13 (or maternal uniparental disomy of chr15)
- Angelman syndrome: loss of maternal 15q11-q13 (or paternal uniparental disomy)
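The same chromosomal deletion causes different syndromes depending on its parent of origin, because genes in the region are imprinted in opposite directions: some (such as SNRPN) are expressed only from the paternal allele, while UBE3A is expressed only from the maternal allele in neurons.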
Epigenetics refers to heritable changes in gene expression that do not involve changes to the DNA sequence. DNA methylation silences genes; histone modifications compact or open chromatin. These marks can persist through cell division and, in some organisms, across generations.
Epigenetic marks are runtime configuration that persists across process forks. DNA methylation is a comment-out that survives replication: the code stays in the genome, but the flag says 'do not execute.' Histone modifications are filesystem permissions on the genome: acetylation = read/execute, repressive methylation (H3K9me3, H3K27me3) = no access. Same source code, different runtime behavior, heritable across cell divisions.
Epigenetics in Bioinformatics Practice
Common epigenomics data types and tools:
| Data type | What it measures | Analysis tools |
|---|---|---|
| WGBS / RRBS | DNA methylation at CpGs | Bismark, BSMAP, methylKit |
| ChIP-seq | Histone marks, TF binding | MACS2, deepTools, HOMER |
| ATAC-seq | Chromatin accessibility | MACS2, HINT-ATAC, chromVAR |
| Hi-C / 4C | 3D genome organization | HiC-Pro, cooltools, Juicer |
| CUT&RUN | Histone marks, TF binding (low cell input) | Similar to ChIP-seq |
A typical epigenomics analysis project involves aligning reads to the reference genome, calling peaks (enriched regions), annotating peaks relative to genes (e.g., distance to the nearest TSS), and integrating with expression data to understand regulatory relationships.
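The standard library is enough to sketch two of these steps. The first function is a deliberately simplified Poisson enrichment test on binned read counts against a single genome-wide background rate. MACS2 builds on a similar Poisson idea but uses local background estimates and fragment-size modeling, so treat this only as an illustration. The second function annotates each peak with its nearest TSS by binary search. All inputs (bin counts, TSS positions, gene names) are hypothetical.

```python
import bisect
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), via 1 - CDF.

    Adequate for thresholding; loses precision for extremely small p-values.
    """
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def call_peaks(bin_counts, bin_size, pvalue=1e-5):
    """Bins whose read count is Poisson-enriched over the genome-wide mean count."""
    lam = sum(bin_counts) / len(bin_counts)
    return [(i * bin_size, (i + 1) * bin_size)
            for i, count in enumerate(bin_counts)
            if poisson_sf(count, lam) < pvalue]


def nearest_tss(peaks, tss):
    """Annotate each peak with the closest TSS.

    tss: list of (position, gene_name) sorted by position. Returns
    (peak, gene, peak_midpoint - tss_position); strand is ignored here.
    """
    positions = [pos for pos, _ in tss]
    annotated = []
    for start, end in peaks:
        mid = (start + end) // 2
        i = bisect.bisect_left(positions, mid)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(tss)]
        best = min(candidates, key=lambda j: abs(positions[j] - mid))
        annotated.append(((start, end), tss[best][1], mid - positions[best]))
    return annotated


counts = [2] * 20 + [40] + [2] * 20       # one enriched 100-bp bin
peaks = call_peaks(counts, bin_size=100)  # [(2000, 2100)]
genes = [(500, "GENE_A"), (2300, "GENE_B"), (9000, "GENE_C")]  # hypothetical TSSs
print(nearest_tss(peaks, genes))          # [((2000, 2100), 'GENE_B', -250)]
```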
The key challenge is integration: a single cell type might have ATAC-seq, ChIP-seq (multiple marks), WGBS, and RNA-seq data. Making sense of all four simultaneously (the multi-omics integration problem) is one of the central challenges of current bioinformatics.
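A small first step toward integration is to check whether two layers agree at the same genes. One test is whether promoter methylation is anti-correlated with expression, as the CpG-island silencing mechanism predicts. The sketch computes a Spearman rank correlation with the standard library on a hypothetical per-gene table. The values are toy numbers, not measurements, and real integration methods go well beyond pairwise correlation.

```python
def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy per-gene table: (promoter methylation beta, expression in TPM). Not real data.
toy = {
    "G1": (0.05, 120.0),
    "G2": (0.10, 80.0),
    "G3": (0.50, 20.0),
    "G4": (0.85, 2.0),
    "G5": (0.90, 1.0),
}
methylation = [m for m, _ in toy.values()]
expression = [e for _, e in toy.values()]
print(spearman(methylation, expression))  # -1.0: perfectly anti-correlated ranks
```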