Every person carries an estimated 4–5 million variants relative to the human reference genome, including roughly 60 de novo variants (present in neither parent), of which only one or two typically fall in coding sequence. Most are benign. About 50–100 are in genes associated with disease. A handful may be medically actionable. The clinical genetics problem, identifying which variants cause disease in a specific patient, is one of the most consequential applications of bioinformatics.
Modes of Inheritance
The way a disease variant is transmitted from parent to child depends on whether it's dominant or recessive, and whether it's on an autosome, the X chromosome, or mitochondrial DNA.
Autosomal Dominant (AD)
One mutant allele is sufficient to cause disease. Each child of an affected parent has a 50% chance of inheriting the variant.
Mechanism: the mutant protein acquires a new, toxic activity (gain of function), interferes with the product of the normal allele (dominant negative), or leaves too little functional protein when only one copy works (haploinsufficiency).
Examples:
- Huntington's disease (HTT CAG expansion): expanded polyglutamine tract in huntingtin creates a toxic gain-of-function protein
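- BRCA1/2 (hereditary breast/ovarian cancer): tumor suppressor genes that follow the two-hit model; one functional copy is usually sufficient for DNA repair, but somatic loss of the second copy (the "second hit") in a cell leads to cancer. The predisposition is inherited dominantly even though the defect is recessive at the cellular level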
- Marfan syndrome (FBN1): dominant negative; the mutant fibrillin-1 disrupts normal fibrillin assembly
AD conditions often show variable expressivity (affected individuals differ in severity) and incomplete penetrance (not everyone who carries the variant develops the disease).
Autosomal Recessive (AR)
Both alleles must be mutated for disease. Carriers (one mutant allele) are usually healthy. Each child of two carrier parents has a 25% chance of being affected.
Mechanism: requires complete or near-complete loss of protein function. A single functional allele usually provides enough protein for normal function.
Examples:
- Cystic fibrosis (CFTR mutations): carrier frequency ~1/25 in people of Northern European ancestry; ~1/2500 births affected
- Sickle cell disease (HBB E6V; p.Glu7Val in HGVS numbering): HbS homozygotes have profound hemolytic anemia; heterozygotes have sickle cell trait, which is usually asymptomatic and protects against severe malaria
- Phenylketonuria (PKU) (PAH mutations): phenylalanine hydroxylase deficiency; dietary phenylalanine accumulates → neurotoxicity; newborn screening + dietary treatment prevents intellectual disability
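The carrier frequency and birth incidence quoted for cystic fibrosis are linked by simple arithmetic, sketched below. This is a minimal illustration that assumes random mating and full penetrance; the function name `affected_birth_risk` is hypothetical.

```python
from fractions import Fraction

def affected_birth_risk(carrier_freq: Fraction) -> Fraction:
    """Risk that a random couple has an affected child (autosomal recessive).

    Both parents must be carriers (carrier_freq squared, assuming random
    mating), and two carriers have a 1/4 chance of an affected child.
    """
    both_parents_carriers = carrier_freq * carrier_freq
    return both_parents_carriers * Fraction(1, 4)

cf_risk = affected_birth_risk(Fraction(1, 25))
print(cf_risk)  # 1/2500
```

A carrier frequency of 1/25 therefore reproduces the ~1/2500 birth incidence given above.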
X-linked
Genes on the X chromosome. Males (XY) are hemizygous — they have only one X, so a single recessive variant causes disease. Females (XX) can be carriers.
X-linked recessive (XLR): males affected, females usually carriers.
- Duchenne muscular dystrophy (DMD): frame-disrupting dystrophin mutations; males affected; females carriers
- Hemophilia A (F8) and B (F9)
- Red-green color blindness (OPN1LW/OPN1MW)
X-linked dominant (XLD): affects both males and females.
- Rett syndrome (MECP2): more severe in males; often lethal in hemizygous males
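The transmission pattern for an X-linked recessive variant can be worked out by enumerating the four equally likely parental allele combinations. The sketch below is illustrative only: the allele labels ("X" for a normal X, "x" for an X carrying the variant, "Y") and both function names are assumptions made for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def classify_recessive(maternal: str, paternal: str) -> str:
    """Label one child for an X-linked recessive variant."""
    if paternal == "Y":
        # Sons are hemizygous: their single X always comes from the mother.
        return "affected son" if maternal == "x" else "unaffected son"
    n_variant = (maternal, paternal).count("x")
    return ["unaffected daughter", "carrier daughter", "affected daughter"][n_variant]

def offspring_summary(mother: tuple, father: tuple) -> dict:
    """Probability of each outcome, one allele drawn from each parent."""
    counts = Counter(classify_recessive(m, p) for m, p in product(mother, father))
    total = sum(counts.values())
    return {label: Fraction(n, total) for label, n in counts.items()}

# Carrier mother, unaffected father.
carrier_mother = offspring_summary(("X", "x"), ("X", "Y"))
# Unaffected non-carrier mother, affected father.
affected_father = offspring_summary(("X", "X"), ("x", "Y"))
```

With a carrier mother, each outcome (unaffected daughter, carrier daughter, unaffected son, affected son) has probability 1/4. With an affected father, every daughter is a carrier and no son inherits the variant.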
Mitochondrial
Mitochondrial DNA has 37 genes (13 protein-coding, 22 tRNA, 2 rRNA). mtDNA is maternally inherited (all mitochondria in the embryo come from the oocyte). Affected mothers pass mtDNA variants to all children; affected fathers do not.
Mitochondrial diseases affect high-energy tissues: brain, muscle, heart. Examples: MELAS (mitochondrial encephalomyopathy, lactic acidosis, and stroke-like episodes), Leber's hereditary optic neuropathy (LHON).
A complication: heteroplasmy — cells may contain a mixture of normal and mutant mtDNA. Disease severity correlates with the proportion of mutant mtDNA, which can vary between tissues and change over time.
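Heteroplasmy at a given mtDNA position is commonly estimated from sequencing data as the fraction of reads that carry the variant allele. The sketch below assumes that simple estimator; the read counts and tissue names are hypothetical and chosen only to show how the level can differ between tissues in one patient.

```python
def heteroplasmy(alt_reads: int, ref_reads: int) -> float:
    """Fraction of reads carrying the variant allele at one mtDNA position."""
    total = alt_reads + ref_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / total

# Hypothetical (alt, ref) read counts for the same variant in three tissues.
tissue_counts = {"blood": (120, 880), "urine": (450, 550), "muscle": (700, 300)}

levels = {tissue: heteroplasmy(alt, ref) for tissue, (alt, ref) in tissue_counts.items()}
for tissue, level in levels.items():
    print(f"{tissue}: {level:.0%} mutant mtDNA")
```

Because the level varies by tissue, a low value in blood does not rule out a high mutant load in an affected tissue such as muscle.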
De Novo Mutations
Not all genetic diseases are inherited. De novo mutations — new mutations not present in either parent — arise in the germ cells (egg or sperm) or in the early embryo. They're found by trio sequencing: sequencing the patient plus both parents and looking for variants in the patient that aren't in either parent.
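De novo mutation rate: ~1 × 10⁻⁸ per base per generation. Applied to the ~6 × 10⁹ bases of the diploid genome, that gives approximately 60 de novo single nucleotide variants per person. Paternal age is the dominant factor: spermatogonial stem cells keep dividing throughout adult life, so sperm from older fathers carry more replication-derived mutations, whereas oocytes complete their mitotic divisions before birth.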
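The per-person estimate follows directly from the per-base rate. The sketch below reproduces the arithmetic; the haploid genome size (~3.1 × 10⁹ bp) and the coding fraction (1.5%, the midpoint of the 1–2% range) are assumptions chosen for illustration.

```python
MUTATION_RATE = 1e-8        # per base per generation (figure from the text)
HAPLOID_GENOME_BP = 3.1e9   # assumption: approximate haploid genome size
CODING_FRACTION = 0.015     # assumption: midpoint of the 1-2% coding range

# A child inherits two haploid genomes, so both copies can carry new mutations.
expected_genome_wide = MUTATION_RATE * 2 * HAPLOID_GENOME_BP
expected_coding = expected_genome_wide * CODING_FRACTION

print(f"genome-wide de novo SNVs: about {expected_genome_wide:.0f}")
print(f"coding de novo SNVs: about {expected_coding:.1f}")
```

Under these assumptions, the expectation is about 60 de novo variants genome-wide but only about one in coding sequence, which is why an exome trio typically yields very few de novo candidates to review.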
De novo mutations cause:
- ~50% of severe intellectual disability cases
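- A substantial fraction of severe early-onset neurodevelopmental conditions (developmental and epileptic encephalopathies, a subset of autism); they also contribute to schizophrenia risk, though to a smaller degree
- Achondroplasia (FGFR3 G380R accounts for ~98% of cases; roughly 80% of cases arise de novo)
De novo status is strong evidence of pathogenicity, but only in context: because every genome carries dozens of de novo variants, the variant must fall in a gene that fits the patient's phenotype, and parentage should be confirmed before the variant is treated as causative.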
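The core of trio analysis is a genotype comparison: the child carries an alternate allele, and both parents are confidently homozygous reference. The sketch below uses VCF-style genotype strings; the variant identifiers and the record layout are hypothetical, and a real pipeline would also check read depth, allele balance, and parentage.

```python
HOM_REF = {"0/0", "0|0"}

def has_alt(genotype: str) -> bool:
    """True if a VCF-style genotype string contains a non-reference allele."""
    alleles = genotype.replace("|", "/").split("/")
    return any(allele not in {"0", "."} for allele in alleles)

def is_candidate_de_novo(child: str, mother: str, father: str) -> bool:
    """Child has an alt allele and both parents are called homozygous reference.

    A missing parental genotype ("./.") is not in HOM_REF, so the variant is
    not flagged: absence of a call is not evidence of absence.
    """
    return has_alt(child) and mother in HOM_REF and father in HOM_REF

# Hypothetical trio genotype calls.
trio_calls = [
    {"id": "var1", "child": "0/1", "mother": "0/0", "father": "0/0"},
    {"id": "var2", "child": "0/1", "mother": "0/1", "father": "0/0"},  # inherited
    {"id": "var3", "child": "0/1", "mother": "./.", "father": "0/0"},  # mother uncalled
    {"id": "var4", "child": "0/0", "mother": "0/0", "father": "0/0"},  # child is ref
]

candidates = [
    call["id"]
    for call in trio_calls
    if is_candidate_de_novo(call["child"], call["mother"], call["father"])
]
print(candidates)  # ['var1']
```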
Variable Expressivity and Penetrance
Even with a clearly pathogenic variant, not everyone who carries it is equally affected:
Penetrance: the fraction of individuals with a genotype who display the phenotype. BRCA1 pathogenic variants confer an estimated 55–72% risk of breast cancer by age 70–80: high, but not 100% (incomplete penetrance).
Variable expressivity: individuals with the same pathogenic variant can have different phenotypic severity. People with NF1 (neurofibromatosis type 1) pathogenic variants range from café-au-lait spots only to severe neurofibromas and malignant peripheral nerve sheath tumors.
Modifying factors: other genetic variants (modifier genes) and environmental exposures can modulate penetrance and expressivity. This is why GWAS and polygenic risk scores are relevant even for monogenic diseases — genetic background modifies outcomes.
The Clinical Genetics Pipeline
Step 1: Clinical Recognition and Ordering Sequencing
The clinical process begins with a patient (or family) presenting with symptoms suggesting a genetic condition. The clinician decides what test to order:
- Chromosomal microarray: first-tier test for intellectual disability/autism; detects CNVs across the genome
- Single gene sequencing: when the clinical presentation strongly suggests a specific diagnosis (e.g., DMD in a boy with early-onset proximal weakness)
- Gene panel: sequencing of multiple genes associated with the same phenotypic spectrum (e.g., hereditary cancer panel: BRCA1/2, PALB2, CHEK2, ATM, etc.)
- Exome sequencing: sequencing all coding regions (~1–2% of the genome); first-tier for unexplained pediatric disease
- Genome sequencing: the full genome; highest diagnostic yield; increasingly cost-competitive
Step 2: Variant Calling and Annotation
Raw sequencing reads → alignment → variant calling → annotation.
Annotation pipeline:
- ANNOVAR/VEP: predict functional consequence (synonymous, missense, stop-gain, splice site)
- Population frequencies: gnomAD allele frequency (AF). If AF > 1% in gnomAD, unlikely to be a dominant disease variant with high penetrance
- In silico pathogenicity predictors: SIFT, PolyPhen-2, REVEL, AlphaMissense (uses protein structure prediction)
- ClinVar lookup: known classifications from other labs
- Literature search: published case reports, functional studies
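Once variants are annotated, a first-pass filter typically keeps rare variants with a plausibly damaging consequence. The sketch below assumes a simplified record type: the `AnnotatedVariant` fields, the consequence labels, and the example records are hypothetical and do not mirror the actual VEP or ANNOVAR output format. The 1% threshold is the one mentioned above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AnnotatedVariant:
    gene: str
    consequence: str             # simplified label, not a real VEP term
    gnomad_af: Optional[float]   # None means absent from gnomAD
    clinvar: Optional[str] = None

DAMAGING = {"missense", "stop_gain", "splice_site", "frameshift"}
KNOWN_PATHOGENIC = {"Pathogenic", "Likely pathogenic"}

def passes_filter(variant: AnnotatedVariant, max_af: float = 0.01) -> bool:
    """Keep rare variants that are predicted damaging or already known P/LP."""
    is_rare = variant.gnomad_af is None or variant.gnomad_af <= max_af
    is_damaging = variant.consequence in DAMAGING
    is_known = variant.clinvar in KNOWN_PATHOGENIC
    return is_rare and (is_damaging or is_known)

# Hypothetical annotated variants.
variants = [
    AnnotatedVariant("GENE_A", "missense", 0.00002),
    AnnotatedVariant("GENE_B", "missense", 0.08),      # too common
    AnnotatedVariant("GENE_C", "synonymous", None),    # rare but silent
    AnnotatedVariant("GENE_D", "stop_gain", None),     # absent from gnomAD
]

shortlist = [v.gene for v in variants if passes_filter(v)]
print(shortlist)  # ['GENE_A', 'GENE_D']
```

A single fixed threshold is a simplification: the appropriate frequency cutoff depends on inheritance mode, disease prevalence, and penetrance, and is usually looser for recessive conditions.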
Step 3: Variant Interpretation (ACMG/AMP Guidelines)
The ACMG/AMP 2015 guidelines (updated by ClinGen) provide a standardized framework:
| Evidence type | Criteria examples |
|---|---|
| Population data | PM2: absent from controls; BS1: allele frequency above expected |
| Computational | PP3: multiple in silico tools predict damaging; BP4: predict benign |
| Functional | PS3: well-established functional studies show damaging effect; BS3: shows no damaging effect |
| Segregation | PP1: co-segregates with disease in multiple affected family members |
| De novo | PS2: confirmed de novo in affected individual; PM6: assumed de novo |
| Allelic | PM3: detected in trans with a pathogenic variant (for AR) |
| Case data | PS4: prevalence in affected significantly increased vs. controls |
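Each criterion carries a strength (supporting, moderate, strong, or very strong), and the 2015 guidelines combine criteria by rule into one of five classes: pathogenic, likely pathogenic, VUS, likely benign, or benign. ClinGen's point-based adaptation scores supporting, moderate, strong, and very strong evidence as 1, 2, 4, and 8 points (negative for benign evidence) and classifies a variant as pathogenic at ≥10 points, likely pathogenic at 6–9, VUS at 0–5, likely benign at −1 to −6, and benign at ≤−7.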
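Combining evidence can be sketched with the point-based adaptation of the ACMG/AMP framework (Tavtigian et al. 2020), in which supporting, moderate, strong, and very strong evidence score 1, 2, 4, and 8 points, and a variant is pathogenic at ≥10, likely pathogenic at 6–9, VUS at 0–5, likely benign at −1 to −6, and benign at ≤−7. The code below is a simplified illustration, not a clinical tool: it assumes every criterion is applied at its default strength (inferred from the code prefix) and treats any BA criterion as stand-alone benign.

```python
# Default strength implied by each criterion prefix (assumption: no
# expert-panel strength modifications are applied).
POINTS_BY_PREFIX = {"PVS": 8, "PS": 4, "PM": 2, "PP": 1, "BS": -4, "BP": -1}

def criterion_points(code: str) -> int:
    """Map a criterion code such as 'PS2' or 'BP4' to its default points."""
    prefix = code.rstrip("0123456789")
    return POINTS_BY_PREFIX[prefix]

def classify(criteria: list) -> str:
    """Sum the points for the met criteria and map the total to a class."""
    if any(code.startswith("BA") for code in criteria):
        return "Benign"  # stand-alone benign evidence
    total = sum(criterion_points(code) for code in criteria)
    if total >= 10:
        return "Pathogenic"
    if total >= 6:
        return "Likely pathogenic"
    if total >= 0:
        return "VUS"
    if total >= -6:
        return "Likely benign"
    return "Benign"

print(classify(["PS2", "PM2", "PP3"]))  # Likely pathogenic (4 + 2 + 1 = 7)
print(classify(["PVS1", "PM2"]))        # Pathogenic (8 + 2 = 10)
print(classify(["PP3"]))                # VUS (1)
print(classify(["BS1", "BP4"]))         # Likely benign (-4 - 1 = -5)
```

In practice, gene-specific expert panels often change the strength at which a criterion may be applied, so the prefix-to-points mapping here is only a starting point.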
Step 4: Return of Results
Variants are reported in a clinical format:
- Pathogenic/likely pathogenic variants: reported with interpretation and implications
- VUS: reported with explanation that significance is unclear; patient and family may return for re-evaluation as evidence accumulates
- Benign/likely benign: usually not reported
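Secondary findings: ACMG recommends that laboratories performing clinical exome or genome sequencing offer to report pathogenic variants in a curated list of medically actionable genes (81 in version 3.2; the list is updated periodically), regardless of the test indication. The list includes BRCA1/2, Lynch syndrome genes, cardiac channelopathy genes, and familial hypercholesterolemia genes. If an exome ordered for an unrelated reason reveals a BRCA1 pathogenic variant, it is reported unless the patient has opted out of secondary findings.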
Key Databases for Clinical Variant Interpretation
| Database | Content | URL |
|---|---|---|
| ClinVar | Variant-disease interpretations from labs | ncbi.nlm.nih.gov/clinvar |
| gnomAD | Population frequencies (v4: ~730k exomes, ~76k genomes) | gnomad.broadinstitute.org |
| OMIM | Gene-disease associations + phenotype descriptions | omim.org |
| LOVD | Locus-specific variant databases (per gene) | lovd.nl |
| ClinGen | Gene-disease validity curation; variant curation rules | clinicalgenome.org |
| SpliceAI | Deep learning splice effect prediction | Available as VEP plugin |
| AlphaMissense | Structure-based missense pathogenicity from DeepMind | Available via VEP/ANNOVAR |
Pharmacogenomics: Variants That Affect Drug Response
Not all medically relevant variants cause disease — some determine how a patient metabolizes drugs:
DPYD: variants reduce dihydropyrimidine dehydrogenase activity → severely reduced 5-fluorouracil (chemotherapy) clearance → potentially fatal toxicity. CPIC guidelines give genotype-based 5-FU dosing recommendations, and the European Medicines Agency recommends testing for DPD deficiency before treatment.
CYP2C19: poor metabolizers (loss-of-function variants) fail to convert clopidogrel (anti-platelet) to its active form → reduced antiplatelet effect → higher cardiovascular event risk. Ultra-rapid metabolizers have increased activation.
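TPMT/NUDT15: loss-of-function variants in thiopurine methyltransferase (TPMT) or in the nucleotide diphosphatase NUDT15 impair inactivation of thiopurine metabolites → increased thioguanine nucleotide levels from azathioprine/6-mercaptopurine → bone marrow toxicity. Genotype-guided thiopurine dosing is widely used in pediatric acute lymphoblastic leukemia.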
The CPIC (Clinical Pharmacogenomics Implementation Consortium) publishes evidence-based guidelines for gene-drug pairs — a direct application of clinical variant interpretation to precision pharmacology.
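Gene-drug guidelines generally work in two steps: each star allele is assigned a function, and the pair of alleles (the diplotype) maps to a metabolizer phenotype. The sketch below illustrates this for CYP2C19 with four common alleles. The function assignments and phenotype mapping follow the commonly cited CPIC scheme, but the table is deliberately incomplete and the names `ALLELE_FUNCTION` and `cyp2c19_phenotype` are hypothetical; the current CPIC allele tables should be consulted for real use.

```python
# Assumed function assignments for four common CYP2C19 star alleles.
ALLELE_FUNCTION = {"*1": "normal", "*2": "none", "*3": "none", "*17": "increased"}

# Phenotype for each unordered pair of allele functions (keys sorted alphabetically).
PHENOTYPE = {
    ("increased", "increased"): "ultrarapid metabolizer",
    ("increased", "normal"): "rapid metabolizer",
    ("normal", "normal"): "normal metabolizer",
    ("none", "normal"): "intermediate metabolizer",
    ("increased", "none"): "intermediate metabolizer",
    ("none", "none"): "poor metabolizer",
}

def cyp2c19_phenotype(allele_a: str, allele_b: str) -> str:
    """Translate a diplotype such as ('*1', '*2') into a metabolizer phenotype."""
    functions = sorted([ALLELE_FUNCTION[allele_a], ALLELE_FUNCTION[allele_b]])
    return PHENOTYPE[tuple(functions)]

print(cyp2c19_phenotype("*2", "*2"))   # poor metabolizer
print(cyp2c19_phenotype("*1", "*17"))  # rapid metabolizer
```

For clopidogrel, the poor and intermediate metabolizer phenotypes are the ones associated with reduced formation of the active metabolite.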
The Growing Burden of VUS
As sequencing becomes routine, the volume of variants of uncertain significance (VUS) has grown dramatically. A patient undergoing hereditary cancer panel testing receives an average of 1–2 VUS, in addition to any pathogenic findings.
A VUS creates clinical uncertainty: it is neither actionable nor dismissible. ClinGen's variant curation working groups are systematically re-evaluating variants across genes to reclassify VUS as evidence accumulates. Machine learning methods (including AlphaMissense, trained on evolutionary and structural data) are improving in silico pathogenicity prediction. Computational predictions count only as one line of evidence in the ACMG/AMP framework, so they help reduce the VUS burden mainly in combination with population, functional, and case data.
The long-term trend: more evidence, better algorithms, and larger databases are progressively reclassifying VUS as either pathogenic or benign — making clinical sequencing progressively more interpretable.