Part 6 · 6.4

Genetic Diseases and Pathogenic Variants

How inherited DNA variants cause disease — inheritance patterns, variant databases, and the clinical pipeline from sequencing to diagnosis.

Tags: genetic diseases, clinical genetics, variants, inheritance, diagnosis

Every person carries an estimated 4–5 million variants relative to the human reference genome, including roughly 60 de novo mutations (not present in either parent), of which only one or two typically fall in coding sequence. Most are benign. About 50–100 are in genes associated with disease. A handful may be medically actionable. The clinical genetics problem, identifying which variants cause disease in a specific patient, is one of the most consequential applications of bioinformatics.

Modes of Inheritance

The way a disease variant is transmitted from parent to child depends on whether it's dominant or recessive, and whether it's on an autosome, the X chromosome, or mitochondrial DNA.

Autosomal Dominant (AD)

One mutant allele is sufficient to cause disease. Each child of an affected parent has a 50% chance of inheriting the variant.

Mechanism: the mutant protein interferes with the normal protein (dominant negative), acquires a new toxic activity (gain of function), or a 50% reduction in functional protein is insufficient (haploinsufficiency).

Examples:

  • Huntington's disease (HTT CAG expansion): expanded polyglutamine tract in huntingtin creates a toxic gain-of-function protein
  • BRCA1/2 (hereditary breast/ovarian cancer): the cancer predisposition is inherited dominantly, although one functional copy is usually sufficient for DNA repair; tumors arise when a somatic "second hit" inactivates the remaining copy (two-hit model)
  • Marfan syndrome (FBN1): dominant negative; the mutant fibrillin-1 disrupts normal fibrillin assembly

AD conditions often show variable expressivity (affected individuals differ in severity) and incomplete penetrance (not everyone who carries the variant develops the disease).

Autosomal Recessive (AR)

Both alleles must be mutated for disease. Carriers (one mutant allele) are usually healthy. Two carrier parents have a 25% probability of an affected child.

Mechanism: requires complete or near-complete loss of protein function. A single functional allele usually provides enough protein for normal function.

Examples:

  • Cystic fibrosis (CFTR mutations): ~1/25 Northern European carriers; ~1/2500 births affected
  • Sickle cell disease (HBB E6V): HbS homozygotes have profound hemolytic anemia; heterozygotes have sickle cell trait, which is usually asymptomatic and protects against severe malaria
  • Phenylketonuria (PKU) (PAH mutations): phenylalanine hydroxylase deficiency; dietary phenylalanine accumulates → neurotoxicity; newborn screening + dietary treatment prevents intellectual disability
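
The 50% and 25% figures above, and the cystic fibrosis numbers, follow from simple allele counting. The sketch below enumerates a Punnett square and then derives the birth incidence from the carrier frequency. The function name and the single-letter allele labels are illustrative choices, and the incidence step assumes random mating.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Enumerate a Punnett square for one gene.

    Each parent is a two-letter genotype (e.g. "Aa") and passes on either
    allele with probability 1/2.
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Autosomal recessive: two carriers ("a" = mutant allele).
carrier_cross = offspring_genotypes("Aa", "Aa")
p_affected_child = carrier_cross["aa"]  # 1/4

# Population incidence from carrier frequency, assuming random mating.
carrier_freq = Fraction(1, 25)  # CFTR carrier frequency quoted in the text
incidence = carrier_freq * carrier_freq * p_affected_child  # 1/2500

# Autosomal dominant: affected heterozygote ("D" = mutant) x unaffected parent.
dominant_cross = offspring_genotypes("Dd", "dd")
p_inherits_variant = dominant_cross["Dd"]  # 1/2
```

Using `Fraction` keeps the results exact, so the 1/25 carrier frequency reproduces the ~1/2500 birth incidence directly.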

X-linked

Genes on the X chromosome. Males (XY) are hemizygous — they have only one X, so a single recessive variant causes disease. Females (XX) can be carriers.

X-linked recessive (XLR): males affected, females usually carriers.

  • Duchenne muscular dystrophy (DMD): frame-disrupting dystrophin mutations; males affected; females carriers
  • Hemophilia A (F8) and B (F9)
  • Color blindness (OPN1LW/MW)

X-linked dominant (XLD): affects both males and females.

  • Rett syndrome (MECP2): seen mostly in females; far more severe, and often lethal, in hemizygous males

Mitochondrial

Mitochondrial DNA encodes 37 genes (13 proteins, 22 tRNAs, 2 rRNAs). mtDNA is maternally inherited (all mitochondria in the embryo come from the oocyte). Affected mothers can pass mtDNA variants to all of their children; affected fathers do not.

Mitochondrial diseases affect high-energy tissues: brain, muscle, heart. Examples: MELAS (mitochondrial encephalomyopathy), Leber's hereditary optic neuropathy (LHON).

A complication: heteroplasmy — cells may contain a mixture of normal and mutant mtDNA. Disease severity correlates with the proportion of mutant mtDNA, which can vary between tissues and change over time.
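
Heteroplasmy is usually estimated from sequencing data as the fraction of reads carrying the variant at a given mtDNA position. A minimal sketch follows. The tissue names and read counts are invented for illustration, and real pipelines also have to handle sequencing error and nuclear mtDNA-like sequences, which this ignores.

```python
def heteroplasmy(alt_reads: int, ref_reads: int) -> float:
    """Fraction of reads at one mtDNA position that carry the variant allele."""
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("no coverage at this mtDNA position")
    return alt_reads / depth


# Hypothetical (alt, ref) read counts for the same variant in two tissues.
tissue_counts = {"blood": (120, 880), "muscle": (700, 300)}

levels = {
    tissue: heteroplasmy(alt, ref)
    for tissue, (alt, ref) in tissue_counts.items()
}
# In this made-up example the variant is at 12% in blood but 70% in muscle,
# which is the kind of tissue difference the paragraph above describes.
```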

De Novo Mutations

Not all genetic diseases are inherited. De novo mutations — new mutations not present in either parent — arise in the germ cells (egg or sperm) or in the early embryo. They're found by trio sequencing: sequencing the patient plus both parents and looking for variants in the patient that aren't in either parent.
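
The trio comparison can be sketched as a genotype filter: keep sites where the child carries an alternate allele and both parents are homozygous reference with adequate coverage. The `Call` type, the VCF-style genotype strings, and the depth threshold of 10 are assumptions made for this example, not a standard.

```python
from typing import NamedTuple


class Call(NamedTuple):
    genotype: str  # VCF-style: "0/0", "0/1" or "1/1"
    depth: int     # read depth at the site


def is_candidate_de_novo(child: Call, mother: Call, father: Call,
                         min_depth: int = 10) -> bool:
    """True if the child has an alt allele that neither parent appears to carry."""
    # Low coverage in any family member makes a missed parental allele plausible.
    if min(child.depth, mother.depth, father.depth) < min_depth:
        return False
    child_has_alt = "1" in child.genotype
    parents_hom_ref = mother.genotype == "0/0" and father.genotype == "0/0"
    return child_has_alt and parents_hom_ref


# Hypothetical trio calls keyed by variant: (child, mother, father).
trio_calls = {
    "chr1:1000 A>G": (Call("0/1", 42), Call("0/0", 38), Call("0/0", 40)),
    "chr2:2000 C>T": (Call("0/1", 35), Call("0/1", 30), Call("0/0", 33)),  # inherited
    "chr3:3000 G>A": (Call("0/1", 40), Call("0/0", 4), Call("0/0", 36)),   # low depth
}

candidates = [v for v, calls in trio_calls.items() if is_candidate_de_novo(*calls)]
```

Candidates from a filter like this are normally confirmed by inspecting the reads or by an orthogonal method, since genotyping errors in any of the three samples can mimic a de novo call.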

De novo mutation rate: ~1 × 10⁻⁸ per base per generation → approximately 60 de novo single nucleotide variants per person. Paternal age is the dominant factor: the mutation rate in sperm increases with age (sperm undergo far more cell divisions than eggs).
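
The figure of about 60 comes from applying the per-base rate to both copies of the genome. The worked arithmetic below assumes a haploid genome size of about 3.1 billion bases, a value not stated in the text.

```python
MUTATION_RATE = 1e-8       # per base per generation (from the text)
HAPLOID_GENOME_BP = 3.1e9  # assumption: approximate haploid human genome size

# A child inherits two genome copies, one per parent, each with new mutations.
expected_de_novo_snvs = MUTATION_RATE * 2 * HAPLOID_GENOME_BP  # roughly 62
```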

De novo mutations cause:

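  • Up to roughly half of severe intellectual disability cases (estimates vary by cohort)
  • A substantial fraction of severe early-onset neurodevelopmental conditions (e.g., developmental and epileptic encephalopathies and a subset of autism); they also contribute to schizophrenia risk
  • Achondroplasia (FGFR3 G380R accounts for ~98% of cases; about 80% of cases arise de novo)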

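De novo status is strong evidence for pathogenicity when the variant lies in a gene consistent with the patient's phenotype: a variant that is absent from both unaffected parents and present in a child with a serious, sporadic condition is a likely cause. Because everyone carries dozens of de novo variants, de novo status alone is not sufficient.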

Variable Expressivity and Penetrance

Even with a clearly pathogenic variant, not everyone who carries it is equally affected:

Penetrance: the fraction of individuals with a genotype who display the phenotype. BRCA1 pathogenic variants confer ~70–80% lifetime risk of breast cancer — high but not 100% (incomplete penetrance).

Variable expressivity: individuals with the same pathogenic variant can have different phenotypic severity. NF1 (neurofibromatosis type 1) carriers range from mild café-au-lait spots only to severe neurofibromas and malignant peripheral nerve sheath tumors.

Modifying factors: other genetic variants (modifier genes) and environmental exposures can modulate penetrance and expressivity. This is why GWAS and polygenic risk scores are relevant even for monogenic diseases — genetic background modifies outcomes.

The Clinical Genetics Pipeline

Step 1: Clinical Recognition and Ordering Sequencing

The clinical process begins with a patient (or family) presenting with symptoms suggesting a genetic condition. The clinician decides what test to order:

  • Chromosomal microarray: first-tier test for intellectual disability/autism; detects CNVs across the genome
  • Single gene sequencing: when the clinical presentation strongly suggests a specific diagnosis (e.g., DMD in a boy with early-onset proximal weakness)
  • Gene panel: sequencing of multiple genes associated with the same phenotypic spectrum (e.g., hereditary cancer panel: BRCA1/2, PALB2, CHEK2, ATM, etc.)
  • Exome sequencing: sequencing all coding regions (~1–2% of the genome); first-tier for unexplained pediatric disease
  • Genome sequencing: the full genome; highest diagnostic yield; increasingly cost-competitive

Step 2: Variant Calling and Annotation

Raw sequencing reads → alignment → variant calling → annotation.

Annotation pipeline:

  1. ANNOVAR/VEP: predict functional consequence (synonymous, missense, stop-gain, splice site)
  2. Population frequencies: gnomAD allele frequency (AF). If AF > 1% in gnomAD, unlikely to be a dominant disease variant with high penetrance
  3. In silico pathogenicity predictors: SIFT, PolyPhen-2, REVEL, AlphaMissense (uses protein structure prediction)
  4. ClinVar lookup: known classifications from other labs
  5. Literature search: published case reports, functional studies
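
The first filtering passes of such a pipeline can be sketched as a function over annotated variants. The `AnnotatedVariant` record and its field names are hypothetical rather than the output format of ANNOVAR or VEP. The consequence strings are Sequence Ontology terms of the kind VEP reports, and the 1% cutoff is the frequency threshold mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotatedVariant:
    gene: str
    consequence: str            # Sequence Ontology term, e.g. "missense_variant"
    gnomad_af: Optional[float]  # None means absent from gnomAD
    clinvar: Optional[str]      # ClinVar classification, if any


# Consequences considered potentially protein-altering in this sketch.
PROTEIN_ALTERING = {
    "stop_gained", "frameshift_variant", "missense_variant",
    "splice_donor_variant", "splice_acceptor_variant",
}


def keep_for_review(v: AnnotatedVariant, max_af: float = 0.01) -> bool:
    """Decide whether a variant goes forward to manual interpretation."""
    # Known pathogenic classifications are always reviewed.
    if v.clinvar in {"Pathogenic", "Likely pathogenic"}:
        return True
    # Common variants are unlikely to cause rare, highly penetrant disease.
    if v.gnomad_af is not None and v.gnomad_af > max_af:
        return False
    return v.consequence in PROTEIN_ALTERING


variants = [
    AnnotatedVariant("CFTR", "missense_variant", 0.0001, None),
    AnnotatedVariant("TTN", "missense_variant", 0.12, None),
    AnnotatedVariant("BRCA1", "synonymous_variant", None, None),
    AnnotatedVariant("HBB", "missense_variant", 0.04, "Pathogenic"),
]
shortlist = [v.gene for v in variants if keep_for_review(v)]
```

Checking ClinVar before the frequency filter is a deliberate choice here: some recessive pathogenic alleles are common in particular populations and would otherwise be discarded.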

Step 3: Variant Interpretation (ACMG/AMP Guidelines)

The ACMG/AMP 2015 guidelines (updated by ClinGen) provide a standardized framework:

| Evidence type | Criteria examples |
| --- | --- |
| Population data | PM2: absent from controls; BS1: allele frequency greater than expected for the disorder |
| Computational | PP3: multiple in silico tools predict damaging; BP4: predict benign |
| Functional | PS3: well-established functional studies show damaging effect; BS3: shows no damaging effect |
| Segregation | PP1: co-segregates with disease in multiple affected family members |
| De novo | PS2: confirmed de novo in affected individual; PM6: assumed de novo |
| Allelic | PM3: detected in trans with a pathogenic variant (for AR) |
| Case data | PS4: prevalence in affected significantly increased vs. controls |

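Criteria are combined into a final five-tier classification: pathogenic, likely pathogenic, VUS, likely benign, or benign. The 2015 guidelines use combining rules (for example, one very strong plus one strong criterion gives pathogenic). A later point-based adaptation (Tavtigian et al., 2020) scores supporting, moderate, strong, and very strong evidence as 1, 2, 4, and 8 points, with benign evidence counted as negative: pathogenic is ≥10 points, likely pathogenic 6–9, VUS 0–5, likely benign −1 to −6, and benign ≤ −7.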
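
A minimal sketch of point-based classification, assuming the point values and thresholds of the Tavtigian et al. (2020) adaptation of the ACMG/AMP framework (supporting 1, moderate 2, strong 4, very strong 8; pathogenic at 10 or more points). It assigns each criterion its default strength from the code prefix, whereas real curation often adjusts strengths per criterion, so this is a simplification.

```python
# Default points by criterion prefix; benign evidence is negative.
PREFIX_POINTS = {"PVS": 8, "PS": 4, "PM": 2, "PP": 1, "BS": -4, "BP": -1}


def criterion_points(code: str) -> int:
    """Map a criterion code such as 'PS2' or 'BP4' to its default points."""
    prefix = code.rstrip("0123456789")
    return PREFIX_POINTS[prefix]


def classify(criteria: list[str]) -> str:
    """Classify a variant from the list of criteria it meets."""
    # BA1 (very common allele) is stand-alone benign evidence.
    if "BA1" in criteria:
        return "benign"
    total = sum(criterion_points(c) for c in criteria)
    if total >= 10:
        return "pathogenic"
    if total >= 6:
        return "likely pathogenic"
    if total >= 0:
        return "VUS"
    if total >= -6:
        return "likely benign"
    return "benign"


confirmed_de_novo_lof = classify(["PVS1", "PS2"])     # 12 points
de_novo_missense = classify(["PS2", "PM2", "PP3"])    # 7 points
rare_missense_only = classify(["PM2", "PP3"])         # 3 points
too_common = classify(["BS1", "BP4"])                 # -5 points
```

For these four examples the point totals give pathogenic, likely pathogenic, VUS, and likely benign, which matches what the 2015 combining rules yield for the same criteria.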

Step 4: Return of Results

Variants are reported in a clinical format:

  • Pathogenic/likely pathogenic variants: reported with interpretation and implications
  • VUS: reported with explanation that significance is unclear; patient and family may return for re-evaluation as evidence accumulates
  • Benign/likely benign: usually not reported

Secondary findings: ACMG recommends reporting pathogenic variants in a curated list of medically actionable genes (81 genes in version 3.2 of the list, which is updated periodically) regardless of the test indication, including BRCA1/2, Lynch syndrome genes, cardiac channelopathies, and familial hypercholesterolemia genes. If a patient has exome or genome sequencing for any reason and a BRCA1 pathogenic variant is found, it is reported unless the patient has opted out of secondary findings.
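
The reporting rules above can be sketched as a small decision function. The gene set is a short illustrative subset of the ACMG secondary findings list, and the function name, the `opted_out` flag, and the returned labels are assumptions for this example rather than any laboratory's actual policy.

```python
# Illustrative subset only; the full ACMG list is much longer.
SECONDARY_FINDINGS_GENES = {"BRCA1", "BRCA2", "MLH1", "MSH2", "LDLR", "KCNQ1"}

REPORTABLE = {"pathogenic", "likely pathogenic"}


def report_section(gene: str, classification: str,
                   indication_genes: set[str], opted_out: bool = False) -> str:
    """Decide where, if anywhere, a classified variant appears on the report."""
    in_indication = gene in indication_genes
    if classification in REPORTABLE:
        if in_indication:
            return "primary finding"
        if gene in SECONDARY_FINDINGS_GENES and not opted_out:
            return "secondary finding"
        return "not reported"
    if classification == "VUS" and in_indication:
        return "reported as VUS"
    # Benign and likely benign variants, and VUS outside the indication.
    return "not reported"


# Hypothetical exome ordered for epilepsy; SCN1A is related to the indication.
epilepsy_genes = {"SCN1A", "KCNQ2"}
primary = report_section("SCN1A", "pathogenic", epilepsy_genes)
secondary = report_section("BRCA1", "pathogenic", epilepsy_genes)
declined = report_section("BRCA1", "pathogenic", epilepsy_genes, opted_out=True)
uncertain = report_section("KCNQ2", "VUS", epilepsy_genes)
```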

Key Databases for Clinical Variant Interpretation

| Database | Content | URL / access |
| --- | --- | --- |
| ClinVar | Variant-disease interpretations from labs | ncbi.nlm.nih.gov/clinvar |
| gnomAD | Population frequencies (~730k exomes and ~76k genomes in v4) | gnomad.broadinstitute.org |
| OMIM | Gene-disease associations + phenotype descriptions | omim.org |
| LOVD | Locus-specific variant databases (per gene) | lovd.nl |
| ClinGen | Gene-disease validity curation; variant curation rules | clinicalgenome.org |
| SpliceAI | Deep learning splice effect prediction | Available as VEP plugin |
| AlphaMissense | Structure-based missense pathogenicity from DeepMind | Available via VEP/ANNOVAR |

Pharmacogenomics: Variants That Affect Drug Response

Not all medically relevant variants cause disease — some determine how a patient metabolizes drugs:

DPYD: variants reduce dihydropyrimidine dehydrogenase activity → severely reduced 5-fluorouracil (chemotherapy) clearance → potentially fatal toxicity. CPIC guidelines recommend DPYD genotyping before 5-FU prescription.

CYP2C19: poor metabolizers (loss-of-function variants) fail to convert clopidogrel (anti-platelet) to its active form → reduced antiplatelet effect → higher cardiovascular event risk. Ultra-rapid metabolizers have increased activation.
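
Translating a genotype into a metabolizer phenotype can be sketched as a lookup on allele function. The table below covers only four common CYP2C19 star alleles and follows the general pattern of CPIC phenotype assignment. It is an illustration, not a substitute for the full CPIC allele tables.

```python
# Function of a few common CYP2C19 star alleles (illustrative subset).
ALLELE_FUNCTION = {"*1": "normal", "*2": "none", "*3": "none", "*17": "increased"}


def cyp2c19_phenotype(allele1: str, allele2: str) -> str:
    """Assign a metabolizer phenotype from a two-allele diplotype."""
    functions = [ALLELE_FUNCTION[allele1], ALLELE_FUNCTION[allele2]]
    no_function = functions.count("none")
    increased = functions.count("increased")
    if no_function == 2:
        return "poor metabolizer"
    if no_function == 1:
        # One no-function allele dominates, even when paired with *17.
        return "intermediate metabolizer"
    if increased == 2:
        return "ultrarapid metabolizer"
    if increased == 1:
        return "rapid metabolizer"
    return "normal metabolizer"


phenotypes = {
    diplotype: cyp2c19_phenotype(*diplotype)
    for diplotype in [("*1", "*1"), ("*1", "*2"), ("*2", "*2"),
                      ("*1", "*17"), ("*17", "*17"), ("*2", "*17")]
}
```

An unlisted allele raises `KeyError` rather than being silently treated as normal, which is the safer failure mode when the result may guide prescribing.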

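TPMT/NUDT15: loss-of-function variants in either gene impair inactivation of thiopurine metabolites (TPMT encodes thiopurine methyltransferase; NUDT15 encodes a nucleotide diphosphatase) → increased active thioguanine nucleotide levels from azathioprine/6-mercaptopurine → bone marrow toxicity. Genotyping before thiopurine therapy is standard of care in pediatric leukemia.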

The CPIC (Clinical Pharmacogenomics Implementation Consortium) publishes evidence-based guidelines for gene-drug pairs — a direct application of clinical variant interpretation to precision pharmacology.

The Growing Burden of VUS

As sequencing becomes routine, the volume of variants of uncertain significance (VUS) has grown dramatically. A patient undergoing hereditary cancer panel testing receives an average of 1–2 VUS, in addition to any pathogenic findings.

VUS burden creates clinical uncertainty — neither actionable nor dismissible. ClinGen's variant curation working groups are systematically re-evaluating variants across genes to reclassify VUS as evidence accumulates. Machine learning methods (including AlphaMissense, trained on evolutionary and structural data) are improving in silico pathogenicity prediction, reducing the VUS burden computationally.

The long-term trend: more evidence, better algorithms, and larger databases are progressively reclassifying VUS as either pathogenic or benign — making clinical sequencing progressively more interpretable.