🧬 Biotech Institute
Educational Resources

Genomics & Sequencing

Reading the book of life at scale. From Sanger sequencing to nanopore real-time reads, and the computational tools that turn raw data into biological insight.

History of Sequencing

Sanger Sequencing (1977)

Frederick Sanger developed chain-termination sequencing using dideoxynucleotides (ddNTPs). Each ddNTP (ddATP, ddCTP, ddGTP, ddTTP) lacks the 3'-OH needed for elongation, so incorporating one terminates the growing chain at that base; across many molecules, termination occurs at every position. Gel electrophoresis separates the fragments by size, revealing the sequence. Sanger shared the 1980 Nobel Prize in Chemistry, his second Nobel Prize, for this work. Automated capillary-electrophoresis Sanger sequencing (Applied Biosystems 3730) produces reads of ~800-1000 bp with >99.999% accuracy. It is still the gold standard for confirming short sequences.
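The chain-termination idea can be illustrated with a toy simulation. This is a minimal sketch, not a model of real chemistry: the function names, the termination probability, and the example sequence are all made up for illustration, and the string stands for the newly synthesized strand.

```python
import random

def sanger_fragments(synthesized, n_molecules=2000, p_terminate=0.05, seed=0):
    """Simulate chain-terminated products as a set of (length, terminal_base) pairs."""
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_molecules):
        for length, base in enumerate(synthesized, start=1):
            # Assumed probability that a ddNTP is incorporated instead of a dNTP.
            if rng.random() < p_terminate:
                fragments.add((length, base))
                break
    return fragments

def read_gel(fragments):
    """Order fragments by length, as electrophoresis does, and read the terminal bases."""
    return "".join(base for _, base in sorted(fragments))

strand = "ATGCGTACGTTAGC"  # hypothetical example sequence
fragments = sanger_fragments(strand)
print(read_gel(fragments))
```

With enough molecules, every position ends up represented by at least one terminated fragment, so reading the terminal bases from shortest to longest fragment recovers the sequence.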

The Human Genome Project (1990-2003)

Next-Generation Sequencing Revolution (2005-present)

Massively parallel sequencing technologies dropped the cost of a human genome from $2.7 billion to under $200 (2024). Key platforms: 454 (pyrosequencing, 2005, discontinued), Illumina (sequencing by synthesis, 2006, dominant), Ion Torrent (semiconductor, 2010), PacBio (single-molecule, 2011), Oxford Nanopore (nanopore, 2014).

1977 — Sanger Sequencing
Chain termination. First genomes. ~800 bp reads.
2005 — Next-Gen (Illumina)
Sequencing by synthesis. Short reads, massive throughput.
2014 — Long-Read (Nanopore + PacBio)
Single-molecule. 10-100 kb reads. Resolves repeats.
2022 — T2T Complete Genome
First truly complete human genome. +200 Mbp resolved.

T2T Consortium (2022)

The Telomere-to-Telomere (T2T) Consortium completed the first truly complete human genome sequence, including the centromeres, telomeres, and ribosomal DNA arrays that were missing from the original Human Genome Project assembly. It added ~200 million base pairs (8% of the genome) that were previously unresolved, using a combination of PacBio HiFi and Oxford Nanopore ultra-long reads.

📖 Imagine your DNA is a book with 3 billion letters. Sequencing is reading that whole book! The first time we read a human genome, it took 13 years and cost $2.7 billion. Now we can do it in a day for about $200. Some machines are so small they plug into a laptop via USB!

Illumina Sequencing

Illumina dominates the sequencing market (~80% of all sequencing data worldwide). Their technology, sequencing by synthesis (SBS), generates massive amounts of short, accurate reads at low cost.

How It Works

Fragment DNA
Ligate adapters
Cluster on flow cell
SBS + image
FASTQ output
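The last step produces FASTQ records, which are easy to inspect programmatically. The sketch below assumes the common four-line record layout and Phred+33 quality encoding; the function names and the example record are hypothetical.

```python
import io

def parse_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip("\n")
        # Phred+33: quality score = ASCII code of the character minus 33.
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores

def error_probability(q):
    """Convert a Phred score to the estimated probability that the base call is wrong."""
    return 10 ** (-q / 10)

example = io.StringIO("@read1\nACGT\n+\nII5!\n")  # made-up record
records = list(parse_fastq(example))
for read_id, sequence, scores in records:
    print(read_id, sequence, scores)
```

A Phred score of 20 corresponds to an estimated 1% chance of error for that base, and a score of 40 to 0.01%.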

Platforms

Strengths & Limitations

Oxford Nanopore Technology

Oxford Nanopore Technologies (ONT) sequences DNA or RNA by threading individual molecules through protein nanopores and measuring the resulting electrical current changes. No amplification, no fluorescence: direct, single-molecule, real-time sequencing.

How It Works

Platforms

Ultra-Long Reads

Nanopore's headline capability: reads exceeding 1 million bases (current record: >4 Mb). Ultra-long reads span entire structural variants, centromeric repeats, and segmental duplications that are unresolvable with short reads. Critical for the T2T complete human genome assembly. Library prep using high-molecular-weight DNA extraction (phenol-chloroform, agarose plug) is key to achieving ultra-long reads.

PacBio (Pacific Biosciences)

PacBio uses Single-Molecule Real-Time (SMRT) sequencing. A DNA polymerase, fixed at the bottom of a nanophotonic well (zero-mode waveguide, ZMW), synthesizes a complementary strand while fluorescent nucleotide incorporations are observed in real time.

How It Works

Platforms

Applications

Genome Assembly

Genome assembly reconstructs the complete genomic sequence from short or long sequencing reads. It is a computational puzzle: overlap reads, resolve repeats, and produce contiguous sequences (contigs) that ideally span entire chromosomes.
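The "overlap reads" part of the puzzle can be sketched with a toy greedy assembler. This is an illustration only, assuming error-free reads from one strand and no repeats; production assemblers use far more sophisticated graph methods. The function names and reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)

reads = ["TACCTGA", "ATGGCGT", "GCGTACC"]  # made-up reads from ATGGCGTACCTGA
print(greedy_assemble(reads))
```

The greedy strategy breaks down as soon as the genome contains repeats longer than the reads, which is why repeat resolution is called out above as a core difficulty.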

Approaches

Assembly Algorithms

Assembly Quality Metrics

Read Alignment

Read alignment (mapping) places sequencing reads onto a reference genome, finding the position and orientation where each read best matches. It is the first step in most resequencing analyses (variant calling, gene expression quantification, ChIP-seq peak calling).
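To make "position and orientation" concrete, here is a brute-force sketch that tries every ungapped placement of a read on both strands and keeps the one with the fewest mismatches. Real aligners use indexes and allow gaps; the function names and sequences here are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def align_read(read, reference):
    """Return (position, strand, mismatches) of the best ungapped placement (0-based)."""
    best = None
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        for pos in range(len(reference) - len(query) + 1):
            mismatches = hamming(query, reference[pos:pos + len(query)])
            if best is None or mismatches < best[2]:
                best = (pos, strand, mismatches)
    return best

reference = "ATGGCGTACCTGATTACAGGC"  # made-up reference
print(align_read("GTACCTGA", reference))  # forward-strand read
print(align_read("CCTGTAA", reference))   # read from the reverse strand
```

This scan costs time proportional to reference length times read length per read, which is why practical aligners build an index of the reference first.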

Short-Read Aligners

Long-Read Aligners

Output Format: SAM/BAM/CRAM

Variant Calling

Variant calling identifies positions where a sample's genome differs from the reference. Variants range from single-nucleotide changes (SNVs) to large structural rearrangements (SVs). Accurate variant calling is essential for clinical genomics, population genetics, and understanding disease.
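A toy SNV caller shows the basic idea: pile up the aligned bases at each reference position and report positions where most reads disagree with the reference. This is a sketch under strong assumptions (ungapped forward-strand alignments, no base qualities, fixed thresholds chosen for illustration); real callers use statistical models. All names and data are hypothetical.

```python
from collections import Counter

def call_snvs(reference, alignments, min_depth=3, min_fraction=0.8):
    """Call SNVs from (start, read_sequence) alignments.

    Returns a list of (position, ref_base, alt_base, depth) with 0-based positions
    (VCF files use 1-based positions).
    """
    pileup = [Counter() for _ in reference]
    for start, seq in alignments:
        for offset, base in enumerate(seq):
            pileup[start + offset][base] += 1

    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            calls.append((pos, reference[pos], base, depth))
    return calls

reference = "ATGGCGTACCTGA"  # made-up reference
alignments = [(3, "GCGTGCC"), (4, "CGTGCCT"), (5, "GTGCCTG"), (6, "TGCCTGA")]
print(call_snvs(reference, alignments))
```

In this example all four reads carry a G where the reference has an A at position 7, so a single A>G variant is reported.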

Variant Types

Variant Callers

Variant Annotation

Single-Cell Genomics

Bulk sequencing averages signals across millions of cells, masking cell-to-cell heterogeneity. Single-cell technologies sequence individual cells, revealing cell types, states, developmental trajectories, and rare populations that are invisible in bulk data.
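A small numeric example shows how averaging hides a rare population. The counts below are invented for illustration and do not come from any dataset.

```python
from statistics import mean

# Hypothetical counts of one marker gene in six cells: five cells are silent,
# one rare cell expresses the gene strongly.
counts = {"cell_1": 0, "cell_2": 0, "cell_3": 0, "cell_4": 0, "cell_5": 0, "cell_6": 60}

bulk_signal = mean(list(counts.values()))  # what a bulk experiment would report
expressing = [cell for cell, n in counts.items() if n > 0]  # what single-cell data reveals

print(bulk_signal)
print(expressing)
```

The bulk value suggests every cell expresses the gene at a low level, while the per-cell view shows that a single cell accounts for all of the signal.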

scRNA-seq

Spatial Transcriptomics

Single-Cell Multiomics

Human Cell Atlas

An international effort to create a comprehensive reference atlas of all human cell types. Using single-cell and spatial transcriptomics to catalog every cell type in the human body, their molecular profiles, locations, and states. Led by Aviv Regev and Sarah Teichmann. As of 2025, data from >100 million cells across most human organs have been generated. The atlas will transform our understanding of development, health, and disease.

🔬 Your body has about 37 trillion cells, and they're not all the same: brain cells, blood cells, and muscle cells are all different! Single-cell sequencing lets scientists read the DNA instructions in ONE cell at a time. It's like going from reading one class's average test score to reading every student's individual answer sheet!

Resources

Sequence Read Archive (SRA)

NCBI's repository of raw sequencing data. Billions of reads from all sequencing platforms. Free to deposit and access. The world's largest archive of sequencing data.

NCBI | Free

Galaxy

Web-based platform for computational genomics. No command line needed. Hundreds of tools for alignment, variant calling, RNA-seq, ChIP-seq. Free public servers.

Galaxy Project | Free

Biostars

Q&A forum for bioinformatics. Ask questions about tools, pipelines, file formats. 160K+ users. The Stack Overflow of genomics.

Community | Free

HBC Training (Harvard)

Harvard Chan Bioinformatics Core training materials. RNA-seq, ChIP-seq, scRNA-seq, variant calling tutorials. Well-maintained, beginner-friendly.

Harvard | Free

1000 Genomes Project

Deep whole-genome sequencing of 2,504 individuals from 26 populations. Public resource for population genetics, variant frequency, and human diversity research.

IGSR | Free

gnomAD

Genome Aggregation Database. Allele frequencies from 76K+ genomes and 126K+ exomes. Essential for clinical variant interpretation and population genetics.

Broad Institute | Free