🧬 Biotech Institute
Educational Resources

Bioinformatics Tools

The computational toolkit of modern biology. From sequence searching to phylogenetic trees, and the programming languages and databases that make it all possible.

BLAST

BLAST (Basic Local Alignment Search Tool) is the most widely used bioinformatics program. Published by Altschul et al. in 1990, it searches sequence databases for regions of local similarity between a query sequence and the database sequences. The original BLAST paper has been cited over 100,000 times, placing it among the most cited papers in all of science.

How BLAST Works

BLAST Programs

Query sequence → Seed words (k-mers) → Extend hits → Score + E-value → Alignments
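The seed, extend, and score steps listed above can be illustrated with a toy sketch in plain Python. This is not how the real BLAST program is implemented: it uses exact k-mer seeds, ungapped extension only, and a simple X-drop cutoff. The function names, the scoring values (match = 1, mismatch = -2, x_drop = 3), and the `lam` and `big_k` parameters passed to `e_value` are all assumptions chosen for illustration. The E-value formula itself, E = K * m * n * exp(-lambda * S), is the standard Karlin-Altschul expression.

```python
import math


def find_seeds(query, subject, k=4):
    """Return (query_pos, subject_pos) pairs where a k-mer matches exactly."""
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)
    seeds = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, qi, sj, k=4, match=1, mismatch=-2, x_drop=3):
    """Ungapped extension of one seed in both directions.

    Returns (score, query_start, query_end) with query_end exclusive.
    Extension stops once the running score falls more than x_drop
    below the best score seen in that direction.
    """
    def walk(q, s, step):
        score = best = best_len = length = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            score += match if query[q] == subject[s] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > x_drop:
                break
            q += step
            s += step
        return best, best_len

    right_score, right_len = walk(qi + k, sj + k, +1)
    left_score, left_len = walk(qi - 1, sj - 1, -1)
    total = k * match + right_score + left_score
    return total, qi - left_len, qi + k + right_len


def e_value(score, m, n, lam, big_k):
    """Karlin-Altschul expected number of chance hits scoring >= score.

    m and n are the query and database lengths; lam and big_k depend on
    the scoring system and must be supplied by the caller.
    """
    return big_k * m * n * math.exp(-lam * score)


query = "GATTACA"
subject = "CCGATTACACC"
seeds = find_seeds(query, subject)
hit = extend_seed(query, subject, *seeds[0])
print(seeds)  # [(0, 2), (1, 3), (2, 4), (3, 5)]
print(hit)    # (7, 0, 7)
```

All four seeds in this example extend to the same full-length match, which is why real implementations merge overlapping hits before reporting them. A higher score gives an exponentially smaller E-value, so longer or cleaner matches are less likely to be chance hits.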

Running BLAST

🔍 BLAST is like Google, but for DNA and proteins! You give it a sequence of letters (like ATCGATCG) and it searches through BILLIONS of known sequences to find matches. "Hey, this gene in your mystery bacterium looks 95% similar to a gene in E. coli!" It's the most-used tool in all of biology.

Multiple Sequence Alignment

Multiple sequence alignment (MSA) simultaneously aligns three or more sequences to identify conserved regions, functional motifs, and evolutionary relationships. MSA is the starting point for phylogenetics, structure prediction, and functional annotation.
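To make "conserved regions" concrete, here is a minimal sketch that scores each column of an existing alignment by how many sequences share its most common residue. It does not compute the alignment itself (that is what dedicated MSA tools do). The three short protein sequences, the function name `column_conservation`, and the use of `-` as the gap character are assumptions made up for the example.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Per-column fraction of sequences carrying the most common non-gap residue."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*alignment):
        counts = Counter(residue for residue in column if residue != gap)
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / len(column))
    return scores


# Hypothetical toy alignment: three sequences, already aligned, one gapped column
alignment = [
    "MKT-LLV",
    "MKTALLV",
    "MRT-LIV",
]

scores = column_conservation(alignment)
conserved = [i for i, s in enumerate(scores) if s == 1.0]
print(conserved)  # [0, 2, 4, 6]
```

Columns 0, 2, 4, and 6 are identical in every sequence, so they are the candidates for conserved positions. Real conservation scores usually also weight residue similarity and sequence redundancy, which this sketch ignores.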

Tools

Alignment Visualization

Alignment Quality

Phylogenetics

Phylogenetics reconstructs evolutionary relationships between sequences (or organisms) in the form of a tree. The tree topology shows which sequences share common ancestors, and branch lengths represent evolutionary distance (usually measured in substitutions per site).

Sequences → MSA → Trim → Model selection → Build tree → Phylogeny
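As a small illustration of "substitutions per site", the sketch below computes the observed proportion of differing sites between two aligned sequences (the p-distance) and corrects it with the Jukes-Cantor (JC69) model, d = -3/4 * ln(1 - 4p/3), the simplest nucleotide substitution model. The two sequences and the function names are assumptions for the example. Real tree-building programs estimate distances or likelihoods under the model chosen at the model selection step rather than always using JC69.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites, ignoring columns where either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """JC69 distance in expected substitutions per site for an observed p-distance."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: sequences are saturated under JC69")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned sequences differing at one site out of ten
p = p_distance("ACGTACGTAC", "ACGTACGTAA")
d = jukes_cantor(p)
print(p)            # 0.1
print(round(d, 4))  # 0.1073
```

The corrected distance is slightly larger than the observed one because the model accounts for multiple substitutions at the same site that a simple count cannot see.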

Tree-Building Methods

Substitution Models

Branch Support

Tree Visualization

🌳 Phylogenetics is building a family tree, but for ALL living things! By comparing DNA sequences, scientists can figure out which species are cousins, which share a great-great-grandparent, and when they split apart. Humans and chimps are like siblings (about 99% of their DNA matches). Humans and bananas are like very distant cousins (roughly 60% of human genes have a recognizable counterpart in bananas)!

R / Bioconductor

R is the dominant programming language for statistical bioinformatics. Bioconductor is a curated repository of R packages for biological data analysis, with over 2,200 packages covering genomics, proteomics, flow cytometry, imaging, and more.

Core Bioconductor Packages

R for Bioinformatics

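# Typical DESeq2 workflow
# Assumes `counts` is a gene-by-sample matrix of raw integer counts and
# `metadata` has one row per sample with a `condition` column.
library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = metadata,
                              design = ~ condition)
# Make "control" the reference level so fold changes read as treated vs control
dds$condition <- relevel(dds$condition, ref = "control")
dds <- DESeq(dds)  # size factors, dispersion estimates, Wald tests
res <- results(dds, contrast = c("condition", "treated", "control"))
# Replace res with shrunken log2 fold changes (needs the apeglm package);
# confirm the coefficient name with resultsNames(dds)
res <- lfcShrink(dds, coef = "condition_treated_vs_control", type = "apeglm")
plotMA(res)  # MA-plot
write.csv(as.data.frame(res), "de_results.csv")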

Python / Biopython

Python is the dominant language for bioinformatics pipelines, data wrangling, machine learning, and tool development. Biopython is the core library for biological computation in Python.

Biopython

Key Python Libraries

Python Example

# Parse FASTA, calculate GC content (gc_fraction requires Biopython 1.80 or newer)
from Bio import SeqIO
from Bio.SeqUtils import gc_fraction

for record in SeqIO.parse("genome.fasta", "fasta"):
    gc = gc_fraction(record.seq) * 100
    print(f"{record.id}: {len(record.seq)} bp, GC={gc:.1f}%")

Biological Databases

Sequence Databases

Functional Databases

Structure Databases

Clinical / Disease Databases

Analysis Pipelines

Workflow Managers

Common Pipelines

Resources

Rosalind

Platform for learning bioinformatics through problem solving. 280+ problems from string algorithms to population genetics. Solve in any language. The Project Euler of biology.

Free | Interactive

Bioinformatics.org

Open-access resource hub. Tutorials, tools, wiki, job board. Community-maintained links to bioinformatics software and databases.

Community | Free

Biostars

Q&A forum for bioinformatics questions. Active community of 160K+ users. Practical answers to pipeline, tool, and data format questions.

Community | Free

nf-core

Curated collection of 90+ Nextflow bioinformatics pipelines. RNA-seq, variant calling, metagenomics, single-cell. Production-ready, containerized, well-documented.

Community | Free

Biopython

Python tools for computational biology. Sequence I/O, BLAST, alignment, phylogenetics, PDB parsing, Entrez access. 20+ years of development.

Open-source | Free

Bioconductor

R packages for genomics and bioinformatics. 2,200+ packages. DESeq2, GenomicRanges, Seurat integration. Semi-annual releases with quality checks.

Open-source | Free