ΔΔG Calculator — Shobhit Vats Sharma

TL;DR — Quick Summary

Predicts protein stability changes (ΔΔG) upon amino acid mutation using a custom molecular mechanics force field — not a black-box ML model. Every energy component is physically explainable.
Implements Lennard-Jones van der Waals potentials, Coulomb electrostatics, bonded terms (bonds, angles, torsions), LCPO SASA solvation, and Generalized Born implicit solvent, with force-field parameters taken from AMBER ff14SB.
A 4-step "Correct Workflow" (pre-minimisation → mutation → neighbourhood optimisation → ΔΔG) resolves the crystallographic strain problem that invalidates naive approaches.
Performance evolved from O(N²) naive Python (8 hours) to GPU-accelerated O(N log N) kernels (<20 seconds), a speedup of more than 1,000x on those timings, validated on p53 (1TUP) and Barnase.
Deployed as a FastAPI REST microservice with Kubernetes orchestration and Prometheus/Grafana monitoring — enterprise-grade infrastructure at zero licensing cost.

The Physics Behind the Predictions

Unlike black-box ML models that predict stability scores without explanation, this engine implements Molecular Mechanics (MM) — where protein energy is calculated directly from atomic coordinates using well-established physical potentials. The mutation stability change is defined as:

ΔΔG = E_mutant − E_wild-type

A positive ΔΔG means the mutation destabilises the protein (wild-type is more stable). A negative ΔΔG means stabilisation.

Van der Waals (Lennard-Jones 12-6)

Models atomic repulsion (Pauli exclusion) and attraction (London dispersion). The 12-power term captures the "hard wall" of overlapping electron clouds; the 6-power term captures the gentle attractive well at optimal distances. Critical for detecting steric clashes from mutations.
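As a minimal sketch of the 12-6 form described above, the pair energy can be written in the AMBER-style convention with a well depth and a minimum-energy distance. The function name and the numeric parameters below are illustrative assumptions, not values taken from the engine or from ff14SB.

```python
def lj_energy(r, epsilon, r_min):
    """12-6 Lennard-Jones pair energy (kcal/mol).

    r       : interatomic distance in Å
    epsilon : well depth in kcal/mol (illustrative value below)
    r_min   : distance of the energy minimum in Å (illustrative value below)
    """
    ratio6 = (r_min / r) ** 6
    # ratio6**2 is the steep repulsive wall, -2*ratio6 the attractive well
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)


# Hypothetical parameters, chosen only to show the shape of the curve
EPSILON = 0.1
R_MIN = 3.8

clash = lj_energy(2.0, EPSILON, R_MIN)      # overlapping atoms: large and positive
optimum = lj_energy(R_MIN, EPSILON, R_MIN)  # bottom of the well: -epsilon
distant = lj_energy(6.0, EPSILON, R_MIN)    # weak attraction
print(clash, optimum, distant)
```

The steep growth of the repulsive term at short range is what makes a badly placed mutant side chain show up as a large positive VDW contribution.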

Coulomb Electrostatics

Charge-charge interactions with a distance-dependent dielectric ε(r) = r to simulate implicit water screening without the cost of explicit molecules. Captures salt bridge formation/disruption by mutations.
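With ε(r) = r, the Coulomb energy falls off as 1/r² instead of 1/r. A small sketch, assuming charges in units of the elementary charge, distances in Å, and a conversion constant of about 332.06 kcal·Å/(mol·e²); the exact constant used by the engine is not stated in the text.

```python
COULOMB_K = 332.0636  # kcal·Å/(mol·e²); assumed value, conventions differ slightly


def coulomb_energy(q_i, q_j, r):
    """Coulomb pair energy (kcal/mol) with distance-dependent dielectric ε(r) = r."""
    dielectric = r  # ε(r) = r mimics screening by implicit water
    return COULOMB_K * q_i * q_j / (dielectric * r)


# An idealised salt bridge: unit charges of opposite sign, 4 Å apart
print(coulomb_energy(+1.0, -1.0, 4.0))
# Doubling the distance quarters the interaction, since E scales as 1/r²
print(coulomb_energy(+1.0, -1.0, 8.0))
```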

Bonded Terms (Harmonic)

Bond stretching and angle bending are modelled as harmonic springs, and torsions (dihedrals) as periodic cosine terms. Together these terms keep the covalent geometry intact so the protein does not fragment during energy minimisation. Parameters are taken from AMBER ff14SB.
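In AMBER-style force fields the bond and angle terms are harmonic, while the torsion term is a periodic cosine. The sketch below shows the three functional forms; the function names and the numeric parameters are placeholders, not ff14SB values.

```python
import math


def bond_energy(r, k_b, r_eq):
    """Harmonic bond stretch: k_b * (r - r_eq)^2."""
    return k_b * (r - r_eq) ** 2


def angle_energy(theta, k_theta, theta_eq):
    """Harmonic angle bend (angles in radians): k_theta * (theta - theta_eq)^2."""
    return k_theta * (theta - theta_eq) ** 2


def torsion_energy(phi, v_n, n, gamma):
    """Periodic torsion (angles in radians): (v_n / 2) * (1 + cos(n*phi - gamma))."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))


# Placeholder parameters for illustration only
print(bond_energy(1.63, k_b=300.0, r_eq=1.53))               # bond stretched by 0.1 Å
print(angle_energy(math.radians(115.0), 50.0, math.radians(109.5)))
print(torsion_energy(math.radians(60.0), v_n=2.0, n=3, gamma=0.0))
```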

LCPO Solvation (SASA)

LCPO (Linear Combination of Pairwise Overlaps) approximates the Solvent Accessible Surface Area with fully analytical gradients, which enables derivative-based optimisation. It is the most expensive term, so a SASA cache recomputes it only every N iterations (N = 5) for a roughly 5x speedup with negligible accuracy loss.
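One way to picture the SASA cache is as a thin wrapper that re-evaluates the expensive term only on every N-th call and returns the stored value otherwise. This is a sketch of the idea, not the engine's code: the class name is hypothetical and `fake_sasa` stands in for the real LCPO routine.

```python
class CachedSasaTerm:
    """Re-evaluate an expensive SASA function only every `refresh_every` calls."""

    def __init__(self, sasa_fn, refresh_every=5):
        self._sasa_fn = sasa_fn
        self._refresh_every = refresh_every
        self._calls = 0
        self._cached = None

    def __call__(self, coords):
        # Recompute on the first call and then on every refresh_every-th call
        if self._cached is None or self._calls % self._refresh_every == 0:
            self._cached = self._sasa_fn(coords)
        self._calls += 1
        return self._cached


evaluations = []


def fake_sasa(coords):
    """Stand-in for the real LCPO calculation; records how often it runs."""
    evaluations.append(1)
    return float(len(coords))


term = CachedSasaTerm(fake_sasa, refresh_every=5)
for step in range(10):
    term([(0.0, 0.0, 0.0)])
print(len(evaluations), "expensive evaluations in 10 minimiser steps")
```

The trade-off is that the cached value is slightly stale between refreshes, so the interval has to stay short enough that the surface barely changes within it.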

The "Correct Workflow" — Solving Crystallographic Strain

Key insight: Raw PDB crystal structures contain artificial atomic clashes ("crystallographic strain") that can push the force-field energy above 100,000,000 kcal/mol (105,698,651 kcal/mol for 1TUP). A naive mutation on such a structure measures strain relief rather than the effect of the mutation. This is a common trap in structure-based stability prediction.

The engine implements a rigorous 4-step protocol to avoid this:

Load Raw PDB

Import the crystal structure with all its inherent strain.

Pre-Minimise (Whole Protein)

Relax the entire wild-type structure to a local energy minimum. For p53 (1TUP), this reduces energy from 105,698,651 kcal/mol → −19,666 kcal/mol. This clean, relaxed state is saved as the reference.

Create Mutant & Neighbourhood-Minimise

Perform in-silico mutagenesis on the relaxed structure using Dunbrack backbone-dependent rotamer libraries, then minimise only the atoms within a 10 Å cutoff of the mutation site using L-BFGS-B with full analytical gradients.
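The neighbourhood minimisation can be sketched with SciPy's L-BFGS-B driver by exposing only the atoms inside the cutoff as free variables and keeping the rest frozen. Everything named here is an assumption for illustration: the function signature, the use of a single atom index as the mutation site, and the toy quadratic energy that stands in for the real force field.

```python
import numpy as np
from scipy.optimize import minimize


def minimise_neighbourhood(coords, site_index, energy_and_grad, cutoff=10.0):
    """Relax only atoms within `cutoff` Å of atom `site_index`.

    coords          : (N, 3) float array of atomic positions
    energy_and_grad : callable returning (energy, (N, 3) gradient)
    """
    dist = np.linalg.norm(coords - coords[site_index], axis=1)
    movable = dist <= cutoff

    def objective(x):
        trial = coords.copy()
        trial[movable] = x.reshape(-1, 3)
        energy, grad = energy_and_grad(trial)
        # Only the gradient rows of movable atoms are handed to the optimiser
        return energy, grad[movable].ravel()

    result = minimize(objective, coords[movable].ravel(), jac=True, method="L-BFGS-B")
    relaxed = coords.copy()
    relaxed[movable] = result.x.reshape(-1, 3)
    return relaxed, result.fun, movable


# Toy system: three atoms, the third lies outside the 10 Å sphere
start = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [25.0, 0.0, 0.0]])
target = start + 0.5


def toy_energy_and_grad(x):
    """Quadratic stand-in for the force field, pulling every atom toward `target`."""
    diff = x - target
    return float(np.sum(diff ** 2)), 2.0 * diff


relaxed, energy, movable = minimise_neighbourhood(start, 0, toy_energy_and_grad)
print(movable)   # which atoms were allowed to move
print(relaxed)   # the distant atom keeps its original position
```

Freezing the distant atoms keeps the variable count small and prevents the optimiser from drifting far from the relaxed wild-type reference outside the mutation site.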

Calculate ΔΔG

ΔΔG = E(mutant minimised) − E(wild-type relaxed). Each energy component (VDW, Coulomb, SASA) is reported separately for physical interpretability.
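A per-component report of this kind amounts to subtracting the two energy breakdowns term by term. The sketch below assumes each structure's energy is available as a dictionary keyed by term name; the function name and the numbers are made-up placeholders, not results from the engine.

```python
def ddg_breakdown(wild_type, mutant):
    """Return per-term ΔΔG, the total, and a verdict (positive means destabilising)."""
    per_term = {name: mutant[name] - wild_type[name] for name in wild_type}
    total = sum(per_term.values())
    if total > 0:
        verdict = "destabilising"
    elif total < 0:
        verdict = "stabilising"
    else:
        verdict = "neutral"
    return per_term, total, verdict


# Placeholder energies in kcal/mol, for illustration only
wild_type = {"vdw": -100.0, "coulomb": -50.0, "sasa": 20.0}
mutant = {"vdw": -96.0, "coulomb": -51.5, "sasa": 20.5}

per_term, total, verdict = ddg_breakdown(wild_type, mutant)
for name, value in per_term.items():
    print(f"{name:8s} {value:+.2f}")
print(f"total    {total:+.2f}  ({verdict})")
```

Reporting the terms separately shows whether a change is driven by packing (VDW), charge interactions (Coulomb), or solvent exposure (SASA).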

Validation Case	Wild-Type Energy (kcal/mol)	Mutant Energy (kcal/mol)	ΔΔG (kcal/mol)	Result
p53 1TUP, K101A	−19,666.65	−19,653.63	+13.02	Destabilising ✓
Barnase, H102S	+29,669.83	+29,577.75	−92.08	Stabilising ✓

Both results agree in direction with FoldX and experimental literature benchmarks.

Performance Optimisation — Three Phases

Phase	Technique	Impact
Phase 1 — Algorithmic	KD-Tree spatial indexing — atoms outside 12 Å interaction sphere pruned in O(log N)	Eliminates billions of irrelevant pairwise calculations
Phase 2 — Acceleration	Numba JIT compilation with CPU prange parallelism; CUDA kernel port for GPU offloading	100x speedup for massive proteins on GPU
Phase 3 — Hardware	Structure-of-Arrays (SoA) memory layout: coordinates stored as contiguous float64 NumPy arrays	Full CPU cache line utilisation; eliminates per-object overhead
Bonus — SASA Cache	Recalculates surface area every N=5 minimisation iterations instead of every step	~5x speedup in minimisation loop with negligible accuracy loss
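The first and third rows of the table combine naturally: with coordinates held in one contiguous float64 array, a KD-tree can return only the atom pairs inside the interaction sphere, and distances for those pairs can be computed in a single vectorised step. A minimal sketch using SciPy's `cKDTree`; the function name and the four-atom toy system are assumptions for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree


def neighbour_pairs(coords, cutoff=12.0):
    """Return an (M, 2) array of index pairs (i < j) closer than `cutoff` Å."""
    tree = cKDTree(coords)
    return tree.query_pairs(r=cutoff, output_type="ndarray")


# Array-based layout: one contiguous float64 block instead of per-atom Python objects
coords = np.ascontiguousarray(
    [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [30.0, 0.0, 0.0], [35.0, 0.0, 0.0]],
    dtype=np.float64,
)

pairs = neighbour_pairs(coords, cutoff=12.0)
# Vectorised distances for the surviving pairs only
distances = np.linalg.norm(coords[pairs[:, 0]] - coords[pairs[:, 1]], axis=1)
print(len(pairs), "pairs kept out of", len(coords) * (len(coords) - 1) // 2)
print(distances)
```

For a protein with tens of thousands of atoms, the kept pairs grow roughly linearly with atom count, whereas the all-pairs count grows quadratically.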

Production Infrastructure (Phase 4)

FastAPI REST Microservice

Three endpoints: POST /calculate (submit job), GET /status/{job_id} (real-time progress), GET /result/{job_id} (energy breakdown + stability verdict). JSON request/response. Single-mutation prediction in <15 seconds.
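The submit, poll, fetch pattern behind the three endpoints implies some job state kept on the server. The framework-agnostic sketch below models that state in plain Python; the class, the status strings, and the request and result fields are all hypothetical and do not describe the service's actual schema.

```python
import uuid


class JobStore:
    """In-memory job registry mirroring the submit / status / result flow."""

    def __init__(self, compute_fn):
        self._compute_fn = compute_fn
        self._jobs = {}

    def submit(self, request):
        """Would back POST /calculate: register a job and return its id."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = {"status": "queued", "request": request, "result": None}
        return job_id

    def run(self, job_id):
        """What a background worker would do with a queued job."""
        job = self._jobs[job_id]
        job["status"] = "running"
        try:
            job["result"] = self._compute_fn(job["request"])
            job["status"] = "done"
        except Exception as exc:
            job["result"] = {"error": str(exc)}
            job["status"] = "failed"

    def status(self, job_id):
        """Would back GET /status/{job_id}."""
        return self._jobs[job_id]["status"]

    def result(self, job_id):
        """Would back GET /result/{job_id}; None until the job has finished."""
        job = self._jobs[job_id]
        return job["result"] if job["status"] == "done" else None


def stand_in_calculation(request):
    """Placeholder for the real ΔΔG calculation; returns a fixed dummy payload."""
    return {"mutation": request["mutation"], "ddg": 0.0}


store = JobStore(stand_in_calculation)
job_id = store.submit({"mutation": "K101A"})
print(store.status(job_id))
store.run(job_id)
print(store.status(job_id), store.result(job_id))
```

Separating submission from retrieval lets a client fire off many mutations and poll for results without holding a connection open for each calculation.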

Docker + Kubernetes

Fully containerised with all biophysical dependencies. Kubernetes manifests include a HorizontalPodAutoscaler for burst-throughput screening of thousands of mutations per hour.

Prometheus + Grafana

Live monitoring of calculation latency, error rates, and GPU memory utilisation. Production-grade observability at zero licensing cost.

Benchmark Suite

Automated regression tests compare runtimes against stored baselines for small (Barnase), medium (1TUP), and large (HSA) proteins. Analytical energy gradients are verified against numerical derivatives to a tolerance of 10⁻⁶.
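A gradient check of this kind is usually done with central finite differences. The sketch below shows the idea on a small polynomial stand-in for the energy function; the function names and the step size are assumptions, and only the 10⁻⁶ tolerance comes from the text above.

```python
def numerical_gradient(f, x, h=1e-5):
    """Central-difference gradient of f at the point x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += h
        minus[i] -= h
        grad.append((f(plus) - f(minus)) / (2.0 * h))
    return grad


def max_gradient_error(f, analytic_grad, x, h=1e-5):
    """Largest absolute difference between analytical and numerical gradients."""
    numeric = numerical_gradient(f, x, h)
    analytic = analytic_grad(x)
    return max(abs(a - n) for a, n in zip(analytic, numeric))


def toy_energy(x):
    """Polynomial stand-in for the energy function."""
    return x[0] ** 2 + 3.0 * x[0] * x[1] + x[1] ** 3


def toy_gradient(x):
    """Hand-derived gradient of toy_energy."""
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0] + 3.0 * x[1] ** 2]


error = max_gradient_error(toy_energy, toy_gradient, [1.2, -0.7])
print("gradient check passed:", error < 1e-6)
```

A sign error or a missing term in an analytical gradient typically produces a discrepancy many orders of magnitude above the tolerance, so the check catches such bugs before they reach the minimiser.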
