Live · molbridge.streamlit.app · IISER Pune Research

🌿 MolBridge: NonCovalent Atlas

High-fidelity detection, visualisation, and reporting of noncovalent interactions in protein structures, with performance instrumentation, provenance, and extensibility built in.

15+ Interaction Families
60% Runtime Reduction
40% Faster Exploration
4 Interfaces
Python · Streamlit · FastAPI · BioPython · SciPy · Plotly · Docker · Numba · NetworkX · py3Dmol

TL;DR: Quick Summary

Problem & Motivation

Noncovalent interactions (NCIs) are the molecular forces that determine protein folding, stability, enzyme catalysis, drug binding, and macromolecular assembly. Yet existing tools for studying them are fragmented: one tool handles hydrogen bonds, another handles π-stacking, and none offers a unified, reproducible analysis pipeline covering all 15+ NCI families at once.

During my Master's research at IISER Pune on chalcogen, pnictogen, and tetrel bonding in protein structures, I repeatedly ran into this gap. Analysing a single protein meant stitching together outputs from multiple incompatible tools, with no provenance tracking and inconsistent geometric criteria. MolBridge was built to be the unified, literature-grounded platform I wished had existed.

Architecture

MolBridge follows a layered architecture separating detection logic, computation strategy, and interfaces.

🔬 Detection Layer

A decorator-based detector registry lets each NCI family define its own geometric criteria, parameter presets, and detection logic independently, so new detectors can be added without touching core code.
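The registry pattern described above can be sketched in a few lines. This is a minimal illustration, not MolBridge's actual code: the names `DETECTORS`, `register_detector`, `detect_close_contacts`, and `run_all`, the atom tuple layout, and the 4.0 Å cutoff are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: an atom is (element, x, y, z); a hit is a pair of atom indices.
Atom = Tuple[str, float, float, float]
Detector = Callable[[List[Atom]], List[Tuple[int, int]]]

# Hypothetical global registry mapping a detector name to its function.
DETECTORS: Dict[str, Detector] = {}


def register_detector(name: str) -> Callable[[Detector], Detector]:
    """Decorator that adds a detector function to the registry under `name`."""
    def decorator(func: Detector) -> Detector:
        if name in DETECTORS:
            raise ValueError(f"detector {name!r} is already registered")
        DETECTORS[name] = func
        return func
    return decorator


@register_detector("close_contact")
def detect_close_contacts(atoms: List[Atom], cutoff: float = 4.0) -> List[Tuple[int, int]]:
    """Toy detector: report atom pairs no farther apart than `cutoff` (illustrative value)."""
    hits = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            _, xi, yi, zi = atoms[i]
            _, xj, yj, zj = atoms[j]
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if d2 <= cutoff ** 2:
                hits.append((i, j))
    return hits


def run_all(atoms: List[Atom]) -> Dict[str, List[Tuple[int, int]]]:
    """Core loop: iterates the registry and never names a specific detector."""
    return {name: detect(atoms) for name, detect in DETECTORS.items()}


atoms = [("S", 0.0, 0.0, 0.0), ("O", 3.0, 0.0, 0.0), ("N", 10.0, 0.0, 0.0)]
results = run_all(atoms)
```

Adding a new interaction family in this sketch means writing one more decorated function; `run_all` stays unchanged.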

⚡ Performance Layer

Vector geometry fast-paths (NumPy), KD-tree spatial pruning, adaptive threshold tuning, a task-graph precompute stage, POSIX shared-memory parallelism, and optional Numba/Rust kernels. An auto-profile mode selects a strategy suited to the size of each structure.

📡 API Layer

FastAPI REST backend with async job execution, progress tracking, and multiple output formats (JSON, CSV, PDF, Excel). Full Swagger documentation at /docs. Supports programmatic batch processing of hundreds of PDB structures.

🎨 UI Layer

Streamlit web interface with interactive py3Dmol 3D viewer, Plotly heatmaps and distribution charts, Ramachandran plots, force-directed interaction network graphs, command palette (Ctrl+K), and scenario profiles as YAML templates.

Interaction Families Detected

→ Hydrogen Bonds (conventional, low-barrier, C5-type)
→ Halogen Bonds (Cl, Br, I, F; sigma hole)
→ π–π Stacking (face-to-face & edge-to-face)
→ Cation–π Interactions
→ Anion–π Interactions
→ CH–π Interactions
→ Sulfur–π Interactions
→ n→π* Orbital Interactions
→ Chalcogen Bonds (S, Se, Te)
→ Pnictogen Bonds (N, P, As)
→ Tetrel Bonds (C, Si, Ge)
→ Salt Bridges (ARG/LYS vs ASP/GLU)
→ Hydrophobic Contacts
→ London Dispersion Forces
→ Metal Coordination (Zn, Fe, Mg, Ca, Cu…)
→ H-bond Subtype Classification (5 classes)

Performance Engineering

Key innovation: an adaptive auto-profile system that inspects atom count, detector count, and estimated workload, then automatically selects the best-suited combination of performance features.
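A selection step of this kind might look like the sketch below. It only illustrates the idea of mapping a workload estimate to feature flags. The `Profile` fields, the `choose_profile` function, the workload formula, and every threshold are placeholders invented for the example, not MolBridge's actual values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical set of performance switches chosen for one run."""
    use_kdtree: bool
    use_parallel: bool
    use_jit: bool


def choose_profile(n_atoms: int, n_detectors: int) -> Profile:
    """Pick performance features from a rough workload estimate.

    All thresholds here are made-up placeholders for illustration.
    """
    workload = n_atoms * n_detectors  # crude proxy for total work
    return Profile(
        use_kdtree=n_atoms >= 500,         # pruning pays off once pair counts dominate
        use_parallel=workload >= 100_000,  # process pools carry startup cost
        use_jit=workload >= 20_000,        # JIT compile time must be amortised
    )


small = choose_profile(n_atoms=300, n_detectors=5)
large = choose_profile(n_atoms=50_000, n_detectors=15)
```

In this sketch the small structure runs with every optimisation off, since setup overhead would outweigh the gain, while the large one enables all three.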
| Technique | What it does | Impact |
|---|---|---|
| Vector Geometry Fast-Paths | Batched distance/angle math via NumPy matrices instead of nested loops | 2–5× speedup on angular calculations |
| KD-Tree Spatial Pruning | Partitions 3D space to eliminate irrelevant atom pairs in O(N log N) | Eliminates billions of irrelevant calculations |
| Adaptive Threshold Tuning | Dynamically relaxes or tightens distance cutoffs per detector based on candidate density | Balances accuracy and speed per structure |
| Shared Memory Parallelism | POSIX shared-memory blocks for coordinates, ring centroids, and H-bond donors/acceptors, shared across the process pool | Near-zero copy overhead for large structures |
| Numba JIT Kernels | JIT-compiled pairwise distance and geometry primitives | C-speed execution without C++ complexity |
| Rust Geometry Extension (opt-in) | PyO3 pairwise_sq_dists kernel; the fastest available backend | Maximum performance for massive proteins |
| Task Graph Precompute | Extracts aromatic rings, centroids, and donors/acceptors once per structure, then fans out to all detectors | Eliminates redundant recomputation across 15 detectors |
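To make the KD-tree pruning row concrete, here is a small pure-Python sketch of the technique. It is illustrative only: the `Node`, `build`, `query`, and `pairs_within` names are invented for this example, and the source does not say which KD-tree implementation MolBridge uses. The tree splits points at the median along x, y, and z in turn, and a radius query skips any subtree whose splitting plane lies farther away than the cutoff.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


class Node:
    """One KD-tree node: a splitting point plus left/right subtrees."""
    def __init__(self, index: int, axis: int,
                 left: Optional["Node"], right: Optional["Node"]) -> None:
        self.index, self.axis, self.left, self.right = index, axis, left, right


def build(points: List[Point], indices: List[int], depth: int = 0) -> Optional[Node]:
    """Recursively split the points at the median along x, y, z in turn."""
    if not indices:
        return None
    axis = depth % 3
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return Node(indices[mid], axis,
                build(points, indices[:mid], depth + 1),
                build(points, indices[mid + 1:], depth + 1))


def query(node: Optional[Node], points: List[Point], target: Point,
          radius: float, out: List[int]) -> None:
    """Collect indices within `radius` of `target`, skipping far subtrees."""
    if node is None:
        return
    p = points[node.index]
    if sum((a - b) ** 2 for a, b in zip(p, target)) <= radius ** 2:
        out.append(node.index)
    diff = target[node.axis] - p[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    query(near, points, target, radius, out)
    if abs(diff) <= radius:  # the far side matters only if the plane is within reach
        query(far, points, target, radius, out)


def pairs_within(points: List[Point], radius: float) -> List[Tuple[int, int]]:
    """All index pairs (i < j) no farther apart than `radius`, found via the tree."""
    root = build(points, list(range(len(points))))
    pairs = []
    for i, p in enumerate(points):
        found: List[int] = []
        query(root, points, p, radius, found)
        pairs.extend((i, j) for j in found if j > i)
    return sorted(pairs)


coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0),
          (5.0, 5.0, 6.0), (20.0, 0.0, 0.0)]
close_pairs = pairs_within(coords, radius=1.5)
```

For real workloads a library implementation would be the more likely choice. SciPy, which is part of the project's stack, provides `scipy.spatial.cKDTree` with a `query_pairs` method for exactly this kind of fixed-radius pair search.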

Reproducibility & Scientific Rigour

MolBridge treats reproducibility as a first-class concern, something often absent in research-grade bioinformatics tools.

Provenance Hashing

Every export embeds a provenance digest: structure signature + parameters + detector set + version tag. Researchers can cite this in manuscripts for full reproducibility.
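One way to build such a digest is to serialise the four components canonically and hash the result. The sketch below assumes SHA-256 and canonical JSON. The source does not specify the hash function or the serialisation, and the `provenance_digest` name, the payload keys, and the sample values are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List


def provenance_digest(structure_signature: str, parameters: Dict[str, Any],
                      detectors: List[str], version: str) -> str:
    """Hash the four provenance components into one hex digest."""
    payload = {
        "structure": structure_signature,
        "parameters": parameters,
        "detectors": sorted(detectors),  # detector order should not change the digest
        "version": version,
    }
    # sort_keys and fixed separators yield one canonical byte string per payload
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative inputs only; the signature format and parameter names are invented.
a = provenance_digest("1abc-sig", {"hbond_cutoff": 3.5}, ["hbond", "pi_pi"], "v1")
b = provenance_digest("1abc-sig", {"hbond_cutoff": 3.5}, ["pi_pi", "hbond"], "v1")
c = provenance_digest("1abc-sig", {"hbond_cutoff": 3.6}, ["hbond", "pi_pi"], "v1")
```

Here `a` and `b` match because only the detector order differs, while `c` differs because one parameter changed. That is the property a citable digest needs.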

📊 Golden Regression Framework

A curated set of PDB structures forms a golden baseline. CI enforces that interaction counts and timing do not deviate by more than 5% between versions.
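The 5% rule can be expressed as a relative-tolerance comparison against stored baseline values. The following is a sketch under stated assumptions: the `regressions` function, the metric names, and the sample numbers are invented for illustration and are not taken from MolBridge's CI.

```python
from typing import Dict, List


def regressions(baseline: Dict[str, float], current: Dict[str, float],
                tolerance: float = 0.05) -> List[str]:
    """Return the metrics whose relative change exceeds `tolerance`."""
    failures = []
    for key, expected in baseline.items():
        observed = current.get(key)
        if observed is None:
            failures.append(key)  # a missing metric counts as a failure
            continue
        # fall back to an absolute comparison when the baseline is 0
        denom = abs(expected) if expected != 0 else 1.0
        if abs(observed - expected) / denom > tolerance:
            failures.append(key)
    return failures


# Illustrative numbers only.
baseline = {"hbond_count": 200, "pi_pi_count": 40, "runtime_s": 10.0}
current = {"hbond_count": 204, "pi_pi_count": 46, "runtime_s": 10.4}
failed = regressions(baseline, current)
```

In this example the hydrogen-bond count (2% change) and runtime (4%) pass, while the π–π count (15%) is flagged. A CI job could fail the build whenever the returned list is non-empty.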

Literature-Anchored Criteria

All geometric thresholds are derived from published crystallographic literature (Cambridge Structural Database, CSD, studies). Three presets are available: Conservative, Literature Default, and Exploratory.

Normalised Records

Interaction records are normalised to a common schema regardless of which detector produced them, enabling cross-family analysis and consistent CSV/Excel exports.
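A common schema of this kind can be modelled as a single record type that every detector converts into. The sketch below is hypothetical: the `InteractionRecord` fields, the two converter functions, and the atom label format are assumptions for illustration, since the source does not list the actual schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass(frozen=True)
class InteractionRecord:
    """Common schema shared by every detector (field names are hypothetical)."""
    family: str
    atom_a: str
    atom_b: str
    distance: float  # in ångströms


def from_hbond(donor: str, acceptor: str, d: float) -> InteractionRecord:
    """Map a hydrogen-bond hit onto the shared schema."""
    return InteractionRecord("hydrogen_bond", donor, acceptor, round(d, 2))


def from_salt_bridge(cation: str, anion: str, d: float) -> InteractionRecord:
    """Map a salt-bridge hit onto the shared schema."""
    return InteractionRecord("salt_bridge", cation, anion, round(d, 2))


def to_csv(records: List[InteractionRecord]) -> str:
    """Write records from any detector under one shared header."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=[f.name for f in fields(InteractionRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(asdict(r) for r in records)
    return buf.getvalue()


records = [from_hbond("A:SER12:OG", "A:ASP45:OD1", 2.812),
           from_salt_bridge("A:LYS7:NZ", "A:GLU30:OE2", 3.4)]
csv_text = to_csv(records)
```

Because both detectors emit the same record type, the export code needs no per-family branches, and rows from different families can be filtered or grouped on the `family` column.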

Outcomes & Impact

📉 60% Runtime Reduction

Redesigned HPC data workflows on IISER Pune's PARAM Brahma cluster cut total computational runtime by 60%, enabling higher-throughput protein analysis.

⚡ 40% Faster Data Exploration

Centralising fragmented datasets and automating schema validation reduced data exploration time by 40% for early users compared with previous ad hoc methods.

🌐 Live Deployed Application

MolBridge is the only project in this portfolio with a publicly accessible live deployment, available to the global structural biology community at molbridge.streamlit.app.

🔬 Research Integration

MolBridge integrates directly into ongoing Master's research on pnictogen and tetrel bonding, serving as both the analytical tool and the web-based validation platform for the thesis.

Roadmap
