High-fidelity detection, visualisation, and reporting of noncovalent interactions in protein structures, with performance instrumentation, provenance, and extensibility built in.
Noncovalent interactions (NCIs) are the molecular forces that determine protein folding, stability, enzyme catalysis, drug binding, and macromolecular assembly. Yet existing tools for studying them are fragmented: one tool for hydrogen bonds, another for π-stacking, and none that offers a unified, reproducible analysis pipeline covering all 15+ known NCI families at once.
During my Master's research at IISER Pune on chalcogen, pnictogen, and tetrel bonding in protein structures, I repeatedly ran into this gap. Analysing a single protein meant stitching together outputs from multiple incompatible tools, with no provenance tracking and inconsistent geometric criteria. MolBridge was built to be the unified, literature-grounded platform I wished had existed.
MolBridge follows a layered architecture separating detection logic, computation strategy, and interfaces.
A decorator-based detector registry lets each NCI family define its own geometric criteria, parameter presets, and detection logic independently, so new detectors can be added without touching core code.
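A minimal sketch of how such a registry could work is shown below. Every name here (`DETECTORS`, `register_detector`, the two detector classes) is hypothetical, and the numeric cutoffs are placeholders rather than MolBridge's actual presets.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps an NCI family name to its detector class.
DETECTORS: Dict[str, type] = {}


def register_detector(name: str) -> Callable[[type], type]:
    """Class decorator that records a detector class under `name`."""
    def decorator(cls: type) -> type:
        if name in DETECTORS:
            raise ValueError(f"detector {name!r} is already registered")
        DETECTORS[name] = cls
        return cls
    return decorator


@register_detector("hydrogen_bond")
class HydrogenBondDetector:
    # Placeholder values for illustration only.
    presets = {"default": {"max_distance": 3.5, "min_angle": 120.0}}

    def detect(self, structure, preset: str = "default") -> List[dict]:
        params = self.presets[preset]  # each family owns its own criteria
        return []  # real geometric checks would use `params` here


@register_detector("chalcogen_bond")
class ChalcogenBondDetector:
    # Placeholder values for illustration only.
    presets = {"default": {"max_distance": 3.6, "min_angle": 150.0}}

    def detect(self, structure, preset: str = "default") -> List[dict]:
        params = self.presets[preset]
        return []


def run_all(structure, preset: str = "default") -> Dict[str, List[dict]]:
    """Core loop: it never names a specific detector, only the registry."""
    return {name: cls().detect(structure, preset) for name, cls in DETECTORS.items()}


results = run_all(structure=None)
```

Under this design, adding a new family means writing one decorated class; `run_all` picks it up without modification.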
Vector geometry fast-paths (NumPy), KD-tree spatial pruning, adaptive threshold tuning, task graph precompute stage, shared-memory parallelism (POSIX), and optional Numba/Rust kernels. Auto-profile mode selects the right strategy per structure size.
FastAPI REST backend with async job execution, progress tracking, and multiple output formats (JSON, CSV, PDF, Excel). Full Swagger documentation at /docs. Supports programmatic batch processing of hundreds of PDB structures.
Streamlit web interface with interactive py3Dmol 3D viewer, Plotly heatmaps and distribution charts, Ramachandran plots, force-directed interaction network graphs, command palette (Ctrl+K), and scenario profiles as YAML templates.
| Technique | What it does | Impact |
|---|---|---|
| Vector Geometry Fast-Paths | Batched distance/angle math via NumPy matrices instead of nested loops | 2-5x speedup on angular calculations |
| KD-Tree Spatial Pruning | Partitions 3D space so that only nearby atom pairs are examined, in O(N log N) rather than O(N²) | Skips the vast majority of irrelevant pair calculations |
| Adaptive Threshold Tuning | Dynamically relaxes/tightens distance cutoffs per detector based on candidate density | Balances accuracy and speed per structure |
| Shared Memory Parallelism | POSIX shared memory blocks for coords, ring centroids, H-bond donor/acceptors across process pool | Near-zero copy overhead for large structures |
| Numba JIT Kernels | JIT-compiled pairwise distance and geometry primitives | Near-C speed without writing C or C++ |
| Rust Geometry Extension (opt-in) | PyO3 pairwise_sq_dists, the fastest available backend | Maximum performance for massive proteins |
| Task Graph Precompute | Extracts aromatic rings, centroids, donors/acceptors once per structure, fans out to all detectors | Eliminates redundant recomputation across 15 detectors |
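
To make the first two rows concrete, the sketch below finds all atom pairs within a cutoff in two ways: a vectorised NumPy all-pairs scan and a KD-tree query using SciPy's `cKDTree.query_pairs`. The function names, the random coordinates, and the 4.0 Å cutoff are assumptions for illustration; this is not MolBridge's own implementation.

```python
import numpy as np
from scipy.spatial import cKDTree


def candidate_pairs_kdtree(coords: np.ndarray, cutoff: float) -> set:
    """Index pairs (i < j) within `cutoff`, found through a KD-tree."""
    tree = cKDTree(coords)
    return tree.query_pairs(r=cutoff)  # set of (i, j) tuples with i < j


def candidate_pairs_allpairs(coords: np.ndarray, cutoff: float) -> set:
    """Same result from a vectorised O(N^2) scan, with no Python loops."""
    diff = coords[:, None, :] - coords[None, :, :]   # shape (N, N, 3)
    sq_dists = np.einsum("ijk,ijk->ij", diff, diff)  # squared distances
    close = np.triu(sq_dists <= cutoff ** 2, k=1)    # keep i < j only
    i, j = np.nonzero(close)
    return set(zip(i.tolist(), j.tolist()))


# Synthetic "atoms" scattered in a 50 Å box (illustrative data only).
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 50.0, size=(500, 3))
cutoff = 4.0

pairs_tree = candidate_pairs_kdtree(coords, cutoff)
pairs_scan = candidate_pairs_allpairs(coords, cutoff)
total_pairs = len(coords) * (len(coords) - 1) // 2
print(f"{len(pairs_tree)} candidate pairs out of {total_pairs} possible")
```

Both functions return the same pairs; the difference is that the all-pairs scan builds an N×N matrix, whereas the tree query only visits spatial neighbours, which matters as structures grow.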
MolBridge treats reproducibility as a first-class concern, something often absent in research-grade bioinformatics tools.
Every export embeds a provenance digest: structure signature + parameters + detector set + version tag. Researchers can cite this in manuscripts for full reproducibility.
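One plausible way to build such a digest is to hash a canonical JSON encoding of the four components. The sketch below assumes SHA-256 and a hypothetical `provenance_digest` function; the field names, example parameters, and version string are illustrative and may differ from what MolBridge embeds.

```python
import hashlib
import json


def provenance_digest(structure_bytes: bytes, parameters: dict,
                      detectors: list, version: str) -> str:
    """Hash structure signature + parameters + detector set + version tag."""
    payload = {
        "structure_sha256": hashlib.sha256(structure_bytes).hexdigest(),
        "parameters": parameters,
        "detectors": sorted(detectors),  # order of the set must not matter
        "version": version,
    }
    # sort_keys and fixed separators give one canonical byte string per payload
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


structure = b"ATOM      1  N   MET A   1 ..."  # stand-in for real file contents
digest_a = provenance_digest(structure, {"max_distance": 3.5, "preset": "default"},
                             ["hydrogen_bond", "pi_stacking"], "0.0.0-example")
digest_b = provenance_digest(structure, {"preset": "default", "max_distance": 3.5},
                             ["pi_stacking", "hydrogen_bond"], "0.0.0-example")
digest_c = provenance_digest(structure, {"max_distance": 3.6, "preset": "default"},
                             ["hydrogen_bond", "pi_stacking"], "0.0.0-example")
```

Here `digest_a` and `digest_b` match because only the ordering of inputs differs, while `digest_c` changes because one threshold changed.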
A curated set of PDB structures forms a golden baseline. CI enforces that interaction counts and timings do not deviate by more than 5% between versions.
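The 5% rule can be sketched as a relative-deviation check. The function names, the metric keys, and the numbers below are hypothetical; the actual CI job may compare different quantities or handle edge cases differently.

```python
def relative_deviation(baseline: float, current: float) -> float:
    """Absolute change relative to the baseline value."""
    if baseline == 0:
        return 0.0 if current == 0 else float("inf")
    return abs(current - baseline) / abs(baseline)


def check_against_baseline(baseline: dict, current: dict,
                           tolerance: float = 0.05) -> list:
    """Return a message for every metric that drifts beyond `tolerance`."""
    failures = []
    for key, expected in baseline.items():
        if key not in current:
            failures.append(f"{key}: missing from current run")
            continue
        deviation = relative_deviation(expected, current[key])
        if deviation > tolerance:
            failures.append(f"{key}: {expected} -> {current[key]} "
                            f"({deviation:.1%} deviation)")
    return failures


# Made-up interaction counts for a made-up structure.
golden = {"structure_A:hydrogen_bond": 200, "structure_A:pi_stacking": 40}
latest = {"structure_A:hydrogen_bond": 206, "structure_A:pi_stacking": 44}

failures = check_against_baseline(golden, latest)
for message in failures:
    print("REGRESSION", message)
```

In this example the hydrogen-bond count moves by 3% and passes, while the π-stacking count moves by 10% and is reported.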
All geometric thresholds are derived from published crystallographic literature, chiefly Cambridge Structural Database (CSD) studies. Three presets are provided: Conservative, Literature Default, and Exploratory.
Interaction records are normalised to a common schema regardless of which detector produced them, enabling cross-family analysis and consistent CSV/Excel exports.
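As a sketch of what a common schema could look like, the example below maps two differently shaped detector outputs onto one record type and writes them to CSV. The `InteractionRecord` fields, the raw dictionary keys, and the residue labels are assumptions for illustration, not MolBridge's actual schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class InteractionRecord:
    """Hypothetical common schema shared by every detector."""
    family: str
    partner_a: str
    partner_b: str
    distance_angstrom: float
    angle_deg: Optional[float]  # not every family reports an angle


def from_hbond(raw: dict) -> InteractionRecord:
    return InteractionRecord("hydrogen_bond", raw["donor"], raw["acceptor"],
                             raw["d_da"], raw["angle_dha"])


def from_pi_stacking(raw: dict) -> InteractionRecord:
    return InteractionRecord("pi_stacking", raw["ring1"], raw["ring2"],
                             raw["centroid_dist"], raw.get("plane_angle"))


def to_csv(records: List[InteractionRecord]) -> str:
    """One header, one column order, whichever detector produced the row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer,
                            fieldnames=[f.name for f in fields(InteractionRecord)],
                            lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


records = [
    from_hbond({"donor": "A:SER45:OG", "acceptor": "A:ASP102:OD1",
                "d_da": 2.8, "angle_dha": 165.0}),
    from_pi_stacking({"ring1": "A:PHE10", "ring2": "A:TYR88",
                      "centroid_dist": 3.9}),
]
csv_text = to_csv(records)
print(csv_text)
```

Because every row shares the same columns, downstream analysis can filter or group by `family` without detector-specific parsing.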
Redesigned HPC data workflows on IISER Pune's ParamBrahma cluster, delivering a 60% reduction in total computational runtime and enabling higher-throughput protein analysis.
Centralised fragmented datasets and automated schema validation, reducing data exploration time by 40% for early users compared with previous ad hoc methods.
The only project in the portfolio with a publicly accessible live deployment, available to the global structural biology community at molbridge.streamlit.app.
Directly integrates into ongoing Master's research on pnictogen and tetrel bonding, serving as both the analytical tool and the web-based validation platform for the thesis.