OptiScan — Shobhit Vats Sharma

TL;DR — Quick Summary

Unlike standard linters (Pylint, Flake8) that focus on style and syntax, OptiScan targets "silent performance killers" — code patterns that are syntactically correct but computationally catastrophic (O(n²) string concatenation in loops, N+1 query patterns, unbatched I/O).
Built on LibCST (Concrete Syntax Tree) — preserving whitespace and comments for "perfect diff" generation — rather than Python's standard AST, which discards formatting.
The Three-Layer Filter achieves a 0% false positive rate; the V1 prototype had a 100% false positive rate on its string concatenation rule because it flagged numeric increments (count += 1) as string issues.
Produces parallel reports: human-readable Markdown with quantified performance ROI, and AI-native JSON with deterministic line numbers for direct ingestion by autonomous LLM refactoring agents.
Hardware-aware multiprocessing via ProcessPoolExecutor sidesteps Python's GIL by running the analysis in separate worker processes, analysing 120+ files/second on an 8-core CPU.
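
As a rough illustration of the multiprocessing point above, the sketch below fans files out across one worker process per detected core. It is a minimal sketch, not OptiScan's actual code: the function names (count_aug_assigns, analyse_file, scan_paths) are hypothetical, and the per-file "analysis" is a toy stand-in that uses the standard-library ast module to count augmented assignments.

```python
import ast
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Optional


def count_aug_assigns(source: str) -> int:
    """Toy per-file analysis (hypothetical): count augmented assignments such as `x += 1`."""
    tree = ast.parse(source)
    return sum(isinstance(node, ast.AugAssign) for node in ast.walk(tree))


def analyse_file(path: str) -> tuple[str, int]:
    """Worker entry point. It is a top-level function so it can be pickled."""
    with open(path, encoding="utf-8") as handle:
        return path, count_aug_assigns(handle.read())


def scan_paths(paths: list[str], workers: Optional[int] = None) -> dict[str, int]:
    """Analyse files in parallel, defaulting to one worker per detected core."""
    if not paths:
        return {}
    workers = workers or os.cpu_count() or 1
    # Each worker is a separate interpreter process with its own GIL.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(analyse_file, paths))
```

On platforms that start workers with spawn (Windows, macOS), scan_paths should be called from under an `if __name__ == "__main__":` guard so the worker processes can import the module safely.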

The Mission: Zero False Positives

Standard linters are built on a philosophy of high recall — catch everything, accept some noise. This is fine for style enforcement, but catastrophic for an AI agent using findings to autonomously refactor production code. A single false positive that causes an AI agent to "fix" code incorrectly destroys trust in the entire system.

OptiScan's philosophy is the inverse: prefer a missed true positive over a single false positive. Every rule is built with multi-layer validation that conservatively flags only when all criteria are definitively met.

Version 1 had a 100% false positive rate on its primary rule. Version 3 has 0%. The evolution from V1 to V3 is the core engineering story of this project.

The V1 → V3 Evolution

Version	Engine	False Positive Rate	Key Achievement
V1.0	Basic AST pattern matching	100% on string rules	Conceptual prototype; proved the problem space
V2.0	LibCST (Concrete Syntax Tree)	0%	Three-Layer Filter System; "Conservative Flagging" philosophy; production-ready for CI/CD
V3.0 (Current)	LibCST + metadata providers	0%	Architectural & Bottleneck rule tiers; AI-perfect JSON schema; full contextual integrity (5-10 lines of surrounding code)

Detection Algorithms — The Three-Layer Filter

The challenge that broke V1: distinguishing result_str += item (inefficient — O(n²)) from count += 1 (perfectly fine — numeric increment). The Three-Layer Filter for PY-PERF-001:

Layer 1 — Name Heuristics

A dictionary of 30+ counter variable name patterns (ct, cnt, index, total, num, i, j, k, count…) is checked against the target variable. If the name matches, the finding is suppressed.
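
A name check of this kind might look like the sketch below. The name set and suffix list are a hypothetical subset chosen for illustration; the actual 30+ patterns are not listed in this write-up.

```python
# Hypothetical subset of counter-style names; the real catalogue is larger.
COUNTER_NAMES = frozenset(
    {"i", "j", "k", "n", "ct", "cnt", "count", "counter", "idx", "index", "num", "total"}
)
# Hypothetical suffixes that also suggest a numeric counter, e.g. word_count.
COUNTER_SUFFIXES = ("_count", "_cnt", "_idx", "_index", "_total", "_num")


def looks_like_counter(name: str) -> bool:
    """Return True when a variable name suggests a numeric counter."""
    lowered = name.lower()
    return lowered in COUNTER_NAMES or lowered.endswith(COUNTER_SUFFIXES)
```

For example, looks_like_counter("word_count") is True, so a `word_count += 1` inside a loop would be suppressed, while looks_like_counter("result_str") is False and the statement moves on to the next layer.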

Layer 2 — Initialisation Analysis

The visitor tracks the first assignment of each variable within its scope. If initialised as 0 or any integer → marked numeric (suppress). If initialised as "" or any string literal → marked string (candidate).
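
The first-assignment tracking can be sketched as follows. Two assumptions are worth stating: the sketch uses the standard-library ast module as a dependency-free stand-in for the LibCST visitor described here, and it treats the whole module as a single scope rather than tracking each function scope separately. The class and function names are hypothetical.

```python
import ast


class InitTracker(ast.NodeVisitor):
    """Record how each plain name is first initialised (single-scope simplification)."""

    def __init__(self) -> None:
        self.kinds: dict[str, str] = {}

    def visit_Assign(self, node: ast.Assign) -> None:
        for target in node.targets:
            # Only the first assignment counts; later reassignments are ignored.
            if isinstance(target, ast.Name) and target.id not in self.kinds:
                self.kinds[target.id] = self._classify(node.value)
        self.generic_visit(node)

    @staticmethod
    def _classify(value: ast.expr) -> str:
        if isinstance(value, ast.JoinedStr):  # f-string literal
            return "string"
        if isinstance(value, ast.Constant):
            if isinstance(value.value, str):
                return "string"
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value.value, int) and not isinstance(value.value, bool):
                return "numeric"
        return "unknown"


def classify_initialisations(source: str) -> dict[str, str]:
    """Map each assigned name to 'numeric', 'string', or 'unknown'."""
    tracker = InitTracker()
    tracker.visit(ast.parse(source))
    return tracker.kinds
```

Names classified as "numeric" would be suppressed, names classified as "string" stay as candidates, and "unknown" leaves the decision to the other layers.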

Layer 3 — RHS Type Inference

The right-hand side of the += is inspected. String literals, f-strings, and str() calls confirm the finding. Numeric literals, variable names matching counter heuristics, or arithmetic expressions suppress it.
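
The right-hand-side check, combined with the conservative "flag only when certain" policy, can be sketched as below. As assumptions: the standard-library ast module stands in for LibCST, the names rhs_verdict, flag_string_concat, and COUNTER_LIKE_NAMES are hypothetical, and only subtraction, division, floor division, and exponentiation are treated as numeric arithmetic, because +, *, and % are also valid on strings.

```python
import ast

# Hypothetical subset of counter-style names used on the right-hand side.
COUNTER_LIKE_NAMES = frozenset({"i", "j", "k", "n", "count", "cnt", "idx", "index", "num"})
# Operators that are never valid on str operands.
NUMERIC_ONLY_OPS = (ast.Sub, ast.Div, ast.FloorDiv, ast.Pow)


def rhs_verdict(value: ast.expr) -> str:
    """Classify the right-hand side of a `+=` as 'string', 'numeric', or 'unknown'."""
    if isinstance(value, ast.JoinedStr):  # f-string
        return "string"
    if isinstance(value, ast.Constant):
        if isinstance(value.value, str):
            return "string"
        if isinstance(value.value, (int, float)) and not isinstance(value.value, bool):
            return "numeric"
    if isinstance(value, ast.Call) and isinstance(value.func, ast.Name) and value.func.id == "str":
        return "string"
    if isinstance(value, ast.Name) and value.id in COUNTER_LIKE_NAMES:
        return "numeric"
    if isinstance(value, ast.BinOp) and isinstance(value.op, NUMERIC_ONLY_OPS):
        return "numeric"
    return "unknown"


def flag_string_concat(source: str) -> list[int]:
    """Return line numbers of `+=` statements inside loops whose RHS is definitely a string."""
    flagged: set[int] = set()
    for loop in ast.walk(ast.parse(source)):
        if not isinstance(loop, (ast.For, ast.While)):
            continue
        for node in ast.walk(loop):
            if (
                isinstance(node, ast.AugAssign)
                and isinstance(node.op, ast.Add)
                and rhs_verdict(node.value) == "string"
            ):
                flagged.add(node.lineno)
    return sorted(flagged)
```

An "unknown" verdict is deliberately not flagged: a statement such as `total += item` gives no evidence either way, and under the conservative policy a missed finding is preferred over a false positive.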

The Full Rule Catalog

Rule ID	Category	Pattern Detected	Impact
`PY-BOTTLENECK-201`	🔴 Critical	N+1 Query Pattern (data-loading calls inside loops)	Eliminates 100–10,000+ redundant I/O operations
`PY-BOTTLENECK-202`	🔴 Critical	Nested Loops — O(n²) complexity	Reduces to O(n) or O(n log n)
`PY-BOTTLENECK-203`	🔴 Critical	Tight Loop Hotspots (regex, complex math in loops)	Enables vectorisation or loop-invariant hoisting
`PY-PERF-001`	🟡 Auto-Fix	String Concatenation in loops (+= on str)	O(n²) → O(n) via str.join()
`PY-PERF-002`	🟡 Auto-Fix	Inefficient Membership (x in dict.keys())	Avoids view object creation and method lookup
`PY-PERF-003`	🟡 Auto-Fix	Manual .append() loops (convert to list comprehension)	Bytecode-level optimisation; only if loop is "pure"
`PY-ARCH-101`	🟠 Manual	Repeated Computation (method calls as subscripts in loops)	10-100x reduction via caching or precomputation
`PY-ARCH-102`	🟠 Manual	Inefficient List Tests (x in large_list)	O(n) → O(1) by converting to set
`PY-ARCH-103`	🟠 Manual	Unbatched I/O (open() or write() inside loops)	10-1,000x reduction in expensive system calls
`PY-ARCH-104`	🟠 Manual	Large List-Building (should return generators)	90%+ memory reduction for large datasets
`PY-ARCH-105`	🟠 Manual	Repeated Attribute Access (obj.attr in tight loops)	2-5x speedup by caching as local variable
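
To make a few of the catalogued patterns concrete, the sketch below pairs the flagged form with the suggested rewrite for PY-PERF-001, PY-PERF-002, PY-PERF-003, and PY-ARCH-102. The function names are hypothetical and the rewrites are illustrative, not output copied from the tool.

```python
def concat_in_loop(parts):
    """PY-PERF-001 pattern: repeated += on a str inside a loop."""
    result = ""
    for part in parts:
        result += part
    return result


def concat_with_join(parts):
    """Suggested rewrite: build the string once with str.join()."""
    return "".join(parts)


def has_key_via_keys(mapping, key):
    """PY-PERF-002 pattern: membership test through .keys()."""
    return key in mapping.keys()


def has_key_direct(mapping, key):
    """Suggested rewrite: test membership on the dict itself."""
    return key in mapping


def squares_with_append(numbers):
    """PY-PERF-003 pattern: a pure loop that only appends."""
    squares = []
    for n in numbers:
        squares.append(n * n)
    return squares


def squares_with_comprehension(numbers):
    """Suggested rewrite: list comprehension."""
    return [n * n for n in numbers]


def keep_allowed_list(candidates, allowed):
    """PY-ARCH-102 pattern: each `in` test scans the whole list."""
    return [c for c in candidates if c in allowed]


def keep_allowed_set(candidates, allowed):
    """Suggested rewrite: convert once to a set (items must be hashable)."""
    allowed_set = set(allowed)
    return [c for c in candidates if c in allowed_set]
```

Each pair returns the same result for the same input; only the cost of getting there changes.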

Outcomes

🎯 0% False Positive Rate

From 100% (V1) to 0% (V2+), enabling full autonomous agent deployment without human review

⚡ 120 Files/Second

Hardware-aware multiprocessing: ProcessPoolExecutor with dynamic core detection bypasses Python's GIL

💰 4-8 min Saved per Finding

Auto-fixable rules save developer manual refactoring time; architectural findings demonstrate 10-60x real-world speedups

🤖 AI-Agent Ready

Deterministic line numbers from CST metadata enable precise autonomous code edits; priority_score enables triage ordering
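
A finding record of the kind an agent could consume might be serialised as in the sketch below. Only the line number and priority_score are named in this write-up; every other field (rule_id, path, message), the Finding class, and the to_report function are hypothetical and do not reflect the tool's actual JSON schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; the field set is assumed for illustration."""

    rule_id: str
    path: str
    line: int
    priority_score: int
    message: str


def to_report(findings: list[Finding]) -> str:
    """Serialise findings as JSON, highest priority first, in a stable order."""
    ordered = sorted(findings, key=lambda f: (-f.priority_score, f.path, f.line))
    # sort_keys keeps the key order identical between runs.
    return json.dumps([asdict(f) for f in ordered], indent=2, sort_keys=True)
```

Sorting on priority_score, then path, then line means the same set of findings always yields byte-identical output, regardless of the order in which worker processes returned them.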
