Local Research Project

🕸️ Project Clarity

Code Intelligence & Knowledge Graph Visualisation: transforming opaque Python repositories into interactive, human-readable dependency maps with semantic decomposition and zero-loss reconstruction.

100% Code Reconstruction
V5 Current Version
3 Abstraction Levels
1000+ Node Graph Scale
Python · AST · Pyvis / vis.js · Streamlit · Docker · GitPython · Radon · Pydantic

TL;DR: Quick Summary

The Core Idea

Most code understanding tools rely on text-based search: grep, regular expressions, static text analysis. Project Clarity takes a fundamentally different approach. It treats a Python codebase as a graph of logical entities: every function, class, and module is a node, and every call or import relationship is an edge weighted by frequency (how many times does A call B?).

This graph representation enables capabilities that text search cannot offer: visualising hidden circular dependencies, identifying "god functions" with hundreds of dependents, finding islands of dead code, simulating execution flows, and, most distinctively, demonstrating that the system captures the code completely by reconstructing the entire repository from the extracted graph.
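The analyses listed above can be sketched on a small weighted graph. This example assumes a hypothetical representation (a dict mapping each caller to a `Counter` of callees and call counts), not necessarily the one Project Clarity uses internally. It treats circular dependencies as strongly connected components found with Tarjan's algorithm, ranks fan-in (number of distinct callers) to surface "god function" candidates, and flags nodes unreachable from chosen entry points as possible dead code. That reachability definition of dead code, and every function name here, are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical representation: caller -> Counter({callee: number_of_calls}).
Graph = dict[str, Counter]


def all_nodes(graph: Graph) -> set[str]:
    """Every node that appears as a caller or as a callee."""
    return set(graph) | {callee for callees in graph.values() for callee in callees}


def circular_groups(graph: Graph) -> list[list[str]]:
    """Tarjan's SCCs: components with more than one node (or a self-call) are cycles."""
    index: dict[str, int] = {}
    low: dict[str, int] = {}
    stack: list[str] = []
    on_stack: set[str] = set()
    groups: list[list[str]] = []

    def visit(node: str) -> None:
        index[node] = low[node] = len(index)  # discovery order
        stack.append(node)
        on_stack.add(node)
        for callee in graph.get(node, {}):
            if callee not in index:
                visit(callee)
                low[node] = min(low[node], low[callee])
            elif callee in on_stack:
                low[node] = min(low[node], index[callee])
        if low[node] == index[node]:  # node is the root of a component
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            if len(component) > 1 or node in graph.get(node, {}):
                groups.append(sorted(component))

    for node in sorted(all_nodes(graph)):
        if node not in index:
            visit(node)
    return groups


def fan_in(graph: Graph) -> Counter:
    """Distinct callers per function; high values suggest 'god function' candidates."""
    return Counter(callee for callees in graph.values() for callee in callees)


def unreachable(graph: Graph, entry_points: list[str]) -> set[str]:
    """Nodes that no entry point can reach: candidate dead-code islands."""
    seen: set[str] = set()
    frontier = list(entry_points)
    while frontier:
        node = frontier.pop()
        if node not in seen:
            seen.add(node)
            frontier.extend(graph.get(node, {}))
    return all_nodes(graph) - seen


graph = {
    "main": Counter({"load": 1, "process": 3}),
    "process": Counter({"validate": 5, "helper": 2}),
    "validate": Counter({"helper": 1}),
    "helper": Counter({"process": 1}),     # closes a cycle back to process
    "old_export": Counter({"helper": 1}),  # nothing calls this function
}
print(circular_groups(graph))        # [['helper', 'process', 'validate']]
print(fan_in(graph).most_common(1))  # [('helper', 3)]
print(unreachable(graph, ["main"]))  # {'old_export'}
```

`visit` is recursive for brevity; on very deep call chains it can exceed Python's default recursion limit, so an iterative variant would be safer for large repositories.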

Technical Architecture

📥 Data Acquisition Layer

Two input modes: (1) remote repositories, cloned from any public or private GitHub repository via GitPython into secured temporary storage; (2) local directories, analysed directly on the filesystem, with MD5-based change detection to manage state across incremental runs.
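A minimal sketch of MD5-based change detection for the local-directory mode follows. It assumes per-file hashes are snapshotted to a JSON state file between runs; the file name, its format, and `detect_changes` itself are hypothetical. MD5 serves here only as a fast fingerprint, not for security.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """MD5 of a file's bytes, read in 64 KiB blocks (change detection, not security)."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def detect_changes(root: Path, state_file: Path) -> dict[str, list[str]]:
    """Diff current .py file hashes against the snapshot saved by the previous run."""
    previous = json.loads(state_file.read_text()) if state_file.exists() else {}
    current = {p.relative_to(root).as_posix(): file_md5(p) for p in sorted(root.rglob("*.py"))}
    state_file.write_text(json.dumps(current, indent=2))  # baseline for the next run
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "modified": sorted(k for k in current.keys() & previous.keys() if current[k] != previous[k]),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    state = root / ".clarity_state.json"  # hypothetical state-file name
    (root / "a.py").write_text("x = 1\n")
    first = detect_changes(root, state)
    (root / "a.py").write_text("x = 2\n")
    (root / "b.py").write_text("y = 1\n")
    second = detect_changes(root, state)

print(first)   # {'added': ['a.py'], 'removed': [], 'modified': []}
print(second)  # {'added': ['b.py'], 'removed': [], 'modified': ['a.py']}
```

Only files listed under "added" or "modified" would then need re-parsing on an incremental run.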

🧠 Semantic Intelligence Engine

A custom AST NodeVisitor performs three passes:
First Pass (Identification): locates all functions and classes, along with their source segments and line numbers.
Second Pass (Dependency Mapping): resolves aliased imports and tracks inter-function calls.
Third Pass (Data Linkage): tracks variables assigned from function return values.
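The three passes can be sketched with the standard-library `ast` module. Everything named here (`IdentificationPass`, `collect_aliases`, `resolve_call`, `DependencyPass`, `DataLinkPass`) is hypothetical, and chunks are keyed by bare name for brevity, whereas Project Clarity's CodeChunk IDs derive from file path and signature. Only `name(...)` and `module.name(...)` call shapes are resolved, and data linkage covers simple `var = call(...)` assignments.

```python
import ast
from collections import Counter, defaultdict
from typing import Optional


class IdentificationPass(ast.NodeVisitor):
    """Pass 1: record each function/class with its exact source and line span."""
    def __init__(self, source: str):
        self.source = source
        self.chunks: dict[str, dict] = {}  # keyed by bare name, a simplification

    def _record(self, node):
        segment = ast.get_source_segment(self.source, node)
        self.chunks[node.name] = {"start_line": node.lineno, "end_line": node.end_lineno, "source": segment}
        self.generic_visit(node)  # keep descending so nested defs are found too

    visit_FunctionDef = visit_AsyncFunctionDef = visit_ClassDef = _record


def collect_aliases(tree: ast.AST) -> dict[str, str]:
    """Pass 2a: map each locally bound import name to the qualified name it refers to."""
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for a in node.names:
                local = a.asname or a.name.split(".")[0]  # `import a.b` binds `a`
                aliases[local] = a.name if a.asname else local
        elif isinstance(node, ast.ImportFrom):
            prefix = "." * node.level + (node.module or "")
            for a in node.names:
                aliases[a.asname or a.name] = f"{prefix}.{a.name}" if node.module else prefix + a.name
    return aliases


def resolve_call(func: ast.expr, aliases: dict[str, str]) -> Optional[str]:
    """Turn `f(...)` or `mod.f(...)` into a qualified name; other call shapes return None."""
    if isinstance(func, ast.Name):
        return aliases.get(func.id, func.id)
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{aliases.get(func.value.id, func.value.id)}.{func.attr}"
    return None


class DependencyPass(ast.NodeVisitor):
    """Pass 2b: weighted call counts, attributed to the innermost enclosing function."""
    def __init__(self, aliases: dict[str, str]):
        self.aliases = aliases
        self.calls: dict[str, Counter] = defaultdict(Counter)
        self._stack: list[str] = []

    def visit_FunctionDef(self, node):
        self._stack.append(node.name)
        self.generic_visit(node)
        self._stack.pop()

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Call(self, node):
        name = resolve_call(node.func, self.aliases)
        if name and self._stack:
            self.calls[self._stack[-1]][name] += 1
        self.generic_visit(node)


class DataLinkPass(ast.NodeVisitor):
    """Pass 3: link `var = some_function(...)` to the function that produced the value."""
    def __init__(self, aliases: dict[str, str]):
        self.aliases = aliases
        self.links: list[tuple[str, str]] = []

    def visit_Assign(self, node):
        producer = resolve_call(node.value.func, self.aliases) if isinstance(node.value, ast.Call) else None
        for target in node.targets:
            if producer and isinstance(target, ast.Name):
                self.links.append((target.id, producer))
        self.generic_visit(node)


SOURCE = '''
import numpy as np
from os import path as p

def load(name):
    return p.join("data", name)

def process(name):
    raw = load(name)
    arr = np.array([1, 2])
    return np.sum(arr) + np.sum(arr)
'''
tree = ast.parse(SOURCE)  # parsed only, never executed, so numpy is not required
aliases = collect_aliases(tree)
ident, deps, linker = IdentificationPass(SOURCE), DependencyPass(aliases), DataLinkPass(aliases)
for visitor in (ident, deps, linker):
    visitor.visit(tree)
print(dict(deps.calls["process"]))  # {'load': 1, 'numpy.array': 1, 'numpy.sum': 2}
print(linker.links)  # [('raw', 'load'), ('arr', 'numpy.array')]
```

Collecting aliases with `ast.walk` before counting calls means an import placed below a function body is still resolved. A call such as `self.helper()` is recorded simply as `self.helper`; a fuller implementation would need type or scope information to attribute it to a specific method.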

🎨 Visualisation Layer

A Streamlit dashboard presents the graph at three abstraction levels (Function, File, Directory). Interactive, physics-based force-directed graphs are rendered via Pyvis (vis.js under the hood): nodes repel one another, while edges attract connected nodes according to dependency weight. Rendering is optimised for graphs of 1000+ nodes.
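Switching between the Function, File, and Directory levels amounts to collapsing the function-level graph. The sketch below assumes a hypothetical `owner` mapping from each function ID to the file that defines it; edge weights are summed when several function-level calls collapse into one coarser edge, and calls that stay within a single file or directory are dropped as internal. `roll_up` is an illustrative name, not part of the project's documented code.

```python
from collections import Counter
from pathlib import PurePosixPath


def roll_up(function_edges: Counter, owner: dict[str, str], level: str) -> Counter:
    """Collapse Counter({(caller, callee): calls}) to 'function', 'file' or 'directory' level."""
    if level == "function":
        return Counter(function_edges)
    if level not in ("file", "directory"):
        raise ValueError(f"unknown level: {level!r}")

    def bucket(func_id: str) -> str:
        path = PurePosixPath(owner[func_id])
        return str(path) if level == "file" else str(path.parent)

    rolled: Counter = Counter()
    for (caller, callee), weight in function_edges.items():
        src, dst = bucket(caller), bucket(callee)
        if src != dst:  # calls inside one file/directory become internal, not edges
            rolled[(src, dst)] += weight  # parallel calls merge into one heavier edge
    return rolled


# Hypothetical example data.
edges = Counter({
    ("load", "parse"): 2,
    ("parse", "validate"): 4,
    ("load", "validate"): 1,
    ("main", "load"): 1,
})
owner = {
    "load": "io/reader.py",
    "parse": "io/parser.py",
    "validate": "core/checks.py",
    "main": "app.py",
}
files = roll_up(edges, owner, "file")
dirs = roll_up(edges, owner, "directory")
print(dict(dirs))  # {('io', 'core'): 5, ('.', 'io'): 1}
```

Each resulting `(src, dst)` pair and weight could then be handed to Pyvis by adding both endpoints with `net.add_node(...)` and calling `net.add_edge(src, dst, value=weight)`, letting vis.js scale edge width by call count; whether Project Clarity maps weight this way is an assumption.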

🔁 Rebuild Engine

The repo_builder module takes CodeChunk objects, sorts them by their original line numbers, and reconstructs the entire repository from scratch, using Black to format the code automatically during the rebuild. It serves as both a proof of correctness and a codebase-wide refactoring tool.
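The rebuild step can be sketched as follows, assuming chunks tile each file completely (module-level code such as imports is a chunk too, otherwise it would be lost) and using a hypothetical minimal `Chunk` type in place of the full CodeChunk. Each file's chunks are sorted by `start_line`, and blank lines pad any gaps so every chunk lands on its original line. The `same_semantics` helper is an illustrative addition: it compares `ast.dump` output, similar in spirit to Black's own AST-equivalence safety check, to confirm that a rebuilt file is structurally unchanged.

```python
import ast
import tempfile
from collections import defaultdict
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class Chunk:
    """Hypothetical minimal stand-in for a CodeChunk."""
    file_path: str   # path relative to the repository root
    start_line: int  # 1-based, as reported by ast
    end_line: int
    source: str


def rebuild(chunks, out_dir: Path, formatter: Optional[Callable[[str], str]] = None) -> dict[str, str]:
    """Write every file back out, placing each chunk at its original line number."""
    by_file: dict[str, list[Chunk]] = defaultdict(list)
    for chunk in chunks:
        by_file[chunk.file_path].append(chunk)
    written = {}
    for rel_path, file_chunks in by_file.items():
        lines: list[str] = []
        for chunk in sorted(file_chunks, key=lambda c: c.start_line):
            lines.extend([""] * (chunk.start_line - 1 - len(lines)))  # pad gaps between chunks
            lines.extend(chunk.source.splitlines())
        text = "\n".join(lines) + "\n"
        if formatter is not None:
            text = formatter(text)  # e.g. a Black wrapper
        target = out_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)
        written[rel_path] = text
    return written


def same_semantics(original: str, rebuilt: str) -> bool:
    """ast.dump omits line/column info by default, so only structure is compared."""
    return ast.dump(ast.parse(original)) == ast.dump(ast.parse(rebuilt))


original = "import math\n\ndef area(r):\n    return math.pi * r ** 2\n"
chunks = [  # deliberately out of order to exercise the sort
    Chunk("geo/shapes.py", 3, 4, "def area(r):\n    return math.pi * r ** 2"),
    Chunk("geo/shapes.py", 1, 1, "import math"),
]
with tempfile.TemporaryDirectory() as tmp:
    rebuilt = rebuild(chunks, Path(tmp))["geo/shapes.py"]
print(rebuilt == original)                # True
print(same_semantics(original, rebuilt))  # True
```

To format during the rebuild, a Black wrapper could be passed as `formatter=lambda s: black.format_str(s, mode=black.Mode())`; `same_semantics` should still hold afterwards, since Black is designed to preserve the AST.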

The CodeChunk Data Model

Every logical block of code is encapsulated as a CodeChunk Pydantic object with:

Unique ID: deterministic hash of file path + function signature
Source Segment: exact code with start_line and end_line for perfect reconstruction
Dependency Counter: weighted adjacency list (not just who is called, but how many times)
Complexity Metrics: Cyclomatic Complexity via Radon, plus Lines of Code
Inputs/Outputs: extracted from function signatures and return statements
Hash-based Identity: detects code changes across incremental runs
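The fields above can be sketched as follows. Project Clarity uses a Pydantic model; to keep the example dependency-free, a standard-library dataclass stands in for it, and the field names, the SHA-256 choice, and the `path::signature` key format are all assumptions. Complexity is left as a plain field that a real pipeline might fill from Radon (for example via `radon.complexity.cc_visit`).

```python
import hashlib
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CodeChunk:
    """Dependency-free stand-in for the Pydantic CodeChunk; field names are hypothetical."""

    file_path: str
    signature: str                    # e.g. "def load(name)"
    start_line: int                   # with end_line, locates the exact source segment
    end_line: int
    source: str
    dependencies: Counter = field(default_factory=Counter)  # callee -> call count
    cyclomatic_complexity: Optional[int] = None             # would be filled from Radon
    inputs: list[str] = field(default_factory=list)         # parameter names
    outputs: list[str] = field(default_factory=list)        # returned names/expressions

    @property
    def chunk_id(self) -> str:
        """Deterministic: the same file path + signature always yields the same ID."""
        key = f"{self.file_path}::{self.signature}"
        return hashlib.sha256(key.encode()).hexdigest()[:16]

    @property
    def content_hash(self) -> str:
        """Changes whenever the body changes, flagging the chunk on incremental runs."""
        return hashlib.sha256(self.source.encode()).hexdigest()

    @property
    def lines_of_code(self) -> int:
        return self.end_line - self.start_line + 1


old = CodeChunk("pkg/io.py", "def load(name)", 5, 6,
                "def load(name):\n    return open(name).read()",
                dependencies=Counter({"open": 1}), inputs=["name"])
new = CodeChunk("pkg/io.py", "def load(name)", 5, 7,
                "def load(name):\n    with open(name) as f:\n        return f.read()",
                dependencies=Counter({"open": 1}), inputs=["name"])
print(old.chunk_id == new.chunk_id)          # True: same identity across runs
print(old.content_hash == new.content_hash)  # False: the body changed
print(old.lines_of_code, new.lines_of_code)  # 2 3
```

Separating `chunk_id` (stable identity) from `content_hash` (current body) is what lets an incremental run recognise the same function as edited rather than as a brand-new node.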

The V1 → V5 Development Journey

Version | Milestone
V1.0 | Parser Foundation: first recursive AST visitor for Python files; proved the CodeChunk concept.
V2.0 | Visual Revolution: Pyvis + Streamlit integration; interactive force-directed graphs became the core UI.
V3.0 | Intelligence Layer: Radon for complexity metrics, GitPython for remote repo handling, aliased import resolution.
V4.0 | Rebuild Engine: repo_builder proved semantic completeness by reconstructing repos from graph data.
V5.0 (Current) | Containerisation & Scale: full Docker support, optimised rendering for 1000+ node graphs, DataLink model for data flow tracking.

Outcomes

🔁 100% Semantic Reconstruction Accuracy
Rebuilt repositories run identically to the originals, demonstrating complete semantic decomposition.
🔬 X-Ray Code Analysis
Revealed hidden circular imports and high-complexity hotspots in real-world research codebases
🚀 Zero-Config Deployment
Docker: any developer can analyse their entire codebase with a single command
📚 Educational Use
Used as a pedagogical tool to teach code modularity and dependency management principles