Code Intelligence & Knowledge Graph Visualisation: transforming opaque Python repositories into interactive, human-readable dependency maps with semantic decomposition and zero-loss reconstruction.
Most code-understanding tools rely on text-based search: grep, regex, static text analysis. Project Clarity takes a fundamentally different approach. It treats a Python codebase as a graph of logical entities, where every function, class, and module is a node, and every call or import relationship is a weighted edge (how many times does A call B?).
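As a rough illustration of the weighted-edge idea, the sketch below counts call sites between entities. The entity names and the list of `(caller, callee)` pairs are invented for the example and are not taken from Project Clarity's internals.

```python
from collections import Counter

# Hypothetical (caller, callee) pairs, one entry per call site found in the source.
call_sites = [
    ("app.main", "utils.load"),
    ("app.main", "utils.load"),
    ("app.main", "core.run"),
    ("core.run", "utils.load"),
]

# Edge weight = number of call sites from caller to callee.
edge_weights = Counter(call_sites)

# Nodes are every entity that appears on either end of an edge.
nodes = sorted({name for edge in edge_weights for name in edge})

for (caller, callee), weight in sorted(edge_weights.items()):
    print(f"{caller} -> {callee} (weight {weight})")
```

A `Counter` keyed by edge keeps the structure flat and easy to hand to a graph library later; a real implementation would presumably also store node metadata such as file paths and line numbers.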
This graph representation enables capabilities impossible with text search: visualising hidden circular dependencies, identifying "god functions" with hundreds of dependents, finding dead code islands, simulating execution flows, and, most distinctively, proving that the system understands the code completely by reconstructing the entire repository from the extracted graph.
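Three of these analyses reduce to standard graph operations. The following is a minimal sketch on a toy adjacency map, not the project's actual code: a depth-first search for one cycle, an in-degree count to surface heavily depended-upon nodes, and a reachability sweep from assumed entry points to find dead code. The graph contents and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical dependency graph: node -> set of nodes it calls or imports.
graph = {
    "a": {"b"},
    "b": {"c"},
    "c": {"a"},          # a -> b -> c -> a is a circular dependency
    "main": {"a", "helper"},
    "helper": set(),
    "orphan": set(),     # nothing reaches this node
}

def find_cycle(graph):
    """Return one cycle as a list of nodes, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        colour[node] = GREY
        stack.append(node)
        for nxt in sorted(graph.get(node, ())):
            state = colour.get(nxt, WHITE)
            if state == GREY:            # back edge: nxt is on the current path
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        colour[node] = BLACK
        return None

    for node in sorted(graph):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

def dependents(graph):
    """Count how many distinct nodes depend on each node (in-degree)."""
    counts = defaultdict(int)
    for callees in graph.values():
        for callee in callees:
            counts[callee] += 1
    return dict(counts)

def unreachable(graph, entry_points):
    """Return nodes that cannot be reached from any entry point."""
    seen, todo = set(), list(entry_points)
    while todo:
        node = todo.pop()
        if node not in seen:
            seen.add(node)
            todo.extend(graph.get(node, ()))
    return set(graph) - seen

print(find_cycle(graph))
print(dependents(graph))
print(unreachable(graph, ["main"]))
```

The recursive search is fine for a sketch; on very deep call chains an iterative version would avoid Python's recursion limit.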
Two input modes are supported. (1) Remote repositories: clones any public or private GitHub repo via GitPython into secured temporary storage. (2) Local directories: direct filesystem analysis, with MD5-based change detection to manage state across incremental runs.
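MD5-based change detection for the local mode could look roughly like the sketch below, which fingerprints every `.py` file and compares two snapshots. The function names and the snapshot layout are assumptions made for illustration; how Project Clarity actually persists its state is not described here. MD5 serves only as a content fingerprint in this context, not as a security measure.

```python
import hashlib
from pathlib import Path

def file_md5(path):
    """Hash a file in blocks so large files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()

def snapshot(root):
    """Map each Python file (path relative to root) to its MD5 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_md5(p)
        for p in sorted(root.rglob("*.py"))
    }

def diff_snapshots(old, new):
    """Classify files as added, removed, or changed between two snapshots."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(k for k in set(old) & set(new) if old[k] != new[k]),
    }
```

On an incremental run, only the files listed under `added` and `changed` would need to be re-parsed, and nodes belonging to `removed` files could be dropped from the graph.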
A custom AST NodeVisitor performs three passes. First Pass (Identification): locates all functions and classes, with their source segments and line numbers. Second Pass (Dependency Mapping): resolves aliased imports and tracks inter-function calls. Third Pass (Data Linkage): tracks variable assignments from function returns.
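The three passes can be approximated with the standard `ast` module. The sketch below uses one small visitor per pass; the class names, the dictionary layouts, and the sample source are all hypothetical, and the real visitor may well combine the passes or handle many more cases (methods, nested scopes, relative imports).

```python
import ast

SOURCE = '''\
import numpy as np
from os import path as p

def load(name):
    return p.join("data", name)

def mean(values):
    return np.mean(values)

def main():
    f = load("x.csv")
    return mean([1, 2, 3])
'''

class DefinitionVisitor(ast.NodeVisitor):
    """Pass 1: record each function and class with its line span and source."""
    def __init__(self, source):
        self.source = source
        self.definitions = {}

    def _record(self, node):
        self.definitions[node.name] = {
            "kind": type(node).__name__,
            "lines": (node.lineno, node.end_lineno),
            "source": ast.get_source_segment(self.source, node),
        }
        self.generic_visit(node)

    visit_FunctionDef = visit_AsyncFunctionDef = visit_ClassDef = _record

class DependencyVisitor(ast.NodeVisitor):
    """Pass 2: resolve import aliases and record calls made in each function."""
    def __init__(self):
        self.aliases = {}      # local name -> fully qualified name
        self.calls = {}        # function name -> list of resolved callees
        self._current = None

    def visit_Import(self, node):
        for alias in node.names:
            self.aliases[alias.asname or alias.name] = alias.name

    def visit_ImportFrom(self, node):
        prefix = f"{node.module}." if node.module else ""
        for alias in node.names:
            self.aliases[alias.asname or alias.name] = prefix + alias.name

    def visit_FunctionDef(self, node):
        outer, self._current = self._current, node.name
        self.calls.setdefault(node.name, [])
        self.generic_visit(node)
        self._current = outer

    def visit_Call(self, node):
        if self._current is not None:
            self.calls[self._current].append(self.resolve(node.func))
        self.generic_visit(node)

    def resolve(self, func):
        """Turn `np.mean` into `numpy.mean` using the alias table."""
        parts = []
        while isinstance(func, ast.Attribute):
            parts.append(func.attr)
            func = func.value
        if isinstance(func, ast.Name):
            parts.append(self.aliases.get(func.id, func.id))
        return ".".join(reversed(parts))

class DataLinkVisitor(ast.NodeVisitor):
    """Pass 3: record variables assigned directly from a call's return value."""
    def __init__(self, resolve):
        self.resolve = resolve
        self.links = []        # (variable name, producing callee)

    def visit_Assign(self, node):
        if isinstance(node.value, ast.Call):
            producer = self.resolve(node.value.func)
            for target in node.targets:
                if isinstance(target, ast.Name):
                    self.links.append((target.id, producer))
        self.generic_visit(node)

tree = ast.parse(SOURCE)
defs = DefinitionVisitor(SOURCE)
defs.visit(tree)
deps = DependencyVisitor()
deps.visit(tree)
data = DataLinkVisitor(deps.resolve)
data.visit(tree)
print(deps.calls)
print(data.links)
```

Running the dependency pass after identification means every call can be matched against a known definition, and alias resolution ensures that `np.mean` and `numpy.mean` end up as the same node.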
A Streamlit dashboard offers three abstraction levels (Function, File, Directory). Graphs are interactive, physics-based, and force-directed, rendered via Pyvis (vis.js under the hood): nodes repel and edges attract based on dependency weight. Rendering is optimised for graphs of 1000+ nodes.
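One plausible way to derive the File and Directory views is to project function-level edges onto coarser node identifiers and sum their weights. The sketch below assumes a hypothetical `path::function` identifier format and omits the Pyvis rendering step entirely; it only shows the aggregation.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical function-level edges: (caller, callee) -> weight.
function_edges = {
    ("pkg/app.py::main", "pkg/io/loader.py::load"): 2,
    ("pkg/app.py::main", "pkg/io/loader.py::parse"): 1,
    ("pkg/app.py::helper", "pkg/app.py::main"): 1,
    ("pkg/io/loader.py::load", "pkg/io/cache.py::get"): 3,
}

def to_file(node_id):
    """'pkg/app.py::main' -> 'pkg/app.py'"""
    return node_id.split("::", 1)[0]

def to_directory(node_id):
    """'pkg/io/loader.py::load' -> 'pkg/io'"""
    return str(PurePosixPath(to_file(node_id)).parent)

def aggregate(edges, project):
    """Collapse edges onto a coarser level, summing weights."""
    out = Counter()
    for (src, dst), weight in edges.items():
        a, b = project(src), project(dst)
        if a != b:               # drop edges that collapse into a self-loop
            out[(a, b)] += weight
    return dict(out)

file_edges = aggregate(function_edges, to_file)
directory_edges = aggregate(function_edges, to_directory)
print(file_edges)
print(directory_edges)
```

Coarser levels have far fewer nodes and edges, which is one reason a directory-level view can stay readable where the function-level graph would be too dense.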
The repo_builder module takes CodeChunk objects, sorts them by original line number, and reconstructs the entire repository from scratch. It integrates with Black for automated code formatting during the rebuild, and serves as both a proof of correctness and a codebase-wide refactoring tool.
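The sort-and-reassemble step can be sketched as follows. `Chunk` is a hypothetical stdlib dataclass standing in for the project's CodeChunk model, with field names invented for the example. The sketch joins chunks with a blank line and does not run Black, so it illustrates only the ordering logic, not byte-exact reconstruction.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for CodeChunk; field names are assumptions."""
    file_path: str
    start_line: int
    source: str

def rebuild(chunks):
    """Group chunks by file, order them by start line, and join their source."""
    by_file = defaultdict(list)
    for chunk in chunks:
        by_file[chunk.file_path].append(chunk)
    return {
        path: "\n\n".join(
            c.source for c in sorted(group, key=lambda c: c.start_line)
        ) + "\n"
        for path, group in by_file.items()
    }

# Chunks arrive in arbitrary order, as they might after graph traversal.
chunks = [
    Chunk("m.py", 6, "def b():\n    return 2"),
    Chunk("m.py", 1, "import os"),
    Chunk("m.py", 3, "def a():\n    return 1"),
]
files = rebuild(chunks)
print(files["m.py"])
```

Returning a mapping of path to text keeps the sketch free of filesystem side effects; writing each entry to disk and formatting it would be separate steps.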
Every logical block of code is encapsulated as a CodeChunk Pydantic object with:
| Version | Milestone |
|---|---|
| V1.0 | Parser Foundation: first recursive AST visitor for Python files. Proved the CodeChunk concept. |
| V2.0 | Visual Revolution: Pyvis + Streamlit integration. Interactive force-directed graphs became the core UI. |
| V3.0 | Intelligence Layer: Radon for complexity metrics, GitPython for remote repo handling, aliased import resolution. |
| V4.0 | Rebuild Engine: repo_builder proved semantic completeness by reconstructing repos from graph data. |
| V5.0 (Current) | Containerisation & Scale: full Docker support, optimised rendering for 1000+ node graphs, DataLink model for data flow tracking. |