Local Research Project

🔱 Aegis Intelligence Platform

Instagram Data-as-a-Service (DaaS) engine: zero-API-cost intelligence extraction via stealth browser automation, Chrome Extension engineering, and multi-page analytics dashboards.

$0 API Cost
15 DaaS Products
20 Blockers Solved
4 Dashboard Pages
Python · Playwright · Streamlit · SQLite · Chrome MV3 · JavaScript · Plotly

TL;DR: Quick Summary

Why It Was Built

Instagram's official API allows only 200 calls/hour, requires business verification, and costs thousands per month at enterprise tier. The same data Instagram serves to its own users (posts, captions, engagement metrics, comment streams, audio trends) is observable through legitimate browser-based interception.

This project demonstrates that a skilled engineer can build the kind of intelligence pipeline that powers $500–$3,000/month SaaS products (Sprout Social, Brandwatch, Iconosquare) at $0 in API cost, using open-source tools, a stealth browser, and deep knowledge of Instagram's React frontend and GraphQL API architecture.

The Scraper Engine: How It Works

Key architecture: context.route("**/*") registers a Playwright route handler that acts as a transparent interception layer inside the browser. It sees every GraphQL response, parses it recursively, and saves the results to SQLite without modifying any traffic.

1. Session Bootstrap

A one-time manual login saves cookies and localStorage to state.json. All subsequent headless sessions load this state, so they inherit the trust of a genuine login event instead of facing Instagram's CAPTCHA and device-fingerprinting checks on every run.

2. Traffic Interception

Playwright routes ALL network traffic through handle_route(). Each intercepted response is fetched, inspected, and passed through unmodified to avoid detection.

3. Recursive JSON Parsing

Instead of hard-coding API schema paths (which change constantly), the parser recursively walks every dict and list, identifying posts by a heuristic signature: (shortcode OR code) AND (owner OR user). This makes the system resilient to Instagram's A/B format changes; it handles 5 known schema variants simultaneously.
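The recursive walk described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual parser: the function names (`looks_like_post`, `iter_posts`, `post_ids`) and both sample payloads are invented for the example, and the payload shapes are not real Instagram schemas.

```python
from typing import Any, Iterator

# Key sets taken from the heuristic signature in the text.
ID_KEYS = ("shortcode", "code")
OWNER_KEYS = ("owner", "user")


def looks_like_post(node: dict) -> bool:
    """(shortcode OR code) AND (owner OR user)."""
    return any(k in node for k in ID_KEYS) and any(k in node for k in OWNER_KEYS)


def iter_posts(node: Any) -> Iterator[dict]:
    """Depth-first walk over nested dicts/lists, yielding post-like dicts."""
    if isinstance(node, dict):
        if looks_like_post(node):
            yield node
        # Keep descending: a matched post may itself contain nested posts.
        for value in node.values():
            yield from iter_posts(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_posts(item)
    # Scalars (str, int, None, ...) carry no structure and are ignored.


def post_ids(payload: Any) -> list:
    """Collect the identifier of every post-like dict in a payload."""
    return [p.get("shortcode") or p.get("code") for p in iter_posts(payload)]


# Two hypothetical payload shapes that nest the post at different depths.
payload_a = {"data": {"feed": {"edges": [
    {"node": {"shortcode": "AAA", "owner": {"id": "1"}}},
]}}}
payload_b = {"items": [{"code": "BBB", "user": {"pk": 2}}]}

print(post_ids(payload_a))  # ['AAA']
print(post_ids(payload_b))  # ['BBB']
```

Because the walker matches on key presence instead of on a fixed path, the same function handles both shapes without a per-schema branch. The trade-off is the risk of false positives on unrelated objects that happen to carry the same keys.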

4. Anti-Detection Stack

playwright-stealth patches 23 fingerprinting signals. The browser presents a Windows Chrome 120 UA string. Delays are randomised to 2–5 s to mimic human timing, scroll depths vary randomly (800–1200 px), and probabilistic organic engagement (5% like, 3% save) maintains authentic trust ratios.

The GodMode Chrome Extension: The Hardest Problem

Embedding live Instagram content in Streamlit iframes requires working around several layers of browser security that Instagram enforces, plus the resource limits that appear when many frames load at once:

| Blocker | Engineering Solution |
| --- | --- |
| X-Frame-Options + CSP frame-ancestors: browsers refuse to render the iframe | declarativeNetRequest MV3 rules strip these headers at the network layer before the browser processes them |
| JavaScript framebusters: JS redirects the parent tab if the page is in an iframe | Initialise iframes on /explore/; this origin does not execute Instagram's framebuster payload |
| React's isTrusted=true lock: synthetic click events are rejected by navigation components | Direct PopState Warp: manipulate window.history.pushState to trigger React Router without ever clicking |
| Session Expiration Death-Loop: all 16 iframes hit the login page simultaneously, triggering an IP ban | Authentication Killswitch: the first frame to hit login sets a localStorage flag; all subsequent frames halt before touching the server |
| Chromium 6-connection limit: the 7th–16th iframes are blocked | Row-Based Batching Matrix: 4 iframes/row, throttled by IntersectionObserver viewport gating |
| CPU saturation from 16 concurrent React SPA hydrations | 8,500 ms DOM evaluation timeout: gives the processor time to complete hydration before checking for rendered content |
| Speed-scroll DDoS cascade: fast scrolling triggers all IntersectionObservers simultaneously | Distributed Atomic Semaphore Lock in localStorage: enforces a 1,200 ms minimum gap between frame activations |
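The batching and minimum-gap rules in the last rows reduce to simple scheduling arithmetic. The sketch below models only that timing logic in Python. The extension itself would run in JavaScript against localStorage and IntersectionObserver, and the names `schedule_activations` and `batch_rows` are hypothetical helpers for this example.

```python
from typing import List, Sequence

MIN_GAP_MS = 1200  # minimum gap between frame activations, from the text
ROW_SIZE = 4       # iframes per row, from the text


def schedule_activations(requested_ms: Sequence[int],
                         min_gap_ms: int = MIN_GAP_MS) -> List[int]:
    """Return activation times in request order, pushing each one back
    until it is at least min_gap_ms after the previous activation."""
    scheduled: List[int] = []
    for t in sorted(requested_ms):
        if scheduled:
            t = max(t, scheduled[-1] + min_gap_ms)
        scheduled.append(t)
    return scheduled


def batch_rows(frame_ids: Sequence[int],
               row_size: int = ROW_SIZE) -> List[List[int]]:
    """Split frame ids into consecutive rows of at most row_size."""
    return [list(frame_ids[i:i + row_size])
            for i in range(0, len(frame_ids), row_size)]


# A fast scroll asks for three frames almost at once, then a fourth later.
print(schedule_activations([0, 10, 20, 5000]))  # [0, 1200, 2400, 5000]
print(batch_rows(range(16)))                    # four rows of four frame ids
```

Requests that arrive in a burst are spread out at 1,200 ms intervals, while a request that already arrives late enough keeps its original time.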

Dashboard Features: 15 Intelligence Products

🏆 Creator Tools

  • True-Fan Audience Mapper: identifies superfans from comment frequency analysis
  • Viral Formula Deconstructor: reverse-engineers competitor viral content patterns
  • Audio Trend Predictor: surfaces accelerating Reels audios before they peak

🛡️ Brand Tools

  • Brand-Safe Influencer Audit: bot-risk score (0–100) based on engagement ratios
  • PR Crisis Sentiment Thermometer: real-time keyword sentiment monitoring
  • Hyper-Local UGC Discovery: high-quality user content from specific locations
  • Competitor Ad Spy: identifies influencers and estimates competitor spend

📊 Market Intelligence

  • Authentic Consumer Review Aggregator: raw, unfiltered product feedback
  • Micro-Influencer Discovery Engine: 10K–50K follower accounts with >15% engagement
  • Geo-Fenced Event Analytics: organic social volume by physical location
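As an illustration of the Micro-Influencer Discovery filter, the sketch below applies the 10K–50K follower band and the >15% engagement threshold from the list above. The `Account` fields and the engagement formula (average likes plus average comments, divided by followers) are assumptions made for this example; the page does not say how the project computes engagement.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical record shape; field names are illustrative only.
    handle: str
    followers: int
    avg_likes: float
    avg_comments: float


def engagement_rate(account: Account) -> float:
    """Assumed formula: (avg likes + avg comments) / followers, in percent."""
    if account.followers <= 0:
        return 0.0
    return 100.0 * (account.avg_likes + account.avg_comments) / account.followers


def is_micro_influencer(account: Account,
                        min_followers: int = 10_000,
                        max_followers: int = 50_000,
                        min_rate: float = 15.0) -> bool:
    """True when the account is inside the follower band and above the rate."""
    in_band = min_followers <= account.followers <= max_followers
    return in_band and engagement_rate(account) > min_rate


accounts = [
    Account("high_engagement", 20_000, 3_000, 400),  # 17.0% engagement
    Account("low_engagement", 20_000, 1_000, 100),   # 5.5% engagement
    Account("too_large", 200_000, 50_000, 1_000),    # outside follower band
]
matches = [a.handle for a in accounts if is_micro_influencer(a)]
print(matches)  # ['high_engagement']
```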

⚡ Advanced Tools

  • Hashtag Optimiser: statistically optimal hashtag clusters from viral data
  • Customer Persona Builder: psychographic analysis from organic engager bios
  • Audience Poaching: frustrated competitor users as high-intent leads
  • Copyright Enforcer: DMCA-ready digital theft reports
  • Subculture Mapper: VC-grade influence web of emerging communities