Instagram Data-as-a-Service (DaaS) engine: zero-API-cost intelligence extraction via stealth browser automation, Chrome Extension engineering, and multi-page analytics dashboards.
Instagram's official API provides only 200 calls/hour, requires business verification, and costs thousands per month at enterprise tier. The same data Instagram serves to its own users (posts, captions, engagement metrics, comment streams, audio trends) is observable through legitimate browser-based interception.
This project proves that a skilled engineer can build the same intelligence pipeline that powers $500-$3,000/month SaaS products (Sprout Social, Brandwatch, Iconosquare) for $0, using open-source tools, a stealth browser, and deep knowledge of Instagram's React frontend and GraphQL API architecture.
One-time human login saves cookies/localStorage to state.json. All subsequent headless sessions load this state, bypassing Instagram's CAPTCHA and device fingerprinting entirely by inheriting trust from a genuine login event.
Playwright routes ALL network traffic through handle_route(). Each intercepted response is fetched, inspected, and passed through unmodified to avoid detection.
Instead of hard-coding API schema paths (which change constantly), the parser recursively walks every dict/list, identifying posts by heuristic signature: (shortcode OR code) AND (owner OR user). This makes the system resilient to Instagram's A/B format changes: it handles 5 known schema variants simultaneously.
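The signature-based walk described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual parser: the function names (looks_like_post, find_posts, post_id) and both sample payloads are hypothetical, and the payload shapes are invented for the example rather than copied from real Instagram responses.

```python
from typing import Any, Iterator, Optional

# Key names taken from the heuristic in the text:
# (shortcode OR code) AND (owner OR user).
ID_KEYS = ("shortcode", "code")
OWNER_KEYS = ("owner", "user")


def looks_like_post(node: dict) -> bool:
    """True when a dict carries both an id-like key and an owner-like key."""
    return any(k in node for k in ID_KEYS) and any(k in node for k in OWNER_KEYS)


def find_posts(node: Any) -> Iterator[dict]:
    """Recursively walk dicts and lists, yielding every dict that matches."""
    if isinstance(node, dict):
        if looks_like_post(node):
            yield node
        # Keep descending: a matched dict may still contain nested posts.
        for value in node.values():
            yield from find_posts(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_posts(item)
    # Scalars (str, int, None, ...) are ignored.


def post_id(node: dict) -> Optional[str]:
    """Return the identifier under whichever key this variant uses."""
    return node.get("shortcode") or node.get("code")


# Two invented payload shapes (assumptions for illustration only).
variant_a = {"data": {"feed": {"edges": [
    {"node": {"shortcode": "AAA111", "owner": {"id": "1"}}},
]}}}
variant_b = {"items": [
    {"code": "BBB222", "user": {"pk": "2"}, "caption": {"text": "hi"}},
], "status": "ok"}

ids = [post_id(p) for p in find_posts([variant_a, variant_b])]
print(ids)
```

Because the walk never names a path such as data.feed.edges, a response that nests the same post one level deeper or under a renamed wrapper is still matched. The trade-off is that any unrelated dict that happens to carry both kinds of key is also yielded, so a real parser would likely validate or deduplicate the results.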
playwright-stealth patches 23 fingerprinting signals. Windows Chrome 120 UA string. Randomised 2-5s human-timing delays. Random scroll depths (800-1200px). Probabilistic organic engagement (5% like, 3% save) to maintain authentic trust ratios.
Embedding live Instagram content in Streamlit iframes requires solving three layers of browser security that Instagram enforces, along with the session and resource limits that follow from loading 16 iframes at once:
| Blocker | Engineering Solution |
|---|---|
| X-Frame-Options + CSP frame-ancestors: browsers refuse to render iframe | declarativeNetRequest MV3 rules strip these headers at the network layer before the browser processes them |
| JavaScript Framebusters: JS redirects parent tab if in iframe | Initialise iframes on /explore/; this origin does not execute Instagram's framebuster payload |
| React's isTrusted=true lock: synthetic click events rejected by navigation components | Direct PopState Warp: manipulate window.history.pushState to trigger React Router without ever clicking |
| Session Expiration Death-Loop: all 16 iframes hit login page simultaneously, triggering IP ban | Authentication Killswitch: first frame to hit login sets localStorage flag; all subsequent frames halt before touching the server |
| Chromium 6-connection limit: 7th-16th iframes blocked | Row-Based Batching Matrix: 4 iframes/row, throttled by IntersectionObserver viewport gating |
| CPU saturation from 16 concurrent React SPA hydrations | 8,500ms DOM evaluation timeout: gives processor time to complete hydration before checking for rendered content |
| Speed-scroll DDoS cascade: fast scroll triggers all IntersectionObservers simultaneously | Distributed Atomic Semaphore Lock in localStorage: enforces 1,200ms minimum gap between frame activations |
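
The 1,200ms minimum-gap rule in the last row can be modelled independently of the browser. The sketch below is a language-neutral illustration written in Python, not the extension's code: a plain dict stands in for localStorage, the key name last_frame_activation_ms and the function try_activate are hypothetical, and timestamps are passed in explicitly so the behaviour is deterministic. It does not attempt to model races between frames that read the store at the same instant.

```python
from typing import MutableMapping

MIN_GAP_MS = 1200                       # minimum gap stated in the table
LOCK_KEY = "last_frame_activation_ms"   # hypothetical storage key


def try_activate(store: MutableMapping[str, int], now_ms: int) -> bool:
    """Allow an activation only if MIN_GAP_MS has passed since the last one.

    `store` plays the role of the shared localStorage area; on success the
    timestamp is recorded so later callers see the updated gap.
    """
    last = store.get(LOCK_KEY)
    if last is not None and now_ms - last < MIN_GAP_MS:
        return False  # too soon: this frame should wait and retry later
    store[LOCK_KEY] = now_ms
    return True


# Simulate a fast scroll: four frames become visible within 1.3 seconds.
shared = {}
arrivals_ms = [0, 100, 1200, 1300]
decisions = [try_activate(shared, t) for t in arrivals_ms]
print(decisions)
```

With these arrival times the first and third frames are admitted and the other two are deferred, which is the spacing effect the table describes. A frame that is refused would need its own retry policy, which the source does not specify.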