V3: Edge-First Streaming + Hybrid AI
The current state: Edge streaming with Upstash Redis caching, optional client-side AI via WebLLM, and conversational refinement. Experience sub-100ms cached responses and privacy-preserving local inference.
The Evolution
| Version | Architecture | Key Innovation | Status |
|---|---|---|---|
| V1 | Manual API | AI generates layouts (one-shot) | Complete |
| V2 | Agent-Native | Shared state + conversational refinement | Complete |
| V3 | Edge-Native | Edge streaming, Redis caching, optional client-side AI | Complete |
What V3 Delivers
V3 combines edge streaming with intelligent caching and optional client-side AI for the best of both worlds: fast cached responses and privacy-preserving local inference.
The Architecture
Cache Hit Path:
Request → Edge → Upstash Redis → <100ms response
Cache Miss Path:
Request → Edge → GPT-4o-mini → Stream → Cache → Response
Client AI Path (optional):
WebGPU → WebLLM → Local inference → Zero network latency
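Both server paths collapse into one cache-aside handler at the edge. Here is a minimal sketch: the `@upstash/redis` client and the Vercel Edge runtime config are real APIs, while `generateLayout()`, the cache key scheme, and the inline TTL values are assumptions standing in for the project's actual code.

```ts
// Cache-aside handler at the edge. generateLayout() and the key scheme are
// assumptions; @upstash/redis and the Vercel Edge runtime config are real.
import { Redis } from '@upstash/redis';

export const config = { runtime: 'edge' };

const redis = Redis.fromEnv(); // reads UPSTASH_REDIS_REST_URL / _TOKEN

// Placeholder for the GPT-4o-mini call that produces the NDJSON layout.
async function generateLayout(persona: string, query: string): Promise<string> {
  return JSON.stringify({ type: 'hero', persona, query }) + '\n';
}

export default async function handler(req: Request): Promise<Response> {
  const { persona, query } = await req.json();
  const key = `layout:${persona}:${query}`;

  // Hit path: Request → Edge → Upstash Redis → <100ms response.
  const cached = await redis.get<string>(key);
  if (cached) {
    return new Response(cached, {
      headers: { 'Content-Type': 'application/x-ndjson', 'X-Cache': 'HIT' },
    });
  }

  // Miss path: Request → Edge → GPT-4o-mini → Cache → Response.
  const layout = await generateLayout(persona, query);
  const ttl = persona === 'comparing' ? 4 * 3600 : 30 * 60; // persona-aware TTL
  await redis.set(key, layout, { ex: ttl });
  return new Response(layout, {
    headers: { 'Content-Type': 'application/x-ndjson', 'X-Cache': 'MISS' },
  });
}
```

The Redis Caching sketch further down expands the one-line TTL expression into a per-persona policy.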
The hybrid approach gives you:
- Upstash Redis caching with persona-aware TTL (30 min to 4 hr)
- Edge streaming via Vercel Edge Functions for global low latency
- WebLLM integration for browsers with WebGPU support
- Delta-based refinement for efficient partial layout updates
V3 Feature Set
Edge Streaming (Live)
NDJSON streaming from Vercel Edge Functions. Progressive component rendering with skeleton placeholders.
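On the client, progressive rendering is just a line reader over the response body: each complete NDJSON line parses into one component that can replace its skeleton. A sketch, where `renderComponent()` is a hypothetical stand-in for the real rendering hook; everything else is the standard fetch/streams API.

```ts
// Consume an NDJSON layout stream and render components as they arrive.
export async function consumeLayoutStream(url: string): Promise<void> {
  const res = await fetch(url);
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Every complete line is one JSON-encoded component.
    let newline: number;
    while ((newline = buffer.indexOf('\n')) >= 0) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line) renderComponent(JSON.parse(line));
    }
  }
}

function renderComponent(component: unknown): void {
  // Placeholder: swap the matching skeleton for the real component.
  console.log('render', component);
}
```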
Redis Caching (Live)
Upstash Redis for serverless caching. Persona-aware TTL ensures relevant content while reducing API costs.
- Goal-Oriented: 30 min TTL (deal freshness)
- Comparing: 4 hr TTL (stable comparisons)
- 60%+ cache hit rate expected
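The persona-aware policy can be a small lookup, sketched below. The two named personas and their TTLs come from the list above; the fallback value for unknown personas is an assumption.

```ts
// Persona-aware TTL policy. The 30 min / 4 hr values are the ones quoted
// above; the 1 hr fallback for unknown personas is an assumption.
const PERSONA_TTL_SECONDS: Record<string, number> = {
  'goal-oriented': 30 * 60,  // deals go stale quickly
  comparing: 4 * 60 * 60,    // comparison layouts are stable
};

export function ttlFor(persona: string): number {
  return PERSONA_TTL_SECONDS[persona] ?? 60 * 60;
}
```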
Refinement Chat (Live)
Delta-based refinement sends only changed components. Floating chat with persona-aware quick suggestions.
- "Show me deals under $50" → partial update
- "More lifestyle inspiration" → reorder
- ~70% bandwidth reduction
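A delta payload only needs to carry what changed: components to upsert, ids to remove, and an optional new ordering. The shapes below are illustrative assumptions, not the project's actual types.

```ts
// Illustrative delta shapes; the real component model may differ.
type Component = { id: string; type: string; props: Record<string, unknown> };

type LayoutDelta = {
  upsert?: Component[]; // changed or added ("Show me deals under $50")
  remove?: string[];    // component ids to drop
  order?: string[];     // full id ordering ("More lifestyle inspiration")
};

// Patch the current layout instead of re-fetching the whole thing.
function applyDelta(layout: Component[], delta: LayoutDelta): Component[] {
  const byId = new Map(layout.map((c) => [c.id, c]));
  for (const c of delta.upsert ?? []) byId.set(c.id, c);
  for (const id of delta.remove ?? []) byId.delete(id);
  const next = [...byId.values()];
  if (delta.order) {
    const rank = new Map(delta.order.map((id, i) => [id, i]));
    next.sort((a, b) => (rank.get(a.id) ?? 0) - (rank.get(b.id) ?? 0));
  }
  return next;
}
```

Transmitting a handful of changed components instead of the full layout is where the ~70% bandwidth reduction comes from.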
Client-Side AI (Optional)
WebLLM integration for browsers with WebGPU. Toggle between edge and local inference for a privacy-first experience.
- SmolLM2-360M: 200 MB, fast inference
- Phi-3-mini: higher quality, 2 GB
- Zero server calls when enabled
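A sketch of the local path using WebLLM's OpenAI-style engine API. The prebuilt model id shown follows web-llm's naming convention but should be verified against its current model list; the fallback behavior is an assumption.

```ts
// Local inference via WebLLM when WebGPU is available. The model id is
// assumed to match web-llm's prebuilt list; verify before shipping.
import { CreateMLCEngine } from '@mlc-ai/web-llm';

export async function localRefine(prompt: string): Promise<string | null> {
  if (!('gpu' in navigator)) return null; // no WebGPU: use the edge path

  // ~200 MB download on first use; cached by the browser afterwards.
  const engine = await CreateMLCEngine('SmolLM2-360M-Instruct-q4f16_1-MLC');

  const reply = await engine.chat.completions.create({
    messages: [{ role: 'user', content: prompt }],
  });
  return reply.choices[0].message.content ?? null;
}
```

Because the engine runs entirely in the browser, refinement requests never leave the device, which is what "zero server calls" above refers to.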
Performance Targets
- Cached responses: <100 ms from the edge
- Cache hit rate: 60%+ expected
- Refinement payloads: ~70% smaller via delta updates
Implementation Status
Route Restructuring
V3 at /shop (default), V2 at /shop/v2, V1 at /shop/v1. Unified routing with version-aware components.
Edge Streaming + Caching
NDJSON streaming from Edge Functions. Upstash Redis caching with persona-aware TTL. Sub-100ms cached responses.
Delta Refinement
Floating refinement chat with delta-based updates. Only changed components are transmitted and re-rendered.
Client-Side AI
WebLLM integration for browsers with WebGPU support. User can toggle between edge and local inference.