
V3: Edge-First Streaming + Hybrid AI

The current state: Edge streaming with Upstash Redis caching, optional client-side AI via WebLLM, and conversational refinement. Experience sub-100ms cached responses and privacy-preserving local inference.

The Evolution

Version | Architecture | Key Innovation | Status
V1 | Manual API | AI generates layouts (one-shot) | Complete
V2 | Agent-Native | Shared state + conversational refinement | Complete
V3 | Edge-Native | Edge streaming, Redis caching, optional client-side AI | Complete

What V3 Delivers

V3 combines edge streaming with intelligent caching and optional client-side AI for the best of both worlds: fast cached responses and privacy-preserving local inference.

The Architecture

Cache Hit Path:

Request → Edge → Upstash Redis → <100ms response

Cache Miss Path:

Request → Edge → GPT-4o-mini → Stream → Cache → Response

Client AI Path (optional):

WebGPU → WebLLM → Local inference → Zero network latency
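
As a sketch of how the first two paths fit together in one handler (assuming a Next.js Edge route and the @upstash/redis client; generateLayout is a hypothetical stand-in for the GPT-4o-mini call):

```typescript
// Sketch only: cache-hit vs. cache-miss flow at the edge.
// Assumes @upstash/redis and a Next.js Edge route handler.
import { Redis } from "@upstash/redis";

export const runtime = "edge";

const redis = Redis.fromEnv(); // reads UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN

// Hypothetical stand-in for the GPT-4o-mini layout generation call.
async function generateLayout(persona: string, query: string): Promise<string> {
  return JSON.stringify({ persona, query, components: [] }) + "\n";
}

export async function GET(request: Request): Promise<Response> {
  const { searchParams } = new URL(request.url);
  const persona = searchParams.get("persona") ?? "default";
  const query = searchParams.get("q") ?? "";
  const cacheKey = `layout:${persona}:${query}`;

  // Cache hit path: Request → Edge → Upstash Redis → <100ms response.
  const cached = await redis.get<string>(cacheKey);
  if (cached) {
    return new Response(cached, {
      headers: { "content-type": "application/x-ndjson", "x-cache": "HIT" },
    });
  }

  // Cache miss path: Request → Edge → GPT-4o-mini → Stream → Cache → Response.
  // The real route streams NDJSON as it is generated; this sketch awaits the
  // full result for brevity, then writes it back with a persona-aware TTL.
  const layout = await generateLayout(persona, query);
  await redis.set(cacheKey, layout, { ex: 30 * 60 }); // e.g. 30 min for goal-oriented
  return new Response(layout, {
    headers: { "content-type": "application/x-ndjson", "x-cache": "MISS" },
  });
}
```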

The hybrid approach gives you:

  • Upstash Redis caching with persona-aware TTL (30min - 4hr)
  • Edge streaming via Vercel Edge Functions for global low-latency
  • WebLLM integration for browsers with WebGPU support
  • Delta-based refinement for efficient partial layout updates

V3 Feature Set

Edge Streaming (Live)

NDJSON streaming from Vercel Edge Functions. Progressive component rendering with skeleton placeholders.

First paint: <200ms
Cached response: <100ms
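
To show the client side of progressive rendering, here is a minimal NDJSON reader; consumeLayoutStream and its onComponent callback are illustrative names, not the project's actual API. Each parsed line can replace one skeleton placeholder.

```typescript
// Minimal client-side NDJSON consumer: parse one JSON component per line
// and hand it to a rendering callback as soon as it arrives.
export async function consumeLayoutStream(
  url: string,
  onComponent: (component: unknown) => void,
): Promise<void> {
  const response = await fetch(url);
  if (!response.body) throw new Error("Streaming not supported in this browser");

  const reader = response.body.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = "";

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;

    // NDJSON: every complete line is an independent JSON document.
    let newlineIndex: number;
    while ((newlineIndex = buffer.indexOf("\n")) >= 0) {
      const line = buffer.slice(0, newlineIndex).trim();
      buffer = buffer.slice(newlineIndex + 1);
      if (line) onComponent(JSON.parse(line));
    }
  }

  // Flush a trailing line that arrived without a final newline.
  if (buffer.trim()) onComponent(JSON.parse(buffer.trim()));
}
```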

Redis Caching (Live)

Upstash Redis for serverless caching. Persona-aware TTL ensures relevant content while reducing API costs.

Goal-Oriented: 30min TTL (deal freshness)
Comparing: 4hr TTL (stable comparisons)
Expected cache hit rate: 60%+
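
A persona-aware TTL can be as simple as a lookup keyed by persona; the values below mirror the two TTLs above, and the 1-hour fallback is an assumption.

```typescript
// Illustrative persona → TTL mapping (seconds); the default fallback is an assumption.
const PERSONA_TTL_SECONDS: Record<string, number> = {
  "goal-oriented": 30 * 60,  // 30 min: deal freshness matters
  "comparing": 4 * 60 * 60,  // 4 hr: comparisons are stable
};

export function ttlFor(persona: string): number {
  return PERSONA_TTL_SECONDS[persona] ?? 60 * 60; // 1 hr default
}
```

In the edge-route sketch above, this value would be passed as the ex option on the Redis write.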

Refinement Chat (Live)

Delta-based refinement sends only changed components. Floating chat with persona-aware quick suggestions.

"Show me deals under $50" → partial update
"More lifestyle inspiration" → reorder
~70% bandwidth reduction
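
To make the delta idea concrete, here is one possible payload shape and a client-side apply step; LayoutDelta and applyDelta are illustrative, not the project's actual schema.

```typescript
// Illustrative delta payload: only the components that changed, plus an
// optional ordering hint, instead of a full layout.
interface LayoutDelta {
  updated: Array<{ id: string; component: unknown }>; // replaced in place
  removed: string[];                                   // ids to drop
  order?: string[];                                    // new display order
}

export function applyDelta(
  layout: Map<string, unknown>,
  delta: LayoutDelta,
): Map<string, unknown> {
  const next = new Map(layout);
  for (const { id, component } of delta.updated) next.set(id, component);
  for (const id of delta.removed) next.delete(id);

  if (delta.order) {
    // Rebuild in the requested order, keeping any components the delta
    // did not mention at the end.
    const ordered = new Map<string, unknown>();
    for (const id of delta.order) {
      const component = next.get(id);
      if (component !== undefined) ordered.set(id, component);
    }
    for (const [id, component] of next) {
      if (!ordered.has(id)) ordered.set(id, component);
    }
    return ordered;
  }
  return next;
}
```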

Client-Side AI (Optional)

WebLLM integration for browsers with WebGPU. Toggle between edge and local inference for a privacy-first experience.

SmolLM2-360M: 200MB, fast inference
Phi-3-mini: higher quality, 2GB
Zero server calls when enabled
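
A sketch of the local-inference setup, assuming the @mlc-ai/web-llm package; the exact model IDs are assumptions and should be checked against the WebLLM prebuilt model list.

```typescript
// Sketch: create a local WebLLM engine when WebGPU is available.
// Model IDs below are assumptions, not confirmed project configuration.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

export async function createLocalEngine(highQuality = false) {
  if (!("gpu" in navigator)) {
    throw new Error("WebGPU not available: fall back to edge inference");
  }

  const modelId = highQuality
    ? "Phi-3-mini-4k-instruct-q4f16_1-MLC" // larger download, higher quality
    : "SmolLM2-360M-Instruct-q4f16_1-MLC"; // ~200MB, fast first load

  return CreateMLCEngine(modelId, {
    initProgressCallback: (report) => console.log(report.text), // surface download progress
  });
}
```

The returned engine exposes an OpenAI-compatible chat API, so the rest of the refinement flow can stay the same whether inference runs locally or at the edge.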

Architecture Preview

[Architecture diagram]

Performance Targets

  • First Refinement: <200ms (vs 2-3s in V2)
  • Semantic Search: <50ms (vs 500ms in V2)
  • Client-Side Queries: 70%+ handled locally (reduced server load)
  • Server Cost Reduction: 60%

Implementation Status

Route Restructuring (Complete)

V3 at /shop (default), V2 at /shop/v2, V1 at /shop/v1. Unified routing with version-aware components.

✓ All routes operational
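
For illustration, assuming a Next.js App Router layout (the framework is not stated here), the three versions can coexist as sibling routes, with a small helper for version-aware components:

```typescript
// Hypothetical route layout (framework choice is an assumption):
//
// app/
//   shop/page.tsx      → V3 (default)
//   shop/v2/page.tsx   → V2
//   shop/v1/page.tsx   → V1
//
// A version-aware component can branch on the path segment it renders under.
export type ShopVersion = "v1" | "v2" | "v3";

export function resolveVersion(pathname: string): ShopVersion {
  if (pathname.startsWith("/shop/v1")) return "v1";
  if (pathname.startsWith("/shop/v2")) return "v2";
  return "v3"; // /shop is the V3 default
}
```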
Edge Streaming + Caching (Complete)

NDJSON streaming from Edge Functions. Upstash Redis caching with persona-aware TTL. Sub-100ms cached responses.

✓ /api/v3/stream operational
Delta Refinement (Complete)

Floating refinement chat with delta-based updates. Only changed components are transmitted and re-rendered.

✓ /api/v3/refine operational
Client-Side AI (Optional)

WebLLM integration for browsers with WebGPU support. Users can toggle between edge and local inference.

✓ Available on supported browsers