
V3: Edge-First Streaming + Hybrid AI

The current state: Edge streaming with Upstash Redis caching, optional client-side AI via WebLLM, and conversational refinement. Experience sub-100ms cached responses and privacy-preserving local inference.

The Evolution

Version | Architecture | Key Innovation | Status
V1 | Manual API | AI generates layouts (one-shot) | Complete
V2 | Agent-Native | Shared state + conversational refinement | Complete
V3 | Edge-Native | Edge streaming, Redis caching, optional client-side AI | Complete

What V3 Delivers

V3 combines edge streaming with intelligent caching and optional client-side AI for the best of both worlds: fast cached responses and privacy-preserving local inference.

The Architecture

Cache Hit Path:

Request → Edge → Upstash Redis → <100ms response

Cache Miss Path:

Request → Edge → GPT-4o-mini → Stream → Cache → Response

Client AI Path (optional):

WebGPU → WebLLM → Local inference → Zero network latency
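
As a sketch of how the first two paths fit together in one handler (assuming a Next.js Edge route and the @upstash/redis client; generateLayout is a hypothetical stand-in for the GPT-4o-mini call):

```typescript
// Sketch only: cache-hit vs. cache-miss flow at the edge.
// Assumes @upstash/redis and a Next.js Edge route handler.
import { Redis } from "@upstash/redis";

export const runtime = "edge";

const redis = Redis.fromEnv(); // reads UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN

// Hypothetical stand-in for the GPT-4o-mini layout generation call.
async function generateLayout(persona: string, query: string): Promise<string> {
  return JSON.stringify({ persona, query, components: [] }) + "\n";
}

export async function GET(request: Request): Promise<Response> {
  const { searchParams } = new URL(request.url);
  const persona = searchParams.get("persona") ?? "default";
  const query = searchParams.get("q") ?? "";
  const cacheKey = `layout:${persona}:${query}`;

  // Cache hit path: Request → Edge → Upstash Redis → <100ms response.
  const cached = await redis.get<string>(cacheKey);
  if (cached) {
    return new Response(cached, {
      headers: { "content-type": "application/x-ndjson", "x-cache": "HIT" },
    });
  }

  // Cache miss path: Request → Edge → GPT-4o-mini → Stream → Cache → Response.
  // The real route streams NDJSON as it is generated; this sketch awaits the
  // full result for brevity, then writes it back with a persona-aware TTL.
  const layout = await generateLayout(persona, query);
  await redis.set(cacheKey, layout, { ex: 30 * 60 }); // e.g. 30 min for goal-oriented
  return new Response(layout, {
    headers: { "content-type": "application/x-ndjson", "x-cache": "MISS" },
  });
}
```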

The hybrid approach gives you:

  • Upstash Redis caching with persona-aware TTL (30min - 4hr)
  • Edge streaming via Vercel Edge Functions for global low-latency
  • WebLLM integration for browsers with WebGPU support
  • Delta-based refinement for efficient partial layout updates

V3 Feature Set

Edge Streaming (Live)

NDJSON streaming from Vercel Edge Functions. Progressive component rendering with skeleton placeholders.

First paint: <200ms
Cached response: <100ms
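
To show the client side of progressive rendering, here is a minimal NDJSON reader; consumeLayoutStream and its onComponent callback are illustrative names, not the project's actual API. Each parsed line can replace one skeleton placeholder.

```typescript
// Minimal client-side NDJSON consumer: parse one JSON component per line
// and hand it to a rendering callback as soon as it arrives.
export async function consumeLayoutStream(
  url: string,
  onComponent: (component: unknown) => void,
): Promise<void> {
  const response = await fetch(url);
  if (!response.body) throw new Error("Streaming not supported in this browser");

  const reader = response.body.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = "";

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;

    // NDJSON: every complete line is an independent JSON document.
    let newlineIndex: number;
    while ((newlineIndex = buffer.indexOf("\n")) >= 0) {
      const line = buffer.slice(0, newlineIndex).trim();
      buffer = buffer.slice(newlineIndex + 1);
      if (line) onComponent(JSON.parse(line));
    }
  }

  // Flush a trailing line that arrived without a final newline.
  if (buffer.trim()) onComponent(JSON.parse(buffer.trim()));
}
```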

Redis Caching (Live)

Upstash Redis for serverless caching. Persona-aware TTL ensures relevant content while reducing API costs.

Goal-Oriented: 30min TTL (deal freshness)
Comparing: 4hr TTL (stable comparisons)
Expected cache hit rate: 60%+
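
A persona-aware TTL can be as simple as a lookup keyed by persona; the values below mirror the two TTLs above, and the 1-hour fallback is an assumption.

```typescript
// Illustrative persona → TTL mapping (seconds); the default fallback is an assumption.
const PERSONA_TTL_SECONDS: Record<string, number> = {
  "goal-oriented": 30 * 60,  // 30 min: deal freshness matters
  "comparing": 4 * 60 * 60,  // 4 hr: comparisons are stable
};

export function ttlFor(persona: string): number {
  return PERSONA_TTL_SECONDS[persona] ?? 60 * 60; // 1 hr default
}
```

In the edge-route sketch above, this value would be passed as the ex option on the Redis write.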

Refinement Chat (Live)

Delta-based refinement sends only changed components. Floating chat with persona-aware quick suggestions.

"Show me deals under $50" → partial update
"More lifestyle inspiration" → reorder
~70% bandwidth reduction
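
To make the delta idea concrete, here is one possible payload shape and a client-side apply step; LayoutDelta and applyDelta are illustrative, not the project's actual schema.

```typescript
// Illustrative delta payload: only the components that changed, plus an
// optional ordering hint, instead of a full layout.
interface LayoutDelta {
  updated: Array<{ id: string; component: unknown }>; // replaced in place
  removed: string[];                                   // ids to drop
  order?: string[];                                    // new display order
}

export function applyDelta(
  layout: Map<string, unknown>,
  delta: LayoutDelta,
): Map<string, unknown> {
  const next = new Map(layout);
  for (const { id, component } of delta.updated) next.set(id, component);
  for (const id of delta.removed) next.delete(id);

  if (delta.order) {
    // Rebuild in the requested order, keeping any components the delta
    // did not mention at the end.
    const ordered = new Map<string, unknown>();
    for (const id of delta.order) {
      const component = next.get(id);
      if (component !== undefined) ordered.set(id, component);
    }
    for (const [id, component] of next) {
      if (!ordered.has(id)) ordered.set(id, component);
    }
    return ordered;
  }
  return next;
}
```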

Client-Side AI (Optional)

WebLLM integration for browsers with WebGPU. Toggle between edge and local inference for a privacy-first experience.

SmolLM2-360M: 200MB, fast inference
Phi-3-mini: higher quality, 2GB
Zero server calls when enabled
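
A sketch of the local-inference setup, assuming the @mlc-ai/web-llm package; the exact model IDs are assumptions and should be checked against the WebLLM prebuilt model list.

```typescript
// Sketch: create a local WebLLM engine when WebGPU is available.
// Model IDs below are assumptions, not confirmed project configuration.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

export async function createLocalEngine(highQuality = false) {
  if (!("gpu" in navigator)) {
    throw new Error("WebGPU not available: fall back to edge inference");
  }

  const modelId = highQuality
    ? "Phi-3-mini-4k-instruct-q4f16_1-MLC" // larger download, higher quality
    : "SmolLM2-360M-Instruct-q4f16_1-MLC"; // ~200MB, fast first load

  return CreateMLCEngine(modelId, {
    initProgressCallback: (report) => console.log(report.text), // surface download progress
  });
}
```

The returned engine exposes an OpenAI-compatible chat API, so the rest of the refinement flow can stay the same whether inference runs locally or at the edge.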

Architecture Preview

[Architecture diagram]

Performance Targets

  • First Refinement: <200ms (vs 2-3s in V2)
  • Semantic Search: <50ms (vs 500ms in V2)
  • Client-Side Queries: 70%+ handled locally (reduced server load)
  • Server Cost Reduction: 60%

Implementation Status

Route Restructuring (Complete)

V3 at /shop (default), V2 at /shop/v2, V1 at /shop/v1. Unified routing with version-aware components.

✓ All routes operational
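
For illustration, assuming a Next.js App Router layout (the framework is not stated here), the three versions can coexist as sibling routes, with a small helper for version-aware components:

```typescript
// Hypothetical route layout (framework choice is an assumption):
//
// app/
//   shop/page.tsx      → V3 (default)
//   shop/v2/page.tsx   → V2
//   shop/v1/page.tsx   → V1
//
// A version-aware component can branch on the path segment it renders under.
export type ShopVersion = "v1" | "v2" | "v3";

export function resolveVersion(pathname: string): ShopVersion {
  if (pathname.startsWith("/shop/v1")) return "v1";
  if (pathname.startsWith("/shop/v2")) return "v2";
  return "v3"; // /shop is the V3 default
}
```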
Edge Streaming + Caching (Complete)

NDJSON streaming from Edge Functions. Upstash Redis caching with persona-aware TTL. Sub-100ms cached responses.

✓ /api/v3/stream operational
Delta Refinement (Complete)

Floating refinement chat with delta-based updates. Only changed components are transmitted and re-rendered.

✓ /api/v3/refine operational
Client-Side AI (Optional)

WebLLM integration for browsers with WebGPU support. Users can toggle between edge and local inference.

✓ Available on supported browsers