Quick Overview
ChatSmart Multi-Model AI Platform
A portfolio-grade chat workspace I use personally to orchestrate OpenAI, Anthropic, and Perplexity models with explainable synthesis.
Built for
Self-funded studio project (no external users yet)
My role
Solo product + platform engineer
Key wins
- Shipped side-by-side comparison mode in 3 weeks
- Synthesis engine roughly tripled my prompt-experimentation throughput
- < 2.8s p95 response time with multi-provider fan-out
Problem / Context
Founders, researchers, and analysts (starting with my own workflow) need a single cockpit to compare LLM outputs, share prompts, and keep governance intact.
Tooling sprawl meant teams bounced between playgrounds, raw APIs, and undocumented scripts. There was no deterministic way to compare models or capture why a given prompt worked.
Experiments stalled, insights were lost in screenshots, and leadership had no audit trail for how sensitive data was handled.
Constraints & Requirements
Technical
- Streaming answers with <3s p95 latency across three providers
- Deterministic logging for every token, message, and tool call
- Sandboxed plugin system so new data sources could be added without redeploys
Business
- Self-imposed six-week deadline to have a working demo ready during my own customer-summit prep
- Needed a self-serve tier plus an enterprise mode with SSO, ready to flip on when a client appears
- Product had to run on commodity infra (Fly.io + Supabase) to stay lean
Non-obvious
- Legal required redaction and retention policies for uploaded documents
- Prompt engineers wanted composable templates, not yet another rigid UI
Approach & Architecture
I designed ChatSmart as a modular control plane: a FastAPI command router fans requests out to provider adapters, streams partial responses through Redis, and lets a synthesis worker critique and merge answers before persisting them to Postgres/pgvector.
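A minimal sketch of that fan-out path, assuming a hypothetical `adapter.stream` interface and a `chat:{trace_id}` stream key (illustrative names, not the production schema):

```python
import asyncio
import uuid

from redis.asyncio import Redis

redis = Redis()

async def fan_out(prompt: str, adapters: list) -> str:
    """Send one prompt to every provider adapter and interleave the
    partial responses onto a single Redis Stream keyed by trace ID."""
    trace_id = uuid.uuid4().hex  # one deterministic handle per request

    async def relay(adapter) -> None:
        async for token in adapter.stream(prompt):  # hypothetical adapter API
            # Every entry carries provider + trace ID, so the UI and the
            # synthesis worker can replay the exact token sequence later.
            await redis.xadd(f"chat:{trace_id}", {
                "provider": adapter.name,
                "token": token,
            })

    await asyncio.gather(*(relay(a) for a in adapters))
    return trace_id
```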
Stack in context
- FastAPI + Pydantic for the API surface and governance hooks
- Redis Streams for streaming tokens, retries, and trace IDs
- Next.js 14 + React Server Components for the UI
- LangGraph microflows for evaluation harness + guardrails
- pgvector-backed Supabase for prompt/version history
Adapter pattern over SDK sprawl
Each provider adapter normalises inputs/outputs so the orchestration layer can blend responses without bespoke logic.
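In sketch form, the contract looks roughly like this; the `OpenAIAdapter` internals are stubbed, and the real chunks normalise more fields than shown:

```python
from dataclasses import dataclass
from typing import AsyncIterator, Protocol

@dataclass
class NormalisedChunk:
    provider: str
    text: str
    finish_reason: str | None = None

class ProviderAdapter(Protocol):
    """One interface in front of every SDK, so the orchestration layer
    never touches provider-specific payloads."""
    name: str

    def stream(self, prompt: str) -> AsyncIterator[NormalisedChunk]: ...

class OpenAIAdapter:
    name = "openai"

    async def stream(self, prompt: str) -> AsyncIterator[NormalisedChunk]:
        # Real SDK call elided; the point is the normalised output shape.
        for piece in ("hello", " world"):
            yield NormalisedChunk(provider=self.name, text=piece)
```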
Synthesis worker on a schedule
A Celery worker pulls traces, scores relevance/confidence, and writes a narrated summary so humans get signal, not token soup.
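A trimmed sketch of the worker, with the Postgres reads/writes and the critique call replaced by hypothetical stand-ins:

```python
from celery import Celery

app = Celery("chatsmart", broker="redis://localhost:6379/0")

# Hypothetical stand-ins for the real Postgres I/O and LLM critique call.
def load_answers(trace_id: str) -> list[dict]:
    return [{"provider": "openai", "text": "..."}]

def score_relevance(answer: dict) -> float:
    return 0.9

def narrate(scored: list) -> str:
    return "openai led on relevance; anthropic hedged more."

def save_summary(trace_id: str, summary: str) -> None:
    print(trace_id, summary)

@app.task
def synthesize_trace(trace_id: str) -> None:
    """Score each provider's answer for a trace, then persist a short
    narrated summary so readers get signal instead of raw tokens."""
    answers = load_answers(trace_id)
    scored = sorted(
        ((a, score_relevance(a)) for a in answers),
        key=lambda pair: pair[1],
        reverse=True,
    )
    save_summary(trace_id, narrate(scored))
```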
Guardrails baked into the graph
LangGraph nodes enforce max spend, automatic redaction, and fallback prompts long before a response hits the UI.
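A minimal guardrail graph under those rules, assuming a simplified state and a stubbed model call (the production graph carries far more fields):

```python
import re
from typing import TypedDict

from langgraph.graph import END, StateGraph

MAX_SPEND_USD = 0.50  # illustrative per-request ceiling

class ChatState(TypedDict):
    prompt: str
    spend_usd: float
    response: str

def redact(state: ChatState) -> dict:
    # Automatic redaction runs before any provider ever sees the prompt.
    clean = re.sub(r"[\w.+-]+@[\w-]+\.\w+", "[email]", state["prompt"])
    return {"prompt": clean}

def route_budget(state: ChatState) -> str:
    # The max-spend check decides whether the real model runs at all.
    return "fallback" if state["spend_usd"] > MAX_SPEND_USD else "call_model"

def call_model(state: ChatState) -> dict:
    return {"response": f"(model answer to: {state['prompt']})"}  # stub

def fallback(state: ChatState) -> dict:
    return {"response": "Budget exceeded; serving the fallback prompt."}

graph = StateGraph(ChatState)
graph.add_node("redact", redact)
graph.add_node("call_model", call_model)
graph.add_node("fallback", fallback)
graph.set_entry_point("redact")
graph.add_conditional_edges("redact", route_budget)
graph.add_edge("call_model", END)
graph.add_edge("fallback", END)
guarded = graph.compile()
```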
Implementation Highlights
Provider drift & flaky APIs
Why it was hard: Model upgrades broke prompts weekly.
Options: Either pin model versions (providers deprecate them anyway) or build resilience into the pipeline.
Solution: Added canary prompts plus auto-regression tests that run nightly. Failing providers are quarantined and surfaced to the operator dashboard.
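The quarantine loop is roughly this shape; the canary fixtures and `adapter.complete` call are simplified stand-ins for the real harness:

```python
# Hypothetical canary fixtures: prompts whose answers must stay stable.
CANARIES = [
    {"prompt": "Reply with exactly the word PONG.", "expect": "PONG"},
]

quarantined: set[str] = set()

def run_canaries(adapter) -> bool:
    """Replay pinned prompts against one provider and diff the outputs."""
    return all(
        adapter.complete(case["prompt"]).strip() == case["expect"]  # hypothetical call
        for case in CANARIES
    )

def nightly_sweep(adapters) -> None:
    for adapter in adapters:
        if run_canaries(adapter):
            quarantined.discard(adapter.name)
        else:
            # Failing providers drop out of fan-out and stay flagged on
            # the operator dashboard until the canaries pass again.
            quarantined.add(adapter.name)
```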
Prompt bloat
Why it was hard: Power users tried to manage dozens of prompt variants in Notion.
Options: Manual curation or bespoke DSL.
Solution: Implemented prompt templates with parameter injection, version tags, and shareable links so experiments stay reproducible.
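A sketch of the template shape, assuming `string.Template`-style `$param` injection (the real implementation differs in detail):

```python
from dataclasses import dataclass
from string import Template

@dataclass(frozen=True)
class PromptTemplate:
    """Versioned, parameterised prompt; (name, version) is the shareable
    handle, so any experiment can be replayed exactly."""
    name: str
    version: str
    body: str
    tags: tuple[str, ...] = ()

    def render(self, **params: str) -> str:
        # substitute() raises on a missing parameter, which keeps
        # half-filled prompts out of the experiment history.
        return Template(self.body).substitute(**params)

summarise_v2 = PromptTemplate(
    name="summarise",
    version="2.1.0",
    body="Summarise for a $audience in a $tone tone:\n$document",
    tags=("summaries", "shared"),
)
prompt = summarise_v2.render(audience="CFO", tone="neutral", document="...")
```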
Explainability at speed
Why it was hard: Synthesis and evaluation add latency if you are not careful.
Options: Simplify the product or accept slowness.
Solution: Ran the synthesis worker asynchronously with speculative rendering. Users see raw model output immediately; the critique arrives seconds later inline.
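Sketched against the Celery task above, with a hypothetical `stream_raw` relay:

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

from tasks import synthesize_trace  # the Celery worker sketched earlier (hypothetical path)

app = FastAPI()

async def stream_raw(trace_id: str):
    # Hypothetical relay: raw model tokens leave the Redis Stream as SSE.
    yield f"data: raw tokens for {trace_id}\n\n"

@app.get("/chat/{trace_id}")
async def chat(trace_id: str) -> StreamingResponse:
    # Speculative rendering: raw output streams to the user immediately...
    response = StreamingResponse(stream_raw(trace_id), media_type="text/event-stream")
    # ...while the critique runs off the request path; the client patches
    # it inline a few seconds later when the worker finishes.
    synthesize_trace.delay(trace_id)
    return response
```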
Results & Impact
Quantitative
- Dogfooding velocity: 12 experiments/hour → 35 experiments/hour once the synthesis engine landed
- Ops cost: <$40/mo on Fly.io + Supabase while I iterate solo
- Incidents: zero P0s across all private demo sessions
Qualitative
- Purpose-built to show my craft even though there are no live customers yet
- Audit trails + governance hooks are production ready for when a client signs on
- I keep the prompts, adapters, and eval harness in daily use to stress-test new ideas
What it unlocked
ChatSmart anchors portfolio walkthroughs, powers my own research workflows, and shares a telemetry spine with the Statistical-Arbitrage engine that is still in active development.
Learnings & Next Steps
Learnings
- Model orchestration is a product problem, not just infra
- Streaming UX lives or dies by deterministic trace IDs
- Guardrails must be first-class or you will never pass security review
Next steps
- Open-source the adapter kit once the codebase is scrubbed
- Add LangGraph-based agents that can trigger workflows (Jira, Linear, Slack)
- Finish the Statistical-Arbitrage control plane that shares this stack and currently lives in dev
Need this level of rigor?
If you are wrangling multi-model workflows, need explainable synthesis, or want to ship a governance-ready AI surface, let's architect the version that fits your team.
ChatSmart is feature-complete but remains a private portfolio build I use daily; its sister Statistical-Arbitrage platform is still in active development as I push both toward a production-grade standard.
Book a teardown