Quick Overview
ChatSmart Multi-Model AI Platform
A portfolio-grade chat workspace I use personally to orchestrate OpenAI, Anthropic, and Perplexity models with explainable synthesis.
Built for
Self-funded studio project (no external users yet)
My role
Solo product + platform engineer
Key wins
- Shipped side-by-side comparison mode in 3 weeks
- Synthesis engine roughly tripled my prompt-experimentation throughput
- < 2.8s p95 response time with multi-provider fan-out
Problem / Context
Founders, researchers, and analysts (starting with my own workflow) need a single cockpit to compare LLM outputs, share prompts, and keep governance intact.
Tooling sprawl meant teams bounced between playgrounds, raw APIs, and undocumented scripts. There was no deterministic way to compare models or capture why a given prompt worked.
Experiments stalled, insights were lost in screenshots, and leadership had no audit trail for how sensitive data was handled.
Constraints & Requirements
Technical
- Streaming answers with <3s p95 latency across three providers
- Deterministic logging for every token, message, and tool call
- Sandboxed plugin system so new data sources could be added without redeploys
Business
- Self-imposed six-week deadline to have a working demo ready during my own customer-summit prep
- Needed a self-serve tier plus an enterprise mode with SSO, ready to flip on when a client appears
- Product had to run on commodity infra (Fly.io + Supabase) to stay lean
Non-obvious
- Legal required redaction and retention policies for uploaded documents
- Prompt engineers wanted composable templates, not yet another rigid UI
Approach & Architecture
I designed ChatSmart as a modular control plane: a FastAPI command router fans requests out to provider adapters, streams partial responses through Redis, and lets a synthesis worker critique and merge answers before persisting them to Postgres/pgvector.
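A minimal sketch of that fan-out path, assuming a hypothetical `adapter.stream` interface and a `chat:{trace_id}` stream key (illustrative names, not the production schema):

```python
import asyncio
import uuid

from redis.asyncio import Redis

redis = Redis()

async def fan_out(prompt: str, adapters: list) -> str:
    """Send one prompt to every provider adapter and interleave the
    partial responses onto a single Redis Stream keyed by trace ID."""
    trace_id = uuid.uuid4().hex  # one deterministic handle per request

    async def relay(adapter) -> None:
        async for token in adapter.stream(prompt):  # hypothetical adapter API
            # Every entry carries provider + trace ID, so the UI and the
            # synthesis worker can replay the exact token sequence later.
            await redis.xadd(f"chat:{trace_id}", {
                "provider": adapter.name,
                "token": token,
            })

    await asyncio.gather(*(relay(a) for a in adapters))
    return trace_id
```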
Stack in context
- FastAPI + Pydantic for the API surface and governance hooks
- Redis Streams for streaming tokens, retries, and trace IDs
- Next.js 14 + React Server Components for the UI
- LangGraph microflows for evaluation harness + guardrails
- pgvector-backed Supabase for prompt/version history
Adapter pattern over SDK sprawl
Each provider adapter normalises inputs/outputs so the orchestration layer can blend responses without bespoke logic.
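In sketch form, the contract looks roughly like this; the `OpenAIAdapter` internals are stubbed, and the real chunks normalise more fields than shown:

```python
from dataclasses import dataclass
from typing import AsyncIterator, Protocol

@dataclass
class NormalisedChunk:
    provider: str
    text: str
    finish_reason: str | None = None

class ProviderAdapter(Protocol):
    """One interface in front of every SDK, so the orchestration layer
    never touches provider-specific payloads."""
    name: str

    def stream(self, prompt: str) -> AsyncIterator[NormalisedChunk]: ...

class OpenAIAdapter:
    name = "openai"

    async def stream(self, prompt: str) -> AsyncIterator[NormalisedChunk]:
        # Real SDK call elided; the point is the normalised output shape.
        for piece in ("hello", " world"):
            yield NormalisedChunk(provider=self.name, text=piece)
```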
Synthesis worker on a schedule
A Celery worker pulls traces, scores relevance/confidence, and writes a narrated summary so humans get signal, not token soup.
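A trimmed sketch of the worker, with the Postgres reads/writes and the critique call replaced by hypothetical stand-ins:

```python
from celery import Celery

app = Celery("chatsmart", broker="redis://localhost:6379/0")

# Hypothetical stand-ins for the real Postgres I/O and LLM critique call.
def load_answers(trace_id: str) -> list[dict]:
    return [{"provider": "openai", "text": "..."}]

def score_relevance(answer: dict) -> float:
    return 0.9

def narrate(scored: list) -> str:
    return "openai led on relevance; anthropic hedged more."

def save_summary(trace_id: str, summary: str) -> None:
    print(trace_id, summary)

@app.task
def synthesize_trace(trace_id: str) -> None:
    """Score each provider's answer for a trace, then persist a short
    narrated summary so readers get signal instead of raw tokens."""
    answers = load_answers(trace_id)
    scored = sorted(
        ((a, score_relevance(a)) for a in answers),
        key=lambda pair: pair[1],
        reverse=True,
    )
    save_summary(trace_id, narrate(scored))
```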
Guardrails baked into the graph
LangGraph nodes enforce max spend, automatic redaction, and fallback prompts long before a response hits the UI.
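A minimal guardrail graph under those rules, assuming a simplified state and a stubbed model call (the production graph carries far more fields):

```python
import re
from typing import TypedDict

from langgraph.graph import END, StateGraph

MAX_SPEND_USD = 0.50  # illustrative per-request ceiling

class ChatState(TypedDict):
    prompt: str
    spend_usd: float
    response: str

def redact(state: ChatState) -> dict:
    # Automatic redaction runs before any provider ever sees the prompt.
    clean = re.sub(r"[\w.+-]+@[\w-]+\.\w+", "[email]", state["prompt"])
    return {"prompt": clean}

def route_budget(state: ChatState) -> str:
    # The max-spend check decides whether the real model runs at all.
    return "fallback" if state["spend_usd"] > MAX_SPEND_USD else "call_model"

def call_model(state: ChatState) -> dict:
    return {"response": f"(model answer to: {state['prompt']})"}  # stub

def fallback(state: ChatState) -> dict:
    return {"response": "Budget exceeded; serving the fallback prompt."}

graph = StateGraph(ChatState)
graph.add_node("redact", redact)
graph.add_node("call_model", call_model)
graph.add_node("fallback", fallback)
graph.set_entry_point("redact")
graph.add_conditional_edges("redact", route_budget)
graph.add_edge("call_model", END)
graph.add_edge("fallback", END)
guarded = graph.compile()
```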
Implementation Highlights
Provider drift & flaky APIs
Why it was hard: Model upgrades broke prompts weekly.
Options: Either pin model versions (providers deprecate them anyway) or build resilience into the pipeline.
Solution: Added canary prompts plus auto-regression tests that run nightly. Failing providers are quarantined and surfaced to the operator dashboard.
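The quarantine loop is roughly this shape; the canary fixtures and `adapter.complete` call are simplified stand-ins for the real harness:

```python
# Hypothetical canary fixtures: prompts whose answers must stay stable.
CANARIES = [
    {"prompt": "Reply with exactly the word PONG.", "expect": "PONG"},
]

quarantined: set[str] = set()

def run_canaries(adapter) -> bool:
    """Replay pinned prompts against one provider and diff the outputs."""
    return all(
        adapter.complete(case["prompt"]).strip() == case["expect"]  # hypothetical call
        for case in CANARIES
    )

def nightly_sweep(adapters) -> None:
    for adapter in adapters:
        if run_canaries(adapter):
            quarantined.discard(adapter.name)
        else:
            # Failing providers drop out of fan-out and stay flagged on
            # the operator dashboard until the canaries pass again.
            quarantined.add(adapter.name)
```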
Prompt bloat
Why it was hard: Power users tried to manage dozens of prompt variants in Notion.
Options: Manual curation or bespoke DSL.
Solution: Implemented prompt templates with parameter injection, version tags, and shareable links so experiments stay reproducible.
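A sketch of the template shape, assuming `string.Template`-style `$param` injection (the real implementation differs in detail):

```python
from dataclasses import dataclass
from string import Template

@dataclass(frozen=True)
class PromptTemplate:
    """Versioned, parameterised prompt; (name, version) is the shareable
    handle, so any experiment can be replayed exactly."""
    name: str
    version: str
    body: str
    tags: tuple[str, ...] = ()

    def render(self, **params: str) -> str:
        # substitute() raises on a missing parameter, which keeps
        # half-filled prompts out of the experiment history.
        return Template(self.body).substitute(**params)

summarise_v2 = PromptTemplate(
    name="summarise",
    version="2.1.0",
    body="Summarise for a $audience in a $tone tone:\n$document",
    tags=("summaries", "shared"),
)
prompt = summarise_v2.render(audience="CFO", tone="neutral", document="...")
```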
Explainability at speed
Why it was hard: Synthesis and evaluation add latency if you are not careful.
Options: Simplify the product or accept slowness.
Solution: Ran the synthesis worker asynchronously with speculative rendering. Users see raw model output immediately; the critique arrives seconds later inline.
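Sketched against the Celery task above, with a hypothetical `stream_raw` relay:

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

from tasks import synthesize_trace  # the Celery worker sketched earlier (hypothetical path)

app = FastAPI()

async def stream_raw(trace_id: str):
    # Hypothetical relay: raw model tokens leave the Redis Stream as SSE.
    yield f"data: raw tokens for {trace_id}\n\n"

@app.get("/chat/{trace_id}")
async def chat(trace_id: str) -> StreamingResponse:
    # Speculative rendering: raw output streams to the user immediately...
    response = StreamingResponse(stream_raw(trace_id), media_type="text/event-stream")
    # ...while the critique runs off the request path; the client patches
    # it inline a few seconds later when the worker finishes.
    synthesize_trace.delay(trace_id)
    return response
```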
Results & Impact
Quantitative
- Dogfooding velocity: 12 experiments/hour → 35 experiments/hour once the synthesis engine landed
- Ops cost: <$40/mo on Fly.io + Supabase while I iterate solo
- Incidents: zero P0s across all private demo sessions
Qualitative
- Purpose-built to show my craft even though there are no live customers yet
- Audit trails + governance hooks are production ready for when a client signs on
- I keep the prompts, adapters, and eval harness in daily use to stress-test new ideas
What it unlocked
ChatSmart anchors portfolio walkthroughs, powers my own research workflows, and shares a telemetry spine with the Statistical-Arbitrage engine that is still in active development.
Learnings & Next Steps
Learnings
- Model orchestration is a product problem, not just infra
- Streaming UX lives or dies by deterministic trace IDs
- Guardrails must be first-class or you will never pass security review
Next steps
- Open-source the adapter kit once the codebase is scrubbed
- Add LangGraph-based agents that can trigger workflows (Jira, Linear, Slack)
- Finish the Statistical-Arbitrage control plane that shares this stack and currently lives in dev
Need this level of rigor?
If you are wrangling multi-model workflows, need explainable synthesis, or want to ship a governance-ready AI surface, let's architect the version that fits your team.
ChatSmart is feature-complete but remains a private portfolio build I use daily; its sister Statistical-Arbitrage platform is still in active development as I push both toward a production-grade standard.
Book a teardown