Quick Overview
Statistical-Arbitrage Control Plane
A self-funded quant lab where I ingest multi-provider market data, score cointegration, and narrate risk so future pilot clients can see institutional rigor before go-live.
Built for
Portfolio build (internal trade desk, no external users yet)
My role
Solo quant + platform engineer
Current wins
- Automated 200+ pair refreshes with Engle–Granger + half-life filters
- LLM-backed audit notes flag stale feeds and variance spikes within minutes
- Realtime dashboards + Prometheus health checks keep me honest while it is still in R&D
Problem / Context
I needed a trustworthy statistical-arbitrage stack for my own research before pitching it to funds that demand auditable signals.
Off-the-shelf SaaS could not keep up with the depth of pair analytics, explainability, and governance that modern trading desks expect.
Without an internal proving ground, every client conversation stalled at "Can you really operate this at scale?", and I lacked the telemetry to demonstrate it.
Constraints & Requirements
Technical
- Stream daily + intraday bars from two data vendors with resilience
- Run cointegration, half-life, and z-score calculations without blocking ingestion (see the sketch after this list)
- Emit Prometheus metrics + structured logs so I can trace every anomaly
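A minimal sketch of that pair math, assuming daily closes arrive as pandas Series; the p-value cutoff, half-life cap, and rolling window are illustrative rather than the production values:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint

def score_pair(a: pd.Series, b: pd.Series,
               pvalue_max: float = 0.05, half_life_max: float = 30.0):
    """Return spread stats for one candidate pair, or None if it fails the filters."""
    _, pvalue, _ = coint(a, b)  # Engle-Granger two-step test
    if pvalue > pvalue_max:
        return None

    # Hedge ratio from OLS, then build the spread.
    beta = sm.OLS(a, sm.add_constant(b)).fit().params.iloc[1]
    spread = a - beta * b

    # Half-life of mean reversion from an AR(1) fit on the spread:
    # delta_s = alpha + slope * s_lagged, half-life = -ln(2) / slope.
    lagged = spread.shift(1).dropna()
    delta = spread.diff().dropna()
    slope = sm.OLS(delta, sm.add_constant(lagged)).fit().params.iloc[1]
    half_life = -np.log(2) / slope if slope < 0 else np.inf
    if half_life > half_life_max:
        return None

    # Rolling z-score of the spread drives entry/exit scoring downstream.
    zscore = (spread - spread.rolling(60).mean()) / spread.rolling(60).std()
    return {"pvalue": pvalue, "beta": beta,
            "half_life": half_life, "zscore": zscore.iloc[-1]}
```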
Operating
- Self-imposed timeline: hit parity with manual research before inviting pilot capital
- Infrastructure must stay lean (Fly.io + Railway + Supabase) until monetization
- Every component must be modular so bespoke strategies can snap in later
Non-obvious
- Audit/regulation expectations even though this is still a portfolio project
- Need human-friendly narratives, not just charts, so non-quants can follow along
- Avoid overfitting while iterating quickly on features
Approach & Architecture
The platform is split into ingestion daemons, a FastAPI orchestration layer, and a React dashboard that shares UI primitives with ChatSmart. Redis Streams buffer ticks, PostgreSQL/Timescale stores normalized series, and Supabase Auth keeps future pilot logins ready.
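As a sketch of that decoupling, an ingestion daemon appends bars to a stream while an analytics worker reads through a consumer group at its own pace; the stream and group names here are illustrative, and the persistence step is elided:

```python
import redis

r = redis.Redis(decode_responses=True)
STREAM, GROUP = "bars:intraday", "pair-analytics"

# Ingestion daemon: append normalized bars; never waits on downstream math.
def publish_bar(symbol: str, ts: int, close: float) -> None:
    r.xadd(STREAM, {"symbol": symbol, "ts": ts, "close": close})

# Analytics worker: the consumer group tracks its own cursor, acks when stored.
def consume(consumer: str = "worker-1") -> None:
    try:
        r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
    except redis.ResponseError:
        pass  # group already exists
    while True:
        for _, entries in r.xreadgroup(GROUP, consumer, {STREAM: ">"},
                                       count=100, block=5000) or []:
            for entry_id, fields in entries:
                # ...normalize + persist to Timescale here...
                r.xack(STREAM, GROUP, entry_id)
```

Because the consumer group keeps its own read cursor, a slow scoring pass never back-pressures the vendor feeds.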
Stack in context
- Python data workers orchestrated via Prefect (flow shape sketched below)
- FastAPI + Pydantic command router
- Redis Streams + RQ for backfills and alert fan-out
- PostgreSQL/Timescale + pgvector for embeddings
- React 18 + Tailwind dashboards piped through Next.js
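One hypothetical shape for that Prefect orchestration of the nightly refresh; the task bodies, retry settings, and vendor names are placeholders, not the live configuration:

```python
from prefect import flow, task

@task(retries=2, retry_delay_seconds=60)
def fetch_bars(vendor: str) -> dict:
    ...  # pull daily bars from one vendor

@task
def score_pairs(bars_by_vendor: list[dict]) -> list[dict]:
    ...  # Engle-Granger + half-life filters per candidate pair

@task
def write_audit_stats(scored: list[dict]) -> None:
    ...  # persist derived stats for the LLM audit pass

@flow(name="nightly-pair-refresh")
def nightly_refresh(vendors: tuple[str, ...] = ("vendor_a", "vendor_b")):
    # Vendor fetches run concurrently; scoring waits on both.
    bars = [fetch_bars.submit(v) for v in vendors]
    scored = score_pairs([b.result() for b in bars])
    write_audit_stats(scored)
```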
LLM-powered audit subsystem
Every pipeline run stores derived stats that flow through deterministic rules + GPT-4 mini prompts. The result is a narrative that calls out missing assets, variance spikes, and remediation ideas without me digging through logs.
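A sketch of that two-stage pass: deterministic checks build structured findings first, and the model only narrates what the rules found. The thresholds, stats field names, `llm_summarize` wrapper, and model id are assumptions, not the production prompt chain:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm_summarize(prompt: str, model: str = "gpt-4o-mini") -> str:
    # Model id is illustrative; any small chat model slots in here.
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def audit_run(stats: dict) -> str:
    """Deterministic rules decide what matters; the LLM just writes it up."""
    findings = []
    if stats["missing_symbols"]:
        findings.append(f"Missing symbols this run: {sorted(stats['missing_symbols'])}")
    if stats["spread_var"] > 3 * stats["spread_var_baseline"]:
        findings.append("Spread variance exceeded 3x its trailing baseline")
    if stats["feed_lag_seconds"] > 120:
        findings.append(f"Vendor feed lagged by {stats['feed_lag_seconds']}s")
    if not findings:
        return "All checks passed."
    prompt = (
        "You are auditing a stat-arb data pipeline. Summarize these findings "
        "for a non-quant reviewer and suggest one remediation per finding:\n"
        + "\n".join(f"- {f}" for f in findings)
    )
    return llm_summarize(prompt)
```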
Pair labs separated from execution
Signal generation stays isolated from any broker connectivity until the analytics + governance story is watertight. It keeps experimentation fast without risking capital.
Shared telemetry with ChatSmart
Both projects emit traces into the same dashboarding stack so I can benchmark AI and quant workloads side-by-side and reuse alerting logic.
Implementation Highlights
Vendor drift
Why it was hard: Shifts in symbol coverage or delayed feeds can quietly corrupt cointegration stats.
Options: Either overpay for enterprise SLAs or detect drift myself.
Solution: Implemented redundant ingestion plus checksum comparisons per batch. When feeds diverge, the audit agent narrates exactly which symbols deviated so I can pause that pair.
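A sketch of the per-batch comparison; the canonicalized fields, rounding, and metric name are illustrative:

```python
import hashlib
import json
from prometheus_client import Counter

FEED_DIVERGENCE = Counter(
    "feed_divergence_total", "Batches where vendor checksums disagreed", ["symbol"]
)

def batch_checksum(bars: list[dict]) -> str:
    # Canonicalize before hashing so formatting noise doesn't trigger false drift.
    canonical = sorted((b["ts"], b["symbol"], round(b["close"], 4)) for b in bars)
    return hashlib.sha256(json.dumps(canonical).encode()).hexdigest()

def check_drift(symbol: str, vendor_a: list[dict], vendor_b: list[dict]) -> bool:
    diverged = batch_checksum(vendor_a) != batch_checksum(vendor_b)
    if diverged:
        FEED_DIVERGENCE.labels(symbol=symbol).inc()  # audit agent narrates from here
    return diverged
```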
Explainability for non-quants
Why it was hard: Dashboards proved the math but not the decision trail.
Options: Ship raw charts or layer narrative context.
Solution: The audit subsystem writes Loom-style summaries directly into the UI so future stakeholders see why pairs were promoted, demoted, or quarantined.
Cost discipline
Why it was hard: Running heavy analytics 24/7 can melt a solo builder's budget.
Options: Throttle features or get clever with scheduling.
Solution: Pipelines autoscale based on volatility regimes; quiet markets run hourly, while high-volatility windows trigger denser sampling.
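A sketch of that regime mapping; the volatility breakpoints, annualization factor, and refresh intervals are illustrative stand-ins for the live policy:

```python
import numpy as np
import pandas as pd

def sampling_interval(returns: pd.Series, lookback: int = 78) -> int:
    """Map recent realized volatility to a refresh interval in minutes."""
    # Assumes 5-minute bars, ~78 per session; annualization is illustrative.
    realized_vol = returns.tail(lookback).std() * np.sqrt(252 * 78)
    if realized_vol < 0.15:
        return 60   # quiet market: hourly refresh keeps compute costs down
    if realized_vol < 0.30:
        return 15
    return 5        # high-volatility window: densest sampling
```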
Results & Impact
Quantitative
- 200+ candidate pairs scored nightly with Engle–Granger + Johansen cross-checks
- <4 minutes to backfill a week of missing data across both vendors
- Ops budget under $60/mo while in private R&D
Qualitative
- Still no external users—this is my lab—but every demo now includes concrete telemetry instead of slideware
- Audit narratives mean compliance reviewers can follow along even before pilots launch
- Shared UI/infra with ChatSmart speeds up new feature spikes
What it unlocked
The control plane now underpins portfolio walkthroughs, stress-tests quant ideas for personal trading, and doubles as the foundation for the future client tier.
Learnings & Next Steps
Learnings
- Quant infra needs storytelling as much as math
- Redundant vendors + automated drift detection save hours of manual triage
- LLM copilots can audit pipelines if you feed them structured metrics
Next steps
- Finish embeddings-driven regime detection
- Wire paper-trading + risk caps before opening pilot seats
- Integrate news/L2 feeds so pair health reflects more than prices
Want this level of rigor for your desk?
The stat-arb control plane is still an internal build, but the code, telemetry, and audit rails are production-ready. If you need a similar system, or want to pressure-test mine, let's talk.
ChatSmart and this engine share the same governance spine. I keep iterating until both feel invincible, then open them up to partners.
Book a teardown