Quick Overview
Statistical-Arbitrage Control Plane
A self-funded quant lab where I ingest multi-provider market data, score cointegration, and narrate risk so future pilot clients can see institutional rigor before go-live.
Built for
Portfolio build (internal trade desk, no external users yet)
My role
Solo quant + platform engineer
Current wins
- Automated 200+ pair refreshes with Engle–Granger + half-life filters
- LLM-backed audit notes flag stale feeds and variance spikes within minutes
- Realtime dashboards + Prometheus health checks keep me honest while it is still in R&D
Problem / Context
I needed a trustworthy statistical-arbitrage stack for my own research before pitching it to funds that demand auditable signals.
Off-the-shelf SaaS could not keep up with the depth of pair analytics, explainability, and governance that modern trading desks expect.
Without an internal proving ground, every client conversation stalled at "Can you really operate this at scale?", and I lacked the telemetry to demonstrate it.
Constraints & Requirements
Technical
- Stream daily + intraday bars from two data vendors with resilience
- Run cointegration, half-life, and z-score calculations without blocking ingestion (see the sketch after this list)
- Emit Prometheus metrics + structured logs so I can trace every anomaly
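A minimal sketch of that pair math, assuming daily closes arrive as pandas Series; the p-value cutoff, half-life cap, and rolling window are illustrative rather than the production values:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint

def score_pair(a: pd.Series, b: pd.Series,
               pvalue_max: float = 0.05, half_life_max: float = 30.0):
    """Return spread stats for one candidate pair, or None if it fails the filters."""
    _, pvalue, _ = coint(a, b)  # Engle-Granger two-step test
    if pvalue > pvalue_max:
        return None

    # Hedge ratio from OLS, then build the spread.
    beta = sm.OLS(a, sm.add_constant(b)).fit().params.iloc[1]
    spread = a - beta * b

    # Half-life of mean reversion from an AR(1) fit on the spread:
    # delta_s = alpha + slope * s_lagged, half-life = -ln(2) / slope.
    lagged = spread.shift(1).dropna()
    delta = spread.diff().dropna()
    slope = sm.OLS(delta, sm.add_constant(lagged)).fit().params.iloc[1]
    half_life = -np.log(2) / slope if slope < 0 else np.inf
    if half_life > half_life_max:
        return None

    # Rolling z-score of the spread drives entry/exit scoring downstream.
    zscore = (spread - spread.rolling(60).mean()) / spread.rolling(60).std()
    return {"pvalue": pvalue, "beta": beta,
            "half_life": half_life, "zscore": zscore.iloc[-1]}
```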
Operating
- Self-imposed timeline: hit parity with manual research before inviting pilot capital
- Infrastructure must stay lean (Fly.io + Railway + Supabase) until monetization
- Every component must be modular so bespoke strategies can snap in later
Non-obvious
- Audit/regulation expectations even though this is still a portfolio project
- Need human-friendly narratives, not just charts, so non-quants can follow along
- Avoid overfitting while iterating quickly on features
Approach & Architecture
The platform is split into ingestion daemons, a FastAPI orchestration layer, and a React dashboard that shares UI primitives with ChatSmart. Redis Streams buffer ticks, PostgreSQL/Timescale stores normalized series, and Supabase Auth keeps future pilot logins ready.
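As a sketch of that decoupling, an ingestion daemon appends bars to a stream while an analytics worker reads through a consumer group at its own pace; the stream and group names here are illustrative, and the persistence step is elided:

```python
import redis

r = redis.Redis(decode_responses=True)
STREAM, GROUP = "bars:intraday", "pair-analytics"

# Ingestion daemon: append normalized bars; never waits on downstream math.
def publish_bar(symbol: str, ts: int, close: float) -> None:
    r.xadd(STREAM, {"symbol": symbol, "ts": ts, "close": close})

# Analytics worker: the consumer group tracks its own cursor, acks when stored.
def consume(consumer: str = "worker-1") -> None:
    try:
        r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
    except redis.ResponseError:
        pass  # group already exists
    while True:
        for _, entries in r.xreadgroup(GROUP, consumer, {STREAM: ">"},
                                       count=100, block=5000) or []:
            for entry_id, fields in entries:
                # ...normalize + persist to Timescale here...
                r.xack(STREAM, GROUP, entry_id)
```

Because the consumer group keeps its own read cursor, a slow scoring pass never back-pressures the vendor feeds.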
Stack in context
- Python data workers orchestrated via Prefect (flow shape sketched below)
- FastAPI + Pydantic command router
- Redis Streams + RQ for backfills and alert fan-out
- PostgreSQL/Timescale + pgvector for embeddings
- React 18 + Tailwind dashboards piped through Next.js
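One hypothetical shape for that Prefect orchestration of the nightly refresh; the task bodies, retry settings, and vendor names are placeholders, not the live configuration:

```python
from prefect import flow, task

@task(retries=2, retry_delay_seconds=60)
def fetch_bars(vendor: str) -> dict:
    ...  # pull daily bars from one vendor

@task
def score_pairs(bars_by_vendor: list[dict]) -> list[dict]:
    ...  # Engle-Granger + half-life filters per candidate pair

@task
def write_audit_stats(scored: list[dict]) -> None:
    ...  # persist derived stats for the LLM audit pass

@flow(name="nightly-pair-refresh")
def nightly_refresh(vendors: tuple[str, ...] = ("vendor_a", "vendor_b")):
    # Vendor fetches run concurrently; scoring waits on both.
    bars = [fetch_bars.submit(v) for v in vendors]
    scored = score_pairs([b.result() for b in bars])
    write_audit_stats(scored)
```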
LLM-powered audit subsystem
Every pipeline run stores derived stats that flow through deterministic rules + GPT-4 mini prompts. The result is a narrative that calls out missing assets, variance spikes, and remediation ideas without me digging through logs.
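A sketch of that two-stage pass: deterministic checks build structured findings first, and the model only narrates what the rules found. The thresholds, stats field names, `llm_summarize` wrapper, and model id are assumptions, not the production prompt chain:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm_summarize(prompt: str, model: str = "gpt-4o-mini") -> str:
    # Model id is illustrative; any small chat model slots in here.
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def audit_run(stats: dict) -> str:
    """Deterministic rules decide what matters; the LLM just writes it up."""
    findings = []
    if stats["missing_symbols"]:
        findings.append(f"Missing symbols this run: {sorted(stats['missing_symbols'])}")
    if stats["spread_var"] > 3 * stats["spread_var_baseline"]:
        findings.append("Spread variance exceeded 3x its trailing baseline")
    if stats["feed_lag_seconds"] > 120:
        findings.append(f"Vendor feed lagged by {stats['feed_lag_seconds']}s")
    if not findings:
        return "All checks passed."
    prompt = (
        "You are auditing a stat-arb data pipeline. Summarize these findings "
        "for a non-quant reviewer and suggest one remediation per finding:\n"
        + "\n".join(f"- {f}" for f in findings)
    )
    return llm_summarize(prompt)
```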
Pair labs separated from execution
Signal generation stays isolated from any broker connectivity until the analytics + governance story is watertight. It keeps experimentation fast without risking capital.
Shared telemetry with ChatSmart
Both projects emit traces into the same dashboarding stack so I can benchmark AI and quant workloads side-by-side and reuse alerting logic.
Implementation Highlights
Vendor drift
Why it was hard: Shifts in symbol coverage or delayed feeds can quietly corrupt cointegration stats.
Options: Either overpay for enterprise SLAs or detect drift myself.
Solution: Implemented redundant ingestion plus checksum comparisons per batch. When feeds diverge, the audit agent narrates exactly which symbols deviated so I can pause that pair.
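A sketch of the per-batch comparison; the canonicalized fields, rounding, and metric name are illustrative:

```python
import hashlib
import json
from prometheus_client import Counter

FEED_DIVERGENCE = Counter(
    "feed_divergence_total", "Batches where vendor checksums disagreed", ["symbol"]
)

def batch_checksum(bars: list[dict]) -> str:
    # Canonicalize before hashing so formatting noise doesn't trigger false drift.
    canonical = sorted((b["ts"], b["symbol"], round(b["close"], 4)) for b in bars)
    return hashlib.sha256(json.dumps(canonical).encode()).hexdigest()

def check_drift(symbol: str, vendor_a: list[dict], vendor_b: list[dict]) -> bool:
    diverged = batch_checksum(vendor_a) != batch_checksum(vendor_b)
    if diverged:
        FEED_DIVERGENCE.labels(symbol=symbol).inc()  # audit agent narrates from here
    return diverged
```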
Explainability for non-quants
Why it was hard: Dashboards proved the math but not the decision trail.
Options: Ship raw charts or layer narrative context.
Solution: The audit subsystem writes Loom-style summaries directly into the UI so future stakeholders see why pairs were promoted, demoted, or quarantined.
Cost discipline
Why it was hard: Running heavy analytics 24/7 can melt a solo builder's budget.
Options: Throttle features or get clever with scheduling.
Solution: Pipelines autoscale based on volatility regimes; quiet markets run hourly, while high-volatility windows trigger denser sampling.
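A sketch of that regime mapping; the volatility breakpoints, annualization factor, and refresh intervals are illustrative stand-ins for the live policy:

```python
import numpy as np
import pandas as pd

def sampling_interval(returns: pd.Series, lookback: int = 78) -> int:
    """Map recent realized volatility to a refresh interval in minutes."""
    # Assumes 5-minute bars, ~78 per session; annualization is illustrative.
    realized_vol = returns.tail(lookback).std() * np.sqrt(252 * 78)
    if realized_vol < 0.15:
        return 60   # quiet market: hourly refresh keeps compute costs down
    if realized_vol < 0.30:
        return 15
    return 5        # high-volatility window: densest sampling
```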
Results & Impact
Quantitative
- 200+ candidate pairs scored nightly with Engle–Granger + Johansen cross-checks
- <4 minutes to backfill a week of missing data across both vendors
- Ops budget under $60/mo while in private R&D
Qualitative
- Still no external users—this is my lab—but every demo now includes concrete telemetry instead of slideware
- Audit narratives mean compliance reviewers can follow along even before pilots launch
- Shared UI/infra with ChatSmart speeds up new feature spikes
What it unlocked
The control plane now underpins portfolio walkthroughs, stress-tests quant ideas for personal trading, and doubles as the foundation for the future client tier.
Learnings & Next Steps
Learnings
- Quant infra needs storytelling as much as math
- Redundant vendors + automated drift detection save hours of manual triage
- LLM copilots can audit pipelines if you feed them structured metrics
Next steps
- Finish embeddings-driven regime detection
- Wire paper-trading + risk caps before opening pilot seats
- Integrate news/L2 feeds so pair health reflects more than prices
Want this level of rigor for your desk?
The stat-arb control plane is still an internal build, but the code, telemetry, and audit rails are production-ready. If you need a similar system, or want to pressure-test mine, let's talk.
ChatSmart and this engine share the same governance spine. I keep iterating until both feel invincible, then open them up to partners.
Book a teardown