Distributed Ray Scalper
Event-Driven Architecture on Bare Metal
The Latency War
In crypto scalping, "fast" isn't enough. You need causality-strict simulation that exactly mirrors production execution. This system uses a Ray Cluster to run heavy event-driven backtests across 12,000+ parameter combinations, optimizing for "Robustness Score" rather than just net profit.
```text
$ python scripts/backtest_ml_edge.py --min-accuracy 0.65
Account Value:        $ 1,000
Min Model Accuracy:   65%
Assets Traded:        11
--------------------------------------
Total Trades:         268
Win Rate:             56.1%
Avg Return/Trade:     1.13%
Profit Factor:        1.92
--------------------------------------
TOTAL P&L:            $ 195
Return on Account:    19.5%
Max Drawdown (cum):   $ -28
Worst Trade:          -5.00%
--------------------------------------
Top Assets by P&L:
  ILV-USD   acc=69.8%  avg_ret=+3.54%
  HBAR-USD  acc=72.2%  avg_ret=+2.09%
  CSCO      acc=71.9%  avg_ret=+1.54%
```
Core Strategy: Statistical Anomaly Detection
"Most market signals are noise. We only trade when multi-dimensional volatility metrics align across uncorrelated timeframes, indicating a high-probability mean reversion."
Detection: Real-time monitoring of order book imbalances and liquidity gaps across multiple exchanges.
Filtering: Proprietary "Regime Classification" logic prevents entering during toxic flow or news events.
Execution: Dynamic inventory management with Ornstein-Uhlenbeck mean-reversion exits.
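The Ornstein-Uhlenbeck exit logic itself isn't shown in this write-up; a minimal sketch of one standard approach is to estimate the OU mean-reversion half-life by regressing one-step changes on the lagged level, then size exit horizons off that half-life. The `ou_half_life` helper below is illustrative, not the production code:

```python
import numpy as np

def ou_half_life(spread: np.ndarray) -> float:
    """Estimate the Ornstein-Uhlenbeck half-life of a mean-reverting series.

    Fits dX_t = theta * (mu - X_t) + noise by OLS: regress the one-step
    change on the lagged level; the slope is -theta, and the half-life
    of a deviation is ln(2) / theta.
    """
    lagged = spread[:-1]
    delta = np.diff(spread)
    # OLS with intercept: delta ~ a + b * lagged, where b = -theta
    X = np.column_stack([np.ones_like(lagged), lagged])
    beta, *_ = np.linalg.lstsq(X, delta, rcond=None)
    theta = -beta[1]  # mean-reversion speed
    return float(np.log(2) / theta)
```

A position opened on a z-score spike can then be given a time stop of a few half-lives: if the deviation hasn't reverted by then, the mean-reversion premise has likely failed.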
Vectorized Detection Logic
```python
import numpy as np

def detect_volatility_clustering(prices: np.ndarray, threshold: float = 2.0) -> np.ndarray:
    """
    Vectorized detection of volatility regimes using GARCH-proxy logic.
    Optimized for zero-copy execution on the Ray object store.
    """
    # Log returns over the full price series, fully vectorized
    returns = np.diff(np.log(prices))

    # Rolling standard deviation via stride tricks (no Python loop)
    rolling_std = np.lib.stride_tricks.sliding_window_view(returns, 20).std(axis=-1)

    # Z-score normalization against the trailing baseline
    baseline = np.mean(rolling_std[-1000:])
    z_scores = (rolling_std - baseline) / (np.std(rolling_std) + 1e-8)

    return z_scores > threshold
```
High-Level Interactions
The system acts as the central orchestrator, dispatching heavy compute tasks to the Ray Cluster via gRPC while maintaining a low-latency WebSocket connection to Binance for execution.
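The dispatch side of that orchestration has a simple shape that can be sketched without Ray itself: expand the parameter space into the full Cartesian grid, then split it into one batch per worker task. The parameter names and value ranges below are assumptions for illustration; only the 12,288-combo total matches this write-up.

```python
from itertools import product

def param_grid(space: dict) -> list:
    """Expand {name: values} into the full Cartesian grid of param dicts."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in product(*space.values())]

def make_batches(grid: list, n_batches: int) -> list:
    """Round-robin the grid into n_batches roughly equal chunks,
    one chunk per worker task."""
    return [grid[i::n_batches] for i in range(n_batches)]

# Hypothetical dimensions -- real names/ranges differ, but 8*8*8*4*6 = 12,288
space = {
    "atr_window": [10, 14, 20, 28, 40, 55, 80, 120],
    "z_entry":    [1.5, 1.75, 2.0, 2.25, 2.5, 2.75, 3.0, 3.5],
    "z_exit":     [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 2.0],
    "regime":     ["trend", "chop", "squeeze", "crash"],
    "hold_bars":  [5, 10, 20, 40, 80, 160],
}
grid = param_grid(space)          # 12,288 combos
batches = make_batches(grid, 48)  # 48 batches of 256 combos each
```

In the Ray version, each batch becomes one `run_batch` call on a remote worker, with the market data passed once by object reference rather than copied into every task.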
Infrastructure Logic
```python
import ray
from typing import List

# FastVectorEngine and Result are project-internal types (not shown here)

@ray.remote(num_cpus=1, resources={"worker_node": 1})
class BacktestWorker:
    def __init__(self, config_registry):
        self.engine = FastVectorEngine()
        # Pre-load shared-memory references to zero-copy arrays
        self.shared_data = ray.get(config_registry.get_data_ref.remote())

    def run_batch(self, param_batch: List[dict]) -> List[Result]:
        results = []
        for params in param_batch:
            # JIT-compile the strategy for this specific parameter set
            strategy_fn = self.engine.compile_strategy(params)

            # Execute against the zero-copy shared-memory frame
            metrics = strategy_fn(self.shared_data)

            # Keep only configs that clear the risk-adjusted bar
            if metrics.sharpe > 2.5 and metrics.drawdown < 0.15:
                results.append(metrics)
        return results
```
Container Topology
The Feature Engine uses Numba for JIT compilation of indicators. The Fast Engine handles vectorized pre-filtering, passing only high-probability candidates to the Event-Driven Strategy Engine.
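The pre-filtering step reduces to a single boolean mask over the whole grid. A hedged sketch of what it might look like (the thresholds and the `prefilter_candidates` name are assumptions, not the Fast Engine's actual API):

```python
import numpy as np

def prefilter_candidates(sharpe, max_dd, min_sharpe: float = 1.0,
                         dd_cap: float = 0.2) -> np.ndarray:
    """Vectorized pre-filter: one boolean mask over every config's
    fast-path metrics, no per-config Python loop. Returns the indices
    of configs that earn a full event-driven replay."""
    sharpe = np.asarray(sharpe)
    max_dd = np.asarray(max_dd)
    mask = (sharpe >= min_sharpe) & (max_dd <= dd_cap)
    return np.flatnonzero(mask)
```

The design point is that the cheap vectorized engine can score all 12k+ configs in one pass, so the expensive causality-strict engine only ever sees the survivors.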
Feature Cache (O(1) Lookup)
Expensive indicators (ATR, Volatility) are computed once and pickled to feature_cache/. Ray workers load these pre-computed frames to skip O(N) recalculations during grid search.
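A minimal sketch of that compute-once pickle cache, assuming the key is derived from the indicator name and its parameters (`cached_feature` is illustrative, not the project's actual API):

```python
import hashlib
import pickle
from pathlib import Path

CACHE_DIR = Path("feature_cache")

def cached_feature(name: str, params: dict, compute_fn):
    """Compute an expensive indicator once, pickle it to feature_cache/,
    and reload the pickle on every later call with the same params."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha1(repr((name, sorted(params.items()))).encode()).hexdigest()[:16]
    path = CACHE_DIR / f"{name}_{key}.pkl"
    if path.exists():
        # Cache hit: skip the O(N) recalculation entirely
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = compute_fn(**params)
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

During grid search, every Ray worker hits the same pre-computed frames, so indicator cost is paid once rather than once per parameter combination.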
Multi-Stage Optimization
Phase 1: Brute Force 12k combos (Vectorized).
Phase 2: Robustness Scoring (Weighted Avg of Regimes).
Phase 3: Stress Testing (2022 Crash simulation).
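The exact Robustness Score formula isn't given here; one plausible shape consistent with "weighted average of regimes" adds a dispersion penalty, so a config must hold up in every regime rather than ace one and blow up in another (the 0.5 penalty coefficient is an assumption):

```python
def robustness_score(regime_sharpe: dict, weights: dict = None) -> float:
    """Weighted average of per-regime Sharpe ratios, penalized by the
    spread across regimes. A config scoring 2.0 everywhere beats one
    scoring 4.0 in trends and 0.0 in chop, despite equal means."""
    weights = weights or {k: 1.0 for k in regime_sharpe}
    total = sum(weights.values())
    mean = sum(weights[k] * v for k, v in regime_sharpe.items()) / total
    spread = max(regime_sharpe.values()) - min(regime_sharpe.values())
    return mean - 0.5 * spread  # dispersion penalty; coefficient is illustrative
```

Ranking by a score like this, rather than raw net profit, is what pushes the grid search toward configs that survive Phase 3's crash simulation.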
What came next →
The configs this cluster found went into a live production system.
The top-ranked OOS configurations from these 12,288 screened combos were used to build a 24/7 regime-switching scalper running on Oracle Cloud. That case study covers the production architecture, WebSocket latency optimisation, and the bugs that only appear at 3am.