Building an ML trading system that works in a backtest is easy. Building one that survives real markets — with slippage, latency, regime changes, and correlated drawdowns — is an entirely different discipline. TradeNova is our answer: a 5-agent ensemble running on AWS EKS that combines reinforcement learning, gradient boosting, and Bayesian self-optimization into a production-grade trading platform at $280/month in infrastructure cost.
Why Multi-Agent?
Single-strategy systems are fragile. A trend follower prints money in directional markets and hemorrhages in chop. A mean-reversion system thrives in ranges and capitulates during breakouts. The logic of portfolio diversification carries over directly to strategy diversification: combining uncorrelated return streams reduces drawdowns more reliably than refining any single alpha source.
TradeNova operationalizes this principle with five specialized agents, each an independent ML model with its own feature pipeline, training loop, and risk budget.
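The diversification argument can be made concrete with the standard portfolio-variance formula. The simplification below is an assumption for illustration only. It gives N agents equal capital weights 1/N, the same return volatility σ, and a common pairwise correlation ρ. In TradeNova the actual weights come from Master Picks and will generally differ.

```latex
\begin{aligned}
\sigma_p^2 &= \sum_{i=1}^{N}\sum_{j=1}^{N} w_i w_j \,\rho_{ij}\,\sigma_i \sigma_j, \qquad \rho_{ii} = 1 \\
&= \underbrace{N \cdot \frac{\sigma^2}{N^2}}_{i = j}
 + \underbrace{N(N-1) \cdot \frac{\rho\,\sigma^2}{N^2}}_{i \neq j}
 = \sigma^2\left(\frac{1}{N} + \frac{N-1}{N}\,\rho\right)
\end{aligned}
```

Take N = 5 and ρ = 0, and portfolio volatility falls to σ/√5 ≈ 0.45σ. At ρ = 0.5 it only falls to about 0.77σ, and no number of agents can push it below σ√ρ ≈ 0.71σ. That is why low correlation between agents matters more than their count. Lower volatility is not the same thing as lower drawdown, but it is the main channel through which diversification shrinks drawdowns.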
The 5 Agent Types
- Trend Agent: A reinforcement learning agent built on stable-baselines3 (PPO) that ingests multi-timeframe price action, ADX, and Supertrend indicators. It learns to enter trending instruments early and trail stops dynamically.
- MeanReversion Agent: A LightGBM classifier trained on Bollinger Band z-scores, RSI divergences, and order-flow imbalance features. It identifies overextended moves and trades the snap-back.
- Volatility Agent: Specializes in VIX-regime transitions, using a hidden Markov model for state detection paired with a PyTorch policy network for position sizing.
- EMA Agent: A fast-reacting agent that trades exponential moving average crossovers with adaptive lookback periods tuned by Bayesian optimization. Designed for high-frequency signals in liquid instruments.
- Options Agent: Prices and executes options spreads based on implied-vs-realized volatility divergence, using a PyTorch model trained on historical options chains and Greeks surfaces.
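The EMA Agent's core signal is a crossover between a fast and a slow exponential moving average. The sketch below shows that rule in plain Python. It is an illustration, not TradeNova's code. The function names are hypothetical, and the fast and slow spans are passed in directly rather than tuned by Bayesian optimization. Exact ties between the two averages are treated as no cross.

```python
def ema(prices: list[float], span: int) -> list[float]:
    """Exponential moving average with smoothing factor alpha = 2 / (span + 1)."""
    if span < 1:
        raise ValueError("span must be at least 1")
    alpha = 2.0 / (span + 1)
    out: list[float] = []
    for i, price in enumerate(prices):
        # Seed with the first price, then apply the recursive EMA update.
        out.append(price if i == 0 else alpha * price + (1 - alpha) * out[-1])
    return out


def crossover_signals(prices: list[float], fast: int, slow: int) -> list[int]:
    """Per bar: +1 when the fast EMA crosses above the slow EMA, -1 when it crosses below, else 0.

    Hypothetical helper: in TradeNova the spans would come from Bayesian optimization.
    """
    if fast >= slow:
        raise ValueError("fast span must be shorter than slow span")
    if not prices:
        return []
    fast_ema, slow_ema = ema(prices, fast), ema(prices, slow)
    signals = [0]  # no crossover is defined on the first bar
    for i in range(1, len(prices)):
        prev_diff = fast_ema[i - 1] - slow_ema[i - 1]
        diff = fast_ema[i] - slow_ema[i]
        if prev_diff < 0 < diff:
            signals.append(1)   # bullish cross
        elif prev_diff > 0 > diff:
            signals.append(-1)  # bearish cross
        else:
            signals.append(0)   # no strict sign change (ties count as no cross)
    return signals
```

For example, `crossover_signals([10, 9, 8, 7, 6, 7, 8, 9, 10, 11], fast=2, slow=5)` returns `[0, 0, 0, 0, 0, 0, 1, 0, 0, 0]`. The fast EMA overtakes the slow one on the seventh bar as the price recovers.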
Master Picks: Unified Scoring
Each agent produces independent trade proposals. The Master Picks layer aggregates them into a unified scoring system on a 0–350+ point scale. Points are allocated across dimensions: signal strength (0–100), agent confidence (0–50), cross-agent agreement (0–75), regime alignment (0–75), and risk-budget availability (0–50+).
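One way to implement the point allocation is to normalize each raw input to [0, 1] and scale it by its dimension's cap. Only the per-dimension ranges come from the description above. The normalization, the Proposal fields, and the master_score name are assumptions. So is the linear agreement mapping: a lone agent earns 0 agreement points, and all five agreeing earns the full 75.

```python
from dataclasses import dataclass

# Point caps per dimension, as described for Master Picks. Risk-budget points are
# described as "0-50+", so they are left open-ended here.
SIGNAL_MAX = 100.0
CONFIDENCE_MAX = 50.0
AGREEMENT_MAX = 75.0
REGIME_MAX = 75.0


def _clamp01(x: float) -> float:
    return min(max(x, 0.0), 1.0)


@dataclass(frozen=True)
class Proposal:
    signal_strength: float  # hypothetical: normalized to [0, 1]
    confidence: float       # hypothetical: model probability in [0, 1]
    regime_fit: float       # hypothetical: alignment with the weather vector, [0, 1]
    risk_points: float      # hypothetical: >= 0, may exceed 50


def master_score(p: Proposal, n_agreeing: int, n_agents: int = 5) -> float:
    """Combine one proposal's components into a 0-350+ score.

    n_agreeing counts agents (including the proposer) on the same instrument and
    direction. The linear agreement mapping is an assumption.
    """
    if n_agents < 2:
        raise ValueError("agreement needs at least two agents")
    if not 1 <= n_agreeing <= n_agents:
        raise ValueError("n_agreeing must be between 1 and n_agents")
    agreement = AGREEMENT_MAX * (n_agreeing - 1) / (n_agents - 1)
    return (
        SIGNAL_MAX * _clamp01(p.signal_strength)
        + CONFIDENCE_MAX * _clamp01(p.confidence)
        + agreement
        + REGIME_MAX * _clamp01(p.regime_fit)
        + max(p.risk_points, 0.0)
    )
```

With every component maxed, five agents agreeing, and 50 risk-budget points, master_score returns 350.0. Risk-budget points above 50 push the total past 350, which matches the open-ended top of the scale.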
Proposals scoring below a dynamic threshold, calibrated nightly from the previous 30 days' hit rate, are filtered out. When multiple agents converge on the same instrument and direction, the agreement bonus amplifies the score, creating a natural wisdom-of-crowds effect.
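The nightly threshold calibration could look like the following. The description above says only that the threshold is calibrated from the previous 30 days' hit rate. The specific rule here is an assumption: pick the lowest score cutoff whose trailing trades still meet a target hit rate, subject to a minimum sample size, with a fallback. The target of 0.55, the minimum of 20 trades, the 200-point fallback, and the ClosedTrade record are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ClosedTrade:
    closed_at: datetime
    score: float  # Master Picks score at entry
    won: bool     # True if the trade closed with a profit


def calibrate_threshold(
    trades: list[ClosedTrade],
    now: datetime,
    target_hit_rate: float = 0.55,  # hypothetical target
    min_trades: int = 20,           # hypothetical minimum sample size
    window_days: int = 30,
    default: float = 200.0,         # hypothetical fallback threshold
) -> float:
    """Lowest score cutoff whose trailing-window hit rate meets the target."""
    cutoff = now - timedelta(days=window_days)
    recent = [t for t in trades if t.closed_at >= cutoff]
    # Try candidate cutoffs from the lowest observed score upward.
    for threshold in sorted({t.score for t in recent}):
        above = [t for t in recent if t.score >= threshold]
        if len(above) < min_trades:
            break  # higher cutoffs only leave fewer trades
        hit_rate = sum(t.won for t in above) / len(above)
        if hit_rate >= target_hit_rate:
            return threshold
    return default
```

Choosing the lowest qualifying cutoff keeps as many proposals as possible while still meeting the hit-rate target. The minimum-sample rule stops a handful of lucky high-score trades from setting the bar.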
The 7-Layer Market Weather System
Before any trade is evaluated, the system computes a holistic market context through seven analytical layers:
- Macro regime: Bull, bear, or transition based on broad index trends and yield curve signals
- Volatility regime: Low, normal, elevated, or crisis using VIX term structure
- Sector rotation: Relative strength across 11 GICS sectors with momentum scoring
- Correlation regime: Dispersion vs. correlation clustering across the S&P 500
- Liquidity: Bid-ask spread trends, volume profiles, and market depth metrics
- Sentiment: Put/call ratios, AAII surveys, and social-media NLP scores
- Calendar effects: FOMC dates, earnings seasons, options expiration cycles, and seasonality patterns
Each agent receives the current weather vector as input features, allowing it to condition its signals on regime context without hardcoded rules.
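Passing the weather as input features means flattening the seven layers into a fixed-order numeric vector, which is then appended to each agent's own features. The sketch below one-hot encodes the categorical regimes and passes the numeric layers through unchanged. The field names and the choice of one or two scalars per layer are hypothetical simplifications. The real layers draw on many more inputs, such as VIX term structure and AAII surveys.

```python
from dataclasses import dataclass

MACRO_REGIMES = ("bull", "bear", "transition")
VOL_REGIMES = ("low", "normal", "elevated", "crisis")
N_SECTORS = 11  # GICS sectors


@dataclass(frozen=True)
class MarketWeather:
    macro: str                          # one of MACRO_REGIMES
    volatility: str                     # one of VOL_REGIMES
    sector_momentum: tuple[float, ...]  # hypothetical: one momentum score per GICS sector
    correlation: float                  # hypothetical: average pairwise correlation
    liquidity: float                    # hypothetical: bid-ask spread z-score
    sentiment: float                    # hypothetical: composite sentiment score
    days_to_fomc: int                   # hypothetical calendar feature
    opex_week: bool                     # hypothetical calendar feature


def one_hot(value: str, categories: tuple[str, ...]) -> list[float]:
    if value not in categories:
        raise ValueError(f"unknown category {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def weather_vector(w: MarketWeather) -> list[float]:
    """Flatten the seven layers into a fixed-order feature vector."""
    if len(w.sector_momentum) != N_SECTORS:
        raise ValueError("expected one momentum score per GICS sector")
    return (
        one_hot(w.macro, MACRO_REGIMES)
        + one_hot(w.volatility, VOL_REGIMES)
        + list(w.sector_momentum)
        + [w.correlation, w.liquidity, w.sentiment]
        + [float(w.days_to_fomc), 1.0 if w.opex_week else 0.0]
    )
```

This layout produces 23 values: 3 macro, 4 volatility, 11 sector, 3 for correlation, liquidity, and sentiment, and 2 calendar. Because the order is fixed, every agent's model sees regime context in the same columns.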
The Moonshot Engine
Separately from the core agents, the Moonshot engine scans for asymmetric setups — trades with 5x+ reward-to-risk ratios that conventional scoring would rank modestly. Moonshots receive a small, capped allocation (never exceeding 2% of portfolio) and are evaluated on a separate P&L track to avoid contaminating the main performance metrics.
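The Moonshot gate reduces to two rules: a reward-to-risk filter and a hard allocation cap. This sketch reads the 2% limit as a cap on total moonshot exposure, although it could also be read per trade. It then splits the remaining budget equally across qualifying setups. Both the aggregate reading of the cap and the equal split are assumptions, as are the Setup fields and function names.

```python
from dataclasses import dataclass

MIN_REWARD_RISK = 5.0         # "5x+ reward-to-risk"
MAX_MOONSHOT_FRACTION = 0.02  # never more than 2% of the portfolio (read here as an aggregate cap)


@dataclass(frozen=True)
class Setup:
    symbol: str
    entry: float
    stop: float
    target: float


def reward_to_risk(s: Setup) -> float:
    """Reward-to-risk ratio; 0.0 if stop and target are not on opposite sides of entry."""
    risk = s.entry - s.stop
    reward = s.target - s.entry
    if reward * risk <= 0:  # also covers a zero-risk setup
        return 0.0
    return abs(reward) / abs(risk)


def moonshot_allocations(
    setups: list[Setup], portfolio_value: float, open_exposure: float = 0.0
) -> dict[str, float]:
    """Dollar allocation per qualifying setup, all sharing one capped budget."""
    budget = max(MAX_MOONSHOT_FRACTION * portfolio_value - open_exposure, 0.0)
    qualifying = [s for s in setups if reward_to_risk(s) >= MIN_REWARD_RISK]
    if not qualifying or budget == 0.0:
        return {}
    per_setup = budget / len(qualifying)  # hypothetical equal split
    return {s.symbol: per_setup for s in qualifying}
```

Subtracting open moonshot exposure before sizing new setups is what keeps the cap a hard ceiling rather than a per-day allowance.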
Self-Improving ML
TradeNova's most powerful feature is its self-improving feedback loop. Every closed trade is logged with full feature snapshots. Each night, a retraining pipeline runs:
- Bayesian weight updating: Agent weights in Master Picks are adjusted based on trailing 30-day Sharpe ratios using Thompson sampling.
- Nightly retraining: Each agent's model is retrained on the expanded dataset with walk-forward validation to prevent look-ahead bias.
- Regime drift detection: If an agent's out-of-sample performance degrades beyond a threshold, its allocation is automatically reduced until the next successful retraining cycle.
5-Tier Profit Cascade
Position management follows a structured exit framework. As a trade moves in its favor, profits are locked in across five tiers at 1R, 2R, 3R, 5R, and 8R, where R is the initial risk. Each tier closes a percentage of the position and tightens the trailing stop on the remainder. This ensures that winning trades contribute realized gains while runners stay open to capture tail moves.
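The cascade is easiest to see as a small state machine on a long position. The tier multiples come from the description above. Everything else is hypothetical. At each tier the sketch closes 25/25/20/15/10% of the original size, leaving a 5% runner. The stop moves to breakeven after 1R and then sits one tier behind the latest tier hit, a simple ratchet standing in for whatever trailing-stop logic TradeNova actually uses. Fills are assumed to happen exactly at tier and stop prices, with no slippage or gaps.

```python
from dataclasses import dataclass, field

TIERS_R = (1.0, 2.0, 3.0, 5.0, 8.0)               # profit tiers in multiples of initial risk
CLOSE_FRACTIONS = (0.25, 0.25, 0.20, 0.15, 0.10)  # hypothetical: share of ORIGINAL size closed per tier
STOP_AFTER_TIER_R = (0.0, 1.0, 2.0, 3.0, 5.0)     # hypothetical: new stop (in R) once each tier is hit


@dataclass
class LongPosition:
    entry: float
    initial_stop: float
    size: float
    stop: float = field(init=False)
    remaining: float = field(init=False)
    tiers_hit: int = field(default=0, init=False)
    realized: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        if self.initial_stop >= self.entry:
            raise ValueError("a long position needs its stop below entry")
        self.stop = self.initial_stop
        self.remaining = self.size

    def price_at(self, multiple: float) -> float:
        """Price that is `multiple` R above entry (R = entry - initial stop)."""
        return self.entry + multiple * (self.entry - self.initial_stop)

    def on_price(self, price: float) -> None:
        """Process one price update: bank every tier reached, then check the stop.

        Assumes fills exactly at tier and stop prices (no slippage or gaps).
        """
        while self.remaining > 0 and self.tiers_hit < len(TIERS_R):
            level = self.price_at(TIERS_R[self.tiers_hit])
            if price < level:
                break
            qty = self.size * CLOSE_FRACTIONS[self.tiers_hit]
            self.realized += qty * (level - self.entry)
            self.remaining -= qty
            # Ratchet the stop on the remainder; it never moves down.
            self.stop = max(self.stop, self.price_at(STOP_AFTER_TIER_R[self.tiers_hit]))
            self.tiers_hit += 1
        if self.remaining > 0 and price <= self.stop:
            self.realized += self.remaining * (self.stop - self.entry)
            self.remaining = 0.0
```

Consider a long entry at 100 with an initial stop at 95 (R = 5) and 100 units. Prices of 106 and 111 lock in the 1R and 2R tiers. A pullback to 104 then closes the remaining 50 units at the ratcheted 105 stop, for a realized gain of 625.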
Infrastructure at $280/Month
The entire system runs on AWS EKS with infrastructure defined in Terraform. Cost optimization is aggressive: Spot instances for nightly retraining, reserved instances for the always-on inference pods, and S3 Intelligent-Tiering for historical data storage.
- EKS cluster: 3 on-demand t3.medium nodes for inference at $110/mo
- Spot training: g4dn.xlarge GPU instances for nightly retraining at $85/mo on average
- Data & networking: S3, ECR, NAT Gateway, and CloudWatch at $55/mo
- Managed services: RDS PostgreSQL (db.t3.micro) for trade logs at $30/mo
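A quick check of the breakdown confirms that the four line items sum to the stated $280. Computing each item's share also shows where the money goes.

```python
# Monthly infrastructure costs in USD, as listed in the breakdown.
monthly_costs = {
    "EKS cluster (3 x t3.medium, on-demand, inference)": 110,
    "Spot g4dn.xlarge nightly retraining (average)": 85,
    "Data & networking (S3, ECR, NAT Gateway, CloudWatch)": 55,
    "RDS PostgreSQL db.t3.micro (trade logs)": 30,
}

total = sum(monthly_costs.values())
for item, cost in monthly_costs.items():
    # Each line item with its share of the monthly bill.
    print(f"{item:<55} ${cost:>4}/mo  ({cost / total:.0%})")
print(f"{'Total':<55} ${total:>4}/mo")
```

The inference nodes are the largest share (about 39%), followed by Spot training (30%), data and networking (20%), and RDS (11%).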
At $280/month total, TradeNova demonstrates that production ML systems don't require six-figure cloud bills. Thoughtful architecture — right-sizing instances, exploiting spot pricing, and separating training from inference — makes sophisticated multi-agent ML accessible to teams of any size.
Technology Stack
- Language: Python 3.11 with async execution via asyncio
- Infrastructure: AWS EKS, provisioned with Terraform
- Deep learning: PyTorch for policy networks and options pricing
- Reinforcement learning: stable-baselines3 (PPO, A2C)
- Gradient boosting: LightGBM for classification and feature ranking
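Running the five agents under asyncio lets slow, I/O-bound inference calls overlap instead of queuing. The minimal sketch below gives each agent a timeout. A slow or failing agent is skipped for that cycle rather than blocking the others. make_stub_agent is a hypothetical stand-in for a real agent's inference call, and the 2-second default timeout is an assumption.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    agent: str
    symbol: str
    direction: str  # "long" or "short"


AgentFn = Callable[[list[float]], Awaitable[list[Proposal]]]


def make_stub_agent(name: str, symbol: str, delay: float = 0.01) -> AgentFn:
    """Hypothetical stand-in for an agent's async inference call."""
    async def agent(weather: list[float]) -> list[Proposal]:
        await asyncio.sleep(delay)  # simulates I/O-bound model-serving latency
        return [Proposal(agent=name, symbol=symbol, direction="long")]
    return agent


async def gather_proposals(
    agents: dict[str, AgentFn], weather: list[float], timeout: float = 2.0
) -> tuple[list[Proposal], list[str]]:
    """Run all agents concurrently; a slow or failing agent is skipped, not fatal."""
    names = list(agents)
    results = await asyncio.gather(
        *(asyncio.wait_for(agents[n](weather), timeout) for n in names),
        return_exceptions=True,  # collect failures instead of aborting the batch
    )
    proposals: list[Proposal] = []
    skipped: list[str] = []
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            skipped.append(name)
        else:
            proposals.extend(result)
    return proposals, skipped
```

asyncio.gather with return_exceptions=True is used here instead of a Python 3.11 asyncio.TaskGroup. A TaskGroup cancels every sibling task when one fails, which is the opposite of what an ensemble wants.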
The lesson from TradeNova is architectural: don't build one model that tries to learn everything. Build specialized agents, give them independent training loops, and let a meta-learner discover the optimal blend. The market is a multi-regime system — your trading system should be too.