Replay historical prediction market data through the exact same pipeline you use in live trading. Get equity curves, trade logs, Sharpe ratios, Brier scores, and CSV exports from a single function call.
Full Code
"""Backtest a simple market maker on historical prediction market data."""
import horizon as hz
from horizon.context import FeedData
def fair_value(ctx: hz.Context) -> float:
    feed = ctx.feeds.get("default", FeedData())
    return feed.price if feed.price > 0 else 0.50


def quoter(ctx: hz.Context, fair: float) -> list[hz.Quote]:
    skew = ctx.inventory.net * 0.002
    return hz.quotes(fair - skew, spread=0.06, size=5)


# Generate sample data: price oscillating around 0.50
data = [
    {"timestamp": float(i), "price": 0.50 + 0.05 * ((-1) ** i) * (i % 10) / 10}
    for i in range(500)
]

result = hz.backtest(
    name="simple_mm_backtest",
    markets=["test-market"],
    data=data,
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=50, max_drawdown_pct=10),
    initial_capital=100.0,
)
print(result.summary())
print(f"\nTrades: {len(result.trades)}")
print(f"Final equity: ${result.equity_curve[-1][1]:.2f}")
# Export
result.to_csv("equity.csv", what="equity")
result.to_csv("trades.csv", what="trades")
How It Works
The backtest engine replays your data through the same pipeline that hz.run() uses in live trading:
- Data normalization: Your input (dicts, CSV, or DataFrame) is converted into a chronological timeline of Tick objects with timestamp, price, bid, ask, and volume fields.
- Timeline construction: All ticks across all feeds are merged into a single sorted timeline. Each timestamp carries forward the latest state of every feed (carry-forward interpolation).
- Pipeline execution: At each tick, the engine builds a Context with current feed data and inventory, then runs your pipeline functions in order. The output quotes are submitted to the internal paper exchange.
- Paper matching: The paper exchange matches resting orders against the current feed price. Fills update positions and P&L.
- Metrics computation: After all ticks are processed, BacktestResult lazily computes Sharpe, Sortino, Calmar, drawdown, win rate, profit factor, and prediction-market-specific metrics like Brier score.
Rate limits and dedup windows are automatically relaxed during backtests for maximum throughput. The risk pipeline (position limits, drawdown, etc.) still runs normally.
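To make the paper matching step concrete, here is a minimal sketch of a toy matcher. It does not use Horizon's internals: `RestingOrder`, `match_orders`, and `apply_fills` are hypothetical names, and the fill rule (a buy fills when the feed price is at or below its limit, a sell when it is at or above) is an assumption. The real paper exchange may handle bid/ask, partial fills, and fees differently.

```python
from dataclasses import dataclass


@dataclass
class RestingOrder:  # hypothetical type, not part of horizon
    side: str     # "buy" or "sell"
    price: float  # limit price in probability units (0 to 1)
    size: float


def match_orders(orders, feed_price):
    """Split resting orders into (filled, still_resting) for one tick."""
    filled, resting = [], []
    for order in orders:
        # Assumed rule: a buy fills when the feed trades at or below its
        # limit, a sell fills when the feed trades at or above its limit.
        crosses = (
            (order.side == "buy" and feed_price <= order.price)
            or (order.side == "sell" and feed_price >= order.price)
        )
        (filled if crosses else resting).append(order)
    return filled, resting


def apply_fills(position, cash, fills):
    """Update net position and cash, assuming full fills at the limit price."""
    for fill in fills:
        signed = fill.size if fill.side == "buy" else -fill.size
        position += signed
        cash -= signed * fill.price
    return position, cash


orders = [RestingOrder("buy", 0.47, 5), RestingOrder("sell", 0.53, 5)]
filled, resting = match_orders(orders, feed_price=0.46)
position, cash = apply_fills(0.0, 100.0, filled)
print(position, round(cash, 2))
```

With the feed at 0.46, only the buy at 0.47 crosses, so the position becomes 5 contracts long and the sell at 0.53 keeps resting.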
hz.backtest() accepts four input formats for the data parameter:
list[dict]
The simplest format. Each dict must have a timestamp field and at least one of price or bid:
data = [
    {"timestamp": 0.0, "price": 0.50},
    {"timestamp": 1.0, "price": 0.52},
    {"timestamp": 2.0, "price": 0.48, "bid": 0.47, "ask": 0.49},
]
If only price is provided, bid and ask are set equal to price. If only bid and ask are provided, price is derived as the midpoint.
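The fill-in rules above can be sketched in plain Python. This is an illustration only: `normalize_tick` is a hypothetical helper, not a Horizon function, and two behaviors are assumptions that the text above does not specify. A bid-only row treats the missing ask as equal to the bid, and a missing volume defaults to 0.0.

```python
def normalize_tick(raw: dict) -> dict:
    """Return a tick dict with timestamp, price, bid, ask, and volume filled in."""
    if "timestamp" not in raw:
        raise ValueError("each tick needs a timestamp")
    price, bid, ask = raw.get("price"), raw.get("bid"), raw.get("ask")
    if price is None and bid is None:
        raise ValueError("each tick needs at least one of price or bid")

    if price is None:
        # Only bid (and possibly ask) given: derive price as the midpoint.
        if ask is None:
            ask = bid  # assumption: the bid-only case is not documented
        price = (bid + ask) / 2.0
    else:
        # Price given: any missing side of the book is set equal to price.
        bid = price if bid is None else bid
        ask = price if ask is None else ask

    return {
        "timestamp": float(raw["timestamp"]),
        "price": price,
        "bid": bid,
        "ask": ask,
        "volume": raw.get("volume", 0.0),  # assumption: default volume
    }


print(normalize_tick({"timestamp": 0, "price": 0.50}))
print(normalize_tick({"timestamp": 1, "bid": 0.47, "ask": 0.53}))
```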
CSV file path
Pass a string path to a CSV file with a header row:
result = hz.backtest(
    data="data/btc_market.csv",
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=50),
)
Expected CSV columns: timestamp, price, and optionally bid, ask, volume.
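If you need a file to experiment with, the following sketch writes a small CSV with those columns using only the standard library. The file name, the sample values, and the use of plain float timestamps (mirroring the list[dict] examples) are assumptions for illustration.

```python
import csv
import os
import tempfile

# Sample rows using the documented column names.
rows = [
    {"timestamp": 0.0, "price": 0.50, "bid": 0.49, "ask": 0.51, "volume": 120},
    {"timestamp": 1.0, "price": 0.52, "bid": 0.51, "ask": 0.53, "volume": 80},
    {"timestamp": 2.0, "price": 0.48, "bid": 0.47, "ask": 0.49, "volume": 95},
]

# Write to a temporary directory so the sketch leaves no files in your project.
path = os.path.join(tempfile.mkdtemp(), "market.csv")
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["timestamp", "price", "bid", "ask", "volume"]
    )
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to confirm the header row and row count.
with open(path, newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    body = list(reader)

print(header, len(body))
```

The resulting `path` can then be passed as the data argument in place of "data/btc_market.csv".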
pandas DataFrame
Pass a DataFrame directly, no conversion needed:
import pandas as pd

df = pd.read_csv("data/market_history.csv")
result = hz.backtest(
    data=df,
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=50),
)
dict[str, data] for multi-feed
Map feed names to their data sources for strategies that consume multiple feeds:
result = hz.backtest(
    data={
        "btc": "data/btc_prices.csv",
        "book": [
            {"timestamp": 0.0, "bid": 0.47, "ask": 0.53},
            {"timestamp": 1.0, "bid": 0.48, "ask": 0.52},
        ],
    },
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=50),
)
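When several feeds are supplied, they are merged into one timeline with carry-forward, as described in How It Works. The sketch below shows one way such a merge could work on plain dicts. `merge_feeds` is a hypothetical helper written for illustration, not the engine's actual implementation.

```python
def merge_feeds(feeds: dict[str, list[dict]]) -> list[tuple[float, dict[str, dict]]]:
    """Merge per-feed ticks into one sorted timeline of (timestamp, snapshot)."""
    # Flatten to (timestamp, feed_name, tick) and sort by timestamp only.
    events = sorted(
        (
            (tick["timestamp"], name, tick)
            for name, ticks in feeds.items()
            for tick in ticks
        ),
        key=lambda event: event[0],
    )

    latest: dict[str, dict] = {}  # most recent tick seen for each feed
    timeline: list[tuple[float, dict[str, dict]]] = []
    for ts, name, tick in events:
        latest[name] = tick
        snapshot = dict(latest)  # copy so later updates do not mutate history
        if timeline and timeline[-1][0] == ts:
            timeline[-1] = (ts, snapshot)  # same timestamp: fold into one entry
        else:
            timeline.append((ts, snapshot))
    return timeline


feeds = {
    "btc": [
        {"timestamp": 0.0, "price": 99_000.0},
        {"timestamp": 2.0, "price": 101_000.0},
    ],
    "book": [
        {"timestamp": 1.0, "bid": 0.47, "ask": 0.53},
        {"timestamp": 2.0, "bid": 0.48, "ask": 0.52},
    ],
}
timeline = merge_feeds(feeds)
for ts, snapshot in timeline:
    print(ts, sorted(snapshot))
```

At timestamp 1.0 only book has a new tick, yet the snapshot still contains the btc value from timestamp 0.0. That is the carry-forward behavior.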
Interpreting Results
The result.summary() output contains four sections:
Returns
| Metric | Description |
|---|---|
| Total Return | Absolute dollar P&L from initial capital |
| Total Return % | Percentage return on initial capital |
| CAGR | Compound annual growth rate (requires duration >= 1 day) |
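As a rough guide to how these figures relate, here is a minimal sketch using the standard formulas. The function names and the 365-day year are assumptions. Horizon's exact annualization convention is not stated here, so treat the numbers as illustrative.

```python
def total_return(initial: float, final: float) -> float:
    """Absolute dollar P&L relative to initial capital."""
    return final - initial


def total_return_pct(initial: float, final: float) -> float:
    """Percentage return on initial capital."""
    return (final - initial) / initial * 100.0


def cagr(initial: float, final: float, duration_days: float,
         days_per_year: float = 365.0):
    """Compound annual growth rate, or None for runs shorter than one day."""
    if duration_days < 1:
        return None  # mirrors the "duration >= 1 day" requirement
    return (final / initial) ** (days_per_year / duration_days) - 1.0


print(total_return(100.0, 110.0))
print(total_return_pct(100.0, 110.0))
print(cagr(100.0, 121.0, duration_days=730.0))
```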
Risk
| Metric | Description |
|---|---|
| Sharpe Ratio | Annualized risk-adjusted return (higher is better, >1 is good) |
| Sortino Ratio | Like Sharpe but only penalizes downside volatility |
| Calmar Ratio | CAGR divided by max drawdown |
| Max Drawdown | Largest peak-to-trough decline in equity |
| Max DD Duration | Longest time spent below a previous equity high |
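The risk metrics can be reproduced by hand from an equity curve. The sketch below uses textbook definitions and is not Horizon's code: the helper names are hypothetical, drawdown is expressed as a fraction of the running peak, and `periods_per_year=252` is an assumed annualization factor that you should adjust to your data frequency.

```python
import math


def pct_returns(equity: list[float]) -> list[float]:
    """Simple period-over-period returns from an equity curve."""
    return [(b - a) / a for a, b in zip(equity, equity[1:])]


def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sharpe_ratio(returns: list[float], periods_per_year: int = 252) -> float:
    """Mean return over sample standard deviation, annualized."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    std = math.sqrt(var)
    # The ratio is undefined without variation; this sketch returns 0.0.
    return 0.0 if std == 0 else mean / std * math.sqrt(periods_per_year)


def sortino_ratio(returns: list[float], periods_per_year: int = 252) -> float:
    """Like Sharpe, but the denominator only counts negative returns."""
    mean = sum(returns) / len(returns)
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / len(returns))
    # The ratio is undefined without downside; this sketch returns 0.0.
    return 0.0 if downside == 0 else mean / downside * math.sqrt(periods_per_year)


equity = [100.0, 110.0, 99.0, 104.0, 120.0]
rets = pct_returns(equity)
print(max_drawdown(equity), sharpe_ratio(rets), sortino_ratio(rets))
```

Here the drop from 110 to 99 is the deepest decline, a drawdown of 10% of the peak.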
Trades
| Metric | Description |
|---|---|
| Win Rate | Percentage of round-trip trades that were profitable |
| Profit Factor | Gross profit / gross loss (>1 means profitable) |
| Expectancy | Average P&L per trade |
| Avg Win / Avg Loss | Mean size of winning vs. losing trades |
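To see how the trade metrics fit together, here is a small sketch that derives them from a list of round-trip P&L values. `trade_stats` is a hypothetical helper, and treating zero-P&L trades as neither wins nor losses is an assumption.

```python
def trade_stats(pnls: list[float]) -> dict:
    """Summarize round-trip trade P&Ls using the definitions in the table."""
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]
    gross_profit = sum(wins)
    gross_loss = -sum(losses)  # reported as a positive number
    return {
        "win_rate_pct": 100.0 * len(wins) / len(pnls),
        # No losing trades: the ratio is unbounded, so report infinity.
        "profit_factor": gross_profit / gross_loss if gross_loss > 0 else float("inf"),
        "expectancy": sum(pnls) / len(pnls),
        "avg_win": gross_profit / len(wins) if wins else 0.0,
        "avg_loss": -gross_loss / len(losses) if losses else 0.0,
    }


stats = trade_stats([2.0, -1.0, 3.0, -2.0])
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

In this example half the trades win, gross profit is 5 against a gross loss of 3, and the average trade makes 0.50.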
Prediction Market Metrics
| Metric | Description |
|---|---|
| Brier Score | Mean squared forecast error (lower is better, 0 = perfect) |
| Avg Edge | Mean (outcome - price_paid) for buy trades |
Multi-Feed Backtesting
Test strategies that consume multiple data sources, such as a BTC-priced prediction market:
"""Multi-feed backtest: BTC price feed + prediction market book."""
import horizon as hz
from horizon.context import FeedData
def fair_value(ctx: hz.Context) -> float:
    btc = ctx.feeds.get("btc", FeedData())
    if btc.price > 100_000:
        return 0.70  # Bullish
    elif btc.price > 95_000:
        return 0.50  # Neutral
    else:
        return 0.30  # Bearish


def quoter(ctx: hz.Context, fair: float) -> list[hz.Quote]:
    book = ctx.feeds.get("book", FeedData())
    spread_est = (book.ask - book.bid) if book.bid > 0 else 0.06
    spread = max(0.04, spread_est * 1.2)
    skew = ctx.inventory.net * 0.001
    return hz.quotes(fair - skew, spread=spread, size=5)


result = hz.backtest(
    name="multi_feed_mm",
    markets=["btc-100k"],
    data={
        "btc": "data/btc_1min.csv",
        "book": "data/polymarket_book.csv",
    },
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=100, max_drawdown_pct=5),
    initial_capital=500.0,
)
print(result.summary())
At each timestamp in the merged timeline, both feeds carry forward their latest values, so your pipeline always sees the most recent btc and book data even when only one of them has a new tick at that timestamp.
Backtesting from a CSV File
"""Backtest from a CSV file with bid/ask data."""
import horizon as hz
from horizon.context import FeedData
def fair_value(ctx: hz.Context) -> float:
    feed = ctx.feeds.get("default", FeedData())
    if feed.bid > 0 and feed.ask > 0:
        return (feed.bid + feed.ask) / 2.0
    return feed.price if feed.price > 0 else 0.50


def quoter(ctx: hz.Context, fair: float) -> list[hz.Quote]:
    return hz.quotes(fair, spread=0.04, size=10)


result = hz.backtest(
    name="csv_backtest",
    markets=["my-market"],
    data="data/historical_quotes.csv",
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=200),
    initial_capital=1000.0,
    paper_fee_rate=0.002,  # 20 bps fee
)
print(result.summary())

# Per-market P&L breakdown
for market_id, pnl in result.pnl_by_market().items():
    print(f"  {market_id}: ${pnl:+.2f}")
With Outcomes for Brier Score
Pass known outcomes to compute Brier score and average edge. This is essential for evaluating your probability calibration:
"""Backtest with known outcomes for calibration metrics."""
import horizon as hz
from horizon.context import FeedData
def fair_value(ctx: hz.Context) -> float:
    feed = ctx.feeds.get("default", FeedData())
    return feed.price if feed.price > 0 else 0.50


def quoter(ctx: hz.Context, fair: float) -> list[hz.Quote]:
    skew = ctx.inventory.net * 0.002
    return hz.quotes(fair - skew, spread=0.06, size=5)


result = hz.backtest(
    name="calibration_test",
    markets=["election-2024", "fed-rate-cut"],
    data={
        "default": [
            {"timestamp": float(i), "price": 0.55 + 0.02 * (i % 5)}
            for i in range(200)
        ],
    },
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=50),
    initial_capital=500.0,
    outcomes={
        "election-2024": 1.0,  # Yes resolved
        "fed-rate-cut": 0.0,  # No resolved
    },
)
print(result.summary())
m = result.metrics
print(f"\nBrier Score: {m.brier_score:.4f}") # 0 = perfect, 0.25 = coin flip
print(f"Avg Edge: {m.avg_edge:+.4f}") # Positive = profitable calibration
A Brier score below 0.25 means your forecasts beat always predicting 0.50 (a coin flip). A score below 0.10 is generally considered excellent for prediction markets.
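The 0.25 baseline follows directly from the definition of the Brier score. The sketch below computes it, along with average edge as defined in the metrics table, on made-up numbers. The function names are hypothetical, and how Horizon pairs forecasts with outcomes (per trade, per tick, or per market) is not specified here, so this only illustrates the arithmetic.

```python
def brier_score(forecasts: list[float], outcomes: list[float]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def avg_edge(buy_prices: list[float], outcomes: list[float]) -> float:
    """Mean of (outcome - price_paid) across buy trades."""
    return sum(o - p for p, o in zip(buy_prices, outcomes)) / len(buy_prices)


outcomes = [1.0, 0.0, 1.0, 0.0]

# Always forecasting 0.50 gives (0.5 ** 2) on every market.
coin_flip = brier_score([0.5] * len(outcomes), outcomes)

# Confident forecasts that lean the right way score much lower.
sharp = brier_score([0.9, 0.1, 0.8, 0.2], outcomes)

# Buying at 0.60 a contract that resolves Yes earns 0.40 of edge;
# buying at 0.40 a contract that resolves No loses 0.40.
edge = avg_edge([0.60, 0.40], [1.0, 0.0])

print(coin_flip, round(sharp, 4), round(edge, 4))
```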
Run It
python examples/backtest_example.py
# Or, in a Jupyter notebook, pass result.summary() to print() for readable output