Replay historical prediction market data through the exact same pipeline you use in live trading. Get equity curves, trade logs, Sharpe ratios, Brier scores, and CSV exports from a single function call.

Full Code

"""Backtest a simple market maker on historical prediction market data."""

import horizon as hz
from horizon.context import FeedData


def fair_value(ctx: hz.Context) -> float:
    feed = ctx.feeds.get("default", FeedData())
    return feed.price if feed.price > 0 else 0.50


def quoter(ctx: hz.Context, fair: float) -> list[hz.Quote]:
    skew = ctx.inventory.net * 0.002
    return hz.quotes(fair - skew, spread=0.06, size=5)


# Generate sample data: price oscillating around 0.50
data = [
    {"timestamp": float(i), "price": 0.50 + 0.05 * ((-1) ** i) * (i % 10) / 10}
    for i in range(500)
]

result = hz.backtest(
    name="simple_mm_backtest",
    markets=["test-market"],
    data=data,
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=50, max_drawdown_pct=10),
    initial_capital=100.0,
)

print(result.summary())
print(f"\nTrades: {len(result.trades)}")
print(f"Final equity: ${result.equity_curve[-1][1]:.2f}")

# Export
result.to_csv("equity.csv", what="equity")
result.to_csv("trades.csv", what="trades")

How It Works

The backtest engine replays your data through the same pipeline that hz.run() uses in live trading:
  1. Data normalization: Your input (dicts, CSV, or DataFrame) is converted into a chronological timeline of Tick objects with timestamp, price, bid, ask, and volume fields.
  2. Timeline construction: All ticks across all feeds are merged into a single sorted timeline. Each timestamp carries forward the latest state of every feed (carry-forward interpolation).
  3. Pipeline execution: At each tick, the engine builds a Context with current feed data and inventory, then runs your pipeline functions in order. The output quotes are submitted to the internal paper exchange.
  4. Paper matching: The paper exchange matches resting orders against the current feed price. Fills update positions and P&L.
  5. Metrics computation: After all ticks are processed, BacktestResult lazily computes Sharpe, Sortino, Calmar, drawdown, win rate, profit factor, and prediction-market-specific metrics like Brier score.
Rate limits and dedup windows are automatically relaxed during backtests for maximum throughput. The risk pipeline (position limits, drawdown, etc.) still runs normally.
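Step 4 can be pictured with a toy matcher. This is only a sketch under a stated assumption, not Horizon's actual matching logic: a resting buy is treated as filled when the feed price is at or below its limit, and a resting sell when the feed price is at or above its limit. `RestingOrder` and `match_orders` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class RestingOrder:
    side: str     # "buy" or "sell"
    price: float  # limit price
    size: float


def match_orders(
    orders: list[RestingOrder], feed_price: float
) -> tuple[list[RestingOrder], list[RestingOrder]]:
    """Split orders into (filled, still_resting) under the assumed crossing rule."""
    filled, resting = [], []
    for order in orders:
        if order.side == "buy":
            crosses = feed_price <= order.price
        else:
            crosses = feed_price >= order.price
        (filled if crosses else resting).append(order)
    return filled, resting


# Quotes around a fair value of 0.50 with a 0.06 spread
orders = [RestingOrder("buy", 0.47, 5), RestingOrder("sell", 0.53, 5)]
filled, resting = match_orders(orders, feed_price=0.46)
print([o.side for o in filled])   # ['buy']
print([o.side for o in resting])  # ['sell']
```

A real paper exchange may also model partial fills, fees, and queue position; none of that is covered here.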

Data Formats

hz.backtest() accepts four input formats for the data parameter:

list[dict]

The simplest format. Each dict must have a timestamp field and at least one of price or bid:
data = [
    {"timestamp": 0.0, "price": 0.50},
    {"timestamp": 1.0, "price": 0.52},
    {"timestamp": 2.0, "price": 0.48, "bid": 0.47, "ask": 0.49},
]
If only price is provided, bid and ask are set equal to price. If only bid and ask are provided, price is derived as the midpoint.
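The two fill-in rules above can be written out as a small helper. This is an illustrative sketch, not the library's normalization code: `normalize_tick` is a hypothetical name, and it covers only the two documented cases (price only, or a bid/ask pair), raising an error for anything else.

```python
def normalize_tick(row: dict) -> dict:
    """Fill in missing price/bid/ask fields using the two documented rules."""
    price = row.get("price")
    bid = row.get("bid")
    ask = row.get("ask")

    # Rule 2: only bid and ask given, so price is the midpoint
    if price is None and bid is not None and ask is not None:
        price = (bid + ask) / 2.0
    if price is None:
        raise ValueError("row needs a price or a bid/ask pair")

    # Rule 1: only price given, so bid and ask both equal price
    if bid is None:
        bid = price
    if ask is None:
        ask = price

    return {"timestamp": float(row["timestamp"]), "price": price, "bid": bid, "ask": ask}


print(normalize_tick({"timestamp": 0, "price": 0.50}))
print(normalize_tick({"timestamp": 1, "bid": 0.47, "ask": 0.53}))
```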

CSV file path

Pass a string path to a CSV file with a header row:
result = hz.backtest(
    data="data/btc_market.csv",
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=50),
)
Expected CSV columns: timestamp, price, and optionally bid, ask, volume.
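To produce a file with that layout, the standard library's csv module is enough. The sketch below writes a small sample file with all five columns and reads the header back; the file name and the sample values are made up for illustration, and whether the engine cares about column order is not stated here.

```python
import csv
import os
import tempfile

COLUMNS = ["timestamp", "price", "bid", "ask", "volume"]

rows = [
    {"timestamp": 0.0, "price": 0.50, "bid": 0.49, "ask": 0.51, "volume": 120},
    {"timestamp": 1.0, "price": 0.52, "bid": 0.51, "ask": 0.53, "volume": 80},
    {"timestamp": 2.0, "price": 0.48, "bid": 0.47, "ask": 0.49, "volume": 200},
]

# Write to a temporary directory so the example leaves no files behind
path = os.path.join(tempfile.mkdtemp(), "sample_market.csv")
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)

# Read the header row back to confirm the layout
with open(path, newline="") as f:
    header = next(csv.reader(f))
print(header)
```

The resulting path could then be passed as the data argument in place of "data/btc_market.csv".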

pandas DataFrame

Pass a DataFrame directly, no conversion needed:
import pandas as pd

df = pd.read_csv("data/market_history.csv")
result = hz.backtest(
    data=df,
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=50),
)

dict[str, data] for multi-feed

Map feed names to their data sources for strategies that consume multiple feeds:
result = hz.backtest(
    data={
        "btc": "data/btc_prices.csv",
        "book": [
            {"timestamp": 0.0, "bid": 0.47, "ask": 0.53},
            {"timestamp": 1.0, "bid": 0.48, "ask": 0.52},
        ],
    },
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=50),
)

Interpreting Results

The result.summary() output groups its metrics into the sections below:

Returns

| Metric | Description |
| --- | --- |
| Total Return | Absolute dollar P&L from initial capital |
| Total Return % | Percentage return on initial capital |
| CAGR | Compound annual growth rate (requires duration >= 1 day) |

Risk

| Metric | Description |
| --- | --- |
| Sharpe Ratio | Annualized risk-adjusted return (higher is better, >1 is good) |
| Sortino Ratio | Like Sharpe but only penalizes downside volatility |
| Calmar Ratio | CAGR divided by max drawdown |
| Max Drawdown | Largest peak-to-trough decline in equity |
| Max DD Duration | Longest time spent below a previous equity high |
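Two of these risk metrics are easy to approximate by hand from an equity curve. The sketch below assumes the curve is a list of (timestamp, equity) pairs and uses per-step simple returns with a zero risk-free rate. The exact formulas and annualization that BacktestResult applies are not documented here, so `periods_per_year` is an assumed parameter and the helper names are hypothetical.

```python
import math
import statistics


def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sharpe_ratio(equity: list[float], periods_per_year: float) -> float:
    """Annualized mean / stdev of per-step simple returns (risk-free rate = 0)."""
    returns = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    stdev = statistics.stdev(returns)
    if stdev == 0:
        return 0.0
    return statistics.mean(returns) / stdev * math.sqrt(periods_per_year)


curve = [(0.0, 100.0), (1.0, 104.0), (2.0, 98.8), (3.0, 101.0), (4.0, 106.0)]
equity = [value for _, value in curve]

print(f"Max drawdown: {max_drawdown(equity):.2%}")  # 5.00% (104.0 -> 98.8)
print(f"Sharpe: {sharpe_ratio(equity, periods_per_year=252):.2f}")
```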

Trades

| Metric | Description |
| --- | --- |
| Win Rate | Percentage of round-trip trades that were profitable |
| Profit Factor | Gross profit / gross loss (>1 means profitable) |
| Expectancy | Average P&L per trade |
| Avg Win / Avg Loss | Mean size of winning vs. losing trades |
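As a worked example of the trade metrics, the sketch below derives them from a plain list of round-trip P&L values. `trade_stats` is a hypothetical helper; treating zero-P&L trades as neither wins nor losses is an assumption, and the library may handle them differently.

```python
def trade_stats(pnls: list[float]) -> dict:
    """Compute win rate, profit factor, expectancy, and average win/loss."""
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]
    gross_profit = sum(wins)
    gross_loss = -sum(losses)  # reported as a positive number
    return {
        "win_rate": len(wins) / len(pnls),
        "profit_factor": gross_profit / gross_loss if gross_loss > 0 else float("inf"),
        "expectancy": sum(pnls) / len(pnls),
        "avg_win": gross_profit / len(wins) if wins else 0.0,
        "avg_loss": -gross_loss / len(losses) if losses else 0.0,
    }


stats = trade_stats([1.5, -0.5, 2.0, -1.0])
print(f"Win rate:      {stats['win_rate']:.0%}")       # 50%
print(f"Profit factor: {stats['profit_factor']:.2f}")  # 3.5 / 1.5 = 2.33
print(f"Expectancy:    {stats['expectancy']:+.2f}")    # +0.50
```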

Prediction Market Metrics

| Metric | Description |
| --- | --- |
| Brier Score | Mean squared forecast error (lower is better, 0 = perfect) |
| Avg Edge | Mean (outcome - price_paid) for buy trades |

Multi-Feed Backtesting

Test strategies that consume multiple data sources, such as a BTC-priced prediction market:
"""Multi-feed backtest: BTC price feed + prediction market book."""

import horizon as hz
from horizon.context import FeedData


def fair_value(ctx: hz.Context) -> float:
    btc = ctx.feeds.get("btc", FeedData())
    if btc.price <= 0:
        return 0.50  # No BTC price yet: stay neutral
    if btc.price > 100_000:
        return 0.70  # Bullish
    elif btc.price > 95_000:
        return 0.50  # Neutral
    else:
        return 0.30  # Bearish


def quoter(ctx: hz.Context, fair: float) -> list[hz.Quote]:
    book = ctx.feeds.get("book", FeedData())
    spread_est = (book.ask - book.bid) if book.bid > 0 else 0.06
    spread = max(0.04, spread_est * 1.2)
    skew = ctx.inventory.net * 0.001
    return hz.quotes(fair - skew, spread=spread, size=5)


result = hz.backtest(
    name="multi_feed_mm",
    markets=["btc-100k"],
    data={
        "btc": "data/btc_1min.csv",
        "book": "data/polymarket_book.csv",
    },
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=100, max_drawdown_pct=5),
    initial_capital=500.0,
)

print(result.summary())
At each timestamp in the merged timeline, every feed carries forward its most recent value. The pipeline therefore always sees the latest btc and book data, even when only one of the two feeds has a new tick at that timestamp.
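The carry-forward behavior can be sketched in plain Python. This is an illustration of the idea, not the engine's implementation: `merge_feeds` is a hypothetical helper that takes per-feed lists of tick dicts and returns one sorted timeline in which each entry holds the latest known tick of every feed seen so far.

```python
def merge_feeds(feeds: dict[str, list[dict]]) -> list[tuple[float, dict[str, dict]]]:
    """Merge per-feed tick lists into one timeline with carry-forward state."""
    # Flatten to (timestamp, feed_name, tick) and sort by timestamp only
    events = sorted(
        ((tick["timestamp"], name, tick) for name, ticks in feeds.items() for tick in ticks),
        key=lambda event: event[0],
    )
    timeline: list[tuple[float, dict[str, dict]]] = []
    latest: dict[str, dict] = {}
    for ts, name, tick in events:
        latest[name] = tick  # this feed's newest value; other feeds keep theirs
        if timeline and timeline[-1][0] == ts:
            timeline[-1] = (ts, dict(latest))  # same timestamp: fold into one entry
        else:
            timeline.append((ts, dict(latest)))
    return timeline


feeds = {
    "btc": [{"timestamp": 0.0, "price": 99_000.0}, {"timestamp": 2.0, "price": 101_000.0}],
    "book": [{"timestamp": 1.0, "bid": 0.47, "ask": 0.53}],
}
timeline = merge_feeds(feeds)
for ts, state in timeline:
    print(ts, sorted(state))
```

At timestamp 2.0 only btc has a new tick, yet the entry still contains the book tick from timestamp 1.0.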

CSV Input Example

"""Backtest from a CSV file with bid/ask data."""

import horizon as hz
from horizon.context import FeedData


def fair_value(ctx: hz.Context) -> float:
    feed = ctx.feeds.get("default", FeedData())
    if feed.bid > 0 and feed.ask > 0:
        return (feed.bid + feed.ask) / 2.0
    return feed.price if feed.price > 0 else 0.50


def quoter(ctx: hz.Context, fair: float) -> list[hz.Quote]:
    return hz.quotes(fair, spread=0.04, size=10)


result = hz.backtest(
    name="csv_backtest",
    markets=["my-market"],
    data="data/historical_quotes.csv",
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=200),
    initial_capital=1000.0,
    paper_fee_rate=0.002,  # 20 bps fee
)

print(result.summary())

# Per-market P&L breakdown
for market_id, pnl in result.pnl_by_market().items():
    print(f"  {market_id}: ${pnl:+.2f}")

With Outcomes for Brier Score

Pass known outcomes to compute Brier score and average edge. This is essential for evaluating your probability calibration:
"""Backtest with known outcomes for calibration metrics."""

import horizon as hz
from horizon.context import FeedData


def fair_value(ctx: hz.Context) -> float:
    feed = ctx.feeds.get("default", FeedData())
    return feed.price if feed.price > 0 else 0.50


def quoter(ctx: hz.Context, fair: float) -> list[hz.Quote]:
    skew = ctx.inventory.net * 0.002
    return hz.quotes(fair - skew, spread=0.06, size=5)


result = hz.backtest(
    name="calibration_test",
    markets=["election-2024", "fed-rate-cut"],
    data={
        "default": [
            {"timestamp": float(i), "price": 0.55 + 0.02 * (i % 5)}
            for i in range(200)
        ],
    },
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=50),
    initial_capital=500.0,
    outcomes={
        "election-2024": 1.0,   # Yes resolved
        "fed-rate-cut": 0.0,    # No resolved
    },
)

print(result.summary())

m = result.metrics
print(f"\nBrier Score: {m.brier_score:.4f}")   # 0 = perfect, 0.25 = coin flip
print(f"Avg Edge:    {m.avg_edge:+.4f}")       # Positive = profitable calibration
A Brier score below 0.25 means your forecasts beat the baseline of always predicting 50%, which is the score a coin-flip forecast earns. A score below 0.10 is considered excellent for prediction markets.
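The arithmetic behind both metrics is short enough to check by hand. The sketch below uses the definitions given in this guide (mean squared forecast error, and mean of outcome minus price paid for buys). How the library pairs forecasts with outcomes internally is not specified here, so the inputs are plain lists and the function names are hypothetical.

```python
def brier_score(forecasts: list[float], outcomes: list[float]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def avg_edge(buy_prices: list[float], outcome: float) -> float:
    """Mean (outcome - price_paid) across buy fills in one market."""
    return sum(outcome - p for p in buy_prices) / len(buy_prices)


# Always forecasting 50% scores exactly 0.25, whatever the outcomes
print(f"{brier_score([0.5, 0.5], [1.0, 0.0]):.3f}")  # 0.250

# Confident and correct forecasts score much lower
print(f"{brier_score([0.9, 0.2], [1.0, 0.0]):.3f}")  # 0.025

# Buying at 0.55 and 0.60 in a market that resolves Yes
print(f"{avg_edge([0.55, 0.60], outcome=1.0):+.3f}")  # +0.425
```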

Run It

python examples/backtest_example.py

# In a Jupyter notebook, print(result.summary()) displays the same report