Horizon Backtesting Example

Replay historical prediction market data through the exact same pipeline you use in live trading. Get equity curves, trade logs, Sharpe ratios, Brier scores, and CSV exports from a single function call.

Full Code

"""Backtest a simple market maker on historical prediction market data."""

import horizon as hz
from horizon.context import FeedData


def fair_value(ctx: hz.Context) -> float:
    feed = ctx.feeds.get("default", FeedData())
    return feed.price if feed.price > 0 else 0.50


def quoter(ctx: hz.Context, fair: float) -> list[hz.Quote]:
    skew = ctx.inventory.net * 0.002
    return hz.quotes(fair - skew, spread=0.06, size=5)


# Generate sample data: price oscillating around 0.50
data = [
    {"timestamp": float(i), "price": 0.50 + 0.05 * ((-1) ** i) * (i % 10) / 10}
    for i in range(500)
]

result = hz.backtest(
    name="simple_mm_backtest",
    markets=["test-market"],
    data=data,
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=50, max_drawdown_pct=10),
    initial_capital=100.0,
)

print(result.summary())
print(f"\nTrades: {len(result.trades)}")
print(f"Final equity: ${result.equity_curve[-1][1]:.2f}")

# Export
result.to_csv("equity.csv", what="equity")
result.to_csv("trades.csv", what="trades")
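
The quoter above leans its quotes against inventory. The sketch below shows the arithmetic in isolation. It assumes the quote is placed symmetrically at half the spread on either side of the center price; that placement, and the helper name skewed_quote, are illustrative assumptions and not the documented behavior of hz.quotes.

```python
def skewed_quote(
    fair: float,
    net_inventory: float,
    spread: float,
    skew_per_unit: float = 0.002,
) -> tuple[float, float]:
    """Return (bid, ask) around a fair value shifted against inventory.

    Assumption: bid and ask sit spread / 2 on either side of the center.
    """
    # A long position (positive net) lowers the center, making the ask
    # more likely to trade and the bid less likely.
    center = fair - net_inventory * skew_per_unit
    half = spread / 2.0
    return center - half, center + half


flat = skewed_quote(0.50, net_inventory=0, spread=0.06)
long_20 = skewed_quote(0.50, net_inventory=20, spread=0.06)

print(f"flat:    bid={flat[0]:.2f} ask={flat[1]:.2f}")
print(f"long 20: bid={long_20[0]:.2f} ask={long_20[1]:.2f}")
```

With 20 contracts of long inventory, the center moves down by 20 * 0.002 = 0.04, so both sides of the quote shift lower and the strategy becomes more willing to sell than to buy.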

How It Works

The backtest engine replays your data through the same pipeline that hz.run() uses in live trading:

1. Data normalization: Your input (dicts, CSV, or DataFrame) is converted into a chronological timeline of Tick objects with timestamp, price, bid, ask, and volume fields.
2. Timeline construction: All ticks across all feeds are merged into a single sorted timeline. Each timestamp carries forward the latest state of every feed (forward fill).
3. Pipeline execution: At each tick, the engine builds a Context with current feed data and inventory, then runs your pipeline functions in order. The output quotes are submitted to the internal paper exchange.
4. Paper matching: The paper exchange matches resting orders against the current feed price. Fills update positions and P&L.
5. Metrics computation: After all ticks are processed, BacktestResult lazily computes Sharpe, Sortino, Calmar, drawdown, win rate, profit factor, and prediction-market-specific metrics like Brier score.
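
The timeline construction step can be sketched in plain Python. This is a minimal illustration of merging feeds with carry-forward, not the engine's actual code; the function name merge_carry_forward and the snapshot layout are assumptions made for the example.

```python
from typing import Any

Tick = dict[str, Any]
Snapshot = dict[str, Tick]


def merge_carry_forward(feeds: dict[str, list[Tick]]) -> list[tuple[float, Snapshot]]:
    """Merge per-feed ticks into one sorted timeline of feed snapshots."""
    # Flatten to (timestamp, feed_name, tick) and sort by timestamp only.
    events = sorted(
        (
            (tick["timestamp"], name, tick)
            for name, ticks in feeds.items()
            for tick in ticks
        ),
        key=lambda event: event[0],
    )

    timeline: list[tuple[float, Snapshot]] = []
    latest: Snapshot = {}
    for ts, name, tick in events:
        latest[name] = tick  # this feed updates; the others keep their last tick
        if timeline and timeline[-1][0] == ts:
            timeline[-1] = (ts, dict(latest))  # same timestamp: refresh snapshot
        else:
            timeline.append((ts, dict(latest)))
    return timeline


timeline = merge_carry_forward(
    {
        "btc": [
            {"timestamp": 0.0, "price": 96_000.0},
            {"timestamp": 2.0, "price": 101_000.0},
        ],
        "book": [{"timestamp": 1.0, "bid": 0.47, "ask": 0.53}],
    }
)

for ts, snapshot in timeline:
    print(ts, sorted(snapshot))
```

At timestamp 1.0 only the book feed has a new tick, yet the snapshot still contains the btc tick from timestamp 0.0. A feed that has not produced any tick yet is simply absent from the snapshot.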

Rate limits and dedup windows are automatically relaxed during backtests for maximum throughput. The risk pipeline (position limits, drawdown, etc.) still runs normally.

Data Formats

hz.backtest() accepts four input formats for the data parameter:

list[dict]

The simplest format. Each dict must have a timestamp field and at least one of price or bid:

data = [
    {"timestamp": 0.0, "price": 0.50},
    {"timestamp": 1.0, "price": 0.52},
    {"timestamp": 2.0, "price": 0.48, "bid": 0.47, "ask": 0.49},
]

If only price is provided, bid and ask are set equal to price. If only bid and ask are provided, price is derived as the midpoint.
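
The two fill-in rules above can be written out as a small function. This is a sketch of the stated rules only; normalize_tick is a hypothetical helper, and rows that fit neither rule (for example a bid with no ask and no price) raise an error here because the text does not say how the engine treats them.

```python
def normalize_tick(row: dict) -> dict:
    """Fill in missing price / bid / ask fields following the rules above."""
    price = row.get("price")
    bid = row.get("bid")
    ask = row.get("ask")

    if price is not None:
        # Only price given: missing sides of the book default to the price.
        bid = price if bid is None else bid
        ask = price if ask is None else ask
    elif bid is not None and ask is not None:
        # Only bid and ask given: price is the midpoint.
        price = (bid + ask) / 2.0
    else:
        # Not covered by the two documented rules; the real engine may differ.
        raise ValueError(f"cannot derive a price from row: {row!r}")

    return {"timestamp": float(row["timestamp"]), "price": price, "bid": bid, "ask": ask}


price_only = normalize_tick({"timestamp": 0, "price": 0.50})
book_only = normalize_tick({"timestamp": 1, "bid": 0.47, "ask": 0.53})

print(price_only)
print(book_only)
```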

CSV file path

Pass a string path to a CSV file with a header row:

result = hz.backtest(
    data="data/btc_market.csv",
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=50),
)

Expected CSV columns: timestamp, price, and optionally bid, ask, volume.
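
If you need a file in this layout to experiment with, the standard library can write one. The sketch below writes a few made-up rows to a temporary file named sample_market.csv (a hypothetical name) and reads the header back; the values are placeholders, not real market data.

```python
import csv
import os
import tempfile

COLUMNS = ["timestamp", "price", "bid", "ask", "volume"]

# Placeholder rows for illustration only.
rows = [
    {"timestamp": 0.0, "price": 0.50, "bid": 0.49, "ask": 0.51, "volume": 120},
    {"timestamp": 1.0, "price": 0.52, "bid": 0.51, "ask": 0.53, "volume": 80},
    {"timestamp": 2.0, "price": 0.48, "bid": 0.47, "ask": 0.49, "volume": 200},
]

path = os.path.join(tempfile.mkdtemp(), "sample_market.csv")

with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=COLUMNS)
    writer.writeheader()  # the header row the loader expects
    writer.writerows(rows)

with open(path, newline="") as f:
    header = next(csv.reader(f))

print(header)
```

The resulting path can then be passed as the data argument in place of "data/btc_market.csv".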

pandas DataFrame

Pass a DataFrame directly; no conversion is needed:

import pandas as pd

df = pd.read_csv("data/market_history.csv")
result = hz.backtest(
    data=df,
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=50),
)

dict[str, data] for multi-feed

Map feed names to their data sources for strategies that consume multiple feeds:

result = hz.backtest(
    data={
        "btc": "data/btc_prices.csv",
        "book": [
            {"timestamp": 0.0, "bid": 0.47, "ask": 0.53},
            {"timestamp": 1.0, "bid": 0.48, "ask": 0.52},
        ],
    },
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=50),
)

Interpreting Results

The result.summary() output contains four sections; the last, Prediction Market Metrics, requires known outcomes:

Returns

Metric	Description
Total Return	Absolute dollar P&L from initial capital
Total Return %	Percentage return on initial capital
CAGR	Compound annual growth rate (requires duration >= 1 day)
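
The three return metrics follow from the starting equity, the ending equity, and the backtest duration. The sketch below assumes timestamps are in seconds and uses a 365-day year; both are assumptions for illustration, as is the helper name return_metrics.

```python
from typing import Optional

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: 365-day year


def return_metrics(initial: float, final: float, duration_seconds: float) -> dict:
    """Compute total return, percentage return, and CAGR (None if under 1 day)."""
    total_return = final - initial
    total_return_pct = total_return / initial * 100.0

    cagr: Optional[float] = None
    if duration_seconds >= SECONDS_PER_DAY:
        years = duration_seconds / SECONDS_PER_YEAR
        cagr = (final / initial) ** (1.0 / years) - 1.0

    return {
        "total_return": total_return,
        "total_return_pct": total_return_pct,
        "cagr": cagr,
    }


one_year = return_metrics(100.0, 110.0, duration_seconds=SECONDS_PER_YEAR)
one_hour = return_metrics(100.0, 110.0, duration_seconds=3600)

print(one_year)
print(one_hour)
```

Annualizing a very short run would extrapolate a few hours of P&L across a whole year, which is why CAGR is left undefined below the one-day threshold.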

Risk

Metric	Description
Sharpe Ratio	Annualized risk-adjusted return (higher is better, >1 is good)
Sortino Ratio	Like Sharpe but only penalizes downside volatility
Calmar Ratio	CAGR divided by max drawdown
Max Drawdown	Largest peak-to-trough decline in equity
Max DD Duration	Longest time spent below a previous equity high
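
Max drawdown and its duration can be derived from an equity curve of (timestamp, equity) pairs, the same shape that result.equity_curve[-1][1] implies. The sketch below reports drawdown as a fraction of the running peak; whether the library reports a fraction, a percentage, or dollars is not stated here, so treat that choice as an assumption.

```python
def max_drawdown(equity_curve: list[tuple[float, float]]) -> tuple[float, float]:
    """Return (max drawdown as a fraction of peak, longest time below a peak)."""
    peak_ts, peak = equity_curve[0]
    max_dd = 0.0
    max_duration = 0.0

    for ts, equity in equity_curve:
        if equity >= peak:
            # New (or matched) high: the drawdown episode is over.
            peak_ts, peak = ts, equity
        else:
            max_dd = max(max_dd, (peak - equity) / peak)
            max_duration = max(max_duration, ts - peak_ts)

    return max_dd, max_duration


# Made-up curve for illustration.
curve = [(0.0, 100.0), (1.0, 110.0), (2.0, 99.0), (3.0, 104.0), (4.0, 112.0), (5.0, 108.0)]
dd, duration = max_drawdown(curve)

print(f"max drawdown: {dd:.1%}, longest time under water: {duration}")
```

In this curve the worst decline is from 110 to 99, which is 10% of the peak, and equity stays below that peak from timestamp 1.0 until it recovers at 4.0, so the last under-water observation is 2.0 time units after the peak.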

Trades

Metric	Description
Win Rate	Percentage of round-trip trades that were profitable
Profit Factor	Gross profit / gross loss (>1 means profitable)
Expectancy	Average P&L per trade
Avg Win / Avg Loss	Mean size of winning vs. losing trades
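
The trade metrics are simple aggregates over per-trade P&L. The sketch below works on a plain list of round-trip P&L values; how the library groups fills into round trips is not covered here, and trades with exactly zero P&L are counted as neither wins nor losses, which is an assumption.

```python
def trade_stats(pnls: list[float]) -> dict:
    """Compute win rate, profit factor, expectancy, and average win / loss."""
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]

    gross_profit = sum(wins)
    gross_loss = -sum(losses)  # reported as a positive number

    return {
        "win_rate_pct": len(wins) / len(pnls) * 100.0,
        "profit_factor": gross_profit / gross_loss if gross_loss > 0 else float("inf"),
        "expectancy": sum(pnls) / len(pnls),
        "avg_win": gross_profit / len(wins) if wins else 0.0,
        "avg_loss": -gross_loss / len(losses) if losses else 0.0,
    }


# Made-up round-trip results for illustration.
stats = trade_stats([1.5, -0.5, 2.0, -1.0, 0.5])

for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

A high win rate alone says little: a strategy can win 60% of its trades and still lose money if the average loss is much larger than the average win, which is what profit factor and expectancy capture.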

Prediction Market Metrics

Metric	Description
Brier Score	Mean squared forecast error (lower is better, 0 = perfect)
Avg Edge	Mean (outcome - price_paid) for buy trades
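
Both metrics are short formulas. The sketch below applies them to made-up forecasts and fills; how the library pairs each forecast or fill with a market outcome is not specified here, so the input shapes are assumptions.

```python
def brier_score(forecasts: list[float], outcomes: list[float]) -> float:
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def avg_edge(buys: list[tuple[float, float]]) -> float:
    """Mean of (outcome - price_paid) over buy trades given as (price_paid, outcome)."""
    return sum(outcome - price for price, outcome in buys) / len(buys)


outcomes = [1.0, 0.0, 1.0]

model = brier_score([0.7, 0.2, 0.9], outcomes)
coin_flip = brier_score([0.5, 0.5, 0.5], outcomes)
edge = avg_edge([(0.55, 1.0), (0.40, 0.0)])

print(f"model Brier:     {model:.4f}")
print(f"coin-flip Brier: {coin_flip:.4f}")
print(f"avg edge:        {edge:+.4f}")
```

Forecasting 0.5 for every market always scores 0.25, because (0.5 - 0)^2 and (0.5 - 1)^2 are both 0.25. That constant is the baseline a model has to beat.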

Multi-Feed Backtesting

Test strategies that consume multiple data sources, such as a prediction market that resolves on the BTC price:

"""Multi-feed backtest: BTC price feed + prediction market book."""

import horizon as hz
from horizon.context import FeedData


def fair_value(ctx: hz.Context) -> float:
    btc = ctx.feeds.get("btc", FeedData())
    if btc.price > 100_000:
        return 0.70  # Bullish
    elif btc.price > 95_000:
        return 0.50  # Neutral
    else:
        return 0.30  # Bearish


def quoter(ctx: hz.Context, fair: float) -> list[hz.Quote]:
    book = ctx.feeds.get("book", FeedData())
    spread_est = (book.ask - book.bid) if book.bid > 0 else 0.06
    spread = max(0.04, spread_est * 1.2)
    skew = ctx.inventory.net * 0.001
    return hz.quotes(fair - skew, spread=spread, size=5)


result = hz.backtest(
    name="multi_feed_mm",
    markets=["btc-100k"],
    data={
        "btc": "data/btc_1min.csv",
        "book": "data/polymarket_book.csv",
    },
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=100, max_drawdown_pct=5),
    initial_capital=500.0,
)

print(result.summary())

At each timestamp in the merged timeline, both feeds carry forward their latest values, so the pipeline still sees the most recent book data when only btc has a new tick, and vice versa.

CSV Input Example

"""Backtest from a CSV file with bid/ask data."""

import horizon as hz
from horizon.context import FeedData


def fair_value(ctx: hz.Context) -> float:
    feed = ctx.feeds.get("default", FeedData())
    if feed.bid > 0 and feed.ask > 0:
        return (feed.bid + feed.ask) / 2.0
    return feed.price if feed.price > 0 else 0.50


def quoter(ctx: hz.Context, fair: float) -> list[hz.Quote]:
    return hz.quotes(fair, spread=0.04, size=10)


result = hz.backtest(
    name="csv_backtest",
    markets=["my-market"],
    data="data/historical_quotes.csv",
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=200),
    initial_capital=1000.0,
    paper_fee_rate=0.002,  # 20 bps fee
)

print(result.summary())

# Per-market P&L breakdown
for market_id, pnl in result.pnl_by_market().items():
    print(f"  {market_id}: ${pnl:+.2f}")

With Outcomes for Brier Score

Pass known outcomes to compute Brier score and average edge. This is essential for evaluating your probability calibration:

"""Backtest with known outcomes for calibration metrics."""

import horizon as hz
from horizon.context import FeedData


def fair_value(ctx: hz.Context) -> float:
    feed = ctx.feeds.get("default", FeedData())
    return feed.price if feed.price > 0 else 0.50


def quoter(ctx: hz.Context, fair: float) -> list[hz.Quote]:
    skew = ctx.inventory.net * 0.002
    return hz.quotes(fair - skew, spread=0.06, size=5)


result = hz.backtest(
    name="calibration_test",
    markets=["election-2024", "fed-rate-cut"],
    data={
        "default": [
            {"timestamp": float(i), "price": 0.55 + 0.02 * (i % 5)}
            for i in range(200)
        ],
    },
    pipeline=[fair_value, quoter],
    risk=hz.Risk(max_position=50),
    initial_capital=500.0,
    outcomes={
        "election-2024": 1.0,   # Yes resolved
        "fed-rate-cut": 0.0,    # No resolved
    },
)

print(result.summary())

m = result.metrics
print(f"\nBrier Score: {m.brier_score:.4f}")   # 0 = perfect, 0.25 = coin flip
print(f"Avg Edge:    {m.avg_edge:+.4f}")       # Positive = profitable calibration

A Brier score below 0.25 means your model beats the baseline of always forecasting 50%, which scores exactly 0.25 whatever the outcome. Below 0.10 is considered excellent for prediction markets.

Run It

python examples/backtest_example.py

# Or, in a Jupyter notebook, print(result.summary()) displays the same report
