Pro Feature. Requires a Pro or Ultra subscription. Get started at api.mathematicalcompany.com
Robustness Testing
Three statistical tests designed specifically for prediction markets. They answer a question every trader must ask: “Is my edge real, or did I just get lucky?”
Event Permutation
Shuffle which outcomes map to which markets. Tests whether your market selection was skillful.
Outcome Randomization
Keep trades fixed, re-draw binary outcomes from Bernoulli(p). Tests whether you got lucky on resolutions.
Path Simulation
Shuffle trade ordering to test if your drawdown and loss streaks were unusually good or bad.
Quick Start
Event Permutation Test
Tests whether your strategy’s market selection was genuinely skillful. Since prediction market events are independent, the order in which outcomes resolve shouldn’t matter if your edge is real. This test shuffles which outcomes map to which markets and recomputes PnL. If your observed PnL sits in the upper tail of the permuted distribution, that is evidence your selection was skillful rather than lucky.
Signature
Example
PermutationTestResult
| Field | Type | Description |
|---|---|---|
| observed_pnl | float | Actual total PnL from the backtest |
| mean_permuted_pnl | float | Mean PnL across permuted orderings |
| std_permuted_pnl | float | Standard deviation of permuted PnLs |
| p_value | float | Fraction of permutations with PnL >= observed |
| n_permutations | int | Number of permutations executed |
| permuted_pnls | list[float] | All permuted PnLs (for histograms) |
| is_significant | bool | True if p_value < 0.05 |
Mathematical Details
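Let $\Pi$ be the set of all permutations of outcome assignments. For each permutation $\pi \in \Pi$:

$$\text{PnL}_\pi = \text{PnL}\big(\text{trades},\ \pi(\text{outcomes})\big)$$

The p-value is computed conservatively, as the fraction of the $N$ sampled permutations whose PnL is at least the observed PnL:

$$p = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\big[\text{PnL}_{\pi_i} \ge \text{PnL}_{\text{obs}}\big]$$

Under $H_0$ (no selection skill), the observed PnL should be typical of the permuted distribution. A small p-value rejects $H_0$.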
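The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the library's implementation: the function names, the `(price, size)` position format, and the PnL model (YES shares held to resolution, paying `size * (outcome - price)`) are all assumptions made for the example.

```python
import random


def total_pnl(positions, outcomes):
    """PnL of YES positions held to resolution (assumed model).

    positions: list of (avg_buy_price, size), one entry per market
    outcomes:  list of 0/1 resolutions, aligned with positions
    """
    return sum(size * (outcome - price)
               for (price, size), outcome in zip(positions, outcomes))


def permutation_test(positions, outcomes, n_permutations=1000, seed=0):
    """Shuffle which outcome maps to which market and recompute PnL.

    Hypothetical helper: returns (observed_pnl, permuted_pnls, p_value).
    """
    rng = random.Random(seed)
    observed = total_pnl(positions, outcomes)

    shuffled = list(outcomes)  # copy so the caller's list is untouched
    permuted = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        permuted.append(total_pnl(positions, shuffled))

    # Fraction of permutations with PnL >= observed (ties count).
    p_value = sum(pnl >= observed for pnl in permuted) / n_permutations
    return observed, permuted, p_value
```

Note that under this simple model the test only has power when position sizes or prices differ across markets: if every position is identical, every permutation yields the same total PnL and the p-value is 1.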
Outcome Randomization
Tests whether your realized outcomes were luckier than expected given market pricing. Keeps every trade exactly as-is (same prices, sizes, timing). For each simulation, re-draws each market’s binary outcome from Bernoulli(p), where p is the market’s implied probability (average buy price). If your strategy has genuinely better calibration than the market, your real PnL will sit in the upper tail.
Signature
Example
OutcomeRandomizationResult
| Field | Type | Description |
|---|---|---|
| observed_pnl | float | Actual total PnL |
| mean_random_pnl | float | Mean PnL under randomized outcomes |
| std_random_pnl | float | Standard deviation of randomized PnLs |
| p_value | float | Fraction of random runs with PnL >= observed |
| n_simulations | int | Number of randomization runs |
| simulated_pnls | list[float] | All simulated PnLs |
| is_significant | bool | True if p_value < 0.05 |
How it works
For closed round-trips (buy then sell before expiry), the PnL is fixed regardless of the final outcome. Only open positions held to resolution contribute to variance across simulations.
For each simulation $s$:
- For each market $m$, draw $o_m^{(s)} \sim \text{Bernoulli}(p_m)$, where $p_m$ is the implied probability
- Compute resolution PnL from open positions using $o_m^{(s)}$
- Total PnL = (fixed round-trip PnL) + (random resolution PnL)
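The steps above can be sketched as follows. This is an illustrative approximation under stated assumptions, not the library's code: the helper names and the `(avg_buy_price, size)` format for open positions are hypothetical, and the implied probability is taken to be the average buy price, as described earlier.

```python
import random


def resolution_pnl(open_positions, outcomes):
    """PnL of open YES positions at resolution (assumed model)."""
    return sum(size * (outcome - price)
               for (price, size), outcome in zip(open_positions, outcomes))


def outcome_randomization(fixed_pnl, open_positions, observed_outcomes,
                          n_simulations=1000, seed=0):
    """Re-draw each market's outcome from Bernoulli(implied probability).

    fixed_pnl:      PnL from closed round-trips (same in every simulation)
    open_positions: list of (avg_buy_price, size); price doubles as p
    Hypothetical helper: returns (observed_pnl, simulated_pnls, p_value).
    """
    rng = random.Random(seed)
    observed = fixed_pnl + resolution_pnl(open_positions, observed_outcomes)

    simulated = []
    for _ in range(n_simulations):
        # One Bernoulli(p) draw per market, with p = average buy price.
        draws = [1 if rng.random() < price else 0
                 for price, _size in open_positions]
        simulated.append(fixed_pnl + resolution_pnl(open_positions, draws))

    # Fraction of random runs with PnL >= observed (ties count).
    p_value = sum(pnl >= observed for pnl in simulated) / n_simulations
    return observed, simulated, p_value
```

Under this model each open position has zero expected resolution PnL when outcomes follow market pricing, so the simulated distribution is centered on the fixed round-trip PnL. The test therefore asks how far above that center the realized resolutions pushed you.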
PnL Path Simulation
Tests whether your drawdown and loss streaks were unusually good or bad. Total PnL is invariant to ordering (same sum), but path-dependent statistics like max drawdown and consecutive losses depend heavily on which order trades occurred. This test shuffles the order of round-trip PnLs and computes a distribution of these path statistics.
Signature
Example
PathSimulationResult
| Field | Type | Description |
|---|---|---|
| observed_max_drawdown | float | Actual max drawdown |
| observed_terminal_equity | float | Final equity (invariant across shuffles) |
| mean_max_drawdown | float | Mean max DD across shuffled paths |
| std_max_drawdown | float | Std of simulated max drawdowns |
| p_value_drawdown | float | Fraction of paths with DD >= observed |
| max_consecutive_losses_observed | int | Actual longest loss streak |
| mean_consecutive_losses | float | Mean longest loss streak across paths |
| n_simulations | int | Number of shuffled paths |
| simulated_max_drawdowns | list[float] | All simulated max DDs |
| simulated_max_consecutive_losses | list[int] | All simulated loss streaks |
Why drawdown depends on ordering
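Consider two sequences of trade PnLs: [+10, +10, -5, -5] and [-5, -5, +10, +10]. Both have the same total PnL (+10), and in this case they also share the same max drawdown, reached at different points on the path:
- Sequence 1: max drawdown = 10 (peak at +20, trough at +10)
- Sequence 2: max drawdown = 10 (peak at 0, trough at -10)
Interleaving the same trades as [+10, -5, +10, -5] halves the max drawdown to 5 (peak at +10, trough at +5) and shortens the longest loss streak from 2 to 1, while the total PnL is unchanged.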
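To make the ordering effect concrete, here is a small sketch of the path statistics and the shuffle itself. It assumes equity starts at 0 and that drawdown is measured in absolute PnL units from the running peak (matching the "peak at 0" convention in the example). The function names are illustrative, not the library's API.

```python
import random


def max_drawdown(pnls):
    """Largest peak-to-trough drop of the cumulative PnL path (starts at 0)."""
    equity = peak = worst = 0.0
    for pnl in pnls:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def longest_loss_streak(pnls):
    """Longest run of consecutive trades with negative PnL."""
    longest = current = 0
    for pnl in pnls:
        current = current + 1 if pnl < 0 else 0
        longest = max(longest, current)
    return longest


def path_simulation(pnls, n_simulations=1000, seed=0):
    """Shuffle trade order and collect path statistics.

    Hypothetical helper: returns
    (observed_max_dd, simulated_max_dds, simulated_streaks, p_value_drawdown).
    """
    rng = random.Random(seed)
    observed_dd = max_drawdown(pnls)

    shuffled = list(pnls)  # copy so the caller's list is untouched
    drawdowns, streaks = [], []
    for _ in range(n_simulations):
        rng.shuffle(shuffled)
        drawdowns.append(max_drawdown(shuffled))
        streaks.append(longest_loss_streak(shuffled))

    # Fraction of shuffled paths with drawdown >= observed.
    p_value_dd = sum(dd >= observed_dd for dd in drawdowns) / n_simulations
    return observed_dd, drawdowns, streaks, p_value_dd
```

With these definitions, `max_drawdown([10, 10, -5, -5])` and `max_drawdown([-5, -5, 10, 10])` both evaluate to 10, while the interleaved ordering `[10, -5, 10, -5]` gives 5, even though all three orderings sum to the same total.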
Convenience Wrapper
Run all three tests with a single call. Each test has its own input requirements:
- permutation and outcome require `outcomes` with >= 2 markets
- path requires at least 1 trade
RobustnessReport
| Field | Type | Description |
|---|---|---|
| permutation | PermutationTestResult \| None | Event permutation result |
| outcome | OutcomeRandomizationResult \| None | Outcome randomization result |
| path | PathSimulationResult \| None | Path simulation result |
| is_robust | bool | True if all executed tests pass |
is_robust | bool | True if all executed tests pass |
Interpreting Results
p-values
| p-value | Interpretation |
|---|---|
| < 0.01 | Strong evidence of genuine edge |
| 0.01 - 0.05 | Significant evidence |
| 0.05 - 0.10 | Marginal — investigate further |
| > 0.10 | Insufficient evidence that edge is real |
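If you want to apply these bands programmatically, a small helper like the one below can do it. The function name is hypothetical, and the handling of exact boundary values is an assumption: 0.05 is treated as not significant, consistent with `is_significant` being True only when `p_value < 0.05`, and 0.10 is treated as marginal.

```python
def interpret_p_value(p_value):
    """Map a p-value to the interpretation bands in the table above.

    Boundary handling is an assumption: [0, 0.01) strong,
    [0.01, 0.05) significant, [0.05, 0.10] marginal, above 0.10 insufficient.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < 0.01:
        return "Strong evidence of genuine edge"
    if p_value < 0.05:
        return "Significant evidence"
    if p_value <= 0.10:
        return "Marginal: investigate further"
    return "Insufficient evidence that edge is real"
```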
Which test tells you what
| Test | Question answered |
|---|---|
| Permutation | “Did I pick the right markets?” |
| Outcome | “Were the resolutions luckier than market pricing implied?” |
| Path | “Was my drawdown path unusually good/bad?” |
Recommended workflow
- Run all three tests with >= 1000 simulations
- If permutation test fails: your market selection may be random
- If outcome test fails: your calibration edge may not be real
- If path test shows your observed drawdown was smaller than most shuffled paths (high p_value_drawdown): size down, because a worse drawdown is likely ahead