Pro Feature. Requires a Pro or Ultra subscription. Get started at api.mathematicalcompany.com
A complete alpha research workflow using AFML (Advances in Financial Machine Learning) techniques: triple-barrier labeling, feature importance with purged cross-validation, alpha decay measurement, and PnL attribution.

Full Code

"""Alpha research pipeline: labels, importance, decay, attribution."""

import horizon as hz

# ── Step 1: Triple-barrier labeling ──
# Generate meta-labels for a set of events
prices = [0.50, 0.51, 0.49, 0.52, 0.54, 0.53, 0.55, 0.54, 0.56, 0.58,
          0.57, 0.55, 0.53, 0.52, 0.54, 0.56, 0.58, 0.60, 0.59, 0.57]

# Compute daily volatility for barrier sizing
vol = hz.get_daily_vol(prices, lookback=10)
print(f"Daily vol: {vol:.4f}")

# CUSUM filter for event detection
events = hz.cusum_filter(prices, threshold=vol)
print(f"Events detected: {len(events)}")

# Apply triple barriers
labels = hz.triple_barrier_labels(
    prices=prices,
    events=events,
    upper_barrier=2.0 * vol,   # take profit at 2x vol
    lower_barrier=1.0 * vol,   # stop loss at 1x vol
    max_holding=10,             # max 10 periods
)

for label in labels[:5]:
    print(f"  Event t={label.event_idx}: label={label.label} ret={label.return_val:.4f} duration={label.duration}")

# ── Step 2: Meta-labeling ──
meta = hz.compute_meta_labels(
    prices=prices,
    primary_model_predictions=[1, 1, -1, 1, 1, -1, 1, -1, 1, 1,
                                1, -1, -1, -1, 1, 1, 1, 1, -1, -1],
    upper_barrier=2.0 * vol,
    lower_barrier=1.0 * vol,
    max_holding=10,
)

print("\nMeta Labels:")
print(f"  Total: {len(meta)}")
positive = sum(1 for m in meta if m.label == 1)
print(f"  Positive (primary was right): {positive}")
print(f"  Negative (primary was wrong): {len(meta) - positive}")

# ── Step 3: Feature importance ──
# MDA: Mean Decrease Accuracy (drop-one feature importance)
features = [
    [0.5, 0.3, 0.8],
    [0.6, 0.2, 0.7],
    [0.4, 0.4, 0.9],
    [0.7, 0.1, 0.6],
    [0.3, 0.5, 0.8],
    [0.8, 0.2, 0.5],
    [0.5, 0.3, 0.7],
    [0.6, 0.4, 0.6],
]
target = [1, 1, 0, 1, 0, 1, 0, 1]

mda = hz.mda_importance(
    features=features,
    labels=target,
    feature_names=["momentum", "flow", "vol"],
    n_splits=3,
)

print("\nMDA Feature Importance:")
for fi in mda:
    print(f"  {fi.name:15s} importance={fi.importance:.4f} std={fi.std:.4f}")

# SFI: Single Feature Importance
sfi = hz.sfi_importance(
    features=features,
    labels=target,
    feature_names=["momentum", "flow", "vol"],
    n_splits=3,
)

print("\nSFI Feature Importance:")
for fi in sfi:
    print(f"  {fi.name:15s} importance={fi.importance:.4f} std={fi.std:.4f}")

# ── Step 4: Alpha decay tracking ──
ic_series = [0.15, 0.14, 0.12, 0.11, 0.09, 0.08, 0.06, 0.05, 0.04, 0.03]

report = hz.alpha_decay_pipeline(
    ic_values=ic_series,
    timestamps=[float(i) for i in range(len(ic_series))],
)

print("\nAlpha Decay:")
print(f"  Initial IC:     {report.initial_ic:.4f}")
print(f"  Current IC:     {report.current_ic:.4f}")
print(f"  Half-life:      {report.half_life:.1f} periods")
print(f"  Decay rate:     {report.decay_rate:.4f}")
print(f"  Is decaying:    {report.is_decaying}")

# ── Step 5: PnL attribution ──
attribution = hz.attribute_pnl(
    market_ids=["election-winner", "btc-100k", "gop-senate"],
    pnls=[120.50, -45.30, 67.20],
    sizes=[100, 80, 60],
)

print("\nPnL Attribution:")
print(f"  Total PnL:      ${attribution.total_pnl:,.2f}")
for bd in attribution.breakdowns:
    print(f"  {bd.market_id:20s} pnl=${bd.pnl:>8.2f} contribution={bd.contribution:.1%}")
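
For intuition, the triple-barrier rule from Step 1 can be sketched in plain Python without the SDK. This is a simplified stand-in, not `hz.triple_barrier_labels` itself; the simple-return convention and barrier units are assumptions:

```python
# Triple-barrier sketch: walk forward from an event until the take-profit
# barrier, the stop-loss barrier, or the max holding period is hit first.
def triple_barrier(prices, event_idx, upper, lower, max_holding):
    entry = prices[event_idx]
    end = min(event_idx + max_holding, len(prices) - 1)
    for t in range(event_idx + 1, end + 1):
        ret = (prices[t] - entry) / entry
        if ret >= upper:
            return 1, ret, t - event_idx    # take-profit touched first
        if ret <= -lower:
            return -1, ret, t - event_idx   # stop-loss touched first
    ret = (prices[end] - entry) / entry
    return 0, ret, end - event_idx          # timed out (vertical barrier)

prices = [0.50, 0.51, 0.49, 0.52, 0.54, 0.53, 0.55]
label, ret, duration = triple_barrier(prices, 0, upper=0.05, lower=0.03, max_holding=5)
print(label, round(ret, 4), duration)  # → 1 0.08 4 (take-profit hit at t=4)
```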

How It Works

  1. Triple-barrier labeling classifies each event as win/loss/timeout depending on which barrier is touched first: take-profit, stop-loss, or the maximum holding period
  2. Meta-labeling evaluates whether a primary model’s signals are correct, giving a secondary model you can use as a position-sizing layer
  3. MDA importance measures each feature’s contribution by shuffling it and observing the resulting drop in out-of-sample accuracy
  4. SFI importance measures each feature’s standalone predictive power by evaluating a model trained on that feature alone
  5. Alpha decay tracks how quickly a signal’s information coefficient (IC) degrades over time
  6. PnL attribution decomposes returns by market, time period, and risk factor
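
The shuffle-and-score idea behind MDA can be sketched generically as permutation importance. This is not the SDK's purged cross-validated implementation; the toy threshold classifier and seed are assumptions for illustration:

```python
import random

# Permutation-importance sketch (the core idea of MDA): score a fitted
# model, shuffle one feature column at a time, and record the accuracy drop.
def permutation_importance(predict, X, y, n_features, seed=42):
    rng = random.Random(seed)
    base = sum(predict(row) == label for row, label in zip(X, y)) / len(y)
    drops = []
    for j in range(n_features):
        col = [row[j] for row in X]
        rng.shuffle(col)  # destroy the information in feature j
        Xp = [row[:j] + [col[i]] + row[j + 1:] for i, row in enumerate(X)]
        acc = sum(predict(row) == label for row, label in zip(Xp, y)) / len(y)
        drops.append(base - acc)
    return drops

# Toy "model": momentum above 0.45 predicts class 1 (only feature 0 matters),
# so shuffling feature 1 should produce zero accuracy drop.
predict = lambda row: 1 if row[0] > 0.45 else 0
X = [[0.5, 0.3], [0.6, 0.2], [0.4, 0.4], [0.7, 0.1], [0.3, 0.5], [0.8, 0.2]]
y = [1, 1, 0, 1, 0, 1]
drops = permutation_importance(predict, X, y, n_features=2)
print(drops)
```

In a real pipeline the accuracy would be measured on purged out-of-sample folds rather than in-sample, which is what distinguishes MDA from naive permutation importance.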

Time-Based Attribution

Break down PnL by hour, day, or custom periods:
time_attr = hz.attribute_by_time(
    timestamps=[0.0, 3600.0, 7200.0, 10800.0, 14400.0],
    pnls=[10.0, -5.0, 15.0, -3.0, 8.0],
    period="hourly",
)

print("Hourly PnL:")
for tb in time_attr.breakdowns:
    print(f"  {tb.period}: ${tb.pnl:>8.2f}")
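
Under the hood, time attribution is bucketing: floor each timestamp to a period boundary and sum the PnL that lands in each bucket. A minimal stdlib sketch (not necessarily how `attribute_by_time` is implemented; the 3600-second bucket mirrors the hourly example above):

```python
from collections import defaultdict

# Bucket PnL by hour: floor each timestamp (in seconds) to a 3600 s
# bucket index and accumulate the PnL that falls in it.
def pnl_by_hour(timestamps, pnls):
    buckets = defaultdict(float)
    for ts, pnl in zip(timestamps, pnls):
        buckets[int(ts // 3600)] += pnl
    return dict(sorted(buckets.items()))

hourly = pnl_by_hour([0.0, 1800.0, 3600.0, 7200.0], [10.0, -5.0, 15.0, -3.0])
print(hourly)  # → {0: 5.0, 1: 15.0, 2: -3.0} (the two hour-0 fills merge)
```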

Factor Attribution

Decompose PnL by risk factors:
factor_attr = hz.attribute_by_factor(
    pnls=[120.50, -45.30, 67.20],
    factor_exposures={
        "momentum": [0.8, -0.3, 0.5],
        "value":    [0.2, 0.7, -0.1],
        "vol":      [-0.1, 0.4, 0.3],
    },
)

print("Factor Attribution:")
for fb in factor_attr.breakdowns:
    print(f"  {fb.factor:15s} contribution=${fb.contribution:>8.2f}")
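
One simple way to reason about factor attribution: apportion each position's PnL across factors in proportion to the position's absolute exposures, so the factor contributions sum back to total PnL. This proportional-split scheme is an illustrative assumption, not necessarily what `attribute_by_factor` computes:

```python
# Split each position's PnL across factors by absolute exposure weight.
# Contributions sum to total PnL by construction (illustrative scheme only).
def attribute_pnl_by_factor(pnls, exposures):
    factors = list(exposures)
    out = {f: 0.0 for f in factors}
    for i, pnl in enumerate(pnls):
        total_exp = sum(abs(exposures[f][i]) for f in factors)
        if total_exp == 0:
            continue  # unexplained PnL; a real system would track a residual
        for f in factors:
            out[f] += pnl * abs(exposures[f][i]) / total_exp
    return out

contrib = attribute_pnl_by_factor(
    [120.50, -45.30, 67.20],
    {"momentum": [0.8, -0.3, 0.5],
     "value":    [0.2, 0.7, -0.1],
     "vol":      [-0.1, 0.4, 0.3]},
)
print({f: round(c, 2) for f, c in contrib.items()})
```

Production systems more often estimate per-factor returns (e.g. by regressing position PnL on exposures) and attribute exposure × factor return, keeping any unexplained remainder as a residual.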

Run It

python examples/alpha_research_pipeline.py
See Alpha Research and Bars & Labeling for the full AFML reference.