Pro Feature. Requires a Pro or Ultra subscription. Get started at api.mathematicalcompany.com

Alpha Research

Horizon ships four pure-Python research modules inspired by López de Prado's *Advances in Financial Machine Learning*. Use them standalone for offline analysis, or drop their pipeline functions into hz.run() for live monitoring.

Meta-Labeling

Triple-barrier labeling: primary model gives direction, meta-label model decides sizing.

Feature Importance

MDA, SFI, and clustered MDA with purged cross-validation to prevent leakage.

Alpha Decay

Track information coefficient over time, estimate half-life, detect dying edges.

PnL Attribution

Break down returns by market, time period, and factor exposure.

Meta-Labeling (AFML Ch. 3)

A two-model framework. The primary model predicts direction (+1 long, -1 short). The meta-label model then decides whether to act on that signal (1) or abstain (0), using a triple-barrier method: profit-taking, stop-loss, and a vertical (time) barrier. This separation lets you use a high-recall primary model (catches most opportunities) and a high-precision meta-label model (filters out bad trades), which is far more effective than trying to build a single model that does both.

```python
from horizon.meta_label import compute_meta_labels, meta_label_pipeline
```

compute_meta_labels

Compute meta-labels from primary model signals using triple barriers. For each primary signal, the function scans forward from the signal index and applies three barriers:
  • Profit-taking (PT): Return exceeds vol * pt_sl[0] in the direction of the primary signal. Meta-label = 1 (act).
  • Stop-loss (SL): Return exceeds vol * pt_sl[1] against the primary signal. Meta-label = 0 (abstain).
  • Vertical barrier: max_holding bars elapse with no barrier hit. Meta-label = 1 if cumulative return > 0, else 0.

```python
labels = compute_meta_labels(
    prices=[100, 101, 102, 99, 98, 103, 105],
    timestamps=[0, 1, 2, 3, 4, 5, 6],
    primary_signals=[(0, 1), (3, -1)],  # (event_index, side)
    pt_sl=(1.0, 1.0),
    max_holding=5,
    vol_span=20,
)

for label in labels:
    print(f"idx={label.event_idx} side={label.primary_side} "
          f"label={label.meta_label} ret={label.ret:.4f} "
          f"conf={label.confidence:.2f}")
```

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| prices | list[float] | required | Price series (length T). |
| timestamps | list[float] | required | Timestamp series (length T, monotonically increasing). |
| primary_signals | list[tuple[int, int]] | required | List of (event_index, side) where side is +1 (long) or -1 (short). |
| pt_sl | tuple[float, float] | (1.0, 1.0) | Multipliers for profit-taking and stop-loss barriers, applied to local volatility. (1.0, 1.0) = symmetric barriers at 1x vol. |
| max_holding | int | 100 | Maximum bars before the vertical barrier fires. |
| vol_span | int | 20 | Span for the EWM standard deviation of log returns used to set barrier widths. |

Returns

List of MetaLabel objects.

MetaLabel

| Field | Type | Description |
|---|---|---|
| event_idx | int | Index into the price series where the primary signal fired. |
| primary_side | int | Direction of the primary signal (+1 long, -1 short). |
| meta_label | int | 1 if the signal was profitable (act), 0 if not (abstain). |
| ret | float | Realized return from the primary signal's perspective. |
| confidence | float | Confidence score in [0, 1] based on return magnitude relative to volatility. |

meta_label_pipeline

Pipeline function for hz.run(). Reads the primary model’s signal from ctx.params, maintains a rolling buffer of price observations, and injects meta-label decisions.

```python
hz.run(
    pipeline=[
        my_primary_model,         # sets ctx.params["primary_signal"] = +1/-1/0
        meta_label_pipeline(),    # injects meta_label and meta_confidence
        my_quoter,                # uses ctx.params["meta_label"] to gate trades
    ],
    ...
)
```

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| primary_signal_key | str | "primary_signal" | Key in ctx.params containing the primary model's directional signal (+1/-1/0). |
| window | int | 50 | Number of recent observations to keep for meta-label computation. |

Injected into ctx.params

| Key | Type | Description |
|---|---|---|
| "meta_label" | int | 0 (abstain) or 1 (act). |
| "meta_confidence" | float | Confidence in [0, 1]. Blends latest barrier confidence with rolling hit rate once 5+ labels accumulate. |

If the primary signal is 0 or missing, the pipeline passes through with meta_label=0 and meta_confidence=0.0.
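A downstream stage can gate order flow on these keys. A minimal sketch, assuming a dict-like ctx.params; the gate_size helper is hypothetical and not part of Horizon's API:

```python
# Hypothetical helper: turn the injected meta-label keys into a signed order
# size. Illustrative only, not part of Horizon.
def gate_size(params: dict, base_size: float = 1.0) -> float:
    side = params.get("primary_signal", 0)
    # Abstain when there is no direction or the meta-label says "don't act".
    if side == 0 or params.get("meta_label", 0) == 0:
        return 0.0
    # Scale size by meta-label confidence: low-confidence signals trade small.
    return side * base_size * params.get("meta_confidence", 0.0)
```

A quoter stage would then place nothing when gate_size returns 0.0 and otherwise quote in the signed direction.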

Feature Importance (AFML Ch. 8)

Model-agnostic feature importance methods with purged cross-validation. Standard k-fold CV leaks information in time-series data because adjacent samples are correlated. Purged CV removes training samples within a configurable gap of each test fold, preventing look-ahead bias. All methods accept a generic score_fn(X_train, y_train, X_test, y_test) -> float so they work with any model (sklearn, xgboost, a simple function, etc.).
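The purging step can be sketched in a few lines. This illustrates the idea behind the n_splits and purge_gap parameters on time-ordered samples; it is not Horizon's actual splitter:

```python
# Sketch: purged k-fold for time-ordered samples (assumed behavior).
def purged_kfold_indices(n: int, n_splits: int = 5, purge_gap: int = 0):
    """Yield (train_idx, test_idx) pairs, dropping training samples
    within purge_gap of the test fold's edges to prevent leakage."""
    fold = n // n_splits
    for k in range(n_splits):
        start = k * fold
        stop = n if k == n_splits - 1 else (k + 1) * fold
        test = list(range(start, stop))
        lo, hi = start - purge_gap, stop - 1 + purge_gap
        train = [i for i in range(n) if i < lo or i > hi]
        yield train, test
```

With purge_gap=0 this degenerates to ordinary contiguous k-fold; a positive gap carves a buffer of excluded samples on each side of the test fold.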

```python
from horizon.feature_importance import (
    mda_importance,
    sfi_importance,
    clustered_mda,
    FeatureImportance,
)
```

mda_importance

Mean Decrease Accuracy (permutation importance). For each CV fold, computes a baseline test score, then shuffles each feature column individually and re-scores. Importance = mean decrease in score caused by shuffling.

```python
def my_scorer(X_train, y_train, X_test, y_test):
    from collections import Counter
    majority = Counter(y_train).most_common(1)[0][0]
    return sum(1 for y in y_test if y == majority) / len(y_test)

results = mda_importance(
    score_fn=my_scorer,
    X=feature_matrix,
    y=labels,
    feature_names=["momentum", "vol", "spread", "imbalance"],
    n_splits=5,
    purge_gap=10,
    seed=42,
)

for fi in results:
    print(f"{fi.feature}: {fi.importance:.4f} +/- {fi.std:.4f}")
```

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| score_fn | callable | required | (X_train, y_train, X_test, y_test) -> float. Higher = better. |
| X | list[list[float]] | required | Feature matrix (N x D). Each inner list is one sample. |
| y | list[float] | required | Label vector (length N). |
| feature_names | list[str] | required | Names for each feature (length D). |
| n_splits | int | 5 | Number of cross-validation folds. |
| purge_gap | int | 0 | Number of samples to purge around each test fold. |
| seed | int or None | None | Random seed for reproducibility. |

Returns

List of FeatureImportance sorted by importance descending.

sfi_importance

Single Feature Importance (AFML Ch. 8.6). Trains the model on each feature individually and evaluates via cross-validation. The importance of a feature is its cross-validated score when used as the sole predictor.

```python
results = sfi_importance(
    score_fn=my_scorer,
    X=feature_matrix,
    y=labels,
    feature_names=["momentum", "vol", "spread", "imbalance"],
    n_splits=5,
)
```

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| score_fn | callable | required | (X_train, y_train, X_test, y_test) -> float. |
| X | list[list[float]] | required | Feature matrix (N x D). |
| y | list[float] | required | Label vector (length N). |
| feature_names | list[str] | required | Names for each feature (length D). |
| n_splits | int | 5 | Number of cross-validation folds. |

Returns

List of FeatureImportance sorted by importance descending.

clustered_mda

Clustered Feature Importance (AFML Ch. 8.7). Groups features by correlation using agglomerative clustering (distance = 1 - |correlation|), then permutes entire clusters at once. When one feature in a correlated group is shuffled, the model can compensate by using the remaining correlated features. Shuffling the entire cluster eliminates this substitution effect, giving a more accurate picture of the group’s true importance.

```python
results = clustered_mda(
    score_fn=my_scorer,
    X=feature_matrix,
    y=labels,
    feature_names=["momentum", "vol", "spread", "imbalance"],
    n_clusters=2,
    n_splits=5,
    seed=42,
)

# Each result.feature contains comma-separated names of features in the cluster
for fi in results:
    print(f"Cluster [{fi.feature}]: {fi.importance:.4f} +/- {fi.std:.4f}")
```

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| score_fn | callable | required | (X_train, y_train, X_test, y_test) -> float. |
| X | list[list[float]] | required | Feature matrix (N x D). |
| y | list[float] | required | Label vector (length N). |
| feature_names | list[str] | required | Names for each feature (length D). |
| n_clusters | int | 3 | Number of feature clusters. |
| n_splits | int | 5 | Number of cross-validation folds. |
| seed | int or None | None | Random seed for reproducibility. |

Returns

List of FeatureImportance, one per cluster. The feature field contains comma-separated names of features in that cluster. Sorted by importance descending.

FeatureImportance

| Field | Type | Description |
|---|---|---|
| feature | str | Feature name (or comma-separated cluster names for clustered_mda). |
| importance | float | Mean importance score (higher = more important). |
| std | float | Standard deviation of importance across folds. |

MDA can understate the importance of correlated features. If your feature set has groups of highly correlated predictors (e.g., multiple momentum lookbacks), use clustered_mda instead.

Alpha Decay Tracking

Monitor whether your trading edge is dying. The AlphaDecayTracker computes rolling IC (Spearman rank correlation between predictions and outcomes), estimates half-life via AR(1) fit, and detects negative trend via linear regression on the IC series.
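The IC itself is just a Spearman rank correlation between predictions and outcomes. A minimal reference implementation of that statistic (no tie handling, and not necessarily identical to Horizon's internals):

```python
# Sketch: Spearman rank IC -- Pearson correlation of the two rank vectors.
# Ties are broken by position (a simplification over proper tie averaging).
def spearman_ic(preds, outcomes):
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0.0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rp, ro = ranks(preds), ranks(outcomes)
    n = len(preds)
    mp, mo = sum(rp) / n, sum(ro) / n
    cov = sum((a - mp) * (b - mo) for a, b in zip(rp, ro))
    vp = sum((a - mp) ** 2 for a in rp) ** 0.5
    vo = sum((b - mo) ** 2 for b in ro) ** 0.5
    return cov / (vp * vo) if vp and vo else 0.0
```

An IC near +1 means predictions rank outcomes almost perfectly; values hovering near 0 mean the signal carries no ordering information.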

```python
from horizon.alpha_decay import AlphaDecayTracker, alpha_decay_pipeline, AlphaDecayReport
```

AlphaDecayTracker

Stateful tracker that accumulates predictions and outcomes over time.

```python
tracker = AlphaDecayTracker(window=100, alert_threshold=-0.05)

# Feed new observations
report = tracker.update(
    predictions=[0.6, 0.7, 0.55],
    outcomes=[1.0, 0.0, 1.0],
)

if report and report.is_decaying:
    print(f"Alpha decaying! IC={report.current_ic:.3f}, "
          f"half_life={report.half_life:.1f}")
```

Constructor Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| window | int | 100 | Rolling window size for IC calculation. |
| alert_threshold | float | -0.05 | IC trend slope below which alpha is considered decaying. |

update()

Add a new batch of predictions/outcomes and compute the report.
| Parameter | Type | Default | Description |
|---|---|---|---|
| predictions | list[float] | required | Model prediction values (e.g., predicted probabilities). |
| outcomes | list[float] | required | Realized outcome values (e.g., 1.0 for win, 0.0 for loss). |
| timestamp | float or None | None | Observation timestamp. Defaults to current time. |

Returns AlphaDecayReport if enough data has accumulated (at least window // 2 observations), None otherwise.

report()

Force compute the current alpha decay state. Returns AlphaDecayReport regardless of data size.

AlphaDecayReport

| Field | Type | Description |
|---|---|---|
| current_ic | float | Current Spearman rank correlation between predictions and outcomes. |
| rolling_ic | list[tuple[float, float]] | History of (timestamp, IC) observations. |
| half_life | float | Estimated half-life in observations (AR(1) fit). inf if not mean-reverting. |
| ic_trend | float | Linear regression slope of the IC series. Negative = decaying. |
| is_decaying | bool | True if ic_trend < alert_threshold. |
| rolling_sharpe | list[tuple[float, float]] | History of (timestamp, Sharpe) observations. |
| sharpe_trend | float | Linear regression slope of the rolling Sharpe series. |
| time_to_zero | float or None | Estimated observations until IC reaches 0 (linear extrapolation). None if IC is not trending down. |
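The half_life field comes from an AR(1) fit, IC_t ≈ a + φ·IC_{t-1}, with half-life -ln 2 / ln φ when 0 < φ < 1. A sketch of that estimate (assumed behavior, not the tracker's exact code):

```python
import math

# Sketch: half-life of mean reversion from an AR(1) fit on a series.
# phi is the OLS slope of series[1:] regressed on series[:-1].
def ar1_half_life(series):
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    phi = num / den if den else 0.0
    if not (0.0 < phi < 1.0):
        return float("inf")  # not mean-reverting
    return -math.log(2) / math.log(phi)
```

A geometrically decaying series with ratio 0.5 has φ = 0.5 and thus a half-life of exactly one observation; a trending series yields φ ≥ 1 and an infinite half-life.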

alpha_decay_pipeline

Pipeline function for hz.run(). Uses predictions from ctx.params["predictions"] and outcomes derived from fills to track alpha decay in real time.

```python
hz.run(
    pipeline=[
        alpha_decay_pipeline(window=100, alert_threshold=-0.05),
        my_model,    # sets ctx.params["predictions"]
        my_quoter,
    ],
    ...
)
```

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| window | int | 100 | Rolling window for IC calculation. |
| alert_threshold | float | -0.05 | IC trend slope below which alpha is considered decaying. |

Injected into ctx.params

| Key | Type | Description |
|---|---|---|
| "alpha_ic" | float | Current information coefficient. |
| "alpha_half_life" | float | Estimated half-life of IC. |
| "alpha_decaying" | bool | Whether alpha is currently decaying. |

The pipeline logs a warning when is_decaying transitions from False to True, including the current IC, trend slope, and half-life.

PnL Attribution

Break down portfolio PnL by market, time period, and factor exposure to understand where returns come from.

```python
from horizon.pnl_attribution import (
    attribute_pnl,
    attribute_by_time,
    attribute_by_factor,
    pnl_attribution_pipeline,
    AttributionReport,
    PnLBreakdown,
    TimeBreakdown,
    FactorBreakdown,
)
```

attribute_pnl

Extract positions from an engine and compute per-market PnL breakdown. Results are sorted by absolute PnL descending.

```python
report = attribute_pnl(engine)

print(f"Total PnL: {report.total_pnl:+.2f} across {report.n_positions} positions")
for mkt in report.by_market:
    print(f"  {mkt.market_id}: {mkt.pnl:+.4f} ({mkt.contribution:.1%})")
```

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| engine | Engine | required | Horizon Engine instance. |

Returns

AttributionReport with by_market populated.

attribute_by_time

Group fills by time period and compute PnL per period.

```python
fills = engine.fills()
daily = attribute_by_time(fills, period="daily")

for day in daily:
    print(f"{day.period}: PnL={day.pnl:+.4f}, trades={day.n_trades}, "
          f"win_rate={day.win_rate:.1%}")
```

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| fills | list[Fill] | required | List of Fill objects (must have timestamp, price, size attributes). |
| period | str | "daily" | Aggregation period: "hourly", "daily", or "weekly". |

Returns

List of TimeBreakdown sorted chronologically.

attribute_by_factor

Factor-based PnL attribution. Maps positions to factors via exposure weights and computes each factor’s PnL contribution and R-squared.

```python
positions = engine.positions()
factor_exposures = {
    "crypto": {"btc-100k": 0.8, "eth-5k": 0.9},
    "politics": {"trump-win": 1.0, "senate-flip": 0.7},
}

factors = attribute_by_factor(positions, factor_exposures)
for f in factors:
    print(f"{f.factor}: exposure={f.exposure:.2f}, "
          f"pnl={f.pnl_contribution:+.4f}, R2={f.r_squared:.3f}")
```

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| positions | list[Position] | required | List of Position objects. |
| factor_exposures | dict[str, dict[str, float]] | required | Mapping of factor_name -> {market_id: exposure_weight}. |

Returns

List of FactorBreakdown, one per factor.

pnl_attribution_pipeline

Pipeline function for hz.run() that adds attribution data each cycle.

```python
hz.run(
    pipeline=[
        pnl_attribution_pipeline(engine),
        my_model,
        my_quoter,
    ],
    ...
)
```

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| engine | Engine or None | None | Optional engine override. If None, uses ctx.params["engine"]. |

Injected into ctx.params

| Key | Type | Description |
|---|---|---|
| "pnl_by_market" | list[PnLBreakdown] | Per-market PnL breakdowns. |
| "pnl_top_winner" | PnLBreakdown or None | Market with highest positive PnL. |
| "pnl_top_loser" | PnLBreakdown or None | Market with most negative PnL. |

Type Reference

AttributionReport

| Field | Type | Description |
|---|---|---|
| by_market | list[PnLBreakdown] | Per-market breakdowns, sorted by absolute PnL descending. |
| by_time | list[TimeBreakdown] | Per-period breakdowns (populated by attribute_by_time). |
| by_factor | list[FactorBreakdown] | Per-factor breakdowns (populated by attribute_by_factor). |
| total_pnl | float | Sum of all position PnLs. |
| n_positions | int | Number of positions in the engine. |

PnLBreakdown

| Field | Type | Description |
|---|---|---|
| market_id | str | Market identifier. |
| pnl | float | Realized + unrealized PnL for this market. |
| pnl_pct | float | PnL as percentage of cost basis. |
| contribution | float | Fraction of total PnL attributable to this market. |

TimeBreakdown

| Field | Type | Description |
|---|---|---|
| period | str | Time bucket label (e.g., "2025-01-15" or "2025-01-15 14:00"). |
| pnl | float | Net PnL for this period. |
| n_trades | int | Number of fills in this period. |
| win_rate | float | Fraction of profitable fills in this period. |

FactorBreakdown

| Field | Type | Description |
|---|---|---|
| factor | str | Factor name. |
| exposure | float | Average absolute exposure weight across markets in this factor. |
| pnl_contribution | float | Weighted PnL contribution from this factor. |
| r_squared | float | Fraction of total PnL variance explained by this factor (capped at 1.0). |

Full Research Workflow

Combine all four modules for a complete alpha research loop:

```python
import horizon as hz
from horizon.meta_label import compute_meta_labels
from horizon.feature_importance import mda_importance, clustered_mda
from horizon.alpha_decay import AlphaDecayTracker
from horizon.pnl_attribution import attribute_pnl, attribute_by_time

# 1. Meta-label your primary signals
labels = compute_meta_labels(
    prices=price_series,
    timestamps=ts_series,
    primary_signals=signals,
    pt_sl=(1.5, 1.0),      # Asymmetric: wider profit target
    max_holding=50,
)
hit_rate = sum(1 for lb in labels if lb.meta_label == 1) / len(labels)
print(f"Meta-label hit rate: {hit_rate:.1%}")

# 2. Rank your features
importances = mda_importance(
    score_fn=my_scorer,
    X=features,
    y=[lb.meta_label for lb in labels],
    feature_names=feature_names,
    purge_gap=5,
)
print("Top features:", [fi.feature for fi in importances[:3]])

# 3. Track alpha decay
tracker = AlphaDecayTracker(window=200)
report = tracker.update(predictions, outcomes)
if report and report.is_decaying:
    print(f"Edge decaying: half_life={report.half_life:.0f} obs")

# 4. Attribute PnL
attr = attribute_pnl(engine)
print(f"Top contributor: {attr.by_market[0].market_id} "
      f"({attr.by_market[0].contribution:.1%} of total)")
```