# Alpha Research

> Meta-labeling, feature importance, alpha decay tracking, and PnL attribution for systematic strategy research.

<Note>
  **Pro Feature.** Requires a Pro or Ultra subscription. [Get started at api.mathematicalcompany.com](https://api.mathematicalcompany.com)
</Note>

Horizon ships four pure-Python research modules inspired by Lopez de Prado's *Advances in Financial Machine Learning*. Use them standalone for offline analysis or drop their pipeline functions into `hz.run()` for live monitoring.

<CardGroup cols={2}>
  <Card title="Meta-Labeling" icon="tags">
    Triple-barrier labeling: the primary model gives direction, and the meta-label model decides whether to act on it.
  </Card>

  <Card title="Feature Importance" icon="ranking-star">
    MDA, SFI, and clustered MDA with purged cross-validation to prevent leakage.
  </Card>

  <Card title="Alpha Decay" icon="hourglass-half">
    Track information coefficient over time, estimate half-life, detect dying edges.
  </Card>

  <Card title="PnL Attribution" icon="chart-pie">
    Break down returns by market, time period, and factor exposure.
  </Card>
</CardGroup>

***

## Meta-Labeling (AFML Ch. 3)

A two-model framework. The **primary model** predicts direction (+1 long, -1 short). The **meta-label model** then decides whether to *act* on that signal (1) or *abstain* (0), using a triple-barrier method: profit-taking, stop-loss, and a vertical (time) barrier.

This separation lets you pair a high-recall primary model (catches most opportunities) with a high-precision meta-label model (filters out bad trades). Tuning each model for one job is usually easier than building a single model that gets both direction and trade selection right.

```python
from horizon.meta_label import compute_meta_labels, meta_label_pipeline
```

### compute\_meta\_labels

Compute meta-labels from primary model signals using triple barriers.

For each primary signal, scans forward from the signal index and applies three barriers:

* **Profit-taking (PT)**: Return exceeds `vol * pt_sl[0]` in the direction of the primary signal. Meta-label = 1 (act).
* **Stop-loss (SL)**: Return exceeds `vol * pt_sl[1]` against the primary signal. Meta-label = 0 (abstain).
* **Vertical barrier**: `max_holding` bars elapse with no barrier hit. Meta-label = 1 if cumulative return > 0, else 0.
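The three barriers above can be sketched in plain Python for a single signal. This is a minimal illustration, not Horizon's implementation: the helper name `triple_barrier_label` is hypothetical, it assumes log returns, and it takes the volatility as a caller-supplied number instead of estimating it.

```python
import math

def triple_barrier_label(prices, start, side, vol, pt_sl=(1.0, 1.0), max_holding=100):
    """Label one primary signal; returns (meta_label, signed_return).

    Hypothetical helper: assumes log returns and a fixed volatility estimate.
    """
    upper = vol * pt_sl[0]   # profit-taking threshold
    lower = vol * pt_sl[1]   # stop-loss threshold
    end = min(start + max_holding, len(prices) - 1)
    ret = 0.0
    for i in range(start + 1, end + 1):
        # Return measured from the primary signal's perspective.
        ret = side * math.log(prices[i] / prices[start])
        if ret > upper:
            return 1, ret    # profit-taking barrier hit first
        if ret < -lower:
            return 0, ret    # stop-loss barrier hit first
    # Vertical barrier: label by the sign of the cumulative return.
    return (1 if ret > 0 else 0), ret

prices = [100, 101, 102, 99, 98, 103, 105]
long_label, long_ret = triple_barrier_label(prices, start=0, side=1, vol=0.015, max_holding=5)
short_label, short_ret = triple_barrier_label(prices, start=3, side=-1, vol=0.015, max_holding=5)
```

With the assumed `vol=0.015`, the long signal at index 0 reaches its profit target two bars later, while the short signal at index 3 is stopped out when the price rallies to 103.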

```python
labels = compute_meta_labels(
    prices=[100, 101, 102, 99, 98, 103, 105],
    timestamps=[0, 1, 2, 3, 4, 5, 6],
    primary_signals=[(0, 1), (3, -1)],  # (event_index, side)
    pt_sl=(1.0, 1.0),
    max_holding=5,
    vol_span=20,
)

for label in labels:
    print(f"idx={label.event_idx} side={label.primary_side} "
          f"label={label.meta_label} ret={label.ret:.4f} "
          f"conf={label.confidence:.2f}")
```

#### Parameters

| Parameter         | Type                    | Default      | Description                                                                                                                     |
| ----------------- | ----------------------- | ------------ | ------------------------------------------------------------------------------------------------------------------------------- |
| `prices`          | `list[float]`           | *required*   | Price series (length T).                                                                                                        |
| `timestamps`      | `list[float]`           | *required*   | Timestamp series (length T, monotonically increasing).                                                                          |
| `primary_signals` | `list[tuple[int, int]]` | *required*   | List of `(event_index, side)` where side is +1 (long) or -1 (short).                                                            |
| `pt_sl`           | `tuple[float, float]`   | `(1.0, 1.0)` | Multipliers for profit-taking and stop-loss barriers, applied to local volatility. `(1.0, 1.0)` = symmetric barriers at 1x vol. |
| `max_holding`     | `int`                   | `100`        | Maximum bars before the vertical barrier fires.                                                                                 |
| `vol_span`        | `int`                   | `20`         | Span for the EWM standard deviation of log returns used to set barrier widths.                                                  |
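To make the `vol_span` description concrete, here is one way an exponentially weighted standard deviation of log returns could be computed. It is a sketch under stated assumptions: the helper `ewm_vol` is hypothetical, it uses the common smoothing factor `alpha = 2 / (span + 1)` with a simple recursive update, and Horizon's exact weighting or bias correction may differ.

```python
import math

def ewm_vol(prices, span=20):
    """Exponentially weighted std of log returns, one value per return.

    Hypothetical helper; uses alpha = 2 / (span + 1) without bias correction.
    """
    alpha = 2.0 / (span + 1)
    returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    if not returns:
        return []
    mean, var, out = returns[0], 0.0, []
    for r in returns:
        delta = r - mean
        mean += alpha * delta                              # update the running mean
        var = (1 - alpha) * (var + alpha * delta * delta)  # update the running variance
        out.append(math.sqrt(var))
    return out

vols = ewm_vol([100, 101, 102, 99, 98, 103, 105], span=20)
```

A smaller span reacts faster to recent moves, which widens or narrows the barriers more quickly.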

#### Returns

List of `MetaLabel` objects.

### MetaLabel

| Field          | Type    | Description                                                                   |
| -------------- | ------- | ----------------------------------------------------------------------------- |
| `event_idx`    | `int`   | Index into the price series where the primary signal fired.                   |
| `primary_side` | `int`   | Direction of the primary signal (+1 long, -1 short).                          |
| `meta_label`   | `int`   | 1 if the signal was profitable (act), 0 if not (abstain).                     |
| `ret`          | `float` | Realized return from the primary signal's perspective.                        |
| `confidence`   | `float` | Confidence score in \[0, 1] based on return magnitude relative to volatility. |

### meta\_label\_pipeline

Pipeline function for `hz.run()`. Reads the primary model's signal from `ctx.params`, maintains a rolling buffer of price observations, and injects meta-label decisions.

```python
hz.run(
    pipeline=[
        my_primary_model,         # sets ctx.params["primary_signal"] = +1/-1/0
        meta_label_pipeline(),    # injects meta_label and meta_confidence
        my_quoter,                # uses ctx.params["meta_label"] to gate trades
    ],
    ...
)
```

#### Parameters

| Parameter            | Type  | Default            | Description                                                                      |
| -------------------- | ----- | ------------------ | -------------------------------------------------------------------------------- |
| `primary_signal_key` | `str` | `"primary_signal"` | Key in `ctx.params` containing the primary model's directional signal (+1/-1/0). |
| `window`             | `int` | `50`               | Number of recent observations to keep for meta-label computation.                |

#### Injected into ctx.params

| Key                 | Type    | Description                                                                                              |
| ------------------- | ------- | -------------------------------------------------------------------------------------------------------- |
| `"meta_label"`      | `int`   | 0 (abstain) or 1 (act).                                                                                  |
| `"meta_confidence"` | `float` | Confidence in \[0, 1]. Blends latest barrier confidence with rolling hit rate once 5+ labels accumulate. |

<Note>
  If the primary signal is 0 or missing, the pipeline passes through with `meta_label=0` and `meta_confidence=0.0`.
</Note>
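A downstream step can use the two injected keys to gate and scale orders. The sketch below works on a plain dict standing in for `ctx.params`; the function `gated_size` and the `min_confidence` cutoff are illustrative assumptions, not part of Horizon.

```python
def gated_size(params, base_size, min_confidence=0.5):
    """Scale an order size using meta-label outputs from a params dict.

    Hypothetical helper: `params` stands in for ctx.params.
    """
    if params.get("meta_label", 0) != 1:
        return 0.0                      # abstain
    confidence = params.get("meta_confidence", 0.0)
    if confidence < min_confidence:
        return 0.0                      # act signal, but too weak to trade
    return base_size * confidence       # size proportionally to confidence

size_act = gated_size({"meta_label": 1, "meta_confidence": 0.8}, base_size=10.0)
size_skip = gated_size({"meta_label": 0, "meta_confidence": 0.0}, base_size=10.0)
```

Because a missing or zero primary signal yields `meta_label=0`, a gate like this trades nothing in that case.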

***

## Feature Importance (AFML Ch. 8)

Model-agnostic feature importance methods with **purged cross-validation**. Standard k-fold CV leaks information in time-series data because adjacent samples are correlated. Purged CV removes training samples within a configurable gap of each test fold, so the model is not trained on observations that overlap the test period.

All methods accept a generic `score_fn(X_train, y_train, X_test, y_test) -> float` so they work with any model (sklearn, xgboost, a simple function, etc.).
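The purging idea can be sketched as an index generator. This is an illustrative assumption about how contiguous folds and the gap interact, not Horizon's internal splitter; the name `purged_kfold` is hypothetical.

```python
def purged_kfold(n_samples, n_splits=5, purge_gap=0):
    """Yield (train_idx, test_idx) for contiguous folds with a purge gap.

    Hypothetical helper; the last fold absorbs any remainder samples.
    """
    fold = n_samples // n_splits
    for k in range(n_splits):
        start = k * fold
        stop = n_samples if k == n_splits - 1 else start + fold
        test_idx = list(range(start, stop))
        # Drop training samples within purge_gap of either edge of the test fold.
        train_idx = [i for i in range(n_samples)
                     if i < start - purge_gap or i >= stop + purge_gap]
        yield train_idx, test_idx

splits = list(purged_kfold(n_samples=20, n_splits=4, purge_gap=2))
train, test = splits[1]
```

For the second fold, samples 5 to 9 form the test set and samples 3, 4, 10, and 11 are purged from training.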

```python
from horizon.feature_importance import (
    mda_importance,
    sfi_importance,
    clustered_mda,
    FeatureImportance,
)
```

### mda\_importance

**Mean Decrease Accuracy** (permutation importance). For each CV fold, computes a baseline test score, then shuffles each feature column individually and re-scores. Importance = mean decrease in score caused by shuffling.

```python
def my_scorer(X_train, y_train, X_test, y_test):
    from collections import Counter
    majority = Counter(y_train).most_common(1)[0][0]
    return sum(1 for y in y_test if y == majority) / len(y_test)

results = mda_importance(
    score_fn=my_scorer,
    X=feature_matrix,
    y=labels,
    feature_names=["momentum", "vol", "spread", "imbalance"],
    n_splits=5,
    purge_gap=10,
    seed=42,
)

for fi in results:
    print(f"{fi.feature}: {fi.importance:.4f} +/- {fi.std:.4f}")
```

#### Parameters

| Parameter       | Type                | Default    | Description                                                     |
| --------------- | ------------------- | ---------- | --------------------------------------------------------------- |
| `score_fn`      | `callable`          | *required* | `(X_train, y_train, X_test, y_test) -> float`. Higher = better. |
| `X`             | `list[list[float]]` | *required* | Feature matrix (N x D). Each inner list is one sample.          |
| `y`             | `list[float]`       | *required* | Label vector (length N).                                        |
| `feature_names` | `list[str]`         | *required* | Names for each feature (length D).                              |
| `n_splits`      | `int`               | `5`        | Number of cross-validation folds.                               |
| `purge_gap`     | `int`               | `0`        | Number of samples to purge around each test fold.               |
| `seed`          | `int or None`       | `None`     | Random seed for reproducibility.                                |

#### Returns

List of `FeatureImportance` sorted by importance descending.
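For intuition, the permutation step within a single fold might look like the following. It is a simplified sketch: `permutation_drops` and `threshold_scorer` are hypothetical names, the shuffle is applied to the test fold only, and averaging across folds is left out.

```python
import random

def permutation_drops(score_fn, X_train, y_train, X_test, y_test, seed=None):
    """Score drop per feature when that column is shuffled in the test fold."""
    rng = random.Random(seed)
    baseline = score_fn(X_train, y_train, X_test, y_test)
    drops = []
    for j in range(len(X_test[0])):
        column = [row[j] for row in X_test]
        rng.shuffle(column)
        # Rebuild the test matrix with only column j permuted.
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_test, column)]
        drops.append(baseline - score_fn(X_train, y_train, shuffled, y_test))
    return drops

def threshold_scorer(X_train, y_train, X_test, y_test):
    # Toy model: predict 1 when feature 0 is positive; feature 1 is ignored.
    return sum((row[0] > 0) == (y == 1) for row, y in zip(X_test, y_test)) / len(y_test)

X = [[1.0, 5.0], [-1.0, 3.0], [2.0, -4.0], [-2.0, 0.5], [3.0, 1.0], [-3.0, -2.0]]
y = [1, 0, 1, 0, 1, 0]
drops = permutation_drops(threshold_scorer, X[:3], y[:3], X, y, seed=0)
```

Because the toy scorer never reads feature 1, shuffling that column cannot change the score, so its drop is exactly zero.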

### sfi\_importance

**Single Feature Importance** (AFML Sec. 8.4.1). Trains the model on each feature *individually* and evaluates via cross-validation. The importance of a feature is its cross-validated score when used as the sole predictor.

```python
results = sfi_importance(
    score_fn=my_scorer,
    X=feature_matrix,
    y=labels,
    feature_names=["momentum", "vol", "spread", "imbalance"],
    n_splits=5,
)
```

#### Parameters

| Parameter       | Type                | Default    | Description                                    |
| --------------- | ------------------- | ---------- | ---------------------------------------------- |
| `score_fn`      | `callable`          | *required* | `(X_train, y_train, X_test, y_test) -> float`. |
| `X`             | `list[list[float]]` | *required* | Feature matrix (N x D).                        |
| `y`             | `list[float]`       | *required* | Label vector (length N).                       |
| `feature_names` | `list[str]`         | *required* | Names for each feature (length D).             |
| `n_splits`      | `int`               | `5`        | Number of cross-validation folds.              |

#### Returns

List of `FeatureImportance` sorted by importance descending.

### clustered\_mda

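**Clustered Feature Importance** (introduced in Lopez de Prado's *Machine Learning for Asset Managers*, Ch. 6, as an extension of the AFML Ch. 8 methods). Groups features by correlation using agglomerative clustering (distance = 1 - |correlation|), then permutes entire clusters at once.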

When one feature in a correlated group is shuffled, the model can compensate by using the remaining correlated features. Shuffling the entire cluster eliminates this substitution effect, giving a more accurate picture of the group's true importance.

```python
results = clustered_mda(
    score_fn=my_scorer,
    X=feature_matrix,
    y=labels,
    feature_names=["momentum", "vol", "spread", "imbalance"],
    n_clusters=2,
    n_splits=5,
    seed=42,
)

# Each result.feature contains comma-separated names of features in the cluster
for fi in results:
    print(f"Cluster [{fi.feature}]: {fi.importance:.4f} +/- {fi.std:.4f}")
```

#### Parameters

| Parameter       | Type                | Default    | Description                                    |
| --------------- | ------------------- | ---------- | ---------------------------------------------- |
| `score_fn`      | `callable`          | *required* | `(X_train, y_train, X_test, y_test) -> float`. |
| `X`             | `list[list[float]]` | *required* | Feature matrix (N x D).                        |
| `y`             | `list[float]`       | *required* | Label vector (length N).                       |
| `feature_names` | `list[str]`         | *required* | Names for each feature (length D).             |
| `n_clusters`    | `int`               | `3`        | Number of feature clusters.                    |
| `n_splits`      | `int`               | `5`        | Number of cross-validation folds.              |
| `seed`          | `int or None`       | `None`     | Random seed for reproducibility.               |

#### Returns

List of `FeatureImportance`, one per *cluster*. The `feature` field contains comma-separated names of features in that cluster. Sorted by importance descending.
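The clustering distance named above, 1 - |correlation|, can be computed per feature pair as in this sketch. The helpers `pearson` and `correlation_distance` are hypothetical, and the code assumes no feature column is constant (a constant column has zero variance and would divide by zero).

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def correlation_distance(X):
    """Pairwise 1 - |corr| between the columns of X (a list of rows)."""
    cols = list(zip(*X))
    return [[1.0 - abs(pearson(ci, cj)) for cj in cols] for ci in cols]

X = [[1.0, 2.0, 3.0], [2.0, 4.0, 1.0], [3.0, 6.0, 4.0], [4.0, 8.0, 2.0]]
dist = correlation_distance(X)
```

Here the first two columns are perfectly correlated (distance 0), so an agglomerative step would merge them first; the third column is uncorrelated with the first (distance 1).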

### FeatureImportance

| Field        | Type    | Description                                                          |
| ------------ | ------- | -------------------------------------------------------------------- |
| `feature`    | `str`   | Feature name (or comma-separated cluster names for `clustered_mda`). |
| `importance` | `float` | Mean importance score (higher = more important).                     |
| `std`        | `float` | Standard deviation of importance across folds.                       |

<Warning>
  MDA can understate the importance of correlated features. If your feature set has groups of highly correlated predictors (e.g., multiple momentum lookbacks), use `clustered_mda` instead.
</Warning>

***

## Alpha Decay Tracking

Monitor whether your trading edge is dying. The `AlphaDecayTracker` computes rolling IC (Spearman rank correlation between predictions and outcomes), estimates half-life via AR(1) fit, and detects negative trend via linear regression on the IC series.

```python
from horizon.alpha_decay import AlphaDecayTracker, alpha_decay_pipeline, AlphaDecayReport
```

### AlphaDecayTracker

Stateful tracker that accumulates predictions and outcomes over time.

```python
tracker = AlphaDecayTracker(window=100, alert_threshold=-0.05)

# Feed new observations
report = tracker.update(
    predictions=[0.6, 0.7, 0.55],
    outcomes=[1.0, 0.0, 1.0],
)

if report and report.is_decaying:
    print(f"Alpha decaying! IC={report.current_ic:.3f}, "
          f"half_life={report.half_life:.1f}")
```
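The IC that the tracker reports is a Spearman rank correlation. As a rough illustration of that statistic, the sketch below ranks both series (ties share their average rank) and takes the Pearson correlation of the ranks. The helper names are hypothetical, and returning 0.0 for a constant series is an assumption made here.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman_ic(predictions, outcomes):
    """Pearson correlation of the ranks; 0.0 when either side is constant."""
    rp, ro = ranks(predictions), ranks(outcomes)
    n = len(rp)
    mp, mo = sum(rp) / n, sum(ro) / n
    cov = sum((a - mp) * (b - mo) for a, b in zip(rp, ro))
    vp = sum((a - mp) ** 2 for a in rp)
    vo = sum((b - mo) ** 2 for b in ro)
    return cov / (vp * vo) ** 0.5 if vp > 0 and vo > 0 else 0.0

ic = spearman_ic([0.6, 0.7, 0.55], [1.0, 0.0, 1.0])
```

For these three observations the highest prediction coincides with the only loss, so the IC is strongly negative (about -0.87).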

#### Constructor Parameters

| Parameter         | Type    | Default | Description                                              |
| ----------------- | ------- | ------- | -------------------------------------------------------- |
| `window`          | `int`   | `100`   | Rolling window size for IC calculation.                  |
| `alert_threshold` | `float` | `-0.05` | IC trend slope below which alpha is considered decaying. |

#### update()

Add a new batch of predictions/outcomes and compute the report.

| Parameter     | Type            | Default    | Description                                                |
| ------------- | --------------- | ---------- | ---------------------------------------------------------- |
| `predictions` | `list[float]`   | *required* | Model prediction values (e.g., predicted probabilities).   |
| `outcomes`    | `list[float]`   | *required* | Realized outcome values (e.g., 1.0 for win, 0.0 for loss). |
| `timestamp`   | `float or None` | `None`     | Observation timestamp. Defaults to current time.           |

Returns `AlphaDecayReport` if enough data has accumulated (at least `window // 2` observations), `None` otherwise.

#### report()

Computes the current alpha decay state on demand. Unlike `update()`, it returns an `AlphaDecayReport` regardless of how much data has accumulated.

### AlphaDecayReport

| Field            | Type                        | Description                                                                                          |
| ---------------- | --------------------------- | ---------------------------------------------------------------------------------------------------- |
| `current_ic`     | `float`                     | Current Spearman rank correlation between predictions and outcomes.                                  |
| `rolling_ic`     | `list[tuple[float, float]]` | History of `(timestamp, IC)` observations.                                                           |
| `half_life`      | `float`                     | Estimated half-life in observations (AR(1) fit). `inf` if not mean-reverting.                        |
| `ic_trend`       | `float`                     | Linear regression slope of the IC series. Negative = decaying.                                       |
| `is_decaying`    | `bool`                      | `True` if `ic_trend < alert_threshold`.                                                              |
| `rolling_sharpe` | `list[tuple[float, float]]` | History of `(timestamp, Sharpe)` observations.                                                       |
| `sharpe_trend`   | `float`                     | Linear regression slope of the rolling Sharpe series.                                                |
| `time_to_zero`   | `float or None`             | Estimated observations until IC reaches 0 (linear extrapolation). `None` if IC is not trending down. |
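The `half_life` and `time_to_zero` fields can be illustrated with ordinary least squares on an IC series. This sketch is an assumption about the underlying arithmetic, not Horizon's code: it fits x[t] = c + phi * x[t-1], uses half-life = -ln(2) / ln(phi) when 0 < phi < 1, and extrapolates the linear trend from the latest IC. All helper names are hypothetical, and the series must not be constant.

```python
import math

def ols_slope_intercept(x, y):
    """Least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def ar1_half_life(series):
    """Half-life under x[t] = c + phi * x[t-1]; inf unless 0 < phi < 1."""
    phi, _ = ols_slope_intercept(series[:-1], series[1:])
    return -math.log(2) / math.log(phi) if 0 < phi < 1 else math.inf

def time_to_zero(series):
    """Observations until the fitted linear trend reaches zero, or None."""
    slope, _ = ols_slope_intercept(list(range(len(series))), series)
    if slope >= 0 or series[-1] <= 0:
        return None
    return series[-1] / -slope

ic_series = [0.16, 0.08, 0.04, 0.02, 0.01]
half_life = ar1_half_life(ic_series)
ttz = time_to_zero(ic_series)
```

The example series halves at every step, so the fitted phi is 0.5 and the half-life is one observation.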

### alpha\_decay\_pipeline

Pipeline function for `hz.run()`. Uses predictions from `ctx.params["predictions"]` and outcomes derived from fills to track alpha decay in real time.

```python
hz.run(
    pipeline=[
        alpha_decay_pipeline(window=100, alert_threshold=-0.05),
        my_model,    # sets ctx.params["predictions"]
        my_quoter,
    ],
    ...
)
```

#### Parameters

| Parameter         | Type    | Default | Description                                              |
| ----------------- | ------- | ------- | -------------------------------------------------------- |
| `window`          | `int`   | `100`   | Rolling window for IC calculation.                       |
| `alert_threshold` | `float` | `-0.05` | IC trend slope below which alpha is considered decaying. |

#### Injected into ctx.params

| Key                 | Type    | Description                          |
| ------------------- | ------- | ------------------------------------ |
| `"alpha_ic"`        | `float` | Current information coefficient.     |
| `"alpha_half_life"` | `float` | Estimated half-life of IC.           |
| `"alpha_decaying"`  | `bool`  | Whether alpha is currently decaying. |

<Note>
  The pipeline logs a warning when `is_decaying` transitions from `False` to `True`, including the current IC, trend slope, and half-life.
</Note>

***

## PnL Attribution

Break down portfolio PnL by market, time period, and factor exposure to understand where returns come from.

```python
from horizon.pnl_attribution import (
    attribute_pnl,
    attribute_by_time,
    attribute_by_factor,
    pnl_attribution_pipeline,
    AttributionReport,
    PnLBreakdown,
    TimeBreakdown,
    FactorBreakdown,
)
```

### attribute\_pnl

Extract positions from an engine and compute per-market PnL breakdown. Results are sorted by absolute PnL descending.

```python
report = attribute_pnl(engine)

print(f"Total PnL: {report.total_pnl:+.2f} across {report.n_positions} positions")
for mkt in report.by_market:
    print(f"  {mkt.market_id}: {mkt.pnl:+.4f} ({mkt.contribution:.1%})")
```

#### Parameters

| Parameter | Type     | Default    | Description              |
| --------- | -------- | ---------- | ------------------------ |
| `engine`  | `Engine` | *required* | Horizon Engine instance. |

#### Returns

`AttributionReport` with `by_market` populated.
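The per-market arithmetic can be sketched without an engine. The function below is hypothetical and works on a plain `{market_id: pnl}` dict; it assumes each market's contribution is its PnL divided by total PnL, and that a zero total maps to a contribution of 0.0.

```python
def market_contributions(pnl_by_market):
    """Sort markets by |PnL| and attach each one's share of total PnL.

    Hypothetical helper; returns (total, [(market_id, pnl, share), ...]).
    """
    total = sum(pnl_by_market.values())
    rows = []
    for market_id, pnl in pnl_by_market.items():
        share = pnl / total if total != 0 else 0.0
        rows.append((market_id, pnl, share))
    rows.sort(key=lambda r: abs(r[1]), reverse=True)
    return total, rows

total, rows = market_contributions({"a": 3.0, "b": -6.0, "c": 13.0})
```

Shares can exceed 100% or go negative when winners and losers offset each other, as market `c` (130%) and market `b` (-60%) do here.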

### attribute\_by\_time

Group fills by time period and compute PnL per period.

```python
fills = engine.fills()
daily = attribute_by_time(fills, period="daily")

for day in daily:
    print(f"{day.period}: PnL={day.pnl:+.4f}, trades={day.n_trades}, "
          f"win_rate={day.win_rate:.1%}")
```

#### Parameters

| Parameter | Type         | Default    | Description                                                               |
| --------- | ------------ | ---------- | ------------------------------------------------------------------------- |
| `fills`   | `list[Fill]` | *required* | List of Fill objects (must have `timestamp`, `price`, `size` attributes). |
| `period`  | `str`        | `"daily"`  | Aggregation period: `"hourly"`, `"daily"`, or `"weekly"`.                 |

#### Returns

List of `TimeBreakdown` sorted chronologically.
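Grouping by period comes down to mapping each fill timestamp to a bucket label. The sketch below assumes Unix timestamps in seconds and UTC bucketing; `period_label` is a hypothetical helper, and the ISO-week format used for `"weekly"` is a guess, since only daily and hourly label formats are shown in the type reference.

```python
from datetime import datetime, timezone

def period_label(timestamp, period="daily"):
    """Map a Unix timestamp (seconds, UTC) to a bucket label."""
    dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    if period == "hourly":
        return dt.strftime("%Y-%m-%d %H:00")
    if period == "daily":
        return dt.strftime("%Y-%m-%d")
    if period == "weekly":
        year, week, _ = dt.isocalendar()
        return f"{year}-W{week:02d}"   # hypothetical weekly label format
    raise ValueError(f"unknown period: {period}")

ts = 1736951400  # 2025-01-15 14:30:00 UTC
labels = [period_label(ts, p) for p in ("hourly", "daily", "weekly")]
```

Fills that share a label fall into the same bucket, and sorting the labels lexicographically also sorts them chronologically.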

### attribute\_by\_factor

Factor-based PnL attribution. Maps positions to factors via exposure weights and computes each factor's PnL contribution and R-squared.

```python
positions = engine.positions()
factor_exposures = {
    "crypto": {"btc-100k": 0.8, "eth-5k": 0.9},
    "politics": {"trump-win": 1.0, "senate-flip": 0.7},
}

factors = attribute_by_factor(positions, factor_exposures)
for f in factors:
    print(f"{f.factor}: exposure={f.exposure:.2f}, "
          f"pnl={f.pnl_contribution:+.4f}, R2={f.r_squared:.3f}")
```

#### Parameters

| Parameter          | Type                          | Default    | Description                                               |
| ------------------ | ----------------------------- | ---------- | --------------------------------------------------------- |
| `positions`        | `list[Position]`              | *required* | List of Position objects.                                 |
| `factor_exposures` | `dict[str, dict[str, float]]` | *required* | Mapping of `factor_name -> {market_id: exposure_weight}`. |

#### Returns

List of `FactorBreakdown`, one per factor.
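As a rough picture of the exposure-weighted mapping, the sketch below computes a mean absolute exposure and a weighted PnL per factor from a plain `{market_id: pnl}` dict. The helper `factor_pnl` is hypothetical, it skips markets with no position, and it leaves out the R-squared calculation entirely.

```python
def factor_pnl(position_pnl, factor_exposures):
    """Return {factor: (mean_abs_exposure, weighted_pnl)} over held markets."""
    out = {}
    for factor, weights in factor_exposures.items():
        # Only markets that actually have a position contribute.
        held = {m: w for m, w in weights.items() if m in position_pnl}
        if not held:
            out[factor] = (0.0, 0.0)
            continue
        exposure = sum(abs(w) for w in held.values()) / len(held)
        pnl = sum(w * position_pnl[m] for m, w in held.items())
        out[factor] = (exposure, pnl)
    return out

position_pnl = {"btc-100k": 10.0, "eth-5k": -5.0, "trump-win": 2.0}
result = factor_pnl(position_pnl, {
    "crypto": {"btc-100k": 0.8, "eth-5k": 0.9},
    "politics": {"trump-win": 1.0, "senate-flip": 0.7},
})
```

In this example the crypto factor nets 0.8 * 10.0 + 0.9 * (-5.0) = 3.5, and `senate-flip` is ignored because no position is held in it.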

### pnl\_attribution\_pipeline

Pipeline function for `hz.run()` that adds attribution data each cycle.

```python
hz.run(
    pipeline=[
        pnl_attribution_pipeline(engine),
        my_model,
        my_quoter,
    ],
    ...
)
```

#### Parameters

| Parameter | Type             | Default | Description                                                       |
| --------- | ---------------- | ------- | ----------------------------------------------------------------- |
| `engine`  | `Engine or None` | `None`  | Optional engine override. If `None`, uses `ctx.params["engine"]`. |

#### Injected into ctx.params

| Key                | Type                   | Description                       |
| ------------------ | ---------------------- | --------------------------------- |
| `"pnl_by_market"`  | `list[PnLBreakdown]`   | Per-market PnL breakdowns.        |
| `"pnl_top_winner"` | `PnLBreakdown or None` | Market with highest positive PnL. |
| `"pnl_top_loser"`  | `PnLBreakdown or None` | Market with most negative PnL.    |

### Type Reference

#### AttributionReport

| Field         | Type                    | Description                                                 |
| ------------- | ----------------------- | ----------------------------------------------------------- |
| `by_market`   | `list[PnLBreakdown]`    | Per-market breakdowns, sorted by absolute PnL descending.   |
| `by_time`     | `list[TimeBreakdown]`   | Per-period breakdowns (populated by `attribute_by_time`).   |
| `by_factor`   | `list[FactorBreakdown]` | Per-factor breakdowns (populated by `attribute_by_factor`). |
| `total_pnl`   | `float`                 | Sum of all position PnLs.                                   |
| `n_positions` | `int`                   | Number of positions in the engine.                          |

#### PnLBreakdown

| Field          | Type    | Description                                        |
| -------------- | ------- | -------------------------------------------------- |
| `market_id`    | `str`   | Market identifier.                                 |
| `pnl`          | `float` | Realized + unrealized PnL for this market.         |
| `pnl_pct`      | `float` | PnL as percentage of cost basis.                   |
| `contribution` | `float` | Fraction of total PnL attributable to this market. |

#### TimeBreakdown

| Field      | Type    | Description                                                       |
| ---------- | ------- | ----------------------------------------------------------------- |
| `period`   | `str`   | Time bucket label (e.g., `"2025-01-15"` or `"2025-01-15 14:00"`). |
| `pnl`      | `float` | Net PnL for this period.                                          |
| `n_trades` | `int`   | Number of fills in this period.                                   |
| `win_rate` | `float` | Fraction of profitable fills in this period.                      |

#### FactorBreakdown

| Field              | Type    | Description                                                              |
| ------------------ | ------- | ------------------------------------------------------------------------ |
| `factor`           | `str`   | Factor name.                                                             |
| `exposure`         | `float` | Average absolute exposure weight across markets in this factor.          |
| `pnl_contribution` | `float` | Weighted PnL contribution from this factor.                              |
| `r_squared`        | `float` | Fraction of total PnL variance explained by this factor (capped at 1.0). |

***

## Full Research Workflow

Combine all four modules for a complete alpha research loop:

```python
import horizon as hz
from horizon.meta_label import compute_meta_labels
from horizon.feature_importance import mda_importance, clustered_mda
from horizon.alpha_decay import AlphaDecayTracker
from horizon.pnl_attribution import attribute_pnl, attribute_by_time

# 1. Meta-label your primary signals
labels = compute_meta_labels(
    prices=price_series,
    timestamps=ts_series,
    primary_signals=signals,
    pt_sl=(1.5, 1.0),      # Asymmetric: wider profit target
    max_holding=50,
)
# Guard against an empty label list before dividing.
hit_rate = sum(1 for lb in labels if lb.meta_label == 1) / len(labels) if labels else 0.0
print(f"Meta-label hit rate: {hit_rate:.1%}")

# 2. Rank your features
importances = mda_importance(
    score_fn=my_scorer,
    X=features,
    y=[lb.meta_label for lb in labels],
    feature_names=feature_names,
    purge_gap=5,
)
print("Top features:", [fi.feature for fi in importances[:3]])

# 3. Track alpha decay
tracker = AlphaDecayTracker(window=200)
report = tracker.update(predictions, outcomes)
if report and report.is_decaying:
    print(f"Edge decaying: half_life={report.half_life:.0f} obs")

# 4. Attribute PnL
attr = attribute_pnl(engine)
if attr.by_market:  # skip when the engine holds no positions
    top = attr.by_market[0]
    print(f"Top contributor: {top.market_id} "
          f"({top.contribution:.1%} of total)")
```
