# Bars & Labeling

> Information-driven bars, triple barrier labeling, CUSUM filter, and meta-labels from Advances in Financial Machine Learning.

<Note>
  **Pro Feature.** Requires a Pro or Ultra subscription. [Get started at api.mathematicalcompany.com](https://api.mathematicalcompany.com)
</Note>

Horizon implements the full information-driven bar sampling and triple barrier labeling pipeline from Marcos Lopez de Prado's *Advances in Financial Machine Learning* (Chapters 2-3). All functions run in Rust for maximum throughput on tick-level data.

<CardGroup cols={2}>
  <Card title="7 Bar Types" icon="bars-staggered">
    Tick, volume, dollar, tick/volume imbalance, and tick/volume run bars. All Rust-native.
  </Card>

  <Card title="Triple Barrier" icon="brackets-curly">
    Profit-taking, stop-loss, and vertical barriers with volatility-scaled thresholds.
  </Card>

  <Card title="CUSUM Filter" icon="filter">
    Symmetric CUSUM for event-driven sampling of structural breaks.
  </Card>

  <Card title="Meta-Labels" icon="tags">
    Binary 0/1 labels for bet sizing on top of a primary directional model.
  </Card>
</CardGroup>

***

## Information-Driven Bars

Standard time bars (1-minute, 5-minute) sample at fixed clock intervals, which over-samples quiet periods and under-samples volatile ones. Information-driven bars instead sample based on market activity, producing bars that carry roughly equal information content.

Horizon provides three standard bar types and four information-driven bar types:

<Tabs>
  <Tab title="Standard Bars">
    ### Tick Bars

    Sample a new bar every `threshold` ticks.

    ```python theme={null}
    import horizon as hz

    bars = hz.tick_bars(timestamps, prices, volumes, threshold=100)
    ```

    ### Volume Bars

    Sample a new bar when cumulative volume exceeds `threshold`.

    ```python theme={null}
    bars = hz.volume_bars(timestamps, prices, volumes, threshold=1000.0)
    ```

    ### Dollar Bars

    Sample a new bar when cumulative dollar volume (price × volume) exceeds `threshold`.

    ```python theme={null}
    bars = hz.dollar_bars(timestamps, prices, volumes, threshold=50000.0)
    ```
  </Tab>

  <Tab title="Imbalance Bars">
    ### Tick Imbalance Bars (TIBs)

    Close a bar when the absolute cumulative tick direction imbalance exceeds a dynamically estimated threshold (AFML Ch. 2.3.2).

    Each tick is classified as a buy (+1) or a sell (-1) based on the sign of the price change from the previous tick. The cumulative imbalance theta\_t = sum(b\_i) is tracked within the bar, and a new bar is emitted when |theta\_t| >= threshold. The threshold adapts via an exponentially weighted moving average of past bar lengths.

    ```python theme={null}
    bars = hz.tick_imbalance_bars(timestamps, prices, volumes, initial_estimate=50.0)
    ```

    ### Volume Imbalance Bars (VIBs)

    Same as tick imbalance bars, but weights each tick direction by its volume: theta\_t = sum(b\_i \* v\_i).

    ```python theme={null}
    bars = hz.volume_imbalance_bars(timestamps, prices, volumes, initial_estimate=500.0)
    ```
  </Tab>

  <Tab title="Run Bars">
    ### Tick Run Bars (TRBs)

    Count buy ticks and sell ticks separately within each bar. A new bar is emitted when max(buy\_run, sell\_run) exceeds a dynamically estimated threshold (AFML Ch. 2.3.3).

    ```python theme={null}
    bars = hz.tick_run_bars(timestamps, prices, volumes, initial_estimate=50.0)
    ```

    ### Volume Run Bars (VRBs)

    Same as tick run bars, but accumulates volume in buy/sell runs rather than counting ticks.

    ```python theme={null}
    bars = hz.volume_run_bars(timestamps, prices, volumes, initial_estimate=500.0)
    ```
  </Tab>
</Tabs>
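
The tick imbalance rule described in the Imbalance Bars tab can be sketched in plain Python. This is an illustration only, not Horizon's Rust implementation: it uses a fixed threshold and omits the adaptive EWM update. The helper names `tick_rule` and `imbalance_bar_ends` are hypothetical, and two details are assumptions: the first tick is treated as a buy, and a zero price change carries the previous sign forward.

```python
def tick_rule(prices):
    """Classify each tick as +1 (buy) or -1 (sell) from the sign of the price change."""
    signs = []
    prev = 1  # assumption: the first tick has no prior price, so treat it as a buy
    for i, p in enumerate(prices):
        if i > 0:
            delta = p - prices[i - 1]
            if delta > 0:
                prev = 1
            elif delta < 0:
                prev = -1
            # delta == 0: keep the previous sign (assumption)
        signs.append(prev)
    return signs


def imbalance_bar_ends(prices, threshold):
    """Return the tick indices that close a bar under a fixed imbalance threshold."""
    ends = []
    theta = 0
    for i, b in enumerate(tick_rule(prices)):
        theta += b
        if abs(theta) >= threshold:
            ends.append(i)
            theta = 0  # the next bar starts accumulating from zero
    return ends


prices = [10.0, 10.1, 10.2, 10.2, 10.1, 10.0, 9.9, 9.8, 9.9, 10.0]
print(tick_rule(prices))                        # [1, 1, 1, 1, -1, -1, -1, -1, 1, 1]
print(imbalance_bar_ends(prices, threshold=3))  # [2, 7]
```

For a volume imbalance variant, the same loop would add `b * v` instead of `b` at each tick.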

All bar functions accept the same three input arrays:

| Parameter    | Type          | Description                                |
| ------------ | ------------- | ------------------------------------------ |
| `timestamps` | `list[float]` | Tick timestamps (e.g., Unix epoch seconds) |
| `prices`     | `list[float]` | Tick prices                                |
| `volumes`    | `list[float]` | Tick volumes                               |

All three arrays must have the same length.

### Bar Type

Every bar function returns `list[Bar]`. Each `Bar` object has the following fields:

| Field       | Type    | Description                            |
| ----------- | ------- | -------------------------------------- |
| `timestamp` | `float` | Timestamp of the first tick in the bar |
| `open`      | `float` | Opening price (first tick)             |
| `high`      | `float` | Highest price in the bar               |
| `low`       | `float` | Lowest price in the bar                |
| `close`     | `float` | Closing price (last tick)              |
| `volume`    | `float` | Total volume in the bar                |
| `vwap`      | `float` | Volume-weighted average price          |
| `n_ticks`   | `int`   | Number of ticks in the bar             |

<Note>
  If the last group of ticks does not fully meet the bar threshold, a partial bar is still emitted. This ensures no tick data is silently dropped.
</Note>
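
To make the `Bar` fields and the trailing partial bar concrete, here is a minimal pure-Python sketch of volume-bar aggregation. It is not Horizon's implementation: `volume_bars_sketch` and `_finish` are hypothetical names, bars are plain dicts rather than `Bar` objects, and closing the bar when cumulative volume reaches or passes the threshold (`>=`) is an assumption about the boundary case.

```python
def _finish(bar):
    """Turn the running dollar total into a VWAP and drop the helper field."""
    dollar = bar.pop("dollar")
    bar["vwap"] = dollar / bar["volume"] if bar["volume"] > 0 else bar["close"]
    return bar


def volume_bars_sketch(timestamps, prices, volumes, threshold):
    """Group ticks into bars that close once cumulative volume reaches `threshold`."""
    if not (len(timestamps) == len(prices) == len(volumes)):
        raise ValueError("timestamps, prices, and volumes must have the same length")

    bars, current = [], None
    for t, p, v in zip(timestamps, prices, volumes):
        if current is None:
            # The first tick of a bar sets its timestamp and opening price
            current = {"timestamp": t, "open": p, "high": p, "low": p,
                       "close": p, "volume": 0.0, "dollar": 0.0, "n_ticks": 0}
        current["high"] = max(current["high"], p)
        current["low"] = min(current["low"], p)
        current["close"] = p
        current["volume"] += v
        current["dollar"] += p * v
        current["n_ticks"] += 1
        if current["volume"] >= threshold:  # assumption: boundary is inclusive
            bars.append(_finish(current))
            current = None
    if current is not None:
        # Trailing ticks that never reached the threshold form a partial bar
        bars.append(_finish(current))
    return bars


bars = volume_bars_sketch(
    timestamps=[0.0, 1.0, 2.0, 3.0, 4.0],
    prices=[100.0, 101.0, 99.0, 100.0, 102.0],
    volumes=[4.0, 7.0, 5.0, 6.0, 3.0],
    threshold=10.0,
)
for b in bars:
    print(b["timestamp"], b["n_ticks"], b["volume"], round(b["vwap"], 3))
```

The last bar in this example holds a single tick with volume 3.0, well below the threshold, which mirrors the partial-bar behavior described in the note above.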

### Function Reference

| Function                   | Threshold Parameter       | Threshold Meaning                                            |
| -------------------------- | ------------------------- | ------------------------------------------------------------ |
| `hz.tick_bars`             | `threshold: int`          | Number of ticks per bar                                      |
| `hz.volume_bars`           | `threshold: float`        | Cumulative volume per bar                                    |
| `hz.dollar_bars`           | `threshold: float`        | Cumulative dollar volume per bar                             |
| `hz.tick_imbalance_bars`   | `initial_estimate: float` | Initial expected tick imbalance threshold (adapts via EWM)   |
| `hz.volume_imbalance_bars` | `initial_estimate: float` | Initial expected volume imbalance threshold (adapts via EWM) |
| `hz.tick_run_bars`         | `initial_estimate: float` | Initial expected max tick run length (adapts via EWM)        |
| `hz.volume_run_bars`       | `initial_estimate: float` | Initial expected max volume run (adapts via EWM)             |

### Example: Comparing Bar Types

```python theme={null}
import horizon as hz

# Load tick data (timestamps, prices, volumes as lists of float)
timestamps = [...]
prices = [...]
volumes = [...]

# Standard: fixed 100-tick bars
tick = hz.tick_bars(timestamps, prices, volumes, threshold=100)

# Information-driven: bars adapt to market activity
tib = hz.tick_imbalance_bars(timestamps, prices, volumes, initial_estimate=50.0)
trb = hz.tick_run_bars(timestamps, prices, volumes, initial_estimate=50.0)

print(f"Tick bars: {len(tick)}")
print(f"Tick imbalance bars: {len(tib)}")
print(f"Tick run bars: {len(trb)}")

# Inspect a bar
bar = tick[0]
print(f"O={bar.open}, H={bar.high}, L={bar.low}, C={bar.close}")
print(f"Volume={bar.volume}, VWAP={bar.vwap}, Ticks={bar.n_ticks}")
```

***

## Triple Barrier Labeling

The triple barrier method (AFML Ch. 3) labels each trading event with the outcome of three competing barriers:

1. **Profit-taking (PT)**: upper barrier, price rises by a volatility-scaled amount
2. **Stop-loss (SL)**: lower barrier, price falls by a volatility-scaled amount
3. **Vertical barrier (VB)**: maximum holding period expires

Whichever barrier is touched first determines the label: +1 (profit-taking), -1 (stop-loss), or 0 (vertical barrier / insufficient return).

### Step 1: Compute Daily Volatility

```python theme={null}
import horizon as hz

# Exponentially weighted daily volatility from log returns
vol = hz.get_daily_vol(prices, span=20)
```

Returns a list the same length as `prices`. The first element is always 0.0 (no return available from a single price).

| Parameter | Type          | Description                                    |
| --------- | ------------- | ---------------------------------------------- |
| `prices`  | `list[float]` | Raw price series (at least 2 elements)         |
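| `span`    | `int`         | EWM span (e.g., 20). A span is not a half-life: under the common `alpha = 2 / (span + 1)` convention, a span of 20 has a half-life of about 7 observations |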
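
As a rough illustration of what an exponentially weighted volatility of log returns looks like, the sketch below uses a recursive EWM mean and variance. It assumes the `alpha = 2 / (span + 1)` convention and a simple recursive update; Horizon's exact weighting may differ, and `ewm_vol_sketch` is a hypothetical name.

```python
import math


def ewm_vol_sketch(prices, span):
    """EWM standard deviation of log returns; element 0 is 0.0 by construction."""
    if len(prices) < 2:
        raise ValueError("need at least 2 prices")
    if span < 1:
        raise ValueError("span must be >= 1")
    alpha = 2.0 / (span + 1.0)  # assumption: span-to-alpha convention
    vol = [0.0]                 # no return is available for the first price
    mean, var = 0.0, 0.0
    for i in range(1, len(prices)):
        r = math.log(prices[i] / prices[i - 1])
        if i == 1:
            # A single return has no dispersion yet
            mean, var = r, 0.0
        else:
            diff = r - mean
            incr = alpha * diff
            mean += incr
            var = (1.0 - alpha) * (var + diff * incr)
        vol.append(math.sqrt(var))
    return vol


prices = [100.0, 101.0, 100.0, 101.0, 100.0]
vol = ewm_vol_sketch(prices, span=20)
print(len(vol) == len(prices), vol[0])
```

The output has the same length as the input and starts at 0.0, matching the return shape described above.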

### Step 2: CUSUM Filter for Structural Breaks

The symmetric CUSUM filter (AFML Ch. 2.5.2.1) produces a structurally meaningful subsample of the time series by detecting significant price moves and filtering out noise.

```python theme={null}
events = hz.cusum_filter(prices, threshold=0.02)
# Returns: list of indices where structural breaks occurred
```

| Parameter   | Type          | Description                                         |
| ----------- | ------------- | --------------------------------------------------- |
| `prices`    | `list[float]` | Raw price series                                    |
| `threshold` | `float`       | Trigger threshold in price units (must be positive) |

The filter tracks cumulative positive and negative sums of price changes. When either sum exceeds the threshold, the index is recorded and both sums reset.
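
The description above can be sketched in a few lines of plain Python. This follows the text literally (price changes in price units, both sums reset on a trigger) and is not Horizon's Rust code; `cusum_filter_sketch` is a hypothetical name, and the strict `>` comparison is an assumption.

```python
def cusum_filter_sketch(prices, threshold):
    """Return indices where the cumulative up or down move exceeds `threshold`."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    events = []
    s_pos, s_neg = 0.0, 0.0
    for i in range(1, len(prices)):
        change = prices[i] - prices[i - 1]
        # Each sum is floored/capped at zero so it only tracks one direction
        s_pos = max(0.0, s_pos + change)
        s_neg = min(0.0, s_neg + change)
        if s_pos > threshold or s_neg < -threshold:
            events.append(i)
            s_pos, s_neg = 0.0, 0.0  # reset both sums after an event
    return events


prices = [100.0, 100.5, 101.0, 101.5, 101.0, 100.0, 99.5, 100.0]
print(cusum_filter_sketch(prices, threshold=1.2))  # [3, 5]
```

In this example the upward run of 1.5 triggers at index 3, and the following drop of 1.5 triggers at index 5. The small moves afterwards stay below the threshold and produce no event.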

### Step 3: Apply Triple Barrier Labels

```python theme={null}
labels = hz.triple_barrier_labels(
    prices=prices,
    timestamps=timestamps,
    events=events,            # indices from cusum_filter
    pt_sl=[1.0, 1.0],        # [profit_taking_mult, stop_loss_mult]; 0.0 = disabled
    min_ret=0.005,            # minimum return to assign +1/-1 (below this -> 0)
    max_holding=100,          # vertical barrier in bars (0 = no vertical barrier)
    vol_span=20,              # lookback span for daily vol estimation
)
```

| Parameter     | Type             | Description                                                               |
| ------------- | ---------------- | ------------------------------------------------------------------------- |
| `prices`      | `list[float]`    | Raw price series (at least 2 elements)                                    |
| `timestamps`  | `list[float]`    | Timestamps for each price (same length as prices)                         |
| `events`      | `list[int]`      | Event indices from `cusum_filter` or user-provided                        |
| `pt_sl`       | `[float, float]` | `[profit_taking_multiplier, stop_loss_multiplier]`; set to 0.0 to disable |
| `min_ret`     | `float`          | Minimum absolute return to assign +1/-1 (below this, label = 0)           |
| `max_holding` | `int`            | Vertical barrier in bars forward from entry (0 = no vertical barrier)     |
| `vol_span`    | `int`            | Lookback span for EWM daily volatility estimation                         |

Barrier levels are computed in price space:

* Upper: `entry_price * (1 + daily_vol * pt_sl[0])`
* Lower: `entry_price * (1 - daily_vol * pt_sl[1])`
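
For a single event, the barrier formulas above and the first-touch rule can be sketched as follows. This is a simplified illustration under stated assumptions: `label_event_sketch` is a hypothetical helper, the volatility at entry is passed in as a scalar, a touch is tested with `>=` and `<=`, and the `min_ret` filter is not modeled.

```python
import math


def label_event_sketch(prices, entry_idx, daily_vol, pt_sl, max_holding):
    """Return (label, barrier, touch_idx, log_return) for one event."""
    entry = prices[entry_idx]
    # A multiplier of 0.0 disables the corresponding horizontal barrier
    upper = entry * (1 + daily_vol * pt_sl[0]) if pt_sl[0] > 0 else math.inf
    lower = entry * (1 - daily_vol * pt_sl[1]) if pt_sl[1] > 0 else -math.inf
    last = len(prices) - 1
    # max_holding == 0 means no vertical barrier, so scan to the end of the series
    end = min(entry_idx + max_holding, last) if max_holding > 0 else last

    for i in range(entry_idx + 1, end + 1):
        if prices[i] >= upper:
            return 1, "pt", i, math.log(prices[i] / entry)
        if prices[i] <= lower:
            return -1, "sl", i, math.log(prices[i] / entry)
    # Neither horizontal barrier was touched before the vertical barrier
    return 0, "vb", end, math.log(prices[end] / entry)


up = [100.0, 100.5, 101.2, 102.5, 101.0]
down = [100.0, 99.5, 97.9, 103.0]
print(label_event_sketch(up, 0, daily_vol=0.02, pt_sl=[1.0, 1.0], max_holding=10)[:3])
print(label_event_sketch(down, 0, daily_vol=0.02, pt_sl=[1.0, 1.0], max_holding=10)[:3])
print(label_event_sketch(up, 0, daily_vol=0.02, pt_sl=[1.0, 1.0], max_holding=2)[:3])
```

With a 2% volatility and unit multipliers the barriers sit at 102 and 98, so the three calls end at the profit-taking, stop-loss, and vertical barriers respectively.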

### Step 4: Meta-Labels (Bet Sizing)

Meta-labeling (AFML Ch. 3.6) determines whether a primary model's directional signals are correct. The primary model provides the direction (+1 long, -1 short), and the meta-label indicates if acting on that signal is profitable (1) or not (0).

```python theme={null}
meta = hz.meta_labels(
    prices=prices,
    timestamps=timestamps,
    primary_signals=[(10, 1), (50, -1), (120, 1)],  # (event_idx, side)
    pt_sl=[2.0, 1.0],        # asymmetric: 2x vol for PT, 1x vol for SL
    max_holding=50,
    vol_span=20,
)
```

| Parameter         | Type               | Description                                        |
| ----------------- | ------------------ | -------------------------------------------------- |
| `prices`          | `list[float]`      | Raw price series                                   |
| `timestamps`      | `list[float]`      | Timestamps (same length as prices)                 |
| `primary_signals` | `list[(int, int)]` | List of `(event_idx, side)` where side is +1 or -1 |
| `pt_sl`           | `[float, float]`   | `[profit_taking_mult, stop_loss_mult]`             |
| `max_holding`     | `int`              | Vertical barrier in bars (0 = no vertical barrier) |
| `vol_span`        | `int`              | Lookback span for daily vol estimation             |

<Tip>
  Meta-labeling separates the problem into two models: a primary model for direction and a secondary model for bet sizing. The secondary model learns which of the primary model's signals are worth acting on (label=1) vs. skipping (label=0). This is often more robust than training a single model to do both.
</Tip>
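
The core idea of a meta-label, whether acting on the primary signal made money, can be sketched as a sign check on the side-adjusted return. This is a conceptual illustration with a hypothetical helper name. It takes the exit price as given and does not reproduce how `hz.meta_labels` places barriers for long and short signals.

```python
import math


def meta_label_sketch(entry_price, exit_price, side):
    """Return 1 if trading in direction `side` (+1 long, -1 short) was profitable, else 0."""
    if side not in (1, -1):
        raise ValueError("side must be +1 or -1")
    # A short position profits when the price falls, so flip the return's sign
    signed_ret = side * math.log(exit_price / entry_price)
    return 1 if signed_ret > 0 else 0


print(meta_label_sketch(100.0, 103.0, side=1))   # 1: long signal, price rose
print(meta_label_sketch(100.0, 103.0, side=-1))  # 0: short signal, price rose
print(meta_label_sketch(100.0, 97.0, side=-1))   # 1: short signal, price fell
```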

### Step 5: Drop Rare Labels

Remove label classes whose share of all labels falls below a minimum fraction. This avoids training a classifier on classes that have too few examples to learn from.

```python theme={null}
cleaned = hz.drop_labels(labels, min_pct=0.05)
```

| Parameter | Type                 | Description                                                      |
| --------- | -------------------- | ---------------------------------------------------------------- |
| `labels`  | `list[BarrierLabel]` | Labels from `triple_barrier_labels` or `meta_labels`             |
| `min_pct` | `float`              | Minimum fraction of total (e.g., 0.05 = 5%). Must be in \[0, 1]. |
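
A minimal sketch of the filtering idea, working on plain integer labels rather than `BarrierLabel` objects. It is a single-pass filter under the assumption that a class is kept when its share is at least `min_pct`; the AFML reference snippet instead drops the rarest class iteratively, and Horizon's behavior may differ from both. `drop_rare_labels_sketch` is a hypothetical name.

```python
from collections import Counter


def drop_rare_labels_sketch(labels, min_pct):
    """Keep only labels whose class makes up at least `min_pct` of the total."""
    if not 0.0 <= min_pct <= 1.0:
        raise ValueError("min_pct must be in [0, 1]")
    if not labels:
        return []
    counts = Counter(labels)
    total = len(labels)
    keep = {cls for cls, n in counts.items() if n / total >= min_pct}
    return [lab for lab in labels if lab in keep]


labels = [1] * 10 + [-1] * 9 + [0] * 1   # class 0 is 5% of the sample
cleaned = drop_rare_labels_sketch(labels, min_pct=0.10)
print(Counter(cleaned))
```

Here class 0 falls below the 10% cutoff and is removed, leaving 19 of the 20 labels.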

### BarrierLabel Type

Every label function returns `list[BarrierLabel]`. Each object has:

| Field       | Type    | Description                                                                               |
| ----------- | ------- | ----------------------------------------------------------------------------------------- |
| `event_idx` | `int`   | Index in the original price series where the event started                                |
| `label`     | `int`   | -1 (stop-loss), 0 (vertical/below min\_ret), +1 (profit-taking). For meta-labels: 0 or 1. |
| `ret`       | `float` | Log return from entry to barrier touch                                                    |
| `barrier`   | `str`   | Which barrier was touched: `"pt"`, `"sl"`, or `"vb"`                                      |
| `touch_idx` | `int`   | Index in the price series where the barrier was touched                                   |

***

## Full Pipeline Example

```python theme={null}
import horizon as hz

# 1. Load tick data and build volume bars
timestamps = [...]  # tick timestamps
prices = [...]      # tick prices
volumes = [...]     # tick volumes

bars = hz.volume_bars(timestamps, prices, volumes, threshold=10000.0)
bar_prices = [b.close for b in bars]
bar_timestamps = [b.timestamp for b in bars]

# 2. Compute daily volatility
vol = hz.get_daily_vol(bar_prices, span=20)

# 3. CUSUM filter for structural breaks
events = hz.cusum_filter(bar_prices, threshold=0.02)
print(f"CUSUM detected {len(events)} events")

# 4. Triple barrier labeling
labels = hz.triple_barrier_labels(
    prices=bar_prices,
    timestamps=bar_timestamps,
    events=events,
    pt_sl=[1.0, 1.0],      # symmetric barriers at 1x daily vol
    min_ret=0.005,          # 50bps minimum return
    max_holding=100,        # 100-bar max holding period
    vol_span=20,
)

# 5. Inspect label distribution
from collections import Counter
dist = Counter(l.label for l in labels)
print(f"Label distribution: {dict(dist)}")

# 6. Drop rare labels (< 5%)
cleaned = hz.drop_labels(labels, min_pct=0.05)
print(f"After dropping rare labels: {len(cleaned)} / {len(labels)}")

# 7. Use labels for ML training
features = []
targets = []
for label in cleaned:
    idx = label.event_idx
    # Extract features at event time (your feature engineering here)
    features.append(bar_prices[idx])
    targets.append(label.label)
```
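
Step 7 above uses the raw bar price as a placeholder feature. As one hedged example of what could replace it, the sketch below computes trailing log returns that only look backward from the event index, so no information from after the event enters the feature. The helper name and the choice of lags are illustrative assumptions, not part of Horizon.

```python
import math


def trailing_log_returns(prices, idx, lags=(1, 5, 10)):
    """Log returns over several lookbacks ending at `idx`; None when history is too short."""
    feats = []
    for lag in lags:
        if idx - lag < 0:
            feats.append(None)  # not enough history before the event
        else:
            feats.append(math.log(prices[idx] / prices[idx - lag]))
    return feats


bar_prices_demo = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]
print(trailing_log_returns(bar_prices_demo, idx=5))
```

Events that return `None` for any lag can be skipped or imputed before training, depending on the model.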

<Warning>
  The `events` list must contain valid indices into the `prices` array. Out-of-bounds indices will raise a `ValueError`. When chaining `cusum_filter` output into `triple_barrier_labels`, both must reference the same price series.
</Warning>
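
A small pre-check like the one below can catch mismatched series before labeling, for example when events were computed on tick prices but labeling runs on bar prices. `check_events` is a hypothetical helper, and rejecting negative indices is an assumption.

```python
def check_events(events, n_prices):
    """Raise ValueError if any event index falls outside [0, n_prices)."""
    bad = [e for e in events if not 0 <= e < n_prices]
    if bad:
        raise ValueError(f"event indices out of range for {n_prices} prices: {bad}")
    return events


print(check_events([0, 3], n_prices=4))  # [0, 3]
```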

***

## Choosing Bar Types

<AccordionGroup>
  <Accordion title="When to use tick bars">
    Tick bars are the simplest non-time-based alternative. Each bar contains a fixed number of trades, so bars arrive faster during active periods and slower during quiet periods. Good baseline for comparison.
  </Accordion>

  <Accordion title="When to use volume/dollar bars">
    Volume bars normalize by trading volume, and dollar bars normalize by dollar volume. Dollar bars are preferred when price varies significantly over the sample period, as they keep the economic significance of each bar roughly constant.
  </Accordion>

  <Accordion title="When to use imbalance bars">
    Imbalance bars detect asymmetry in order flow. They produce more bars when one side (buy or sell) dominates, which often coincides with informed trading activity. Use these when order flow toxicity matters for your strategy.
  </Accordion>

  <Accordion title="When to use run bars">
    Run bars detect sustained sequences of same-direction ticks. They are sensitive to persistent buying or selling pressure and produce more bars when the market trends strongly in one direction.
  </Accordion>
</AccordionGroup>
