Pro Feature. Requires a Pro or Ultra subscription. Get started at api.mathematicalcompany.com

Bars & Labeling

Horizon implements the full information-driven bar sampling and triple barrier labeling pipeline from Marcos Lopez de Prado’s Advances in Financial Machine Learning (Chapters 2-3). All functions run in Rust for maximum throughput on tick-level data.

7 Bar Types

Tick, volume, dollar, tick/volume imbalance, and tick/volume run bars. All Rust-native.

Triple Barrier

Profit-taking, stop-loss, and vertical barriers with volatility-scaled thresholds.

CUSUM Filter

Symmetric CUSUM for event-driven sampling of structural breaks.

Meta-Labels

Binary 0/1 labels for bet sizing on top of a primary directional model.

Information-Driven Bars

Standard time bars (1-minute, 5-minute) sample at fixed clock intervals, which over-samples quiet periods and under-samples volatile ones. Information-driven bars instead sample based on market activity, producing bars that carry roughly equal information content. Horizon provides three standard bar types and four information-driven bar types:

Standard Bars
Imbalance Bars
Run Bars

Tick Bars

Sample a new bar every threshold ticks.

import horizon as hz

bars = hz.tick_bars(timestamps, prices, volumes, threshold=100)

Volume Bars

Sample a new bar when cumulative volume exceeds threshold.

bars = hz.volume_bars(timestamps, prices, volumes, threshold=1000.0)

Dollar Bars

Sample a new bar when cumulative dollar volume (price × volume) exceeds threshold.

bars = hz.dollar_bars(timestamps, prices, volumes, threshold=50000.0)
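The three standard bar types differ only in what each tick adds to a running total. The following pure-Python sketch illustrates that idea; it is not Horizon's Rust implementation. The helper name `threshold_bar_ranges` is hypothetical, and closing a bar when the total reaches the threshold (`>=`) is an assumption.

```python
from typing import Callable, List, Tuple


def threshold_bar_ranges(
    prices: List[float],
    volumes: List[float],
    threshold: float,
    weight: Callable[[float, float], float],
) -> List[Tuple[int, int]]:
    """Return (start, end) tick index ranges, end exclusive, one per bar."""
    ranges = []
    start = 0
    running = 0.0
    for i, (p, v) in enumerate(zip(prices, volumes)):
        running += weight(p, v)  # what a tick contributes depends on the bar type
        if running >= threshold:  # assumption: close the bar on reaching the threshold
            ranges.append((start, i + 1))
            start = i + 1
            running = 0.0
    if start < len(prices):
        # Trailing ticks that never reach the threshold form a partial bar
        ranges.append((start, len(prices)))
    return ranges


# The price doubles after the second tick while volume per tick stays flat
prices = [10.0, 10.0, 20.0, 20.0, 20.0, 20.0]
volumes = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0]

tick_ranges = threshold_bar_ranges(prices, volumes, 2, lambda p, v: 1.0)
volume_ranges = threshold_bar_ranges(prices, volumes, 10.0, lambda p, v: v)
dollar_ranges = threshold_bar_ranges(prices, volumes, 100.0, lambda p, v: p * v)
```

In this toy series the tick and volume rules both produce three bars, while the dollar rule produces five, because each tick carries twice the dollar value once the price doubles.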

Tick Imbalance Bars (TIBs)

Close a bar when the absolute cumulative tick direction imbalance exceeds a dynamically estimated threshold (AFML Ch. 2.3.2). Each tick is classified as a buy (+1) or sell (-1) based on the sign of the price change. The cumulative imbalance theta_t = sum(b_i) is tracked, and a new bar is emitted when |theta_t| >= threshold. The threshold adapts via an exponentially weighted moving average of past bar lengths.

bars = hz.tick_imbalance_bars(timestamps, prices, volumes, initial_estimate=50.0)
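The imbalance rule can be sketched in a few lines of pure Python. This is an illustration only, not Horizon's implementation: the names `tick_signs` and `imbalance_bar_ends` are hypothetical, the threshold is held fixed instead of adapting via an EWM, and two details are assumptions (the first tick counts as a buy, and an unchanged price reuses the previous sign).

```python
from typing import List


def tick_signs(prices: List[float]) -> List[int]:
    """Classify each tick as +1 (buy) or -1 (sell) from the price change sign."""
    signs = []
    prev = 1  # assumption: the first tick is treated as a buy
    for i, p in enumerate(prices):
        if i > 0 and p > prices[i - 1]:
            prev = 1
        elif i > 0 and p < prices[i - 1]:
            prev = -1
        # assumption: an unchanged price keeps the previous sign
        signs.append(prev)
    return signs


def imbalance_bar_ends(prices: List[float], threshold: float) -> List[int]:
    """Return the tick index that closes each bar under a fixed threshold."""
    ends = []
    theta = 0
    for i, b in enumerate(tick_signs(prices)):
        theta += b  # cumulative signed imbalance within the current bar
        if abs(theta) >= threshold:
            ends.append(i)
            theta = 0
    return ends


prices = [100.0, 101.0, 102.0, 101.0, 102.0, 101.0, 100.0, 99.0, 98.0]
signs = tick_signs(prices)
ends = imbalance_bar_ends(prices, threshold=3)
```

Here the first bar closes after three consecutive upticks, and the second closes only once the later run of downticks outweighs the mixed ticks before it.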

Volume Imbalance Bars (VIBs)

Same as tick imbalance bars, but weights each tick direction by its volume: theta_t = sum(b_i * v_i).

bars = hz.volume_imbalance_bars(timestamps, prices, volumes, initial_estimate=500.0)

Tick Run Bars (TRBs)

Count buy ticks and sell ticks separately within each bar. A new bar is emitted when max(buy_run, sell_run) exceeds a dynamically estimated threshold (AFML Ch. 2.3.3).

bars = hz.tick_run_bars(timestamps, prices, volumes, initial_estimate=50.0)

Volume Run Bars (VRBs)

Same as tick run bars, but accumulates volume in buy/sell runs rather than counting ticks.

bars = hz.volume_run_bars(timestamps, prices, volumes, initial_estimate=500.0)
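To contrast run bars with imbalance bars, here is a minimal pure-Python sketch of the run rule. It is not Horizon's implementation: `run_bar_ends` is a hypothetical name, the threshold is fixed instead of dynamically estimated, and tick signs are passed in directly.

```python
from typing import List, Optional


def run_bar_ends(
    signs: List[int],
    threshold: float,
    volumes: Optional[List[float]] = None,
) -> List[int]:
    """Return the tick index that closes each bar.

    signs holds +1 (buy) or -1 (sell) per tick. When volumes is given, each
    tick contributes its volume instead of a count of 1.
    """
    ends = []
    buy_run = 0.0
    sell_run = 0.0
    for i, b in enumerate(signs):
        w = 1.0 if volumes is None else volumes[i]
        if b > 0:
            buy_run += w
        else:
            sell_run += w
        # Buys and sells accumulate separately, so they never cancel out
        if max(buy_run, sell_run) >= threshold:
            ends.append(i)
            buy_run = 0.0
            sell_run = 0.0
    return ends


signs = [1, -1, 1, -1, 1, -1, -1, -1, 1]
tick_ends = run_bar_ends(signs, threshold=3)
volume_ends = run_bar_ends(signs, threshold=30.0, volumes=[10.0] * 9)
```

The first bar closes at index 4 even though buys and sells alternate, because the buy count alone reaches 3. A signed imbalance over the same five ticks would only be +1.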

All bar functions accept the same three input arrays:

Parameter	Type	Description
`timestamps`	`list[float]`	Tick timestamps (e.g., Unix epoch seconds)
`prices`	`list[float]`	Tick prices
`volumes`	`list[float]`	Tick volumes

All three arrays must have the same length.

Bar Type

Every bar function returns list[Bar]. Each Bar object has the following fields:

Field	Type	Description
`timestamp`	`float`	Timestamp of the first tick in the bar
`open`	`float`	Opening price (first tick)
`high`	`float`	Highest price in the bar
`low`	`float`	Lowest price in the bar
`close`	`float`	Closing price (last tick)
`volume`	`float`	Total volume in the bar
`vwap`	`float`	Volume-weighted average price
`n_ticks`	`int`	Number of ticks in the bar

If the last group of ticks does not fully meet the bar threshold, a partial bar is still emitted. This ensures no tick data is silently dropped.
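As a rough illustration of how the fields above relate to the underlying ticks, the sketch below aggregates one group of ticks into a bar. `BarSketch` and `aggregate` are hypothetical stand-ins, not Horizon's `Bar` class, and the zero-volume fallback for VWAP is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BarSketch:  # hypothetical stand-in mirroring the documented fields
    timestamp: float
    open: float
    high: float
    low: float
    close: float
    volume: float
    vwap: float
    n_ticks: int


def aggregate(
    timestamps: List[float], prices: List[float], volumes: List[float]
) -> BarSketch:
    """Collapse one non-empty group of ticks into a single bar."""
    total_volume = sum(volumes)
    dollar_volume = sum(p * v for p, v in zip(prices, volumes))
    # assumption: fall back to the mean price when the group has zero volume
    vwap = dollar_volume / total_volume if total_volume > 0 else sum(prices) / len(prices)
    return BarSketch(
        timestamp=timestamps[0],  # first tick in the bar
        open=prices[0],
        high=max(prices),
        low=min(prices),
        close=prices[-1],
        volume=total_volume,
        vwap=vwap,
        n_ticks=len(prices),
    )


bar = aggregate([1.0, 2.0, 3.0], [10.0, 12.0, 11.0], [1.0, 2.0, 1.0])
```

For these three ticks the VWAP is (10·1 + 12·2 + 11·1) / 4 = 11.25, which sits above the simple mean of 11.0 because the highest-priced tick carries the most volume.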

Function Reference

Function	Threshold Parameter	Threshold Meaning
`hz.tick_bars`	`threshold: int`	Number of ticks per bar
`hz.volume_bars`	`threshold: float`	Cumulative volume per bar
`hz.dollar_bars`	`threshold: float`	Cumulative dollar volume per bar
`hz.tick_imbalance_bars`	`initial_estimate: float`	Initial expected tick imbalance threshold (adapts via EWM)
`hz.volume_imbalance_bars`	`initial_estimate: float`	Initial expected volume imbalance threshold (adapts via EWM)
`hz.tick_run_bars`	`initial_estimate: float`	Initial expected max tick run length (adapts via EWM)
`hz.volume_run_bars`	`initial_estimate: float`	Initial expected max volume run (adapts via EWM)

Example: Comparing Bar Types

import horizon as hz

# Load tick data (timestamps, prices, volumes as lists of float)
timestamps = [...]
prices = [...]
volumes = [...]

# Standard: fixed 100-tick bars
tick = hz.tick_bars(timestamps, prices, volumes, threshold=100)

# Information-driven: bars adapt to market activity
tib = hz.tick_imbalance_bars(timestamps, prices, volumes, initial_estimate=50.0)
trb = hz.tick_run_bars(timestamps, prices, volumes, initial_estimate=50.0)

print(f"Tick bars: {len(tick)}")
print(f"Tick imbalance bars: {len(tib)}")
print(f"Tick run bars: {len(trb)}")

# Inspect a bar
bar = tick[0]
print(f"O={bar.open}, H={bar.high}, L={bar.low}, C={bar.close}")
print(f"Volume={bar.volume}, VWAP={bar.vwap}, Ticks={bar.n_ticks}")

Triple Barrier Labeling

The triple barrier method (AFML Ch. 3) labels each trading event with the outcome of three competing barriers:

Profit-taking (PT): upper barrier, price rises by a volatility-scaled amount
Stop-loss (SL): lower barrier, price falls by a volatility-scaled amount
Vertical barrier (VB): maximum holding period expires

Whichever barrier is touched first determines the label: +1 (profit-taking), -1 (stop-loss), or 0 (vertical barrier / insufficient return).

Step 1: Compute Daily Volatility

import horizon as hz

# Exponentially weighted daily volatility from log returns
vol = hz.get_daily_vol(prices, span=20)

Returns a list the same length as prices. The first element is always 0.0 (no return available from a single price).

Parameter	Type	Description
`prices`	`list[float]`	Raw price series (at least 2 elements)
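`span`	`int`	EWM span in observations (e.g., 20). Span is not a half-life: under the common alpha = 2 / (span + 1) convention, span=20 corresponds to a half-life of about 7 observations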
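For intuition, an exponentially weighted volatility of log returns can be sketched as follows. This is not Horizon's code: `ewm_vol_sketch` is a hypothetical name, the `alpha = 2 / (span + 1)` weighting and the seeding of the first return are assumptions, and the real `get_daily_vol` may differ in both.

```python
import math
from typing import List


def ewm_vol_sketch(prices: List[float], span: int) -> List[float]:
    """EW standard deviation of log returns, same length as prices."""
    alpha = 2.0 / (span + 1.0)  # assumption: span-style smoothing factor
    vol = [0.0]  # no return is available for the first price
    mean = 0.0
    var = 0.0
    for i in range(1, len(prices)):
        r = math.log(prices[i] / prices[i - 1])
        if i == 1:
            mean = r  # assumption: seed the mean with the first return
            var = 0.0
        else:
            delta = r - mean
            mean += alpha * delta
            # Incremental exponentially weighted variance update
            var = (1.0 - alpha) * (var + alpha * delta * delta)
        vol.append(math.sqrt(var))
    return vol


vol = ewm_vol_sketch([100.0, 101.0, 100.0, 102.0, 99.0], span=20)
flat = ewm_vol_sketch([50.0, 50.0, 50.0], span=20)

# Half-life implied by span=20 under the same alpha convention
half_life = math.log(0.5) / math.log(1.0 - 2.0 / 21.0)
```

Under this convention a span of 20 implies a half-life of roughly 7 observations, so the two terms should not be used interchangeably.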

Step 2: CUSUM Filter for Structural Breaks

The symmetric CUSUM filter (AFML Ch. 2.5.2.1) produces a structurally meaningful subsample of the time series by detecting significant price moves and filtering out noise.

events = hz.cusum_filter(prices, threshold=0.02)
# Returns: list of indices where structural breaks occurred

Parameter	Type	Description
`prices`	`list[float]`	Raw price series
`threshold`	`float`	Trigger threshold in price units (must be positive)

The filter tracks cumulative positive and negative sums of price changes. When either sum exceeds the threshold, the index is recorded and both sums reset.
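The description above can be sketched in pure Python. This follows the text (both sums reset after a trigger) but it is only an illustration: `cusum_sketch` is a hypothetical name, and the use of a strict `>` comparison is an assumption.

```python
from typing import List


def cusum_sketch(prices: List[float], threshold: float) -> List[int]:
    """Return indices where the cumulative up or down move exceeds threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    events = []
    s_pos = 0.0
    s_neg = 0.0
    for i in range(1, len(prices)):
        diff = prices[i] - prices[i - 1]
        s_pos = max(0.0, s_pos + diff)  # cumulative upward drift, floored at 0
        s_neg = min(0.0, s_neg + diff)  # cumulative downward drift, capped at 0
        if s_pos > threshold or s_neg < -threshold:  # assumption: strict comparison
            events.append(i)
            s_pos = 0.0
            s_neg = 0.0
    return events


prices = [100.0, 100.5, 101.0, 101.5, 101.0, 100.0, 99.0, 99.5]
events = cusum_sketch(prices, threshold=1.0)
```

With a threshold of 1.0 in price units, the filter fires at index 3 after a 1.5 rise and at index 5 after a 1.5 fall; the smaller moves in between are ignored.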

Step 3: Apply Triple Barrier Labels

labels = hz.triple_barrier_labels(
    prices=prices,
    timestamps=timestamps,
    events=events,            # indices from cusum_filter
    pt_sl=[1.0, 1.0],        # [profit_taking_mult, stop_loss_mult]; 0.0 = disabled
    min_ret=0.005,            # minimum return to assign +1/-1 (below this -> 0)
    max_holding=100,          # vertical barrier in bars (0 = no vertical barrier)
    vol_span=20,              # lookback span for daily vol estimation
)

Parameter	Type	Description
`prices`	`list[float]`	Raw price series (at least 2 elements)
`timestamps`	`list[float]`	Timestamps for each price (same length as prices)
`events`	`list[int]`	Event indices from `cusum_filter` or user-provided
`pt_sl`	`[float, float]`	`[profit_taking_multiplier, stop_loss_multiplier]`; set to 0.0 to disable
`min_ret`	`float`	Minimum absolute return to assign +1/-1 (below this, label = 0)
`max_holding`	`int`	Vertical barrier in bars forward from entry (0 = no vertical barrier)
`vol_span`	`int`	Lookback span for EWM daily volatility estimation

Barrier levels are computed in price space:

Upper: entry_price * (1 + daily_vol * pt_sl[0])
Lower: entry_price * (1 - daily_vol * pt_sl[1])
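A worked sketch of the first-touch logic for a single event, using the barrier formulas above. It is illustrative only: `label_event_sketch` is a hypothetical helper, the volatility is passed in as a plain number, and `min_ret` filtering is left out for brevity.

```python
import math
from typing import List, Tuple


def label_event_sketch(
    prices: List[float],
    entry_idx: int,
    daily_vol: float,
    pt_sl: Tuple[float, float],
    max_holding: int,
) -> Tuple[int, str, int]:
    """Return (label, barrier, touch_idx) for one event."""
    entry = prices[entry_idx]
    # A multiplier of 0.0 disables the corresponding horizontal barrier
    upper = entry * (1.0 + daily_vol * pt_sl[0]) if pt_sl[0] > 0 else math.inf
    lower = entry * (1.0 - daily_vol * pt_sl[1]) if pt_sl[1] > 0 else -math.inf
    last = len(prices) - 1
    # max_holding == 0 means no vertical barrier: scan to the end of the series
    end = min(entry_idx + max_holding, last) if max_holding > 0 else last
    for i in range(entry_idx + 1, end + 1):
        if prices[i] >= upper:
            return 1, "pt", i
        if prices[i] <= lower:
            return -1, "sl", i
    return 0, "vb", end


prices = [100.0, 100.5, 101.2, 102.1, 101.0]
hit_pt = label_event_sketch(prices, 0, 0.02, (1.0, 1.0), max_holding=10)
hit_vb = label_event_sketch(prices, 0, 0.02, (1.0, 1.0), max_holding=2)
hit_sl = label_event_sketch([100.0, 99.0, 97.5], 0, 0.02, (1.0, 1.0), max_holding=10)
```

With an entry price of 100 and a daily volatility of 2%, the barriers sit at roughly 102 and 98. The first series touches the upper barrier at index 3, unless the holding period is cut to two bars, in which case the vertical barrier is hit first.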

Step 4: Meta-Labels (Bet Sizing)

Meta-labeling (AFML Ch. 3.6) determines whether a primary model’s directional signals are correct. The primary model provides the direction (+1 long, -1 short), and the meta-label indicates if acting on that signal is profitable (1) or not (0).

meta = hz.meta_labels(
    prices=prices,
    timestamps=timestamps,
    primary_signals=[(10, 1), (50, -1), (120, 1)],  # (event_idx, side)
    pt_sl=[2.0, 1.0],        # asymmetric: 2x vol for PT, 1x vol for SL
    max_holding=50,
    vol_span=20,
)

Parameter	Type	Description
`prices`	`list[float]`	Raw price series
`timestamps`	`list[float]`	Timestamps (same length as prices)
`primary_signals`	`list[(int, int)]`	List of `(event_idx, side)` where side is +1 or -1
`pt_sl`	`[float, float]`	`[profit_taking_mult, stop_loss_mult]`
`max_holding`	`int`	Vertical barrier in bars (0 = no vertical barrier)
`vol_span`	`int`	Lookback span for daily vol estimation

Meta-labeling separates the problem into two models: a primary model for direction and a secondary model for bet sizing. The secondary model learns which of the primary model’s signals are worth acting on (label=1) vs. skipping (label=0). This is often more robust than training a single model to decide both direction and size.
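The relationship between a side and its meta-label can be sketched as follows. For brevity the sketch exits after a fixed number of bars instead of applying the triple barrier logic, so it is not equivalent to `hz.meta_labels`; `meta_label_sketch` and `holding` are hypothetical names.

```python
import math
from typing import List, Tuple


def meta_label_sketch(
    prices: List[float],
    primary_signals: List[Tuple[int, int]],
    holding: int,
) -> List[int]:
    """Label each (event_idx, side) signal 1 if it made money, else 0."""
    labels = []
    for idx, side in primary_signals:
        # Simplification: fixed-horizon exit instead of PT/SL/vertical barriers
        exit_idx = min(idx + holding, len(prices) - 1)
        signed_ret = side * math.log(prices[exit_idx] / prices[idx])
        labels.append(1 if signed_ret > 0 else 0)
    return labels


prices = [100.0, 101.0, 103.0, 102.0, 99.0, 98.0]
primary_signals = [(0, 1), (2, 1), (3, -1)]  # (event_idx, side)
meta = meta_label_sketch(prices, primary_signals, holding=2)
```

The second signal is a long entered just before a decline, so it is labeled 0. The short at index 3 profits from the same decline and is labeled 1. A secondary classifier trained on such labels estimates how likely each primary signal is to pay off, and that probability can drive position size.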

Step 5: Drop Rare Labels

Remove label classes that account for less than a minimum fraction of all labels. This avoids training classifiers on classes with too few examples to learn from.

cleaned = hz.drop_labels(labels, min_pct=0.05)

Parameter	Type	Description
`labels`	`list[BarrierLabel]`	Labels from `triple_barrier_labels` or `meta_labels`
`min_pct`	`float`	Minimum fraction of total (e.g., 0.05 = 5%). Must be in [0, 1].
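The filtering idea can be sketched on plain integer labels. This is an assumption-laden illustration, not Horizon's implementation: `drop_rare_sketch` is a hypothetical name, it works on `int` values instead of `BarrierLabel` objects, and it makes a single pass over the class counts.

```python
from collections import Counter
from typing import List


def drop_rare_sketch(labels: List[int], min_pct: float) -> List[int]:
    """Keep only labels whose class share is at least min_pct."""
    if not 0.0 <= min_pct <= 1.0:
        raise ValueError("min_pct must be in [0, 1]")
    total = len(labels)
    if total == 0:
        return []
    counts = Counter(labels)
    # assumption: one pass; shares are not recomputed after a class is dropped
    keep = {cls for cls, n in counts.items() if n / total >= min_pct}
    return [x for x in labels if x in keep]


labels = [1] * 10 + [-1] * 9 + [0] * 1  # class 0 is 5% of the sample
cleaned = drop_rare_sketch(labels, min_pct=0.10)
unchanged = drop_rare_sketch(labels, min_pct=0.0)
```

With a 10% floor, the single 0 label is removed and the remaining 19 labels keep their original order.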

BarrierLabel Type

Every label function returns list[BarrierLabel]. Each object has:

Field	Type	Description
`event_idx`	`int`	Index in the original price series where the event started
`label`	`int`	-1 (stop-loss), 0 (vertical/below min_ret), +1 (profit-taking). For meta-labels: 0 or 1.
`ret`	`float`	Log return from entry to barrier touch
`barrier`	`str`	Which barrier was touched: `"pt"`, `"sl"`, or `"vb"`
`touch_idx`	`int`	Index in the price series where the barrier was touched

Full Pipeline Example

import horizon as hz

# 1. Load tick data and build information-driven bars
timestamps = [...]  # tick timestamps
prices = [...]      # tick prices
volumes = [...]     # tick volumes

bars = hz.volume_bars(timestamps, prices, volumes, threshold=10000.0)
bar_prices = [b.close for b in bars]
bar_timestamps = [b.timestamp for b in bars]

# 2. Compute daily volatility
vol = hz.get_daily_vol(bar_prices, span=20)

# 3. CUSUM filter for structural breaks
events = hz.cusum_filter(bar_prices, threshold=0.02)
print(f"CUSUM detected {len(events)} events")

# 4. Triple barrier labeling
labels = hz.triple_barrier_labels(
    prices=bar_prices,
    timestamps=bar_timestamps,
    events=events,
    pt_sl=[1.0, 1.0],      # symmetric barriers at 1x daily vol
    min_ret=0.005,          # 50bps minimum return
    max_holding=100,        # 100-bar max holding period
    vol_span=20,
)

# 5. Inspect label distribution
from collections import Counter
dist = Counter(lbl.label for lbl in labels)
print(f"Label distribution: {dict(dist)}")

# 6. Drop rare labels (< 5%)
cleaned = hz.drop_labels(labels, min_pct=0.05)
print(f"After dropping rare labels: {len(cleaned)} / {len(labels)}")

# 7. Use labels for ML training
features = []
targets = []
for label in cleaned:
    idx = label.event_idx
    # Extract features at event time (your feature engineering here)
    features.append(bar_prices[idx])
    targets.append(label.label)

The events list must contain valid indices into the prices array. Out-of-bounds indices will raise a ValueError. When chaining cusum_filter output into triple_barrier_labels, both must reference the same price series.
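A small pre-check can catch mismatched series before labeling, for example when events were computed on tick prices but the labels are requested on bar prices. The helper below is a hypothetical sketch and not part of Horizon.

```python
from typing import List


def check_events(events: List[int], n_prices: int) -> List[int]:
    """Return events unchanged if every index is valid, else raise ValueError."""
    bad = [e for e in events if not 0 <= e < n_prices]
    if bad:
        raise ValueError(f"event indices out of range for {n_prices} prices: {bad}")
    return events


bar_prices = [100.0, 100.4, 99.8, 101.2, 100.9]
ok = check_events([1, 3], len(bar_prices))

try:
    # Index 5 would be valid for a longer tick series but not for these 5 bars
    check_events([1, 5], len(bar_prices))
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Passing `len(bar_prices)` keeps the check tied to the same series that will be handed to the labeling call.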

Choosing Bar Types

When to use tick bars

Tick bars are the simplest non-time-based alternative. Each bar contains a fixed number of trades, so bars arrive faster during active periods and slower during quiet periods. Good baseline for comparison.

When to use volume/dollar bars

Volume bars normalize by trading volume, and dollar bars normalize by dollar volume. Dollar bars are preferred when price varies significantly over the sample period, as they keep the economic significance of each bar roughly constant.

When to use imbalance bars

Imbalance bars detect asymmetry in order flow. They produce more bars when one side (buy or sell) dominates, which often coincides with informed trading activity. Use these when order flow toxicity matters for your strategy.

When to use run bars

Run bars detect sustained sequences of same-direction ticks. They are sensitive to persistent buying or selling pressure and produce more bars when the market trends strongly in one direction.
