Pro Feature. Requires a Pro or Ultra subscription. Get started at api.mathematicalcompany.com

Bars & Labeling

Horizon implements the information-driven bar sampling and triple barrier labeling pipeline from Marcos Lopez de Prado’s Advances in Financial Machine Learning (AFML, Chapters 2-3). All functions are implemented in Rust for high throughput on tick-level data.

7 Bar Types

Tick, volume, dollar, tick/volume imbalance, and tick/volume run bars. All Rust-native.

Triple Barrier

Profit-taking, stop-loss, and vertical barriers with volatility-scaled thresholds.

CUSUM Filter

Symmetric CUSUM for event-driven sampling of structural breaks.

Meta-Labels

Binary 0/1 labels for bet sizing on top of a primary directional model.

Information-Driven Bars

Standard time bars (1-minute, 5-minute) sample at fixed clock intervals, which over-samples quiet periods and under-samples volatile ones. Information-driven bars instead sample based on market activity, producing bars that carry roughly equal information content. Horizon provides three standard bar types and four information-driven bar types:

Tick Bars

Sample a new bar every threshold ticks.
import horizon as hz

bars = hz.tick_bars(timestamps, prices, volumes, threshold=100)

Volume Bars

Sample a new bar when cumulative volume exceeds threshold.
bars = hz.volume_bars(timestamps, prices, volumes, threshold=1000.0)

Dollar Bars

Sample a new bar when cumulative dollar volume (price × volume) exceeds threshold.
bars = hz.dollar_bars(timestamps, prices, volumes, threshold=50000.0)
All bar functions accept the same three input arrays:
| Parameter | Type | Description |
| --- | --- | --- |
| timestamps | list[float] | Tick timestamps (e.g., Unix epoch seconds) |
| prices | list[float] | Tick prices |
| volumes | list[float] | Tick volumes |
All three arrays must have the same length.

Bar Type

Every bar function returns list[Bar]. Each Bar object has the following fields:
| Field | Type | Description |
| --- | --- | --- |
| timestamp | float | Timestamp of the first tick in the bar |
| open | float | Opening price (first tick) |
| high | float | Highest price in the bar |
| low | float | Lowest price in the bar |
| close | float | Closing price (last tick) |
| volume | float | Total volume in the bar |
| vwap | float | Volume-weighted average price |
| n_ticks | int | Number of ticks in the bar |
If the last group of ticks does not fully meet the bar threshold, a partial bar is still emitted. This ensures no tick data is silently dropped.
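To make the bar fields and the partial-bar rule concrete, here is a minimal pure-Python sketch of volume-bar aggregation. It is an illustration only, not Horizon's Rust implementation: `SketchBar` and `sketch_volume_bars` are hypothetical names, and closing a bar when cumulative volume reaches the threshold (`>=`) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SketchBar:
    """Hypothetical stand-in that mirrors the documented Bar fields."""
    timestamp: float
    open: float
    high: float
    low: float
    close: float
    volume: float
    vwap: float
    n_ticks: int


def sketch_volume_bars(timestamps, prices, volumes, threshold):
    """Group ticks into bars; a trailing partial bar is still emitted."""
    if not (len(timestamps) == len(prices) == len(volumes)):
        raise ValueError("timestamps, prices, and volumes must have the same length")
    bars = []
    start = 0        # index of the first tick in the bar being built
    cum_vol = 0.0    # volume accumulated in the bar being built
    for i, v in enumerate(volumes):
        cum_vol += v
        is_last_tick = i == len(volumes) - 1
        # Assumption: the bar closes once cumulative volume reaches the threshold.
        if cum_vol >= threshold or is_last_tick:
            p = prices[start:i + 1]
            w = volumes[start:i + 1]
            total = sum(w)
            # Fall back to the last price if the bar carries no volume.
            vwap = sum(pi * wi for pi, wi in zip(p, w)) / total if total > 0 else p[-1]
            bars.append(SketchBar(timestamps[start], p[0], max(p), min(p),
                                  p[-1], total, vwap, len(p)))
            start, cum_vol = i + 1, 0.0
    return bars
```

With five ticks of volume 5, 5, 5, 5, 3 and a threshold of 10, this sketch produces two full bars followed by a one-tick partial bar.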

Function Reference

| Function | Threshold Parameter | Threshold Meaning |
| --- | --- | --- |
| hz.tick_bars | threshold: int | Number of ticks per bar |
| hz.volume_bars | threshold: float | Cumulative volume per bar |
| hz.dollar_bars | threshold: float | Cumulative dollar volume per bar |
| hz.tick_imbalance_bars | initial_estimate: float | Initial expected tick imbalance threshold (adapts via EWM) |
| hz.volume_imbalance_bars | initial_estimate: float | Initial expected volume imbalance threshold (adapts via EWM) |
| hz.tick_run_bars | initial_estimate: float | Initial expected max tick run length (adapts via EWM) |
| hz.volume_run_bars | initial_estimate: float | Initial expected max volume run (adapts via EWM) |
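The imbalance bars listed above start from `initial_estimate` and then adapt their threshold through an exponentially weighted mean. The sketch below illustrates that idea for tick imbalance bars using the tick rule from AFML Chapter 2. It is a simplification, not Horizon's Rust implementation: the function names, the `alpha` smoothing factor, and the exact update rule are all assumptions made for illustration.

```python
def tick_rule(prices):
    """Sign each tick +1 (uptick) or -1 (downtick); unchanged prices inherit the previous sign."""
    signs = []
    prev = 1  # assumption: the first tick is treated as a buy
    for i, p in enumerate(prices):
        if i > 0:
            if p > prices[i - 1]:
                prev = 1
            elif p < prices[i - 1]:
                prev = -1
        signs.append(prev)
    return signs


def sketch_tick_imbalance_bar_ends(prices, initial_estimate, alpha=0.1):
    """Return the index of the last tick of each completed imbalance bar."""
    threshold = float(initial_estimate)
    theta = 0  # running signed tick imbalance within the current bar
    ends = []
    for i, b in enumerate(tick_rule(prices)):
        theta += b
        if abs(theta) >= threshold:
            ends.append(i)
            # Hypothetical EWM update: move the threshold toward the observed imbalance.
            threshold += alpha * (abs(theta) - threshold)
            theta = 0
    return ends
```

A one-sided sequence of ticks reaches the threshold quickly and closes bars often, while a balanced sequence keeps the running imbalance near zero and closes none.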

Example: Comparing Bar Types

import horizon as hz

# Load tick data (timestamps, prices, volumes as lists of float)
timestamps = [...]
prices = [...]
volumes = [...]

# Standard: fixed 100-tick bars
tick = hz.tick_bars(timestamps, prices, volumes, threshold=100)

# Information-driven: bars adapt to market activity
tib = hz.tick_imbalance_bars(timestamps, prices, volumes, initial_estimate=50.0)
trb = hz.tick_run_bars(timestamps, prices, volumes, initial_estimate=50.0)

print(f"Tick bars: {len(tick)}")
print(f"Tick imbalance bars: {len(tib)}")
print(f"Tick run bars: {len(trb)}")

# Inspect a bar
bar = tick[0]
print(f"O={bar.open}, H={bar.high}, L={bar.low}, C={bar.close}")
print(f"Volume={bar.volume}, VWAP={bar.vwap}, Ticks={bar.n_ticks}")

Triple Barrier Labeling

The triple barrier method (AFML Ch. 3) labels each trading event with the outcome of three competing barriers:
  1. Profit-taking (PT): upper barrier, price rises by a volatility-scaled amount
  2. Stop-loss (SL): lower barrier, price falls by a volatility-scaled amount
  3. Vertical barrier (VB): maximum holding period expires
Whichever barrier is touched first determines the label: +1 (profit-taking), -1 (stop-loss), or 0 (vertical barrier / insufficient return).

Step 1: Compute Daily Volatility

import horizon as hz

# Exponentially weighted daily volatility from log returns
vol = hz.get_daily_vol(prices, span=20)
Returns a list the same length as prices. The first element is always 0.0 (no return available from a single price).
| Parameter | Type | Description |
| --- | --- | --- |
| prices | list[float] | Raw price series (at least 2 elements) |
| span | int | EWM span (e.g., 20 for a roughly 20-observation effective window; a span is not a half-life) |
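For intuition, an exponentially weighted volatility of log returns can be sketched in pure Python as below. This is one plausible reading of the description, not Horizon's actual code: `sketch_ewm_vol` is a hypothetical name, and the smoothing factor `alpha = 2 / (span + 1)` and the zero initial variance are assumptions.

```python
import math


def sketch_ewm_vol(prices, span):
    """Exponentially weighted standard deviation of log returns, aligned with prices."""
    if len(prices) < 2:
        raise ValueError("need at least 2 prices")
    alpha = 2.0 / (span + 1.0)  # assumption: the common span-to-alpha convention
    vol = [0.0]                 # no return is available for the first price
    mean, var = 0.0, 0.0
    for i in range(1, len(prices)):
        r = math.log(prices[i] / prices[i - 1])
        if i == 1:
            # Seed the running mean with the first return; variance starts at zero.
            mean, var = r, 0.0
        else:
            # Standard incremental update of an exponentially weighted mean and variance.
            delta = r - mean
            mean += alpha * delta
            var = (1.0 - alpha) * (var + alpha * delta * delta)
        vol.append(math.sqrt(var))
    return vol
```

Under this convention a span of 20 gives `alpha` of about 0.095, which corresponds to a half-life of roughly 7 observations rather than 20.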

Step 2: CUSUM Filter for Structural Breaks

The symmetric CUSUM filter (AFML Section 2.5.2.1) selects a structurally meaningful subsample of the time series: it records an event only when price moves accumulate past a threshold, and ignores smaller fluctuations as noise.
events = hz.cusum_filter(prices, threshold=0.02)
# Returns: list of indices where structural breaks occurred
| Parameter | Type | Description |
| --- | --- | --- |
| prices | list[float] | Raw price series |
| threshold | float | Trigger threshold in price units (must be positive) |
The filter tracks cumulative positive and negative sums of price changes. When either sum exceeds the threshold, the index is recorded and both sums reset.
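The behaviour described above can be sketched in a few lines of pure Python. `sketch_cusum_filter` is a hypothetical name used for illustration; the sketch follows the description on this page (both sums reset after an event) and is not Horizon's Rust implementation.

```python
def sketch_cusum_filter(prices, threshold):
    """Return indices where the cumulative up-move or down-move exceeds threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    events = []
    s_pos, s_neg = 0.0, 0.0
    for i in range(1, len(prices)):
        diff = prices[i] - prices[i - 1]
        # The positive sum is floored at zero and the negative sum is capped at zero,
        # so each side only accumulates moves in its own direction.
        s_pos = max(0.0, s_pos + diff)
        s_neg = min(0.0, s_neg + diff)
        if s_pos > threshold or s_neg < -threshold:
            events.append(i)
            s_pos, s_neg = 0.0, 0.0  # reset both sums, as described above
    return events


events = sketch_cusum_filter([100.0, 101.0, 102.0, 103.0, 100.0, 99.0], threshold=2.5)
print(events)  # [3, 4]
```

In the example, three consecutive upticks of 1.0 push the positive sum past 2.5 at index 3, and the following drop of 3.0 triggers the negative side at index 4.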

Step 3: Apply Triple Barrier Labels

labels = hz.triple_barrier_labels(
    prices=prices,
    timestamps=timestamps,
    events=events,            # indices from cusum_filter
    pt_sl=[1.0, 1.0],        # [profit_taking_mult, stop_loss_mult]; 0.0 = disabled
    min_ret=0.005,            # minimum return to assign +1/-1 (below this -> 0)
    max_holding=100,          # vertical barrier in bars (0 = no vertical barrier)
    vol_span=20,              # lookback span for daily vol estimation
)
| Parameter | Type | Description |
| --- | --- | --- |
| prices | list[float] | Raw price series (at least 2 elements) |
| timestamps | list[float] | Timestamps for each price (same length as prices) |
| events | list[int] | Event indices from cusum_filter or user-provided |
| pt_sl | [float, float] | [profit_taking_multiplier, stop_loss_multiplier]; set either multiplier to 0.0 to disable that barrier |
| min_ret | float | Minimum absolute return to assign +1/-1 (below this, label = 0) |
| max_holding | int | Vertical barrier in bars forward from entry (0 = no vertical barrier) |
| vol_span | int | Lookback span for EWM daily volatility estimation |
Barrier levels are computed in price space:
  • Upper: entry_price * (1 + daily_vol * pt_sl[0])
  • Lower: entry_price * (1 - daily_vol * pt_sl[1])
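To show how the barrier formulas and the first-touch rule fit together, here is a minimal pure-Python sketch that labels a single event. It is a sketch under stated assumptions, not Horizon's implementation: `sketch_label_event` is a hypothetical name, `daily_vol` is passed in as the volatility estimate at entry, and a touch is assumed to occur when the price reaches a barrier (`>=` or `<=`).

```python
import math


def sketch_label_event(prices, entry_idx, daily_vol, pt_sl, min_ret, max_holding):
    """Return (label, ret, barrier, touch_idx) for one event."""
    entry = prices[entry_idx]
    # A multiplier of 0.0 disables the corresponding horizontal barrier.
    upper = entry * (1 + daily_vol * pt_sl[0]) if pt_sl[0] > 0 else math.inf
    lower = entry * (1 - daily_vol * pt_sl[1]) if pt_sl[1] > 0 else -math.inf
    last = len(prices) - 1
    # max_holding == 0 means no vertical barrier, so scan to the end of the series.
    end = min(entry_idx + max_holding, last) if max_holding > 0 else last
    barrier, touch_idx = "vb", end
    for i in range(entry_idx + 1, end + 1):
        if prices[i] >= upper:
            barrier, touch_idx = "pt", i
            break
        if prices[i] <= lower:
            barrier, touch_idx = "sl", i
            break
    ret = math.log(prices[touch_idx] / entry)  # log return from entry to touch
    if barrier == "vb" or abs(ret) < min_ret:
        label = 0
    else:
        label = 1 if barrier == "pt" else -1
    return label, ret, barrier, touch_idx
```

For an entry at 100 with `daily_vol=0.02` and `pt_sl=[1.0, 1.0]`, the barriers sit near 102 and 98, so a later price of 103 yields a `"pt"` touch and a label of +1.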

Step 4: Meta-Labels (Bet Sizing)

Meta-labeling (AFML Section 3.6) determines whether a primary model’s directional signals are correct. The primary model provides the direction (+1 long, -1 short), and the meta-label indicates whether acting on that signal was profitable (1) or not (0).
meta = hz.meta_labels(
    prices=prices,
    timestamps=timestamps,
    primary_signals=[(10, 1), (50, -1), (120, 1)],  # (event_idx, side)
    pt_sl=[2.0, 1.0],        # asymmetric: 2x vol for PT, 1x vol for SL
    max_holding=50,
    vol_span=20,
)
| Parameter | Type | Description |
| --- | --- | --- |
| prices | list[float] | Raw price series |
| timestamps | list[float] | Timestamps (same length as prices) |
| primary_signals | list[(int, int)] | List of (event_idx, side) where side is +1 or -1 |
| pt_sl | [float, float] | [profit_taking_mult, stop_loss_mult] |
| max_holding | int | Vertical barrier in bars (0 = no vertical barrier) |
| vol_span | int | Lookback span for daily vol estimation |
Meta-labeling separates the problem into two models: a primary model for direction and a secondary model for bet sizing. The secondary model learns which of the primary model’s signals are worth acting on (label=1) vs. skipping (label=0). This tends to be more robust than training a single model to predict both direction and size.
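The side-aware logic behind a meta-label can be sketched as follows. This is an illustration under assumptions, not Horizon's implementation: `sketch_meta_label` is a hypothetical name, `daily_vol` is supplied directly, and barriers are applied to the side-adjusted return so that a short position takes profit when the price falls.

```python
import math


def sketch_meta_label(prices, entry_idx, side, daily_vol, pt_sl, max_holding):
    """Return (meta_label, touch_idx): 1 if following the signal was profitable, else 0."""
    entry = prices[entry_idx]
    last = len(prices) - 1
    end = min(entry_idx + max_holding, last) if max_holding > 0 else last
    touch_idx = end  # default: the vertical barrier (or end of data)
    for i in range(entry_idx + 1, end + 1):
        # Side-adjusted simple return: positive when the trade is in profit.
        pnl = side * (prices[i] / entry - 1.0)
        hit_pt = pt_sl[0] > 0 and pnl >= daily_vol * pt_sl[0]
        hit_sl = pt_sl[1] > 0 and pnl <= -daily_vol * pt_sl[1]
        if hit_pt or hit_sl:
            touch_idx = i
            break
    side_ret = side * math.log(prices[touch_idx] / entry)
    return (1 if side_ret > 0 else 0), touch_idx
```

On a falling price path, a short signal (side -1) earns a meta-label of 1 while a long signal (side +1) on the same path earns 0, which is the information the secondary model is trained on.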

Step 5: Drop Rare Labels

Remove label classes whose share of all labels falls below a minimum fraction. This avoids training classifiers on classes that have too few examples to learn from.
cleaned = hz.drop_labels(labels, min_pct=0.05)
| Parameter | Type | Description |
| --- | --- | --- |
| labels | list[BarrierLabel] | Labels from triple_barrier_labels or meta_labels |
| min_pct | float | Minimum fraction of total (e.g., 0.05 = 5%). Must be in [0, 1]. |
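A one-pass version of this filter can be sketched in pure Python as below. `sketch_drop_labels` and its `key` argument are hypothetical names, and the single-pass behaviour is an assumption: AFML's reference snippet removes the rarest class iteratively, and Horizon's exact rule is not documented here.

```python
from collections import Counter


def sketch_drop_labels(labels, min_pct, key=lambda item: item):
    """Keep only items whose class makes up at least min_pct of all labels."""
    if not 0.0 <= min_pct <= 1.0:
        raise ValueError("min_pct must be in [0, 1]")
    if not labels:
        return []
    counts = Counter(key(item) for item in labels)
    total = len(labels)
    keep = {cls for cls, n in counts.items() if n / total >= min_pct}
    return [item for item in labels if key(item) in keep]


raw = [1] * 60 + [-1] * 38 + [0] * 2
cleaned = sketch_drop_labels(raw, min_pct=0.05)
print(len(cleaned))  # 98
```

For objects shaped like BarrierLabel, the class can be read with `key=lambda item: item.label`.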

BarrierLabel Type

Every label function returns list[BarrierLabel]. Each object has:
| Field | Type | Description |
| --- | --- | --- |
| event_idx | int | Index in the original price series where the event started |
| label | int | -1 (stop-loss), 0 (vertical/below min_ret), +1 (profit-taking). For meta-labels: 0 or 1. |
| ret | float | Log return from entry to barrier touch |
| barrier | str | Which barrier was touched: "pt", "sl", or "vb" |
| touch_idx | int | Index in the price series where the barrier was touched |

Full Pipeline Example

import horizon as hz

# 1. Load tick data and build information-driven bars
timestamps = [...]  # tick timestamps
prices = [...]      # tick prices
volumes = [...]     # tick volumes

bars = hz.volume_bars(timestamps, prices, volumes, threshold=10000.0)
bar_prices = [b.close for b in bars]
bar_timestamps = [b.timestamp for b in bars]

# 2. Compute daily volatility
vol = hz.get_daily_vol(bar_prices, span=20)

# 3. CUSUM filter for structural breaks
events = hz.cusum_filter(bar_prices, threshold=0.02)
print(f"CUSUM detected {len(events)} events")

# 4. Triple barrier labeling
labels = hz.triple_barrier_labels(
    prices=bar_prices,
    timestamps=bar_timestamps,
    events=events,
    pt_sl=[1.0, 1.0],      # symmetric barriers at 1x daily vol
    min_ret=0.005,          # 50bps minimum return
    max_holding=100,        # 100-bar max holding period
    vol_span=20,
)

# 5. Inspect label distribution
from collections import Counter
dist = Counter(l.label for l in labels)
print(f"Label distribution: {dict(dist)}")

# 6. Drop rare labels (< 5%)
cleaned = hz.drop_labels(labels, min_pct=0.05)
print(f"After dropping rare labels: {len(cleaned)} / {len(labels)}")

# 7. Use labels for ML training
features = []
targets = []
for label in cleaned:
    idx = label.event_idx
    # Extract features at event time (your feature engineering here)
    features.append(bar_prices[idx])
    targets.append(label.label)
The events list must contain valid indices into the prices array. Out-of-bounds indices will raise a ValueError. When chaining cusum_filter output into triple_barrier_labels, both must reference the same price series.

Choosing Bar Types

Tick bars are the simplest non-time-based alternative. Each bar contains a fixed number of trades, so bars arrive faster during active periods and slower during quiet periods. They make a good baseline for comparison.
Volume bars normalize by trading volume, and dollar bars normalize by dollar volume. Dollar bars are preferred when price varies significantly over the sample period, as they keep the economic significance of each bar roughly constant.
Imbalance bars detect asymmetry in order flow. They produce more bars when one side (buy or sell) dominates, which often coincides with informed trading activity. Use these when order flow toxicity matters for your strategy.
Run bars detect sustained sequences of same-direction ticks. They are sensitive to persistent buying or selling pressure and produce more bars when the market trends strongly in one direction.
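The case for dollar bars when price drifts can be checked with a little arithmetic. The sketch below counts how many equal-sized trades are needed to close one volume bar and one dollar bar at two price levels. `trades_per_bar` and all of the numbers are hypothetical, chosen only to illustrate the point.

```python
import math


def trades_per_bar(price, trade_size, volume_threshold, dollar_threshold):
    """Count equal-sized trades needed to close one volume bar and one dollar bar."""
    volume_trades = math.ceil(volume_threshold / trade_size)
    dollar_trades = math.ceil(dollar_threshold / (price * trade_size))
    return volume_trades, dollar_trades


# Same thresholds, same trade size, two price levels.
print(trades_per_bar(price=10, trade_size=100, volume_threshold=1000, dollar_threshold=50000))  # (10, 50)
print(trades_per_bar(price=50, trade_size=100, volume_threshold=1000, dollar_threshold=50000))  # (10, 10)
```

When the price rises from 10 to 50, a volume bar still closes after 10 trades even though it now represents five times the dollar value, while a dollar bar keeps representing 50,000 of traded value and simply closes after fewer trades.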