Pro Feature. Requires a Pro or Ultra subscription. Get started at api.mathematicalcompany.com

Fractional Differentiation

Horizon implements fractional differentiation from Chapter 5 of Marcos Lopez de Prado’s Advances in Financial Machine Learning. All functions run in Rust for maximum performance and are exposed to Python via PyO3.

FFD (Recommended)

Fixed-width window fractional differentiation. Constant lag count per output point. Suitable for modeling.

Expanding Window

Full-memory fractional differentiation. Preserves all history but uses variable lag counts.

ADF Test

Simplified Augmented Dickey-Fuller statistic for stationarity verification.

Minimum d Search

Automatically find the smallest differentiation order that achieves stationarity.

Why Fractional Differentiation?

Integer differencing is the standard tool for making time series stationary:
  • d = 0 (no differencing): preserves all memory but the series is non-stationary
  • d = 1 (first difference): achieves stationarity but destroys long-range memory
The problem is that d = 1 throws away information. In financial time series, memory (autocorrelation structure) is precisely what carries predictive signal. Fractional differentiation with d between 0 and 1 offers a middle ground: make the series stationary while preserving as much memory as possible.
The key insight from AFML Ch. 5: there exists a minimum d* (typically 0.2 to 0.6 for financial prices) that makes the series just barely stationary. Using d* instead of d = 1 preserves substantially more predictive signal for downstream ML models.

API

hz.frac_diff_weights

Compute the fractional differentiation weights for order d. These weights follow the recursion w_k = -w_(k-1) * (d - k + 1) / k, starting with w_0 = 1. Generation stops when |w_k| < threshold.
import horizon as hz

weights = hz.frac_diff_weights(d=0.5, threshold=1e-5)
print(f"Number of weights: {len(weights)}")
print(f"First 5: {weights[:5]}")
# w_0=1.0, w_1=-0.5, w_2=-0.125, ...
Parameter   Type    Default    Description
d           float   required   Differentiation order (typically 0 to 1)
threshold   float   1e-5       Minimum absolute weight to include
Returns list[float] of weights.
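For intuition, the documented recursion can be sketched in pure Python. The function name frac_diff_weights_sketch is a hypothetical stand-in for illustration, not the library's Rust implementation:

```python
def frac_diff_weights_sketch(d, threshold=1e-5):
    # w_0 = 1; w_k = -w_{k-1} * (d - k + 1) / k; stop when |w_k| < threshold
    weights = [1.0]
    k = 1
    while True:
        w = -weights[-1] * (d - k + 1) / k
        if abs(w) < threshold:
            break
        weights.append(w)
        k += 1
    return weights

print(frac_diff_weights_sketch(0.5, threshold=1e-3)[:4])
# [1.0, -0.5, -0.125, -0.0625]
```

Note that for integer d = 1 the recursion terminates immediately after [1.0, -1.0], recovering the standard first difference.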

hz.frac_diff_ffd

Fixed-Width Window Fractional Differentiation (FFD): the recommended method from AFML Ch. 5.4. Computes weights via frac_diff_weights(d, threshold) and applies them as a convolution over the series. Every output point uses the same number of lags, making the resulting series suitable for modeling.
stationary = hz.frac_diff_ffd(prices, d=0.5, threshold=1e-5)
Parameter   Type          Default    Description
series      list[float]   required   Input price (or log-price) series
d           float         required   Differentiation order (non-negative)
threshold   float         1e-5       Weight truncation threshold (positive)
Returns list[float] of length len(series) - len(weights) + 1. The output is shorter than the input because the first entries lack enough history for the full weight window.
If the series is too short relative to the number of weights generated by d and threshold, a ValueError is raised. Lower the threshold or provide a longer series.
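A minimal pure-Python sketch of the FFD computation, for readers who want to see the mechanics. The orientation (w_0 paired with the newest observation in each window) follows the standard AFML formulation and is assumed, not taken from the library source:

```python
def frac_diff_ffd_sketch(series, d, threshold=1e-5):
    # Weight recursion from the docs: w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k
    weights = [1.0]
    k = 1
    while True:
        w = -weights[-1] * (d - k + 1) / k
        if abs(w) < threshold:
            break
        weights.append(w)
        k += 1
    width = len(weights)
    if len(series) < width:
        raise ValueError("series shorter than the weight window; lower the threshold")
    # Fixed-width dot product: every output point uses exactly `width` lags
    return [
        sum(w * series[t - j] for j, w in enumerate(weights))
        for t in range(width - 1, len(series))
    ]

# d = 1 reduces to the ordinary first difference
print(frac_diff_ffd_sketch([1.0, 2.0, 4.0, 7.0], d=1.0))
# [1.0, 2.0, 3.0]
```

The output length is len(series) - width + 1, matching the behavior documented above.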

hz.frac_diff_expanding

Expanding-window (full-memory) fractional differentiation. At each point t, uses all weights from lag 0 to lag t. This preserves the full information content of the original series but produces a non-stationary weight structure.
stationary = hz.frac_diff_expanding(prices, d=0.5)
Parameter   Type          Description
series      list[float]   Input price series (non-empty)
d           float         Differentiation order (non-negative)
Returns list[float] of the same length as the input.
Expanding window is O(n^2) vs O(n * w_len) for FFD. Use FFD for production and expanding window for analysis where you need full-length output.
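The expanding-window computation can be sketched as follows (a hypothetical pure-Python stand-in; the O(n^2) cost is visible in the nested loops):

```python
def frac_diff_expanding_sketch(series, d):
    n = len(series)
    # Precompute weights out to lag n - 1 (full memory)
    weights = [1.0]
    for k in range(1, n):
        weights.append(-weights[-1] * (d - k + 1) / k)
    # Point t uses weights w_0..w_t against observations y_t..y_0
    return [
        sum(weights[j] * series[t - j] for j in range(t + 1))
        for t in range(n)
    ]

print(frac_diff_expanding_sketch([1.0, 3.0, 6.0], d=1.0))
# [1.0, 2.0, 3.0]
```

Note that the first output point uses only w_0, so out[0] equals series[0]; this is why the output keeps the full input length but early points carry less differencing history.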

hz.adf_statistic

Simplified Augmented Dickey-Fuller test statistic (no augmenting lags). Fits the regression delta_y[t] = alpha + beta * y[t-1] + epsilon[t] and returns ADF stat = beta / SE(beta). More negative values indicate stronger stationarity evidence.
t_stat = hz.adf_statistic(prices)
print(f"ADF statistic: {t_stat:.4f}")

# Approximate critical values (n > 100):
#   1%:  -3.43
#   5%:  -2.862
#   10%: -2.567
if t_stat < -2.862:
    print("Stationary at 5% significance level")
Parameter   Type          Description
series      list[float]   Input series (at least 3 observations, all finite)
Returns float: the ADF test statistic.
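The regression behind the statistic is a one-variable OLS fit, sketched below in pure Python. The degrees-of-freedom convention (n - 2, for the two estimated parameters) is an assumption about the residual-variance estimator, not confirmed from the Rust source:

```python
def adf_statistic_sketch(series):
    # Fit delta_y[t] = alpha + beta * y[t-1] + eps[t] by OLS,
    # then return the t-statistic beta / SE(beta)
    y_lag = series[:-1]
    dy = [series[t] - series[t - 1] for t in range(1, len(series))]
    n = len(dy)
    mx = sum(y_lag) / n
    my = sum(dy) / n
    sxx = sum((x - mx) ** 2 for x in y_lag)
    beta = sum((x - mx) * (v - my) for x, v in zip(y_lag, dy)) / sxx
    alpha = my - beta * mx
    resid = [v - alpha - beta * x for x, v in zip(y_lag, dy)]
    # Residual variance with 2 estimated parameters (df convention assumed)
    sigma2 = sum(r * r for r in resid) / (n - 2)
    se_beta = (sigma2 / sxx) ** 0.5
    return beta / se_beta
```

For a strongly mean-reverting series beta is negative, so the statistic is negative; for a random walk beta is near zero and the statistic hovers around zero.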

hz.min_frac_diff

Find the minimum differentiation order d that makes the series stationary (AFML Ch. 5.5). Searches d from 0 to max_d in n_steps equal increments. For each d, applies frac_diff_ffd, then computes the ADF test statistic. Returns the smallest d whose ADF stat is below the 5% critical value (-2.862).
d_star, scan_results = hz.min_frac_diff(
    prices,
    p_threshold=0.05,       # reserved for future use
    max_d=1.0,              # upper bound on d
    n_steps=20,             # grid resolution
    weight_threshold=1e-5,  # FFD weight threshold
)
print(f"Minimum d for stationarity: {d_star:.3f}")

# scan_results is a list of (d, adf_stat) tuples
for d, adf in scan_results:
    marker = " <-- d*" if d == d_star else ""
    print(f"  d={d:.2f}  ADF={adf:.4f}{marker}")
Parameter          Type          Default    Description
series             list[float]   required   Price series (at least 10 observations)
p_threshold        float         0.05       Reserved for future p-value based stopping
max_d              float         1.0        Upper bound on d search range
n_steps            int           20         Number of grid points between 0 and max_d
weight_threshold   float         1e-5       Threshold for FFD weight truncation
Returns (float, list[(float, float)]): the optimal d and a list of (d, ADF statistic) scan results. If no d in the range achieves stationarity, optimal_d is set to max_d.
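The search logic reduces to a small grid scan. The sketch below takes the FFD and ADF routines as injected callables so it stands alone; with the real library you would pass hz.frac_diff_ffd and hz.adf_statistic. The grid convention (inclusive endpoints, n_steps equal increments) is an assumption consistent with the description above:

```python
def min_frac_diff_sketch(series, ffd, adf, max_d=1.0, n_steps=20, crit=-2.862):
    # Scan d over an inclusive grid from 0 to max_d; return the first d
    # whose ADF statistic passes the 5% critical value, else max_d.
    scan = []
    for i in range(n_steps + 1):
        d = max_d * i / n_steps
        stat = adf(ffd(series, d))
        scan.append((d, stat))
        if stat < crit:
            return d, scan
    return max_d, scan

# With the real library (not runnable here):
# d_star, scan = min_frac_diff_sketch(prices, hz.frac_diff_ffd, hz.adf_statistic)
```

Because the scan stops at the first passing d, the returned scan list covers only the grid points actually visited.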

Workflow

The typical workflow for fractional differentiation:
import horizon as hz

# 1. Raw price series
prices = [...]  # e.g., daily close prices

# 2. Find minimum d for stationarity
d_star, scan = hz.min_frac_diff(
    prices,
    max_d=1.0,
    n_steps=20,
    weight_threshold=1e-5,
)
print(f"Optimal d: {d_star:.3f}")

# 3. Apply FFD with d*
stationary = hz.frac_diff_ffd(prices, d=d_star, threshold=1e-5)

# 4. Verify stationarity
adf = hz.adf_statistic(stationary)
print(f"ADF statistic: {adf:.4f}")
assert adf < -2.862, "Series not stationary at 5% level"

# 5. Use as ML feature
# The stationary series is shorter by len(weights) - 1
# Align with your target labels accordingly
print(f"Original length: {len(prices)}")
print(f"Stationary length: {len(stationary)}")
weights = hz.frac_diff_weights(d=d_star, threshold=1e-5)
print(f"Lost {len(weights) - 1} points to warm-up")

Comparing d Values

import horizon as hz

prices = [...]  # raw prices

# Scan across d values
for d in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]:
    if d == 0.0:
        series = prices
    else:
        series = hz.frac_diff_ffd(prices, d=d, threshold=1e-4)

    if len(series) >= 3:
        adf = hz.adf_statistic(series)
        print(f"d={d:.1f}  length={len(series):5d}  ADF={adf:8.4f}")

Using with Information-Driven Bars

Combine fractional differentiation with information-driven bars for a complete AFML pipeline:
import horizon as hz

# 1. Build dollar bars from tick data
bars = hz.dollar_bars(timestamps, prices, volumes, threshold=50000.0)
bar_prices = [b.close for b in bars]

# 2. Fractionally differentiate the bar prices
d_star, _ = hz.min_frac_diff(bar_prices, max_d=1.0, n_steps=20)
stationary = hz.frac_diff_ffd(bar_prices, d=d_star)

# 3. Use stationary series for CUSUM event detection
events = hz.cusum_filter(stationary, threshold=0.02)

# 4. Label events using the original bar prices
# (align indices: stationary series is offset by len(weights) - 1)
weights = hz.frac_diff_weights(d=d_star, threshold=1e-5)
offset = len(weights) - 1
aligned_events = [e + offset for e in events if e + offset < len(bar_prices)]

labels = hz.triple_barrier_labels(
    prices=bar_prices,
    timestamps=[b.timestamp for b in bars],
    events=aligned_events,
    pt_sl=[1.0, 1.0],
    min_ret=0.005,
    max_holding=50,
    vol_span=20,
)

Mathematical Background

The fractional differentiation operator of order d is defined by the binomial series:

(1 - B)^d = sum(w_k * B^k, k=0..inf)

where B is the backshift operator and the weights follow:
  • w_0 = 1
  • w_k = -w_(k-1) * (d - k + 1) / k
For integer d = 1, this gives w = [1, -1, 0, 0, …] (standard first difference). For d = 0.5, the weights decay slowly: [1, -0.5, -0.125, -0.0625, …], preserving long-range memory.
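Equivalently, the weights are signed binomial coefficients; in LaTeX notation:

```latex
(1 - B)^d = \sum_{k=0}^{\infty} \binom{d}{k} (-B)^k
          = \sum_{k=0}^{\infty} (-1)^k \binom{d}{k} B^k,
\qquad
w_k = (-1)^k \binom{d}{k}
```

The recursion above is just the coefficient ratio \binom{d}{k} / \binom{d}{k-1} = (d - k + 1) / k with the alternating sign folded in.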
The expanding window method applies all weights from lag 0 to lag t at each point t. This preserves the full information content but means early and late points use different numbers of lags, making the series non-stationary in its construction.
The fixed-width window (FFD) method truncates weights below a threshold, fixing the window width. Every output point uses the same number of lags, producing a consistently constructed series. The trade-off is losing the first len(weights) - 1 observations.
FFD is preferred for production use because:
  1. Consistent lag structure across all output points
  2. Faster computation: O(n * w_len) vs O(n^2)
  3. The truncated weights are negligibly small
The Augmented Dickey-Fuller test checks the null hypothesis that a series has a unit root (is non-stationary). The test fits:

delta_y[t] = alpha + beta * y[t-1] + epsilon[t]

The ADF statistic is beta / SE(beta). More negative values provide stronger evidence against the unit root hypothesis. The 5% critical value is approximately -2.862 for series with >100 observations.
Horizon implements the simplified version without augmenting lags, which is sufficient for the min_frac_diff search where the goal is finding the stationarity threshold rather than precise p-values.