# Fractional Differentiation

> Make price series stationary while preserving memory, from Advances in Financial Machine Learning Ch. 5.

**Pro Feature.** Requires a Pro or Ultra subscription. [Get started at api.mathematicalcompany.com](https://api.mathematicalcompany.com)

Horizon implements fractional differentiation from Chapter 5 of Marcos Lopez de Prado's *Advances in Financial Machine Learning*. All functions are implemented in Rust and exposed to Python via PyO3.

* `hz.frac_diff_ffd`: fixed-width window fractional differentiation. Constant lag count per output point. Suitable for modeling.
* `hz.frac_diff_expanding`: full-memory fractional differentiation. Preserves all history but uses variable lag counts.
* `hz.adf_statistic`: simplified Augmented Dickey-Fuller statistic for stationarity verification.
* `hz.min_frac_diff`: automatically finds the smallest differentiation order that achieves stationarity.

***

## Why Fractional Differentiation?

Integer differencing is the standard tool for making time series stationary:

* **d = 0** (no differencing): preserves all memory, but a price series is typically non-stationary
* **d = 1** (first difference): typically achieves stationarity but destroys long-range memory

The problem is that d = 1 throws away information. In financial time series, memory (autocorrelation structure) is precisely what carries predictive signal. Fractional differentiation with d between 0 and 1 offers a middle ground: make the series stationary while preserving as much memory as possible.

The key insight from AFML Ch. 5: there exists a minimum d\* (typically 0.2 to 0.6 for financial prices) that makes the series just barely stationary. Using d\* instead of d = 1 preserves substantially more predictive signal for downstream ML models.

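
To make the memory argument concrete, the sketch below computes the first few differentiation weights in pure Python, using the recursion `w_k = -w_(k-1) * (d - k + 1) / k` documented under `hz.frac_diff_weights`. The helper name `sketch_weights` is hypothetical and not part of Horizon; it is only meant to show how the weights on past observations change with d.

```python
def sketch_weights(d: float, n_lags: int) -> list[float]:
    """First n_lags weights of (1 - B)^d (illustrative helper, not a Horizon API)."""
    weights = [1.0]  # w_0 = 1
    for k in range(1, n_lags):
        # Recursion from the docs: w_k = -w_(k-1) * (d - k + 1) / k
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


# Compare no differencing, a fractional order, and the first difference.
for d in (0.0, 0.5, 1.0):
    print(f"d={d}: {sketch_weights(d, 5)}")
```

With d = 1 every weight beyond lag 1 is zero, so each output point only sees the previous observation. With d = 0.5 the weights stay non-zero and shrink gradually, so older observations still contribute.
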
***

## API

### hz.frac\_diff\_weights

Compute the fractional differentiation weights for order `d`. These weights follow the recursion `w_k = -w_(k-1) * (d - k + 1) / k`, starting with `w_0 = 1`. Generation stops when `|w_k| < threshold`.

```python
import horizon as hz

weights = hz.frac_diff_weights(d=0.5, threshold=1e-5)
print(f"Number of weights: {len(weights)}")
print(f"First 5: {weights[:5]}")
# w_0=1.0, w_1=-0.5, w_2=-0.125, ...
```

| Parameter   | Type    | Default  | Description                              |
| ----------- | ------- | -------- | ---------------------------------------- |
| `d`         | `float` | required | Differentiation order (typically 0 to 1) |
| `threshold` | `float` | `1e-5`   | Minimum absolute weight to include       |

Returns `list[float]` of weights.

### hz.frac\_diff\_ffd

Fixed-Width Window Fractional Differentiation (FFD): the recommended method from AFML Ch. 5.4. Computes weights via `frac_diff_weights(d, threshold)` and applies them as a convolution over the series. Every output point uses the same number of lags, making the resulting series suitable for modeling.

```python
stationary = hz.frac_diff_ffd(prices, d=0.5, threshold=1e-5)
```

| Parameter   | Type          | Default  | Description                            |
| ----------- | ------------- | -------- | -------------------------------------- |
| `series`    | `list[float]` | required | Input price (or log-price) series      |
| `d`         | `float`       | required | Differentiation order (non-negative)   |
| `threshold` | `float`       | `1e-5`   | Weight truncation threshold (positive) |

Returns `list[float]` of length `len(series) - len(weights) + 1`. The output is shorter than the input because the first entries lack enough history for the full weight window.

If the series is shorter than the number of weights generated by `d` and `threshold`, a `ValueError` is raised. Raise the threshold (a larger threshold truncates the weights sooner and shortens the window) or provide a longer series.

### hz.frac\_diff\_expanding

Expanding-window (full-memory) fractional differentiation.
At each point t, uses all weights from lag 0 to lag t. This preserves the full information content of the original series but produces a non-stationary weight structure.

```python
stationary = hz.frac_diff_expanding(prices, d=0.5)
```

| Parameter | Type          | Description                          |
| --------- | ------------- | ------------------------------------ |
| `series`  | `list[float]` | Input price series (non-empty)       |
| `d`       | `float`       | Differentiation order (non-negative) |

Returns `list[float]` of the same length as the input.

Expanding window is O(n^2) vs O(n \* w\_len) for FFD. Use FFD for production and expanding window for analysis where you need full-length output.

### hz.adf\_statistic

Simplified Augmented Dickey-Fuller test statistic (no augmenting lags). Fits the regression `delta_y[t] = alpha + beta * y[t-1] + epsilon[t]` and returns the ADF statistic `beta / SE(beta)`. More negative values indicate stronger stationarity evidence.

```python
t_stat = hz.adf_statistic(prices)
print(f"ADF statistic: {t_stat:.4f}")

# Approximate critical values (n > 100):
#   1%:  -3.43
#   5%:  -2.862
#   10%: -2.567
if t_stat < -2.862:
    print("Stationary at 5% significance level")
```

| Parameter | Type          | Description                                        |
| --------- | ------------- | -------------------------------------------------- |
| `series`  | `list[float]` | Input series (at least 3 observations, all finite) |

Returns `float`: the ADF test statistic.

### hz.min\_frac\_diff

Find the minimum differentiation order `d` that makes the series stationary (AFML Ch. 5.5). Searches d from 0 to `max_d` in `n_steps` equal increments. For each d, applies `frac_diff_ffd`, then computes the ADF test statistic. Returns the smallest d whose ADF statistic is below the 5% critical value (-2.862).

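
The search described above can be sketched in pure Python. This is an illustration of the logic, not Horizon's Rust implementation: the helper names (`ffd_weights`, `ffd`, `df_stat`, `scan_min_d`) are hypothetical, the grid is assumed to include both 0 and `max_d`, and the standard error is assumed to use `n - 2` degrees of freedom.

```python
import math
import random


def ffd_weights(d, threshold=1e-5):
    """Weights of (1 - B)^d, truncated once |w_k| drops below threshold."""
    w, k = [1.0], 1
    while True:
        nxt = -w[-1] * (d - k + 1) / k
        if abs(nxt) < threshold:
            return w
        w.append(nxt)
        k += 1


def ffd(series, d, threshold=1e-5):
    """Fixed-width window fractional differentiation (same lag count everywhere)."""
    w = ffd_weights(d, threshold)
    if len(series) < len(w):
        raise ValueError("series is shorter than the weight window")
    return [
        sum(w[k] * series[t - k] for k in range(len(w)))
        for t in range(len(w) - 1, len(series))
    ]


def df_stat(y):
    """t-statistic of beta in delta_y[t] = alpha + beta * y[t-1] + epsilon[t]."""
    x = y[:-1]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    n = len(dy)
    mx, my = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta = sum((xi - mx) * (di - my) for xi, di in zip(x, dy)) / sxx
    alpha = my - beta * mx
    rss = sum((di - alpha - beta * xi) ** 2 for xi, di in zip(x, dy))
    se_beta = math.sqrt(rss / (n - 2) / sxx)  # assumed degrees of freedom: n - 2
    return beta / se_beta


def scan_min_d(series, max_d=1.0, n_steps=20, threshold=1e-5, critical=-2.862):
    """Scan d on an even grid; return the first d whose statistic is below critical."""
    results, d_star = [], None
    for i in range(n_steps + 1):
        d = max_d * i / n_steps
        stat = df_stat(ffd(series, d, threshold))
        results.append((d, stat))
        if d_star is None and stat < critical:
            d_star = d
    return (max_d if d_star is None else d_star), results


# Demo on a seeded Gaussian random walk (synthetic data, for illustration only).
random.seed(0)
walk = [100.0]
for _ in range(499):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))

d_star, scan = scan_min_d(walk, max_d=1.0, n_steps=10, threshold=1e-3)
print(f"d* = {d_star:.2f}")
```

The demo uses a coarse weight threshold (`1e-3`) so that the weight window stays well below the 500-point series; with the default `1e-5` the window can run to hundreds or thousands of lags for fractional d.
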
```python
d_star, scan_results = hz.min_frac_diff(
    prices,
    p_threshold=0.05,       # reserved for future use
    max_d=1.0,              # upper bound on d
    n_steps=20,             # grid resolution
    weight_threshold=1e-5,  # FFD weight threshold
)
print(f"Minimum d for stationarity: {d_star:.3f}")

# scan_results is a list of (d, adf_stat) tuples
for d, adf in scan_results:
    marker = " <-- d*" if d == d_star else ""
    print(f"  d={d:.2f}  ADF={adf:.4f}{marker}")
```

| Parameter          | Type          | Default  | Description                                |
| ------------------ | ------------- | -------- | ------------------------------------------ |
| `series`           | `list[float]` | required | Price series (at least 10 observations)    |
| `p_threshold`      | `float`       | `0.05`   | Reserved for future p-value based stopping |
| `max_d`            | `float`       | `1.0`    | Upper bound on d search range              |
| `n_steps`          | `int`         | `20`     | Number of grid points between 0 and max\_d |
| `weight_threshold` | `float`       | `1e-5`   | Threshold for FFD weight truncation        |

Returns `(float, list[(float, float)])`: the optimal d and a list of (d, ADF statistic) scan results.

If no d in the range achieves stationarity, optimal\_d is set to `max_d`.

***

## Workflow

The typical workflow for fractional differentiation:

```python
import horizon as hz

# 1. Raw price series
prices = [...]  # e.g., daily close prices

# 2. Find minimum d for stationarity
d_star, scan = hz.min_frac_diff(
    prices,
    max_d=1.0,
    n_steps=20,
    weight_threshold=1e-5,
)
print(f"Optimal d: {d_star:.3f}")

# 3. Apply FFD with d*
stationary = hz.frac_diff_ffd(prices, d=d_star, threshold=1e-5)

# 4. Verify stationarity
adf = hz.adf_statistic(stationary)
print(f"ADF statistic: {adf:.4f}")
assert adf < -2.862, "Series not stationary at 5% level"

# 5. Use as ML feature
# The stationary series is shorter by len(weights) - 1
# Align with your target labels accordingly
print(f"Original length: {len(prices)}")
print(f"Stationary length: {len(stationary)}")
weights = hz.frac_diff_weights(d=d_star, threshold=1e-5)
print(f"Lost {len(weights) - 1} points to warm-up")
```

### Comparing d Values

```python
import horizon as hz

prices = [...]  # raw prices

# Scan across d values
for d in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]:
    if d == 0.0:
        series = prices
    else:
        series = hz.frac_diff_ffd(prices, d=d, threshold=1e-4)

    if len(series) >= 3:
        adf = hz.adf_statistic(series)
        print(f"d={d:.1f}  length={len(series):5d}  ADF={adf:8.4f}")
```

### Using with Information-Driven Bars

Combine fractional differentiation with information-driven bars for a complete AFML pipeline:

```python
import horizon as hz

# 1. Build dollar bars from tick data
bars = hz.dollar_bars(timestamps, prices, volumes, threshold=50000.0)
bar_prices = [b.close for b in bars]

# 2. Fractionally differentiate the bar prices
d_star, _ = hz.min_frac_diff(bar_prices, max_d=1.0, n_steps=20)
stationary = hz.frac_diff_ffd(bar_prices, d=d_star)

# 3. Use stationary series for CUSUM event detection
events = hz.cusum_filter(stationary, threshold=0.02)

# 4. Label events using the original bar prices
# (align indices: stationary series is offset by len(weights) - 1)
weights = hz.frac_diff_weights(d=d_star, threshold=1e-5)
offset = len(weights) - 1
aligned_events = [e + offset for e in events if e + offset < len(bar_prices)]

labels = hz.triple_barrier_labels(
    prices=bar_prices,
    timestamps=[b.timestamp for b in bars],
    events=aligned_events,
    pt_sl=[1.0, 1.0],
    min_ret=0.005,
    max_holding=50,
    vol_span=20,
)
```

***

## Mathematical Background

The fractional differentiation operator of order d is defined by the binomial series:

`(1 - B)^d = sum(w_k * B^k, k=0..inf)`

where B is the backshift operator and the weights follow:

* `w_0 = 1`
* `w_k = -w_(k-1) * (d - k + 1) / k`

For integer d = 1, this gives w = `[1, -1, 0, 0, ...]` (standard first difference). For d = 0.5, the weights decay slowly: `[1, -0.5, -0.125, -0.0625, ...]`, preserving long-range memory.

The **expanding window** method applies all weights from lag 0 to lag t at each point t. This preserves the full information content but means early and late points use different numbers of lags, making the series non-stationary in its construction.

The **fixed-width window (FFD)** method truncates weights below a threshold, fixing the window width. Every output point uses the same number of lags, producing a consistently constructed series. The trade-off is losing the first `len(weights) - 1` observations.

FFD is preferred for production use because:

1. Consistent lag structure across all output points
2. Faster computation: O(n \* w\_len) vs O(n^2)
3. The truncated weights are negligibly small

The Augmented Dickey-Fuller test checks the null hypothesis that a series has a unit root (is non-stationary). The test fits:

`delta_y[t] = alpha + beta * y[t-1] + epsilon[t]`

The ADF statistic is `beta / SE(beta)`. More negative values provide stronger evidence against the unit root hypothesis. The 5% critical value is approximately -2.862 for series with >100 observations.
Horizon implements the simplified version without augmenting lags, which is sufficient for the `min_frac_diff` search where the goal is finding the stationarity threshold rather than precise p-values.
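
As a supplement to the weight recursion stated earlier in this section, the short derivation below shows where it comes from. It uses only the generalized binomial theorem; nothing here is specific to Horizon's implementation.

$$
(1 - B)^d = \sum_{k=0}^{\infty} \binom{d}{k} (-B)^k
\quad\Longrightarrow\quad
w_k = (-1)^k \binom{d}{k} = (-1)^k \, \frac{d\,(d-1)\cdots(d-k+1)}{k!}
$$

$$
\frac{w_k}{w_{k-1}} = -\frac{d - k + 1}{k}, \qquad w_0 = 1
$$

$$
|w_k| \sim \frac{k^{-(1+d)}}{|\Gamma(-d)|} \quad \text{as } k \to \infty \quad (d \text{ non-integer})
$$

As a check, d = 0.5 and k = 2 give w\_2 = (0.5)(-0.5) / 2! = -0.125, matching the weights listed above. The last line says that for non-integer d the weights decay as a power law rather than vanishing after finitely many lags, which is why FFD needs an explicit truncation threshold.
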