# Fractional Differentiation

> Make price series stationary while preserving memory, from Advances in Financial Machine Learning Ch. 5.

<Note>
  **Pro Feature.** Requires a Pro or Ultra subscription. [Get started at api.mathematicalcompany.com](https://api.mathematicalcompany.com)
</Note>

Horizon implements fractional differentiation from Chapter 5 of Marcos Lopez de Prado's *Advances in Financial Machine Learning*. All functions run in Rust for maximum performance and are exposed to Python via PyO3.

<CardGroup cols={2}>
  <Card title="FFD (Recommended)" icon="window-maximize">
    Fixed-width window fractional differentiation. Constant lag count per output point. Suitable for modeling.
  </Card>

  <Card title="Expanding Window" icon="arrows-left-right-to-line">
    Full-memory fractional differentiation. Preserves all history but uses variable lag counts.
  </Card>

  <Card title="ADF Test" icon="chart-line">
    Simplified Augmented Dickey-Fuller statistic for stationarity verification.
  </Card>

  <Card title="Minimum d Search" icon="magnifying-glass">
    Automatically find the smallest differentiation order that achieves stationarity.
  </Card>
</CardGroup>

***

## Why Fractional Differentiation?

Integer differencing is the standard tool for making time series stationary:

* **d = 0** (no differencing): preserves all memory but the series is non-stationary
* **d = 1** (first difference): achieves stationarity but destroys long-range memory

The problem is that d = 1 throws away information. In financial time series, memory (autocorrelation structure) is precisely what carries predictive signal. Fractional differentiation with d between 0 and 1 offers a middle ground: make the series stationary while preserving as much memory as possible.

<Note>
  The key insight from AFML Ch. 5: there exists a minimum d\* (typically 0.2 to 0.6 for financial prices) that makes the series just barely stationary. Using d\* instead of d = 1 preserves substantially more predictive signal for downstream ML models.
</Note>

***

## API

### hz.frac\_diff\_weights

Compute the fractional differentiation weights for order `d`. These weights follow the recursion `w_k = -w_(k-1) * (d - k + 1) / k`, starting with `w_0 = 1`. Generation stops when `|w_k| < threshold`.

```python theme={null}
import horizon as hz

weights = hz.frac_diff_weights(d=0.5, threshold=1e-5)
print(f"Number of weights: {len(weights)}")
print(f"First 5: {weights[:5]}")
# w_0=1.0, w_1=-0.5, w_2=-0.125, ...
```

| Parameter   | Type    | Default  | Description                              |
| ----------- | ------- | -------- | ---------------------------------------- |
| `d`         | `float` | required | Differentiation order (typically 0 to 1) |
| `threshold` | `float` | `1e-5`   | Minimum absolute weight to include       |

Returns `list[float]` of weights.
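You can reproduce the recursion in a few lines of plain Python to cross-check `hz.frac_diff_weights` or to see exactly where truncation happens. This is a sketch, not Horizon's Rust implementation. The name `frac_diff_weights_ref` and the `max_len` safety cap are hypothetical additions.

```python
def frac_diff_weights_ref(d: float, threshold: float = 1e-5, max_len: int = 100_000) -> list[float]:
    """Hypothetical reference helper: FFD weights via the documented recursion."""
    weights = [1.0]  # w_0 = 1
    for k in range(1, max_len):
        # w_k = -w_(k-1) * (d - k + 1) / k
        w_k = -weights[-1] * (d - k + 1) / k
        if abs(w_k) < threshold:  # stop at the first weight below threshold
            break
        weights.append(w_k)
    return weights


print(frac_diff_weights_ref(0.5)[:4])  # [1.0, -0.5, -0.125, -0.0625]
print(frac_diff_weights_ref(1.0))      # [1.0, -1.0]
```

The cap only guards against pathological inputs (for example negative `d`, where the weights decay extremely slowly). For `0 < d <= 1` the threshold ends the loop long before the cap.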

### hz.frac\_diff\_ffd

Fixed-Width Window Fractional Differentiation (FFD), the recommended method from AFML Ch. 5.

Computes weights via `frac_diff_weights(d, threshold)` and applies them as a convolution over the series. Every output point uses the same number of lags, making the resulting series suitable for modeling.

```python theme={null}
stationary = hz.frac_diff_ffd(prices, d=0.5, threshold=1e-5)
```

| Parameter   | Type          | Default  | Description                            |
| ----------- | ------------- | -------- | -------------------------------------- |
| `series`    | `list[float]` | required | Input price (or log-price) series      |
| `d`         | `float`       | required | Differentiation order (non-negative)   |
| `threshold` | `float`       | `1e-5`   | Weight truncation threshold (positive) |

Returns `list[float]` of length `len(series) - len(weights) + 1`. The output is shorter than the input because the first entries lack enough history for the full weight window.

<Warning>
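  If the series is too short relative to the number of weights generated by `d` and `threshold`, a `ValueError` is raised. Raise the threshold (which truncates the weights sooner and shortens the window) or provide a longer series. Lowering the threshold lengthens the window and makes the problem worse.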
</Warning>
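Written out directly, the convolution looks like the slow pure-Python reference below. It is useful for checking results on small inputs, but it is a sketch and not Horizon's implementation. The names `frac_diff_weights_ref` and `frac_diff_ffd_ref` are hypothetical. Raising `ValueError` when the series is shorter than the weight window follows the warning above, although Horizon's exact cutoff may differ.

```python
def frac_diff_weights_ref(d: float, threshold: float = 1e-5, max_len: int = 100_000) -> list[float]:
    # w_0 = 1, w_k = -w_(k-1) * (d - k + 1) / k, stopping below threshold
    weights = [1.0]
    for k in range(1, max_len):
        w_k = -weights[-1] * (d - k + 1) / k
        if abs(w_k) < threshold:
            break
        weights.append(w_k)
    return weights


def frac_diff_ffd_ref(series: list[float], d: float, threshold: float = 1e-5) -> list[float]:
    """Hypothetical fixed-width window fracdiff (slow reference version)."""
    weights = frac_diff_weights_ref(d, threshold)
    width = len(weights)
    if len(series) < width:
        raise ValueError(
            f"series has {len(series)} points but the weight window needs {width}"
        )
    out = []
    # Output j sits at input index t = j + width - 1; weight k multiplies series[t - k].
    for t in range(width - 1, len(series)):
        out.append(sum(w * series[t - k] for k, w in enumerate(weights)))
    return out


# d = 1 gives weights [1, -1], i.e. the ordinary first difference
print(frac_diff_ffd_ref([1.0, 3.0, 6.0, 10.0], d=1.0))  # [2.0, 3.0, 4.0]
```

Because every output uses all `width` weights, the result has `len(series) - width + 1` points. This matches the documented return length of `hz.frac_diff_ffd`.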

### hz.frac\_diff\_expanding

Expanding-window (full-memory) fractional differentiation. At each point t, it applies all weights from lag 0 to lag t. This preserves the full information content of the original series. However, the number of weights grows with t, so early and late outputs are constructed differently.

```python theme={null}
stationary = hz.frac_diff_expanding(prices, d=0.5)
```

| Parameter | Type          | Description                          |
| --------- | ------------- | ------------------------------------ |
| `series`  | `list[float]` | Input price series (non-empty)       |
| `d`       | `float`       | Differentiation order (non-negative) |

Returns `list[float]` of the same length as the input.

<Note>
  Expanding window is O(n^2) vs O(n \* w\_len) for FFD. Use FFD for production and expanding window for analysis where you need full-length output.
</Note>
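A direct transcription makes the O(n^2) cost visible: output t sums t + 1 terms, and no weight is ever truncated. This is a sketch only. `frac_diff_expanding_ref` is a hypothetical name, and although it returns full-length output like `hz.frac_diff_expanding`, it does not claim to match Horizon's numerical details.

```python
def frac_diff_expanding_ref(series: list[float], d: float) -> list[float]:
    """Hypothetical expanding-window fracdiff: out[t] = sum(w_k * series[t - k], k=0..t)."""
    n = len(series)
    if n == 0:
        raise ValueError("series must be non-empty")
    # No truncation: one weight per possible lag 0..n-1
    weights = [1.0]
    for k in range(1, n):
        weights.append(-weights[-1] * (d - k + 1) / k)
    # Point t uses t + 1 weights, so the total work is O(n^2)
    return [sum(weights[k] * series[t - k] for k in range(t + 1)) for t in range(n)]


print(frac_diff_expanding_ref([1.0, 3.0, 6.0, 10.0], d=1.0))  # [1.0, 2.0, 3.0, 4.0]
```

Note that `out[0]` always equals `series[0]`, because only `w_0 = 1` applies there. This illustrates why early expanding-window values are not comparable to later ones.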

### hz.adf\_statistic

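Simplified Augmented Dickey-Fuller test statistic with no augmenting lags (strictly speaking, this is the basic Dickey-Fuller regression). Fits `delta_y[t] = alpha + beta * y[t-1] + epsilon[t]`, where `delta_y[t] = y[t] - y[t-1]`, and returns the ADF statistic `beta / SE(beta)`, the t-statistic of the lag coefficient.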

More negative values indicate stronger stationarity evidence.

```python theme={null}
t_stat = hz.adf_statistic(prices)
print(f"ADF statistic: {t_stat:.4f}")

# Approximate critical values (n > 100):
#   1%:  -3.43
#   5%:  -2.862
#   10%: -2.567
if t_stat < -2.862:
    print("Stationary at 5% significance level")
```

| Parameter | Type          | Description                                        |
| --------- | ------------- | -------------------------------------------------- |
| `series`  | `list[float]` | Input series (at least 3 observations, all finite) |

Returns `float`: the ADF test statistic.
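The regression has a closed-form least-squares solution, so you can reproduce the statistic for a sanity check. The sketch below is a hypothetical reference helper (`df_statistic_ref`), not Horizon's implementation. It requires at least 4 observations so that the residual variance has a positive degree of freedom; this document does not describe how Horizon handles the 3-observation minimum.

```python
import math


def df_statistic_ref(series: list[float]) -> float:
    """Hypothetical reference: t-stat of beta in dy[t] = alpha + beta * y[t-1] + eps[t]."""
    if len(series) < 4:
        # Two parameters need at least 3 (y[t-1], dy[t]) pairs for m - 2 > 0
        raise ValueError("need at least 4 observations")
    x = series[:-1]                                        # y[t-1]
    dy = [b - a for a, b in zip(series[:-1], series[1:])]  # y[t] - y[t-1]
    m = len(x)
    x_bar = sum(x) / m
    dy_bar = sum(dy) / m
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("lagged series is constant")
    sxy = sum((xi - x_bar) * (di - dy_bar) for xi, di in zip(x, dy))
    beta = sxy / sxx
    alpha = dy_bar - beta * x_bar
    sse = sum((di - alpha - beta * xi) ** 2 for xi, di in zip(x, dy))
    sigma2 = sse / (m - 2)             # residual variance with m - 2 degrees of freedom
    se_beta = math.sqrt(sigma2 / sxx)  # standard error of beta
    return beta / se_beta


print(df_statistic_ref([0.0, 1.0, 0.0, 2.0, 1.0]))
```

For the five-point series above the exact value is -2.5 (here `beta = -15/11` and `SE(beta) = 6/11`), up to floating-point rounding. Because the statistic is a t-ratio, rescaling the series by a positive constant or shifting it leaves the result unchanged.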

### hz.min\_frac\_diff

Find the minimum differentiation order `d` that makes the series stationary (AFML Ch. 5).

Searches d from 0 to `max_d` in `n_steps` equal increments. For each d, applies `frac_diff_ffd`, then computes the ADF test statistic. Returns the smallest d whose ADF stat is below the 5% critical value (-2.862).

```python theme={null}
d_star, scan_results = hz.min_frac_diff(
    prices,
    p_threshold=0.05,       # reserved for future use
    max_d=1.0,              # upper bound on d
    n_steps=20,             # grid resolution
    weight_threshold=1e-5,  # FFD weight threshold
)
print(f"Minimum d for stationarity: {d_star:.3f}")

# scan_results is a list of (d, adf_stat) tuples
for d, adf in scan_results:
    marker = " <-- d*" if d == d_star else ""
    print(f"  d={d:.2f}  ADF={adf:.4f}{marker}")
```

| Parameter          | Type          | Default  | Description                                |
| ------------------ | ------------- | -------- | ------------------------------------------ |
| `series`           | `list[float]` | required | Price series (at least 10 observations)    |
| `p_threshold`      | `float`       | `0.05`   | Reserved for future p-value based stopping |
| `max_d`            | `float`       | `1.0`    | Upper bound on d search range              |
| `n_steps`          | `int`         | `20`     | Number of grid points between 0 and max\_d |
| `weight_threshold` | `float`       | `1e-5`   | Threshold for FFD weight truncation        |

Returns `tuple[float, list[tuple[float, float]]]`: the chosen d, plus the scan results as a list of `(d, adf_statistic)` pairs. If no d in the range achieves stationarity, the returned d is `max_d`. Check its ADF statistic in the scan results before relying on it.
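The search itself is a short loop. The sketch below re-implements the pieces in plain Python so that it runs standalone. Several details are assumptions rather than documented Horizon behavior: the helper names, the grid `i * max_d / n_steps` for `i = 0..n_steps` (one reading of "n_steps equal increments"), and skipping any d whose FFD output is too short to regress. Like `hz.min_frac_diff`, it falls back to `max_d` when nothing passes.

```python
import itertools
import math
import random


def _ffd_weights(d: float, threshold: float, max_len: int = 100_000) -> list[float]:
    # w_0 = 1, w_k = -w_(k-1) * (d - k + 1) / k, truncated below threshold
    w = [1.0]
    for k in range(1, max_len):
        nxt = -w[-1] * (d - k + 1) / k
        if abs(nxt) < threshold:
            break
        w.append(nxt)
    return w


def _ffd(series: list[float], d: float, threshold: float) -> list[float]:
    # Fixed-width convolution; output is len(w) - 1 points shorter than input
    w = _ffd_weights(d, threshold)
    return [sum(wk * series[t - k] for k, wk in enumerate(w))
            for t in range(len(w) - 1, len(series))]


def _df_stat(y: list[float]) -> float:
    # t-stat of beta in dy[t] = alpha + beta * y[t-1] + eps[t] (no augmenting lags)
    x = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    m = len(x)
    x_bar, dy_bar = sum(x) / m, sum(dy) / m
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sum((xi - x_bar) * (di - dy_bar) for xi, di in zip(x, dy)) / sxx
    alpha = dy_bar - beta * x_bar
    sse = sum((di - alpha - beta * xi) ** 2 for xi, di in zip(x, dy))
    return beta / math.sqrt(sse / (m - 2) / sxx)


def min_frac_diff_ref(series, max_d=1.0, n_steps=20, weight_threshold=1e-5, crit=-2.862):
    """Hypothetical sketch: smallest grid d whose FFD series has DF stat < crit."""
    scan = []
    d_star = None
    for i in range(n_steps + 1):  # assumed grid: 0, max_d/n_steps, ..., max_d
        d = max_d * i / n_steps
        fd = _ffd(series, d, weight_threshold)
        if len(fd) < 4:  # too short for the regression; skip this d
            continue
        stat = _df_stat(fd)
        scan.append((d, stat))
        if d_star is None and stat < crit:
            d_star = d
    return (max_d if d_star is None else d_star), scan


# Usage: white noise is already stationary at d = 0; a random walk typically needs d > 0
rng = random.Random(7)
noise = [rng.gauss(0.0, 1.0) for _ in range(300)]
walk = [100.0 + s for s in itertools.accumulate(noise)]
d_noise, _ = min_frac_diff_ref(noise, n_steps=10, weight_threshold=1e-3)
d_walk, scan_walk = min_frac_diff_ref(walk, n_steps=10, weight_threshold=1e-3)
print(d_noise, d_walk)
```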

***

## Workflow

The typical workflow for fractional differentiation:

```python theme={null}
import horizon as hz

# 1. Raw price series
prices = [...]  # e.g., daily close prices

# 2. Find minimum d for stationarity
d_star, scan = hz.min_frac_diff(
    prices,
    max_d=1.0,
    n_steps=20,
    weight_threshold=1e-5,
)
print(f"Optimal d: {d_star:.3f}")

# 3. Apply FFD with d*
stationary = hz.frac_diff_ffd(prices, d=d_star, threshold=1e-5)

# 4. Verify stationarity
adf = hz.adf_statistic(stationary)
print(f"ADF statistic: {adf:.4f}")
assert adf < -2.862, "Series not stationary at 5% level"

# 5. Use as ML feature
# The stationary series is shorter by len(weights) - 1
# Align with your target labels accordingly
print(f"Original length: {len(prices)}")
print(f"Stationary length: {len(stationary)}")
weights = hz.frac_diff_weights(d=d_star, threshold=1e-5)
print(f"Lost {len(weights) - 1} points to warm-up")
```
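Aligning in step 5 amounts to dropping the first `len(weights) - 1` labels, so that each FFD value is paired with the label at the same original index. Below is a minimal sketch, assuming your labels are a plain list indexed like `prices`. `align_with_labels` is a hypothetical helper, not a Horizon function.

```python
def align_with_labels(features: list[float], labels: list, n_weights: int) -> list[tuple]:
    """Hypothetical helper: pair FFD output with labels on the original index.

    FFD output j is computed from inputs up to index j + n_weights - 1,
    so it lines up with labels[j + n_weights - 1].
    """
    offset = n_weights - 1
    if len(features) != len(labels) - offset:
        raise ValueError("features and labels do not come from the same source series")
    return list(zip(features, labels[offset:]))


print(align_with_labels([10.0, 20.0, 30.0], ["a", "b", "c", "d", "e"], n_weights=3))
# [(10.0, 'c'), (20.0, 'd'), (30.0, 'e')]
```

With Horizon, pass `len(hz.frac_diff_weights(d=d_star, threshold=1e-5))` as `n_weights`, using the same threshold you passed to `hz.frac_diff_ffd`.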

### Comparing d Values

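```python theme={null}
import horizon as hz

prices = [...]  # raw prices

# Scan across d values
for d in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]:
    if d == 0.0:
        series = prices  # d = 0: the raw, undifferenced series
    else:
        try:
            series = hz.frac_diff_ffd(prices, d=d, threshold=1e-4)
        except ValueError:
            # Series is shorter than the weight window for this d/threshold
            print(f"d={d:.1f}  skipped: series too short")
            continue

    if len(series) >= 3:  # adf_statistic needs at least 3 observations
        adf = hz.adf_statistic(series)
        print(f"d={d:.1f}  length={len(series):5d}  ADF={adf:8.4f}")
```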

### Using with Information-Driven Bars

Combine fractional differentiation with information-driven bars for a complete AFML pipeline:

```python theme={null}
import horizon as hz

# 1. Build dollar bars from tick data
bars = hz.dollar_bars(timestamps, prices, volumes, threshold=50000.0)
bar_prices = [b.close for b in bars]

# 2. Fractionally differentiate the bar prices
d_star, _ = hz.min_frac_diff(bar_prices, max_d=1.0, n_steps=20)
stationary = hz.frac_diff_ffd(bar_prices, d=d_star)

# 3. Use stationary series for CUSUM event detection
events = hz.cusum_filter(stationary, threshold=0.02)

# 4. Label events using the original bar prices
# (align indices: stationary series is offset by len(weights) - 1)
weights = hz.frac_diff_weights(d=d_star, threshold=1e-5)
offset = len(weights) - 1
aligned_events = [e + offset for e in events if e + offset < len(bar_prices)]

labels = hz.triple_barrier_labels(
    prices=bar_prices,
    timestamps=[b.timestamp for b in bars],
    events=aligned_events,
    pt_sl=[1.0, 1.0],
    min_ret=0.005,
    max_holding=50,
    vol_span=20,
)
```

***

## Mathematical Background

<AccordionGroup>
  <Accordion title="Weight Recursion">
    The fractional differentiation operator of order d is defined by the binomial series:

    `(1 - B)^d = sum(w_k * B^k, k=0..inf)`

    where B is the backshift operator and the weights follow:

    * `w_0 = 1`
    * `w_k = -w_(k-1) * (d - k + 1) / k`

    For integer d = 1, this gives w = \[1, -1, 0, 0, ...] (standard first difference).
    For d = 0.5, the weights decay slowly: \[1, -0.5, -0.125, -0.0625, ...], preserving long-range memory.
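
    Unrolling the recursion gives a closed form (the generalized binomial coefficient with alternating sign), which makes a hand check easy:

    `w_k = (-1)^k * binom(d, k) = (-1)^k * prod((d - i) / (i + 1), i=0..k-1)`

    For d = 0.5 and k = 3: `(-1)^3 * (0.5 * -0.5 * -1.5) / (1 * 2 * 3) = -0.375 / 6 = -0.0625`, which matches `w_3` above. For integer d, the product contains the factor `d - d = 0` once k > d, so d = 1 keeps only two non-zero weights. For non-integer d, no factor vanishes, and `|w_k|` shrinks only polynomially, roughly like `k^-(1+d)`.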
  </Accordion>

  <Accordion title="FFD vs Expanding Window">
    The **expanding window** method applies all weights from lag 0 to lag t at each point t. This preserves the full information content but means early and late points use different numbers of lags, making the series non-stationary in its construction.

    The **fixed-width window (FFD)** method truncates weights below a threshold, fixing the window width. Every output point uses the same number of lags, producing a consistently constructed series. The trade-off is losing the first `len(weights) - 1` observations.

    FFD is preferred for production use because:

    1. Consistent lag structure across all output points
    2. Faster computation: O(n \* w\_len) vs O(n^2)
    3. The truncated weights are negligibly small
  </Accordion>

  <Accordion title="ADF Test">
    The Augmented Dickey-Fuller test checks the null hypothesis that a series has a unit root (is non-stationary). The test fits:

    `delta_y[t] = alpha + beta * y[t-1] + epsilon[t]`

    The ADF statistic is `beta / SE(beta)`. More negative values provide stronger evidence against the unit root hypothesis. The 5% critical value is approximately -2.862 for series with more than 100 observations.

    Horizon implements the simplified version without augmenting lags, which is sufficient for the `min_frac_diff` search where the goal is finding the stationarity threshold rather than precise p-values.
  </Accordion>
</AccordionGroup>
