# HRP & Denoising

> Hierarchical Risk Parity allocation and Marcenko-Pastur covariance denoising from Machine Learning for Asset Managers.

<Note>
  **Pro Feature.** Requires a Pro or Ultra subscription. [Get started at api.mathematicalcompany.com](https://api.mathematicalcompany.com)
</Note>

Horizon implements Hierarchical Risk Parity (HRP) and Marcenko-Pastur covariance denoising from Lopez de Prado's *Machine Learning for Asset Managers*. All matrix operations, eigendecomposition (Jacobi method), and clustering are implemented from scratch in Rust, with no external linear algebra dependencies.

<CardGroup cols={2}>
  <Card title="HRP Allocation" icon="scale-balanced">
    Correlation-distance clustering with inverse-variance recursive bisection. No matrix inversion required.
  </Card>

  <Card title="Covariance Denoising" icon="broom">
    Marcenko-Pastur eigenvalue clipping to separate signal from noise in sample covariance matrices.
  </Card>

  <Card title="Detoning" icon="minimize">
    Remove the dominant market factor (first principal component) to reveal idiosyncratic structure.
  </Card>

  <Card title="Full Pipeline" icon="diagram-project">
    Combine denoising with HRP for robust allocation from noisy return data.
  </Card>
</CardGroup>

***

## Hierarchical Risk Parity

HRP avoids the pitfalls of traditional mean-variance optimization (matrix inversion instability, concentrated portfolios) by using a hierarchical clustering approach:

1. **Correlation to distance**: convert the correlation matrix to a distance matrix
2. **Hierarchical clustering**: single-linkage agglomerative clustering on the distance matrix
3. **Quasi-diagonalization**: reorder assets so that correlated assets are adjacent (seriation)
4. **Recursive bisection**: allocate weights by splitting the sorted list in half and weighting inversely proportional to cluster variance

The result is a diversified portfolio that respects the correlation structure without requiring matrix inversion.
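As a rough illustration of step 1, the sketch below converts a covariance matrix into the correlation distance d(i,j) = sqrt(0.5 \* (1 - corr(i,j))) given in the Mathematical Background section. It is plain Python written for this page, not Horizon's Rust implementation, and the helper name `correlation_distance` is hypothetical.

```python
import math

def correlation_distance(cov):
    """Hypothetical helper: d(i,j) = sqrt(0.5 * (1 - corr(i,j)))."""
    n = len(cov)
    std = [math.sqrt(cov[i][i]) for i in range(n)]
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # an asset is at distance 0 from itself
            corr = cov[i][j] / (std[i] * std[j])
            # Clamp to guard against rounding slightly outside [-1, 1]
            corr = max(-1.0, min(1.0, corr))
            dist[i][j] = math.sqrt(0.5 * (1.0 - corr))
    return dist

cov = [
    [0.04, 0.02, 0.01],
    [0.02, 0.09, 0.03],
    [0.01, 0.03, 0.16],
]
dist = correlation_distance(cov)
for row in dist:
    print([round(d, 3) for d in row])
```

Assets 0 and 1 have the highest correlation in this matrix (1/3), so they end up closest together and would be the first pair merged by the clustering step.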

### hz.hrp\_weights

```python theme={null}
import horizon as hz

cov = [
    [0.04, 0.02, 0.01],
    [0.02, 0.09, 0.03],
    [0.01, 0.03, 0.16],
]

result = hz.hrp_weights(cov)
print(result.weights)         # e.g., [0.52, 0.30, 0.18]
print(result.sorted_indices)  # Quasi-diagonal ordering, e.g., [0, 1, 2]
print(result.linkage)         # Clustering linkage: [(cluster_a, cluster_b, distance), ...]
```

| Parameter    | Type                | Description                                                 |
| ------------ | ------------------- | ----------------------------------------------------------- |
| `covariance` | `list[list[float]]` | N x N covariance matrix (must be square, positive diagonal) |

Returns an `HRPResult` object.

### HRPResult Type

| Field            | Type                      | Description                                                          |
| ---------------- | ------------------------- | -------------------------------------------------------------------- |
| `weights`        | `list[float]`             | Portfolio weights summing to 1.0, one per asset                      |
| `sorted_indices` | `list[int]`               | Asset indices in quasi-diagonalized (seriation) order                |
| `linkage`        | `list[(int, int, float)]` | Clustering linkage: each entry is `(cluster_a, cluster_b, distance)` |

<Note>
  In the linkage list, cluster indices below N are original assets. Indices >= N represent merged clusters formed in earlier steps. There are always N-1 merges for N assets.
</Note>
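To make the linkage numbering concrete, here is a naive single-linkage sketch in plain Python that emits `(cluster_a, cluster_b, distance)` tuples. It assumes that the cluster formed at merge step k receives index N + k (the SciPy-style convention) and that the leaf order is the concatenation of merged members; Horizon's actual numbering, tuple order, and leaf order may differ.

```python
def single_linkage(dist):
    """Naive single-linkage clustering; returns (linkage, leaf_order)."""
    n = len(dist)
    # Active clusters: cluster id -> original asset indices it contains
    clusters = {i: [i] for i in range(n)}
    linkage = []
    next_id = n  # assumption: merged clusters are numbered N, N+1, ...
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                # Single linkage: minimum distance over all member pairs
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        linkage.append((a, b, d))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    leaf_order = next(iter(clusters.values()))
    return linkage, leaf_order

# Hypothetical 3-asset distance matrix (symmetric, zero diagonal)
dist = [
    [0.000, 0.577, 0.661],
    [0.577, 0.000, 0.612],
    [0.661, 0.612, 0.000],
]
linkage, leaf_order = single_linkage(dist)
for cluster_a, cluster_b, distance in linkage:
    print(cluster_a, cluster_b, distance)
print("leaf order:", leaf_order)
```

With three assets there are two merges. The second merge references index 3, which is the cluster created by the first merge of assets 0 and 1.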

### How Weights Are Determined

Lower-variance assets receive more weight. When assets form correlated blocks, the algorithm allocates between blocks inversely proportional to their cluster variance, then recurses within each block.

```python theme={null}
import horizon as hz

# Two low-variance assets and two high-variance assets
# with block correlation structure
cov = [
    [0.04, 0.03, 0.001, 0.001],  # asset 0: low var, correlated with 1
    [0.03, 0.04, 0.001, 0.001],  # asset 1: low var, correlated with 0
    [0.001, 0.001, 0.09, 0.06],  # asset 2: high var, correlated with 3
    [0.001, 0.001, 0.06, 0.09],  # asset 3: high var, correlated with 2
]

result = hz.hrp_weights(cov)
print(result.weights)
# Low-var block (assets 0,1) gets more total weight than high-var block (assets 2,3)
low_var_total = result.weights[0] + result.weights[1]
high_var_total = result.weights[2] + result.weights[3]
print(f"Low-var block: {low_var_total:.2%}")
print(f"High-var block: {high_var_total:.2%}")
```
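The allocation rule can also be sketched without Horizon. The plain-Python version below follows the recursive bisection described in the Mathematical Background section: each half's variance is computed under inverse-variance weights, and weight is split between halves in inverse proportion to those variances. The function names and the seriation order `[0, 1, 2, 3]` are assumptions for illustration, so the numbers need not match `hz.hrp_weights` exactly.

```python
def cluster_variance(cov, items):
    """Variance of a cluster under inverse-variance weights: w' * Sigma * w."""
    inv = [1.0 / cov[i][i] for i in items]
    total = sum(inv)
    w = [x / total for x in inv]
    return sum(
        w[a] * w[b] * cov[i][j]
        for a, i in enumerate(items)
        for b, j in enumerate(items)
    )

def recursive_bisection(cov, order):
    """Split the ordered asset list in half repeatedly, scaling weights each time."""
    weights = {i: 1.0 for i in order}
    stack = [list(order)]
    while stack:
        items = stack.pop()
        if len(items) < 2:
            continue
        mid = len(items) // 2
        left, right = items[:mid], items[mid:]
        var_left = cluster_variance(cov, left)
        var_right = cluster_variance(cov, right)
        # The lower-variance half receives the larger share
        alpha = 1.0 - var_left / (var_left + var_right)
        for i in left:
            weights[i] *= alpha
        for i in right:
            weights[i] *= 1.0 - alpha
        stack.extend([left, right])
    # Return weights in original asset order
    return [weights[i] for i in range(len(cov))]

cov = [
    [0.04, 0.03, 0.001, 0.001],
    [0.03, 0.04, 0.001, 0.001],
    [0.001, 0.001, 0.09, 0.06],
    [0.001, 0.001, 0.06, 0.09],
]
order = [0, 1, 2, 3]  # assumed quasi-diagonal ordering
weights = recursive_bisection(cov, order)
print([round(w, 4) for w in weights])
```

In this sketch the first split separates the two blocks, whose cluster variances are 0.035 and 0.075, so the low-variance block receives 0.075 / 0.11 of the total weight.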

***

## Covariance Denoising (Marcenko-Pastur)

Sample covariance matrices estimated from finite data contain noise. The Marcenko-Pastur distribution describes the eigenvalue spectrum that pure noise would produce and gives a theoretical upper bound, lambda\_+, on those eigenvalues. Eigenvalues below this bound are treated as noise; eigenvalues above it carry signal.

Denoising replaces noise eigenvalues with their average, shrinking the noise while preserving the signal structure.
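The clipping step itself is small. The sketch below applies it to a hypothetical eigenvalue list with a hypothetical threshold; treating an eigenvalue exactly equal to the threshold as noise is an assumption, and `clip_eigenvalues` is not a Horizon function.

```python
def clip_eigenvalues(eigenvalues, threshold):
    """Replace every eigenvalue at or below `threshold` with the noise average."""
    noise = [lam for lam in eigenvalues if lam <= threshold]
    if not noise:
        return list(eigenvalues)
    noise_avg = sum(noise) / len(noise)
    return [lam if lam > threshold else noise_avg for lam in eigenvalues]

eigenvalues = [2.5, 0.9, 0.4, 0.2]  # hypothetical spectrum, sorted descending
threshold = 1.0                     # hypothetical Marcenko-Pastur upper bound
cleaned = clip_eigenvalues(eigenvalues, threshold)
print(cleaned)
print(sum(eigenvalues), sum(cleaned))
```

Because the noise eigenvalues are replaced by their own mean, the sum of the eigenvalues (the trace of the matrix) is unchanged. The denoised covariance is then rebuilt from the cleaned eigenvalues and the original eigenvectors.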

### hz.denoise\_covariance

```python theme={null}
import horizon as hz

cov = [
    [0.04, 0.02, 0.01],
    [0.02, 0.09, 0.03],
    [0.01, 0.03, 0.16],
]

denoised = hz.denoise_covariance(cov, n_observations=100)
print(denoised.covariance)     # Cleaned N x N covariance matrix
print(denoised.eigenvalues)    # Eigenvalues (sorted descending)
print(denoised.n_signals)      # Number of signal eigenvalues (above MP threshold)
print(denoised.n_noise)        # Number of noise eigenvalues (below MP threshold)
```

| Parameter        | Type                | Description                                                           |
| ---------------- | ------------------- | --------------------------------------------------------------------- |
| `covariance`     | `list[list[float]]` | N x N covariance matrix (square, positive diagonal)                   |
| `n_observations` | `int`               | Number of observations T used to estimate the covariance (at least 2) |

The Marcenko-Pastur upper bound is computed as:

lambda\_+ = sigma^2 \* (1 + sqrt(N/T))^2

where sigma^2 is estimated as the average eigenvalue (trace / N).
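For reference, the bound can be evaluated by hand. This minimal sketch applies the formula above directly to the example matrix; `mp_upper_bound` is a hypothetical helper, not part of the Horizon API.

```python
import math

def mp_upper_bound(cov, n_observations):
    """lambda_+ = sigma^2 * (1 + sqrt(N/T))^2, with sigma^2 = trace / N."""
    n = len(cov)
    sigma2 = sum(cov[i][i] for i in range(n)) / n  # average eigenvalue
    q = n / n_observations
    return sigma2 * (1.0 + math.sqrt(q)) ** 2

cov = [
    [0.04, 0.02, 0.01],
    [0.02, 0.09, 0.03],
    [0.01, 0.03, 0.16],
]
bound_100 = mp_upper_bound(cov, n_observations=100)
bound_20 = mp_upper_bound(cov, n_observations=20)
print(f"T=100: lambda_+ = {bound_100:.4f}")
print(f"T=20:  lambda_+ = {bound_20:.4f}")
```

Fewer observations raise the bound, so more eigenvalues fall below it and are classified as noise.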

### DenoisedCov Type

| Field         | Type                | Description                                                    |
| ------------- | ------------------- | -------------------------------------------------------------- |
| `covariance`  | `list[list[float]]` | Denoised N x N covariance matrix                               |
| `eigenvalues` | `list[float]`       | Original eigenvalues sorted descending                         |
| `n_signals`   | `int`               | Number of signal eigenvalues (above Marcenko-Pastur threshold) |
| `n_noise`     | `int`               | Number of noise eigenvalues (below threshold)                  |

<Tip>
  The ratio N/T (variables to observations) matters. When N/T is large (many assets, few observations), more eigenvalues are classified as noise. Collect more observations or reduce the number of assets to improve the signal-to-noise ratio.
</Tip>

***

## Covariance Detoning

Detoning removes the market factor (first principal component) from a covariance matrix. This reveals the idiosyncratic correlation structure by zeroing out the largest eigenvalue's contribution.

### hz.detone\_covariance

```python theme={null}
import horizon as hz

cov = [
    [0.04, 0.02, 0.01],
    [0.02, 0.09, 0.03],
    [0.01, 0.03, 0.16],
]

detoned = hz.detone_covariance(cov)
# detoned is a list[list[float]] -- the covariance matrix without the market mode
```

| Parameter    | Type                | Description                                         |
| ------------ | ------------------- | --------------------------------------------------- |
| `covariance` | `list[list[float]]` | N x N covariance matrix (square, positive diagonal) |

Returns `list[list[float]]`: the detoned covariance matrix. The trace of the detoned matrix will be smaller than the original (the dominant eigenvalue has been removed).

<Note>
  Detoning is useful for correlation-based clustering (e.g., the clustering step of HRP). When all assets are driven by a common market factor, their correlations are inflated. Removing the market mode makes the idiosyncratic structure more visible, potentially improving cluster quality.
</Note>
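The idea can be illustrated in plain Python. The sketch below finds the dominant eigenpair with power iteration and subtracts its contribution, lambda\_1 \* v v'. This is an illustrative stand-in: Horizon uses a Jacobi eigendecomposition internally, and both function names here are hypothetical.

```python
import math

def dominant_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' A v gives the eigenvalue for the unit vector v
    lam = sum(v[i] * matrix[i][j] * v[j] for i in range(n) for j in range(n))
    return lam, v

def detone(cov):
    """Remove the first principal component: cov - lambda_1 * v v'."""
    lam, v = dominant_eigenpair(cov)
    n = len(cov)
    detoned = [[cov[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return detoned, lam, v

cov = [
    [0.04, 0.02, 0.01],
    [0.02, 0.09, 0.03],
    [0.01, 0.03, 0.16],
]
detoned, lam, v = detone(cov)
trace_before = sum(cov[i][i] for i in range(3))
trace_after = sum(detoned[i][i] for i in range(3))
print(f"dominant eigenvalue: {lam:.4f}")
print(f"trace before: {trace_before:.4f}, after: {trace_after:.4f}")
```

In this sketch the trace drops by exactly the dominant eigenvalue, and the resulting matrix is rank-deficient, so it should not be inverted.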

***

## Full Pipeline

Combine denoising with HRP for robust portfolio allocation from noisy return data.

```python theme={null}
import horizon as hz

# 1. Compute sample covariance from returns
# returns_matrix: T observations x N assets
returns_matrix = [
    [0.01, -0.02, 0.005, 0.008],
    [0.02,  0.01, -0.01, 0.003],
    [-0.01, 0.03, 0.02, -0.005],
    [0.005, -0.01, 0.01, 0.012],
    # ... more observations
]

T = len(returns_matrix)
N = len(returns_matrix[0])

# Compute sample covariance (or use numpy)
means = [sum(r[j] for r in returns_matrix) / T for j in range(N)]
cov = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(N):
        cov[i][j] = sum(
            (r[i] - means[i]) * (r[j] - means[j])
            for r in returns_matrix
        ) / (T - 1)

# 2. Denoise the covariance matrix
denoised = hz.denoise_covariance(cov, n_observations=T)
print(f"Signal eigenvalues: {denoised.n_signals}")
print(f"Noise eigenvalues: {denoised.n_noise}")

# 3. Compute HRP weights on the denoised covariance
result = hz.hrp_weights(denoised.covariance)
for i, w in enumerate(result.weights):
    print(f"Asset {i}: {w:.2%}")
```

### Pipeline with Detoning

```python theme={null}
import horizon as hz

# Compare standard HRP with HRP on a denoised covariance matrix.
# The detoned matrix is computed separately, for inspecting the
# idiosyncratic structure once the market factor is removed.

cov = [
    [0.04, 0.03, 0.001, 0.001],
    [0.03, 0.04, 0.001, 0.001],
    [0.001, 0.001, 0.09, 0.06],
    [0.001, 0.001, 0.06, 0.09],
]

# Option A: Standard HRP
standard = hz.hrp_weights(cov)

# Option B: Denoise then HRP
denoised = hz.denoise_covariance(cov, n_observations=200)
denoised_hrp = hz.hrp_weights(denoised.covariance)

# Option C: Detone (for analysis)
detoned = hz.detone_covariance(cov)

print("Standard HRP weights:", [f"{w:.3f}" for w in standard.weights])
print("Denoised HRP weights:", [f"{w:.3f}" for w in denoised_hrp.weights])
```

### Combining with Fractional Differentiation

Use fractional differentiation to create stationary return features, then compute covariance for HRP:

```python theme={null}
import horizon as hz

# Multiple asset price series
asset_prices = {
    "btc": [...],
    "eth": [...],
    "sol": [...],
}

# Fractionally differentiate each series
stationary = {}
for name, prices in asset_prices.items():
    d_star, _ = hz.min_frac_diff(prices, max_d=1.0, n_steps=20)
    stationary[name] = hz.frac_diff_ffd(prices, d=d_star)
    print(f"{name}: d*={d_star:.3f}, length={len(stationary[name])}")

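# Trim to common length, keeping the most recent observations.
# Assumption: all series end at the same timestamp and differencing drops
# leading warm-up points, so the tails (not the heads) are time-aligned.
min_len = min(len(s) for s in stationary.values())
names = list(stationary.keys())
aligned = [stationary[name][len(stationary[name]) - min_len:] for name in names]
returns_matrix = [list(row) for row in zip(*aligned)]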

# Compute covariance and run HRP
T = len(returns_matrix)
N = len(names)
means = [sum(r[j] for r in returns_matrix) / T for j in range(N)]
cov = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(N):
        cov[i][j] = sum(
            (r[i] - means[i]) * (r[j] - means[j])
            for r in returns_matrix
        ) / (T - 1)

denoised = hz.denoise_covariance(cov, n_observations=T)
result = hz.hrp_weights(denoised.covariance)

for name, w in zip(names, result.weights):
    print(f"{name}: {w:.2%}")
```

***

## Mathematical Background

<AccordionGroup>
  <Accordion title="HRP Algorithm">
    HRP proceeds in four steps:

    1. **Correlation distance**: d(i,j) = sqrt(0.5 \* (1 - corr(i,j))). Perfectly correlated assets have distance 0, uncorrelated assets have distance sqrt(0.5), and perfectly anti-correlated assets have distance 1.

    2. **Single-linkage clustering**: agglomerative clustering where the distance between two clusters is the minimum distance between any pair of their members. This builds a dendrogram (tree) of N-1 merges.

    3. **Quasi-diagonalization**: traverse the dendrogram to produce a leaf ordering where correlated assets are adjacent. This is the seriation step.

    4. **Recursive bisection**: split the sorted asset list in half. For each half, compute the cluster variance (w' \* Sigma \* w using inverse-variance weights within the cluster). Allocate between halves inversely proportional to their cluster variances. Recurse until single assets remain.

    The result is a portfolio that allocates more weight to lower-variance clusters and respects the hierarchical correlation structure, without ever inverting a matrix.
  </Accordion>

  <Accordion title="Marcenko-Pastur Distribution">
    For a T x N random matrix with i.i.d. entries of mean zero and variance sigma^2, the eigenvalues of the sample covariance matrix follow the Marcenko-Pastur distribution as T, N -> infinity with q = N/T fixed.

    The support of this distribution is \[lambda\_-, lambda\_+] where:

    * lambda\_+ = sigma^2 \* (1 + sqrt(q))^2
    * lambda\_- = sigma^2 \* (1 - sqrt(q))^2

    Eigenvalues above lambda\_+ carry genuine signal about the correlation structure. Eigenvalues within \[lambda\_-, lambda\_+] are consistent with random noise.

    Denoising replaces noise eigenvalues with their average, preserving the total variance (trace) while reducing spurious correlations.
  </Accordion>

  <Accordion title="Detoning">
    The first principal component (largest eigenvalue) typically captures the market factor: the common movement that drives all assets together. Removing it by zeroing out the largest eigenvalue and reconstructing the matrix reveals the residual (idiosyncratic) correlation structure.

    This is useful when the market factor inflates correlations and obscures the true clustering of assets. After detoning, assets that move together for idiosyncratic reasons (sector, geography) become more distinguishable.
  </Accordion>

  <Accordion title="Jacobi Eigendecomposition">
    Horizon implements the classical cyclic Jacobi eigenvalue algorithm for real symmetric matrices. For each off-diagonal element above a tolerance, a Givens rotation is applied to zero it out. The algorithm converges for any symmetric matrix and is numerically stable.

    This avoids external LAPACK/BLAS dependencies, keeping the Rust binary self-contained and portable across platforms.
  </Accordion>
</AccordionGroup>
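The cyclic Jacobi method described under Jacobi Eigendecomposition can be sketched in plain Python as follows. This is an illustrative version written for this page, not a port of Horizon's Rust code; the function name, tolerance, and sweep limit are assumptions.

```python
import math

def jacobi_eigen(matrix, tol=1e-12, max_sweeps=50):
    """Cyclic Jacobi for a real symmetric matrix.

    Returns (eigenvalues, eigenvectors), sorted by descending eigenvalue.
    """
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < tol:
                    continue
                # Angle with tan(2*theta) = 2*a_pq / (a_qq - a_pp), kept in [-pi/4, pi/4]
                num, den = 2.0 * a[p][q], a[q][q] - a[p][p]
                if den < 0.0:
                    num, den = -num, -den
                theta = 0.5 * math.atan2(num, den)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # columns p, q of A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows p, q of A <- J' A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p] = c * vkp - s * vkq
                    v[k][q] = s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: -a[i][i])
    eigenvalues = [a[i][i] for i in order]
    eigenvectors = [[v[k][i] for k in range(n)] for i in order]
    return eigenvalues, eigenvectors

cov = [
    [0.04, 0.02, 0.01],
    [0.02, 0.09, 0.03],
    [0.01, 0.03, 0.16],
]
eigenvalues, eigenvectors = jacobi_eigen(cov)
print([round(lam, 5) for lam in eigenvalues])
```

Each rotation zeroes one off-diagonal pair while leaving the sum of the diagonal unchanged, so the eigenvalues returned here sum to the trace of the input matrix.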

<Warning>
  The covariance matrix must be square with non-negative diagonal entries. Passing a non-square matrix, an empty matrix, or a matrix with negative diagonal entries will raise a `ValueError`.
</Warning>