Pro Feature. Requires a Pro or Ultra subscription. Get started at api.mathematicalcompany.com

HRP & Denoising

Horizon implements Hierarchical Risk Parity (HRP) and Marcenko-Pastur covariance denoising from Lopez de Prado’s Machine Learning for Asset Managers. All matrix operations, eigendecomposition (Jacobi method), and clustering are implemented from scratch in Rust. No external linear algebra dependencies.

HRP Allocation

Correlation-distance clustering with inverse-variance recursive bisection. No matrix inversion required.

Covariance Denoising

Marcenko-Pastur eigenvalue clipping to separate signal from noise in sample covariance matrices.

Detoning

Remove the dominant market factor (first principal component) to reveal idiosyncratic structure.

Full Pipeline

Combine denoising with HRP for robust allocation from noisy return data.

Hierarchical Risk Parity

HRP avoids the pitfalls of traditional mean-variance optimization (matrix inversion instability, concentrated portfolios) by using a hierarchical clustering approach:
  1. Correlation to distance: convert the correlation matrix to a distance matrix
  2. Hierarchical clustering: single-linkage agglomerative clustering on the distance matrix
  3. Quasi-diagonalization: reorder assets so that correlated assets are adjacent (seriation)
  4. Recursive bisection: allocate weights by splitting the sorted list in half and weighting inversely proportional to cluster variance
The result is a diversified portfolio that respects the correlation structure without requiring matrix inversion.

hz.hrp_weights

import horizon as hz

cov = [
    [0.04, 0.02, 0.01],
    [0.02, 0.09, 0.03],
    [0.01, 0.03, 0.16],
]

result = hz.hrp_weights(cov)
print(result.weights)         # e.g., [0.52, 0.30, 0.18]
print(result.sorted_indices)  # Quasi-diagonal ordering, e.g., [0, 1, 2]
print(result.linkage)         # Clustering linkage: [(cluster_a, cluster_b, distance), ...]
Parameter  | Type              | Description
covariance | list[list[float]] | N x N covariance matrix (must be square, positive diagonal)
Returns an HRPResult object.

HRPResult Type

Field          | Type                    | Description
weights        | list[float]             | Portfolio weights summing to 1.0, one per asset
sorted_indices | list[int]               | Asset indices in quasi-diagonalized (seriation) order
linkage        | list[(int, int, float)] | Clustering linkage: each entry is (cluster_a, cluster_b, distance)
In the linkage list, cluster indices below N are original assets. Indices >= N represent merged clusters formed in earlier steps. There are always N-1 merges for N assets.
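To make the indexing concrete, here is a pure-Python sketch (using a hypothetical linkage list for N = 3, not actual Horizon output) that expands a linkage into its leaf ordering:

```python
def leaf_order(linkage, n):
    # Cluster ids below n are original assets; id n + k refers to the
    # cluster created by linkage[k]. The root is the last merge.
    def expand(cid):
        if cid < n:
            return [cid]
        a, b, _ = linkage[cid - n]
        return expand(a) + expand(b)
    return expand(n + len(linkage) - 1)

# Hypothetical linkage for 3 assets: merge assets 0 and 1 first
# (forming cluster id 3), then merge cluster 3 with asset 2 (the root).
linkage = [(0, 1, 0.42), (3, 2, 0.71)]
print(leaf_order(linkage, 3))  # [0, 1, 2]
```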

How Weights Are Determined

Lower-variance assets receive more weight. When assets form correlated blocks, the algorithm allocates between blocks inversely proportional to their cluster variance, then recurses within each block.
import horizon as hz

# Two low-variance assets and two high-variance assets
# with block correlation structure
cov = [
    [0.04, 0.03, 0.001, 0.001],  # asset 0: low var, correlated with 1
    [0.03, 0.04, 0.001, 0.001],  # asset 1: low var, correlated with 0
    [0.001, 0.001, 0.09, 0.06],  # asset 2: high var, correlated with 3
    [0.001, 0.001, 0.06, 0.09],  # asset 3: high var, correlated with 2
]

result = hz.hrp_weights(cov)
print(result.weights)
# Low-var block (assets 0,1) gets more total weight than high-var block (assets 2,3)
low_var_total = result.weights[0] + result.weights[1]
high_var_total = result.weights[2] + result.weights[3]
print(f"Low-var block: {low_var_total:.2%}")
print(f"High-var block: {high_var_total:.2%}")

Covariance Denoising (Marcenko-Pastur)

Sample covariance matrices estimated from finite data contain noise. The Marcenko-Pastur distribution provides a theoretical upper bound for the eigenvalues of a random matrix. Eigenvalues below this bound are noise; eigenvalues above it carry signal. Denoising replaces noise eigenvalues with their average, shrinking the noise while preserving the signal structure.

hz.denoise_covariance

import horizon as hz

cov = [
    [0.04, 0.02, 0.01],
    [0.02, 0.09, 0.03],
    [0.01, 0.03, 0.16],
]

denoised = hz.denoise_covariance(cov, n_observations=100)
print(denoised.covariance)     # Cleaned N x N covariance matrix
print(denoised.eigenvalues)    # Eigenvalues (sorted descending)
print(denoised.n_signals)      # Number of signal eigenvalues (above MP threshold)
print(denoised.n_noise)        # Number of noise eigenvalues (below MP threshold)
Parameter      | Type              | Description
covariance     | list[list[float]] | N x N covariance matrix (square, positive diagonal)
n_observations | int               | Number of observations T used to estimate the covariance (at least 2)
The Marcenko-Pastur upper bound is computed as lambda_+ = sigma^2 * (1 + sqrt(N/T))^2, where sigma^2 is estimated as the average eigenvalue (trace / N).
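As an illustration, the bound can be evaluated by hand on a hypothetical eigenvalue spectrum (this is just the formula above in Python, not Horizon's code):

```python
import math

def mp_upper_bound(eigenvalues, n_observations):
    # lambda_+ = sigma^2 * (1 + sqrt(N/T))^2, with sigma^2 estimated
    # as the average eigenvalue (trace / N).
    n = len(eigenvalues)
    sigma2 = sum(eigenvalues) / n
    q = n / n_observations
    return sigma2 * (1.0 + math.sqrt(q)) ** 2

# Hypothetical spectrum: one large "signal" eigenvalue, nine small ones
eigs = [2.0] + [0.5] * 9  # N = 10
lam_plus = mp_upper_bound(eigs, n_observations=100)
n_signals = sum(1 for l in eigs if l > lam_plus)  # only the 2.0 survives
```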

DenoisedCov Type

Field       | Type              | Description
covariance  | list[list[float]] | Denoised N x N covariance matrix
eigenvalues | list[float]       | Original eigenvalues sorted descending
n_signals   | int               | Number of signal eigenvalues (above Marcenko-Pastur threshold)
n_noise     | int               | Number of noise eigenvalues (below threshold)
The ratio N/T (variables to observations) matters. When N/T is large (many assets, few observations), more eigenvalues are classified as noise. Collect more observations or reduce the number of assets to improve the signal-to-noise ratio.

Covariance Detoning

Detoning removes the market factor (first principal component) from a covariance matrix. This reveals the idiosyncratic correlation structure by zeroing out the largest eigenvalue’s contribution.

hz.detone_covariance

import horizon as hz

cov = [
    [0.04, 0.02, 0.01],
    [0.02, 0.09, 0.03],
    [0.01, 0.03, 0.16],
]

detoned = hz.detone_covariance(cov)
# detoned is a list[list[float]] -- the covariance matrix without the market mode
Parameter  | Type              | Description
covariance | list[list[float]] | N x N covariance matrix (square, positive diagonal)
Returns list[list[float]]: the detoned covariance matrix. The trace of the detoned matrix will be smaller than the original (the dominant eigenvalue has been removed).
Detoning is useful for correlation-based clustering (e.g., the clustering step of HRP). When all assets are driven by a common market factor, their correlations are inflated. Removing the market mode makes the idiosyncratic structure more visible, potentially improving cluster quality.
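The mechanics can be illustrated on a 2 x 2 matrix, where the eigendecomposition has a closed form (an illustrative sketch, not Horizon's implementation):

```python
import math

# Symmetric 2 x 2 covariance [[a, b], [b, c]]
a, b, c = 0.04, 0.02, 0.09

# Closed-form eigenvalues of a 2 x 2 symmetric matrix
tr, det = a + c, a * c - b * b
disc = math.sqrt(tr * tr - 4.0 * det)
lam_big, lam_small = (tr + disc) / 2.0, (tr - disc) / 2.0

# Unit eigenvector for the smaller eigenvalue (assumes b != 0)
vx, vy = b, lam_small - a
norm = math.hypot(vx, vy)
vx, vy = vx / norm, vy / norm

# Detoned matrix keeps only the smaller eigenvalue's contribution,
# lam_small * v v^T: the dominant "market" mode is zeroed out.
detoned = [
    [lam_small * vx * vx, lam_small * vx * vy],
    [lam_small * vy * vx, lam_small * vy * vy],
]
# The trace shrinks from tr to lam_small, as noted above.
```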

Full Pipeline

Combine denoising with HRP for robust portfolio allocation from noisy return data.
import horizon as hz

# 1. Compute sample covariance from returns
# returns_matrix: T observations x N assets
returns_matrix = [
    [0.01, -0.02, 0.005, 0.008],
    [0.02,  0.01, -0.01, 0.003],
    [-0.01, 0.03, 0.02, -0.005],
    [0.005, -0.01, 0.01, 0.012],
    # ... more observations
]

T = len(returns_matrix)
N = len(returns_matrix[0])

# Compute sample covariance (or use numpy)
means = [sum(r[j] for r in returns_matrix) / T for j in range(N)]
cov = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(N):
        cov[i][j] = sum(
            (r[i] - means[i]) * (r[j] - means[j])
            for r in returns_matrix
        ) / (T - 1)

# 2. Denoise the covariance matrix
denoised = hz.denoise_covariance(cov, n_observations=T)
print(f"Signal eigenvalues: {denoised.n_signals}")
print(f"Noise eigenvalues: {denoised.n_noise}")

# 3. Compute HRP weights on the denoised covariance
result = hz.hrp_weights(denoised.covariance)
for i, w in enumerate(result.weights):
    print(f"Asset {i}: {w:.2%}")

Pipeline with Detoning

import horizon as hz

# For clustering purposes, detone first to remove market factor,
# then run HRP on the original (or denoised) covariance

cov = [
    [0.04, 0.03, 0.001, 0.001],
    [0.03, 0.04, 0.001, 0.001],
    [0.001, 0.001, 0.09, 0.06],
    [0.001, 0.001, 0.06, 0.09],
]

# Option A: Standard HRP
standard = hz.hrp_weights(cov)

# Option B: Denoise then HRP
denoised = hz.denoise_covariance(cov, n_observations=200)
denoised_hrp = hz.hrp_weights(denoised.covariance)

# Option C: Detone (for analysis)
detoned = hz.detone_covariance(cov)

print("Standard HRP weights:", [f"{w:.3f}" for w in standard.weights])
print("Denoised HRP weights:", [f"{w:.3f}" for w in denoised_hrp.weights])

Combining with Fractional Differentiation

Use fractional differentiation to create stationary return features, then compute covariance for HRP:
import horizon as hz

# Multiple asset price series
asset_prices = {
    "btc": [...],
    "eth": [...],
    "sol": [...],
}

# Fractionally differentiate each series
stationary = {}
for name, prices in asset_prices.items():
    d_star, _ = hz.min_frac_diff(prices, max_d=1.0, n_steps=20)
    stationary[name] = hz.frac_diff_ffd(prices, d=d_star)
    print(f"{name}: d*={d_star:.3f}, length={len(stationary[name])}")

# Trim to common length
min_len = min(len(s) for s in stationary.values())
names = list(stationary.keys())
returns_matrix = []
for t in range(min_len):
    row = [stationary[name][t] for name in names]
    returns_matrix.append(row)

# Compute covariance and run HRP
T = len(returns_matrix)
N = len(names)
means = [sum(r[j] for r in returns_matrix) / T for j in range(N)]
cov = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(N):
        cov[i][j] = sum(
            (r[i] - means[i]) * (r[j] - means[j])
            for r in returns_matrix
        ) / (T - 1)

denoised = hz.denoise_covariance(cov, n_observations=T)
result = hz.hrp_weights(denoised.covariance)

for name, w in zip(names, result.weights):
    print(f"{name}: {w:.2%}")

Mathematical Background

HRP proceeds in four steps:
  1. Correlation distance: d(i,j) = sqrt(0.5 * (1 - corr(i,j))). Perfectly correlated assets have distance 0; uncorrelated assets have distance sqrt(0.5).
  2. Single-linkage clustering: agglomerative clustering where the distance between two clusters is the minimum distance between any pair of their members. This builds a dendrogram (tree) of N-1 merges.
  3. Quasi-diagonalization: traverse the dendrogram to produce a leaf ordering where correlated assets are adjacent. This is the seriation step.
  4. Recursive bisection: split the sorted asset list in half. For each half, compute the cluster variance (w’ * Sigma * w using inverse-variance weights within the cluster). Allocate between halves inversely proportional to their cluster variances. Recurse until single assets remain.
The result is a portfolio that allocates more weight to lower-variance clusters and respects the hierarchical correlation structure, without ever inverting a matrix.
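The final step, recursive bisection, can be condensed into a short pure-Python sketch (illustrative only; Horizon's Rust implementation also performs the clustering and seriation that produce the `order` argument, which here is simply the identity):

```python
def inverse_variance_weights(cov, idx):
    # Inverse-variance weights within a cluster of asset indices.
    inv = [1.0 / cov[i][i] for i in idx]
    total = sum(inv)
    return [v / total for v in inv]

def cluster_variance(cov, idx):
    # w' Sigma w using inverse-variance weights on the sub-matrix.
    w = inverse_variance_weights(cov, idx)
    return sum(w[a] * w[b] * cov[i][j]
               for a, i in enumerate(idx)
               for b, j in enumerate(idx))

def recursive_bisection(cov, order):
    # Split the seriated list in half, allocate between halves
    # inversely proportional to cluster variance, then recurse.
    weights = {i: 1.0 for i in order}
    clusters = [list(order)]
    while clusters:
        nxt = []
        for cl in clusters:
            if len(cl) < 2:
                continue
            mid = len(cl) // 2
            left, right = cl[:mid], cl[mid:]
            v_l = cluster_variance(cov, left)
            v_r = cluster_variance(cov, right)
            alpha = 1.0 - v_l / (v_l + v_r)  # lower variance -> more weight
            for i in left:
                weights[i] *= alpha
            for i in right:
                weights[i] *= 1.0 - alpha
            nxt += [left, right]
        clusters = nxt
    return [weights[i] for i in sorted(weights)]

cov = [
    [0.04, 0.02, 0.01],
    [0.02, 0.09, 0.03],
    [0.01, 0.03, 0.16],
]
w = recursive_bisection(cov, [0, 1, 2])
# Lowest-variance asset 0 receives the largest weight
```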
For a T x N random matrix with i.i.d. entries, the eigenvalues of the sample covariance matrix follow the Marcenko-Pastur distribution as T, N -> infinity with q = N/T fixed. The support of this distribution is [lambda_-, lambda_+], where:
  • lambda_+ = sigma^2 * (1 + sqrt(q))^2
  • lambda_- = sigma^2 * (1 - sqrt(q))^2
Eigenvalues above lambda_+ carry genuine signal about the correlation structure. Eigenvalues within [lambda_-, lambda_+] are consistent with random noise. Denoising replaces noise eigenvalues with their average, preserving the total variance (trace) while reducing spurious correlations.
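The clipping step itself is simple. Given a spectrum sorted descending and the bound lambda_+, a sketch (not Horizon's code) looks like:

```python
def clip_noise_eigenvalues(eigenvalues, lam_plus):
    # Keep eigenvalues above lambda_+; replace those at or below it with
    # their average, so the total variance (trace) is preserved.
    signal = [l for l in eigenvalues if l > lam_plus]
    noise = [l for l in eigenvalues if l <= lam_plus]
    avg = sum(noise) / len(noise) if noise else 0.0
    return signal + [avg] * len(noise)

eigs = [2.0, 0.9, 0.4, 0.4, 0.3]  # hypothetical spectrum, sorted descending
clipped = clip_noise_eigenvalues(eigs, lam_plus=1.0)
# -> [2.0, 0.5, 0.5, 0.5, 0.5]; the trace (4.0) is unchanged
```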
The first principal component (largest eigenvalue) typically captures the market factor: the common movement that drives all assets together. Removing it by zeroing out the largest eigenvalue and reconstructing the matrix reveals the residual (idiosyncratic) correlation structure. This is useful when the market factor inflates correlations and obscures the true clustering of assets. After detoning, assets that move together for idiosyncratic reasons (sector, geography) become more distinguishable.
Horizon implements the classical cyclic Jacobi eigenvalue algorithm for real symmetric matrices. For each off-diagonal element above a tolerance, a Givens rotation is applied to zero it out. The algorithm converges for any symmetric matrix and is numerically stable. This avoids external LAPACK/BLAS dependencies, keeping the Rust binary self-contained and portable across platforms.
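A minimal Python sketch of the cyclic Jacobi iteration illustrates the idea (Horizon's actual implementation is in Rust; this is not its code):

```python
import math

def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    # Cyclic Jacobi for a real symmetric matrix: sweep over every
    # off-diagonal element and zero it with a Givens rotation; the
    # diagonal converges to the eigenvalues.
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2
                            for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < tol:
                    continue
                # Rotation angle that zeroes a[p][q]
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[p][p] - a[q][q])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk + s * aqk
                    a[q][k] = -s * apk + c * aqk
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp + s * akq
                    a[k][q] = -s * akp + c * akq
    return sorted((a[i][i] for i in range(n)), reverse=True)

print(jacobi_eigenvalues([[2.0, 1.0], [1.0, 2.0]]))  # approx [3.0, 1.0]
```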
The covariance matrix must be square with non-negative diagonal entries. Passing a non-square matrix, an empty matrix, or a matrix with negative diagonal entries will raise a ValueError.