KL-minimizing Sparse GP Approximations

Sparse GMRF approximations to Gaussian processes defined by kernel (covariance) matrices using KL-divergence minimizing sparse Cholesky factorization.

These functions enable efficient large-scale GP inference by approximating a dense kernel matrix with a sparse precision matrix using spatially-informed Cholesky factorization that minimizes the Kullback-Leibler divergence.

Overview

The KL-divergence minimizing sparse Cholesky approximation constructs a sparse GMRF from a kernel matrix through a four-stage algorithm that exploits spatial structure to determine which precision matrix entries can be safely set to zero.

1. Reverse Maximin Ordering

The algorithm begins by computing a reverse maximin ordering of the input points. This ordering selects points greedily to maximize the distance to previously selected points, creating a natural hierarchical structure where similar points tend to appear consecutively. The result is a permutation vector P that reorders the points for optimal sparsity.

2. Sparsity Pattern Construction

Based on the ordering and spatial locations, the algorithm determines a sparsity pattern using a neighborhood radius ρ × ℓᵢ, where ℓᵢ is the maximin distance for point i. This radius defines which matrix entries will be nonzero. Larger ρ values create denser (and more accurate) approximations at the cost of increased computational expense.

3. Supernodal Clustering (Optional, Default)

By default, the algorithm uses supernodal clustering to group columns with similar sparsity patterns into "supernodes". Rather than solving many small linear systems (one per column), the algorithm solves fewer but larger systems (one per supernode). This tends to be much more efficient since larger matrix operations can take advantage of optimized BLAS routines.

The clustering parameter λ (typically 1.5) controls the similarity threshold: columns are grouped together if their maximin distances differ by less than this factor.

4. Cholesky Factorization

The algorithm fills in the sparse Cholesky factor L such that L * L' ≈ (PKP')⁻¹, where K is the covariance matrix. For each supernode (or individual column when not using supernodes), the algorithm extracts the relevant submatrix of the covariance, solves a small dense Cholesky problem, and fills in the corresponding entries of L. The result is a GMRF with sparse precision matrix Q ≈ K⁻¹ that enables efficient inference.

Algorithm Parameters

The algorithm has two main tuning parameters:

The sparsity radius parameter ρ (default: 2.0) controls the neighborhood size for determining sparsity. Larger values create denser approximations with better accuracy but higher computational cost. Typical values range from 1.5 to 3.0, with 2.0-2.5 being a good balanced choice for most applications.

The supernodal clustering threshold λ (default: 1.5) controls how columns are grouped into supernodes. Setting it to nothing disables supernodal clustering entirely, which can be useful for very small problems or debugging. Higher values create larger supernodes, which can improve performance if the problem structure is favorable.

Main Functions

GaussianMarkovRandomFields.approximate_gmrf_kl Function

julia

approximate_gmrf_kl(kernel_mat::AbstractMatrix, X::AbstractMatrix; ρ = 2.0, λ = 1.5, alg = LinearSolve.CHOLMODFactorization())

Construct a sparse GMRF approximation from a kernel (covariance) matrix.

Given a kernel matrix K (covariance matrix) obtained by evaluating a kernel function at input points X, construct a sparse Gaussian Markov Random Field (GMRF) that approximates the corresponding Gaussian process.

Arguments

kernel_mat::AbstractMatrix: Kernel (covariance) matrix
X::AbstractMatrix: Input point locations (d × n matrix where d is spatial dimension, n is number of points)

Keyword Arguments

ρ::Real = 2.0: Sparsity pattern radius parameter. Larger values create denser, more accurate approximations. Typical range: [1.5, 3.0].
λ::Union{Real,Nothing} = 1.5: Supernodal clustering parameter. If nothing, uses standard column-by-column factorization. If a number, uses supernodal clustering (typically 1.5). Supernodal factorization is usually faster due to improved cache efficiency.
alg = LinearSolve.CHOLMODFactorization(): LinearSolve algorithm for the resulting GMRF

Returns

::GMRF: A sparse GMRF with zero mean and sparse precision matrix approximating K⁻¹

Examples

julia

using KernelFunctions, GaussianMarkovRandomFields

# Create spatial grid
X = hcat([[x, y] for x in 0:0.1:1, y in 0:0.1:1]...)

# Define kernel and compute kernel matrix
kernel = with_lengthscale(Matern32Kernel(), 0.2)
K = kernelmatrix(kernel, X, obsdim=2)

# Create sparse GMRF approximation (uses supernodal by default)
gmrf = approximate_gmrf_kl(K, X; ρ=2.0)

# Use non-supernodal version
gmrf_nonsupernodal = approximate_gmrf_kl(K, X; ρ=2.0, λ=nothing)

# Use for inference
posterior = linear_condition(gmrf; A=A, Q_ϵ=Q_ϵ, y=y)

source

GaussianMarkovRandomFields.sparse_approximate_cholesky Function

julia

sparse_approximate_cholesky(Θ::AbstractMatrix, X::AbstractMatrix; ρ = 2.0, λ = 1.5)

Compute sparse approximate Cholesky factorization with automatic sparsity pattern.

Given a covariance matrix Θ and spatial input points X, compute a sparse approximate Cholesky factor L and permutation P such that L * L' ≈ (Θ[P, P])⁻¹.

Arguments

Θ::AbstractMatrix: Covariance matrix to be inverted approximately
X::AbstractMatrix: Input point locations (d × n matrix where d is spatial dimension, n is number of points)

Keyword Arguments

ρ::Real = 2.0: Sparsity pattern radius parameter. Controls the neighborhood size used to determine sparsity. Larger values create denser (more accurate) approximations. Typical values are in the range [1.5, 3.0].
λ::Union{Real,Nothing} = 1.5: Supernodal clustering parameter. If nothing, uses standard column-by-column factorization. If a number, uses supernodal clustering with the given threshold (typically 1.5). Supernodal factorization groups nearby columns together for improved cache efficiency and is usually faster.

Returns

L::SparseMatrixCSC: Lower-triangular sparse Cholesky factor in permuted coordinates
P::Vector{Int}: Permutation vector from the reverse maximin ordering

Details

The permutation P reorders the points to achieve better sparsity. The returned factor L satisfies L * L' ≈ (Θ[P, P])⁻¹, or equivalently (L * L')[invperm(P), invperm(P)] ≈ Θ⁻¹.

When λ !== nothing, the algorithm uses supernodal clustering to group columns with similar sparsity patterns, solving larger dense systems less frequently rather than many small systems. This typically improves performance through better cache locality.

source

GaussianMarkovRandomFields.sparse_approximate_cholesky! Function

julia

sparse_approximate_cholesky!(Θ::AbstractMatrix, L::SparseMatrixCSC)

In-place computation of sparse approximate Cholesky factorization.

Fill the nonzero values of the lower-triangular sparse matrix L such that L * L' ≈ Θ⁻¹, where Θ is a covariance matrix and L * L' is the approximate precision (inverse covariance) matrix.

This function uses the sparsity pattern already defined in L and fills in the values using local Cholesky decompositions at each step. The approximation quality depends on the sparsity pattern of L.

Arguments

Θ::AbstractMatrix: Covariance matrix to be inverted approximately
L::SparseMatrixCSC: Sparse lower-triangular matrix with predefined sparsity pattern. Values will be overwritten.

Details

The algorithm proceeds column-by-column through L, solving local linear systems using dense Cholesky factorizations on small submatrices of Θ. A small regularization term (1e-6 * I) is added for numerical stability.

source

Supporting Functions

Point Ordering

GaussianMarkovRandomFields.reverse_maximin_ordering Function

julia

reverse_maximin_ordering(X::AbstractMatrix; point_tree = KDTree(X))

Compute a reverse maximin ordering of spatial points.

The reverse maximin ordering selects points greedily to maximize the distance to previously selected points. This creates a hierarchical structure where similar points appear consecutively in the ordering, which is beneficial for sparse Cholesky factorization.

Arguments

X::AbstractMatrix: Input point locations (d × n matrix where d is spatial dimension)

Keyword Arguments

point_tree = KDTree(X): Pre-computed KDTree for efficient nearest neighbor queries

Returns

P::Vector{Int}: Permutation vector (ordering of points)
ℓ::Vector{Float64}: Maximin distances for each point

Details

The algorithm maintains a priority queue of unselected points, ordered by their distance to the nearest selected point. At each step, it selects the point furthest from all previously selected points.

source

GaussianMarkovRandomFields.reverse_maximin_ordering_and_sparsity_pattern Function

julia

reverse_maximin_ordering_and_sparsity_pattern(X::AbstractMatrix, ρ::Real; lower = true)

Compute reverse maximin ordering and determine the sparsity pattern for sparse Cholesky factorization.

This function combines the reverse maximin ordering with sparsity pattern construction based on spatial neighborhoods. The resulting sparsity pattern determines which entries of the Cholesky factor will be nonzero.

Arguments

X::AbstractMatrix: Input point locations (d × n matrix where d is spatial dimension)
ρ::Real: Neighborhood radius multiplier for sparsity pattern construction

Keyword Arguments

lower::Bool = true: If true, return lower triangular pattern; otherwise upper triangular

Returns

S::SparseMatrixCSC: Sparsity pattern matrix (all nonzeros are 0.0, structure only)
P::Vector{Int}: Permutation vector from reverse maximin ordering
ℓ::Vector{Float64}: Maximin distances for each point

Details

For each point i, the algorithm includes entries in the sparsity pattern for all neighbors within distance ρ × ℓᵢ, where ℓᵢ is the maximin distance for point i. Larger ρ values create denser (and more accurate) approximations.

source

Supernodal Clustering

GaussianMarkovRandomFields.SupernodeClustering Type

julia

SupernodeClustering

Represents a clustering of matrix columns into supernodes for sparse Cholesky factorization.

Fields

column_indices::Vector{Vector{Int}}: For each supernode, the column indices belonging to it
row_indices::Vector{SortedSet{Int}}: For each supernode, the row indices in its sparsity pattern

Supernodes group columns with similar sparsity patterns to enable more efficient factorization through larger dense linear algebra operations.

source

GaussianMarkovRandomFields.form_supernodes Function

julia

form_supernodes(S::SparseMatrixCSC, P, ℓ; λ = 1.5)

Cluster columns of a sparse matrix into supernodes based on maximin distances.

Groups columns with similar sparsity patterns and nearby maximin distances into "supernodes" for more efficient sparse Cholesky factorization.

Arguments

S::SparseMatrixCSC: Sparse sparsity pattern matrix
P::Vector{Int}: Permutation vector from reverse maximin ordering
ℓ::Vector{Float64}: Maximin distances for each point

Keyword Arguments

λ::Real = 1.5: Clustering threshold. Columns are grouped if their maximin distances differ by less than this factor. Larger values create larger supernodes.

Returns

SupernodeClustering: The supernodal clustering structure

Details

The algorithm processes columns sequentially in the permuted order. For each unassigned column, it creates a new supernode and assigns all unassigned child columns (in the sparsity pattern) whose maximin distances satisfy ℓ[i] <= λ * ℓ[j].

source

The form_supernodes function takes the sparsity pattern and maximin ordering and creates a supernodal clustering. This is used internally when λ !== nothing.

Matrix Utilities

GaussianMarkovRandomFields.PermutedMatrix Type

julia

PermutedMatrix{T, M <: AbstractMatrix{T}} <: AbstractMatrix{T}

A memory-efficient wrapper that represents a symmetrically permuted matrix without materializing it. For a matrix A and permutation vector p, PermutedMatrix(A, p) represents A[p, p] without forming it explicitly.

This is particularly useful for accessing elements of permuted matrices in sparse matrix algorithms without the memory overhead of creating a full permuted copy.

Type Parameters

T: Element type of the matrix
M: Type of the underlying matrix (must be <: AbstractMatrix{T})

Fields

matrix::M: The underlying matrix
perm::Vector{Int}: Permutation vector (must have length equal to matrix dimensions)

Examples

julia

A = [1 2 3; 4 5 6; 7 8 9]
p = [3, 1, 2]
PA = PermutedMatrix(A, p)

# Access individual elements (applies permutation to both indices)
PA[1, 1] == A[3, 3]  # true
PA[1, 2] == A[3, 1]  # true

# Access blocks
PA[1:2, 1:2] == A[[3, 1], [3, 1]]  # true

# Convert to explicit matrix
to_matrix(PA) == A[p, p]  # true

source

The PermutedMatrix wrapper enables efficient access to permuted covariance matrices without materializing the full permuted matrix in memory.

Performance Considerations

When to Use Supernodal Factorization

The default supernodal factorization (λ=1.5) is recommended for most use cases, as it's typically faster and more accurate. You should only consider disabling it (λ=nothing) for very small problems (fewer than 100 points), extremely memory-constrained environments, or when debugging or comparing against a reference implementation.

KL-minimizing Sparse GP Approximations ​

Overview ​

1. Reverse Maximin Ordering ​

2. Sparsity Pattern Construction ​

3. Supernodal Clustering (Optional, Default) ​

4. Cholesky Factorization ​

Algorithm Parameters ​

Main Functions ​

Supporting Functions ​

Point Ordering ​

Supernodal Clustering ​

Matrix Utilities ​

Performance Considerations ​

When to Use Supernodal Factorization ​

See Also ​

KL-minimizing Sparse GP Approximations

Overview

1. Reverse Maximin Ordering

2. Sparsity Pattern Construction

3. Supernodal Clustering (Optional, Default)

4. Cholesky Factorization

Algorithm Parameters

Main Functions

Supporting Functions

Point Ordering

Supernodal Clustering

Matrix Utilities

Performance Considerations

When to Use Supernodal Factorization

See Also