KL-minimizing Sparse GP Approximations
Sparse GMRF approximations to Gaussian processes defined by kernel (covariance) matrices using KL-divergence minimizing sparse Cholesky factorization.
These functions enable efficient large-scale GP inference by approximating a dense kernel matrix with a sparse precision matrix using spatially-informed Cholesky factorization that minimizes the Kullback-Leibler divergence.
Overview
The KL-divergence minimizing sparse Cholesky approximation constructs a sparse GMRF from a kernel matrix through a four-stage algorithm that exploits spatial structure to determine which precision matrix entries can be safely set to zero.
1. Reverse Maximin Ordering
The algorithm begins by computing a reverse maximin ordering of the input points. This ordering selects points greedily to maximize the distance to previously selected points, creating a natural hierarchical structure where similar points tend to appear consecutively. The result is a permutation vector P that reorders the points for optimal sparsity.
2. Sparsity Pattern Construction
Based on the ordering and spatial locations, the algorithm determines a sparsity pattern using a neighborhood radius ρ × ℓᵢ, where ℓᵢ is the maximin distance for point i. This radius defines which matrix entries will be nonzero. Larger ρ values create denser (and more accurate) approximations at the cost of increased computational expense.
3. Supernodal Clustering (Optional, Default)
By default, the algorithm uses supernodal clustering to group columns with similar sparsity patterns into "supernodes". Rather than solving many small linear systems (one per column), the algorithm solves fewer but larger systems (one per supernode). This tends to be much more efficient since larger matrix operations can take advantage of optimized BLAS routines.
The clustering parameter λ (typically 1.5) controls the similarity threshold: columns are grouped together if their maximin distances differ by less than this factor.
4. Cholesky Factorization
The algorithm fills in the sparse Cholesky factor L such that L * L' ≈ (PKP')⁻¹, where K is the covariance matrix. For each supernode (or individual column when not using supernodes), the algorithm extracts the relevant submatrix of the covariance, solves a small dense Cholesky problem, and fills in the corresponding entries of L. The result is a GMRF with sparse precision matrix Q ≈ K⁻¹ that enables efficient inference.
Algorithm Parameters
The algorithm has two main tuning parameters:
The sparsity radius parameter ρ (default: 2.0) controls the neighborhood size for determining sparsity. Larger values create denser approximations with better accuracy but higher computational cost. Typical values range from 1.5 to 3.0, with 2.0-2.5 being a good balanced choice for most applications.
The supernodal clustering threshold λ (default: 1.5) controls how columns are grouped into supernodes. Setting it to nothing disables supernodal clustering entirely, which can be useful for very small problems or debugging. Higher values create larger supernodes, which can improve performance if the problem structure is favorable.
Main Functions
GaussianMarkovRandomFields.approximate_gmrf_kl Function
approximate_gmrf_kl(kernel_mat::AbstractMatrix, X::AbstractMatrix; ρ = 2.0, λ = 1.5, alg = LinearSolve.CHOLMODFactorization())Construct a sparse GMRF approximation from a kernel (covariance) matrix.
Given a kernel matrix K (covariance matrix) obtained by evaluating a kernel function at input points X, construct a sparse Gaussian Markov Random Field (GMRF) that approximates the corresponding Gaussian process.
Arguments
kernel_mat::AbstractMatrix: Kernel (covariance) matrixX::AbstractMatrix: Input point locations (d × n matrix where d is spatial dimension, n is number of points)
Keyword Arguments
ρ::Real = 2.0: Sparsity pattern radius parameter. Larger values create denser, more accurate approximations. Typical range: [1.5, 3.0].λ::Union{Real,Nothing} = 1.5: Supernodal clustering parameter. Ifnothing, uses standard column-by-column factorization. If a number, uses supernodal clustering (typically 1.5). Supernodal factorization is usually faster due to improved cache efficiency.alg = LinearSolve.CHOLMODFactorization(): LinearSolve algorithm for the resulting GMRF
Returns
::GMRF: A sparse GMRF with zero mean and sparse precision matrix approximatingK⁻¹
Examples
using KernelFunctions, GaussianMarkovRandomFields
# Create spatial grid
X = hcat([[x, y] for x in 0:0.1:1, y in 0:0.1:1]...)
# Define kernel and compute kernel matrix
kernel = with_lengthscale(Matern32Kernel(), 0.2)
K = kernelmatrix(kernel, X, obsdim=2)
# Create sparse GMRF approximation (uses supernodal by default)
gmrf = approximate_gmrf_kl(K, X; ρ=2.0)
# Use non-supernodal version
gmrf_nonsupernodal = approximate_gmrf_kl(K, X; ρ=2.0, λ=nothing)
# Use for inference
posterior = linear_condition(gmrf; A=A, Q_ϵ=Q_ϵ, y=y)See also
sparse_approximate_cholesky, GMRF, linear_condition
GaussianMarkovRandomFields.sparse_approximate_cholesky Function
sparse_approximate_cholesky(Θ::AbstractMatrix, X::AbstractMatrix; ρ = 2.0, λ = 1.5)Compute sparse approximate Cholesky factorization with automatic sparsity pattern.
Given a covariance matrix Θ and spatial input points X, compute a sparse approximate Cholesky factor L and permutation P such that L * L' ≈ (Θ[P, P])⁻¹.
Arguments
Θ::AbstractMatrix: Covariance matrix to be inverted approximatelyX::AbstractMatrix: Input point locations (d × n matrix where d is spatial dimension, n is number of points)
Keyword Arguments
ρ::Real = 2.0: Sparsity pattern radius parameter. Controls the neighborhood size used to determine sparsity. Larger values create denser (more accurate) approximations. Typical values are in the range [1.5, 3.0].λ::Union{Real,Nothing} = 1.5: Supernodal clustering parameter. Ifnothing, uses standard column-by-column factorization. If a number, uses supernodal clustering with the given threshold (typically 1.5). Supernodal factorization groups nearby columns together for improved cache efficiency and is usually faster.
Returns
L::SparseMatrixCSC: Lower-triangular sparse Cholesky factor in permuted coordinatesP::Vector{Int}: Permutation vector from the reverse maximin ordering
Details
The permutation P reorders the points to achieve better sparsity. The returned factor L satisfies L * L' ≈ (Θ[P, P])⁻¹, or equivalently (L * L')[invperm(P), invperm(P)] ≈ Θ⁻¹.
When λ !== nothing, the algorithm uses supernodal clustering to group columns with similar sparsity patterns, solving larger dense systems less frequently rather than many small systems. This typically improves performance through better cache locality.
See also
sparse_approximate_cholesky!, approximate_gmrf_kl, reverse_maximin_ordering
GaussianMarkovRandomFields.sparse_approximate_cholesky! Function
sparse_approximate_cholesky!(Θ::AbstractMatrix, L::SparseMatrixCSC)In-place computation of sparse approximate Cholesky factorization.
Fill the nonzero values of the lower-triangular sparse matrix L such that L * L' ≈ Θ⁻¹, where Θ is a covariance matrix and L * L' is the approximate precision (inverse covariance) matrix.
This function uses the sparsity pattern already defined in L and fills in the values using local Cholesky decompositions at each step. The approximation quality depends on the sparsity pattern of L.
Arguments
Θ::AbstractMatrix: Covariance matrix to be inverted approximatelyL::SparseMatrixCSC: Sparse lower-triangular matrix with predefined sparsity pattern. Values will be overwritten.
Details
The algorithm proceeds column-by-column through L, solving local linear systems using dense Cholesky factorizations on small submatrices of Θ. A small regularization term (1e-6 * I) is added for numerical stability.
See also
sparse_approximate_cholesky, approximate_gmrf_kl
Supporting Functions
Point Ordering
GaussianMarkovRandomFields.reverse_maximin_ordering Function
reverse_maximin_ordering(X::AbstractMatrix; point_tree = KDTree(X))Compute a reverse maximin ordering of spatial points.
The reverse maximin ordering selects points greedily to maximize the distance to previously selected points. This creates a hierarchical structure where similar points appear consecutively in the ordering, which is beneficial for sparse Cholesky factorization.
Arguments
X::AbstractMatrix: Input point locations (d × n matrix where d is spatial dimension)
Keyword Arguments
point_tree = KDTree(X): Pre-computed KDTree for efficient nearest neighbor queries
Returns
P::Vector{Int}: Permutation vector (ordering of points)ℓ::Vector{Float64}: Maximin distances for each point
Details
The algorithm maintains a priority queue of unselected points, ordered by their distance to the nearest selected point. At each step, it selects the point furthest from all previously selected points.
See also
reverse_maximin_ordering_and_sparsity_pattern, sparse_approximate_cholesky
GaussianMarkovRandomFields.reverse_maximin_ordering_and_sparsity_pattern Function
reverse_maximin_ordering_and_sparsity_pattern(X::AbstractMatrix, ρ::Real; lower = true)Compute reverse maximin ordering and determine the sparsity pattern for sparse Cholesky factorization.
This function combines the reverse maximin ordering with sparsity pattern construction based on spatial neighborhoods. The resulting sparsity pattern determines which entries of the Cholesky factor will be nonzero.
Arguments
X::AbstractMatrix: Input point locations (d × n matrix where d is spatial dimension)ρ::Real: Neighborhood radius multiplier for sparsity pattern construction
Keyword Arguments
lower::Bool = true: If true, return lower triangular pattern; otherwise upper triangular
Returns
S::SparseMatrixCSC: Sparsity pattern matrix (all nonzeros are 0.0, structure only)P::Vector{Int}: Permutation vector from reverse maximin orderingℓ::Vector{Float64}: Maximin distances for each point
Details
For each point i, the algorithm includes entries in the sparsity pattern for all neighbors within distance ρ × ℓᵢ, where ℓᵢ is the maximin distance for point i. Larger ρ values create denser (and more accurate) approximations.
See also
reverse_maximin_ordering, sparse_approximate_cholesky
Supernodal Clustering
GaussianMarkovRandomFields.SupernodeClustering Type
SupernodeClusteringRepresents a clustering of matrix columns into supernodes for sparse Cholesky factorization.
Fields
column_indices::Vector{Vector{Int}}: For each supernode, the column indices belonging to itrow_indices::Vector{SortedSet{Int}}: For each supernode, the row indices in its sparsity pattern
Supernodes group columns with similar sparsity patterns to enable more efficient factorization through larger dense linear algebra operations.
sourceGaussianMarkovRandomFields.form_supernodes Function
form_supernodes(S::SparseMatrixCSC, P, ℓ; λ = 1.5)Cluster columns of a sparse matrix into supernodes based on maximin distances.
Groups columns with similar sparsity patterns and nearby maximin distances into "supernodes" for more efficient sparse Cholesky factorization.
Arguments
S::SparseMatrixCSC: Sparse sparsity pattern matrixP::Vector{Int}: Permutation vector from reverse maximin orderingℓ::Vector{Float64}: Maximin distances for each point
Keyword Arguments
λ::Real = 1.5: Clustering threshold. Columns are grouped if their maximin distances differ by less than this factor. Larger values create larger supernodes.
Returns
SupernodeClustering: The supernodal clustering structure
Details
The algorithm processes columns sequentially in the permuted order. For each unassigned column, it creates a new supernode and assigns all unassigned child columns (in the sparsity pattern) whose maximin distances satisfy ℓ[i] <= λ * ℓ[j].
See also
SupernodeClustering, sparse_approximate_cholesky
The form_supernodes function takes the sparsity pattern and maximin ordering and creates a supernodal clustering. This is used internally when λ !== nothing.
Matrix Utilities
GaussianMarkovRandomFields.PermutedMatrix Type
PermutedMatrix{T, M <: AbstractMatrix{T}} <: AbstractMatrix{T}A memory-efficient wrapper that represents a symmetrically permuted matrix without materializing it. For a matrix A and permutation vector p, PermutedMatrix(A, p) represents A[p, p] without forming it explicitly.
This is particularly useful for accessing elements of permuted matrices in sparse matrix algorithms without the memory overhead of creating a full permuted copy.
Type Parameters
T: Element type of the matrixM: Type of the underlying matrix (must be <: AbstractMatrix{T})
Fields
matrix::M: The underlying matrixperm::Vector{Int}: Permutation vector (must have length equal to matrix dimensions)
Examples
A = [1 2 3; 4 5 6; 7 8 9]
p = [3, 1, 2]
PA = PermutedMatrix(A, p)
# Access individual elements (applies permutation to both indices)
PA[1, 1] == A[3, 3] # true
PA[1, 2] == A[3, 1] # true
# Access blocks
PA[1:2, 1:2] == A[[3, 1], [3, 1]] # true
# Convert to explicit matrix
to_matrix(PA) == A[p, p] # trueThe PermutedMatrix wrapper enables efficient access to permuted covariance matrices without materializing the full permuted matrix in memory.
Performance Considerations
When to Use Supernodal Factorization
The default supernodal factorization (λ=1.5) is recommended for most use cases, as it's typically faster and more accurate. You should only consider disabling it (λ=nothing) for very small problems (fewer than 100 points), extremely memory-constrained environments, or when debugging or comparing against a reference implementation.
See Also
KL-minimizing Sparse GMRF Approximations to Gaussian Processes tutorial demonstrates the full workflow with examples
[3] for the full paper describing these algorithms in detail