Learning GMRFs from data with the graphical lasso
Introduction
In many applications, we observe data that we believe has been generated by a process with sparse conditional independence structure, but we do not know the exact precision matrix. The graphical lasso lets us learn a GMRF directly from sample data by estimating a sparse precision matrix.
The graphical lasso solves the following optimization problem:
where
The implementation follows [4], combining soft-thresholding of the sample covariance with maximum-determinant positive-definite matrix completion via chordal graph techniques.
Synthetic example
Let's generate data from a known sparse precision matrix and see if we can recover it.
using GaussianMarkovRandomFields
using LinearAlgebra
using SparseArrays
using Random
rng = Random.MersenneTwister(42)Random.MersenneTwister(42)We construct a diagonally dominant sparse precision matrix.
n = 200
A = sprand(rng, n, n, 0.02)
A = A + A'
for i in 1:n
A[i, i] = sum(abs, A[:, i]) + 1.0
end
M = Hermitian(A, :L)
A[1:50, 1:50]50×50 SparseArrays.SparseMatrixCSC{Float64, Int64} with 144 stored entries:
⎡⠑⢄⠀⠀⡀⠀⠀⠀⠀⠀⠂⠀⠀⠐⠀⠀⠀⠄⠠⠘⠀⠂⠀⠀⠀⎤
⎢⠀⠀⠑⢄⠀⠀⠠⠀⠀⠀⠀⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠠⠀⠀⠀⎥
⎢⠀⠈⠀⠀⠑⢄⠀⠘⢀⠀⠀⠢⠀⠀⠀⠀⠀⡀⠀⠀⠐⠀⡄⠀⠀⎥
⎢⠀⠀⠀⠂⣀⠀⠑⢄⢀⠆⠀⠀⠈⡁⠀⠀⢐⠀⠀⠀⠀⠠⠀⠁⠀⎥
⎢⠀⠀⠀⠀⠀⠐⠠⠔⠑⢄⠀⠀⠀⠃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠈⠀⠀⠀⠠⡀⠀⠀⠀⠀⢑⢔⠀⠀⡀⠀⠀⠀⢀⠀⠀⠀⠀⠀⠀⎥
⎢⢀⠀⠂⠀⠀⠀⠆⠠⠤⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠈⠀⡐⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠕⢅⠀⡀⢀⡀⠀⠀⠀⠀⠀⎥
⎢⠀⠄⠀⠀⠀⠠⠐⠐⠀⠀⠀⠀⠀⠀⠀⠠⠑⢄⠀⠀⠂⠀⠀⠐⠀⎥
⎢⣀⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠐⠀⠀⠀⠰⠀⠀⠑⢄⠀⠀⠀⠀⠀⎥
⎢⠠⠀⠀⡀⠐⠀⠀⡀⠀⠀⠀⠀⠂⠀⠀⠀⠈⠀⠀⠀⠑⢄⠁⠀⠠⎥
⎢⠀⠀⠀⠀⠀⠉⠄⠀⠀⠀⠀⠀⠐⠈⠀⠀⢀⠀⠀⠀⠁⠀⠑⢄⠄⎥
⎣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂⠀⠁⠑⎦Now we draw samples from the corresponding Gaussian distribution.
m = 2000
X = copy(transpose(cholesky(M).L \ randn(rng, n, m)))2000×200 Matrix{Float64}:
1.2188 0.433293 0.0872692 … 0.16227 -0.27477 0.0269097
-0.23143 -0.26113 0.49399 1.37409 0.615275 -0.715074
0.195675 0.834869 0.616594 0.873725 0.420221 -1.17794
0.748327 0.108176 6.14118e-5 -1.11389 0.844318 -0.537801
0.244221 -0.328863 -1.52851 0.428738 -0.156548 -0.214896
-0.563765 0.714826 0.672041 … -0.213614 0.218066 0.293428
-0.42586 -0.14288 -0.677766 -0.0879707 -0.451334 0.742756
0.150954 0.28194 -0.0537366 -0.272615 0.657773 0.571332
-0.229917 -0.587199 -0.0851724 -1.08565 0.195235 -0.739545
-0.0103992 0.716158 0.479093 -0.0407661 -0.556104 0.0179981
⋮ ⋱
0.0566804 -0.156652 -0.277375 -0.11705 -0.60389 0.372713
-0.189261 -0.190188 -0.0844896 0.133053 -0.182426 0.403681
0.0522705 -0.473272 -1.28802 -0.177731 0.071729 0.699619
0.567949 0.620832 -0.581934 0.224797 0.379279 -0.454019
-0.0505654 -0.601106 -0.115571 … -0.929847 -0.296611 0.105872
-0.175869 -1.07099 0.596626 -0.341803 0.234606 0.313964
1.01558 0.394615 -0.849781 0.836315 -0.132832 -0.634821
-0.028868 0.223255 0.319741 0.478567 0.81137 -0.499558
0.273038 0.0938683 -0.428094 1.50992 0.340255 -0.666665Estimating the precision matrix
We apply graphical_lasso with a scalar threshold
lambda = 0.03
gmrf = graphical_lasso(X, lambda)GMRF{Float64} with 200 variables
Algorithm: LinearSolve.DefaultLinearSolver
Mean: [0.014636114770724858, -0.017547967132351957, -0.012580255011453947, ..., -0.011750710929450906, -0.0010391584050512264, -0.014889413925887158]
Q_sqrt: not availableThe result is a GMRF whose precision matrix is a sparse estimate of the true precision.
P_est = precision_matrix(gmrf)
P_est[1:50, 1:50]50×50 SparseArrays.SparseMatrixCSC{Float64, Int64} with 170 stored entries:
⎡⠛⢄⠀⠀⠀⢀⠈⠈⠀⠀⠀⠀⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎤
⎢⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⢀⠀⠀⢱⣶⢰⠀⠠⡆⠀⠀⡂⠀⠀⠀⠀⠀⠀⠀⣒⠀⠀⠀⢐⎥
⎢⡂⠀⠀⠀⠐⠒⠱⢆⠘⠀⠀⠀⠂⠀⠀⠀⠀⠀⠀⠀⠒⠀⠂⠀⠐⎥
⎢⠀⠀⠀⠀⠠⠦⠒⠀⠑⢄⠀⠀⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠅⠀⠀⠀⠀⠀⠀⠀⡀⠂⠀⠀⠀⎥
⎢⠁⠀⠀⠀⠈⠈⠈⠀⡀⠀⠁⠁⠑⢄⡠⠀⠀⠀⠀⠁⠩⠀⡀⠀⠈⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠊⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡕⢍⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄⠀⠀⠀⠀⠀⢑⢔⡢⡂⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠘⠘⠘⠀⠀⠀⠠⠈⠃⠂⠀⠀⠀⠀⠨⠪⢛⢔⠀⠀⠘⎥
⎢⠀⠀⠀⠀⠀⠀⠈⠀⠁⠀⠀⠀⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⢈⎥
⎣⠀⠀⠀⠀⠐⠐⠐⠀⠀⠀⠀⠀⠂⠀⠀⠀⠀⠀⠀⠀⠒⠀⠂⠐⠛⎦Restricted graphical lasso
If we have prior knowledge about the sparsity pattern (e.g., from a graph structure), we can pass a sparse matrix as the threshold. The algorithm then only searches for precision matrices within that sparsity pattern, with per-entry penalty values.
Lambda = copy(A)
Lambda.nzval .= lambda
gmrf_restricted = graphical_lasso(X, Lambda)
P_restricted = precision_matrix(gmrf_restricted)
P_restricted[1:50, 1:50]50×50 SparseArrays.SparseMatrixCSC{Float64, Int64} with 50 stored entries:
⎡⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎤
⎢⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⎥
⎣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⎦This page was generated using Literate.jl.