Formula Interface
The formula interface lets you compose models succinctly using StatsModels.jl syntax. It maps terms on the right-hand side to latent components and assembles a sparse design matrix that connects observations to the combined latent field. To use the formula syntax, load StatsModels first to activate the corresponding package extension.
Quick Example
using GaussianMarkovRandomFields, StatsModels, Distributions, SparseArrays
# Suppose W is a spatial adjacency for regions; y are counts with an offset
besag = Besag(W) # structured spatial effect
iid = IID() # unstructured random effect
f = @formula(y ~ 1 + x + iid(region) + besag(region))
comp = build_formula_components(f, data; family = Poisson)
lik = comp.obs_model(data.y; offset = data.logE)
prior = comp.combined_model(; τ_besag=1.0, τ_iid=1.0)
post = gaussian_approximation(prior, lik)Specifying Constraints
Random effect terms support constraint specifications for identifiability and other modeling requirements. This is essential for models like BYM where both spatial and IID effects need sum-to-zero constraints.
Basic Usage
Create functor instances with constraints before using them in formulas:
using GaussianMarkovRandomFields, StatsModels
# Data
data = (
y = randn(20),
hospital = repeat(["A", "B", "C", "D"], 5),
time = collect(1:20)
)
# Create constrained functors
iid_sz = IID(constraint=:sumtozero)
ar1_sz = AR1(constraint=:sumtozero)
rw1 = RandomWalk() # Built-in sum-to-zero
# Use in formula
comp = build_formula_components(
@formula(y ~ 0 + iid_sz(hospital) + ar1_sz(time)),
data;
family = Normal
)
# Result is a ConstrainedGMRF
gmrf = comp.combined_model(τ_iid=1.0, τ_ar1=1.0, ρ_ar1=0.7)BYM Model with Constraints
The BYM (Besag-York-Mollié) model uses sum-to-zero constraints for identifiability:
# Spatial adjacency matrix
W = sparse([...])
# Classic BYM: Both components need sum-to-zero constraints
besag = Besag(W) # Spatial (built-in constraint)
iid_sz = IID(constraint=:sumtozero) # Unstructured (explicit constraint)
comp = build_formula_components(
@formula(y ~ 1 + besag(region) + iid_sz(region)),
data;
family = Poisson
)BYM2 Model (Recommended)
The BYM2 model offers improved parameterization with a single precision τ and mixing parameter φ:
# BYM2 without intercept (no constraint needed)
bym2 = BYM2(W)
comp = build_formula_components(
@formula(y ~ 0 + x + bym2(region)),
data;
family = Poisson
)
gmrf = comp.combined_model(τ_bym2=1.0, φ_bym2=0.5)
# BYM2 with intercept (constrain IID component for identifiability)
bym2_constrained = BYM2(W; iid_constraint=:sumtozero)
comp = build_formula_components(
@formula(y ~ 1 + x + bym2_constrained(region)),
data;
family = Poisson
)
gmrf = comp.combined_model(τ_bym2=1.0, φ_bym2=0.5)Custom Constraints
Specify custom linear constraints as (A, e) tuples where Ax = e:
# Constraint: first two groups have equal effects (x₁ = x₂)
A = [1.0 -1.0 0.0 0.0] # x₁ - x₂ = 0
e = [0.0]
iid_custom = IID(constraint=(A, e))
comp = build_formula_components(@formula(y ~ 0 + iid_custom(group)), data; family=Normal)Constraint Support by Term
IID(constraint=...): Optionalconstraintparameternothing(default): unconstrained:sumtozero: sum-to-zero constraint(A, e): custom constraint matrix and vector
AR1(constraint=...): Same as IIDRandomWalk(additional_constraints=...):Always has built-in sum-to-zero
Optional
additional_constraintsfor extra constraints beyond sum-to-zero
Matern(constraint=...): Same as IIDBesag(...): Always has built-in sum-to-zero constraints per connected componentBYM2(W; iid_constraint=..., additional_constraints=...):Spatial component: always has built-in sum-to-zero per connected component
IID component: optional
iid_constraintparameter (default: unconstrained)- Use
iid_constraint=:sumtozerowhen including a fixed intercept for identifiability
- Use
Optional
additional_constraintsfor spatial component (beyond sum-to-zero)
Unconstrained vs. Constrained
# Unconstrained
iid = IID()
comp1 = build_formula_components(@formula(y ~ 0 + iid(group)), data; family=Normal)
gmrf1 = comp1.combined_model(τ_iid=1.0) # Standard GMRF
# Constrained
iid_sz = IID(constraint=:sumtozero)
comp2 = build_formula_components(@formula(y ~ 0 + iid_sz(group)), data; family=Normal)
gmrf2 = comp2.combined_model(τ_iid=1.0) # ConstrainedGMRFSpatial SPDE Models (Matérn)
For continuous spatial random effects based on the Matérn SPDE, use the Matern functor. It automatically handles mesh generation, FEM discretization, and evaluation matrix construction — all the boilerplate that the manual MaternModel + evaluation_matrix + LinearlyTransformedObservationModel workflow requires.
Basic Usage
using GaussianMarkovRandomFields, StatsModels, Distributions
# Coordinate columns in your DataFrame
data = (
y = randn(100),
x_coord = randn(100),
y_coord = randn(100),
)
# Auto-mesh from coordinates (recommended for most cases)
matern = Matern(smoothness=1)
comp = build_formula_components(
@formula(y ~ 1 + matern(x_coord, y_coord)),
data;
family = Normal
)
# Instantiate the GMRF — τ controls field precision, range controls correlation distance
gmrf = comp.combined_model(τ_matern=1.0, range_matern=2.0)Pre-built Discretization
When you need full control over the mesh (e.g., custom domain, element order), pass a FEMDiscretization directly:
using Ferrite
# Build your own mesh and discretization
disc = FEMDiscretization(grid, Lagrange{RefTriangle, 1}(), QuadratureRule{RefTriangle}(2))
matern = Matern(disc; smoothness=2)
comp = build_formula_components(
@formula(y ~ 1 + matern(x_coord, y_coord)),
data;
family = Normal
)Configuration Options
smoothness: Matérn smoothness parameter (integer ≥ 0, default: 1)element_order: FEM element order for auto-mesh (default: 1)alg: LinearSolve algorithm (default:CHOLMODFactorization())constraint: Optional constraint (e.g.,:sumtozero,(A, e))
Hyperparameters
The Matérn formula term exposes two hyperparameters:
τ_matern: Field precision (τ > 0) controlling marginal variance (~1/τ)range_matern: Range parameter (range > 0) controlling spatial correlation distance
When to Use Manual vs Formula
Use formula (Matern) | Use manual (MaternModel) |
|---|---|
| INLA-style inference pipelines | Custom meshes with boundary conditions |
| Quick spatial modeling | Prediction at new locations via VTK |
| Combined with other formula terms | Need direct access to discretization |
For the manual workflow, see the Spatial Modelling with SPDEs tutorial.
Separable Models (Kronecker Products)
For multi-dimensional processes with separable structure, use the Separable functor to compose multiple random effects via Kronecker products. This is particularly useful for space-time models where the precision matrix is Q = Q_time ⊗ Q_space with space varying fastest.
Basic Space-Time Example
using GaussianMarkovRandomFields, StatsModels, SparseArrays
# Create data with space-time structure
n_time, n_space = 10, 5
data = (
y = randn(n_time * n_space),
time = repeat(1:n_time, outer=n_space),
region = repeat(1:n_space, inner=n_time),
)
# Spatial adjacency (e.g., a chain or grid)
W = spzeros(n_space, n_space)
for i in 1:(n_space-1)
W[i, i+1] = 1
W[i+1, i] = 1
end
# Create separable model: Q = Q_time ⊗ Q_space
# RW1 for temporal smoothing, Besag for spatial smoothing
rw1 = RandomWalk() # Temporal: RW1 with built-in sum-to-zero
besag = Besag(W) # Spatial: Besag (intrinsic CAR)
st = Separable(rw1, besag) # Separable: Kronecker product
# Build formula
comp = build_formula_components(
@formula(y ~ 1 + st(time, region)),
data;
family = Normal
)
# Instantiate GMRF
gmrf = comp.combined_model(τ_rw1_separable=1.0, τ_besag_separable=2.0)The resulting precision matrix has a block-tridiagonal structure:
Q = Q_time ⊗ Q_space = [
Q_space -Q_space 0 0 ...
-Q_space 2Q_space -Q_space 0 ...
0 -Q_space 2Q_space -Q_space ...
...
]Constraint Composition
Constraints from each component are automatically composed and redundant constraints removed:
# Both RW1 (sum-to-zero in time) and Besag (sum-to-zero in space)
# produce constraints that are automatically handled
rw1 = RandomWalk()
besag = Besag(W)
st = Separable(rw1, besag)
comp = build_formula_components(@formula(y ~ 0 + st(time, region)), data; family=Normal)
# The resulting GMRF handles both constraints correctly with redundancy removal
gmrf = comp.combined_model(τ_rw1_separable=1.0, τ_besag_separable=1.0)N-way Separable Models
Extend beyond 2D with three or more components:
# 3D space-time-group model: Q = Q_time ⊗ Q_space ⊗ Q_group
n_group = 3
data_3d = (
y = randn(n_time * n_space * n_group),
time = repeat(repeat(1:n_time, outer=n_space), outer=n_group),
region = repeat(repeat(1:n_space, inner=n_time), outer=n_group),
group = repeat(1:n_group, inner=n_time*n_space),
)
rw1 = RandomWalk()
besag = Besag(W)
iid_group = IID()
# Separable with 3 components
sep3 = Separable(rw1, besag, iid_group)
comp = build_formula_components(
@formula(y ~ 0 + sep3(time, region, group)),
data_3d;
family = Normal
)
gmrf = comp.combined_model(
τ_rw1_separable = 1.0,
τ_besag_separable = 1.0,
τ_iid_separable = 0.5
)Component Ordering
The order of components in Separable determines the Kronecker order:
Separable(rw1, besag)→ Q = Q_rw1 ⊗ Q_besag (besag varies fastest)Separable(besag, rw1)→ Q = Q_besag ⊗ Q_rw1 (rw1 varies fastest)
This affects the vectorization pattern but not which observations map to which latent states (the indicator matrix handles that mapping appropriately).
Hyperparameter Naming
Separable models use prefixed hyperparameter names for clarity:
# For Separable(rw1, besag):
# - τ_rw1_separable: precision for temporal component
# - τ_besag_separable: precision for spatial component
gmrf = comp.combined_model(τ_rw1_separable=1.0, τ_besag_separable=2.0)When there are multiple components of the same type, indices are added:
rw1_a = RandomWalk()
rw1_b = RandomWalk()
sep = Separable(rw1_a, rw1_b)
# Hyperparameters are:
# - τ_rw1_separable: first RW1
# - τ_rw1_2_separable: second RW1
gmrf = comp.combined_model(τ_rw1_separable=1.0, τ_rw1_2_separable=1.5)Prediction at New Data
After fitting a model, use predict_cols to build a projection matrix onto new data points. Unlike modelcols (which builds both model and projection from scratch), predict_cols reuses the trained model — critical for Matérn terms where you want to project through the existing mesh rather than generating a new one.
# Fit model
matern = Matern(smoothness=1)
f = @formula(y ~ 1 + matern(x_coord, y_coord))
comp = build_formula_components(f, train_data; family = Normal)
# Get the trained model and its formula term
model = comp.combined_model.matern # via dot syntax
# Build prediction matrix at new locations
# (term is obtained from schema application — see predict_cols docstring)
A_pred = predict_cols(term, model, test_data)For non-spatial terms (IID, RW, AR1, Besag), predict_cols builds an indicator matrix similar to modelcols, but always sized to match the trained model dimension. This means prediction data with a subset of the training levels still produces a correctly-shaped projection matrix. An error is thrown if the prediction data contains levels outside the training range.
Terms and Builders
GaussianMarkovRandomFields.IID Type
IID(; constraint = nothing)Formula functor for IID (independent and identically distributed) random effects.
Arguments
constraint: Optional constraint specification. Can be:nothing(default): No constraints:sumtozero: Sum-to-zero constraint for identifiability(A, e): Custom linear constraint matrix and vector whereAx = e
Usage
# Unconstrained IID effect
iid = IID()
@formula(y ~ 0 + iid(group))
# With sum-to-zero constraint
iid_sz = IID(constraint=:sumtozero)
@formula(y ~ 0 + iid_sz(group))
# With custom constraint
A = [1.0 1.0 0.0] # x1 + x2 = 0
e = [0.0]
iid_custom = IID(constraint=(A, e))
@formula(y ~ 0 + iid_custom(group))Notes
You must create an IID instance before using it in a formula
Calling the functor directly is unsupported outside formula parsing
GaussianMarkovRandomFields.RandomWalk Type
RandomWalk(order=1; additional_constraints = nothing)Formula functor for RandomWalk random effects of any order.
Arguments
order: Order of the random walk (default: 1). Order 1 gives RW1 (tridiagonal precision), order 2 gives RW2 (pentadiagonal precision), and higher orders are supported.additional_constraints: Optional additional constraints beyond the built-in null space constraints. Can be:nothing(default): Only the built-in null space constraints(A, e): Custom additional linear constraint matrix and vector whereAx = e
Usage
# RW1 (order can be omitted, defaults to 1)
rw1 = RandomWalk()
@formula(y ~ 0 + rw1(time))
# RW2 for smoother trends
rw2 = RandomWalk(2)
@formula(y ~ 0 + rw2(time))
# Used in separable models
rw1 = RandomWalk()
besag = Besag(W)
st = Separable(rw1, besag)
@formula(y ~ 1 + st(time, region))Notes
Random walk models are intrinsic GMRFs with k null space constraints (k = order)
Use
additional_constraintsto specify constraints beyond the built-in onesWhen used in Separable models, order is specified in constructor, not in formula
You must create a RandomWalk instance before using it in a formula
Calling the functor directly is unsupported outside formula parsing
GaussianMarkovRandomFields.AR1 Type
AR1(; constraint = nothing)Formula functor for AR1 (first-order autoregressive) random effects.
Arguments
constraint: Optional constraint specification. Can be:nothing(default): No constraints:sumtozero: Sum-to-zero constraint for identifiability(A, e): Custom linear constraint matrix and vector whereAx = e
Usage
# Unconstrained AR1 effect
ar1 = AR1()
@formula(y ~ 0 + ar1(time))
# With sum-to-zero constraint
ar1_sz = AR1(constraint=:sumtozero)
@formula(y ~ 0 + ar1_sz(time))
# With custom constraint
A = [1.0 1.0 zeros(8)...] # First two time points sum to zero
e = [0.0]
ar1_custom = AR1(constraint=(A, e))
@formula(y ~ 0 + ar1_custom(time))Notes
You must create an AR1 instance before using it in a formula
Calling the functor directly is unsupported outside formula parsing
GaussianMarkovRandomFields.AR Type
AR(order=1; constraint = nothing)Formula functor for autoregressive random effects of any order, using the PACF parameterization.
Arguments
order: Order of the AR process (default: 1). Order 1 usesρas the correlation parameter; order 2+ usespacf1, pacf2, ...(partial autocorrelations, each in (-1, 1)).constraint: Optional constraint specification (:sumtozero,(A, e), ornothing).
Usage
# AR(1) — equivalent to AR1()
ar1 = AR(1)
@formula(y ~ 0 + ar1(time))
# AR(2) for cyclical patterns
ar2 = AR(2)
@formula(y ~ 0 + ar2(time))Notes
The k-th PACF value is the correlation between x[t] and x[t-k] after removing the linear effect of the intervening lags. They are converted to AR coefficients internally via the Durbin-Levinson recursion.
Stationarity is guaranteed when all PACF values are in (-1, 1)
You must create an AR instance before using it in a formula
Calling the functor directly is unsupported outside formula parsing
GaussianMarkovRandomFields.Besag Type
Besag(W; id_to_node = nothing, normalize_var = true, singleton_policy = :gaussian)Formula functor for Besag (intrinsic CAR) random effects.
Usage:
Create a functor instance:
besag = Besag(W)With string/categorical region IDs:
besag = Besag(W; id_to_node = Dict("WesternIsles" => 11, ...))Use in a formula:
@formula(y ~ 0 + besag(region))
Notes
id_to_nodemaps arbitrary region identifiers to integer node indices (1-based) ofW.Calling the functor directly is unsupported outside formula parsing.
GaussianMarkovRandomFields.BYM2 Type
BYM2(W; id_to_node = nothing, normalize_var = true, singleton_policy = :gaussian, additional_constraints = nothing, iid_constraint = nothing)Formula functor for BYM2 (Besag-York-Mollié with improved parameterization) random effects.
The BYM2 model combines spatial (ICAR) and unstructured (IID) random effects with intuitive mixing and precision parameters. It is a reparameterization of the classic BYM model that facilitates prior specification (Riebler et al. 2016).
Arguments
W: Adjacency matrix for the spatial structureid_to_node: Optional mapping from region identifiers to integer node indices (1-based)normalize_var: Whether to normalize variance (default: true, required for BYM2)singleton_policy: How to handle isolated nodes (:gaussianor:degenerate)additional_constraints: Optional additional constraints on spatial component (beyond sum-to-zero)iid_constraint: Optional constraint on unstructured component (e.g.,:sumtozerofor identifiability with intercept)
Model Structure
The BYM2 model creates a 2n-dimensional latent field:
Components 1:n are spatial effects (variance-normalized ICAR)
Components (n+1):2n are unstructured effects (IID)
In the linear predictor: η[i] = ... + u_[i] + v_[i]
Hyperparameters
τ: Overall precision (τ > 0)φ: Mixing parameter (0 < φ < 1), proportion of unstructured varianceφ = 0: pure spatial model
φ = 1: pure unstructured model
φ = 0.5: equal spatial and unstructured variance
Identifiability
When including a fixed intercept, the model can be unidentifiable because the IID component can absorb constant shifts. Handle this by either:
Not including an intercept:
@formula(y ~ 0 + bym2(region))Constraining the IID component:
BYM2(W; iid_constraint=:sumtozero)
Usage
# Standard BYM2 without intercept (recommended)
W = adjacency_matrix
bym2 = BYM2(W)
@formula(y ~ 0 + bym2(region))
# BYM2 with intercept (constrain IID for identifiability)
bym2_constrained = BYM2(W; iid_constraint=:sumtozero)
@formula(y ~ 1 + bym2_constrained(region))
# With categorical region IDs
id_map = Dict("WesternIsles" => 11, "Highland" => 12, ...)
bym2_mapped = BYM2(W; id_to_node = id_map)
@formula(y ~ 0 + bym2_mapped(region))
# With custom singleton policy
bym2_deg = BYM2(W; singleton_policy = :degenerate)
@formula(y ~ 0 + bym2_deg(region))Notes
BYM2 always uses variance normalization (normalize_var = true)
Sum-to-zero constraint is automatically applied to the spatial component
You must create a BYM2 instance before using it in a formula
Calling the functor directly is unsupported outside formula parsing
References
Riebler, A., Sørbye, S. H., Simpson, D., & Rue, H. (2016). An intuitive Bayesian spatial model for disease mapping that accounts for scaling. Statistical Methods in Medical Research, 25(4), 1145-1165.
sourceGaussianMarkovRandomFields.Matern Type
Matern(; smoothness = 1, element_order = 1, alg = CHOLMODFactorization(), constraint = nothing)
Matern(discretization::FEMDiscretization; smoothness = 1, alg = CHOLMODFactorization(), constraint = nothing)Formula functor for Matérn SPDE spatial random effects.
Two Construction Modes
Automatic mesh: Pass only keyword arguments; the mesh is generated from coordinate columns at formula evaluation time.
Pre-built discretization: Pass a
FEMDiscretizationfor full control over the mesh.
Arguments
discretization: Optional pre-builtFEMDiscretizationsmoothness: Smoothness parameter (integer ≥ 0, default: 1)element_order: FEM element order for automatic mesh generation (default: 1)alg: LinearSolve algorithm (default:CHOLMODFactorization())constraint: Optional constraint specification (default: nothing)
Usage
# Automatic mesh from coordinate columns (recommended)
matern = Matern(smoothness=1)
@formula(y ~ 1 + matern(x_coord, y_coord))
# With pre-built discretization
disc = FEMDiscretization(grid, interp, quad)
matern = Matern(disc; smoothness=2)
@formula(y ~ 1 + matern(x_coord, y_coord))Notes
The formula term expects exactly two coordinate column arguments (2D spatial)
Coordinates are extracted from the DataFrame at formula evaluation time
The evaluation matrix mapping observations to FEM nodes is built automatically
You must create a Matern instance before using it in a formula
Calling the functor directly is unsupported outside formula parsing
GaussianMarkovRandomFields.Separable Type
Separable(components...)Formula functor for separable (Kronecker product) random effects.
Creates a multi-dimensional random effect where the precision matrix is a Kronecker product of the component precision matrices. Useful for space-time models, age-period-cohort models, and other separable multi-dimensional processes.
Arguments
components: Two or more formula term functors (IID, AR1, RandomWalk, Besag, etc.)
Usage
# Space-time separable model: Q = Q_time ⊗ Q_space
rw1 = RandomWalk()
besag = Besag(W)
st = Separable(rw1, besag)
@formula(y ~ 1 + st(time, region))
# Three-way separable model: Q = Q_time ⊗ Q_space ⊗ Q_age
rw_time = RandomWalk()
besag_space = Besag(W)
iid_age = IID()
model3d = Separable(rw_time, besag_space, iid_age)
@formula(y ~ 1 + model3d(time, region, age))Notes
Requires at least 2 components
Formula variables must match component order and count
Component ordering: Q = Q_1 ⊗ Q_2 ⊗ ... ⊗ Q_N (rightmost varies fastest)
Follows R-INLA convention: Q = Q_group ⊗ Q_main
Hyperparameters are named:
τ_{modelname}_separable, with indices for duplicatesConstraints from each component are automatically composed via Kronecker products
Redundant constraints are removed to ensure identifiability
GaussianMarkovRandomFields.build_formula_components Function
build_formula_components(formula, data; kwargs...)Placeholder function for the formula interface. A concrete method is provided by the GaussianMarkovRandomFieldsFormula extension when StatsModels is loaded.
GaussianMarkovRandomFields.predict_cols Function
predict_cols(term, component, data)Build a prediction projection matrix from a formula term and trained component onto new data. Unlike modelcols which builds both the model and projection from scratch, predict_cols reuses the existing trained model structure.
Concrete methods are provided by the GaussianMarkovRandomFieldsFormula extension.
See Also
- The BYM + fixed effects Poisson tutorial shows this in practice: Advanced GMRF modelling for disease mapping