Formula Interface
The formula interface lets you compose models succinctly using StatsModels.jl syntax. It maps terms on the right-hand side to latent components and assembles a sparse design matrix that connects observations to the combined latent field. To use the formula syntax, load StatsModels first to activate the corresponding package extension.
Quick Example
using GaussianMarkovRandomFields, StatsModels, Distributions, SparseArrays
# Suppose W is a spatial adjacency for regions; y are counts with an offset
besag = Besag(W) # structured spatial effect
iid = IID() # unstructured random effect
f = @formula(y ~ 1 + x + iid(region) + besag(region))
comp = build_formula_components(f, data; family = Poisson)
lik = comp.obs_model(data.y; offset = data.logE)
prior = comp.combined_model(; τ_besag=1.0, τ_iid=1.0)
post = gaussian_approximation(prior, lik)Specifying Constraints
Random effect terms support constraint specifications for identifiability and other modeling requirements. This is essential for models like BYM where both spatial and IID effects need sum-to-zero constraints.
Basic Usage
Create functor instances with constraints before using them in formulas:
using GaussianMarkovRandomFields, StatsModels
# Data
data = (
y = randn(20),
hospital = repeat(["A", "B", "C", "D"], 5),
time = collect(1:20)
)
# Create constrained functors
iid_sz = IID(constraint=:sumtozero)
ar1_sz = AR1(constraint=:sumtozero)
rw1 = RandomWalk() # Built-in sum-to-zero
# Use in formula
comp = build_formula_components(
@formula(y ~ 0 + iid_sz(hospital) + ar1_sz(time)),
data;
family = Normal
)
# Result is a ConstrainedGMRF
gmrf = comp.combined_model(τ_iid=1.0, τ_ar1=1.0, ρ_ar1=0.7)BYM Model with Constraints
The BYM (Besag-York-Mollié) model uses sum-to-zero constraints for identifiability:
# Spatial adjacency matrix
W = sparse([...])
# Classic BYM: Both components need sum-to-zero constraints
besag = Besag(W) # Spatial (built-in constraint)
iid_sz = IID(constraint=:sumtozero) # Unstructured (explicit constraint)
comp = build_formula_components(
@formula(y ~ 1 + besag(region) + iid_sz(region)),
data;
family = Poisson
)BYM2 Model (Recommended)
The BYM2 model offers improved parameterization with a single precision τ and mixing parameter φ:
# BYM2 without intercept (no constraint needed)
bym2 = BYM2(W)
comp = build_formula_components(
@formula(y ~ 0 + x + bym2(region)),
data;
family = Poisson
)
gmrf = comp.combined_model(τ_bym2=1.0, φ_bym2=0.5)
# BYM2 with intercept (constrain IID component for identifiability)
bym2_constrained = BYM2(W; iid_constraint=:sumtozero)
comp = build_formula_components(
@formula(y ~ 1 + x + bym2_constrained(region)),
data;
family = Poisson
)
gmrf = comp.combined_model(τ_bym2=1.0, φ_bym2=0.5)Custom Constraints
Specify custom linear constraints as (A, e) tuples where Ax = e:
# Constraint: first two groups have equal effects (x₁ = x₂)
A = [1.0 -1.0 0.0 0.0] # x₁ - x₂ = 0
e = [0.0]
iid_custom = IID(constraint=(A, e))
comp = build_formula_components(@formula(y ~ 0 + iid_custom(group)), data; family=Normal)Constraint Support by Term
IID(constraint=...): Optionalconstraintparameternothing(default): unconstrained:sumtozero: sum-to-zero constraint(A, e): custom constraint matrix and vector
AR1(constraint=...): Same as IIDRandomWalk(additional_constraints=...):Always has built-in sum-to-zero
Optional
additional_constraintsfor extra constraints beyond sum-to-zero
Besag(...): Always has built-in sum-to-zero constraints per connected componentBYM2(W; iid_constraint=..., additional_constraints=...):Spatial component: always has built-in sum-to-zero per connected component
IID component: optional
iid_constraintparameter (default: unconstrained)- Use
iid_constraint=:sumtozerowhen including a fixed intercept for identifiability
- Use
Optional
additional_constraintsfor spatial component (beyond sum-to-zero)
Unconstrained vs. Constrained
# Unconstrained
iid = IID()
comp1 = build_formula_components(@formula(y ~ 0 + iid(group)), data; family=Normal)
gmrf1 = comp1.combined_model(τ_iid=1.0) # Standard GMRF
# Constrained
iid_sz = IID(constraint=:sumtozero)
comp2 = build_formula_components(@formula(y ~ 0 + iid_sz(group)), data; family=Normal)
gmrf2 = comp2.combined_model(τ_iid=1.0) # ConstrainedGMRFSeparable Models (Kronecker Products)
For multi-dimensional processes with separable structure, use the Separable functor to compose multiple random effects via Kronecker products. This is particularly useful for space-time models where the precision matrix is Q = Q_time ⊗ Q_space with space varying fastest.
Basic Space-Time Example
using GaussianMarkovRandomFields, StatsModels, SparseArrays
# Create data with space-time structure
n_time, n_space = 10, 5
data = (
y = randn(n_time * n_space),
time = repeat(1:n_time, outer=n_space),
region = repeat(1:n_space, inner=n_time),
)
# Spatial adjacency (e.g., a chain or grid)
W = spzeros(n_space, n_space)
for i in 1:(n_space-1)
W[i, i+1] = 1
W[i+1, i] = 1
end
# Create separable model: Q = Q_time ⊗ Q_space
# RW1 for temporal smoothing, Besag for spatial smoothing
rw1 = RandomWalk() # Temporal: RW1 with built-in sum-to-zero
besag = Besag(W) # Spatial: Besag (intrinsic CAR)
st = Separable(rw1, besag) # Separable: Kronecker product
# Build formula
comp = build_formula_components(
@formula(y ~ 1 + st(time, region)),
data;
family = Normal
)
# Instantiate GMRF
gmrf = comp.combined_model(τ_rw1_separable=1.0, τ_besag_separable=2.0)The resulting precision matrix has a block-tridiagonal structure:
Q = Q_time ⊗ Q_space = [
Q_space -Q_space 0 0 ...
-Q_space 2Q_space -Q_space 0 ...
0 -Q_space 2Q_space -Q_space ...
...
]Constraint Composition
Constraints from each component are automatically composed and redundant constraints removed:
# Both RW1 (sum-to-zero in time) and Besag (sum-to-zero in space)
# produce constraints that are automatically handled
rw1 = RandomWalk()
besag = Besag(W)
st = Separable(rw1, besag)
comp = build_formula_components(@formula(y ~ 0 + st(time, region)), data; family=Normal)
# The resulting GMRF handles both constraints correctly with redundancy removal
gmrf = comp.combined_model(τ_rw1_separable=1.0, τ_besag_separable=1.0)N-way Separable Models
Extend beyond 2D with three or more components:
# 3D space-time-group model: Q = Q_time ⊗ Q_space ⊗ Q_group
n_group = 3
data_3d = (
y = randn(n_time * n_space * n_group),
time = repeat(repeat(1:n_time, outer=n_space), outer=n_group),
region = repeat(repeat(1:n_space, inner=n_time), outer=n_group),
group = repeat(1:n_group, inner=n_time*n_space),
)
rw1 = RandomWalk()
besag = Besag(W)
iid_group = IID()
# Separable with 3 components
sep3 = Separable(rw1, besag, iid_group)
comp = build_formula_components(
@formula(y ~ 0 + sep3(time, region, group)),
data_3d;
family = Normal
)
gmrf = comp.combined_model(
τ_rw1_separable = 1.0,
τ_besag_separable = 1.0,
τ_iid_separable = 0.5
)Component Ordering
The order of components in Separable determines the Kronecker order:
Separable(rw1, besag)→ Q = Q_rw1 ⊗ Q_besag (besag varies fastest)Separable(besag, rw1)→ Q = Q_besag ⊗ Q_rw1 (rw1 varies fastest)
This affects the vectorization pattern but not which observations map to which latent states (the indicator matrix handles that mapping appropriately).
Hyperparameter Naming
Separable models use prefixed hyperparameter names for clarity:
# For Separable(rw1, besag):
# - τ_rw1_separable: precision for temporal component
# - τ_besag_separable: precision for spatial component
gmrf = comp.combined_model(τ_rw1_separable=1.0, τ_besag_separable=2.0)When there are multiple components of the same type, indices are added:
rw1_a = RandomWalk()
rw1_b = RandomWalk()
sep = Separable(rw1_a, rw1_b)
# Hyperparameters are:
# - τ_rw1_separable: first RW1
# - τ_rw1_2_separable: second RW1
gmrf = comp.combined_model(τ_rw1_separable=1.0, τ_rw1_2_separable=1.5)Terms and Builders
GaussianMarkovRandomFields.IID Type
IID(; constraint = nothing)Formula functor for IID (independent and identically distributed) random effects.
Arguments
constraint: Optional constraint specification. Can be:nothing(default): No constraints:sumtozero: Sum-to-zero constraint for identifiability(A, e): Custom linear constraint matrix and vector whereAx = e
Usage
# Unconstrained IID effect
iid = IID()
@formula(y ~ 0 + iid(group))
# With sum-to-zero constraint
iid_sz = IID(constraint=:sumtozero)
@formula(y ~ 0 + iid_sz(group))
# With custom constraint
A = [1.0 1.0 0.0] # x1 + x2 = 0
e = [0.0]
iid_custom = IID(constraint=(A, e))
@formula(y ~ 0 + iid_custom(group))Notes
You must create an IID instance before using it in a formula
Calling the functor directly is unsupported outside formula parsing
GaussianMarkovRandomFields.RandomWalk Type
RandomWalk(order=1; additional_constraints = nothing)Formula functor for RandomWalk random effects.
Arguments
order: Order of the random walk (default: 1). Currently only order=1 is supported.additional_constraints: Optional additional constraints beyond the built-in sum-to-zero constraint. Can be:nothing(default): Only the built-in sum-to-zero constraint(A, e): Custom additional linear constraint matrix and vector whereAx = e
Usage
# RW1 with built-in sum-to-zero constraint (order can be omitted, defaults to 1)
rw1 = RandomWalk()
@formula(y ~ 0 + rw1(time))
# Explicitly specifying order=1
rw1 = RandomWalk(1)
@formula(y ~ 0 + rw1(time))
# Used in separable models
rw1 = RandomWalk()
besag = Besag(W)
st = Separable(rw1, besag)
@formula(y ~ 1 + st(time, region))
# RW1 with additional constraints
A = [1.0 0.0 1.0 zeros(7)...] # x1 + x3 = 0 (in addition to sum-to-zero)
e = [0.0]
rw1_extra = RandomWalk(1, additional_constraints=(A, e))
@formula(y ~ 0 + rw1_extra(time))Notes
RW1 always has a built-in sum-to-zero constraint for identifiability
Use
additional_constraintsto specify constraints beyond sum-to-zeroWhen used in Separable models, order is specified in constructor, not in formula
You must create a RandomWalk instance before using it in a formula
Calling the functor directly is unsupported outside formula parsing
GaussianMarkovRandomFields.AR1 Type
AR1(; constraint = nothing)Formula functor for AR1 (first-order autoregressive) random effects.
Arguments
constraint: Optional constraint specification. Can be:nothing(default): No constraints:sumtozero: Sum-to-zero constraint for identifiability(A, e): Custom linear constraint matrix and vector whereAx = e
Usage
# Unconstrained AR1 effect
ar1 = AR1()
@formula(y ~ 0 + ar1(time))
# With sum-to-zero constraint
ar1_sz = AR1(constraint=:sumtozero)
@formula(y ~ 0 + ar1_sz(time))
# With custom constraint
A = [1.0 1.0 zeros(8)...] # First two time points sum to zero
e = [0.0]
ar1_custom = AR1(constraint=(A, e))
@formula(y ~ 0 + ar1_custom(time))Notes
You must create an AR1 instance before using it in a formula
Calling the functor directly is unsupported outside formula parsing
GaussianMarkovRandomFields.Besag Type
Besag(W; id_to_node = nothing, normalize_var = true, singleton_policy = :gaussian)Formula functor for Besag (intrinsic CAR) random effects.
Usage:
Create a functor instance:
besag = Besag(W)With string/categorical region IDs:
besag = Besag(W; id_to_node = Dict("WesternIsles" => 11, ...))Use in a formula:
@formula(y ~ 0 + besag(region))
Notes
id_to_nodemaps arbitrary region identifiers to integer node indices (1-based) ofW.Calling the functor directly is unsupported outside formula parsing.
GaussianMarkovRandomFields.BYM2 Type
BYM2(W; id_to_node = nothing, normalize_var = true, singleton_policy = :gaussian, additional_constraints = nothing, iid_constraint = nothing)Formula functor for BYM2 (Besag-York-Mollié with improved parameterization) random effects.
The BYM2 model combines spatial (ICAR) and unstructured (IID) random effects with intuitive mixing and precision parameters. It is a reparameterization of the classic BYM model that facilitates prior specification (Riebler et al. 2016).
Arguments
W: Adjacency matrix for the spatial structureid_to_node: Optional mapping from region identifiers to integer node indices (1-based)normalize_var: Whether to normalize variance (default: true, required for BYM2)singleton_policy: How to handle isolated nodes (:gaussianor:degenerate)additional_constraints: Optional additional constraints on spatial component (beyond sum-to-zero)iid_constraint: Optional constraint on unstructured component (e.g.,:sumtozerofor identifiability with intercept)
Model Structure
The BYM2 model creates a 2n-dimensional latent field:
Components 1:n are spatial effects (variance-normalized ICAR)
Components (n+1):2n are unstructured effects (IID)
In the linear predictor: η[i] = ... + u_[i] + v_[i]
Hyperparameters
τ: Overall precision (τ > 0)φ: Mixing parameter (0 < φ < 1), proportion of unstructured varianceφ = 0: pure spatial model
φ = 1: pure unstructured model
φ = 0.5: equal spatial and unstructured variance
Identifiability
When including a fixed intercept, the model can be unidentifiable because the IID component can absorb constant shifts. Handle this by either:
Not including an intercept:
@formula(y ~ 0 + bym2(region))Constraining the IID component:
BYM2(W; iid_constraint=:sumtozero)
Usage
# Standard BYM2 without intercept (recommended)
W = adjacency_matrix
bym2 = BYM2(W)
@formula(y ~ 0 + bym2(region))
# BYM2 with intercept (constrain IID for identifiability)
bym2_constrained = BYM2(W; iid_constraint=:sumtozero)
@formula(y ~ 1 + bym2_constrained(region))
# With categorical region IDs
id_map = Dict("WesternIsles" => 11, "Highland" => 12, ...)
bym2_mapped = BYM2(W; id_to_node = id_map)
@formula(y ~ 0 + bym2_mapped(region))
# With custom singleton policy
bym2_deg = BYM2(W; singleton_policy = :degenerate)
@formula(y ~ 0 + bym2_deg(region))Notes
BYM2 always uses variance normalization (normalize_var = true)
Sum-to-zero constraint is automatically applied to the spatial component
You must create a BYM2 instance before using it in a formula
Calling the functor directly is unsupported outside formula parsing
References
Riebler, A., Sørbye, S. H., Simpson, D., & Rue, H. (2016). An intuitive Bayesian spatial model for disease mapping that accounts for scaling. Statistical Methods in Medical Research, 25(4), 1145-1165.
sourceGaussianMarkovRandomFields.Separable Type
Separable(components...)Formula functor for separable (Kronecker product) random effects.
Creates a multi-dimensional random effect where the precision matrix is a Kronecker product of the component precision matrices. Useful for space-time models, age-period-cohort models, and other separable multi-dimensional processes.
Arguments
components: Two or more formula term functors (IID, AR1, RandomWalk, Besag, etc.)
Usage
# Space-time separable model: Q = Q_time ⊗ Q_space
rw1 = RandomWalk()
besag = Besag(W)
st = Separable(rw1, besag)
@formula(y ~ 1 + st(time, region))
# Three-way separable model: Q = Q_time ⊗ Q_space ⊗ Q_age
rw_time = RandomWalk()
besag_space = Besag(W)
iid_age = IID()
model3d = Separable(rw_time, besag_space, iid_age)
@formula(y ~ 1 + model3d(time, region, age))Notes
Requires at least 2 components
Formula variables must match component order and count
Component ordering: Q = Q_1 ⊗ Q_2 ⊗ ... ⊗ Q_N (rightmost varies fastest)
Follows R-INLA convention: Q = Q_group ⊗ Q_main
Hyperparameters are named:
τ_{modelname}_separable, with indices for duplicatesConstraints from each component are automatically composed via Kronecker products
Redundant constraints are removed to ensure identifiability
GaussianMarkovRandomFields.build_formula_components Function
build_formula_components(formula, data; kwargs...)Placeholder function for the formula interface. A concrete method is provided by the GaussianMarkovRandomFieldsFormula extension when StatsModels is loaded.
See Also
- The BYM + fixed effects Poisson tutorial shows this in practice: Advanced GMRF modelling for disease mapping