Skip to content

Formula Interface

The formula interface lets you compose models succinctly using StatsModels.jl syntax. It maps terms on the right-hand side to latent components and assembles a sparse design matrix that connects observations to the combined latent field. To use the formula syntax, load StatsModels first to activate the corresponding package extension.

Quick Example

julia
using GaussianMarkovRandomFields, StatsModels, Distributions, SparseArrays

# Suppose W is a spatial adjacency for regions; y are counts with an offset
besag = Besag(W)                         # structured spatial effect
iid = IID()                              # unstructured random effect
f = @formula(y ~ 1 + x + iid(region) + besag(region))
comp = build_formula_components(f, data; family = Poisson)
lik  = comp.obs_model(data.y; offset = data.logE)
prior = comp.combined_model(; τ_besag=1.0, τ_iid=1.0)
post  = gaussian_approximation(prior, lik)

Specifying Constraints

Random effect terms support constraint specifications for identifiability and other modeling requirements. This is essential for models like BYM where both spatial and IID effects need sum-to-zero constraints.

Basic Usage

Create functor instances with constraints before using them in formulas:

julia
using GaussianMarkovRandomFields, StatsModels

# Data
data = (
    y = randn(20),
    hospital = repeat(["A", "B", "C", "D"], 5),
    time = collect(1:20)
)

# Create constrained functors
iid_sz = IID(constraint=:sumtozero)
ar1_sz = AR1(constraint=:sumtozero)
rw1 = RandomWalk()  # Built-in sum-to-zero

# Use in formula
comp = build_formula_components(
    @formula(y ~ 0 + iid_sz(hospital) + ar1_sz(time)),
    data;
    family = Normal
)

# Result is a ConstrainedGMRF
gmrf = comp.combined_model(τ_iid=1.0, τ_ar1=1.0, ρ_ar1=0.7)

BYM Model with Constraints

The BYM (Besag-York-Mollié) model uses sum-to-zero constraints for identifiability:

julia
# Spatial adjacency matrix
W = sparse([...])

# Classic BYM: Both components need sum-to-zero constraints
besag = Besag(W)                      # Spatial (built-in constraint)
iid_sz = IID(constraint=:sumtozero)   # Unstructured (explicit constraint)

comp = build_formula_components(
    @formula(y ~ 1 + besag(region) + iid_sz(region)),
    data;
    family = Poisson
)

The BYM2 model offers improved parameterization with a single precision τ and mixing parameter φ:

julia
# BYM2 without intercept (no constraint needed)
bym2 = BYM2(W)
comp = build_formula_components(
    @formula(y ~ 0 + x + bym2(region)),
    data;
    family = Poisson
)
gmrf = comp.combined_model(τ_bym2=1.0, φ_bym2=0.5)

# BYM2 with intercept (constrain IID component for identifiability)
bym2_constrained = BYM2(W; iid_constraint=:sumtozero)
comp = build_formula_components(
    @formula(y ~ 1 + x + bym2_constrained(region)),
    data;
    family = Poisson
)
gmrf = comp.combined_model(τ_bym2=1.0, φ_bym2=0.5)

Custom Constraints

Specify custom linear constraints as (A, e) tuples where Ax = e:

julia
# Constraint: first two groups have equal effects (x₁ = x₂)
A = [1.0 -1.0 0.0 0.0]  # x₁ - x₂ = 0
e = [0.0]

iid_custom = IID(constraint=(A, e))
comp = build_formula_components(@formula(y ~ 0 + iid_custom(group)), data; family=Normal)

Constraint Support by Term

  • IID(constraint=...): Optional constraint parameter

    • nothing (default): unconstrained

    • :sumtozero: sum-to-zero constraint

    • (A, e): custom constraint matrix and vector

  • AR1(constraint=...): Same as IID

  • RandomWalk(additional_constraints=...):

    • Always has built-in sum-to-zero

    • Optional additional_constraints for extra constraints beyond sum-to-zero

  • Besag(...): Always has built-in sum-to-zero constraints per connected component

  • BYM2(W; iid_constraint=..., additional_constraints=...):

    • Spatial component: always has built-in sum-to-zero per connected component

    • IID component: optional iid_constraint parameter (default: unconstrained)

      • Use iid_constraint=:sumtozero when including a fixed intercept for identifiability
    • Optional additional_constraints for spatial component (beyond sum-to-zero)

Unconstrained vs. Constrained

julia
# Unconstrained
iid = IID()
comp1 = build_formula_components(@formula(y ~ 0 + iid(group)), data; family=Normal)
gmrf1 = comp1.combined_model(τ_iid=1.0)  # Standard GMRF

# Constrained
iid_sz = IID(constraint=:sumtozero)
comp2 = build_formula_components(@formula(y ~ 0 + iid_sz(group)), data; family=Normal)
gmrf2 = comp2.combined_model(τ_iid=1.0)  # ConstrainedGMRF

Separable Models (Kronecker Products)

For multi-dimensional processes with separable structure, use the Separable functor to compose multiple random effects via Kronecker products. This is particularly useful for space-time models where the precision matrix is Q = Q_time ⊗ Q_space with space varying fastest.

Basic Space-Time Example

julia
using GaussianMarkovRandomFields, StatsModels, SparseArrays

# Create data with space-time structure
n_time, n_space = 10, 5
data = (
    y = randn(n_time * n_space),
    time = repeat(1:n_time, outer=n_space),
    region = repeat(1:n_space, inner=n_time),
)

# Spatial adjacency (e.g., a chain or grid)
W = spzeros(n_space, n_space)
for i in 1:(n_space-1)
    W[i, i+1] = 1
    W[i+1, i] = 1
end

# Create separable model: Q = Q_time ⊗ Q_space
# RW1 for temporal smoothing, Besag for spatial smoothing
rw1 = RandomWalk()         # Temporal: RW1 with built-in sum-to-zero
besag = Besag(W)           # Spatial: Besag (intrinsic CAR)
st = Separable(rw1, besag) # Separable: Kronecker product

# Build formula
comp = build_formula_components(
    @formula(y ~ 1 + st(time, region)),
    data;
    family = Normal
)

# Instantiate GMRF
gmrf = comp.combined_model(τ_rw1_separable=1.0, τ_besag_separable=2.0)

The resulting precision matrix has a block-tridiagonal structure:

Q = Q_time ⊗ Q_space = [
    Q_space  -Q_space     0        0     ...
    -Q_space  2Q_space  -Q_space    0     ...
       0     -Q_space  2Q_space  -Q_space ...
       ...
]

Constraint Composition

Constraints from each component are automatically composed and redundant constraints removed:

julia
# Both RW1 (sum-to-zero in time) and Besag (sum-to-zero in space)
# produce constraints that are automatically handled
rw1 = RandomWalk()
besag = Besag(W)
st = Separable(rw1, besag)

comp = build_formula_components(@formula(y ~ 0 + st(time, region)), data; family=Normal)

# The resulting GMRF handles both constraints correctly with redundancy removal
gmrf = comp.combined_model(τ_rw1_separable=1.0, τ_besag_separable=1.0)

N-way Separable Models

Extend beyond 2D with three or more components:

julia
# 3D space-time-group model: Q = Q_time ⊗ Q_space ⊗ Q_group
n_group = 3
data_3d = (
    y = randn(n_time * n_space * n_group),
    time = repeat(repeat(1:n_time, outer=n_space), outer=n_group),
    region = repeat(repeat(1:n_space, inner=n_time), outer=n_group),
    group = repeat(1:n_group, inner=n_time*n_space),
)

rw1 = RandomWalk()
besag = Besag(W)
iid_group = IID()

# Separable with 3 components
sep3 = Separable(rw1, besag, iid_group)

comp = build_formula_components(
    @formula(y ~ 0 + sep3(time, region, group)),
    data_3d;
    family = Normal
)

gmrf = comp.combined_model(
    τ_rw1_separable = 1.0,
    τ_besag_separable = 1.0,
    τ_iid_separable = 0.5
)

Component Ordering

The order of components in Separable determines the Kronecker order:

  • Separable(rw1, besag) → Q = Q_rw1 ⊗ Q_besag (besag varies fastest)

  • Separable(besag, rw1) → Q = Q_besag ⊗ Q_rw1 (rw1 varies fastest)

This affects the vectorization pattern but not which observations map to which latent states (the indicator matrix handles that mapping appropriately).

Hyperparameter Naming

Separable models use prefixed hyperparameter names for clarity:

julia
# For Separable(rw1, besag):
# - τ_rw1_separable: precision for temporal component
# - τ_besag_separable: precision for spatial component

gmrf = comp.combined_model(τ_rw1_separable=1.0, τ_besag_separable=2.0)

When there are multiple components of the same type, indices are added:

julia
rw1_a = RandomWalk()
rw1_b = RandomWalk()
sep = Separable(rw1_a, rw1_b)

# Hyperparameters are:
# - τ_rw1_separable: first RW1
# - τ_rw1_2_separable: second RW1

gmrf = comp.combined_model(τ_rw1_separable=1.0, τ_rw1_2_separable=1.5)

Terms and Builders

GaussianMarkovRandomFields.IID Type
julia
IID(; constraint = nothing)

Formula functor for IID (independent and identically distributed) random effects.

Arguments

  • constraint: Optional constraint specification. Can be:
    • nothing (default): No constraints

    • :sumtozero: Sum-to-zero constraint for identifiability

    • (A, e): Custom linear constraint matrix and vector where Ax = e

Usage

julia
# Unconstrained IID effect
iid = IID()
@formula(y ~ 0 + iid(group))

# With sum-to-zero constraint
iid_sz = IID(constraint=:sumtozero)
@formula(y ~ 0 + iid_sz(group))

# With custom constraint
A = [1.0 1.0 0.0]  # x1 + x2 = 0
e = [0.0]
iid_custom = IID(constraint=(A, e))
@formula(y ~ 0 + iid_custom(group))

Notes

  • You must create an IID instance before using it in a formula

  • Calling the functor directly is unsupported outside formula parsing

source
GaussianMarkovRandomFields.RandomWalk Type
julia
RandomWalk(order=1; additional_constraints = nothing)

Formula functor for RandomWalk random effects.

Arguments

  • order: Order of the random walk (default: 1). Currently only order=1 is supported.

  • additional_constraints: Optional additional constraints beyond the built-in sum-to-zero constraint. Can be:

    • nothing (default): Only the built-in sum-to-zero constraint

    • (A, e): Custom additional linear constraint matrix and vector where Ax = e

Usage

julia
# RW1 with built-in sum-to-zero constraint (order can be omitted, defaults to 1)
rw1 = RandomWalk()
@formula(y ~ 0 + rw1(time))

# Explicitly specifying order=1
rw1 = RandomWalk(1)
@formula(y ~ 0 + rw1(time))

# Used in separable models
rw1 = RandomWalk()
besag = Besag(W)
st = Separable(rw1, besag)
@formula(y ~ 1 + st(time, region))

# RW1 with additional constraints
A = [1.0 0.0 1.0 zeros(7)...]  # x1 + x3 = 0 (in addition to sum-to-zero)
e = [0.0]
rw1_extra = RandomWalk(1, additional_constraints=(A, e))
@formula(y ~ 0 + rw1_extra(time))

Notes

  • RW1 always has a built-in sum-to-zero constraint for identifiability

  • Use additional_constraints to specify constraints beyond sum-to-zero

  • When used in Separable models, order is specified in constructor, not in formula

  • You must create a RandomWalk instance before using it in a formula

  • Calling the functor directly is unsupported outside formula parsing

source
GaussianMarkovRandomFields.AR1 Type
julia
AR1(; constraint = nothing)

Formula functor for AR1 (first-order autoregressive) random effects.

Arguments

  • constraint: Optional constraint specification. Can be:
    • nothing (default): No constraints

    • :sumtozero: Sum-to-zero constraint for identifiability

    • (A, e): Custom linear constraint matrix and vector where Ax = e

Usage

julia
# Unconstrained AR1 effect
ar1 = AR1()
@formula(y ~ 0 + ar1(time))

# With sum-to-zero constraint
ar1_sz = AR1(constraint=:sumtozero)
@formula(y ~ 0 + ar1_sz(time))

# With custom constraint
A = [1.0 1.0 zeros(8)...]  # First two time points sum to zero
e = [0.0]
ar1_custom = AR1(constraint=(A, e))
@formula(y ~ 0 + ar1_custom(time))

Notes

  • You must create an AR1 instance before using it in a formula

  • Calling the functor directly is unsupported outside formula parsing

source
GaussianMarkovRandomFields.Besag Type
julia
Besag(W; id_to_node = nothing, normalize_var = true, singleton_policy = :gaussian)

Formula functor for Besag (intrinsic CAR) random effects.

Usage:

  • Create a functor instance: besag = Besag(W)

  • With string/categorical region IDs: besag = Besag(W; id_to_node = Dict("WesternIsles" => 11, ...))

  • Use in a formula: @formula(y ~ 0 + besag(region))

Notes

  • id_to_node maps arbitrary region identifiers to integer node indices (1-based) of W.

  • Calling the functor directly is unsupported outside formula parsing.

source
GaussianMarkovRandomFields.BYM2 Type
julia
BYM2(W; id_to_node = nothing, normalize_var = true, singleton_policy = :gaussian, additional_constraints = nothing, iid_constraint = nothing)

Formula functor for BYM2 (Besag-York-Mollié with improved parameterization) random effects.

The BYM2 model combines spatial (ICAR) and unstructured (IID) random effects with intuitive mixing and precision parameters. It is a reparameterization of the classic BYM model that facilitates prior specification (Riebler et al. 2016).

Arguments

  • W: Adjacency matrix for the spatial structure

  • id_to_node: Optional mapping from region identifiers to integer node indices (1-based)

  • normalize_var: Whether to normalize variance (default: true, required for BYM2)

  • singleton_policy: How to handle isolated nodes (:gaussian or :degenerate)

  • additional_constraints: Optional additional constraints on spatial component (beyond sum-to-zero)

  • iid_constraint: Optional constraint on unstructured component (e.g., :sumtozero for identifiability with intercept)

Model Structure

The BYM2 model creates a 2n-dimensional latent field:

  • Components 1:n are spatial effects (variance-normalized ICAR)

  • Components (n+1):2n are unstructured effects (IID)

  • In the linear predictor: η[i] = ... + u_[i] + v_[i]

Hyperparameters

  • τ: Overall precision (τ > 0)

  • φ: Mixing parameter (0 < φ < 1), proportion of unstructured variance

    • φ = 0: pure spatial model

    • φ = 1: pure unstructured model

    • φ = 0.5: equal spatial and unstructured variance

Identifiability

When including a fixed intercept, the model can be unidentifiable because the IID component can absorb constant shifts. Handle this by either:

  1. Not including an intercept: @formula(y ~ 0 + bym2(region))

  2. Constraining the IID component: BYM2(W; iid_constraint=:sumtozero)

Usage

julia
# Standard BYM2 without intercept (recommended)
W = adjacency_matrix
bym2 = BYM2(W)
@formula(y ~ 0 + bym2(region))

# BYM2 with intercept (constrain IID for identifiability)
bym2_constrained = BYM2(W; iid_constraint=:sumtozero)
@formula(y ~ 1 + bym2_constrained(region))

# With categorical region IDs
id_map = Dict("WesternIsles" => 11, "Highland" => 12, ...)
bym2_mapped = BYM2(W; id_to_node = id_map)
@formula(y ~ 0 + bym2_mapped(region))

# With custom singleton policy
bym2_deg = BYM2(W; singleton_policy = :degenerate)
@formula(y ~ 0 + bym2_deg(region))

Notes

  • BYM2 always uses variance normalization (normalize_var = true)

  • Sum-to-zero constraint is automatically applied to the spatial component

  • You must create a BYM2 instance before using it in a formula

  • Calling the functor directly is unsupported outside formula parsing

References

Riebler, A., Sørbye, S. H., Simpson, D., & Rue, H. (2016). An intuitive Bayesian spatial model for disease mapping that accounts for scaling. Statistical Methods in Medical Research, 25(4), 1145-1165.

source
GaussianMarkovRandomFields.Separable Type
julia
Separable(components...)

Formula functor for separable (Kronecker product) random effects.

Creates a multi-dimensional random effect where the precision matrix is a Kronecker product of the component precision matrices. Useful for space-time models, age-period-cohort models, and other separable multi-dimensional processes.

Arguments

  • components: Two or more formula term functors (IID, AR1, RandomWalk, Besag, etc.)

Usage

julia
# Space-time separable model: Q = Q_time ⊗ Q_space
rw1 = RandomWalk()
besag = Besag(W)
st = Separable(rw1, besag)
@formula(y ~ 1 + st(time, region))

# Three-way separable model: Q = Q_time ⊗ Q_space ⊗ Q_age
rw_time = RandomWalk()
besag_space = Besag(W)
iid_age = IID()
model3d = Separable(rw_time, besag_space, iid_age)
@formula(y ~ 1 + model3d(time, region, age))

Notes

  • Requires at least 2 components

  • Formula variables must match component order and count

  • Component ordering: Q = Q_1 ⊗ Q_2 ⊗ ... ⊗ Q_N (rightmost varies fastest)

  • Follows R-INLA convention: Q = Q_group ⊗ Q_main

  • Hyperparameters are named: τ_{modelname}_separable, with indices for duplicates

  • Constraints from each component are automatically composed via Kronecker products

  • Redundant constraints are removed to ensure identifiability

source
GaussianMarkovRandomFields.build_formula_components Function
julia
build_formula_components(formula, data; kwargs...)

Placeholder function for the formula interface. A concrete method is provided by the GaussianMarkovRandomFieldsFormula extension when StatsModels is loaded.

source

See Also