Joint Spatiotemporal Model

Hierarchical Bayesian Inference for NWS Detection and Abundance

Published

May 3, 2026

1 Overview

This script demonstrates the Bayesian hierarchical joint model used to estimate New World Screwworm (NWS) intensity. Given that detections arise from opportunistic surveillance, the model explicitly separates the observation process (detection probability) from the biological process (abundance) to isolate the true biological signal from surveillance bias.


2 Computational & Privacy Note

This script is for demonstrative purposes only and has not been executed within this document. The full joint model is computationally intensive, containing approximately 10.2 million latent entries and 260,185 replicated spatial field elements. Due to these requirements and the sensitive nature of the original surveillance coordinates, which involve private residences and businesses, the code provided here tracks the structural methodology but does not include the raw datasets.

3 HPC Execution Environment

To handle the high dimensionality of the latent Gaussian fields and the spatial point-process likelihood, the model was executed on a high-performance computing (HPC) cluster. The following SLURM batch script illustrates the resource allocation required for a successful run once reasonable starting values (theta_init below) have been identified:

#!/bin/bash
#SBATCH --job-name=nws_joint_model
#SBATCH -A disease_ecology
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -p bigmem
#SBATCH --cpus-per-task=14
#SBATCH -t 35:59:00
#SBATCH --mem=280G

# Load necessary spatial and computational modules
module load udunits proj geos/3.12.1 gdal/3.8.5 openblas r/4.4

# Set threading for INLA/OpenBLAS to match allocated CPUs
export OMP_NUM_THREADS=14
export OPENBLAS_NUM_THREADS=14

# Execute the R script in batch mode
R CMD BATCH --no-save --no-restore jcat.R

4 Statistical Formulation

The model evaluates two coupled processes across 15,305 spatial mesh nodes (i) over epidemiological weeks (t):

\begin{aligned} y_{1,it} &\sim \text{Bernoulli}(p_{it}) && \text{(Detection Tier)} \\ y_{2,it} &\sim \text{NegBinom}(\mu_{it}, n) && \text{(Abundance Tier)} \end{aligned}

4.1 Linear Predictors

The expected outcomes are linked to predictors via logit and log link functions:

Detection Tier (Observation): \text{logit}(p_{it}) = \beta_{0}^{(1)} + \sum \beta_{1} \mathbf{X}_{1,it} + \xi_{1,it} + w_{1,t} + \gamma_{c(i)}

Abundance Tier (Biological): \log(\mu_{it}) = \log(\text{Area}_i) + \beta_{0}^{(2)} + \sum \beta_{2} \mathbf{X}_{2,it} + f(\text{H}_{it}) + \xi_{2,it} + \alpha \xi_{1,it} + w_{2,t}
Key Components:
* \mathbf{X}_{1}, \mathbf{X}_{2}: Linear covariates (e.g., environment, infrastructure).
* \xi_{1,it}, \xi_{2,it}: Replicated spatial Gaussian Fields (SPDE).
* \alpha \xi_{1,it}: A shared spatial component where \alpha scales quantifies detection process on abundance.
* w_{1,t}, w_{2,t}: Temporal random walks (RW1) for weekly trends.
* \gamma_{c(i)}: IID random effect for administrative jurisdictions.
* f(\text{H}_{it}): Non-linear livestock density effect (RW2).

Note on \alpha \xi_{1,it}: This represents a shared spatial component where \alpha quantifies the association between surveillance effort and biological intensity, allowing for the correction of preferential sampling.


5 Spatial Projection

We utilize the Stochastic Partial Differential Equation (SPDE) approach with Penalized Complexity (PC) priors to prevent overfitting.

Set priors for each model tier`s spatial range and index locations to replicate spatial fields by timestep.

Show R Code
# SPDE for Tier 1: Detection (xi_1)
spde.t1 <- inla.spde2.pcmatern(mesh.dom, 
                               alpha = 2,
                               prior.range = c(90, 0.5),  # 50% probability that range is 90km or less
                               prior.sigma = c(1, 0.5),
                               constr = TRUE)

# SPDE for Tier 2: Abundance (xi_2)
spde.t2 <- inla.spde2.pcmatern(mesh.dom, 
                               alpha = 2,
                               prior.range = c(150, 0.05), # 5% probability that range is 150km or less 
                               prior.sigma = c(0.5, 0.05),
                               constr = TRUE)


# Temporal steps
t_steps <- length(unique(tier_1.set$time_index))

# Define Spatial Indices
field.xi1 <- inla.spde.make.index("field.xi1", n.spde = spde.t1$n.spde, n.group = t_steps)
field.xi2 <- inla.spde.make.index("field.xi2", n.spde = spde.t2$n.spde, n.group = t_steps)

6 Data Integration

6.1 Detection Stack

Organize model input data as a list() object. In this example, all data for the Detection tier is stored in the tier_1.set dataframe.

Show R Code
locs_t1 <- cbind(tier_1.set$x, tier_1.set$y) # geographic coordinates

A.xi1 <- inla.spde.make.A(mesh.dom, # triangulated mesh
                          loc = locs_t1, # locations
                          group = tier_1.set$time_index) # timesteps

eff_t1 <- list(
  c(field.xi1, list(intercept1 = 1)), # intercept
  list(north = tier_1.set$northing_km_s, # distance to origin
       road_dens = tier_1.set$road_dens_s, # road density
       night_illum = tier_1.set$night_illum_s, # nighttime illumination
       admin_gamma = as.integer(as.factor(tier_1.set$admin_u)), # administrative unit (ID)
       week_w1 = tier_1.set$timestep) # weekly time steps
)


# Yi_cens_test is the censored version of the Y1 response demonstrated in the "response" tab
jtier1.stk <- inla.stack(
  data = list(Y = cbind(tier_1.set$Yi_cens_test, NA), # bivariate response; column 1 for detection, second for abundance
              link = 1, # link function
              e = rep(NA, nrow(tier_1.set))), # no geographic exposure in detection tier
  A = list(A.xi1, 1), # projection matrix
  effects = eff_t1, # covariates
  tag = "det.stack" # arbitrary label
)

6.2 Abundance Stack

Show R Code
locs_t2 <- cbind(tier_1.set$x, tier_1.set$y) # geographic coordinates

A.xi2 <- inla.spde.make.A(mesh.dom, # same as prior chunk
                          loc = locs_t2, 
                          group = tier_2.set$time_index)

eff_t2 <- list(
  c(field.xi2, # spatial effect for tier
    list(field.xi1.copy = field.xi1$field.xi1), # spatial effect to copy detection tier
    list(intercept2 = 1)), # intercept
  list(mintemp = tier_2.set$mintemp_s, # minimum temperature
       soilmoist = tier_2.set$soilmoist_s, # soial moisture
       leafarea = tier_2.set$leafarea_s, # leaf area
       livestock_H = tier_2.set$cattle20_q, # cattle density, non-linear 
       horses_s = tier_2.set$horses_s, # linear
       pigs_s = tier_2.set$pigs_s, # linear
       goats_s = tier_2.set$goats_s, # linear
       sheep_s = tier_2.set$sheep_s, # linear
       cattle_s = tier_2.set$cattle_s, # linear
       week_w2 = tier_2.set$timestep) # weekly timesteps
)

# Count_cens_test is the censored version of the Y1 response demonstrated in the "response" tab
jtier2.stk <- inla.stack(
  data = list(Y = cbind(NA, tier_2.set$Count_cens_test), # bivaraite response
              link = 2, # link
              e = tier_2.set$area), # geographic exposure
  A = list(A.xi2, 1), # projection matrix
  effects = eff_t2, # covariates
  tag = "abun.stack" # label
)

6.3 Join Data

The tier specific stacks are merged into a single object for the inla() call. This allows the model to estimate both processes simultaneously, propagating uncertainty from the detection tier to the abundance tier.

Show R Code
joint_stk <- inla.stack(jtier1.stk, jtier2.stk)

7 Model Fitting

The model is fitted using a joint likelihood (Bernoulli and Negative Binomial). We utilize the eb (Empirical Bayes) integration strategy to handle the high dimensionality of the latent fields.

7.1 Priors

Penalized Complexity (PC) priors are defined for the random effects to prevent overfitting and reflect known dispersal capacities.

Show R Code
# Shared spatial field scaling (alpha) and fixed effects
hyper_alpha <- list(beta = list(prior = "normal", param = c(0, 1)))

# PC priors for Random Walks (Temporal and Jurisdictional)
pc_rw <- list(prec = list(prior = "pc.prec", param = c(1, 0.01)))
pc_rw_strong <- list(prec = list(prior = "pc.prec", param = c(0.3, 0.01)))
pc_rw_cat <- list(prec = list(prior = "pc.prec", param = c(0.5, 0.05)))

7.2 Formula

The formula couples the two tiers, using f(..., copy = "field.xi1") to represent the shared spatial effect \alpha \xi_{1,it}.

Show R Code
formula_nws <- Y ~ -1 + intercept1 + intercept2 + # response and intercepts for each tier
  # Tier 1 Components
  f(field.xi1, # tier 1 spatial field
    model = spde.t1, # prior
    group = field.xi1.group, # timesteps to replicate
    control.group = list(model = "iid", # among-field prior specification
                         hyper = pc_rw)) +
  f(week_w1, # weekly timesteps
    model = "rw1", 
    constr = TRUE, 
    scale.model = TRUE, 
    hyper = pc_rw) +
  f(admin_gamma, # administrative regions
    model = "iid", 
    constr = TRUE, 
    hyper = pc_rw_strong) +
  north + road_dens + night_illum + # linear effects for human access
  
  # Tier 2 Components
  f(field.xi2, # tier 2 spatial field
    model = spde.t2, 
    group = field.xi2.group, 
    control.group = list(model = "iid", 
                         hyper = pc_rw)) +
  f(field.xi1.copy, # copy of spatial field from detection tier
    copy = "field.xi1", 
    group = field.xi1.copy.group, 
    fixed = FALSE, 
    hyper = hyper_alpha) +
  f(week_w2, #weekly time correlation
    model = "rw1", 
    constr = TRUE, 
    scale.model = TRUE, 
    hyper = pc_rw) +
  f(livestock_H, #non-lieanr livestock (cattle in this example)
    model = "rw2", 
    constr = TRUE, 
    scale.model = TRUE, 
    hyper = pc_rw_cat) +
  mintemp + soilmoist + leafarea + rhum + cattle_s + horses_s + # linear effects
  pigs_s + goats_s + sheep_s + mintemp:soilmoist + mintemp:leafarea

7.3 Run Model

Show R Code
# Quantiles for posterior summaries
return_quants <- c(0.025, 0.25, 0.5, 0.75, 0.975) 

# Initial values for hyperparameters (speeds up run)
theta_init <- c(4.65, 0.98, -2.01, -3.61, 2.30, 5.02, -0.02, 0.50, 5.42, 0.39)

model_joint <- inla(formula_nws,# formula
                family = c("binomial", "nbinomial"), # family for each tier
                data = inla.stack.data(joint_stk), # combined data
                E = inla.stack.data(joint_stk)$e, # geographic exposure
                control.predictor = list(A = inla.stack.A(joint_stk), 
                                         compute = TRUE, #return fitted values
                                         link = 1), 
                control.inla = list(strategy = "adaptive", # integration strategy
                                    int.strategy = "eb"), # Empirical Bayes strategy
                control.mode = list(restart = TRUE, theta = theta_init), # initial values
                num.threads = 12, # run using 16 cores, others reserved for background processes
                quantiles = return_quants, # quanties to return
                verbose = TRUE)