Hierarchical Bayesian Inference for NWS Detection and Abundance
Published
May 3, 2026
1 Overview
This script demonstrates the Bayesian hierarchical joint model used to estimate New World Screwworm (NWS) intensity. Given that detections arise from opportunistic surveillance, the model explicitly separates the observation process (detection probability) from the biological process (abundance) to isolate the true biological signal from surveillance bias.
2 Computational & Privacy Note
This script is for demonstrative purposes only and has not been executed within this document. The full joint model is computationally intensive, containing approximately 10.2 million latent entries and 260,185 replicated spatial field elements. Due to these requirements and the sensitive nature of the original surveillance coordinates, which involve private residences and businesses, the code provided here tracks the structural methodology but does not include the raw datasets.
3 HPC Execution Environment
To handle the high dimensionality of the latent Gaussian fields and the spatial point-process likelihood, the model was executed on a high-performance computing (HPC) cluster. The following SLURM batch script illustrates the resource allocation required for a successful run once reasonable starting values (theta_init below) have been identified:
#!/bin/bash#SBATCH --job-name=nws_joint_model#SBATCH -A disease_ecology#SBATCH -N 1#SBATCH -n 1#SBATCH -p bigmem#SBATCH --cpus-per-task=14#SBATCH -t 35:59:00#SBATCH --mem=280G# Load necessary spatial and computational modulesmodule load udunits proj geos/3.12.1 gdal/3.8.5 openblas r/4.4# Set threading for INLA/OpenBLAS to match allocated CPUsexportOMP_NUM_THREADS=14exportOPENBLAS_NUM_THREADS=14# Execute the R script in batch modeR CMD BATCH --no-save--no-restore jcat.R
4 Statistical Formulation
The model evaluates two coupled processes across 15,305 spatial mesh nodes (i) over epidemiological weeks (t):
Abundance Tier (Biological):\log(\mu_{it}) = \log(\text{Area}_i) + \beta_{0}^{(2)} + \sum \beta_{2} \mathbf{X}_{2,it} + f(\text{H}_{it}) + \xi_{2,it} + \alpha \xi_{1,it} + w_{2,t} Key Components:
* \mathbf{X}_{1}, \mathbf{X}_{2}: Linear covariates (e.g., environment, infrastructure).
* \xi_{1,it}, \xi_{2,it}: Replicated spatial Gaussian Fields (SPDE).
* \alpha \xi_{1,it}: A shared spatial component where \alpha scales quantifies detection process on abundance.
* w_{1,t}, w_{2,t}: Temporal random walks (RW1) for weekly trends.
* \gamma_{c(i)}: IID random effect for administrative jurisdictions.
* f(\text{H}_{it}): Non-linear livestock density effect (RW2).
Note on \alpha \xi_{1,it}: This represents a shared spatial component where \alpha quantifies the association between surveillance effort and biological intensity, allowing for the correction of preferential sampling.
5 Spatial Projection
We utilize the Stochastic Partial Differential Equation (SPDE) approach with Penalized Complexity (PC) priors to prevent overfitting.
Set priors for each model tier`s spatial range and index locations to replicate spatial fields by timestep.
Show R Code
# SPDE for Tier 1: Detection (xi_1)spde.t1 <-inla.spde2.pcmatern(mesh.dom, alpha =2,prior.range =c(90, 0.5), # 50% probability that range is 90km or lessprior.sigma =c(1, 0.5),constr =TRUE)# SPDE for Tier 2: Abundance (xi_2)spde.t2 <-inla.spde2.pcmatern(mesh.dom, alpha =2,prior.range =c(150, 0.05), # 5% probability that range is 150km or less prior.sigma =c(0.5, 0.05),constr =TRUE)# Temporal stepst_steps <-length(unique(tier_1.set$time_index))# Define Spatial Indicesfield.xi1 <-inla.spde.make.index("field.xi1", n.spde = spde.t1$n.spde, n.group = t_steps)field.xi2 <-inla.spde.make.index("field.xi2", n.spde = spde.t2$n.spde, n.group = t_steps)
6 Data Integration
6.1 Detection Stack
Organize model input data as a list() object. In this example, all data for the Detection tier is stored in the tier_1.set dataframe.
Show R Code
locs_t1 <-cbind(tier_1.set$x, tier_1.set$y) # geographic coordinatesA.xi1 <-inla.spde.make.A(mesh.dom, # triangulated meshloc = locs_t1, # locationsgroup = tier_1.set$time_index) # timestepseff_t1 <-list(c(field.xi1, list(intercept1 =1)), # interceptlist(north = tier_1.set$northing_km_s, # distance to originroad_dens = tier_1.set$road_dens_s, # road densitynight_illum = tier_1.set$night_illum_s, # nighttime illuminationadmin_gamma =as.integer(as.factor(tier_1.set$admin_u)), # administrative unit (ID)week_w1 = tier_1.set$timestep) # weekly time steps)# Yi_cens_test is the censored version of the Y1 response demonstrated in the "response" tabjtier1.stk <-inla.stack(data =list(Y =cbind(tier_1.set$Yi_cens_test, NA), # bivariate response; column 1 for detection, second for abundancelink =1, # link functione =rep(NA, nrow(tier_1.set))), # no geographic exposure in detection tierA =list(A.xi1, 1), # projection matrixeffects = eff_t1, # covariatestag ="det.stack"# arbitrary label)
The tier specific stacks are merged into a single object for the inla() call. This allows the model to estimate both processes simultaneously, propagating uncertainty from the detection tier to the abundance tier.
Show R Code
joint_stk <-inla.stack(jtier1.stk, jtier2.stk)
7 Model Fitting
The model is fitted using a joint likelihood (Bernoulli and Negative Binomial). We utilize the eb (Empirical Bayes) integration strategy to handle the high dimensionality of the latent fields.
7.1 Priors
Penalized Complexity (PC) priors are defined for the random effects to prevent overfitting and reflect known dispersal capacities.
Show R Code
# Shared spatial field scaling (alpha) and fixed effectshyper_alpha <-list(beta =list(prior ="normal", param =c(0, 1)))# PC priors for Random Walks (Temporal and Jurisdictional)pc_rw <-list(prec =list(prior ="pc.prec", param =c(1, 0.01)))pc_rw_strong <-list(prec =list(prior ="pc.prec", param =c(0.3, 0.01)))pc_rw_cat <-list(prec =list(prior ="pc.prec", param =c(0.5, 0.05)))
7.2 Formula
The formula couples the two tiers, using f(..., copy = "field.xi1") to represent the shared spatial effect \alpha \xi_{1,it}.
Show R Code
formula_nws <- Y ~-1+ intercept1 + intercept2 +# response and intercepts for each tier# Tier 1 Componentsf(field.xi1, # tier 1 spatial fieldmodel = spde.t1, # priorgroup = field.xi1.group, # timesteps to replicatecontrol.group =list(model ="iid", # among-field prior specificationhyper = pc_rw)) +f(week_w1, # weekly timestepsmodel ="rw1", constr =TRUE, scale.model =TRUE, hyper = pc_rw) +f(admin_gamma, # administrative regionsmodel ="iid", constr =TRUE, hyper = pc_rw_strong) + north + road_dens + night_illum +# linear effects for human access# Tier 2 Componentsf(field.xi2, # tier 2 spatial fieldmodel = spde.t2, group = field.xi2.group, control.group =list(model ="iid", hyper = pc_rw)) +f(field.xi1.copy, # copy of spatial field from detection tiercopy ="field.xi1", group = field.xi1.copy.group, fixed =FALSE, hyper = hyper_alpha) +f(week_w2, #weekly time correlationmodel ="rw1", constr =TRUE, scale.model =TRUE, hyper = pc_rw) +f(livestock_H, #non-lieanr livestock (cattle in this example)model ="rw2", constr =TRUE, scale.model =TRUE, hyper = pc_rw_cat) + mintemp + soilmoist + leafarea + rhum + cattle_s + horses_s +# linear effects pigs_s + goats_s + sheep_s + mintemp:soilmoist + mintemp:leafarea
7.3 Run Model
Show R Code
# Quantiles for posterior summariesreturn_quants <-c(0.025, 0.25, 0.5, 0.75, 0.975) # Initial values for hyperparameters (speeds up run)theta_init <-c(4.65, 0.98, -2.01, -3.61, 2.30, 5.02, -0.02, 0.50, 5.42, 0.39)model_joint <-inla(formula_nws,# formulafamily =c("binomial", "nbinomial"), # family for each tierdata =inla.stack.data(joint_stk), # combined dataE =inla.stack.data(joint_stk)$e, # geographic exposurecontrol.predictor =list(A =inla.stack.A(joint_stk), compute =TRUE, #return fitted valueslink =1), control.inla =list(strategy ="adaptive", # integration strategyint.strategy ="eb"), # Empirical Bayes strategycontrol.mode =list(restart =TRUE, theta = theta_init), # initial valuesnum.threads =12, # run using 16 cores, others reserved for background processesquantiles = return_quants, # quanties to returnverbose =TRUE)