Accounting for autocorrelation in wild boar abundance models

Study area and population simulation

We will simulate a wild boar population over an hypothetical study area. We will use a Poisson distribution to scatter wild boar individuals across a study area:

NiPoisson(λ) (Eq 1)

where λ is the expected numbers of wild boar at each quadrant (cell) of our study area. As animals are not usually random distributed, we will use an environmental covariate (distance to the water point) to drive the variation of λ.



In addition, we will add a random autocorrelated predictor to reproduce some kind of autocorrelation of the wild boar distribution, which can be seen as biological processes such as migration, etc.

log(λi)=β0+β1∗dwati+β2∗autocorri (Eq 2)

where λi is the expected (average) abundance of wild boar at each cell i, β0 is the coefficient for the intercept, dwati is the environmental covariate at cell i and β1 its coefficient, and autocorri is the environmental covariate at cell i and β2

its coefficient. Therefore, the general process that describes the number of wild boar at each cell will be:

NPoisson(exp(β0+β1∗dwat+β2∗autocorr)) (Eq 3)

We will set our coefficients as follows:

  • β0=3
  • β1=−0.9
  • β2=0.15
set.seed(123)
library(dismo)
library(raster)
library(sf)

# Create a study area
sarea <- raster(nrows = 100, ncols = 100, xmn = 0, xmx = 100, ymn = 0, ymx = 100)
# Create the "distance from water point" layer
dwat <- scale(distanceFromPoints(sarea, c(25,25))) + 2.26
# Create a random autocorrelated variable
autoc <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 100, ymn = 0, ymx = 100)
autoc[] <- runif(100, 1,10)
autoc <- disaggregate(autoc,10, "bilinear")

# Lambda parameter for the Poisson distribution of the abundance distribution 
# will be a function from the distance of water point and the autocorrelated,
# with a 3, -0.9 and 0.15 as coefficients
beta_0 <- 3
beta_1 <- -0.9
beta_2 <- 0.15
lambda <- exp(beta_0 + beta_1*(dwat) + beta_2*(autoc))

# Now we can fill each cell of our study area with a random number from a Poisson 
# distribution with a different lambda at each site/cell
for (i in 1:ncell(sarea)){
  sarea[i] <- rpois(1, lambda[i])
}

# Plot the different variables and the study area
par(mfrow = c(2,2))
plot(dwat, main = "Distance to the water point")
plot(autoc, main = "Autocorrelated random covariate")
plot(lambda, main = "Lambda parameter of the Poisson distribution")
plot(sarea, main = "Abundance of wild boar per cell")

Hunting ground simulation and data colecting

Once we have the wild boar population distributed, we will divide the sudy area in 300 different hunting grounds and we will obtain the number of wild boar and a set of possible predictor variables for each hunting ground: area and distance to the water. point.

# Let's create a layer fill with 1 to indicate the area of each cell
uni <- sarea
uni[] <- 1

# Now we'll create hunting grounds from which we'll obtain the counts of our 
# population. They'll be irregular polygons for which we will obtain the actual 
# number of individuals. Here we'll ignore detectability, and we simulate a 
# perfect sampling, being able to obtain the actual number of individuals at 
# each hunting ground.

pp <- randomPoints(sarea, 300)
cotos <- crop(voronoi(pp),sarea)

# Now, we'll obtain the number of individuals in each hunting ground, as well as
# the measures of the other two predictor variables: the average of the distance 
# to the water point and the area of each hunting ground. We'll also obtain the 
# wild boar density by dividing the wild boar number by the area at each
# hunting ground.

cotosr <- (rasterize(cotos, sarea, 'id'))
df <- cbind(zonal(sarea,cotosr, fun = sum), zonal(dwat,cotosr), 
            zonal(uni, cotosr, fun = sum))
df <- data.frame(df[,c(-3,-5)])
colnames(df) <- c("id", "wb", "dwat", "area")
df$dens <- df$wb / df$area
df$dens_poiss <- round((df$dens * 100),0)

# Take a look to our data. Each hunting ground has an ID, the number of 
# individuals, the distance to the water point, the area and the wild boar
# density
head(df)
##   id   wb      dwat area      dens dens_poiss
## 1  1 1509 0.7832324   69 21.869565       2187
## 2  2  240 1.0638565   17 14.117647       1412
## 3  3   57 3.2996035   39  1.461538        146
## 4  4  116 1.9671127   23  5.043478        504
## 5  5   85 2.2280306   12  7.083333        708
## 6  6  204 1.6097854   21  9.714286        971
# Let's include this variables in the spatial object too
cotos$wb <- df$wb
cotos$dwat <- df$dwat
cotos$area <- df$area
cotos$dens <- df$dens
cotos$dens_poiss <- round((df$dens * 100),0)

#PLOT
par(mfrow = c(1,2))
plot(sarea, legend = FALSE, main = "Simulated wild boar abundance")
lines(cotos)
plot(sarea,axes=FALSE, legend = FALSE, 
     main = "Wild boar counts by hunting ground")
plot(st_as_sf(cotos)["wb"], add = TRUE, pal = rev(terrain.colors(10)))

Obtaining the spatial structure of our data

In order to account for the spatial dependency in our data, we should obtain the spatial structure of our hunting grounds, that is, the neighborhood of each hunting ground.

# First, we'll prepare neighborhood matrix of territorial units
library(spdep)
nb <- poly2nb(cotos, row.names = cotos$id)
n.neighbors <- sapply(nb, FUN =length)
neighbors <- unlist(nb, recursive = FALSE)

plot(cotos, col = "grey", main = "Hunting grounds neighborhood structure")
plot(nb, coordinates(cotos), col='red', lwd=1.2, add=TRUE, pch = 16)
# Now we should assign each cell to a hunting ground for model projections
sites.pred <- data.frame(rasterToPoints(dwat))
sites.pred$area <- 1
dup <- raster::extract(cotos, sites.pred[,1:2])
cells.pred <- dup[!duplicated(dup$point.ID),2]
sites.pred$cell <- cells.pred
names(sites.pred)[3] <- "dwat"

Model fitting

Now we will follow three different strategies to model wild boar abundance from hunting ground data and to project our predictions in a finer scale (1×1 grid).

  1. Using the total wild boar counts as response variable and the distance to water point and area of hunting ground as predictors.
  2. Using the density of wild boar as response variable and the distance to water point as predictor
  3. Using the density of wild boar as response variable and the distance to water point as predictor and including iCAR to control for spatial autocorrelation in our data.

We will use the R package hSDM and a Poisson distribution to fit all models. A good explanation about hSDM R package can be found here. It uses a Bayesian framework (through MCMC) to fit model parameters.

library(hSDM)

m1 <- hSDM.poisson(
  counts = df$wb,
  suitability=~ dwat + area,
  data=df,
  suitability.pred=sites.pred, # For model projection into a finer resolution
  burnin = 1000,
  mcmc = 10000,
  thin = 10,
  beta.start=0,
  mubeta =0,
  Vbeta = 1e+06,
  seed = 1234,
  verbose = 1,
  save.p = 0)
## 
## Running the Gibbs sampler. It may be long, please keep cool :)
## 
## **********:10.0%, mean accept. rates= beta:0.303
## **********:20.0%, mean accept. rates= beta:0.303
## **********:30.0%, mean accept. rates= beta:0.303
## **********:40.0%, mean accept. rates= beta:0.320
## **********:50.0%, mean accept. rates= beta:0.330
## **********:60.0%, mean accept. rates= beta:0.290
## **********:70.0%, mean accept. rates= beta:0.320
## **********:80.0%, mean accept. rates= beta:0.293
## **********:90.0%, mean accept. rates= beta:0.293
## **********:100.0%, mean accept. rates= beta:0.317
# Here we use the density as response variable and the distance to the water 
# point as the only predictor

m2 <- hSDM.poisson(
  counts = df$dens_poiss,
  suitability=~ dwat,
  data=df,
  suitability.pred=sites.pred,  # For model projection into a finer resolution
  burnin = 1000,
  mcmc = 10000,
  thin = 10,
  beta.start=0,
  mubeta =0,
  Vbeta = 1e+06,
  seed = 1234,
  verbose = 1,
  save.p = 0)
## 
## Running the Gibbs sampler. It may be long, please keep cool :)
## 
## **********:10.0%, mean accept. rates= beta:0.400
## **********:20.0%, mean accept. rates= beta:0.465
## **********:30.0%, mean accept. rates= beta:0.410
## **********:40.0%, mean accept. rates= beta:0.350
## **********:50.0%, mean accept. rates= beta:0.420
## **********:60.0%, mean accept. rates= beta:0.455
## **********:70.0%, mean accept. rates= beta:0.400
## **********:80.0%, mean accept. rates= beta:0.430
## **********:90.0%, mean accept. rates= beta:0.380
## **********:100.0%, mean accept. rates= beta:0.360
# Finally, we'll use the iCAR approach to have into account the spatial 
# autocorrelation

m3 <- hSDM.poisson.iCAR(
  counts = df$dens_poiss,
  suitability=~ dwat,
  spatial.entity = df$id,
  data=df,
  n.neighbors = n.neighbors, # Spatial structure
  neighbors = neighbors, # Spatial structure
  suitability.pred=sites.pred,  # For model projection into a finer resolution
  spatial.entity.pred=sites.pred$cell, # Spatial structure of projections
  burnin = 1000,
  mcmc = 10000,
  thin = 10,
  beta.start=0,
  Vrho.start=1,
  mubeta =0,
  Vbeta = 1e+06,
  priorVrho = "1/Gamma",
  shape = 0.5,
  rate = 0.0005,
  Vrho.max=1000,
  seed = 1234,
  verbose = 1,
  save.rho = 0,
  save.p = 0)
## 
## Running the Gibbs sampler. It may be long, please keep cool :)
## 
## **********:10.0%, mean accept. rates= beta:0.265, rho:0.256
## **********:20.0%, mean accept. rates= beta:0.235, rho:0.253
## **********:30.0%, mean accept. rates= beta:0.220, rho:0.257
## **********:40.0%, mean accept. rates= beta:0.195, rho:0.260
## **********:50.0%, mean accept. rates= beta:0.275, rho:0.257
## **********:60.0%, mean accept. rates= beta:0.280, rho:0.258
## **********:70.0%, mean accept. rates= beta:0.135, rho:0.258
## **********:80.0%, mean accept. rates= beta:0.300, rho:0.257
## **********:90.0%, mean accept. rates= beta:0.230, rho:0.261
## **********:100.0%, mean accept. rates= beta:0.220, rho:0.256

Now we can predict wild boar numbers obtained for each hunting ground following the three strategies and compare them with the actual data. Take into account that m2 and m3 are fitted using densities, and therefore model predictions should be backtransformed to obtain wild boar numbers.

df$predm1 <- m1$lambda.latent
df$predm2 <- ((m2$lambda.latent/100) * df$area)
df$predm3 <- ((m3$lambda.latent/100) * df$area)

par(mfrow= c(1,3)) 
plot(df$predm1, df$wb, xlab= "wild boar predicted", ylab= "wild boar observed")
abline(a=0,b=1,col="red", lwd = 3)
plot(df$predm2, df$wb, xlab= "wild boar predicted", ylab= "wild boar observed")
abline(a=0,b=1,col="red", lwd = 3)
plot(df$predm3, df$wb, xlab= "wild boar predicted", ylab= "wild boar observed")
abline(a=0,b=1,col="red", lwd = 3)



We can see that iCAR approach is obtaining the more accurate predictions at hunting ground level. In addition, we can check if we were able to remove spatial autocorrelation of model residuals by computing autocorrelograms (using Moran’s I at different distance lags)

library(spatialEco)

# Obtain model residuals (observed - predicted)
df$m1res <- df$wb - m1$lambda.latent
df$m2res <- df$wb - ((m2$lambda.latent/100) * df$area)
df$m3res <- df$wb - ((m3$lambda.latent/100) * df$area)

dataI <- SpatialPointsDataFrame(coordinates(cotos), 
                                data.frame(cbind(df$m1res,df$m2res,df$m3res)))
par(mfrow = c(1,3))
c1 <- correlogram(dataI, dataI@data$X1, dist = 10, ns = 50, latlong=FALSE)
c2 <- correlogram(dataI, dataI@data$X2, dist = 10, ns = 50, latlong=FALSE)
c3 <- correlogram(dataI, dataI@data$X3, dist = 10, ns = 50, latlong=FALSE)



We can see that spatial autocorrelation is significatively reduced in the firsts lags by using the iCAR approach.

## Model projection to a finer resolution (downscaling)

Finally, a further step would be to project our model to a higher resolution: that is, to use our models to obtain a prediction of number of wild boar at finer scale. We could then compare between actual data at cell sclae and model predictions under each strategy.

projm1 <- sarea
projm1[] <- m1$lambda.pred
projm2 <- sarea
projm2[] <- m2$lambda.pred/100
projm3 <- sarea
projm3[] <- m3$lambda.pred/100

par(mfrow = c(2,3))
plot(dwat, main = "Predictor dwat beta = -0.8")
plot(autoc, main = "Predictor autocorrelated beta = 0.15")
plot(sarea, main = paste("Actual abundance distribution\r N =",sum(sarea[])))
plot(projm1, main = paste("Raw counts model (m1)\r N =",round(sum(projm1[]),0)))
plot(projm2, main = paste("Density model (m2)\r N =",round(sum(projm2[]),0)))
plot(projm3, main = paste("Density + iCAR model (m2)\r N =",round(sum(projm3[]),0)))

As we see, the wild boar abundance distribution obtained in m1 and m2 show a uniform pattern, since the only environmental predictor included in the model is the distance to the water point. However, total numbers of wild boar for the whole study area shows an overprediction when we use the total wild boar counts as the response variable. This overprediction is not controlled by including the hunting ground area as predictor variable, so caution should be taken when downscaling this model predictions. The wild boar abundance distribution obtained from m3 shows a much more realistic pattern, since spatial structure of the data is accounted by the iCAR approach and total number of wild boar predicted is in accordance with actual data.

This script is associated to:

Suggested citation: ENETWILD‐consortium, Javier, Fernández‐López, Marco, Apollonio, Jose Antonio, Blanco‐Aguiar, Francesca, Brivio, Simon, Croft, Angela, Fanelli, Alberto, Fernández‐Arias, Ezio, Ferroglio, Oliver, Keuling, Tomislav, Levanič, Kamila, Plis, Tomasz, Podgórski, Boštjan, Pokorny, Massimo, Scandura, Graham, C Smith, Ramon, Soriguer, Joaquín, Vicente, Stefania, Zanet, Pelayo, Acevedo, 2020. Improving models of wild boar hunting yield distribution: new insights for predictions at fine spatial resolution. EFSA supporting publication 2020: 17( 12): EN‐1980. 27 pp. doi: 10.2903/sp.efsa.2020.EN‐1980