Synthetic world and data

Author

Murray Logan

Published

March 30, 2024

1 Synopsis

This document illustrates the generation of synthetic data that can be used to test the processing and modelling routines used in the ReefCloud statistical modelling components. Specifically, it will illustrate the generation of a full spatio-temporal grid for hard coral (HC), soft coral (SC) and macroalgae (MA) within a fabricated area by establishing a baseline for each in space and then perturbing them over time by overlaying synthetic disturbances (degree heating weeks, cyclones and other disturbances) as well as between disturbance growth (recovery).

The established full spatio-temporal grid for each of HC, SC and MA can thereafter be considered the “true” synthetic world from which a range of sampling designs and analyses can be explored.

2 Preparations

We will start by loading the required r packages.

library(knitr)
library(tidyverse)
library(simstudy)
library(gstat)
library(sf)
library(RandomFields)
RFoptions(install = "no")
library(INLA)
library(inlabru)
library(mgcv)
library(stars)
library(patchwork)
library(Hmisc)
#library(downloadthis)

The generated data are synthetic. As such they are the result of a combination of deterministic (based on precise repeatable rules and equations) and stochastic (based on random variability) processes. To ensure repeatability (in the presence of the stochasticity), we will set a random seed value that we can utilise any time we are enacting a stochastic (random) element.

seed <- 123

3 Define the spatial and temporal domains

The spatial and temporal domains will be used to represent our synthetic (fabricated) “world”. Before we can project any spatial and temporal patterns in benthic cover, we first need to define the bounds of this world. Thereafter, cover will broadly be generated as the baseline cover (cover at time 0) plus the spatial patterns over time plus random noise.

cover \sim{} baseline + spatial\times temporal + noise

We will simulate 12 annual sampling events. Although these will be assumed to be evenly spaced in time (descrete times), it is acknowledged that this is typically not now sampling programs collect data.

years <- 1:12

Lets define the spatial domain.

spatial_domain <- st_geometry(
  st_multipoint(
    x = rbind(
      c(0, -10),
      c(3, -10),
      c(10, -20),
      c(1, -21),
      c(2, -16),
      c(0, -10)
    )
  )
) |>
  st_set_crs(4326) |>
  st_cast("POLYGON")

spatial_domain |>
  ggplot() +
  geom_sf() +
  theme_bw()

Now we will generate a grid of 10,000 points within the spatial domain. This will essentially represent the full continuous spatial surface.

set.seed(seed)
spatial_grid <- spatial_domain |>
  st_set_crs(NA) |>
  st_sample(size = 10000, type = "regular") |>
  st_set_crs(4236)

spatial_grid |> ggplot() +
  geom_sf(size = 0.1) +
  theme_bw()

## Compile the spatial data frame - note it is important for
## RFsimulate that the data be sorted by Longitude then Latitude
spatial_grid_pts_df <- spatial_grid |>
  st_coordinates() |>
  as.data.frame() |>
  dplyr::rename(Longitude = X, Latitude = Y) |>
  arrange(Longitude, Latitude)

Although the above represents the broad spatial domain, the entire domain will not be saturated with coral reef. Indeed only a small fraction of the full domain will contain coral reef. Furthermore, the coral reef will not be distributed uniformly throughout the domain.

To simulate this, we will create a collection of discrete reefs that vary in size and shape and be sprinkled throughout the spatial domain. This can be acheived by creating a gaussian markov random field to defines a noisy variable in space. If we think of the noisy variable as being a third dimension (such as height), then this field is like a very rough topography. “Reefs” can be created by only keeping the parts of the topography that are higher than a specific value (picture viewing the hilly landscape from a plane as a large, flat cloud slowly decends towards the ground gradually revealing a set of seperate peaks).

Coral reefs tend to consist of a shallow, sand-filled lagoon surrounded by a sloping escarpment of hard substrate. The coral of a coral reef is typically restricted to the escarpment and thus only the 5-20 meter perimeter of a reef. Hence, to further the realism of the simulated reefs, we will further modify the “reefs” so that they are a series of irregular frames.

RFoptions(seed = 1)
threshold <- 1.75
model <- RMexp(var = 1, scale = 0.1)
sim <- RFsimulate(model,
  x = as.vector(scale(spatial_grid_pts_df$Longitude,
    scale = FALSE
  )),
  y = as.vector(scale(spatial_grid_pts_df$Latitude,
    scale = FALSE
  ))
)
## combine with spatial data
reefs <- spatial_grid_pts_df |>
  mutate(Y = as.vector(sim))

reefs |>
  ggplot() +
  geom_tile(aes(y = Latitude, x = Longitude, fill = Y)) +
  coord_sf(crs = 4326) +
  theme_bw() +
  theme(
    axis.title = element_blank(),
    legend.position = c(0.95, 0.95),
    legend.justification = c(1, 1)
  )

reefs |>
  mutate(Y = Y > threshold) |>
  ggplot() +
  geom_tile(aes(y = Latitude, x = Longitude, fill = Y)) +
  coord_sf(crs = 4326) +
  theme_bw() +
  theme(
    axis.title = element_blank(),
    legend.position = c(0.95, 0.95),
    legend.justification = c(1, 1)
  )

We can then filter the field to exclude Y values that are less than 1.75, convert the data into a simple features points object, generate square buffers around each point, convert the points into polygons and then combine overlapping and connected polygons into single polygons,

The result will be a set of irregular and randomly postioned polygons that can represent coral reefs.

reefs_sf <- reefs |>
  st_as_sf(coords = c("Longitude", "Latitude")) |>
  filter(Y > threshold) |>
  st_buffer(0.05, endCapStyle = "SQUARE") |>
  st_cast("POLYGON") |>
  st_union() |>
  st_set_crs(4326)
reefs_sf |>
  ggplot() +
  geom_sf(data = spatial_domain, fill = NA) +
  geom_sf(fill = "red") + # nolint
  theme_bw() +
  theme(
    axis.title = element_blank(),
    legend.position = c(0.95, 0.95),
    legend.justification = c(1, 1)
  )
reefs_full_sf <- reefs_sf

Finally, we generate the difference between an enlarged and shrunk versions of each reef so as to yield polygons that resembled frames. To illustrate, the figure is zoomed in on a small collection of reefs.

sf_use_s2(FALSE) # negative buffers dont work if this is true
reefs_sf <- reefs_sf |>
  st_buffer(0.01) |>
  st_difference(reefs_sf |> st_buffer(-0.01))
sf_use_s2(TRUE)
reefs_sf |> ggplot() +
  geom_sf() +
  coord_sf(xlim = c(2.4, 2.9), ylim = c(-16.75, -16.25)) +
  theme_bw() +
  theme(
    axis.title = element_blank(),
    legend.position = c(0.95, 0.95),
    legend.justification = c(1, 1)
  )
reefs_poly_sf <- reefs_sf |>
  st_cast("POLYGON") |>
  st_as_sf() |>
  mutate(Reef = paste0("Reef", 1:n()))

4 Synthetic broad scale patterns

In this section, we will use a range of statistical routines to project spatial and temporal patterns (as well as noise) onto the entire spatio-temporal domain ignoring the fact that the reefs only occur in the limited bordered shaped defined in the previous section. We will this later point in the next section.

The response data (hard coral, soft coral and macroalgae cover) will be effected by the following:

base coral cover - the global average coral cover (pooled over space and time)
spatial pattern in this base cover which reflects the spatial pattern at T0
annual growth (e.g. 5-10% annual increase)
influence of covariates (spatio-temporal effects)
random noise

Create the SPDE mesh

variance <- 1
kappa <- 1

alpha <- 2
mesh_pars <- c(1, 0.5, 0.1, 1, 0.5) *
  sqrt(alpha - ncol(spatial_grid_pts_df) / 2) / kappa
s <- inla.mesh.segment(
  spatial_grid_pts_df[chull(spatial_grid_pts_df), ]
)
mesh <- inla.mesh.2d(
  spatial_grid_pts_df[chull(spatial_grid_pts_df), ],
  max.edge = mesh_pars[1:2],
  cutoff = mesh_pars[3],
  offset = mesh_pars[4:5],
  boundary = s
)

ggplot() +
  gg(mesh) +
  geom_sf(data = spatial_domain, fill = NA, size = 2) +
  coord_sf(crs = 4326) +
  theme_bw()

Create a SPDE for a Matern model. If you want to apply PC priors, use inla.spde2.pcmatern instead. Matern SPDE model with spatial scale paramter kappa(u) and variance rescaling. parameter tau(u). Alpha is the Fractional operator order where nu = \alpha-d/2.
```
spde <- inla.spde2.matern(mesh, alpha = alpha)
```

calculate the precision matrix from the parameter values (theta)

theta <- c(-0.5 * log(4 * pi * variance * kappa^2), log(kappa))
Q <- inla.spde2.precision(spde, theta = theta)

calculate a lattice projection to and from the mesh

A <- inla.spde.make.A(
  mesh = mesh,
  loc = as.matrix(spatial_grid_pts_df)
)
# OR
## A <- inla.mesh.project(mesh = mesh,
##                        loc = as.matrix(spatial_grid_pts_df ))$A

The baseline represents the spatial pattern of hard coral cover the year prior to sampling. This spatial pattern is defined as a simple sine wave (applied to the centered latitudes) and rotated slightly and projected onto the SPDE grid.

cover_i = longitude_i + sin(latitude_i) + 1.5\times (longitude_i + latitude_i)

Note, these values are on the expected link scale (logit). The second (bottom) figure displays the baseline on the response (percent cover) scale.

baseline_sample_hcc <- mesh$loc[, 1:2] |>
  as.data.frame() |>
  dplyr::select(Longitude = V1, Latitude = V2) |>
  mutate(
    clong = as.vector(scale(Longitude, scale = FALSE)),
    clat = as.vector(scale(Latitude, scale = FALSE)),
    Y = clong + sin(clat) + # rnorm(1,0,1) +
      1.5 * clong + clat
  ) |>
  mutate(Y = scales::rescale(Y, to = c(-2, 0.8)))

baseline_effects_hcc <- baseline_sample_hcc |>
  dplyr::select(Y) |>
  as.matrix()
baseline_pts_sample_hcc <- inla.mesh.project(mesh,
  loc = as.matrix(spatial_grid_pts_df[, 1:2]),
  baseline_effects_hcc
)
baseline_pts_effects_hcc <- baseline_pts_sample_hcc |>
  cbind() |>
  as.matrix() |>
  as.data.frame() |>
  cbind(spatial_grid_pts_df) |>
  pivot_longer(
    cols = c(-Longitude, -Latitude),
    names_to = c("Year"),
    names_pattern = "sample:(.*)",
    values_to = "Value"
  ) |>
  mutate(Year = as.numeric(Year))

ggplot(baseline_pts_effects_hcc, aes(y = Latitude, x = Longitude)) +
  geom_tile(aes(fill = Value)) +
  scale_fill_gradientn(colors = terrain.colors(10)) +
  coord_sf(crs = 4236) +
  theme_bw(base_size = 7) +
  theme(
    axis.title = element_blank(),
    legend.position = c(0.95, 0.95),
    legend.justification = c(1, 1)
  )

ggplot(baseline_pts_effects_hcc, aes(y = Latitude, x = Longitude)) +
  geom_tile(aes(fill = 100 * plogis(Value))) +
  scale_fill_gradientn("Cover (%)", colors = terrain.colors(10)) +
  coord_sf(crs = 4236) +
  theme_bw(base_size = 7) +
  theme(
    axis.title = element_blank(),
    legend.position = c(0.95, 0.95),
    legend.justification = c(1, 1)
  )

The baseline represents the spatial pattern of soft coral cover the year prior to sampling. Similar to hard coral cover, this spatial pattern is defined as a simple sine wave (applied to the centered latitudes), yet it is rotated slightly more before being projected onto the SPDE grid.

cover_i = longitude_i + sin(latitude_i) + 1.5\times longitude_i - 1.5\times latitude_i

Note again, these values are on the expected link scale (logit). The second (bottom) figure displays the baseline on the response (percent cover) scale.

baseline_sample_sc <- mesh$loc[, 1:2] |>
  as.data.frame() |>
  dplyr::select(Longitude = V1, Latitude = V2) |>
  mutate(
    clong = as.vector(scale(Longitude, scale = FALSE)),
    clat = as.vector(scale(Latitude, scale = FALSE)),
    Y = clong + sin(clat) + # rnorm(1,0,1) +
      1.5 * clong + -1.5 * clat
  ) |>
  mutate(Y = scales::rescale(Y, to = c(-4, -2)))

baseline_effects_sc <- baseline_sample_sc |>
  dplyr::select(Y) |>
  as.matrix()
baseline_pts_sample_sc <- inla.mesh.project(mesh,
  loc = as.matrix(spatial_grid_pts_df[, 1:2]),
  baseline_effects_sc
)
baseline_pts_effects_sc <- baseline_pts_sample_sc |>
  cbind() |>
  as.matrix() |>
  as.data.frame() |>
  cbind(spatial_grid_pts_df) |>
  pivot_longer(
    cols = c(-Longitude, -Latitude),
    names_to = c("Year"),
    names_pattern = "sample:(.*)",
    values_to = "Value"
  ) |>
  mutate(Year = as.numeric(Year))

ggplot(baseline_pts_effects_sc, aes(y = Latitude, x = Longitude)) +
  geom_tile(aes(fill = Value)) +
  scale_fill_gradientn(colors = terrain.colors(10)) +
  coord_sf(crs = 4236) +
  theme_bw(base_size = 7) +
  theme(
    axis.title = element_blank(),
    legend.position = c(0.95, 0.95),
    legend.justification = c(1, 1)
  )
ggplot(baseline_pts_effects_sc, aes(y = Latitude, x = Longitude)) +
  geom_tile(aes(fill = 100 * plogis(Value))) +
  scale_fill_gradientn("Cover (%)", colors = terrain.colors(10)) +
  coord_sf(crs = 4236) +
  theme_bw(base_size = 7) +
  theme(
    axis.title = element_blank(),
    legend.position = c(0.95, 0.95),
    legend.justification = c(1, 1)
  )

In this simulated world, macroalgae simply fills the space not occupied by hard and soft coral. Consequently, we will not generate an initial baseline for this. Rather, we will calculate it from the hard and soft coral spatio-temporal projections later.

The degree heating weeks effects represent relative spatio-temporal patterns. We start by establishing an overall temporal trend in degree heating weeks defined as:

\begin{align} cyear &= year - 1\\ dhw_i &\sim{} Beta(\alpha_i, 1)\\ log\left(\frac{\alpha_i}{1-\alpha_i}\right) &= 0.2 + cyear_i + sin(cyear_i)\\ dhw_i &= 5 * (dhw_i - min(dhw)) / range(dhw) \end{align}

set.seed(seed)
dhw_temporal <- data.frame(Year = years) %>%
  mutate(
    cYear = Year - 1, # as.vector(scale(Year, scale=FALSE)),
    Y = 0.2 * cYear + sin(cYear),
    Y = Y * rbeta(length(years), Y, 1),
    Y = scales::rescale(Y - min(Y), to = c(0, 5))
  )
dhw_temporal %>%
  ggplot(aes(y = Y, x = Year)) +
  geom_line() +
  theme_bw(base_size = 7)

Now we propagate this temporal trend across a random field with a time varying autocorrelation coefficient drawn from beta distribution with shape parameters of 0.2 and 1.

set.seed(seed)
dhw_sample <- inla.qsample(length(years),
  Q,
  seed = seed,
  constr = spde$f$extraconstr
)

rho <- rep(0.7, length(years))
rho <- rbeta(length(years), 0.2, 1)
x <- dhw_sample
for (j in 2:length(years)) {
  x[, j] <- rho[j] * x[, j - 1] + sqrt(1 - rho[j]^2) * dhw_sample[, j]
}
x <- sweep(x, 2, dhw_temporal$Y, FUN = "+")
dhw_effects <- scales::rescale(x, to = c(0, 1))
dhw_pts_sample <- inla.mesh.project(mesh,
  loc = as.matrix(spatial_grid_pts_df[, 1:2]),
  dhw_effects
)

dhw_pts_effects_df <- dhw_pts_sample %>%
  as.matrix() %>%
  as.data.frame() %>%
  cbind(spatial_grid_pts_df) %>%
  pivot_longer(
    cols = c(-Longitude, -Latitude),
    names_to = c("Year"),
    names_pattern = "sample:(.*)",
    values_to = "Value"
  ) %>%
  mutate(Year = as.numeric(Year))
## Value=scales::rescale(Value, to=c(0,1)))

## dhw.effect <- dhw.gmrf %>% mutate(Value=scales::rescale(Value, to=c(0,-0.3)))
ggplot(dhw_pts_effects_df, aes(y = Latitude, x = Longitude)) +
  geom_tile(aes(fill = Value)) +
  facet_wrap(~Year, nrow = 2) +
  scale_fill_gradientn(colors = rev(heat.colors(10))) +
  coord_sf(crs = 4236) +
  theme_bw(base_size = 12) +
  theme(axis.title = element_blank())

For each year, we calculate the probability that a cyclone has occurred somewhere in the spatial domain, the cyclone intensity and a sine wave path for the cyclone to follow through the spatial domain.

set.seed(seed)
cyc <- vector("list", length(years))

## spatial_grid_pts_df |>
##     mutate(clong = as.vector(scale(Longitude, scale=FALSE)),
##            clat = as.vector(scale(Latitude, scale=FALSE)))

for (yr in years) {
  cat(paste("Year:", yr, "\n"))
  cyc_occur <- rbinom(1, 1, prob = min(0.05 * yr^2, 0.6))
  cat(paste("Cyclone Occurance:", cyc_occur, "\n"))
  cyc_intensity <- rbeta(1, 2, 1) |> round(2)
  cat(paste("Cyclone intensity:", cyc_intensity, "\n"))
  ## cyc_spatial <- spatial_grid_pts_df  |>
  lat_offset <- runif(1, 0, 5)
  cyc_spatial <- mesh$loc[, 1:2] |>
    as.data.frame() |>
    dplyr::select(Longitude = V1, Latitude = V2) |>
    mutate(
      clong = as.vector(scale(Longitude, scale = FALSE)),
      clat = as.vector(scale(Latitude, scale = FALSE)),
      Y = lat_offset + runif(1, -1, 1) * clong + runif(1, -1, 1) *
        clat + sin(clat),
      # Y= Y - runif(1,-10,10),
      Y = abs(Y),
      Y = ifelse(Y > cyc_intensity, cyc_intensity, Y),
      Y = cyc_intensity - Y,
      Value = Y * cyc_occur
    )
  cyc[[yr]] <- cyc_spatial |> mutate(Year = yr)
}
cyc <- do.call("rbind", cyc)
cyc_effects_df <- cyc |>
  mutate(Value = scales::rescale(Value, to = c(0, 1)))

cyc_effects_df |>
  group_by(Year) |>
  summarise(
    Mean = mean(Value),
    Median = median(Value)
  ) |>
  ggplot(aes(x = Year)) +
  geom_line(aes(y = Mean), color = "blue") +
  geom_line(aes(y = Median), color = "red") +
  theme_bw(base_size = 7)

cyc_effects <- cyc_effects_df |>
  dplyr::select(-clong, -clat, -Y) |>
  pivot_wider(
    id_cols = c(Longitude, Latitude),
    names_prefix = "sample:",
    names_from = Year,
    values_from = Value
  ) |>
  dplyr::select(-Longitude, -Latitude) |>
  as.matrix()

cyc_pts_sample <- inla.mesh.project(mesh,
  loc = as.matrix(spatial_grid_pts_df[, 1:2]),
  cyc_effects
)
cyc_pts_effects <- cyc_pts_sample |>
  as.matrix() |>
  as.data.frame() |>
  cbind(spatial_grid_pts_df) |>
  pivot_longer(
    cols = c(-Longitude, -Latitude),
    names_to = c("Year"),
    names_pattern = "sample:(.*)",
    values_to = "Value"
  ) |>
  mutate(Year = as.numeric(Year))
## Value=scales::rescale(Value, to=c(0,-0.5)))

ggplot(cyc_pts_effects, aes(y = Latitude, x = Longitude)) +
  geom_tile(aes(fill = Value)) +
  facet_wrap(~Year, nrow = 2) +
  scale_fill_gradientn(colors = terrain.colors(10)) +
  coord_sf(crs = 4236) +
  theme_bw(base_size = 12) +
  theme(
    axis.title = element_blank(),
    legend.position = c(0.95, 0.95),
    legend.justification = c(1, 1)
  )

Similar to degree heating weeks, other disturbances are generated by defining local effects that are autocorrelated. This is a catchall for all other disturbances including crown of thorns, disease etc.

set.seed(seed + 1)
other_sample <- inla.qsample(length(years),
  Q,
  seed = seed + 1,
  constr = spde$f$extraconstr
)

rho <- rep(0.7, length(years))
rho <- rbeta(length(years), 0.2, 1)
x <- other_sample
for (j in 2:length(years)) {
  x[, j] <- rho[j] * x[, j - 1] + sqrt(1 - rho[j]^2) * other_sample[, j]
}
other_effects <- scales::rescale(x, to = c(0, 1))
other_pts_sample <- inla.mesh.project(mesh,
  loc = as.matrix(spatial_grid_pts_df[, 1:2]),
  other_effects
)

other_pts_effects <- other_pts_sample |>
  as.matrix() |>
  as.data.frame() |>
  cbind(spatial_grid_pts_df) |>
  pivot_longer(
    cols = c(-Longitude, -Latitude),
    names_to = c("Year"),
    names_pattern = "sample:(.*)",
    values_to = "Value"
  ) |>
  mutate(Year = as.numeric(Year)) # ,
## Value=scales::rescale(Value, to=c(0,1)))

ggplot(other_pts_effects, aes(y = Latitude, x = Longitude)) +
  geom_tile(aes(fill = Value)) +
  facet_wrap(~Year, nrow = 2) +
  scale_fill_gradientn(colors = terrain.colors(10)) +
  coord_sf(crs = 4236) +
  theme_bw(base_size = 12) +
  theme(axis.title = element_blank())

other_pts_effects |>
  group_by(Year) |>
  summarise(
    Mean = mean(Value, na.rm = TRUE),
    Median = median(Value, na.rm = TRUE)
  ) |>
  ggplot(aes(x = Year)) +
  geom_line(aes(y = Mean))

We can now compile all the effects (disturbances as well as growth) together. If we work on the link scale, we can simply calculate a cumulative sum of effects per pixel.

We will define the relative influence (annual decline weighting) of each of the disturbances as:

Degree heating weeks (0.5)
Cyclones (0.5)
All others (0.2)

In addition, we will indicate growth (annual increase) of:

Hard coral (0.3)
Soft coral (0.3)

Macrolgae will respond differently. Rather than respond directly, macroalgae will take up the remaining available space (e.g MA = Total~available~space - HCC - SC).

disturb_effects <-
  (0.5 * dhw_effects) +
  (0.4 * cyc_effects) +
  (0.1 * other_effects) |>
  as.data.frame() # |>
all_effects_df <- mesh$loc[, 1:2] |>
  as.data.frame() |>
  dplyr::rename(Longitude = V1, Latitude = V2) |>
  cbind(disturb_effects) |>
  pivot_longer(
    cols = c(-Longitude, -Latitude),
    names_to = "Year",
    names_pattern = "sample:(.*)",
    values_to = "Y"
  ) |>
  mutate(Year = factor(Year, levels = sort(unique(as.numeric(
    as.character(Year)
  ))))) |>
  group_by(Longitude, Latitude) |>
  mutate(
    Growth_HCC = 0.3, ## Add growth onto this
    Growth_SC = 0.3,
    Y_HCC = cumsum(-Y + Growth_HCC), ## cumsum on link scale will accumulate effects
    Y_SC = cumsum(-Y + Growth_SC)
  )

The marginalized temporal effects are approximately:

all_effects_df |>
  group_by(Year) |>
  summarise(
    Mean = mean(Y_HCC, na.rm = TRUE),
    Median = median(Y_HCC, na.rm = TRUE)
  ) |>
  ggplot(aes(x = as.numeric(as.character(Year)))) +
  geom_line(aes(y = Mean, color = "Mean")) +
  geom_line(aes(y = Median, color = "Median")) +
  scale_x_continuous("Year") +
  scale_y_continuous("Effect on HCC") +
  theme_bw()

And the spatio-temporal effects are:

all_effects_df |>
  ggplot(aes(y = Latitude, x = Longitude)) +
  geom_point(aes(color = Y_HCC)) +
  ## geom_tile(aes(fill = Y_HCC)) +
  facet_wrap(~Year, nrow = 2) +
  scale_color_gradient2("HCC", low = "red", high = "green", mid = "white") +
  coord_sf(crs = 4236) +
  theme_bw(base_size = 12) +
  theme(axis.title = element_blank())

all_effects <- all_effects_df |>
  pivot_wider(
    id_cols = c(Longitude, Latitude),
    names_prefix = "sample:",
    names_from = Year,
    values_from = Y_HCC
  )

## Project onto the spatial grid
disturb_pts_sample <- inla.mesh.project(mesh,
  loc = as.matrix(spatial_grid_pts_df[, 1:2]),
  all_effects |>
    ungroup() |> 
    dplyr::select(-Longitude, -Latitude) |>
    as.matrix()
)
disturb_pts_effects <- disturb_pts_sample |>
  as.matrix() |>
  as.data.frame() |>
  cbind(spatial_grid_pts_df) |>
  pivot_longer(
    cols = c(-Longitude, -Latitude),
    names_to = c("Year"),
    names_pattern = "sample:(.*)",
    values_to = "Value"
  ) |>
  mutate(Year = as.numeric(Year))

ggplot(disturb_pts_effects, aes(y = Latitude, x = Longitude)) +
  geom_tile(aes(fill = Value)) +
  facet_wrap(~Year, nrow = 2) +
  scale_fill_gradientn(colors = terrain.colors(10)) +
  coord_sf(crs = 4236) +
  theme_bw(base_size = 12) +
  theme(
    axis.title = element_blank(),
    legend.position = c(0.95, 0.95),
    legend.justification = c(1, 1)
  )

## Do all this on the link scale so that can use cumsum
all_effects_hcc <- all_effects_df |>
  full_join(baseline_sample_hcc |>
    dplyr::select(Longitude, Latitude, BASE_HCC = Y)) |>
  group_by(Longitude, Latitude) |>
  mutate(HCC = BASE_HCC + Y_HCC) |>
  ungroup() |>
  dplyr::select(-BASE_HCC, -Y_HCC) |>
  pivot_wider(
    id_cols = c(Longitude, Latitude),
    names_prefix = "sample:",
    names_from = Year,
    values_from = HCC
  ) |>
  dplyr::select(-Longitude, -Latitude) |>
  as.matrix()

all_pts_sample_hcc <- inla.mesh.project(mesh,
  loc = as.matrix(spatial_grid_pts_df[, 1:2]),
  all_effects_hcc
)
all_pts_effects_hcc <- all_pts_sample_hcc |>
  as.matrix() |>
  as.data.frame() |>
  cbind(spatial_grid_pts_df) |>
  pivot_longer(
    cols = c(-Longitude, -Latitude),
    names_to = c("Year"),
    names_pattern = "sample:(.*)",
    values_to = "Value"
  ) |>
  mutate(
    Year = as.numeric(Year),
    ## Value = Value+qlogis(0.3))
    Value = Value
  )
## Value=scales::rescale(Value, to=c(-0.5,0)))

all_pts_effects_hcc |>
  mutate(Value = plogis(Value)) |> 
  group_by(Year) |>
  summarise(Mean = mean(Value),
    Median = median(Value)) |>
  ggplot(aes(x = Year)) +
  geom_line(aes(y = Mean), color = "blue") +
  geom_line(aes(y = Median), color = "red")

all_pts_effects_hcc |>
  mutate(Value = plogis(Value)) |> 
  ggplot(aes(y = Latitude, x = Longitude)) +
  geom_tile(aes(fill = Value)) +
  scale_fill_gradientn(colors = terrain.colors(10)) + 
  coord_sf(crs = 4236) +
  facet_wrap(~Year, nrow = 2) + 
  theme_bw(base_size = 7) +
  theme(axis.title = element_blank(),
    legend.position = c(0.95,0.95),
    legend.justification = c(1,1))

## Do all this on the link scale so that can use cumsum
all_effects_sc <- all_effects_df |>
  full_join(baseline_sample_sc |> dplyr::select(Longitude, Latitude, BASE_SC=Y)) |>
  group_by(Longitude, Latitude) |>
  mutate(SC = BASE_SC + Y_SC) |>
  ungroup() |>
  dplyr::select(-BASE_SC, -Y_SC) |>
  pivot_wider(id_cols = c(Longitude, Latitude), 
    names_prefix = 'sample:',
    names_from = Year, 
    values_from = SC) |>
  dplyr::select(-Longitude, -Latitude) |>
    as.matrix()

all_pts_sample_sc <- inla.mesh.project(mesh,
  loc = as.matrix(spatial_grid_pts_df [,1:2]),
  all_effects_sc)
all_pts_effects_sc = all_pts_sample_sc |> 
  as.matrix() |> 
  as.data.frame() |>
  cbind(spatial_grid_pts_df ) |> 
  pivot_longer(cols = c(-Longitude, -Latitude),
    names_to = c('Year'),
    names_pattern = 'sample:(.*)',
    values_to = 'Value') |>
  mutate(Year = as.numeric(Year),
    ## Value = Value+qlogis(0.3))
    Value = Value)
## Value=scales::rescale(Value, to=c(-0.5,0)))

all_pts_effects_sc |>
  mutate(Value = plogis(Value)) |> 
  group_by(Year) |>
  summarise(Mean = mean(Value),
    Median = median(Value)) |>
  ggplot(aes(x = Year)) +
  geom_line(aes(y = Mean), color = "blue") +
  geom_line(aes(y = Median), color = "red")

all_pts_effects_sc |>
  mutate(Value = plogis(Value)) |> 
  ggplot(aes(y = Latitude, x = Longitude)) +
  geom_tile(aes(fill = Value)) +
  scale_fill_gradientn(colors = terrain.colors(10)) + 
  coord_sf(crs = 4236) +
  facet_wrap(~Year, nrow = 2) + 
  theme_bw(base_size = 7) +
  theme(axis.title = element_blank(),
    legend.position = c(0.95,0.95),
    legend.justification = c(1,1))

## Do all this on the link scale so that can use cumsum
all_pts_effects_ma <- all_pts_effects_hcc |>
  dplyr::rename(HCC=Value) |> 
  bind_cols(all_pts_effects_sc |>
              dplyr::select(SC=Value)) |>
  mutate(Total_Avail = 0.8 - plogis(HCC) + plogis(SC),
    ## MA = Total_Avail*rbeta(n(), 2, 1),
    MA = Total_Avail,
    Value = qlogis(MA)) |>
  dplyr::select(-HCC, -SC, -Total_Avail, -MA)

all_pts_effects_ma |>
  mutate(Value = plogis(Value)) |> 
  group_by(Year) |>
  summarise(Mean = mean(Value),
    Median = median(Value)) |>
  ggplot(aes(x = Year)) +
  geom_line(aes(y = Mean), color = "blue") +
  geom_line(aes(y = Median), color = "red")

all_pts_effects_ma |>
  mutate(Value = plogis(Value)) |> 
  ggplot(aes(y = Latitude, x = Longitude)) +
  geom_tile(aes(fill = Value)) +
  scale_fill_gradientn(colors = terrain.colors(10)) + 
  coord_sf(crs = 4236) +
  facet_wrap(~Year, nrow = 2) + 
  theme_bw(base_size = 7) +
  theme(axis.title = element_blank(),
    legend.position = c(0.95,0.95),
    legend.justification = c(1,1))

5 Broad scale reef patterns

As indicated previously, the entire spatial domain is not covered with coral reef. Rather coral reefs are usually sprinkled throughout the spatial domain. Therefore, to represent reality more closely, we should restict the broad spatial patterns to just the reefs (which are themselves the set of frames we created earlier).

Note, as the ‘reefs’ (frames) take up relatively little space within the full spatial domain, the figures in this section will be zoomed in on a specific section of the spatial domain (so as to highlight the reef fraction of the space).

rasterize the reefs frame
convert to points (centroids of raster cells)
filter to the values of 1
extract coordinates
convert to data frame

data_reefs_sf <- reefs_sf |>
  st_as_stars(dx = 0.01) |>  # rasterize
  st_as_sf(as_points = TRUE) |>
  filter(values == 1L)

data_reefs_df <- data_reefs_sf |>
  st_coordinates() |>
  as.data.frame() |>
  dplyr::rename(Longitude = X, Latitude = Y)

data_reefs_sample_hcc <- inla.mesh.project(mesh,
  loc = as.matrix(data_reefs_df[, 1:2]),
  all_effects_hcc
)
data_reefs_hcc <- data_reefs_sample_hcc |>
  as.matrix() |>
  as.data.frame() |>
  cbind(data_reefs_df) |>
  pivot_longer(
    cols = c(-Longitude, -Latitude),
    names_to = c("Year"),
    names_pattern = "sample:(.*)",
    values_to = "Value"
  ) |>
  mutate(
    Year = as.numeric(Year),
    ## Value = Value + qlogis(0.3))
    Value = Value
  )

data_reefs_pts_hcc_sf <- data_reefs_hcc |>
  st_as_sf(coords = c("Longitude", "Latitude")) |>
  st_set_crs(st_crs(data_reefs_sf))
sf_use_s2(FALSE)
data_reefs_pts_hcc_sf <- data_reefs_pts_hcc_sf |>
  st_intersection(reefs_poly_sf)
sf_use_s2(TRUE)

Given that the entire spatial domain is very large and thus it would be very difficult to discern any within individual reef variations from a plot of the entire spatial domain, we will zoom in on a small region for the purposes of illustrating the result of projecting onto the reefs frames.

sf_use_s2(FALSE)
data_reefs_pts_hcc_sf |>
  st_crop(xmin = 2.5, xmax = 3, ymin = -16.75, ymax = -16.25) |>
  mutate(Y = plogis(Value)) |>
  ggplot() +
  geom_sf(aes(color = Y)) +
  facet_wrap(~Year, nrow = 2) +
  scale_color_gradientn(colors = terrain.colors(10)) +
  coord_sf(
    crs = 4236,
    xlim = c(2.5, 3),
    ylim = c(-16.75, -16.25)
  ) +
  theme_bw(base_size = 12) +
  theme(axis.title = element_blank())

sf_use_s2(TRUE)

data_reefs_sample_sc <- inla.mesh.project(mesh,
  loc = as.matrix(data_reefs_df[, 1:2]),
  all_effects_sc
)
data_reefs_sc <- data_reefs_sample_sc |>
  as.matrix() |>
  as.data.frame() |>
  cbind(data_reefs_df) |>
  pivot_longer(
    cols = c(-Longitude, -Latitude),
    names_to = c("Year"),
    names_pattern = "sample:(.*)",
    values_to = "Value"
  ) |>
  mutate(
    Year = as.numeric(Year),
    Value = Value
  )

data_reefs_pts_sc_sf <- data_reefs_sc |>
  st_as_sf(coords = c("Longitude", "Latitude")) |>
  st_set_crs(st_crs(data_reefs_sf))
sf_use_s2(FALSE)
data_reefs_pts_sc_sf <- data_reefs_pts_sc_sf |>
  st_intersection(reefs_poly_sf)
sf_use_s2(TRUE)

sf_use_s2(FALSE)
data_reefs_pts_sc_sf |>
    st_crop(xmin = 2.5, xmax = 3, ymin = -16.75, ymax = -16.25) |>
    mutate(Y = plogis(Value)) |>
    ggplot() +
    geom_sf(aes(color = Y)) +
    facet_wrap(~Year, nrow = 2) +
    scale_color_gradientn(colors = terrain.colors(10)) + 
    coord_sf(crs = 4236, 
             xlim = c(2.5,3), 
             ylim = c(-16.75,-16.25)) +
    theme_bw(base_size = 12) +
    theme(axis.title = element_blank())

sf_use_s2(TRUE)

data_reefs_ma <- data_reefs_hcc |>
  rename(HCC = Value) |>
  full_join(data_reefs_sc |> rename(SC = Value)) |>
  mutate(
    Total_Avail = 0.8 - plogis(HCC) + plogis(SC),
    MA = Total_Avail,
    Value = qlogis(MA)
  ) |>
  dplyr::select(-HCC, -SC, -Total_Avail, -MA)

data_reefs_pts_ma_sf <- data_reefs_ma |>
  st_as_sf(coords = c("Longitude", "Latitude")) |>
  st_set_crs(st_crs(data_reefs_sf))
sf_use_s2(FALSE)
data_reefs_pts_ma_sf <- data_reefs_pts_ma_sf |>
  st_intersection(reefs_poly_sf)
sf_use_s2(TRUE)

sf_use_s2(FALSE)
data_reefs_pts_ma_sf |>
  st_crop(xmin = 2.5, xmax = 3, ymin = -16.75, ymax = -16.25) |>
  mutate(Y = plogis(Value)) |>
  ggplot() +
  geom_sf(aes(color = Y)) +
  facet_wrap(~Year, nrow = 2) +
  scale_color_gradientn(colors = terrain.colors(10)) +
  coord_sf(
    crs = 4236,
    xlim = c(2.5, 3),
    ylim = c(-16.75, -16.25)
  ) +
  theme_bw(base_size = 12) +
  theme(axis.title = element_blank())

sf_use_s2(TRUE)

6 Sampling designs (large scale components)

Before going any further, we will combine the HCC, SC and MA data together. Since each of these are projected onto the same grid, we can simply bind the columns together.

data_reefs_pts_sf <-
  data_reefs_pts_hcc_sf |>
  rename(HCC = Value) |>
  bind_cols(data_reefs_pts_sc_sf |>
              dplyr::select(SC = Value) |> st_drop_geometry()) |>
  bind_cols(data_reefs_pts_ma_sf |>
              dplyr::select(MA = Value) |> st_drop_geometry())

Rarely (if ever) can we actually saturate a spatial (and temporal) domain with sampling locations. More typically, monitoring sampling designs comprise either:

a limited number of fixed location that are repeatidly visited annually (or perhaps biennially). Such a design is thought to provide more power for exploring the temporal trends, yet the absolute values in the trend are biased towards the actual sampling locations.
an approximately set number of random locations are visited annually (or perhaps biennially). Such a design is throught to provide more accurace (less biased and thus more representative) estimates of the absolute value of the response (or at least not have a consistent bias), albeit with (potentially) less power for exploring temporal trends.

On top of this, as already described above, the full spatial domain does not consist only of reef. Indeed, much of the spatial domain is open water. Hence, sampling locations must be selected from within the reef areas.

So far we have created a large number of standard features objects and data frames. Going forward, we only need a single one: data_reefs_points_sf. This is a standard features object containing the Latitude and Longitude of the pool of potential sampling locations along with the underlying hard coral, softcoral and macroalgae cover associated with each year (for each location).

Lets start by assuming that we want to sample from two sites within each of 25 locations (reefs) annually.

nLocs <- 25 # Number of 'reefs'
nSites <- 2 # Number of 'sites' within 'reefs'

In selecting our annual sampling locations, we don’t want to have multiple locations from the one reef (we will define subsample ‘sites’ within Reefs later). Instead, we want to select 25 Reefs.

Fixed locations
Random locations

set.seed(seed)
## Start by randomly selecting nLocs Reefs
Reefs_fixed <- data_reefs_pts_sf |>
  st_drop_geometry() |>
  dplyr::select(Reef) |>
  distinct() |>
  sample_n(size = nLocs) |>
  pull(Reef)
## Then filter to these Reefs before selecting a single location within
## each of the Reefs
data_fixed_locs_sf <- data_reefs_pts_sf |>
  filter(Reef %in% Reefs_fixed) |>
  dplyr::select(Reef, geometry) |>
  distinct(.keep_all = TRUE) |> 
  group_by(Reef) |>
  sample_n(nSites) |>
  mutate(Site = paste0("S", 1:n())) |>
  ungroup() |> 
  st_join(data_reefs_pts_sf |> 
            dplyr::select(-Reef))

g <-
  data_fixed_locs_sf |>
  pivot_longer(cols = c(HCC, SC, MA),
    names_to = "Group",
    values_to = "Value") |>
  group_by(Group) |>
  nest() |>  
  mutate(G = purrr::map2(.x=data, .y=Group,
    .f=function(.x, .y) {
      .x |> st_as_sf() |>
        mutate(Value = plogis(Value)) |>
        ggplot() +
        geom_sf(data = reefs_poly_sf, color = "lightgray") +
        geom_sf(data = spatial_domain, fill = NA) +
        geom_sf(aes(color = Value, size = Value)) +
        scale_color_gradientn(colors = terrain.colors(10)) + 
        facet_wrap(~Year, nrow = 2) +
        coord_sf(crs = 4236) +
        theme_bw(base_size = 12) +
        theme(axis.title = element_blank()) +
        ggtitle(.y)
    }))

g$G |>
  patchwork::wrap_plots(nrow = 3)

data_reefs_sum <- data_reefs_pts_sf |>
  st_drop_geometry() |>
  group_by(Year) |>
  dplyr::summarise(across(c(HCC, SC, MA), mean)) |>
  pivot_longer(
    cols = c(HCC, SC, MA),
    names_to = "Group",
    values_to = "Reef_mean"
  )
data_fixed_locs_sum <- data_fixed_locs_sf |>
  st_drop_geometry() |>
  pivot_longer(
    cols = c(HCC, SC, MA),
    names_to = "Group",
    values_to = "Value"
  ) |>
  group_by(Year, Group) |>
  dplyr::summarise(mean_cl_boot(Value))

g <-
  data_reefs_sum |>
  full_join(data_fixed_locs_sum) |>
  group_by(Group) |>
  nest() |>
  mutate(G = purrr::map2(
    .x = data, .y = Group,
    .f = function(.x, .y) {
      .x |>
        mutate(across(c(y, ymin, ymax, Reef_mean), plogis)) |>
        ggplot() +
        geom_line(aes(y = Reef_mean, x = Year, color = "Reef level")) +
        geom_point(aes(y = Reef_mean, x = Year, color = "Reef level")) +
        geom_line(aes(y = y, x = Year, color = "Site level")) +
        geom_ribbon(aes(
          y = y, x = Year, ymin = ymin, ymax = ymax,
          fill = "Site level"
        ), alpha = 0.2) +
        scale_y_continuous("Cover", labels = function(x) x * 100) +
        scale_color_manual("",
          breaks = c("Reef level", "Site level"),
          values = c("red", "blue"), limits = c("Reef level", "Site level")
        ) +
        scale_fill_manual("", breaks = c("Reef level", "Site level"), values = c("red", "blue"), limits = c("Reef level", "Site level")) +
        theme_bw() +
        ggtitle(.y)
    }
  ))

g$G |> patchwork::wrap_plots(nrow = 3)

set.seed(seed)
## Start by randomly selecting nLocs different Reefs per year
Reefs_random <- data_reefs_pts_sf |>
  st_drop_geometry() |>
  dplyr::select(Reef, Year) |>
  distinct() |>
  group_by(Year) |>
  sample_n(nLocs) |>
  ungroup()
## Then select a single location within each of the reefs per year
data_random_locs_sf <- data_reefs_pts_sf |>
  right_join(Reefs_random) |>
  group_by(Reef, Year) |>
  sample_n(nSites) |>
  mutate(Site = paste0("S", 1:n())) |>
  arrange(Year, Reef) |>
  ungroup()

g <-
  data_random_locs_sf |>
  pivot_longer(
    cols = c(HCC, SC, MA),
    names_to = "Group",
    values_to = "Value"
  ) |>
  group_by(Group) |>
  nest() |>
  mutate(G = purrr::map2(.x = data, .y = Group,
    .f = function(.x, .y) {
      .x |>
        st_as_sf() |>
        mutate(Value = plogis(Value)) |>
        ggplot() +
        geom_sf(data = reefs_poly_sf, color = "lightgray") +
        geom_sf(data = spatial_domain, fill = NA) +
        geom_sf(aes(color = Value, size = Value)) +
        scale_color_gradientn(colors = terrain.colors(10)) +
        facet_wrap(~Year, nrow = 2) +
        coord_sf(crs = 4236) +
        theme_bw(base_size = 12) +
        theme(axis.title = element_blank()) +
        ggtitle(.y)
    }))

g$G |>
  patchwork::wrap_plots(nrow = 3)

data_reefs_sum <- data_reefs_pts_sf |>
  st_drop_geometry() |>
  group_by(Year) |>
  dplyr::summarise(across(c(HCC, SC, MA), mean)) |>
  pivot_longer(
    cols = c(HCC, SC, MA),
    names_to = "Group",
    values_to = "Reef_mean"
  )

data_random_locs_sum <- data_random_locs_sf |>
  st_drop_geometry() |>
  pivot_longer(cols = c(HCC, SC, MA),
    names_to = "Group",
    values_to = "Value") |>
  group_by(Year, Group) |>
  dplyr::summarise(mean_cl_boot(Value))

g <-
  data_reefs_sum |>
  full_join(data_random_locs_sum) |>
  group_by(Group) |>
  nest() |>
  mutate(G = purrr::map2(
    .x = data, .y = Group,
    .f = function(.x, .y) {
      .x |>
        mutate(across(c(y, ymin, ymax, Reef_mean), plogis)) |>
        ggplot() +
        geom_line(aes(y = Reef_mean, x = Year, color = "Reef level")) +
        geom_point(aes(y = Reef_mean, x = Year, color = "Reef level")) +
        geom_line(aes(y = y, x = Year, color = "Site level")) +
        geom_ribbon(aes(
          y = y, x = Year, ymin = ymin, ymax = ymax,
          fill = "Site level"
        ), alpha = 0.2) +
        scale_y_continuous("Cover", labels = function(x) x * 100) +
        scale_color_manual("",
          breaks = c("Reef level", "Site level"),
          values = c("red", "blue"), limits = c("Reef level", "Site level")
        ) +
        scale_fill_manual("",
          breaks = c("Reef level", "Site level"),
          values = c("red", "blue"), limits = c("Reef level", "Site level")
        ) +
        theme_bw() +
        ggtitle(.y)
    }
  ))
g$G |> patchwork::wrap_plots(nrow = 3)

Conclusions:

the fixed design yields trends in site means that are more consistent with the trends in the “true” reef level means
the accuracy of the fixed design remains relatively constant each year
bootstrap confidence intervals calculated from the fixed site level data typically contain the “true” mean values of the reef level cover
the random design yields estimates of cover that vary in their accuracy each year
indeed, the estimates of cover from the random design were rarely more accurate than those of the fixed design
the estimates from site level data in this section were all based on simple means and bootstrap intervals. It is expected that if the estimates were generated from more sophisticed statistical models, the uncertainty around the fixed design estimates would be narrower than those around the random design estimates.
hence arguably the fixed design is more suitable for depicting the temporal trends

7 Finer sampling design components

Now that we have the location (e.g Reef) cover estimates for each of hard coral, soft coral and macroalgae, we need to spread this out over the fine scale sampling design (e.g sites, transects, photos etc). That is, we want the structure of the data to resemble the structure of real collected within the reefCloud platform.

For each of the fixed and random reef/site location designs, we will define a hierarchical structure in which there are a set number (5) of fixed transects within each site, there are a set number photos (100) within each transect and a set number of points (5) within each photo.

The hierarchical structures will be illustrated via a diagram at the top of the respective sections (tabs).

In order to decompose a Reef mean into (for example) two Sites, for each Site, I will draw two random numbers from a Gaussian distribution with mean of 0 and variance equal to a set parameter (which differs between HCC, SC and MA). These numbers will be added to the Reef value so as to yield to new values (one for each Site). Similarly, to decompose these Site values into Transects values, random numbers will be drawn from a zero mean Gaussian. These calculation are performed on the logit scale before the values are transformed onto a 0-100 (percentage) scale.

To decompose into multiple depths, random numbers will be again drawn from a zero-centered Gaussian. These random draws will themselves be normalized and ordered (such that there is a relationship between depth and cover) and added to the 0-100 scaled values.

Finally, cover values will be converted into total integer counts per transect before being partitioned into Frames (photos) and Points.

Number_of_transects_per_site <- 5
Depths <- 2
Number_of_frames_per_transect <- 100
Points_per_frame <- 5


## Note, the following are on the link scale
hcc_site_sigma <- 0.5        # variability in Sites within Locations
hcc_transect_sigma <- 0.2    # variability in Transects within Sites
hcc_sigma <- 0.1             # random noise

sc_site_sigma <- 0.05        # variability in Sites within Locations
sc_transect_sigma <- 0.02    # variability in Transects within Sites
sc_sigma <- 0.01             # random noise

ma_site_sigma <- 0.5        # variability in Sites within Locations
ma_transect_sigma <- 0.2    # variability in Transects within Sites
ma_sigma <- 0.1             # random noise

Fixed locations
Random locations

The design comprises of:

a single region (_synthetic)
25 Reefs (of which _Reef118 is the first)
2 Sites (_S1 and ~S2) within each Reef
5 Transects (_T1 – _T5) within each Site
each Transect is sampled annually for 12 Years (₂₀₁₀ – ₂₀₂₁). Colours indicate different years,
in each Year, 100 photos are collected along each Transect. Although photos represent a spatial scale under Transect, unlike Transect (which is anchored at a fixed location that can be visited annually), it is not possible to guarantee that Photo 1 is in exactly the same location each year - hence Photos are treated as random within the transects.
5 Points per Photo (not depicted in diagram).

set.seed(seed)
data_fixed_locs_obs <- data_fixed_locs_sf |>
  bind_cols(data_fixed_locs_sf |>
              st_coordinates() |>
              as.data.frame() |>
              dplyr::rename(Longitude = X, Latitude = Y)) |>
  st_drop_geometry() |>
  as.data.frame() |>
  group_by(Longitude, Latitude, Reef) |>
  crossing(
    Transect = paste0("T",1:Number_of_transects_per_site)) |>
  group_by(Site, .add = TRUE) |>
  mutate(
    SiteEffects_HCC = rnorm(1, 0, hcc_site_sigma),
    SiteEffects_SC = rnorm(1, 0, sc_site_sigma),
    SiteEffects_MA = rnorm(1, 0, ma_site_sigma)
  ) |>
  group_by(Transect, .add = TRUE) |>
  mutate(
    TransectEffects_HCC = rnorm(1, 0, hcc_transect_sigma),
    TransectEffects_SC = rnorm(1, 0, sc_transect_sigma),
    TransectEffects_MA = rnorm(1, 0, ma_transect_sigma)
  ) |>
  ungroup() |>
  mutate(
    HCC1 = HCC + SiteEffects_HCC +
      TransectEffects_HCC +
      rnorm(n(), 0, hcc_sigma),
    HCC2 = 100*plogis(HCC1),
    SC1 = SC + SiteEffects_SC + TransectEffects_SC +
      rnorm(n(), 0, sc_sigma),
    SC2 = 100*plogis(SC1),
    MA1 = MA + SiteEffects_MA + TransectEffects_MA
    + rnorm(n(), 0, ma_sigma),
    MA2 = 100*plogis(MA1)
  ) |>
  arrange(Reef, Site, Transect, Year) |>
  dplyr::select(Reef, Longitude, Latitude, Site,
    Transect, Year, HCC = HCC2, SC = SC2, MA = MA2) |>
  mutate(Year = 2021 - max(years) + Year,
    Date = as.POSIXct(paste0(Year, "-01-01 14:00:00")))

## The following are on a fold scale.
## Hence a value of 0.8, indicates that 
Depth_effect_multiplier <- 2

data_fixed_locs_obs <-
  data_fixed_locs_obs |>
  tidyr::crossing(Depth = seq(3, 10, length = Depths)) |>
  pivot_longer(cols = c(HCC, SC, MA),
    names_to = "Group",
    values_to = "Value") |>
  group_by(Reef, Site, Transect, Year, Date) |>
  mutate(Value = Value + rev(sort(Depth_effect_multiplier *
                                    scale(rnorm(Depths))))) |>
  ungroup()
data_fixed_locs_obs |> head()

# A tibble: 6 × 10
  Reef   Longitude Latitude Site  Transect  Year Date                Depth Group
  <chr>      <dbl>    <dbl> <chr> <chr>    <dbl> <dttm>              <dbl> <chr>
1 Reef1…      3.96    -20.3 S1    T1        2010 2010-01-01 14:00:00     3 HCC  
2 Reef1…      3.96    -20.3 S1    T1        2010 2010-01-01 14:00:00     3 SC   
3 Reef1…      3.96    -20.3 S1    T1        2010 2010-01-01 14:00:00     3 MA   
4 Reef1…      3.96    -20.3 S1    T1        2010 2010-01-01 14:00:00    10 HCC  
5 Reef1…      3.96    -20.3 S1    T1        2010 2010-01-01 14:00:00    10 SC   
6 Reef1…      3.96    -20.3 S1    T1        2010 2010-01-01 14:00:00    10 MA   
# ℹ 1 more variable: Value <dbl>

save(data_fixed_locs_obs, file = "../data/data_fixed_locs_obs.RData")

## Need to split the percentage cover into point and frames
load(file = "../data/data_fixed_locs_obs.RData")
data_fixed_locs_obs <- data_fixed_locs_obs |>
  group_by(Reef,Site,Transect,Year,Depth,Date) |>
  mutate(Points = round(Number_of_frames_per_transect *
                          Points_per_frame *
                          (Value/sum(Value)),0),
    Points = ifelse(Points<0, 0, Points)) |>
  tidyr::uncount(Points) |>
  sample_n(n(), replace=FALSE) |>
  mutate(POINT_NO = rep_len(1:Points_per_frame, length = n()),
    ## FRAME = 1 + cumsum(POINT_NO) %/% (sum(1:Points_per_frame) + 1e-10)) |>
    FRAME = rep(1:Number_of_frames_per_transect, each=Points_per_frame, length = n())) |>
  ungroup() 

## a |> group_by(Reef, Site, Transect, Year, Depth, Group) |>
##     summarise(Count = n()) |>
##     ungroup(Group) |>
##     mutate(Total=sum(Count),
##            Cover = Count/Total)

reef_data_synthetic_fixed <-
  data_fixed_locs_obs |>
  mutate(
    project_id = 1,
    project_name = "synthetic_fixed",
    SITE_NO = str_replace(Site, "^S", "Site "),
    TRANSECT_NO = str_replace(Transect, "^T", "Transect "),
    site_name = factor(paste(Reef, SITE_NO)),
    site_id = as.numeric(site_name),
    site_latitude = Latitude,
    site_longitude = Longitude,
    site_depth = Depth,
    site_country = "synthetic Country",
    site_reef_name = factor(Reef),
    site_reef_type = NA,
    site_reef_zone = NA,
    site_code = NA,
    site_management = NA,
    survey_title = factor(paste(Reef, SITE_NO, TRANSECT_NO, format(Date, "%Y-%m-%d"))),
    survey_id = as.numeric(survey_title),
    survey_start_date = Date,
    survey_depth = Depth,
    survey_transect_number = as.numeric(str_replace(TRANSECT_NO, "Transect ", "")),
    image_name = factor(paste(survey_title, FRAME)),
    image_id = as.numeric(image_name),
    image_quality = 100,
    point_no = POINT_NO,
    point_id = as.numeric(factor(paste(image_name, POINT_NO))),
    point_machine_classification = Group
  ) |>
  dplyr::select(
    project_id,
    project_name,
    site_id,
    site_name,
    site_latitude,
    site_longitude,
    site_depth,
    site_country,
    site_reef_name,
    site_reef_type,
    site_reef_zone,
    site_code,
    site_management,
    survey_id,
    survey_title,
    survey_start_date,
    survey_depth,
    survey_transect_number,
    image_id,
    image_name,
    image_quality,
    point_id,
    point_no,
    point_machine_classification
  )
  ##   PCODE = "SYNTHETIC-fixed",
  ##   ID = 1:n(),
  ##   CRUISE_CODE = paste0("SYNTHETIC",Year),
  ##   REEF_NAME = Reef,
  ##   AIMS_REEF_NAME = Reef,
  ##   SECTOR = "synthetic",
  ##   LATITUDE = Latitude,
  ##   LONGITUDE = Longitude,
  ##   SITE_NO = Site,
  ##   TRANSECT_NO = Transect,
  ##   SITE_DEPTH = Depth,
  ##   REEF_ZONE = "-",
  ##   REPORT_YEAR = Year,
  ##   SURVEY_DATE = Date,
  ##   FRAME = paste0(PCODE, "/", REEF_NAME, "/",
  ##     REEF_ZONE, "/", SITE_NO, "/", SITE_DEPTH,
  ##     "/", TRANSECT_NO, "/", REPORT_YEAR, "/", FRAME),
  ##   POINT_NO = POINT_NO,
  ##   FAMILY = NA,
  ##   GROUP_DESC = Group,
  ##   REEFPAGE_CATEGORY = paste0(Group,"_alt")
  ## ) |>
  ## dplyr::select(PCODE, ID, CRUISE_CODE, REEF_NAME,
  ##   AIMS_REEF_NAME, SECTOR,
  ##   LATITUDE, LONGITUDE, SITE_NO, TRANSECT_NO, SITE_DEPTH,
  ##   REEF_ZONE, REPORT_YEAR, SURVEY_DATE, FRAME, POINT_NO,
  ##   FAMILY, GROUP_DESC, REEFPAGE_CATEGORY)

write_csv(reef_data_synthetic_fixed,
  file = "../data/reef_data_synthetic_fixed.csv"
)
rmarkdown::paged_table(reef_data_synthetic_fixed |> head())

Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_point()`).

The design comprises of:

a single region (_synthetic)
25 Reefs (of which _Reef118 is the first) in each of 12 Years (₂₀₁₀ – ₂₀₂₁). Colours indicate different years. Note the selection of Reefs differs each Year.
2 Sites (_S1 and ~S2) within each Reef
5 Transects (_T1 – _T5) within each Site
100 photos are collected along each Transect. Although photos represent a spatial scale under Transect, it is not possible to gaurantee that Photo 1 is in exactly the same location each year - hence Photos are treated as random within the transects.
5 Points per Photo (not depicted in diagram).

set.seed(seed)
 
data_random_locs_obs <- data_random_locs_sf |>
  bind_cols(data_random_locs_sf |>
              st_coordinates() |>
              as.data.frame() |>
              dplyr::rename(Longitude = X, Latitude = Y)) |>
  st_drop_geometry() |>
  as.data.frame() |>
  group_by(Longitude, Latitude, Reef) |>
  crossing(Transect = paste0("T",
    1:Number_of_transects_per_site)) |>
  group_by(Site, .add = TRUE) |>
  mutate(
    SiteEffects_HCC = rnorm(1, 0, hcc_site_sigma),
    SiteEffects_SC = rnorm(1, 0, sc_site_sigma),
    SiteEffects_MA = rnorm(1, 0, ma_site_sigma)
  ) |>
  group_by(Transect, .add = TRUE) |>
  mutate(
    TransectEffects_HCC = rnorm(1, 0, hcc_transect_sigma),
    TransectEffects_SC = rnorm(1, 0, sc_transect_sigma),
    TransectEffects_MA = rnorm(1, 0, ma_transect_sigma)
  ) |>
  ungroup() |>
  mutate(
    HCC1 = HCC + SiteEffects_HCC + TransectEffects_HCC +
      rnorm(n(), 0, hcc_sigma),
    HCC2 = 100*plogis(HCC1),
    SC1 = SC + SiteEffects_SC + TransectEffects_SC +
      rnorm(n(), 0, sc_sigma),
    SC2 = 100*plogis(SC1),
    MA1 = MA + SiteEffects_MA + TransectEffects_MA +
      rnorm(n(), 0, ma_sigma),
    MA2 = 100*plogis(MA1)
  ) |>
  arrange(Reef, Site, Transect, Year) |>
  dplyr::select(Reef, Longitude, Latitude, Site, Transect,
    Year, HCC = HCC2, SC = SC2, MA = MA2) |>
  mutate(Year = 2021 - max(years) + Year,
    Date = as.POSIXct(paste0(Year, "-01-01 14:00:00")))

## The following are on a fold scale.
## Hence a value of 0.8, indicates that 
Depth_effect_multiplier <- 2

data_random_locs_obs <-
  data_random_locs_obs |>
  tidyr::crossing(Depth = seq(3, 10, length = Depths)) |>
  pivot_longer(
    cols = c(HCC, SC, MA),
    names_to = "Group",
    values_to = "Value"
  ) |>
  group_by(Reef, Site, Transect, Year, Date) |>
  mutate(Value = Value +
    rev(sort(Depth_effect_multiplier *
      scale(rnorm(Depths))))) |>
  ungroup()
save(data_random_locs_obs, file = "../data/data_random_locs_obs.RData")

## Need to split the percentage cover into point and frames
load(file = "../data/data_random_locs_obs.RData")
data_random_locs_obs <- data_random_locs_obs |>
  group_by(Reef, Site, Transect, Year, Depth, Date) |>
  mutate(
    Points = round(Number_of_frames_per_transect *
      Points_per_frame * (Value / sum(Value)), 0),
    Points = ifelse(Points < 0, 0, Points)
  ) |>
  tidyr::uncount(Points) |>
  sample_n(n(), replace = FALSE) |>
  mutate(
    POINT_NO = rep_len(1:Points_per_frame, length = n()),
    ## FRAME = 1 + cumsum(POINT_NO) %/% (sum(1:Points_per_frame) + 1e-10)) |>
    FRAME = rep(1:Number_of_frames_per_transect,
      each = Points_per_frame, length = n()
    )
  ) |>
  ungroup()

reef_data_synthetic_random <-
  data_random_locs_obs |>
  mutate(
    project_id = 1,
    project_name = "synthetic_fixed",
    SITE_NO = str_replace(Site, "^S", "Site "),
    TRANSECT_NO = str_replace(Transect, "^T", "Transect "),
    site_name = factor(paste(Reef, SITE_NO)),
    site_id = as.numeric(site_name),
    site_latitude = Latitude,
    site_longitude = Longitude,
    site_depth = Depth,
    site_country = "synthetic Country",
    site_reef_name = factor(Reef),
    site_reef_type = NA,
    site_reef_zone = NA,
    site_code = NA,
    site_management = NA,
    survey_title = factor(paste(Reef, SITE_NO, TRANSECT_NO, format(Date, "%Y-%m-%d"))),
    survey_id = as.numeric(survey_title),
    survey_start_date = Date,
    survey_depth = Depth,
    survey_transect_number = as.numeric(str_replace(TRANSECT_NO, "Transect ", "")),
    image_name = factor(paste(survey_title, FRAME)),
    image_id = as.numeric(image_name),
    image_quality = 100,
    point_no = POINT_NO,
    point_id = as.numeric(factor(paste(image_name, POINT_NO))),
    point_machine_classification = Group
  ) |>
  dplyr::select(
    project_id,
    project_name,
    site_id,
    site_name,
    site_latitude,
    site_longitude,
    site_depth,
    site_country,
    site_reef_name,
    site_reef_type,
    site_reef_zone,
    site_code,
    site_management,
    survey_id,
    survey_title,
    survey_start_date,
    survey_depth,
    survey_transect_number,
    image_id,
    image_name,
    image_quality,
    point_id,
    point_no,
    point_machine_classification
  )
  ##   PCODE = "SYNTHETIC-random",
  ##   ID = 1:n(),
  ##   CRUISE_CODE = paste0("SYNTHETIC", Year),
  ##   REEF_NAME = Reef,
  ##   AIMS_REEF_NAME = Reef,
  ##   SECTOR = "synthetic",
  ##   LATITUDE = Latitude,
  ##   LONGITUDE = Longitude,
  ##   SITE_NO = Site,
  ##   TRANSECT_NO = Transect,
  ##   SITE_DEPTH = Depth,
  ##   REEF_ZONE = "-",
  ##   REPORT_YEAR = Year,
  ##   SURVEY_DATE = Date,
  ##   FRAME = paste0(PCODE, "/", REEF_NAME, "/", REEF_ZONE,
  ##     "/", SITE_NO, "/", SITE_DEPTH, "/", TRANSECT_NO,
  ##     "/", REPORT_YEAR, "/", FRAME),
  ##   POINT_NO = POINT_NO,
  ##   FAMILY = NA,
  ##   GROUP_DESC = Group,
  ##   REEFPAGE_CATEGORY = paste0(Group, "_alt")
  ## ) |>
  ## dplyr::select(
  ##   PCODE, ID, CRUISE_CODE, REEF_NAME, AIMS_REEF_NAME, SECTOR,
  ##   LATITUDE, LONGITUDE, SITE_NO, TRANSECT_NO, SITE_DEPTH,
  ##   REEF_ZONE, REPORT_YEAR, SURVEY_DATE, FRAME, POINT_NO,
  ##   FAMILY, GROUP_DESC, REEFPAGE_CATEGORY
  ## )

write_csv(reef_data_synthetic_random,
  file = "../data/reef_data_synthetic_random.csv"
  )
rmarkdown::paged_table(reef_data_synthetic_random |> head())

Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_point()`).

The end