Matrix occupancy model in nimble (R) - missing visits and years

Thanks for the help! I've got a latent-variable occupancy model with p (detection) and Psi (occupancy) covariates that I set up using matrices rather than vectors for convenience. It is based on the JAGS model from Tobler et al. (2019), but I have converted it to nimble and added detection covariates.
Most sites have 3 visits per year in each of three years, so I have a response variable array of:
y[1:n_sites, 1:n_species, 1:3 (years), 1:3 (surveys per year)]
I have a similar array of detection covariates:
Xobs[1:n_sites, 1:n_p.covariates, 1:3 (years), 1:3 (surveys per year)]
And occupancy covariates (same for each visit in a given year):
Xocc[1:n_sites, 1:n_Psi.covariates, 1:3 (years)]
The model runs fine when data are included for all sites, years, and surveys. However, some sites (e.g. 5%) were only surveyed twice in a given year, and a few (3%) were only surveyed in one or two (rather than all three) years. This results in 'ragged arrays', where some quantities (e.g. z, the estimated true presence, for a site that wasn't surveyed in a given year) don't need to be estimated.
When I try to run the model with the missing surveys filled with either a) zeros or b) NAs, I get several problems:
some parameters fail to initialize (this isn't necessarily a problem though), and more importantly...
some chains for z (estimated true presence) for a given site and year get stuck on 1 (this seems to occur for site_year combinations that have fewer than 3 surveys).
So my main question is: how should I be dealing with those missing values? The solution I tried (in the code below) is to index year and survey numbers for each site. For years I do this with a years-per-site vector (yrs), and for surveys per year with a matrix k[1:n_sites, 1:n_years] that gives how many visits there were (the toy sketch below illustrates the structure).
[side note: this model doesn't yet deal with the non-independence of sites between years; I will deal with that once I get the basic model working but am open to suggestions]
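For concreteness, here is a toy sketch of those indexing objects, with invented values for four hypothetical sites:
# hypothetical ragged-index constants for 4 sites and up to 3 years
yrs <- c(3, 3, 2, 1)          # number of years each site was surveyed
k <- matrix(c(3, 3, 2, 3,     # column 1: surveys in year 1
              3, 2, 2, NA,    # column 2: surveys in year 2 (NA = not surveyed)
              2, 3, NA, NA),  # column 3: surveys in year 3
            nrow = 4, ncol = 3)
# the model loops run yr in 1:yrs[j] and kv in 1:k[j, yr],
# so the NA cells should never be indexed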
The code below makes a reproducible example (albeit perhaps not in the most efficient way). Run as-is, it uses 'full' input data (no missing years or surveys) and runs fine (as far as I can tell). If you then comment out these lines....
##############################################################
#now to make sure model runs properly with all arrays full (no missing sites):
##############################################################
k[]=3
yrsV[]=3
...and un-comment the lines that follow them (setting options for what to use to replace missing X and Y values: NA vs. zero), then you will have a dataset that resembles our true data (and gives problems in the MCMC output, although it runs).
Thank you!
library(nimble)
library(mcmcplots)
library(abind)
###########################
#create detection data
###########################
#n species
n.sp=10
#mean p per species
p.sp=(1:10)/12;p.sp
#mean Psi per species
psi.intercept=0.4 #all species
#n sites
n.sites=100
#n years each site surveyed
yrs.site=sample(c(3,3,3,2,2,1),n.sites,replace = TRUE);yrs.site
#n surveys in each year at each site
surv.yr.site=data.frame(matrix(0,n.sites,max(yrs.site)))
for (i in 1:max(yrs.site)) {
  names(surv.yr.site)[i] = paste0("yr_", i)
  surv.yr.site[,i] = sample(c(3,3,3,2,2,1), n.sites, replace = TRUE)
  for (j in 1:n.sites) {
    if (yrs.site[j] < i) {
      surv.yr.site[j,i] = NA
    }
  }
}; surv.yr.site
#make true presence/abs by year
z1 = z2 = z3 = matrix(0, n.sites, n.sp)
for (i in 1:n.sp) {
  z1[,i] = rbinom(n = n.sites, size = 1, prob = psi.intercept)
  z2[,i] = rbinom(n = n.sites, size = 1, prob = psi.intercept)
  z3[,i] = rbinom(n = n.sites, size = 1, prob = psi.intercept)
}; head(z1)
#make detections by year
y1 = y2 = y3 = array(0, c(n.sites, n.sp, max(as.matrix(surv.yr.site), na.rm = TRUE))); dim(y1)
for (i in 1:max(as.matrix(surv.yr.site), na.rm = TRUE)) {
  for (j in 1:n.sp) {
    y1[,j,i] = z1[,j] * rbinom(n.sites, 1, p.sp[j])
    y2[,j,i] = z2[,j] * rbinom(n.sites, 1, p.sp[j])
    y3[,j,i] = z3[,j] * rbinom(n.sites, 1, p.sp[j])
  }
}; head(y1)
#merge into y array y[sites,species,years,surveys]
y = array(0, c(n.sites, n.sp, max(yrs.site),
               max(as.matrix(surv.yr.site), na.rm = TRUE))); dim(y)
for (i in 1:dim(y)[3]) {
  for (j in 1:dim(y)[4]) {
    y[,,i,j] = get(paste0("y", i))[,,j]
  }
}; head(y)
#make four p covariates for each visit (with no simulated effects)
Xobs=array(0,c(n.sites,4,max(yrs.site),max(as.matrix(surv.yr.site),
na.rm = TRUE)));dim(Xobs)
Xobs[]=rnorm(length(Xobs))
head(Xobs)
#make four Psi covariates for each year (with no simulated effect)
Xocc=array(0,c(n.sites,4,max(yrs.site)));dim(Xocc)
Xocc[]=rnorm(length(Xocc))
head(Xocc)
#generate constants
#n species
n=n.sp;n
#J sites
J=n.sites;J
#nYr years
yrsV=yrs.site;yrsV
#surveys per year per site
k=surv.yr.site
#Vocc psi covars
Vocc=ncol(Xocc);Vocc
#Vobs p covars
Vobs=ncol(Xobs);Vobs
#set n latent variables (1/4 to 1/2 n)
nlv=5
#
#############################
#nimble model
#############################
code <- nimbleCode({
  #model modified from Tobler et al. 2019 by Ryan C. Burner ryan.c.burner@gmail.com
  #original paper https://doi.org/10.1002/ecy.2754
  #modified to include detection covariates and derived parameters
  # community hyper-priors for occupancy covariates
  for(m in 1:Vocc) { #n occ covars
    mu.u.b[m] ~ T(dnorm(0, 0.01), -10, 10)
    tau.u.b[m] <- pow(sigma.u.b[m], -2)
    sigma.u.b[m] ~ dunif(0, 10)
  }
  # community hyper-priors for detection covariates
  for(m in 1:Vobs) { #n detect covars
    mu.v.b[m] ~ T(dnorm(0, 0.01), -10, 10)
    tau.v.b[m] <- pow(sigma.v.b[m], -2)
    sigma.v.b[m] ~ dunif(0, 10)
  }
  # Latent variables
  for(j in 1:J) { #n sites
    for(l in 1:nlv) { #n latent variables
      LV[j,l] ~ dnorm(0, 1)
    }
  }
  # Latent variable coefficients with constraints:
  # diagonal elements positive, upper diagonal 0
  for(l in 1:(nlv-1)) {
    for(l2 in (l+1):nlv) {
      lv.coef[l,l2] <- 0
    }
  }
  # Sign constraints on diagonal elements
  for(l in 1:nlv) {
    lv.coef[l,l] ~ dunif(0, 1)
  }
  # lower diagonal free
  for(l in 2:nlv) {
    for(l2 in 1:(l-1)) {
      lv.coef[l,l2] ~ dunif(-1, 1)
    }
  }
  # rows for the species beyond the first nlv are free
  for(i in (nlv+1):n) {
    for(l in 1:nlv) {
      lv.coef[i,l] ~ dunif(-1, 1)
    }
  }
  # loop over all SPECIES
  for(i in 1:n) { #n species
    # random [beta] effect priors by species for occupancy
    for(m in 1:Vocc) { #n occ covars
      u.b[i,m] ~ T(dnorm(mu.u.b[m], tau.u.b[m]), -10, 10)
    } #prior, i species by m occ covars
    # random [beta] effect priors by species for detection
    for(m in 1:Vobs) { #n obs covars
      v.b[i,m] ~ T(dnorm(mu.v.b[m], tau.v.b[m]), -10, 10)
    } #prior, i species by m det covars
    # loop over all SITES
    for(j in 1:J) { #n sites
      for(yr in 1:yrs[j]) {
        eta[j,i,yr] <- inprod(u.b[i,1:Vocc], Xocc[j,1:Vocc,yr]) +
          inprod(lv.coef[i,1:nlv], LV[j,1:nlv])
      }
      # draw from normal, constraining total variance to 1: the second
      # argument of dnorm is a precision here, so the residual variance
      # is 1 - sum(lv.coef^2); j sites x i species, for each year
      for(yr in 1:yrs[j]) {
        u[j,i,yr] ~ dnorm(eta[j,i,yr], 1/(1 - sum(lv.coef[i,1:nlv]^2)))
        z[j,i,yr] <- step(u[j,i,yr])
      }
      # loop over all VISITS (SURVEYS) in all YEARS
      for(yr in 1:yrs[j]) {
        for(kv in 1:k[j,yr]) {
          # logistic model for detection
          logit(p[j,i,yr,kv]) <- inprod(v.b[i,1:Vobs], Xobs[j,1:Vobs,yr,kv])
          #estimated p: site x species x year x survey
          mu.p[j,i,yr,kv] <- p[j,i,yr,kv] * z[j,i,yr]
          y[j,i,yr,kv] ~ dbern(mu.p[j,i,yr,kv])
        } # i species, j sites, yr years, kv visits
      }
    }
  }
  ###Derived parameters###
  #proportion of sites (j) occupied in 1+ yrs by each species (i) #could divide by pool/year
  for(i in 1:n) { #n species
    for(j in 1:J) { #J sites
      site.occ.ever.bySp[j,i] <- max(z[j,i,1:yrs[j]]) #indexing trouble here??
    }
    prop.occ.ever.bySp[i] <- sum(site.occ.ever.bySp[1:J,i]) / J # j x i
  }
})
occ.inits = function() { #a function is used so that inits are redrawn for each chain
  #for u.b, use estimates from a glm; NAs are filled with random values
  u.b <- t(sapply(seq_len(ncol(y[,,,1])), function(x) {
    xx = unname(coef(glm(((apply(y, c(1,2), max, na.rm = TRUE) > 0)*1)[, x] ~ Xocc[, -1, 1],
                         family = binomial(link = logit),
                         na.action = 'na.omit')))
    #random values replace errors
    xx[is.na(xx)] <- rnorm(sum(is.na(xx)))
    return(xx)
  }))
  colnames(u.b) <- colnames(Xocc)
  rownames(u.b) <- colnames(y)
  #for really high values, bring in line with priors
  u.b[abs(u.b) > 10] = runif(sum(abs(u.b) > 10), 0, 5) * sign(u.b[abs(u.b) > 10])
  lv.coef = matrix(runif(nlv*n, -0.2, 0.2), n, nlv)
  diag(lv.coef) = abs(diag(lv.coef))
  lv.coef[upper.tri(lv.coef)] = 0
  v.b = matrix(rnorm(Vobs*n), n, Vobs) #nrow and ncol passed separately (was c(n,Vobs))
  colnames(v.b) = colnames(Xobs)
  rownames(v.b) <- colnames(y)
  u = apply(y, c(1,2,3), max, na.rm = TRUE) - runif(J*n*max(yrsV), 0.1, 0.8)
  #replace Inf/-Inf (from all-NA slices) with small negative values (not sure if it matters)
  u[is.infinite(u)] = runif(sum(is.infinite(u)), -0.3, -0.1)
  list(
    mu.u.b = rnorm(Vocc, 0, 0.01),
    sigma.u.b = runif(Vocc, 0, 10),
    mu.v.b = rnorm(Vobs, 0, 0.01),
    sigma.v.b = runif(Vobs, 0, 10),
    u.b = u.b,
    v.b = v.b,
    u = u,
    lv.coef = lv.coef,
    LV = matrix(rnorm(nlv*J), J, nlv)
  )
}
#indicate which parameters to monitor
params <- c('u.b','v.b','mu.u.b','tau.u.b',
            'prop.occ.ever.bySp', #derived parameter defined in the model
            'mu.v.b','tau.v.b','z')
##############################################################
#now to make sure model runs properly with all arrays full (no missing sites):
##############################################################
k[]=3
yrsV[]=3
#####################################################
#alternatively, what happens if we remove the surveys that didn't happen
#(need to rerun all code above, skipping previous two lines)
#####################################################
# #choose values to pad X covars and Y responses with
# # X covars:
# #padX=NA
# padX=0
# # Y responses:
# padY=NA
# #padY=0
#
# #fill in missing surveys with pad value
# for (i in 1:max(yrs.site)) {
# for (j in 1:nrow(y)) {
# if (yrsV[j]<i) {
# y[j,,i,][]=padY
# Xobs[j,,i,][]=padX
# Xocc[j,,i][]=padX
# } else {
# for (l in 1:max(as.matrix(surv.yr.site),na.rm = TRUE)) {
# if (k[j,i]<l) {
# y[j,,i,l][]=padY
# Xobs[j,,i,l][]=padX
# }
# }
# }
# }
# }
#
# k[is.na(k)==TRUE]=padVal
##get the x and y data##
# DATA DESCRIPTION
# n is number of species
# J is number of sites
# y is detections
# k is number of surveys in each of 3 years
# yrsV is number of years surveyed for each point
# nlv is number of latent variables
# Vocc and Vobs are numbers of occ (Psi) and obs (p) covariates
# Xocc and Xobs are matrices of covariates (incl. intercept)
occ.data = list(n=n, J=J, k=k, y=y,
Xocc=Xocc,Xobs=Xobs,Vocc=Vocc,Vobs=Vobs,nlv=nlv,yrs=yrsV)#
names(occ.data)
data <- occ.data[c(4:6)];names(data)
constants=occ.data[c(1:3,7:length(occ.data))];names(constants)
#set the initial values
inits <- occ.inits()
#inspect inits
i=1
head(inits[[i]]);names(inits)[i];i=i+1
######################################
###to run the model in Nimble...
######################################
# 1) create the model object
glmmModel <- nimbleModel(code = code, data = data, constants = constants,
inits = inits, check = getNimbleOption("checkModel"))
glmmModel$initializeInfo()
# 2) compile model
CglmmModel <- compileNimble(glmmModel,showCompilerOutput = TRUE)
# 3) configure mcmc
fglmmMCMC <- configureMCMC(glmmModel)
# 4) add monitors for parameters
fglmmMCMC$addMonitors(params)
# 5) build mcmc
BglmmMCMC<- buildMCMC(fglmmMCMC)
# 6) compile the MCMC
CglmmMCMC <- compileNimble(BglmmMCMC,showCompilerOutput = TRUE)
#setup mcmc
ni <- 800
nb <- 150
nt <- 10
nc <- 2
#run mcmc
s=Sys.time();s
samples <- runMCMC(CglmmMCMC, niter = ni, nburnin = nb, nchains = nc, thin = nt,
setSeed = 3,
summary = FALSE, samplesAsCodaMCMC = TRUE)
ss = Sys.time() - s
print(ss)
beepr::beep(5)
#check for z (occupancy) parameters where chains are stuck on 1:
problem = which(colMeans(samples$chain1[, grep("^z\\[", colnames(samples$chain1))]) == 1)
length(problem)
#view a subset (can run this line repeatedly)
#z parameters are z[site,species,year]
#can run this line repeatedly to see random sample
samples$chain1[1:5, grep("^z\\[", colnames(samples$chain1))[sample(problem, 10)]]
#to check number of years (columns) and surveys (values) in a given year:
k
#to check y (observation) input values for a given site enter number here:
#y is y[sites,species,years,surveys]
#e.g.....
y[26,10,1,]
#check for z (occupancy) parameters where chains are stuck on 0:
problem2 = which(colMeans(samples$chain1[, grep("^z\\[", colnames(samples$chain1))]) == 0)
length(problem2)
#view a subset (can run this line repeatedly)
#z parameters are z[site,species,year]
#can run this line repeatedly to see random sample
samples$chain1[1:5, grep("^z\\[", colnames(samples$chain1))[sample(problem2, 10)]]
#to check number of years (columns) and surveys (values) in a given year:
k
#to check y (observation) input values for a given site enter number here:
#y is y[sites,species,years,surveys]
#e.g.....
y[26,10,1,]
#get summaries of the chains
xx=summary(samples)
head(xx)
#here I get an error that there are NAs (from unused parameters due to ragged array??)
#get plots of chains
mcmcplot(samples)
beepr::beep(5)
#why does this error occur on so many (or all) of the z's
#(and some other parameters with my real data)
#z[1, 1, 1]. Error in bw.SJ(x, method = "ste"): sample is too sparse to find TD
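A note on that bw.SJ error: it is a plotting symptom rather than a model problem. mcmcplot tries to draw a density for every monitored node, and the bandwidth estimator fails for any chain with (near) zero variance, which is exactly what a z stuck at 0 or 1 produces. A hedged workaround, assuming samples is the coda mcmc.list created above, is to drop the constant columns before plotting:
#drop monitored nodes whose chains never move before calling mcmcplot
m1 <- as.matrix(samples$chain1)
keep <- apply(m1, 2, var) > 0 #TRUE for columns that show any variation
samples_sub <- coda::as.mcmc.list(lapply(samples, function(ch) coda::as.mcmc(ch[, keep])))
mcmcplot(samples_sub)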

Related

Extracting individual growth constants using population growth curve model in R

I would like to derive individual growth rates from our growth model directly, similar to this OP and this OP.
I am working with a dataset that contains the age and weight (wt) measurements for ~2000 individuals in a population. Each individual is represented by a unique id number.
A sample of the data can be found here. Here is what the data looks like:
id age wt
1615 6 15
3468 32 61
1615 27 50
1615 60 145
6071 109 209
6071 125 207
10645 56 170
10645 118 200
I have developed a non-linear growth curve to model growth for this dataset (at the population level). It looks like this:
wt~ A*atan(k*age - t0) + m
which predicts weight (wt) for a given age and has modifiable parameters A, k, t0, and m. I have fit this model to the dataset at the population level with an nlme fit, specifying individual id as a random effect and using pdDiag to treat the parameters as uncorrelated. (Note: the random effect would need to be dropped when looking at the individual level.)
The code for this looks like:
nlme.k = nlme(wt~ A*atan(k*age - t0) + m,
data = df,
fixed = A+k+t0+m~1,
random = list(id = pdDiag(A+t0+k+m~1)), #cannot include when looking at the individual level
start = c(A = 99.31,k = 0.02667, t0 = 1.249, m = 103.8), #these values are what we are using at the population level # might need to be changed for individual models
na.action = na.omit,
control = nlmeControl(maxIter = 200, pnlsMaxIter = 10, msMaxIter = 100))
I have our population level growth model (nlme.k), but I would like to use it to derive/extract individual values for each growth constant.
How can I extract individual growth constants for each id using my population level growth model (nlme.k)? Note that I don't need it to be a solution that uses nlme, that is just the model I used for the population growth model.
Any suggestions would be appreciated!
I think this is not possible due to the nature of how random effects are designed. According to this post, the effect size (your growth constant) is estimated using partial pooling. This involves using data points from other groups, so you cannot estimate the effect size of each group (your individual id) in isolation.
Strictly speaking (see here), random effects are not really a part of the model at all, but more a part of the error.
However, you can estimate the R2 for all groups together. If you want parameter estimates at the individual level (e.g. for id 1), then just run the same model on only the data points of that particular individual. This gives you n models with n parameter sets for n individuals.
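A minimal sketch of that per-individual refit (assuming the data frame df with columns id, age, and wt from the question, and reusing the population-level starting values):
#fit the same curve separately to each individual; non-converging ids get NAs
ind_pars <- lapply(split(df, df$id), function(d) {
  fit <- tryCatch(
    nls(wt ~ A*atan(k*age - t0) + m, data = d,
        start = c(A = 99.31, k = 0.02667, t0 = 1.249, m = 103.8)),
    error = function(e) NULL) #some individuals will not converge
  if (is.null(fit)) setNames(rep(NA_real_, 4), c("A", "k", "t0", "m")) else coef(fit)
})
ind_pars <- do.call(rbind, ind_pars) #one row of A, k, t0, m per id
head(ind_pars)
Individuals with few measurements will fail or give unstable estimates; that is the price of dropping the partial pooling described above.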
We ended up using a few loops to do this.
Note that our answer builds off a model posted in this OP if anyone wants the background script. We will also link to the published script when it is posted.
For now, this should give a general idea of how we did this.
#Individual fits dataframe generation
yid_list <- unique(young_inds$squirrel_id)
indf_prs <- list('df', 'squirrel_id', 'A_value', 'k_value', 'mx_value', 'my_value', 'max_grate', 'hit_asymptote', 'age_asymptote', 'ind_asymptote', 'ind_mass_asy', 'converge') #List of parameters
ind_fits <- data.frame(matrix(ncol = length(indf_prs), nrow = length(yid_list))) #Blank dataframe for all individual fits
colnames(ind_fits) <- indf_prs
#Calculates individual fits for all individuals and appends into ind_fits
for (i in 1:length(yid_list)) {
yind_df <-young_inds%>%filter(squirrel_id %in% yid_list[i]) #Extracts a dataframe for each squirrel
ind_fits[i , 'squirrel_id'] <- as.numeric(yid_list[i]) #Appends squirrel i's id into individual fits dataframe
sex_lab <- unique(yind_df$sex) #Identifies and extracts squirrel "i"s sex
mast_lab <- unique(yind_df$b_mast) #Identifies and extracts squirrel "i"s mast value
Hi_dp <- max(yind_df$wt) #Extracts the largest mass for each squirrel
ind_long <- unique(yind_df$longevity) #Extracts the individual death date
#Sets corresponding values for squirrel "i"
if (mast_lab==0 && sex_lab=="F") { #Female no mast
ind_fits[i , 'df'] <- "fnm" #Squirrel dataframe (appends into ind_fits dataframe)
df_asm <- af_asm #average asymptote value corresponding to sex
df_B_guess <- guess_df[1, "B_value"] #Initial guesses for nls fits corresponding to sex and mast
df_k_guess <- guess_df[1, "k_value"]
df_mx_guess <- guess_df[1, "mx_value"]
df_my_guess <- guess_df[1, "my_value"]
ind_asyr <- indf_asy #growth rate at individual asymptote
} else if (mast_lab==0 && sex_lab=="M") { #Male no mast
ind_fits[i , 'df'] <- "mnm"
df_asm <- am_asm
df_B_guess <- guess_df[2, "B_value"]
df_k_guess <- guess_df[2, "k_value"]
df_mx_guess <- guess_df[2, "mx_value"]
df_my_guess <- guess_df[2, "my_value"]
ind_asyr <- indm_asy
} else if (mast_lab==1 && sex_lab=="F") { #Female mast
ind_fits[i , 'df'] <- "fma"
df_asm <- af_asm
df_B_guess <- guess_df[3, "B_value"]
df_k_guess <- guess_df[3, "k_value"]
df_mx_guess <- guess_df[3, "mx_value"]
df_my_guess <- guess_df[3, "my_value"]
ind_asyr <- indm_asy
} else if (mast_lab==1 && sex_lab=="M") { #Males mast
ind_fits[i , 'df'] <- "mma"
df_asm <- am_asm
df_B_guess <- guess_df[4, "B_value"]
df_k_guess <- guess_df[4, "k_value"]
df_mx_guess <- guess_df[4, "mx_value"]
df_my_guess <- guess_df[4, "my_value"]
ind_asyr <- indf_asy
} else { #If sex or mast is not identified or is identified improperly in the data
print("NA")
} #End of if/else block
#Arctangent
#Fits nls model to the created dataframe
nls.floop <- tryCatch({data.frame(tidy(nls(wt~ B*atan(k*(age - mx)) + my, #tryCatch lets nls have alternate results instead of "code stopping" errors
data=yind_df,
start = list(B = df_B_guess, k = df_k_guess, mx = df_mx_guess, my = df_my_guess),
control= list(maxiter = 200000, minFactor = 1/100000000))))
},
error = function(e){
nls.floop <- data.frame(c(0,0), c(0,0)) #Specifies nls.floop as a dummy dataframe if no convergence
},
warning = function(w) {
nls.floop <- data.frame(tidy(nls.floop)) #Fit is the same if warning is displayed
}) #End of nls.floop
#Creates a dummy numerical index from nls.floop for the if/else below
numeric_floop <- as.numeric(nls.floop[1, 2])
#print(numeric_floop) #Taking a look at the values. If numeric_floop...
# == 0, function did not converge on iteration "i"
# != 0, function did converge on iteration "i" and code will run through calculations
if (numeric_floop != 0) {
results_DF <- nls.floop
ind_fits[i , 'converge'] <- 1 #converge = 1 for converging fit
#Extracting, calculating, and appending values into dataframe
B_value <- as.numeric(results_DF[1, "estimate"]) #B value
k_value <- as.numeric(results_DF[2, "estimate"]) #k value
mx_value <- as.numeric(results_DF[3, "estimate"]) #mx value
my_value <- as.numeric(results_DF[4, "estimate"]) #my value
A_value <- ((B_value*pi)/2)+ my_value #A value calculation
ind_fits[i , 'A_value'] <- A_value
ind_fits[i , 'k_value'] <- k_value
ind_fits[i , 'mx_value'] <- mx_value
ind_fits[i , 'my_value'] <- my_value #appends my_value into df
ind_fits[i , 'max_grate'] <- adr(mx_value, B_value, k_value, mx_value, my_value) #Calculates max growth rate
}
} #End of individual fits loop
Which gives this output:
> head(ind_fits%>%select(df, squirrel_id, A_value, k_value, mx_value, my_value))
df squirrel_id A_value k_value mx_value my_value
1 mnm 332 257.2572 0.05209824 52.26842 126.13183
2 mnm 1252 261.0728 0.02810033 42.37454 103.02102
3 mnm 3466 260.4936 0.03946594 62.27705 131.56665
4 fnm 855 437.9569 0.01347379 86.18629 158.27641
5 fnm 2409 228.7047 0.04919819 63.99252 123.63404
6 fnm 1417 196.0578 0.05035963 57.67139 99.65781
Note that you need to create a blank dataframe first before running the loops.

Realistic age structured model using ODE from the deSolve package

I am trying to simulate a realistic age-structured model where all individuals shift into the following age group at the end of each time step (rather than aging continuously at a given rate), using ode from the deSolve package.
Considering for example a model with two states Susceptible (S) and Infectious (I), each state being divided in 4 age groups (S1, S2, S3, S4, and I1, I2, I3, I4), all individuals in S1 should go into S2 at the end of the time step, those in S2 should go into S3, and so on.
I tried to make this in two steps, the first by solving the ODE, the second by shifting individuals into the following age group at the end of the time step, but without success.
Below is one of my attempts:
library(deSolve)
times <- seq(from = 0, to = 100, by = 1)
n_agecat <- 4
#Initial number of individuals in each state
S_0 = c(999,rep(0,n_agecat-1))
I_0 = c(1,rep(0,n_agecat-1))
si_initial_state_values <- c(S = S_0,
I = I_0)
# Parameter values
si_parameters <- c(beta = 0.01) #contact rate assuming random mixing
si_model <- function(time, state, parameters) {
  with(as.list(c(state, parameters)), {
    n_agegroups <- 4
    S <- state[1:n_agegroups]
    I <- state[(n_agegroups + 1):(2*n_agegroups)]
    # Total population
    N <- S + I
    # Force of infection
    lambda <- beta * I/N
    # The differential equations
    dS <- -lambda * S
    dI <- lambda * S
    # Trying to shift all individuals into the following age group
    S <- c(0, S[-n_agecat])
    I <- c(0, I[-n_agecat])
    return(list(c(dS, dI)))
  })
}
output <- as.data.frame(ode(y = si_initial_state_values,
times = times,
func = si_model,
parms = si_parameters))
Any guidance will be much appreciated, thank you in advance!
I had a look at your model. Implementing the shift in an event function works, in principle, but the main model still has several problems:
Die-out: if the age groups are shifted each time step and the first element is just filled with zero, everything is shifted to the end within 4 time steps and the population dies out.
Infection: in your case, the infected can only infect the same age group, so you need to sum over the "age" groups before calculating lambda.
Finally, what is an "age" group here? Do you mean time since infection?
To sum up, there are several options. I would personally prefer a discrete model for such a simulation, i.e. difference equations, an age-structured matrix model, or an individual-based model.
If you want to keep it an ODE, I recommend keeping the susceptibles together as one state and implementing only the infected as stage-structured.
Here a quick example, please check:
library(deSolve)
times <- seq(from = 0, to = 100, by = 1)
n_agegroups <- 14
n_agecat <- 14
# Initial number of individuals in each state
S_0 = c(999) # only one state
I_0 = c(1, rep(0,n_agecat-1)) # several stages
si_initial_state_values <- c(S = S_0,
I = I_0)
# Parameter values
si_parameters <- c(beta = 0.1) # set contact parameter to a higher value
si_model <- function(time, state, parameters) {
  with(as.list(c(state, parameters)), {
    S <- state[1]
    I <- state[2:(n_agegroups + 1)]
    # Total population
    N <- S + sum(I)
    # Force of infection
    #lambda <- beta * I/N # old
    lambda <- beta * sum(I) / N # NEW: sum over all stages
    # The differential equations
    dS <- -lambda * S
    dI <- lambda * S
    list(c(dS, c(dI, rep(0, n_agegroups - 1))))
  })
}
shift <- function(t, state, p) {
  S <- state[1]
  I <- state[2:(n_agegroups + 1)]
  I <- c(0, I[-n_agecat])
  c(S, I)
}
# output time steps (note: ode uses automatic simulation steps!)
times <- 1:200
# time step of events (i.e. shifting), not necessarily same as times
evt_times <- 1:200
output <- ode(y = si_initial_state_values,
times = times,
func = si_model,
parms = si_parameters,
events=list(func=shift, time=evt_times))
## default plot function
plot(output, ask=FALSE)
## plot totals
S <- output[,2]
I <- rowSums(output[, -(1:2)])
par(mfrow=c(1,2))
plot(times, S, type="l", ylim=c(0, max(S)))
lines(times, I, col="red", lwd=1)
## plot stage groups
matplot(times, output[, -(1:2)], col=rainbow(n=14), lty=1, type="l", ylab="S")
Note: This is just a technical demonstration, not a valid stage structured SIR model!
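Since a discrete model was the first recommendation above, here is a minimal difference-equation sketch of the same dynamics (an assumption for illustration: one time step equals one stage):
#discrete-time version: infection and stage shift both happen once per step
n_stages <- 14
beta <- 0.1
S <- 999
I <- c(1, rep(0, n_stages - 1))
out <- matrix(NA, nrow = 200, ncol = n_stages + 1)
for (t in 1:200) {
  lambda <- beta * sum(I) / (S + sum(I)) #force of infection from all stages
  new_inf <- lambda * S                  #new infections this step
  S <- S - new_inf
  I <- c(new_inf, I[-n_stages])          #shift every stage by one
  out[t, ] <- c(S, I)
}
matplot(out, type = "l", lty = 1, ylab = "individuals")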

How to perform double-pair mating in R (fixing some code issues)

I am working on plants, and as you may know, we have to do a lot of crosses to improve our varieties. One kind of cross is called 'double-pair mating'. I would like to implement a double-pair mating (meaning that each parent of an initial population takes part in exactly 2 different crosses) in order to create a pedigree for future breeding value calculations.
It's mostly a problem of logic, because I don't practice R enough; I think you can help me without needing to understand all the biology.
The part below shows where I am stuck.
#selection of pairs
pairs <- data.frame()
for (i in 1:2) {
  pairs <- rbind(pairs,
                 data.frame(dam = sample(pdams, npairs, replace = TRUE),
                            sire = sample(psires, npairs, replace = TRUE)))
}
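Sampling with replacement, as above, will not guarantee the 'each parent in exactly two crosses' constraint. One hedged sketch (not the only way): list every candidate parent exactly twice, shuffle, and pair off the copies. Also note that in the full script below, the inner for (i in 1:2) reuses the generation index i of the outer loop, so generation=i further down will be wrong; use a different loop variable.
#each dam and each sire appears in exactly two crosses
dams2  <- sample(rep(pdams, 2))  #two shuffled copies of every dam
sires2 <- sample(rep(psires, 2)) #two shuffled copies of every sire
npairs <- min(length(dams2), length(sires2))
pairs  <- data.frame(dam = dams2[seq_len(npairs)], sire = sires2[seq_len(npairs)])
#caveat: random pairing can repeat the same dam x sire combination;
#re-shuffle (or swap offending rows) if all crosses must be distinct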
Here is my entire script :
##################################
####### PEDIGREE FUNCTION ########
##################################
# function to create a pedigree with dispersal
# inputs:
# nids = list of number of individuals per generation
# ngenerations = number of generations to simulate
# epm = rate of extra-pair mating (defaults to NULL, no extra-pair)
# missing = probability that one parent is missing in the pedigree
# nonb = proportion of each generation that is non-breeding
# gridsize = length of one size of (square) spatial grid
# dispmean = mean dispersal distance (lognormal)
# dispvar = variance in dispersal distance (lognormal)
pedfun<-function(nids, ngenerations, epm=NULL, missing=NULL, nonb=0.4,
gridsize=50, dispmean, dispsd){
# get list of individuals and their generations
gener<-1:ngenerations
genern <- rep(1:ngenerations, times = nids)
ID <- 1:sum(nids)
# runs on generation-by-generation basis
for(i in 1:ngenerations){
id<-ID[which(genern==i)]
dam<-rep(NA, nids[i])
sire<-rep(NA, nids[i])
Xloc<-rep(NA, nids[i])
Yloc<-rep(NA, nids[i])
# randomly allocates sex (0 = male, 1 = female)
sex<-sample(c(0,1), length(id), replace=TRUE)
# for first generation, no dams or sires are known
# so remain NA
if(i==1){
# for first generation
# spatial locations sampled at random for X and Y coordinates
Xloc<-sample(1:gridsize, length(id), replace=TRUE)
Yloc<-sample(1:gridsize, length(id), replace=TRUE)
# combine into single data frame
pedigree<-data.frame(id=id, dam=dam, sire=sire,
generation=i, sex=sex,
Xloc=Xloc, Yloc=Yloc, disp_dist=NA,
fall=0)
}else if(i>1){
# for all generations after first
# list of all possible dams and sires
# from previous generation
pdams<-pedigree$id[which(pedigree$generation==(i-1) &
pedigree$sex==1)]
psires<-pedigree$id[which(pedigree$generation==(i-1) &
pedigree$sex==0)]
# determine number of pairs
# depending on how many males and females
# and the proportion of the population that is non-breeding
npairs<-min(length(pdams), length(psires)) -
round(min(length(pdams), length(psires))*nonb)
# selects breeding males and females
pdams<-pedigree$id[which(pedigree$generation==(i-1) &
pedigree$sex==1 & pedigree$fall==0)]
psires<-pedigree$id[which(pedigree$generation==(i-1) &
pedigree$sex==0 & pedigree$fall==0)]
if(length(pdams)<npairs | length(psires)<npairs){
npairs<-min(length(pdams), length(psires))
}
#selection of pairs
pairs <- data.frame()
for (i in 1:2) {
pairs <- rbind(pairs,(data.frame(dam=sample(pdams, npairs, replace=TRUE), sire=sample(psires, npairs, replace=TRUE))))
}
# gives each offspring their parental pair
pairid<-as.numeric(sample(rownames(pairs),
length(id), replace=TRUE))
# gives each offspring their sex
sex<-sample(c(0,1), length(id), replace=TRUE)
# put into dataframe format
addped<-data.frame(id=id,
dam=pairs$dam[pairid],
sire=pairs$sire[pairid],
generation=i,
sex=sex)
Thanks for any advice!

Dimension mismatch when initializing an array (JAGS)

Wondering if any of you know why JAGS would tell me there was a dimension mismatch with my initial values here.
I am attempting to fit a spatially explicit capture-recapture model in which I estimate a fish location (x,y) at each time step. There are M=64 individuals and T=21 time steps. Locations are estimated in the array s, looping over i in 1:M and t in 1:T and drawing from two normal distributions, one for each coordinate (x, y), making the dimensions of s (64, 2, 21).
My initial values for this array are plausible locations within suitable habitat, and is an array with dimensions 64, 2, 21.
Yet, JAGS gives me the error Error in setParameters(init.values[[i]], i) : RUNTIME ERROR: Dimension mismatch in values supplied for s. If I simply don't initialize it, I get the same error but for a state matrix, z, with dimensions (64, 21). If I also don't supply values for z, I get the error Error in node y[1,1,2] Node inconsistent with parents, where y is my array of observations, dimensions (64, 7, 21) (the second element is the number of detection gates).
Any help is very appreciated. See below for full code.
Cross-posted at SourceForge, with the initial values for s attached. Not quite sure how to post it here.
sink("mod.txt")
cat("
model{
#######################################
# lamda0 = baseline detection rate, gate dependent and stage dependent
# sigma2 = scale parameter for decline in detection prob, time dependent
# phi = survival, stage dependent
# s = activity center
# M=64 individuals, T=21 time steps
# z = true state, alive (1) or dead (0)
#######################################
for(i in 1:M){
for (j in 1:ngates){
logit(lamda0[i,j])<- beta[group[i]]+ gamma[j]
}
}
for(j in 1:ngates){
gamma[j] ~ dnorm(0,0.001)
lam0.g1[j]<- 1/(1+exp(-gamma[j]))
lam0.g2[j]<- 1/(1+exp(-gamma[j]-beta[2]))
}
beta[1]<-0
beta[2] ~dnorm(0,0.001)T(-10,10)
for (t in 1:T){
sigma2[t] ~ dgamma(0.1,0.1)
}
tauv ~ dunif(0,40)
tau<- 1/(tauv*tauv)
phi[1] ~dunif(0,1)
phi[2] ~dunif(0,1)
for(i in 1:M){
for(t in 1:(first[i]-1)){
s[i,1,t]<-0 #before first detection, not in system
s[i,2,t]<-0 #before first detection, not in system
z[i,t]<-0
}
for(t in (last[i]+1):T){
s[i,1,t]<-0
s[i,2,t]<-0
z[i,t]<-0
}
#First period, locs and states
z[i,first[i]] ~ dbern(1) #know fish is alive
s[i,1,first[i]] ~ dunif(xl,xu) #possible x,y coords
s[i,2,first[i]] ~ dunif(yl,yu)
xdex[i,first[i]]<- trunc(s[i,1,first[i]]+1)
ydex[i,first[i]]<- trunc(s[i,2,first[i]]+1)
pOK[i,first[i]] <- habmat[xdex[i,first[i]],ydex[i,first[i]]] # habitat check
OK[i,first[i]] ~ dbern(pOK[i,first[i]]) # OK[i] = 1, the ones trick
for(j in 1:ngates){
#First period, detection
d[i,j,first[i]]<-sqrt(pow((s[i,1,first[i]]-gate.locs[j,1]),2) + pow((s[i,2,first[i]]-gate.locs[j,2]),2)) #estimate distance to gate (euclid)
d2[i,j,first[i]]<-pow(d[i,j,first[i]],2)
lam_g[i,j,first[i]]<-lamda0[i,j]*exp(-d2[i,j,first[i]]/(2*sigma2[first[i]]))
y[i,j,first[i]] ~ dpois(lam_g[i,j,first[i]]) # number of captures/period/gate
}
#Subsequent periods
for(t in (first[i]+1):last[i]){
s[i,1,t] ~ dnorm(s[i,1,(t-1)],tau)T(xl, xu)
s[i,2,t] ~ dnorm(s[i,2,(t-1)],tau)T(yl, yu)
xdex[i,t]<- trunc(s[i,1,t]+1)
ydex[i,t]<- trunc(s[i,2,t]+1)
pOK[i,t] <- habmat[xdex[i,t],ydex[i,t]] # habitat check
OK[i,t] ~ dbern(pOK[i,t]) # OK[i] = 1, the ones trick
for(j in 1:ngates){
d[i,j,t]<-sqrt(pow((s[i,1,t]-gate.locs[j,1]),2) + pow((s[i,2,t]-gate.locs[j,2]),2)) #estimate distance to gate (euclid)
d2[i,j,t]<-pow(d[i,j,t],2)
lam_g[i,j,t]<-lamda0[i,j]*exp(-d2[i,j,t]/(2*sigma2[t]))
y[i,j,t] ~ dpois(lam_g[i,j,t])
}
phiUP[i,t]<-z[i,t-1]*phi[group[i]] #estimate 3-day survival rate
z[i,t] ~ dbern(phiUP[i,t])
}
}
}#model
", fill=TRUE)
sink()
OK = matrix(1, nrow=M, ncol=T)
dat<-list(y=y, first=first, habmat=habmat, group=group,
xl=xl,xu=xu,yl=yl,yu=yu,
last=last, OK = OK, M=M, T=T,
ngates=ngates,gate.locs=gate.locs)
z<-matrix(NA,M,T)
for(i in 1:M){
for(t in first[i]:last[i]){
z[i,t]<-1
}
}
s<-readRDS("s_inits.Rda")
inits<-function() {list(phi=runif(2,0,1), sigma2=runif(T,0,0.5), tauv=runif(1,0,30), s=s, z=z)}
init1<-inits()
init2<-inits()
init3<-inits()
jag.inits<-list(init1,init2,init3)
params<-c("phiUP","tauv","sigma2","s","z","beta","gamma","phi")
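One thing worth checking (an assumption from reading the model, not a confirmed diagnosis): s[i,1:2,t] and z[i,t] are defined deterministically (<- 0) outside first[i]:last[i], and JAGS will not accept initial values for non-stochastic nodes, so blanking those slices to NA in the inits may resolve the mismatch:
#set inits to NA wherever the model defines the node deterministically
s_init <- s
z_init <- z
for (i in 1:M) {
  if (first[i] > 1) {
    s_init[i, , 1:(first[i] - 1)] <- NA
    z_init[i, 1:(first[i] - 1)] <- NA
  }
  if (last[i] < T) {
    s_init[i, , (last[i] + 1):T] <- NA
    z_init[i, (last[i] + 1):T] <- NA
  }
}
inits <- function() {
  list(phi = runif(2, 0, 1), sigma2 = runif(T, 0, 0.5),
       tauv = runif(1, 0, 30), s = s_init, z = z_init)
}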

How to implement Aly's permutation test for comparison of variances in R?

The excerpt below is from "Permutation, Parametric and Bootstrap Tests of Hypotheses", Third Ed., by Phillip Good (pages 58-61), section 3.7.2.
I am trying to implement this permutation test in R (see further below) to compare two variances. I am now thinking about how to calculate the p-value, and about whether the test allows for different alternative hypotheses (greater, less, two-sided), and I am not sure how to proceed.
Could you shed some light on this and perhaps give me some criticism of the code? Many thanks!
# Aly's non-parametric, permutation test of equality of variances
# From "Permutation, Parametric and Bootstrap Tests of Hypotheses", Third Ed.
# by Phillip Good (pages 58-61), section 3.7.2.
# Implementation of delta statistic as defined by formula in page 60
# x_{i}, order statistics
# z = x_{i+1} - x_{i}, differences between successive order statistics
aly_delta_statistic <- function(z) {
  z_length <- length(z)
  m <- z_length + 1
  i <- 1:z_length
  sum(i * (m - i) * z)
}
aly_test_statistic <- function(sample1, sample2 = NULL, nperm = 1) {
  # compute the statistic based on one sample only: sample1
  if (is.null(sample2)) {
    sample1 <- sort(sample1)
    z <- diff(sample1)
    return(aly_delta_statistic(z))
  } else {
    # statistic based on randomization of the two samples
    m1 <- length(sample1)
    m2 <- length(sample2)
    # allocate a vector to save the statistic delta
    statistic <- vector(mode = "numeric", length = nperm)
    for (j in 1:nperm) {
      # 1st stage resampling: resize the larger sample to the size of the
      # smaller one (a plain permutation when the sizes are equal); use
      # local copies so the inputs are not shrunk between iterations, and
      # sort both so that diff() really gives gaps between order statistics
      if (m2 > m1) {
        s1 <- sort(sample1)
        s2 <- sort(sample(sample2, m1))
      } else {
        s1 <- sort(sample(sample1, m2))
        s2 <- sort(sample2)
      }
      # z-values: z1 in column 1 and z2 in column 2
      z_two_samples <- matrix(c(diff(s1), diff(s2)), ncol = 2)
      # 2nd stage resampling: pick z1 or z2 at random in each position
      z <- apply(z_two_samples, 1, sample, 1)
      statistic[j] <- aly_delta_statistic(z)
    }
    return(statistic)
  }
}
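For the p-value, one hedged sketch: take the delta of the first sample as the observed statistic, compare it to the randomization distribution, and add 1 to numerator and denominator so the p-value is never exactly zero. Since a larger delta corresponds to a larger spread, the direction of each alternative follows:
#toy data: x is less variable than y
set.seed(1)
x <- rnorm(15, sd = 1)
y <- rnorm(20, sd = 2)
nperm <- 2000
obs  <- aly_test_statistic(x)                   #observed delta for sample 1
perm <- aly_test_statistic(x, y, nperm = nperm) #randomization distribution
p_greater <- (1 + sum(perm >= obs)) / (nperm + 1) #H1: sample 1 more variable
p_less    <- (1 + sum(perm <= obs)) / (nperm + 1) #H1: sample 1 less variable
p_two     <- min(1, 2*min(p_greater, p_less))     #simple two-sided version
c(p_greater, p_less, p_two)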
