Extracting individual growth constants using population growth curve model in R

I would like to derive individual growth rates directly from our growth model, similar to this post and this post.
I am working with a dataset that contains the age and weight (wt) measurements for ~2000 individuals in a population. Each individual is represented by a unique id number.
A sample of the data can be found here. Here is what the data looks like:
id age wt
1615 6 15
3468 32 61
1615 27 50
1615 60 145
6071 109 209
6071 125 207
10645 56 170
10645 118 200
I have developed a non-linear growth curve to model growth for this dataset (at the population level). It looks like this:
wt ~ A*atan(k*age - t0) + m
which predicts weight (wt) for a given age and has modifiable parameters A, k, t0, and m. I have fit this model to the dataset at the population level using an nlme regression in which I specified individual id as a random effect and used pdDiag to specify each parameter as uncorrelated. (Note: the random effect would need to be dropped when looking at the individual level.)
The code for this looks like:
nlme.k <- nlme(wt ~ A*atan(k*age - t0) + m,
               data = df,
               fixed = A + k + t0 + m ~ 1,
               random = list(id = pdDiag(A + t0 + k + m ~ 1)), #cannot include when looking at the individual level
               start = c(A = 99.31, k = 0.02667, t0 = 1.249, m = 103.8), #population-level values; might need to be changed for individual models
               na.action = na.omit,
               control = nlmeControl(maxIter = 200, pnlsMaxIter = 10, msMaxIter = 100))
I have our population level growth model (nlme.k), but I would like to use it to derive/extract individual values for each growth constant.
How can I extract individual growth constants for each id using my population level growth model (nlme.k)? Note that I don't need it to be a solution that uses nlme, that is just the model I used for the population growth model.
Any suggestions would be appreciated!

I think this is not possible due to the nature of how random effects are designed. According to this post, the effect size (your growth constant) is estimated using partial pooling. This involves using data points from the other groups, so you cannot estimate the effect size of each group (your individual id) in isolation.
Strictly speaking (see here), random effects are not really a part of the model at all, but more a part of the error.
However, you can estimate the R2 for all groups together. If you want estimates at the individual level (e.g. parameter estimates for id 1), then just run the same model on only the data points of that particular individual. This gives you n models with n parameter sets for n individuals.
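That said, nlme will report partially pooled, group-specific coefficient estimates: the fixed effects plus the BLUPs of the random effects. A minimal sketch, using the fitted nlme.k from the question; note these per-id values are shrunken toward the population mean by the partial pooling, not independent per-individual fits:
library(nlme)
fixef(nlme.k)             #population-level estimates of A, k, t0, m
ranef(nlme.k)             #per-id deviations (BLUPs of the random effects)
ind_coefs <- coef(nlme.k) #fixed + random effects: one row of A, k, t0, m per id
head(ind_coefs)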

We ended up using a few loops to do this.
Note that our answer builds on the model posted in this post, if anyone wants the background script. We will also link to the published script when it is posted.
For now, this should give a general idea of how we did it.
#Individual fits dataframe generation
yid_list <- unique(young_inds$squirrel_id)
indf_prs <- list('df', 'squirrel_id', 'A_value', 'k_value', 'mx_value', 'my_value', 'max_grate', 'hit_asymptote', 'age_asymptote', 'ind_asymptote', 'ind_mass_asy', 'converge') #List of parameters
ind_fits <- data.frame(matrix(ncol = length(indf_prs), nrow = length(yid_list))) #Blank dataframe for all individual fits
colnames(ind_fits) <- indf_prs
#Calculates individual fits for all individuals and appends them into ind_fits
for (i in 1:length(yid_list)) {
  yind_df <- young_inds %>% filter(squirrel_id %in% yid_list[i]) #Extracts a dataframe for each squirrel
  ind_fits[i, 'squirrel_id'] <- as.numeric(yid_list[i]) #Appends squirrel i's id into the individual fits dataframe
  sex_lab <- unique(yind_df$sex) #Identifies and extracts squirrel i's sex
  mast_lab <- unique(yind_df$b_mast) #Identifies and extracts squirrel i's mast value
  Hi_dp <- max(yind_df$wt) #Extracts the largest mass for each squirrel
  ind_long <- unique(yind_df$longevity) #Extracts the individual death date
  #Sets corresponding values for squirrel i
  if (mast_lab==0 && sex_lab=="F") { #Female, no mast
    ind_fits[i, 'df'] <- "fnm" #Squirrel dataframe label (appended into the ind_fits dataframe)
    df_asm <- af_asm #average asymptote value corresponding to sex
    df_B_guess <- guess_df[1, "B_value"] #Initial guesses for nls fits corresponding to sex and mast
    df_k_guess <- guess_df[1, "k_value"]
    df_mx_guess <- guess_df[1, "mx_value"]
    df_my_guess <- guess_df[1, "my_value"]
    ind_asyr <- indf_asy #growth rate at individual asymptote
  } else if (mast_lab==0 && sex_lab=="M") { #Male, no mast
    ind_fits[i, 'df'] <- "mnm"
    df_asm <- am_asm
    df_B_guess <- guess_df[2, "B_value"]
    df_k_guess <- guess_df[2, "k_value"]
    df_mx_guess <- guess_df[2, "mx_value"]
    df_my_guess <- guess_df[2, "my_value"]
    ind_asyr <- indm_asy
  } else if (mast_lab==1 && sex_lab=="F") { #Female, mast
    ind_fits[i, 'df'] <- "fma"
    df_asm <- af_asm
    df_B_guess <- guess_df[3, "B_value"]
    df_k_guess <- guess_df[3, "k_value"]
    df_mx_guess <- guess_df[3, "mx_value"]
    df_my_guess <- guess_df[3, "my_value"]
    ind_asyr <- indf_asy
  } else if (mast_lab==1 && sex_lab=="M") { #Male, mast
    ind_fits[i, 'df'] <- "mma"
    df_asm <- am_asm
    df_B_guess <- guess_df[4, "B_value"]
    df_k_guess <- guess_df[4, "k_value"]
    df_mx_guess <- guess_df[4, "mx_value"]
    df_my_guess <- guess_df[4, "my_value"]
    ind_asyr <- indm_asy
  } else { #If sex or mast is not identified, or identified improperly, in the data
    print("NA")
  } #End of if/else block
  #Arctangent
  #Fits the nls model to the created dataframe
  nls.floop <- tryCatch({ #tryCatch lets nls return alternate results instead of stopping on errors
    data.frame(tidy(nls(wt ~ B*atan(k*(age - mx)) + my,
                        data = yind_df,
                        start = list(B = df_B_guess, k = df_k_guess, mx = df_mx_guess, my = df_my_guess),
                        control = list(maxiter = 200000, minFactor = 1/100000000))))
  },
  error = function(e) {
    data.frame(c(0,0), c(0,0)) #Returns a dummy dataframe if there is no convergence
  },
  warning = function(w) {
    data.frame(tidy(nls.floop)) #Fit is the same if a warning is displayed
  }) #End of nls.floop
  #Creates a dummy numerical index from nls.floop for the if/else below
  numeric_floop <- as.numeric(nls.floop[1, 2])
  #print(numeric_floop) #Taking a look at the values. If numeric_floop...
  # == 0, the fit did not converge on iteration i
  # != 0, the fit did converge on iteration i and the code will run through the calculations
  if (numeric_floop != 0) {
    results_DF <- nls.floop
    ind_fits[i, 'converge'] <- 1 #converge = 1 for a converging fit
    #Extracting, calculating, and appending values into the dataframe
    B_value <- as.numeric(results_DF[1, "estimate"]) #B value
    k_value <- as.numeric(results_DF[2, "estimate"]) #k value
    mx_value <- as.numeric(results_DF[3, "estimate"]) #mx value
    my_value <- as.numeric(results_DF[4, "estimate"]) #my value
    A_value <- ((B_value*pi)/2) + my_value #A value calculation
    ind_fits[i, 'A_value'] <- A_value
    ind_fits[i, 'k_value'] <- k_value
    ind_fits[i, 'mx_value'] <- mx_value
    ind_fits[i, 'my_value'] <- my_value #appends my_value into the dataframe
    ind_fits[i, 'max_grate'] <- adr(mx_value, B_value, k_value, mx_value, my_value) #Calculates max growth rate with adr(), defined in the background script linked above
  }
} #End of individual fits loop
Which gives this output:
> head(ind_fits%>%select(df, squirrel_id, A_value, k_value, mx_value, my_value))
df squirrel_id A_value k_value mx_value my_value
1 mnm 332 257.2572 0.05209824 52.26842 126.13183
2 mnm 1252 261.0728 0.02810033 42.37454 103.02102
3 mnm 3466 260.4936 0.03946594 62.27705 131.56665
4 fnm 855 437.9569 0.01347379 86.18629 158.27641
5 fnm 2409 228.7047 0.04919819 63.99252 123.63404
6 fnm 1417 196.0578 0.05035963 57.67139 99.65781
Note that you need to create a blank dataframe first before running the loops.
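A more compact route to the same per-individual fits is to map nls over the ids with dplyr and broom (a sketch; the start values here are placeholders standing in for the sex- and mast-specific guesses from guess_df, and individuals whose fits fail to converge are simply dropped):
library(dplyr)
library(tidyr)
library(broom)
ind_fits2 <- young_inds %>%
  group_by(squirrel_id) %>%
  group_modify(function(d, key) {
    fit <- tryCatch(
      nls(wt ~ B*atan(k*(age - mx)) + my, data = d,
          start = list(B = 100, k = 0.03, mx = 50, my = 100)), #placeholder guesses
      error = function(e) NULL)
    if (is.null(fit)) return(tibble()) #drop non-converging individuals
    tidy(fit)[, c("term", "estimate")]
  }) %>%
  ungroup() %>%
  pivot_wider(names_from = term, values_from = estimate)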

Related

OpenBugs error invalid integer value keeps coming up

I am trying to create a Bayesian latent class model in R and OpenBUGS for 2 populations using 2 diagnostic tests. I have created the equations for most parameters, but I also have inconclusive results. Therefore I want to expand the model to include the probability of an inconclusive result on test 1 for an infected individual and an uninfected individual, and then the same for test 2. I have been trying to create the equations for these probabilities (p1/2[1] - p1/2[4] in the model attached) but I am struggling and don't think I am quite there yet, as the model won't run. Would you be able to help me with these equations? The error I get every time I run it at the moment is: invalid integer value for x1[1]
#Bayesian LCA model creation
library(R2OpenBUGS)
setwd("mywd")
model3=paste0("model3{
#multinomial model for the data
x1[1:8] ~ dmulti(p1[1:8], n1)
x2[1:8] ~ dmulti(p2[1:8], n2)
#Observed prevalence
#Pop 1 with 2 tests and unknown prevalence
p1[1] <- prev1*Se1*Se2*(1-IncT1Inf)+(1-prev1)*(1-Sp1)*(1-Sp2)*IncT1NonInf
p1[2] <- prev1*Se1*(1-Se2)*(1-IncT1NonInf)+(1-prev1)*(1-Sp1)*Sp2*IncT1Inf
p1[3] <- prev1*Se1*Se2*(1-IncT2Inf)+(1-prev1)*(1-Sp1)*(1-Sp2)*IncT2NonInf
p1[4] <- prev1*(1-Se1)*Se2*(1-IncT2NonInf)+(1-prev1)*Sp1*(1-Sp2)*IncT2Inf
p1[5] <- prev1*Se1*Se2+(1-prev1)*(1-Sp1)*(1-Sp2)
p1[6] <- prev1*Se1*(1-Se2)+(1-prev1)*(1-Sp1)*Sp2
p1[7] <- prev1*(1-Se1)*Se2+(1-prev1)*Sp1*(1-Sp2)
p1[8] <- prev1*(1-Se1)*(1-Se2)+(1-prev1)*Sp1*Sp2
#Pop 2 with 2 tests (same tests) and unknown prevalence
p2[1] <- prev2*Se1*Se2*(1-IncT1Inf)+(1-prev2)*(1-Sp1)*(1-Sp2)*IncT1NonInf
p2[2] <- prev2*Se1*(1-Se2)*(1-IncT1NonInf)+(1-prev2)*(1-Sp1)*Sp2*IncT1Inf
p2[3] <- prev2*Se1*Se2*(1-IncT2Inf)+(1-prev2)*(1-Sp1)*(1-Sp2)*IncT2NonInf
p2[4] <- prev2*(1-Se1)*Se2*(1-IncT2NonInf)+(1-prev2)*Sp1*(1-Sp2)*IncT2Inf
p2[5] <- prev2*Se1*Se2+(1-prev2)*(1-Sp1)*(1-Sp2)
p2[6] <- prev2*Se1*(1-Se2)+(1-prev2)*(1-Sp1)*Sp2
p2[7] <- prev2*(1-Se1)*Se2+(1-prev2)*Sp1*(1-Sp2)
p2[8] <- prev2*(1-Se1)*(1-Se2)+(1-prev2)*Sp1*Sp2
#Priors taken using median values from total lit search and 95th percentile provided in lit
prev1 ~ dbeta(8.89,60.26)
prev2 ~ dbeta(4.35,76.63)
Se1 ~ dbeta(14.59,0.86)
Sp1 ~ dbeta(14.95,0.86)
Se2 ~ dbeta(78.55,15.96)
Sp2 ~ dbeta(1.71,0.38)
IncT1Inf ~ dbeta(1,1)
IncT1NonInf ~ dbeta(1,1)
IncT2Inf ~ dbeta(1,1)
IncT2NonInf ~ dbeta(1,1)
}")
#write to temporary text file
write.table(model3, file="model3.txt", quote=FALSE, sep="", row.names=FALSE, col.names=FALSE)
#Data
#Sanctuary 1
n1=19
x1<-matrix(c(3,3,1,0,1,0,0,6,5),byrow=T,ncol=3,dimnames=list(c("TST+", "TST-",
"TSTInc"),c("QFT+", "QFT-", "QFTInc")))
as.numeric(x1)
x1 <- as.numeric(x1)
#Sanctuary 2
n2=12
x2<-matrix(c(0,0,0,4,8,0,0,0,0),byrow=T,ncol=3,dimnames=list(c("TST+", "TST-",
"TSTInc"),c("QFT+", "QFT-", "QFTInc")))
as.numeric(x2)
x2 <- as.numeric(x2)
#set data inputs to BUGS
dat <- list(x1=x1,n1=sum(x1),x2=x2,n2=sum(x2))
dat
#Set parameters desired to monitor
paras <- c("Se1","Sp1","Se2","Sp2","prev1","prev2","IncT1Inf","IncT1NonInf", "IncT2Inf",
"IncT2NonInf")
#Initialising values for 3 chains done using median and mean values from initial prior calculation
inits <- list(
  list(Se1=0.964, Sp1=0.965, Se2=0.834, Sp2=0.969, prev1=0.125, prev2=0.050,
       IncT1Inf=0.100, IncT1NonInf=0.250, IncT2Inf=0.150, IncT2NonInf=0.250),
  list(Se1=0.965, Sp1=0.965, Se2=0.834, Sp2=0.969, prev1=0.125, prev2=0.05,
       IncT1Inf=0.200, IncT1NonInf=0.150, IncT2Inf=0.050, IncT2NonInf=0.200),
  list(Se1=0.960, Sp1=0.965, Se2=0.858, Sp2=0.937,
       IncT1Inf=0.100, IncT1NonInf=0.250, IncT2Inf=0.150, IncT2NonInf=0.250)
)
#run model in R2OpenBUGS
niterations=12000
#Running with initial 12000 iterations, burning first 1000 and thinning every 10
bug.out <- bugs(dat, inits, paras, model.file="model3.txt", n.iter=niterations, n.burnin=1000,
n.thin=10, n.chains=3, saveExec=F, restart=F, debug=T, DIC=T, digits=6, codaPkg=F,
working.directory="mywd", clearWD=F, useWINE=F, WINE=NULL,
newWINE=F, WINEPATH=NULL, bugs.seed=1, summary.only=FALSE, over.relax = F)
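Two things worth checking against that error message (observations on the posted code rather than a confirmed fix): dmulti() expects integer counts, and as.numeric() produces doubles, which OpenBUGS can reject as an invalid integer value; also, the flattened 3x3 tables have 9 cells, while the model declares x1[1:8] and p1[1:8].
x1 <- as.integer(x1) #dmulti needs integer counts, not doubles
x2 <- as.integer(x2)
length(x1) #9 cells from the 3x3 table, but the model declares x1[1:8]
#so either extend the model to x1[1:9]/p1[1:9] or collapse the tables to 8 categories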

Matrix occupancy model in nimble (R) - missing visits and years

Thanks for the help! I've got a latent variable occupancy model with p (detection) and Psi (occupancy) covariates that I set up using matrices rather than vectors for convenience. It is based on the jags model from Tobler et al. (2019) but I have converted to nimble and added detection covariates.
Most sites have 3 visits per year in each of three years, so I have a response variable array of:
y[1:n_sites , 1:n_species , 1:3 (years), 1:3 (surveys per year)]
I have a similar array of detection covariates:
Xobs[1:n_sites , 1:n_p.covariates , 1:3 (years), 1:3 (surveys per year)]
And occupancy covariates (same for each visit in a given year):
Xocc[1:n_sites , 1:n_Psi.covariates , 1:3 (years)]
The model runs fine when data are included for all sites, years, and surveys. However, some sites (e.g. 5%) were only surveyed twice in a given year, and a few (3%) were only surveyed in one or two (rather than all three) years. This results in a 'ragged array' of covariates, where some quantities (e.g. z, the estimated true presence for a site that wasn't surveyed in a given year) don't need to be estimated.
When I try to run the model with the missing surveys filled with either a) zeros or b) NAs, I get several problems:
some parameters fail to initialize (this isn't necessarily a problem though), and more importantly...
some chains for z (estimated true presence) for a given site and year get stuck on 1 (this seems to occur for site-year combinations that have fewer than 3 surveys).
So my main question is: how should I be dealing with those missing values? The solution I tried (in the code below) is to index year and survey numbers for each site. For years I do this with a years-per-site vector (yrs), and for surveys per year I do this with a matrix k[1:n_sites, 1:n_years] that gives how many visits there were; the sketch below shows the core of the idea.
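In nimble pseudocode (inside the species loop over i), the ragged iteration looks like this, with yrs and k passed in as constants:
for (j in 1:J) {              # sites
  for (yr in 1:yrs[j]) {      # only the years site j was actually surveyed
    for (kv in 1:k[j, yr]) {  # only the visits that actually happened
      y[j, i, yr, kv] ~ dbern(mu.p[j, i, yr, kv])
    }
  }
}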
[side note: this model doesn't yet deal with the non-independence of sites between years; I will deal with that once I get the basic model working but am open to suggestions]
The code below makes a reproducible example (albeit perhaps not in the most efficient way). Run as-is, it uses 'full' input data (no missing years or surveys) and runs fine (as far as I can tell). If you then comment out these lines....
##############################################################
#now to make sure model runs properly with all arrays full (no missing sites):
##############################################################
k[]=3
yrsV[]=3
...and un-comment the lines that follow them (setting options for what to use to replace the missing X and Y values: NA vs. zero), you will have a dataset that resembles our true data (and gives problems in the MCMC output, although it runs).
Thank you!
library(nimble)
library(mcmcplots)
library(abind)
###########################
#create detection data
###########################
#n species
n.sp=10
#mean p per species
p.sp=(1:10)/12;p.sp
#mean Psi per species
psi.intercept=0.4 #all species
#n sites
n.sites=100
#n years each site surveyed
yrs.site=sample(c(3,3,3,2,2,1),n.sites,replace = TRUE);yrs.site
#n surveys in each year at each site
surv.yr.site=data.frame(matrix(0,n.sites,max(yrs.site)))
for (i in 1:max(yrs.site)) {
names(surv.yr.site)[i]=paste0("yr_",i)
surv.yr.site[,i]=sample(c(3,3,3,2,2,1),n.sites,replace = TRUE)
for (j in 1:n.sites) {
if (yrs.site[j]<i) {
surv.yr.site[j,i]=NA
}
}
};surv.yr.site
#make true presence/abs by year
z1=z2=z3=matrix(0,n.sites,n.sp)
for (i in 1:n.sp) {
z1[,i]=rbinom(n=n.sites,size=1,prob=psi.intercept)
z2[,i]=rbinom(n=n.sites,size=1,prob=psi.intercept)
z3[,i]=rbinom(n=n.sites,size=1,prob=psi.intercept)
};head(z1)
#make detections by year
y1=y2=y3=array(0,c(n.sites,n.sp,max(as.matrix(surv.yr.site),na.rm = TRUE)));dim(y1)
for (i in 1:max(as.matrix(surv.yr.site),na.rm = TRUE)) {
for (j in 1:n.sp) {
y1[,j,i]=z1[,j]*rbinom(n.sites,1,p.sp[j])
y2[,j,i]=z2[,j]*rbinom(n.sites,1,p.sp[j])
y3[,j,i]=z3[,j]*rbinom(n.sites,1,p.sp[j])
}
};head(y1)
#merge into y array y[sites,species,years,surveys]
y=array(0,c(n.sites,n.sp,max(yrs.site),max(as.matrix(surv.yr.site),
na.rm = TRUE)));dim(y)
for (i in 1:dim(y)[3]) {
for (j in 1:dim(y)[4]) {
y[,,i,j]=get(paste0("y",i))[,,j]
}
};head(y)
#make four p covariates for each visit (with no simulated effects)
Xobs=array(0,c(n.sites,4,max(yrs.site),max(as.matrix(surv.yr.site),
na.rm = TRUE)));dim(Xobs)
Xobs[]=rnorm(length(Xobs))
head(Xobs)
#make four Psi covariates for each year (with no simulated effect)
Xocc=array(0,c(n.sites,4,max(yrs.site)));dim(Xocc)
Xocc[]=rnorm(length(Xocc))
head(Xocc)
#generate constants
#n species
n=n.sp;n
#J sites
J=n.sites;J
#nYr years
yrsV=yrs.site;yrsV
#surveys per year per site
k=surv.yr.site
#Vocc psi covars
Vocc=ncol(Xocc);Vocc
#Vobs p covars
Vobs=ncol(Xobs);Vobs
#set n latent variables (1/4 to 1/2 n)
nlv=5
#
#############################
#nimble model
#############################
code <- nimbleCode({
#model modified from Tobler et al. 2019 by Ryan C. Burner ryan.c.burner#gmail.com
#original paper https://doi.org/10.1002/ecy.2754
#modified to include detection covariates and derived parameters
# community hyper-priors for occupancy covariates
for(m in 1:Vocc) { #n occ covars
mu.u.b[m] ~ T(dnorm(0, 0.01),-10,10)
tau.u.b[m] <- pow(sigma.u.b[m], -2)
sigma.u.b[m] ~ dunif(0, 10)
}
# community hyper-priors for detection covariates
for(m in 1:Vobs) { #n detect covars
mu.v.b[m] ~ T(dnorm(0, 0.01),-10,10)
tau.v.b[m] <- pow(sigma.v.b[m], -2)
sigma.v.b[m] ~ dunif(0, 10)
}
# Latent variables
for(j in 1:J) { #n sites
for(l in 1:nlv){ # n lv
LV[j,l] ~ dnorm(0,1)
}
}
# Latent variable coefficients with constraints
# diagonal elements positive, upper diagonal 0
for(l in 1:(nlv-1)){ # n lv
for(l2 in (l+1):nlv){ # n lv
lv.coef[l,l2] <- 0
}
}
# Sign constraints on diagonal elements
for(l in 1:nlv) { # n lv
lv.coef[l,l] ~ dunif(0,1)
}
# lower diagonal free
for(l in 2:nlv){ # n lv
for(l2 in 1:(l-1)){
lv.coef[l,l2] ~ dunif(-1,1)
}
}
# other elements free
for(i in (nlv+1):n) { # n lv
for(l in 1:nlv){ # n lv
lv.coef[i,l] ~ dunif(-1,1)
}
}
# loop over all SPECIES
for (i in 1:(n)) { #n species
# random [beta] effect priors by species for occupancy
for(m in 1:Vocc) { #n occ covars
u.b[i, m] ~ T(dnorm(mu.u.b[m], tau.u.b[m]),-10,10)
} #prior, i species by m occ covars
# random [beta] effect priors by species for detection
for(m in 1:Vobs) { #n obs covars
v.b[i, m] ~ T(dnorm(mu.v.b[m], tau.v.b[m]),-10,10)
} #prior, i species by m det covars
# loop over all SITES
for (j in 1:J) { #n sites
for (yr in 1:(yrs[j])) {
eta[j,i,yr] <- inprod(u.b[i,1:Vocc], Xocc[j, ,yr]) +
inprod(lv.coef[i,1:nlv],LV[j,1:nlv])
}
# draw from normal, constrain variance to 1
# j sites x i species
#for each year
for (yr in 1:(yrs[j])) {
u[j,i,yr] ~ dnorm(eta[j,i,yr],1/(1-sum(lv.coef[i,1:nlv]^2)))
z[j,i,yr] <- step(u[j,i,yr])
}
# loop over all VISITS (SURVEYS) in all YEARS
for (yr in 1:(yrs[j])) {
for (kv in 1:(k[j,yr])) {
# logistic model for detection
logit(p[j,i,yr,kv]) <- inprod(v.b[i,1:Vobs], Xobs[j, ,yr,kv])
#p sites x species x years x visits
#estimated p x site x species x year x survey
mu.p[j,i,yr,kv] <- p[j,i,yr,kv]*z[j,i,yr]
y[j,i,yr,kv] ~ dbern(mu.p[j,i,yr,kv])
} # i species, j sites, yr years, kv visits
}
}
}
###Derived parameters###
#proportion of sites (j) occupied in 1+ yrs by each species (i) #could divide by pool/year
for (i in 1:(n)) { #n species
for (j in 1:(J)) { #J sites
site.occ.ever.bySp[j,i] <- max(z[j,i,1:(yrs[j])]) #indexing trouble here??
}
prop.occ.ever.bySp[i] <- sum(site.occ.ever.bySp[1:J,i])/J # j x i
}
})
occ.inits = function() { #function is used so that it is repeated per chain
#for u.b., I use estimates from a glm. If NA, fill with random
u.b<-t(sapply(seq_len(ncol(y[,,,1])),
function(x) {xx=unname(coef(glm(((apply(y, c(1,2),
max,na.rm=TRUE)>0)*1)[, x] ~ Xocc[, -1,1],
family=binomial(link=logit),
na.action = 'na.omit')))
#random values replace errors
xx[is.na(xx)==TRUE]<-rnorm(length(xx[is.na(xx)==TRUE]))
return(xx)}))
colnames(u.b)<-colnames(Xocc)
rownames(u.b)<-colnames(y)
#for really high values, bring in line with priors
u.b[abs(u.b)>10]=runif(length(u.b[abs(u.b)>10]),0,5)*sign(u.b[abs(u.b)>10])
lv.coef=matrix(runif(nlv*n,-0.2,0.2),n,nlv)
diag(lv.coef)=abs(diag(lv.coef))
lv.coef[upper.tri(lv.coef)]=0
v.b=matrix(rnorm(Vobs*n),c(n,Vobs))
colnames(v.b)=colnames(Xobs)
rownames(v.b)<-colnames(y)
u=apply(y, c(1,2,3), max,na.rm=TRUE)-runif(J*n*max(yrsV),0.1,0.8)
#replace Inf/-Inf with zero (not sure if it matters)
u[is.infinite(u)]=runif(length(u[is.infinite(u)]),-0.3,-0.1)
list(
mu.u.b=rnorm(Vocc,0, 0.01),
sigma.u.b=runif(Vocc, 0, 10),
mu.v.b=rnorm(Vobs,0, 0.01),
sigma.v.b=runif(Vobs, 0, 10),
u.b=u.b,
v.b=v.b,
u=u,
lv.coef=lv.coef,
LV=matrix(rnorm(nlv*J),J,nlv)
)
}
#indicate which parameters to monitor
params <- c('u.b','v.b','mu.u.b','tau.u.b',
            'prop.occ.ever.bySp', #if derived parameter included in model (must be quoted)
            'mu.v.b','tau.v.b','z')
##############################################################
#now to make sure model runs properly with all arrays full (no missing sites):
##############################################################
k[]=3
yrsV[]=3
#####################################################
#alternatively, what happens if we remove the surveys that didn't happen
#(need to rerun all code above, skipping previous two lines)
#####################################################
# #choose values to pad X covars and Y responses with
# # X covars:
# #padX=NA
# padX=0
# # Y responses:
# padY=NA
# #padY=0
#
# #fill in missing surveys with pad value
# for (i in 1:max(yrs.site)) {
# for (j in 1:nrow(y)) {
# if (yrsV[j]<i) {
# y[j,,i,][]=padY
# Xobs[j,,i,][]=padX
# Xocc[j,,i][]=padX
# } else {
# for (l in 1:max(as.matrix(surv.yr.site),na.rm = TRUE)) {
# if (k[j,i]<l) {
# y[j,,i,l][]=padY
# Xobs[j,,i,l][]=padX
# }
# }
# }
# }
# }
#
# k[is.na(k)==TRUE]=padVal
##get the x and y data##
# DATA DESCRIPTION
# n is number of species
# J is number of sites
# y is detections
# k is number of surveys in each of 3 years
# yrsV is number of years surveyed for each point
# nlv is number of latent variables
# Vocc and Vobs are numbers of occ (Psi) and obs (p) covariates
# Xocc and Xobs are matrices of covariates (incl. intercept)
occ.data = list(n=n, J=J, k=k, y=y,
Xocc=Xocc,Xobs=Xobs,Vocc=Vocc,Vobs=Vobs,nlv=nlv,yrs=yrsV)#
names(occ.data)
data <- occ.data[c(4:6)];names(data)
constants=occ.data[c(1:3,7:length(occ.data))];names(constants)
#set the initial values
inits <- occ.inits()
#inspect inits
i=1
head(inits[[i]]);names(inits)[i];i=i+1
######################################
###to run the model in Nimble...
######################################
# 1) create the model object
glmmModel <- nimbleModel(code = code, data = data, constants = constants,
inits = inits, check = getNimbleOption("checkModel"))
glmmModel$initializeInfo()
# 2) compile model
CglmmModel <- compileNimble(glmmModel,showCompilerOutput = TRUE)
# 3) configure mcmc
fglmmMCMC <- configureMCMC(glmmModel)
# 4) add monitors for parameters
fglmmMCMC$addMonitors(params)
# 5) build mcmc
BglmmMCMC<- buildMCMC(fglmmMCMC)
# 6) compile model
CglmmMCMC <- compileNimble(BglmmMCMC,showCompilerOutput = TRUE)
#setup mcmc
ni <- 800
nb <- 150
nt <- 10
nc <- 2
#run mcmc
s=Sys.time();s
samples <- runMCMC(CglmmMCMC, niter = ni, nburnin = nb, nchains = nc, thin = nt,
setSeed = 3,
summary = FALSE, samplesAsCodaMCMC = TRUE)
ss=Sys.time()-s #elapsed time
print(ss)
beepr::beep(5)
#check for z (occupancy) parameters where chains are stuck on 1:
problem=which(colMeans(samples$chain1[,grep("z",colnames(samples$chain1))])==1)
length(problem)
#view a subset (can run this line repeatedly)
#z parameters are z[site,species,year]
#can run this line repeatedly to see random sample
samples$chain1[1:5,grep("z",colnames(samples$chain1))[sample(problem,10)]]
#to check number of years (columns) and surveys (values) in a given year:
k
#to check y (observation) input values for a given site enter number here:
#y is y[sites,species,years,surveys]
#e.g.....
y[26,10,1,]
#check for z (occupancy) parameters where chains are stuck on 0:
problem2=which(colMeans(samples$chain1[,grep("z",colnames(samples$chain1))])==0)
length(problem2)
#view a subset (can run this line repeatedly)
#z parameters are z[site,species,year]
#can run this line repeatedly to see random sample
samples$chain1[1:5,grep("z",colnames(samples$chain1))[sample(problem2,10)]]
#to check number of years (columns) and surveys (values) in a given year:
k
#to check y (observation) input values for a given site enter number here:
#y is y[sites,species,years,surveys]
#e.g.....
y[26,10,1,]
#get summaries of the chains
xx=summary(samples)
head(xx)
#here I get an error that there are NAs (from unused parameters due to ragged array??)
#get plots of chains
mcmcplot(samples)
beepr::beep(5)
#why does this error occur on so many (or all) of the z's
#(and some other parameters with my real data)
#z[1, 1, 1]. Error in bw.SJ(x, method = "ste"): sample is too sparse to find TD
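For what it's worth, that bw.SJ error usually indicates a chain with zero variance (e.g. a z stuck at 0 or 1 in every sample), which breaks the bandwidth estimation behind mcmcplot's density panels. A sketch of a workaround that drops the constant columns before plotting:
#keep only parameters whose chain actually moves, then plot the rest
m1 <- as.matrix(samples$chain1)
keep <- apply(m1, 2, function(x) var(x) > 0)
mcmcplot(m1[, keep])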

How to generate a latent variable from a set of different kinds of variables with R?

For n observations, I want to generate a latent (unobserved) variable from a set of other variables that proxy it; I may or may not assume that this latent variable has a specific distribution. For my specific case, I want to generate latent ability from a set of variables that proxy ability (observed ability). One variable is discrete and exhibits normality, another is binary but very skewed, and the last one is an ordered categorical variable. This is what my data look like, and I would like to estimate a response for each observation.
set.seed(123877)
# number of units
n <- 1000L
# age
age <- sample(rnorm(n, 25, 10))
# cum laude
hon <- sample(0L:1L, n, TRUE, prob = c(.9, .1) )
# prestige of university
pres <- factor(sample(1L:25L, n, TRUE), labels = 25L:1L, ordered = T)
dat <- data.frame(id=1L:n, age, hon, pres)
I found a solution using the ltm package; here is the code:
set.seed(123877)
u.latent <- vector()
class(u.latent) <- 'try-error'
library('ltm')
while (class(u.latent)=='try-error') {
# number of units
n <- 1000L
# age
age <- round(rnorm(n, 25, 10))
# cum laude
hon <- sample(0L:1L, n, TRUE, prob = c(.9, .1) )
# prestige of university
pres <- sample(1L:10L, n, TRUE)
# pres <-factor(pres, levels = 1L:25L, ordered = TRUE)
dat <- data.frame(age, hon, pres)
# latent variable
u.latent <- try(gpcm(dat))
}
We can test if the model fits the data:
GoF.gpcm(u.latent)
#H0 the model fits the data
#Ha: the model does not fit the data
The estimates of the latent variables are straightforward:
u.estimates <-factor.scores(u.latent)
hist(u.estimates$score.dat$z1)
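To attach an estimated ability to every observation rather than to each distinct response pattern, factor.scores() takes a resp.patterns argument (a sketch, scoring each row of dat):
u.obs <- factor.scores(u.latent, resp.patterns = dat)
dat$ability <- u.obs$score.dat$z1 #one latent ability estimate per observation
hist(dat$ability)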

Attempting to use MonteCarlo package in R

I am trying to run a Monte Carlo simulation of a difference in differences estimator, but I am running into an error. Here is the code I am running:
# Set the random seed
set.seed(1234567)
library(MonteCarlo)
#Set up problem, doing this before calling the function
# set sample size
n<- 400
# set true parameters: betas and sd of u
b0 <- 1 # intercept for control data (b0 in diffndiff)
b1 <- 1 # shift on both control and treated after treatment (b1 in diffndiff)
b2 <- 2 # difference between intercept on control vs. treated (b2 - this is the level difference pre-treatment to compare to the coef on treat)
b3 <- 3 # shift after treatment that applies only to the treated group (b3 - this is the coefficient of interest in diffndiff)
b4 <- 0 # parallel time trend (not measured in diffndiff); biases b0, b1 but not b3, which we care about
b5 <- 0 # allows the treated group trend to shift after treatment (0 if parallel trends holds)
su <- 4 # std. dev of the errors
dnd <- function(n,b0,b1,b2,b3,b4,b5,su){
#initialize a time vector (set observations equal to n)
timelength = 10
t <- c(1:timelength)
num_obs_per_period = n/timelength #allows for multiple observations in one
#time period (can simulate multiple states within one group or something)
t0 <- c(1:timelength)
for (p in 1:(num_obs_per_period-1)){
t <- c(t,t0)
}
T<- 5 #set treatment period
g <- t >T
post <- as.numeric(g)
# assign equal amounts of observations to each state to start with (would
#like to allow selection into treatment at some point)
treat <- vector()
for (m in 1:(round(n/2))){
treat <- c(treat,0)
}
for (m in 1:(round(n/2))){
treat <- c(treat,1)
}
u <- rnorm(n,0,su) #This assumes the mean error is zero
#create my y vector now from the data
y<- b0 + b1*post + b2*treat + b3*treat*post + b4*t + b5*(t-T)*treat*post +u
interaction <- treat*post
#run regression
olsres <- lm(y ~ post + treat + interaction)
olsres$coefficients
# assign the coefficients
bhat0 <- olsres$coefficients[1]
bhat1 <- olsres$coefficients[2]
bhat2 <- olsres$coefficients[3]
bhat3 <- olsres$coefficients[4]
bhat3_stderr <- coef(summary(olsres))[4, "Std. Error"] # row 4 is the interaction term (bhat3)
#Here I will use bhat3 to conduct a t-test and determine if this was a pass
#or a fail
tval <- (bhat3-b3)/ bhat3_stderr
#decision at the 5% significance level (FALSE indicates the t-stat was less than 1.96, and we fail to reject the null)
decision <- abs(tval) > 1.96
decision <- unname(decision)
return(list(decision))
}
#Define a parameter grid to simulate over
from <- -5
to <- 5
increment <- .25
gridparts<- c(from , to , increment)
b5_grid <- seq(from = gridparts[1], to = gridparts[2], by = gridparts[3])
parameter <- list("n" = n, "b0" = b0, "b1" = b1, "b2" = b2, "b3" = b3,
                  "b4" = b4, "b5" = b5_grid, "su" = su)
#Now simulate this multiple times in a monte carlo setting
results <- MonteCarlo(func = dnd ,nrep = 100, param_list = parameter)
And the error that comes up is:
Error in results[[i]] <- array(NA, dim = c(dim_vec, nrep)) :
  attempt to select less than one element in integerOneIndex
This leads me to believe that somewhere something is attempting to access the "0th" element of a vector, which doesn't exist in R as far as I understand. I don't think the part that is doing this arises from my code vs. internal to this package however, and I can't make sense of the code that runs when I run the package.
I am also open to hearing about other methods that will essentially replace simulate() from Stata.
The function passed to MonteCarlo() must return a list with named components. Changing the last line of dnd from return(list(decision)) to
return(list("decision" = decision))
should work.
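With that change the simulation runs, and the grid of results can be summarized with the package's MakeTable() helper (a sketch; every parameter in param_list is assigned to either rows or cols, and the cell entries are the mean of the returned decision across replications, i.e. rejection rates along the b5 grid):
results <- MonteCarlo(func = dnd, nrep = 100, param_list = parameter)
MakeTable(output = results, rows = "b5",
          cols = c("n", "b0", "b1", "b2", "b3", "b4", "su"), digits = 2)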

Folding data in R for k fold cross validation based on a condition

I am working on resource selection of black-tailed deer using a random effect model as below:
library(lme4) # glmer() comes from lme4
suma<-read.csv("sum_active.csv", header=TRUE)
suma$bps1 <- as.factor(suma$bps)
# convert the six sclass columns to factors
for (v in paste0("sclass", 1:6)) suma[[v]] <- as.factor(suma[[v]])
stdsuma <- suma
# standardize the continuous covariates
for (v in c("cc", "edgedensi", "dem", "dist", "waterdst", "vrm", "slp", "sin", "cos", "july")) {
  stdsuma[[v]] <- scale(stdsuma[[v]])
}
sumaf1<-glmer( UA ~ dem+cos+sin+slp+edgedensi+bps1*sclass1+vrm+(1 | ID), data=stdsuma, family=binomial, glmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 100000)))
I am running a 5-fold cross-validation to test the robustness of my best model's assumptions, using the following code:
##Divide the data
k = 5 #Folds
# sample from 1 to k, nrow times (the number of observations in the data)
data$id <- sample(1:k, nrow(data), replace = TRUE)
list <- 1:k
# prediction and testset data frames that we add to with each iteration over
# the folds
prediction <- data.frame()
testsetCopy <- data.frame()
#Creating a progress bar to know the status of CV
library(plyr) # create_progress_bar() comes from plyr
progress.bar <- create_progress_bar("text")
progress.bar$init(k)
PTOT=NULL
for (i in 1:k)
{
# remove rows with id i from dataframe to create training set
# select rows with id i to create test set
trainingset <- subset(data, id %in% list[-i])
testset <- subset(data, id %in% c(i))
M1 <- glmer( UA ~ dem+cos+sin+slp+edgedensi+bps1*sclass1+vrm+(1 | ID), data = trainingset, family=binomial, glmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 100000)))
P1=predict(M1, testset)
names(P1)=NULL
P1
PTOT= c(PTOT, P1)
}
However, given that my location data are nested within individuals, I am worried that when I split the dataset randomly into 5 parts, locations from the same individual get split across different folds, which will affect the measured prediction power of my validation.
To address this problem I have created a column based on the summer ranges my individuals use (they are clustered in 5 different summer ranges within my study area). I want to use one cluster as the test set, randomize the rest into four folds as the training set, and loop this over each cluster, as sketched below.
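A minimal sketch of that leave-one-cluster-out scheme, assuming a hypothetical column range_id (1-5) in stdsuma identifying each animal's summer range:
# each summer range serves once as the test set
PTOT <- NULL
for (cl in sort(unique(stdsuma$range_id))) {
  trainingset <- subset(stdsuma, range_id != cl)
  testset <- subset(stdsuma, range_id == cl)
  M1 <- glmer(UA ~ dem + cos + sin + slp + edgedensi + bps1*sclass1 + vrm + (1 | ID),
              data = trainingset, family = binomial,
              glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 100000)))
  # test individuals never appear in the training data, so predict at the
  # population level (allow.new.levels = TRUE sets their random effect to zero)
  P1 <- predict(M1, testset, type = "response", allow.new.levels = TRUE)
  PTOT <- c(PTOT, unname(P1))
}
This drops the inner four-fold split for brevity; the key point is that the test cluster never contributes to the fit.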
P.S. link to data: https://www.dropbox.com/s/33u1lqav8v718ah/sum_active.csv?dl=0
