I am trying to build a Bayesian latent class model in R and OpenBUGS for 2 populations using 2 diagnostic tests. I have created the equations for most parameters, but I also have inconclusive results, so I want to expand the model to include the probability of an inconclusive result on test 1 in an infected and an uninfected individual, and likewise for test 2. I have been trying to write the equations for these probabilities (p1/2[1] - p1/2[4] in the model attached), but I am struggling and don't think I am quite there yet, as the model won't run. Would you be able to help me with these equations? The error I get every time I run it at the moment is: invalid integer value for x1[1]
#Bayesian LCA model creation
library(R2OpenBUGS)
setwd("mywd")
model3=paste0("model3{
#multinomial model for the data
x1[1:8] ~ dmulti(p1[1:8], n1)
x2[1:8] ~ dmulti(p2[1:8], n2)
#Observed prevalence
#Pop 1 with 2 tests and unknown prevalence
p1[1] <- prev1*Se1*Se2*(1-IncT1Inf)+(1-prev1)*(1-Sp1)*(1-Sp2)*IncT1NonInf
p1[2] <- prev1*Se1*(1-Se2)*(1-IncT1NonInf)+(1-prev1)*(1-Sp1)*Sp2*IncT1Inf
p1[3] <- prev1*Se1*Se2*(1-IncT2Inf)+(1-prev1)*(1-Sp1)*(1-Sp2)*IncT2NonInf
p1[4] <- prev1*(1-Se1)*Se2*(1-IncT2NonInf)+(1-prev1)*Sp1*(1-Sp2)*IncT2Inf
p1[5] <- prev1*Se1*Se2+(1-prev1)*(1-Sp1)*(1-Sp2)
p1[6] <- prev1*Se1*(1-Se2)+(1-prev1)*(1-Sp1)*Sp2
p1[7] <- prev1*(1-Se1)*Se2+(1-prev1)*Sp1*(1-Sp2)
p1[8] <- prev1*(1-Se1)*(1-Se2)+(1-prev1)*Sp1*Sp2
#Pop 2 with 2 tests (same tests) and unknown prevalence
p2[1] <- prev2*Se1*Se2*(1-IncT1Inf)+(1-prev2)*(1-Sp1)*(1-Sp2)*IncT1NonInf
p2[2] <- prev2*Se1*(1-Se2)*(1-IncT1NonInf)+(1-prev2)*(1-Sp1)*Sp2*IncT1Inf
p2[3] <- prev2*Se1*Se2*(1-IncT2Inf)+(1-prev2)*(1-Sp1)*(1-Sp2)*IncT2NonInf
p2[4] <- prev2*(1-Se1)*Se2*(1-IncT2NonInf)+(1-prev2)*Sp1*(1-Sp2)*IncT2Inf
p2[5] <- prev2*Se1*Se2+(1-prev2)*(1-Sp1)*(1-Sp2)
p2[6] <- prev2*Se1*(1-Se2)+(1-prev2)*(1-Sp1)*Sp2
p2[7] <- prev2*(1-Se1)*Se2+(1-prev2)*Sp1*(1-Sp2)
p2[8] <- prev2*(1-Se1)*(1-Se2)+(1-prev2)*Sp1*Sp2
#Priors taken using median values from total lit search and 95th percentile provided in lit
prev1 ~ dbeta(8.89,60.26)
prev2 ~ dbeta(4.35,76.63)
Se1 ~ dbeta(14.59,0.86)
Sp1 ~ dbeta(14.95,0.86)
Se2 ~ dbeta(78.55,15.96)
Sp2 ~ dbeta(1.71,0.38)
IncT1Inf ~ dbeta(1,1)
IncT1NonInf ~ dbeta(1,1)
IncT2Inf ~ dbeta(1,1)
IncT2NonInf ~ dbeta(1,1)
}")
#write to temporary text file
write.table(model3, file="model3.txt", quote=FALSE, sep="", row.names=FALSE, col.names=FALSE)
#Data
#Sanctuary 1
n1=19
x1<-matrix(c(3,3,1,0,1,0,0,6,5),byrow=T,ncol=3,dimnames=list(c("TST+", "TST-",
"TSTInc"),c("QFT+", "QFT-", "QFTInc")))
as.numeric(x1)
x1 <- as.numeric(x1)
#Sanctuary 2
n2=12
x2<-matrix(c(0,0,0,4,8,0,0,0,0),byrow=T,ncol=3,dimnames=list(c("TST+", "TST-",
"TSTInc"),c("QFT+", "QFT-", "QFTInc")))
as.numeric(x2)
x2 <- as.numeric(x2)
#set data inputs to BUGS
dat <- list(x1=x1,n1=sum(x1),x2=x2,n2=sum(x2))
dat
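As a sanity check on the inputs (using the objects above), note that the 3x3 cross-tabulation flattens to nine cells, while the model declares x1[1:8] and p1[1:8]; I suspect this mismatch is related to the error:
length(x1) # 9: the 3x3 TST x QFT table, including the inconclusive row/column
sum(x1)    # 19, which matches n1
# the multinomial in the model is declared over only 8 cells (x1[1:8], p1[1:8]),
# one short of the nine observed outcome combinations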
#Set parameters desired to monitor
paras <- c("Se1","Sp1","Se2","Sp2","prev1","prev2","IncT1Inf","IncT1NonInf", "IncT2Inf",
"IncT2NonInf")
#Initialising values for 3 chains done using median and mean values from initial prior calculation
inits<-list(list(Se1=0.964, Sp1=0.965, Se2=0.834, Sp2=0.969, prev1=0.125, prev2=0.050,
                 IncT1Inf=0.100, IncT1NonInf=0.250, IncT2Inf=0.150, IncT2NonInf=0.250),
            list(Se1=0.965, Sp1=0.965, Se2=0.834, Sp2=0.969, prev1=0.125, prev2=0.05,
                 IncT1Inf=0.200, IncT1NonInf=0.150, IncT2Inf=0.050, IncT2NonInf=0.200),
            list(Se1=0.960, Sp1=0.965, Se2=0.858, Sp2=0.937,
                 IncT1Inf=0.100, IncT1NonInf=0.250, IncT2Inf=0.150, IncT2NonInf=0.250)
)
#run model in R2OpenBUGS
niterations=12000
#Running with initial 12000 iterations, burning first 1000 and thinning every 10
bug.out <- bugs(dat, inits, paras, model.file="model3.txt", n.iter=niterations, n.burnin=1000,
n.thin=10, n.chains=3, saveExec=F, restart=F, debug=T, DIC=T, digits=6, codaPkg=F,
working.directory="mywd", clearWD=F, useWINE=F, WINE=NULL,
newWINE=F, WINEPATH=NULL, bugs.seed=1, summary.only=FALSE, over.relax = F)
Thanks for the help! I've got a latent variable occupancy model with p (detection) and Psi (occupancy) covariates that I set up using matrices rather than vectors for convenience. It is based on the jags model from Tobler et al. (2019) but I have converted to nimble and added detection covariates.
Most sites have 3 visits per year in each of three years, so I have a response variable array of:
y[1:n_sites , 1:n_species , 1:3 (years), 1:3 (surveys per year)]
I have a similar array of detection covariates:
Xobs[1:n_sites , 1:n_p.covariates , 1:3 (years), 1:3 (surveys per year)]
And occupancy covariates (same for each visit in a given year):
Xocc[1:n_sites , 1:n_Psi.covariates , 1:3 (years)]
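For scale, the reproducible example below simulates 100 sites, 10 species, and 4 covariates of each type, so the full (un-ragged) arrays come out as:
dim(y)    # 100 10 3 3  -> y[site, species, year, survey]
dim(Xobs) # 100  4 3 3  -> Xobs[site, covariate, year, survey]
dim(Xocc) # 100  4 3    -> Xocc[site, covariate, year]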
The model runs fine when data are included for all sites, years, and surveys. However, some sites (e.g. 5%) were only surveyed twice in a given year, and a few (3%) were only surveyed in one or two (rather than all three) years. This results in a 'ragged array' of covariates, where some quantities (e.g. z, the estimated true presence for a site that wasn't surveyed in a given year) don't need to be estimated.
When I try to run the model with the missing surveys filled with either a) zeros or b) NAs, I get several problems:
some parameters fail to initialize (this isn't necessarily a problem though), and more importantly...
some chains for z (estimated true presence) for a given site and year get stuck on 1 (this seems to occur for site-year combinations that have fewer than 3 surveys).
So my main question is: how should I deal with those missing values? The solution I tried (in the code below) is to index year and survey numbers for each site. For years I do this with a years-per-site vector (yrs), and for surveys per year with a matrix k[1:n_sites, 1:n_years] giving how many visits there were; the resulting loop structure is sketched just below.
[side note: this model doesn't yet deal with the non-independence of sites between years; I will deal with that once I get the basic model working but am open to suggestions]
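To make the indexing concrete, here is the loop skeleton for a single species i (the same structure appears in the full model further down, with yrs[j] bounding the year loop and k[j,yr] bounding the survey loop):
# ragged indexing skeleton inside nimbleCode(): loop bounds come from
# the yrs vector and k matrix rather than fixed array dims
for (j in 1:J) {              # all sites
  for (yr in 1:yrs[j]) {      # only the years site j was surveyed
    for (kv in 1:k[j, yr]) {  # only the surveys made in that year
      y[j, i, yr, kv] ~ dbern(mu.p[j, i, yr, kv])
    }
  }
}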
The code below makes a reproducible example (albeit perhaps not in the most efficient way). Run as-is it consists of 'full' input data (no missing years or surveys) and runs fine (as far as I can tell). If you then comment out these lines....
##############################################################
#now to make sure model runs properly with all arrays full (no missing sites):
##############################################################
k[]=3
yrsV[]=3
...and un-comment the lines that follow (which set options for what to replace missing X and Y values with: NA vs. zero), then you will have a dataset that resembles our true data (and gives problems in the MCMC output, although it runs).
Thank you!
library(nimble)
library(mcmcplots)
library(abind)
###########################
#create detection data
###########################
#n species
n.sp=10
#mean p per species
p.sp=(1:10)/12;p.sp
#mean Psi per species
psi.intercept=0.4 #all species
#n sites
n.sites=100
#n years each site surveyed
yrs.site=sample(c(3,3,3,2,2,1),n.sites,replace = TRUE);yrs.site
#n surveys in each year at each site
surv.yr.site=data.frame(matrix(0,n.sites,max(yrs.site)))
for (i in 1:max(yrs.site)) {
names(surv.yr.site)[i]=paste0("yr_",i)
surv.yr.site[,i]=sample(c(3,3,3,2,2,1),n.sites,replace = TRUE)
for (j in 1:n.sites) {
if (yrs.site[j]<i) {
surv.yr.site[j,i]=NA
}
}
};surv.yr.site
#make true presence/abs by year
z1=z2=z3=matrix(0,n.sites,n.sp)
for (i in 1:n.sp) {
z1[,i]=rbinom(n=n.sites,size=1,prob=psi.intercept)
z2[,i]=rbinom(n=n.sites,size=1,prob=psi.intercept)
z3[,i]=rbinom(n=n.sites,size=1,prob=psi.intercept)
};head(z1)
#make detections by year
y1=y2=y3=array(0,c(n.sites,n.sp,max(as.matrix(surv.yr.site),na.rm = TRUE)));dim(y1)
for (i in 1:max(as.matrix(surv.yr.site),na.rm = TRUE)) {
for (j in 1:n.sp) {
y1[,j,i]=z1[,j]*rbinom(n.sites,1,p.sp[j])
y2[,j,i]=z2[,j]*rbinom(n.sites,1,p.sp[j])
y3[,j,i]=z3[,j]*rbinom(n.sites,1,p.sp[j])
}
};head(y1)
#merge into y array y[sites,species,years,surveys]
y=array(0,c(n.sites,n.sp,max(yrs.site),max(as.matrix(surv.yr.site),
na.rm = TRUE)));dim(y)
for (i in 1:dim(y)[3]) {
for (j in 1:dim(y)[4]) {
y[,,i,j]=get(paste0("y",i))[,,j]
}
};head(y)
#make four p covariates for each visit (with no simulated effects)
Xobs=array(0,c(n.sites,4,max(yrs.site),max(as.matrix(surv.yr.site),
na.rm = TRUE)));dim(Xobs)
Xobs[]=rnorm(length(Xobs))
head(Xobs)
#make four Psi covariates for each year (with no simulated effect)
Xocc=array(0,c(n.sites,4,max(yrs.site)));dim(Xocc)
Xocc[]=rnorm(length(Xocc))
head(Xocc)
#generate constants
#n species
n=n.sp;n
#J sites
J=n.sites;J
#nYr years
yrsV=yrs.site;yrsV
#surveys per year per site
k=surv.yr.site
#Vocc psi covars
Vocc=ncol(Xocc);Vocc
#Vobs p covars
Vobs=ncol(Xobs);Vobs
#set n latent variables (1/4 to 1/2 n)
nlv=5
#
#############################
#nimble model
#############################
code <- nimbleCode({
#model modified from Tobler et al. 2019 by Ryan C. Burner ryan.c.burner#gmail.com
#original paper https://doi.org/10.1002/ecy.2754
#modified to include detection covariates and derived parameters
# community hyper-priors for occupancy covariates
for(m in 1:Vocc) { #n occ covars
mu.u.b[m] ~ T(dnorm(0, 0.01),-10,10)
tau.u.b[m] <- pow(sigma.u.b[m], -2)
sigma.u.b[m] ~ dunif(0, 10)
}
# community hyper-priors for detection covariates
for(m in 1:Vobs) { #n detect covars
mu.v.b[m] ~ T(dnorm(0, 0.01),-10,10)
tau.v.b[m] <- pow(sigma.v.b[m], -2)
sigma.v.b[m] ~ dunif(0, 10)
}
# Latent variables
for(j in 1:J) { #n sites
for(l in 1:nlv){ # n lv
LV[j,l] ~ dnorm(0,1)
}
}
# Latent variable coefficients with constraints
# diagonal elements positive, upper diagonal 0
for(l in 1:(nlv-1)){ # n lv
for(l2 in (l+1):nlv){ # n lv
lv.coef[l,l2] <- 0
}
}
# Sign constraints on diagonal elements
for(l in 1:nlv) { # n lv
lv.coef[l,l] ~ dunif(0,1)
}
# lower diagonal free
for(l in 2:nlv){ # n lv
for(l2 in 1:(l-1)){
lv.coef[l,l2] ~ dunif(-1,1)
}
}
# other elements free
for(i in (nlv+1):n) { # n lv
for(l in 1:nlv){ # n lv
lv.coef[i,l] ~ dunif(-1,1)
}
}
# loop over all SPECIES
for (i in 1:(n)) { #n species
# random [beta] effect priors by species for occupancy
for(m in 1:Vocc) { #n occ covars
u.b[i, m] ~ T(dnorm(mu.u.b[m], tau.u.b[m]),-10,10)
} #prior, i species by m occ covars
# random [beta] effect priors by species for detection
for(m in 1:Vobs) { #n obs covars
v.b[i, m] ~ T(dnorm(mu.v.b[m], tau.v.b[m]),-10,10)
} #prior, i species by m det covars
# loop over all SITES
for (j in 1:J) { #n sites
for (yr in 1:(yrs[j])) {
eta[j,i,yr] <- inprod(u.b[i,1:Vocc], Xocc[j, ,yr]) +
inprod(lv.coef[i,1:nlv],LV[j,1:nlv])
}
# draw from normal, constrain variance to 1
# j sites x i species
#for each year
for (yr in 1:(yrs[j])) {
u[j,i,yr] ~ dnorm(eta[j,i,yr],1/(1-sum(lv.coef[i,1:nlv]^2)))
z[j,i,yr] <- step(u[j,i,yr])
}
# loop over all VISITS (SURVEYS) in all YEARS
for (yr in 1:(yrs[j])) {
for (kv in 1:(k[j,yr])) {
# logistic model for detection
logit(p[j,i,yr,kv]) <- inprod(v.b[i,1:Vobs], Xobs[j, ,yr,kv])
#p sites x species x years x visits
#estimated p x site x species x year x survey
mu.p[j,i,yr,kv] <- p[j,i,yr,kv]*z[j,i,yr]
y[j,i,yr,kv] ~ dbern(mu.p[j,i,yr,kv])
} # i species, j sites, yr years, kv visits
}
}
}
###Derived parameters###
#proportion of sites (j) occupied in 1+ yrs by each species (i) #could divide by pool/year
for (i in 1:(n)) { #n species
for (j in 1:(J)) { #J sites
site.occ.ever.bySp[j,i] <- max(z[j,i,1:(yrs[j])]) #indexing trouble here??
}
prop.occ.ever.bySp[i] <- sum(site.occ.ever.bySp[1:J,i])/J # j x i
}
})
occ.inits = function() { #function is used so that it is repeated per chain
#for u.b., I use estimates from a glm. If NA, fill with random
u.b<-t(sapply(seq_len(ncol(y[,,,1])),
function(x) {xx=unname(coef(glm(((apply(y, c(1,2),
max,na.rm=TRUE)>0)*1)[, x] ~ Xocc[, -1,1],
family=binomial(link=logit),
na.action = 'na.omit')))
#random values replace errors
xx[is.na(xx)==TRUE]<-rnorm(length(xx[is.na(xx)==TRUE]))
return(xx)}))
colnames(u.b)<-colnames(Xocc)
rownames(u.b)<-colnames(y)
#for really high values, bring in line with priors
u.b[abs(u.b)>10]=runif(length(u.b[abs(u.b)>10]),0,5)*sign(u.b[abs(u.b)>10])
lv.coef=matrix(runif(nlv*n,-0.2,0.2),n,nlv)
diag(lv.coef)=abs(diag(lv.coef))
lv.coef[upper.tri(lv.coef)]=0
v.b=matrix(rnorm(Vobs*n),n,Vobs)
colnames(v.b)=colnames(Xobs)
rownames(v.b)<-colnames(y)
u=apply(y, c(1,2,3), max,na.rm=TRUE)-runif(J*n*max(yrsV),0.1,0.8)
#replace Inf/-Inf with zero (not sure if it matters)
u[is.infinite(u)]=runif(length(u[is.infinite(u)]),-0.3,-0.1)
list(
mu.u.b=rnorm(Vocc,0, 0.01),
sigma.u.b=runif(Vocc, 0, 10),
mu.v.b=rnorm(Vobs,0, 0.01),
sigma.v.b=runif(Vobs, 0, 10),
u.b=u.b,
v.b=v.b,
u=u,
lv.coef=lv.coef,
LV=matrix(rnorm(nlv*J),J,nlv)
)
}
#indicate which parameters to monitor
params <- c('u.b','v.b','mu.u.b','tau.u.b',
'prop.occ.ever.bySp', #if derived parameter included in model
'mu.v.b','tau.v.b','z')
##############################################################
#now to make sure model runs properly with all arrays full (no missing sites):
##############################################################
k[]=3
yrsV[]=3
#####################################################
#alternatively, what happens if we remove the surveys that didn't happen
#(need to rerun all code above, skipping previous two lines)
#####################################################
# #choose values to pad X covars and Y responses with
# # X covars:
# #padX=NA
# padX=0
# # Y responses:
# padY=NA
# #padY=0
#
# #fill in missing surveys with pad value
# for (i in 1:max(yrs.site)) {
# for (j in 1:nrow(y)) {
# if (yrsV[j]<i) {
# y[j,,i,][]=padY
# Xobs[j,,i,][]=padX
# Xocc[j,,i][]=padX
# } else {
# for (l in 1:max(as.matrix(surv.yr.site),na.rm = TRUE)) {
# if (k[j,i]<l) {
# y[j,,i,l][]=padY
# Xobs[j,,i,l][]=padX
# }
# }
# }
# }
# }
#
# k[is.na(k)==TRUE]=padVal
##get the x and y data##
# DATA DESCRIPTION
# n is number of species
# J is number of sites
# y is detections
# k is number of surveys in each of 3 years
# yrsV is number of years surveyed for each point
# nlv is number of latent variables
# Vocc and Vobs are numbers of occ (Psi) and obs (p) covariates
# Xocc and Xobs are matrices of covariates (incl. intercept)
occ.data = list(n=n, J=J, k=k, y=y,
Xocc=Xocc,Xobs=Xobs,Vocc=Vocc,Vobs=Vobs,nlv=nlv,yrs=yrsV)#
names(occ.data)
data <- occ.data[c(4:6)];names(data)
constants=occ.data[c(1:3,7:length(occ.data))];names(constants)
#set the initial values
inits <- occ.inits()
#inspect inits
i=1
head(inits[[i]]);names(inits)[i];i=i+1
######################################
###to run the model in Nimble...
######################################
# 1) create the model object
glmmModel <- nimbleModel(code = code, data = data, constants = constants,
inits = inits, check = getNimbleOption("checkModel"))
glmmModel$initializeInfo()
# 2) compile model
CglmmModel <- compileNimble(glmmModel,showCompilerOutput = TRUE)
# 3) configure mcmc
fglmmMCMC <- configureMCMC(glmmModel)
# 4) add monitors for parameters
fglmmMCMC$addMonitors(params)
# 5) build mcmc
BglmmMCMC<- buildMCMC(fglmmMCMC)
# 6) compile model
CglmmMCMC <- compileNimble(BglmmMCMC,showCompilerOutput = TRUE)
#setup mcmc
ni <- 800
nb <- 150
nt <- 10
nc <- 2
#run mcmc
s=Sys.time();s
samples <- runMCMC(CglmmMCMC, niter = ni, nburnin = nb, nchains = nc, thin = nt,
setSeed = 3,
summary = FALSE, samplesAsCodaMCMC = TRUE)
ss=Sys.time()-s #elapsed time
print(ss)
beepr::beep(5)
#check for z (occupancy) parameters where chains are stuck on 1:
problem=which(colMeans(samples$chain1[,grep("z",colnames(samples$chain1))])==1)
length(problem)
#view a subset (can run this line repeatedly)
#z parameters are z[site,species,year]
#can run this line repeatedly to see random sample
samples$chain1[1:5,grep("z",colnames(samples$chain1))[sample(problem,10)]]
#to check number of years (columns) and surveys (values) in a given year:
k
#to check y (observation) input values for a given site enter number here:
#y is y[sites,species,years,surveys]
#e.g.....
y[26,10,1,]
#check for z (occupancy) parameters where chains are stuck on 0:
problem2=which(colMeans(samples$chain1[,grep("z",colnames(samples$chain1))])==0)
length(problem2)
#view a subset (can run this line repeatedly)
#z parameters are z[site,species,year]
#can run this line repeatedly to see random sample
samples$chain1[1:5,grep("z",colnames(samples$chain1))[sample(problem2,10)]]
#to check number of years (columns) and surveys (values) in a given year:
k
#to check y (observation) input values for a given site enter number here:
#y is y[sites,species,years,surveys]
#e.g.....
y[26,10,1,]
#get summaries of the chains
xx=summary(samples)
head(xx)
#here I get an error that there are NAs (from unused parameters due to ragged array??)
#get plots of chains
mcmcplot(samples)
beepr::beep(5)
#why does this error occur on so many (or all) of the z's
#(and some other parameters with my real data)
#z[1, 1, 1]. Error in bw.SJ(x, method = "ste"): sample is too sparse to find TD
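For now I can sidestep the plotting error by dropping constant columns before calling mcmcplot() (a sketch; bw.SJ() apparently needs variation in the sample to choose a bandwidth, and chains stuck at 0 or 1 have none):
library(coda)
# drop zero-variance columns (e.g. z's stuck at 0 or 1) before plotting
keep <- apply(as.matrix(samples), 2, var) > 0
samples.sub <- as.mcmc.list(lapply(samples, function(ch) as.mcmc(as.matrix(ch)[, keep, drop = FALSE])))
mcmcplot(samples.sub)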
For n observations, I want to generate an (unobserved) latent variable from a set of other variables that proxy it; I may or may not want to assume that this latent variable follows a specific distribution. In my specific case, I want to generate latent ability from a set of variables that proxy ability (observed ability). One variable is discrete and roughly normal, another is binary but very skewed, and the last is an ordered categorical variable. The data look like the following, and I would like to estimate a score for each observation.
set.seed(123877)
# number of units
n <- 1000L
# age
age <- sample(rnorm(n, 25, 10))
# cum laude
hon <- sample(0L:1L, n, TRUE, prob = c(.9, .1) )
# prestige of university
pres <- factor(sample(1L:25L, n, TRUE), labels = 25L:1L, ordered = T)
dat <- data.frame(id=1L:n, age, hon, pres)
I found a solution using the ltm package; here is the code:
set.seed(123877)
u.latent <- vector()
class(u.latent) <- 'try-error'
library('ltm')
while (class(u.latent)=='try-error') {
# number of units
n <- 1000L
# age
age <- round(rnorm(n, 25, 10))
# cum laude
hon <- sample(0L:1L, n, TRUE, prob = c(.9, .1) )
# prestige of university
pres <- sample(1L:10L, n, TRUE)
# pres <-factor(pres, levels = 1L:25L, ordered = TRUE)
dat <- data.frame(age, hon, pres)
# latent variable
u.latent <- try(gpcm(dat))
}
We can test if the model fits the data:
GoF.gpcm(u.latent)
#H0: the model fits the data
#Ha: the model does not fit the data
The estimates of the latent variable are straightforward:
u.estimates <- factor.scores(u.latent)
hist(u.estimates$score.dat$z1)
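Note that factor.scores() by default returns one row per distinct observed response pattern; to get a score for every observation, the data can be passed via the resp.patterns argument (a quick sketch using dat from above):
# score each row of dat rather than each unique response pattern
u.obs <- factor.scores(u.latent, resp.patterns = dat)
dat$ability <- u.obs$score.dat$z1
hist(dat$ability)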
I am trying to run a Monte Carlo simulation of a difference in differences estimator, but I am running into an error. Here is the code I am running:
# Set the random seed
set.seed(1234567)
library(MonteCarlo)
#Set up problem, doing this before calling the function
# set sample size
n<- 400
# set true parameters: betas and sd of u
b0 <- 1 # intercept for control data (b0 in diffndiff)
b1 <- 1 # shift on both control and treated after treatment (b1 in diffndiff)
b2 <- 2 # pre-treatment level difference between treated and control (b2; compare to the coef on treat)
b3 <- 3 # shift after treatment that applies only to the treated group (b3; the coefficient of interest in diffndiff)
b4 <- 0 # parallel time trend (not measured in diffndiff); biases b0 and b1 but not the b3 we care about
b5 <- 0 # allows the treated group's trend to shift after treatment (0 if parallel trends holds)
su <- 4 # std. dev. for errors
dnd <- function(n,b0,b1,b2,b3,b4,b5,su){
#initialize a time vector (set observations equal to n)
timelength = 10
t <- c(1:timelength)
num_obs_per_period = n/timelength # allows for multiple observations in one time period
# (can simulate multiple states within one group or something)
t0 <- c(1:timelength)
for (p in 1:(num_obs_per_period-1)){
t <- c(t,t0)
}
T<- 5 #set treatment period
g <- t >T
post <- as.numeric(g)
# assign equal amounts of observations to each state to start with (would
#like to allow selection into treatment at some point)
treat <- vector()
for (m in 1:(round(n/2))){
treat <- c(treat,0)
}
for (m in 1:(round(n/2))){
treat <- c(treat,1)
}
u <- rnorm(n,0,su) #This assumes the mean error is zero
#create my y vector now from the data
y<- b0 + b1*post + b2*treat + b3*treat*post + b4*t + b5*(t-T)*treat*post +u
interaction <- treat*post
#run regression
olsres <- lm(y ~ post + treat + interaction)
olsres$coefficients
# assign the coefficients
bhat0<- olsres$coefficients[1]
bhat1 <- olsres$coefficients[2]
bhat2<- olsres$coefficients[3]
bhat3<- olsres$coefficients[4]
bhat3_stderr <- coef(summary(olsres))[4, "Std. Error"] # row 4 is the interaction term
#Here I will use bhat3 to conduct a t-test and determine if this was a pass
#or a fail
tval <- (bhat3-b3)/ bhat3_stderr
#decision at the 5% significance level (FALSE indicates |t| was less than 1.96,
#and we fail to reject the null)
decision <- abs(tval) > 1.96
decision <- unname(decision)
return(list(decision))
}
#Define a parameter grid to simulate over
from <- -5
to <- 5
increment <- .25
gridparts<- c(from , to , increment)
b5_grid <- seq(from = gridparts[1], to = gridparts[2], by = gridparts[3])
parameter <- list("n" = n, "b0" = b0, "b1" = b1, "b2" = b2, "b3" = b3,
                  "b4" = b4, "b5" = b5_grid, "su" = su)
#Now simulate this multiple times in a monte carlo setting
results <- MonteCarlo(func = dnd ,nrep = 100, param_list = parameter)
And the error that comes up is:
Error in results[[i]] <- array(NA, dim = c(dim_vec, nrep)) :
  attempt to select less than one element in integerOneIndex
This leads me to believe that something is attempting to access the "0th" element of a vector, which doesn't exist in R as far as I understand. I don't think the offending part is in my code rather than internal to the package, however, and I can't make sense of the code that runs when I call the package.
I am also open to hearing about other methods that will essentially replace simulate() from Stata.
The function passed to MonteCarlo() must return a list with named components. Changing the return statement at the end of dnd() to
return(list("decision" = decision))
should work.
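Once that change is made, the call from the question runs. A minimal sketch of summarizing the output, assuming the objects defined above (MakeFrame() is the package's helper for flattening results into a data frame):
results <- MonteCarlo(func = dnd, nrep = 100, param_list = parameter)
df <- MakeFrame(results) # one row per repetition, columns for parameters and outputs
aggregate(decision ~ b5, data = df, FUN = mean) # rejection rate across the b5 grid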