I'm studying Bayesian analysis using the book "Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan (2015)".
I'm trying to replicate one of the book's examples in R.
However, I get an error message when I run it.
Specifically, this is the data for the example.
data
y s
1 1 Reginald
2 0 Reginald
3 1 Reginald
4 1 Reginald
5 1 Reginald
6 1 Reginald
7 1 Reginald
8 0 Reginald
9 0 Tony
10 0 Tony
11 1 Tony
12 0 Tony
13 0 Tony
14 1 Tony
15 0 Tony
y<-data$y
s<-as.numeric(factor(data$s)) # factor() first, so character names also map to codes 1..Nsubj
Ntotal=length(y)
Nsubj=length(unique(s))
dataList=list(y=y, s=s, Ntotal=Ntotal, Nsubj=Nsubj)
Also, this is my model.
modelString="
model{
for(i in 1:Ntotal){
y[i] ~ dbern(theta[s[i]])
}
for(s in 1:Nsubj){
theta[s] ~ dbeta(2,2)
}
}
"
writeLines(modelString, con="TEMPmodel.txt")
library(rjags)
library(runjags)
jagsModel=jags.model(file="TEMPmodel.txt",data=dataList)
When I run this, I get the following error message.
Error in jags.model(file = "TEMPmodel.txt", data = dataList) :
RUNTIME ERROR:
Cannot insert node into theta[1...2]. Dimension mismatch
I don't know what mistake I have made in this code.
Please give me some advice.
Thanks in advance.
As suggested by @nicola, the problem is that you're passing s as data to your model, but also using s as the loop counter iterating over 1:Nsubj. As JAGS indicates, this causes confusion about the dimension of theta: does it have length 15, or 2? Renaming the loop counter (here to j) removes the clash with the data vector s.
The following works:
model{
for(i in 1:Ntotal){
y[i] ~ dbern(theta[s[i]])
}
for(j in 1:Nsubj){
theta[j] ~ dbeta(2,2)
}
}
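One way to sanity-check the JAGS output from the corrected model: because the Beta(2,2) prior is conjugate to the Bernoulli likelihood, each theta has the closed-form posterior Beta(2 + z, 2 + N - z), where z is the number of ones and N the number of trials for that subject. The sketch below is base R only; the names z, N, a_post, b_post and post_mean are illustrative and do not come from the book.

```r
# Data from the question, entered by hand
y <- c(1, 0, 1, 1, 1, 1, 1, 0,   # Reginald
       0, 0, 1, 0, 0, 1, 0)      # Tony
s <- rep(c("Reginald", "Tony"), times = c(8, 7))

# Successes and trials per subject
z <- vapply(split(y, s), sum, numeric(1))
N <- vapply(split(y, s), length, integer(1))

# Conjugate update of the Beta(2, 2) prior
a_post <- 2 + z
b_post <- 2 + N - z

# Exact posterior means and 95% equal-tailed intervals
post_mean <- a_post / (a_post + b_post)
post_ci <- rbind(lower = qbeta(0.025, a_post, b_post),
                 upper = qbeta(0.975, a_post, b_post))
print(post_mean)
print(post_ci)
```

The posterior means from the MCMC samples should be close to 8/12 for Reginald and 4/11 for Tony, up to Monte Carlo error.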
I am trying to run a regression loop based on code that I found in a previous answer (How to Loop/Repeat a Linear Regression in R), but I keep getting an error. My outcomes (dependent variables) are 940 metabolites, and my predictors (independent variables) are "bmi", "Age", "sex", "lpa2c", and "smoking", where BMI and Age are continuous. BMI is the main exposure, and I am controlling for the others.
So I'm testing the effect of BMI on 940 metabolites.
I would also like to know how to extract the coefficient, p-value, standard error, and confidence interval for BMI only, and only when it is significant.
This is the code I have used:
y<- c(1653:2592) # response
x1<- c("bmi","Age", "sex","lpa2c", "smoking") # predictor
for (i in x1){
model <- lm(paste("y ~", i[[1]]), data= QBB_clean)
print(summary(model))
}
And this is the error:
Error in model.frame.default(formula = paste("y ~", i[[1]]), data = QBB_clean, :
variable lengths differ (found for 'bmi').
y1 y2 y3 y4 bmi age sex lpa2c smoking
1 0.2875775201 0.59998896 0.238726027 0.784575267 24 18 1 0.470681834 1
2 0.7883051354 0.33282354 0.962358936 0.009429905 12 20 0 0.365845473 1
3 0.4089769218 0.48861303 0.601365726 0.779065883 18 15 0 0.121272054 0
4 0.8830174040 0.95447383 0.515029727 0.729390652 16 21 0 0.046993681 0
5 0.9404672843 0.48290240 0.402573342 0.630131853 18 28 1 0.262796304 1
6 0.0455564994 0.89035022 0.880246541 0.480910830 13 13 0 0.968641168 1
7 0.5281054880 0.91443819 0.364091865 0.156636851 11 12 0 0.488495482 1
8 0.8924190444 0.60873498 0.288239281 0.008215520 21 23 0 0.477822030 0
9 0.5514350145 0.41068978 0.170645235 0.452458394 18 17 1 0.748792881 0
10 0.4566147353 0.14709469 0.172171746 0.492293329 20 15 1 0.667640231 1
If you want to loop over responses you will want something like this:
respvars <- names(QBB_clean[1653:2592])
predvars <- c("bmi","Age", "sex","lpa2c", "smoking")
results <- list()
for (v in respvars) {
form <- reformulate(predvars, response = v)
results[[v]] <- lm(form, data = QBB_clean)
}
You can then print the results with something like lapply(results, summary), extract coefficients, etc. (I have a little trouble seeing how it's going to be useful to just print the results of 940 regressions... are you really going to inspect them all?)
If you want coefficients etc. for BMI, I think this should work (not tested):
t(sapply(results, function(m) coef(summary(m))["bmi",]))
Or for confidence intervals:
t(sapply(results, function(m) confint(m)["bmi",]))
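To make the extraction step concrete, here is a small sketch on simulated data. The data frame toy and its columns are a hypothetical stand-in for QBB_clean, and the Benjamini-Hochberg adjustment via p.adjust() is my own addition as one common way to define "significant" across many tests; it is not something the question asked for.

```r
# Simulated stand-in for QBB_clean (hypothetical data, not the asker's)
set.seed(1)
n <- 50
toy <- data.frame(
  y1 = rnorm(n), y2 = rnorm(n), y3 = rnorm(n),
  bmi = rnorm(n, 25, 4), Age = rnorm(n, 40, 10),
  sex = rbinom(n, 1, 0.5)
)
toy$y1 <- toy$y1 + 0.5 * toy$bmi   # make y1 genuinely depend on bmi

respvars <- c("y1", "y2", "y3")
predvars <- c("bmi", "Age", "sex")

# One regression per response, stored in a named list
fits <- lapply(setNames(respvars, respvars), function(v) {
  lm(reformulate(predvars, response = v), data = toy)
})

# One row per response: estimate, SE, t, p and 95% CI for bmi only
bmi_tab <- as.data.frame(t(sapply(fits, function(m) {
  c(coef(summary(m))["bmi", ], confint(m)["bmi", ])
})))
names(bmi_tab) <- c("estimate", "std_error", "t_value", "p_value",
                    "ci_low", "ci_high")

# Adjust for multiple testing, then keep the significant rows
bmi_tab$p_adj <- p.adjust(bmi_tab$p_value, method = "BH")
sig <- bmi_tab[bmi_tab$p_adj < 0.05, ]
print(sig)
```

With 940 responses the same code applies unchanged; only respvars and predvars need to point at the real column names.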
I'm trying to use a genetic algorithm for a classification problem. However, I haven't managed to get a summary of the model or a prediction for a new data frame. How can I get the summary and the prediction for a new dataset?
Here is my toy example:
library(genalg)
dat <- read.table(text = " cats birds wolfs snakes
0 3 9 7
1 3 8 7
1 1 2 3
0 1 2 3
0 1 2 3
1 6 1 1
0 6 1 1
1 6 1 1 ", header = TRUE)
evalFunc <- function(x) {
if (dat$cats < 1)
return(0) else return(1)
}
iter = 100
GAmodel <- rbga.bin(size = 7, popSize = 200, iters = iter, mutationChance = 0.01,
elitism = T, evalFunc = evalFunc)
###########summary try#############
cat(summary.rbga(GAmodel))
# Error in cat(summary.rbga(GAmodel)) :
# could not find function "summary.rbga"
############# prediction try###########
dat$pred<-predict(GAmodel,newdata=dat)
# Error in UseMethod("predict") :
# no applicable method for 'predict' applied to an object of class "rbga"
Update:
After reading the answer given and this link:
Pattern prediction using Genetic Algorithm
I wonder how I can programmatically use the GA as part of a prediction mechanism. According to the link's text, one can use the GA to optimize a regression or a NN and then use the predict function those models provide.
Genetic Algorithms are for optimization, not for classification. Therefore, there is no prediction method. Your summary statement was close to working.
cat(summary(GAmodel))
GA Settings
Type = binary chromosome
Population size = 200
Number of Generations = 100
Elitism = TRUE
Mutation Chance = 0.01
Search Domain
Var 1 = [,]
Var 0 = [,]
GA Results
Best Solution : 1 1 0 0 0 0 1
Some additional information is available from Imperial College London
Update in response to updated question:
I see from the paper that you mentioned how this makes sense. The idea is to use the genetic algorithm to optimize the weights for a neural network, then use the neural network for classification. This would be a big task, too big to respond here.
I am trying to fit an nls model to pavement data. I have several sections and need to fit the same model to each one (of course with different variables for each section).
The following is my code:
data<-read.csv(file.choose(),header=TRUE)
data
SecNum=data[,1]
t=c()
PCI=c()
t
PCI
for(i in 1:length(SecNum)){
if(SecNum[i] == SecNum[i+1]){
tt=data[i,2]
PCIt=data[i,3]
t=c(t, tt)
PCI=c(PCI, PCIt)
} else {
tt=data[i,2]
PCIt=data[i,3]
t=c(t, tt)
PCI=c(PCI, PCIt)
alpha=1
betha=1
theta=1
fit = nls(PCI ~ alpha-beta*exp((-theta*(t^gama))),
start=c(alpha=min(c(PCI, PCIt)),beta=1,theta=1,gama=1))
fitted(fit)
resid(fit)
print(t)
print(PCI)
t= c()
PCI= c()
}
}
After running my code, I received this error:
Error in nls(PCI ~ alpha - beta * exp((-theta * (t^gama))), start = c(alpha = min(c(PCI, :
singular gradient
My professor told me to use "nlsList" instead, as that might solve the problem.
Since I am new to R, I don't know how to do that. I would be really thankful if anyone could advise me on how to do it.
Here is a sample of my data:
SecNum t PCI AADT ESAL
1 962 1 90.46 131333 3028352
2 962 2 90.01 139682 3213995
3 962 3 86.88 137353 2205859
4 962 4 86.36 137353 2205859
5 962 5 84.56 137353 2205859
6 962 6 85.11 137353 2205859
7 963 1 91.33 91600 3726288
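For the per-section fitting asked about above, one possible approach is sketched below. It does not use nlsList itself (that function lives in the nlme package and takes a formula of the form response ~ model | group); instead it uses base R's split() and lapply() with tryCatch(), so that one failed fit does not stop the rest. Everything here is an assumption for illustration: the data are simulated, fit_section is a hypothetical helper, and the exponent gama is fixed at 1 because a four-parameter curve is hard to identify from a handful of points per section, which is a common cause of "singular gradient".

```r
# Simulated pavement-style data (hypothetical: two sections, 12 years each)
set.seed(123)
sim <- do.call(rbind, lapply(c(962, 963), function(sec) {
  t <- 1:12
  data.frame(SecNum = sec, t = t,
             PCI = 60 + 35 * exp(-0.15 * t) + rnorm(length(t), sd = 0.5))
}))

# Fit one section; return NULL instead of stopping if nls() fails
fit_section <- function(d) {
  tryCatch(
    nls(PCI ~ alpha - beta * exp(-theta * t), data = d,
        start = list(alpha = min(d$PCI),
                     beta  = -diff(range(d$PCI)),  # PCI falls over time
                     theta = 0.1)),
    error = function(e) NULL
  )
}

# One fit (or NULL) per section, named by section number
fits <- lapply(split(sim, sim$SecNum), fit_section)
coefs <- lapply(fits, function(f) if (is.null(f)) NA else coef(f))
print(coefs)
```

Starting values taken from the data (the minimum PCI and the observed range) tend to behave better than fixed values of 1, though convergence is still not guaranteed for every section.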
I'm trying to add a bit of code to a data-augmentation capture-recapture model and am coming up with some errors I haven't encountered before. In short, I want to estimate a series of survivorship phases that each last more than a single time interval. I want the model to estimate the length of each survivorship phase and use that to improve the capture recapture model. I tried and failed with a few different approaches, and am now trying to accomplish this using a switching state array for the survivorship phases:
for (t in 1:(n.occasions-1)){
phi1switch[t] ~ dunif(0,1)
phi2switch[t] ~ dunif(0,1)
phi3switch[t] ~ dunif(0,1)
phi4switch[t] ~ dunif(0,1)
psphi[1,t,1] <- 1-phi1switch[t]
psphi[1,t,2] <- phi1switch[t]
psphi[1,t,3] <- 0
psphi[1,t,4] <- 0
psphi[1,t,5] <- 0
psphi[2,t,1] <- 0
psphi[2,t,2] <- 1-phi2switch[t]
psphi[2,t,3] <- phi2switch[t]
psphi[2,t,4] <- 0
psphi[2,t,5] <- 0
psphi[3,t,1] <- 0
psphi[3,t,2] <- 0
psphi[3,t,3] <- 1-phi3switch[t]
psphi[3,t,4] <- phi3switch[t]
psphi[3,t,5] <- 0
psphi[4,t,1] <- 0
psphi[4,t,2] <- 0
psphi[4,t,3] <- 0
psphi[4,t,4] <- 1-phi4switch[t]
psphi[4,t,5] <- phi4switch[t]
psphi[5,t,1] <- 0
psphi[5,t,2] <- 0
psphi[5,t,3] <- 0
psphi[5,t,4] <- 0
psphi[5,t,5] <- 1
}
So this creates a [5,t,5] array where the survivorship state can only switch to the subsequent state and not backwards (e.g. 1 to 2, 4 to 5, but not 4 to 3). Now I create a vector where the survivorship state is defined:
PhiState[1] <- 1
for (t in 2:(n.occasions-1)){
# State process: draw PhiState(t) given PhiState(t-1)
PhiState[t] ~ dcat(psphi[PhiState[t-1], t-1,])
}
We start in state 1 always, and then take a categorical draw at each time step 't' for remaining in the current state or moving on to the next one given the probabilities within the array. I want a maximum of 5 states (assuming that the model will be able to functionally produce fewer by estimating the probability of moving from state 3 to 4 and onwards near 0, or making the survivorship value of subsequent states the same or similar if they belong to the same survivorship value in reality). So I create 5 hierarchical survival probabilities:
for (a in 1:5){
mean.phi[a] ~ dunif(0,1)
phi.tau[a] <- pow(phi_sigma[a],-2)
phi.sigma[a] ~ dunif(0,20)
}
Now this next step is where the errors start. Now that I've assigned values 1-5 to my PhiState vector it should look something like this:
[1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 5
or maybe
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2
and I now want to assign a mean.phi[] to my actual phi[] term, which feeds into the model:
for(t in 1:(n.occasions-1)){
phi[t] ~ dnorm(mean.phi[PhiState[t]],phi.tau[PhiState[t]])
}
However, when I try to run this I get the following error:
Error in jags.model(model.file, data = data, inits = init.values, n.chains = n.chains, :
RUNTIME ERROR:
Cannot insert node into mean.phi[1:5]. Dimension mismatch
It's worth noting that the model works just fine when I use the following phi[] determinations:
phi[t] ~ dunif(0,1) #estimate independent annual phi's
or
phi[t] ~ dnorm(mean.phi,phi_tau) #estimate hierarchical phi's from a single mean.phi
or
#Set fixed survial periods (this works the best, but I don't want to have to tell it when
#the periods start/end and how many there are, hence the current exercise):
for (a in 1:21){
surv[a] ~ dnorm(mean.phi1,phi1_tau)
}
for (b in 22:30){
surv[b] ~ dnorm(mean.phi2,phi2_tau)
}
for (t in 1:(n.occasions-1)){
phi[t] <- surv[t]
}
I did read this post: https://sourceforge.net/p/mcmc-jags/discussion/610037/thread/36c48f25/
but I don't see where I'm redefining variables in this case... Any help fixing this or advice on a better approach would be most welcome!
Many thanks,
Josh
I'm a bit confused as to what your actual data are (the phi[t]?), but the following might give you a starting point:
nt <- 29
nstate <- 5
M <- function() {
phi_state[1] <- 1
for (t in 2:nt) {
up[t-1] ~ dbern(p[t-1])
p[t-1] <- ifelse(phi_state[t-1]==nstate, 0, p_[t-1])
p_[t-1] ~ dunif(0, 1)
phi_state[t] <- phi_state[t-1] + equals(up[t-1], 1)
}
for (k in 1:nstate) {
mean_phi[k] ~ dunif(0, 1)
phi_sigma[k] ~ dunif(0, 20)
}
for(t in 1:(nt-1)){
phi[t] ~ dnorm(mean_phi[phi_state[t]], phi_sigma[phi_state[t]]^-2)
}
}
library(R2jags)
fit <- jags(list(nt=nt, nstate=nstate), NULL,
c('phi_state', 'phi', 'mean_phi', 'phi_sigma', 'p'),
M, DIC=FALSE)
Note that above, p is a vector of probabilities of moving up to the next (adjacent) state.
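To see what kind of state sequence this construction produces, the same logic can be simulated in plain R without JAGS. This is only an illustration of the prior on phi_state: p_raw is a hypothetical name for the uniform draws (p_ in the model above), and nothing here is fitted to data.

```r
set.seed(42)
nt <- 29
nstate <- 5

p_raw <- runif(nt - 1)          # candidate probabilities of moving up
phi_state <- integer(nt)
phi_state[1] <- 1L              # always start in state 1

for (t in 2:nt) {
  # Once in the last state, the probability of moving up is forced to 0
  p <- if (phi_state[t - 1] == nstate) 0 else p_raw[t - 1]
  up <- rbinom(1, 1, p)         # 1 = move to the next state, 0 = stay
  phi_state[t] <- phi_state[t - 1] + up
}

print(phi_state)
```

The resulting vector never decreases, moves up by at most one state per step, and stops at nstate, which is the behaviour the original transition array was meant to encode.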
I am trying to conduct a hierarchical Bayesian analysis but am having a little trouble with the R and WinBUGS code. I don't have balanced data and am struggling with the coding. I have temperature data collected daily with iButtons (temperature recording devices) in transects and am trying to build a model that relates this to remote sensing data. Unfortunately, each transect has a different number of iButtons, so creating a 3D matrix of button(i), in transect(j), repeatedly "sampled" on day(t) is a problem for me.
Ultimately, my model will be something like:
Level 1
Temp[ijk] ~ N(theta[ijk], tau)
theta[ijk] = b0 + b1*x1 + . . . + bn*xn
Level 2
b0 = a00 + a01*y1 + . . . an*yn
b1 = a10 + a11*y1 ...
Level 3 (maybe?) - random level 2 intercepts
Normally I would do something like this:
Wide <- reshape(Data1, idvar = c("iButton","block"), timevar = "julian", direction = "wide")
J <- length(unique(Data$block))
I <- length(unique(Data$iButton))
Ti <- length(unique(Data$julian))
Temp <- array(NA, dim = c(I, Ti, J))
for(t in 1:Ti) {
sel.rows <- Wide$block == t
Temp[,,t] <- as.matrix(Wide)[sel.rows, 3:Ti]
}
Then I could have a 3D matrix that I could loop through in WinBUGS or OpenBUGS as such:
for(i in 1:J) { # Loop over transects/blocks
for(j in 1:I) { # Loop over buttons
for(t in 1:Ti) { # Loop over days
Temp[i,j,t] ~ dnorm(theta[i,j,t])
theta[i,j,t] <- alpha.lam[i] + blam1*radiation[i,j] + blam2*cwd[i,j] + blam3*swd[i,j]
}}}
Anyway, don't worry about the details of the code above, it's just thrown together as an example from other analyses. My main question is how to do this type of analysis when I don't have a balanced design with equal numbers of iButtons per transect? Any help would be greatly appreciated. I'm clearly new to R and WinBUGS and don't have much previous computer coding experience.
Thanks!
oh and here is what the data look like in long (stacked) format:
> Data[1:15, 1:4]
iButton julian block aveT
1 1 1 1 -4.5000000
2 1 2 1 -5.7500000
3 1 3 1 -3.5833333
4 1 4 1 -4.6666667
5 1 5 1 -2.5833333
6 1 6 1 -3.0833333
7 1 7 1 -1.5833333
8 1 8 1 -8.3333333
9 1 9 1 -5.0000000
10 1 10 1 -2.4166667
11 1 11 1 -1.7500000
12 1 12 1 -3.2500000
13 1 13 1 -3.4166667
14 1 14 1 -2.0833333
15 1 15 1 -1.7500000
Create a vector or array of lengths and use subindexing.
Using your example:
J <- length(unique(Data$block))
I <- tapply(Data$iButton, Data$block, function(x) length(unique(x)))
Ti <- tapply(Data$julian, list(Data$block, Data$iButton), function(x) length(unique(x))) # rows = blocks, columns = buttons, matching Ti[i, j] below
for(i in 1:J) { # Loop over transects/blocks
for(j in 1:I[i]) { # Loop over buttons
for(t in 1:Ti[i, j]) { # Loop over days
Temp[i,j,t] ~ dnorm(theta[i,j,t])
theta[i,j,t] <- alpha.lam[i] + blam1*radiation[i,j] + blam2*cwd[i,j] + blam3*swd[i,j]
}}}
I think it would work, but I haven't tested it since there is no data.
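The R side of this idea can be sketched as follows: build the count vector and count matrix, then fill a ragged array padded with NA so that the loops only ever visit cells that exist. The small data frame is hypothetical, and the helper columns b, j and t (within-block button index and within-button day index) are my own names; they guard against button IDs that do not run 1, 2, 3, ... inside each block.

```r
# Hypothetical long-format data: block 1 has 2 buttons, block 2 has 3
Data <- data.frame(
  block   = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  iButton = c(1, 1, 1, 2, 2, 1, 1, 2, 2, 3, 3),
  julian  = c(1, 2, 3, 1, 2, 1, 2, 1, 2, 1, 2),
  aveT    = c(-4.5, -5.75, -3.58, -4.67, -2.58,
              -3.08, -1.58, -8.33, -5.0, -2.42, -1.75)
)

# Consecutive indices: block, button within block, day within button
Data$b <- as.integer(factor(Data$block))
Data$j <- ave(Data$iButton, Data$b,
              FUN = function(x) as.integer(factor(x)))
Data$t <- ave(Data$julian, Data$b, Data$j, FUN = seq_along)  # rows assumed sorted by day

J  <- max(Data$b)                                  # number of blocks
I  <- tapply(Data$j, Data$b, max)                  # buttons per block
Ti <- tapply(Data$t, list(Data$b, Data$j), max)    # days per block x button

# Ragged array padded with NA; fill only the observed cells
Temp <- array(NA_real_, dim = c(J, max(I), max(Ti, na.rm = TRUE)))
Temp[cbind(Data$b, Data$j, Data$t)] <- Data$aveT
```

Temp, J, I and Ti can then be passed as data, with the loops running over 1:I[i] and 1:Ti[i, j] so the padded NA cells are never referenced.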
Can you try using a list instead?
This allows a variable length for each item in the list where each index would correspond to the transect.
So something like this:
theta <- list()
for(i in unique(Data$block)) {
ibuttons <- unique(Data$iButton[Data$block==i])
days <- unique(Data$julian[Data$block==i])
theta[[i]] <- matrix(NA, length(ibuttons), length(days)) # Empty matrix with NA's
for(j in 1:length(ibuttons)) {
for(t in 1:length(days)) {
theta[[i]][j,t] <- fn(i, ibuttons[j], days[t])
}
}
}