Syntax error when trying to parse Bayesian model using RJAGS - r

I'm running the following code to attempt Bayesian modelling using rjags but coming up with the syntax error below.
Error in jags.model(file = "RhoModeldef.txt", data = ModelData, inits
= ModelInits, : Error parsing model file: syntax error on line 4 near "~"
RhoModel.def <- function() {
for (s in 1:S) {
log(rhohat[s]) ~ dnorm(log(rho[s]),log(rhovar[s]))
rho[s] ~ dgamma(Kappa,Beta)
}
Kappa ~ dt(0,2.5,1) # dt(0, pow(2.5,-2), 1) https://stackoverflow.com/questions/34935606/cauchy-prior-in-jags https://arxiv.org/pdf/0901.4011.pdf
sig.k <- abs(Kappa)
Beta ~ dt(0,2.5,1)
sig.b <- abs(Beta)
}
S <- length(africasad21)-1 # integer
Rhohat <- afzip30$Rho # vector
Rhovar <- afzip30$RhoVar # vector
ModelData <-list(S=S,rhohat=Rhohat,rhovar=Rhovar)
ModelInits <- list(list(rho = rep(1,S),Kappa=0.1,Beta=0.1))
Model.1 <- jags.model(file = 'RhoModeldef.txt',data = ModelData,inits=ModelInits,
n.chains = 4, n.adapt = 100)
Does anyone have any ideas how I might be able to fix this? I'm thinking it might have something to do with my attempts to fit a logged model? Please let me know if more details are needed.
Thanks!

Line 4 of the file 'RhoModeldef.txt' is most likely this one:
log(rhohat[s]) ~ dnorm(log(rho[s]),log(rhovar[s]))
JAGS does not allow log transformations on the left hand side of stochastic relations, only deterministic ones. Given that you are providing rhohat as data, the easiest solution is to do the log transformation in R and drop that part in JAGS i.e.:
log_rhohat[s] ~ dnorm(log(rho[s]), log(rhovar[s]))
ModelData <-list(S=S, log_rhohat=log(Rhohat), rhovar=Rhovar)
Alternatively, you could use dlnorm rather than dnorm in JAGS.
Does that solve your problem? Your example is not self-contained so I can't check myself, but I guess it should work now.

Related

Getting Error Bootstrapping to test predictive model

rsq <- function(formula, Data1, indices) {
d <- Data1[indices,] # allows boot to select sample
fit <- lm(formula, Data1=d)
return(summary(fit)$r.square)
}
results = boot(data = Data1, statistic = rsq, R = 500)
When I execute the code, I get the following error:
Error in Data1[indices,] : incorrect number of dimensions
Background info: I am creating a predictive model using Linear Regressions. I would like to test my Predictive Model and through some research, I decided to use the Bootstrapping Method.
Credit goes to #Rui Barradas, check comments for original post.
If you read the help page for function boot::boot you will see that the function it calls has first argument data, then indices, then others. So change the order of your function definition to rsq <- function(Data1, indices, formula)
Another problem that I had was that I didn't define the Function.

Fitting Step functions

AIM: The aim here was to find a suitable fit, using step functions, which uses age to describe wage, in the Wage dataset in the library ISLR.
PLAN:
To find a suitable fit, I'll try multiple fits, which will have different cut points. I'll use the glm() function (of the boot library) for the fitting purpose. In order to check which fit is the best, I'll use the cv.glm() function to perform cross-validation over the fitted model.
PROBLEM:
In order to do so, I did the following:
all.cvs = rep(NA, 10)
for (i in 2:10) {
lm.fit = glm(wage~cut(Wage$age,i), data=Wage)
all.cvs[i] = cv.glm(Wage, lm.fit, K=10)$delta[2]
}
But this gives an error:
Error in model.frame.default(formula = wage ~ cut(Wage$age, i), data =
list( : variable lengths differ (found for 'cut(Wage$age, i)')
Whereas, when I run the code given below, it runs.(It can be found here)
all.cvs = rep(NA, 10)
for (i in 2:10) {
Wage$age.cut = cut(Wage$age, i)
lm.fit = glm(wage~age.cut, data=Wage)
all.cvs[i] = cv.glm(Wage, lm.fit, K=10)$delta[2]
}
Hypotheses and Results:
Well, it might be possible that cut() and glm() might not work together. But this works:
glm(wage~cut(age,4),data=Wage)
Question:
So, basically we're using the cut() function, saving it's results in a variable, then using that variable in the glm() function. But we can't put the cut function inside the glm() function. And that too, only if the code is in a loop.
So, why is the first version of the code not working?
This is confusing. Any help appreciated.

R JAGS: invalid parent value in node (combining dcat and dnorm)

I am a newbie when it comes to using Jags and Rjags.
modelString = "
model {
for (i in 1:N){
y[i] ~ dt(mu,tau,nu)
}
tau <- nuless2/(sig^2)
nuless2 <- nu / nuless
nuless <- (nu-2)
sig ~ dnorm(0.02045457,10000000)
mu ~ dnorm(0.0013942308,4000000000)
nu ~ dcat(pi[])
pi <- c(0,0,3,3,3,3,3,3,3)
}"
writeLines(modelString,con="model.txt")
line_data = list("y"=ret,"N"=length(ret))
init_value = list("mu"=0.003)
model <- jags.model("model.txt", data=line_data, n.chains=2)
I have been struggling to get this code to work and I have no idea while it is giving me the error.
Error: Error in node nuless2
Invalid parent values
I experienced around to see what the problem lies and read most of discussion but I couldnt see any reason.
It would be great if someone can tell me where my stupid mistake is.
Also, I am quite new to JAGS and would like to know how the flow of the system works. For example, while I define y ~ dt(), I also supply data for y. By doing so, am I telling the system this is what the data is and all the parameter in dt() need to be "verified" using this data? It would be great to have a deeper understanding of such system.

R Passing linear model to another function inside a function

I am trying to find the optimal "lambda" parameter for the Box-Cox transformation.
I am using the implementation from the MASS package, so I only need to create the model and extract the lambda.
Here is the code for the function:
library(MASS)
find_lambda <- function(x) {
# Function to find the best lambda for the Box-Cox transform
my_tmp <- data.frame(x = x) # Create a temporary data frame, to use it with the lm
str(my_tmp) # Gives the expected output
the_lm <- lm(x ~ 1, data = my_tmp) # Creates the linear model, no error here
print(summary(the_lm)) # Prints the summary, as expected
out <- boxcox(the_lm, plotit=FALSE) # Gives the error
best_lambda <- out$x[which.max(out$y)] # Extracting the best fitting lambda
return(best_lambda)
}
find_lambda(runif(100))
It gives the following error:
Error in is.data.frame(data) : object 'my_tmp' not found
The interesting thing is that the very same code is working outside the function. In other words, for some reason, the boxcox function from the MASS package is looking for the variable in the global environment.
I don't really understand, what exactly is going on... Do you have any ideas?
P.S. I do not provide a software/hardware specification, since this error was sucessfully replicated on a number of my friends' laptops.
P.P.S. I have found the way to solve the initial problem in the forecast package, but I still would like to know, why this code is not working.
Sometimes user contributed packages don't always do a great job tracking the environments where calls were executed when manipulating functions calls. The quickest fix for you would be to change the line from
the_lm <- lm(x ~ 1, data = my_tmp)
to
the_lm <- lm(x ~ 1, data = my_tmp, y=True, qr=True)
Because if the y and qr are not requested from the lm call, the boxcox function tries to re-run lm with those parameters via an update call and things get mucked up inside a function scope.
Why don't let box-cox do the fitting?
find_lambda <- function(x) {
# Function to find the best lambda for the Box-Cox transform
my_tmp <- data.frame(x = x) # Create a temporary data frame, to use it with the lm
out <- boxcox(x ~ 1, data = my_tmp, plotit=FALSE) # Gives the error
best_lambda <- out$x[which.max(out$y)] # Extracting the best fitting lambda
return(best_lambda)
}
I think your scoping issue is with update.default which calls eval(call, parent.frame()) and my_tmp doesn't exist in the boxcox environment. Please correct me if I'm wrong on this.
boxcox cannot find your data. This maybe because of some scoping issue.
You can feed data in to boxcox function.
find_lambda <- function(x) {
# Function to find the best lambda for the Box-Cox transform
my_tmp <- data.frame(x = x) # Create a temporary data frame, to use it with the lm
str(my_tmp) # Gives the expected output
the_lm <- lm(x ~ 1, data = my_tmp) # Creates the linear model, no error here
print(summary(the_lm)) # Prints the summary, as expected
out <- boxcox(the_lm, plotit=FALSE, data = my_tmp) # feed data in here
best_lambda <- out$x[which.max(out$y)] # Extracting the best fitting lambda
return(best_lambda)
}
find_lambda(runif(100))

Select Features for Naive Bayes Clasification in R

i want to use naive Bayes classifier to make some predictions.
So far i can make the prediction with the following (sample) code in R
library(klaR)
library(caret)
Faktor<-x <- sample( LETTERS[1:4], 10000, replace=TRUE, prob=c(0.1, 0.2, 0.65, 0.05) )
alter<-abs(rnorm(10000,30,5))
HF<-abs(rnorm(10000,1000,200))
Diffalq<-rnorm(10000)
Geschlecht<-sample(c("Mann","Frau", "Firma"),10000,replace=TRUE)
data<-data.frame(Faktor,alter,HF,Diffalq,Geschlecht)
set.seed(5678)
flds<-createFolds(data$Faktor, 10)
train<-data[-flds$Fold01 ,]
test<-data[flds$Fold01 ,]
features <- c("HF","alter","Diffalq", "Geschlecht")
formel<-as.formula(paste("Faktor ~ ", paste(features, collapse= "+")))
nb<-NaiveBayes(formel, train, usekernel=TRUE)
pred<-predict(nb,test)
test$Prognose<-as.factor(pred$class)
Now i want to improve this model by feature selection. My real data is about 100 features big.
So my question is , what woould be the best way to select the most important features for naive Bayes classification?
Is there any paper dor reference?
I tried the following line of code, bit this did not work unfortunately
rfe(train[, 2:5],train[, 1], sizes=1:4,rfeControl = rfeControl(functions = ldaFuncs, method = "cv"))
EDIT: It gives me the following error message
Fehler in { : task 1 failed - "nicht-numerisches Argument für binären Operator"
Calls: rfe ... rfe.default -> nominalRfeWorkflow -> %op% -> <Anonymous>
Because this is in german you may please reproduce this on your machine
How can i adjust the rfe() call to get a recursive feature elimination?
This error appears to be due to the ldaFuncs. Apparently they do not like factors when using matrix input. The main problem can be re-created with your test data using
mm <- ldaFuncs$fit(train[2:5], train[,1])
ldaFuncs$pred(mm,train[2:5])
# Error in FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...) :
# non-numeric argument to binary operator
And this only seems to happens if you include the factor variable. For example
mm <- ldaFuncs$fit(train[2:4], train[,1])
ldaFuncs$pred(mm,train[2:4])
does not return the same error (and appears to work correctly). Again, this only appears to be a problem when you use the matrix syntax. If you use the formula/data syntax, you don't have the same problem. For example
mm <- ldaFuncs$fit(Faktor ~ alter + HF + Diffalq + Geschlecht, train)
ldaFuncs$pred(mm,train[2:5])
appears to work as expected. This means you have a few different options. Either you can use the rfe() formula syntax like
rfe(Faktor ~ alter + HF + Diffalq + Geschlecht, train, sizes=1:4,
rfeControl = rfeControl(functions = ldaFuncs, method = "cv"))
Or you could expand the dummy variables yourself with something like
train.ex <- cbind(train[,1], model.matrix(~.-Faktor, train)[,-1])
rfe(train.ex[, 2:6],train.ex[, 1], ...)
But this doesn't remember which variables are paired in the same factor so it's not ideal.

Resources