I am trying to fit a multi-state model using R package R2BayesX. How can I do so correctly? There is no example in the manual. Here is my attempt.
activity is the state indicator (1/0)
time is the time variable
PatientId is the grouping variable for the random effect I want
f <- activity ~ sx(time,bs="baseline")+sx(PatientId, bs="re")
b <- bayesx(f, family = "multistate", method = "MCMC", data=df)
Note: created new output directory
Warning message:
In run.bayesx(file.path(res$bayesx.prg$file.dir, prg.name = res$bayesx.prg$prg.name), :
an error occurred during runtime of BayesX, please check the BayesX
logfile!
I'm not sure exactly what kind of model you want to specify, but I put together an artificial, nonsensical data set to make the error above reproducible:
set.seed(1)
df <- data.frame(
activity = rbinom(1000, prob = 0.5, size = 1),
time = rep(1:50, 20),
id = rep(1:20, each = 50)
)
Possibly, you could provide an improved example. With this data, I can run your code:
library("R2BayesX")
f <- activity ~ sx(time, bs = "baseline") + sx(id, bs = "re")
b <- bayesx(f, family = "multistate", method = "MCMC", data = df)
This leads to the warning above and you can inspect BayesX's logfile via:
bayesx_logfile(b)
which tells you (among other information):
ERROR: family multistate is not allowed for method regress
So here only REML estimation appears to be supported, but:
b <- bayesx(f, family = "multistate", method = "REML", data = df)
also results in an error, the logfile says:
ERROR: Variable state has to be specified as a global option!
So the state has to be provided in a different way. I guess that you tried to do so by the binary response but it seems that the response should be the time variable (as in survival models) and then an additional state indicator needs to be provided somehow. I couldn't find an example for this in the BayesX manuals, though. I recommend that you contact the BayesX mailing list and/or the R2BayesX package maintainer with a more specific question and a reproducible example.
Sorry, this is cross-posted from https://stats.stackexchange.com/questions/593717/nlme-regression-with-weights-syntax-in-r, but I thought it might be more appropriate to post it here.
I am trying to fit a power curve to some observations with nlme. However, I know some observations are less reliable than others (the reliability of each OBSID is reflected in WEIV in the dummy data), relatively independently of variance. I quantified this beforehand and wish to include it as weights in my model. Moreover, I know that part of my variance is correlated with my independent variable, so I cannot use the variance directly as weights.
This is my model:
coeffs_start = lm(log(DEPV)~log(INDV), filter(testdummy10,DEPV!=0))$coefficients
nlme_fit <- nlme(DEPV ~ a*INDV^b,
data = testdummy10,
fixed=a+b~ 1,
random = a~ 1,
groups = ~ PARTID,
start = c(a=exp(coeffs_start[1]), b=coeffs_start[2]),
verbose = F,
method="REML",
weights=varFixed(~WEIV))
This is some sample dummy data (I know it is not a great fit but it's fake data anyway) : https://github.com/FlorianLeprevost/dummydata/blob/main/testdummy10.csv
This runs well without the "weights" argument, but when I add it I get this error and I am not sure why because I believe it is the correct syntax:
Error in recalc.varFunc(object[[i]], conLin) :
dims [product 52] do not match the length of object [220]
In addition: Warning message:
In conLin$Xy * varWeights(object) :
longer object length is not a multiple of shorter object length
Thanks in advance!
This looks like a very long-standing bug in nlme. I have a patched version on Github, which you can install via remotes::install_github() as below ...
remotes::install_github("bbolker/nlme")
testdummy10 <- read.csv("testdummy10.csv") |> subset(DEPV>0 & INDV>0)
coeffs_start <- coef(lm(log(DEPV)~log(INDV), testdummy10))
library(nlme)
nlme_fit <- nlme(DEPV ~ a*INDV^b,
data = testdummy10,
fixed=a+b~ 1,
random = a~ 1,
groups = ~ PARTID,
start = c(a=exp(coeffs_start[1]),
b=coeffs_start[2]),
verbose = FALSE,
method="REML",
weights=varFixed(~WEIV))
packageVersion("nlme") ## 3.1.160.9000
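As a side note on what varFixed(~WEIV) asks for: as far as I understand nlme's variance functions, it declares the residual variance to be proportional to the covariate, so observations with larger WEIV get less weight. The sketch below is not the nlme fit itself; it uses simulated data (d, x, y and v are made-up names) and the simpler gls()/lm() pair to check that varFixed(~v) matches ordinary weighted least squares with weights 1/v.

```r
library(nlme)  # recommended package, normally installed with R

set.seed(42)
n <- 100
x <- runif(n, 1, 10)
v <- runif(n, 0.5, 4)                      # hypothetical variance covariate
y <- 2 + 3 * x + rnorm(n, sd = sqrt(v))    # noise variance proportional to v
d <- data.frame(x = x, y = y, v = v)

# varFixed(~v): Var(residual_i) = sigma^2 * v_i
fit_gls <- gls(y ~ x, data = d, weights = varFixed(~v))

# the same assumption expressed as lm() weights
fit_lm <- lm(y ~ x, data = d, weights = 1 / v)

# coefficient estimates agree up to numerical tolerance
all.equal(unname(coef(fit_gls)), unname(coef(fit_lm)), tolerance = 1e-6)
```

If WEIV measures reliability (larger means more trustworthy) rather than variance, its inverse may be the covariate you actually want to pass to varFixed().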
library(mboost)
### a simple two-dimensional example: cars data
cars.gb <- gamboost(dist ~ speed, data = cars, dfbase = 4,
control = boost_control(mstop = 50))
set.seed(1)
cars_new <- cars + rnorm(nrow(cars))
> predict(cars.gb, newdata = cars_new$speed)
Error in check_newdata(newdata, blg, mf) :
‘newdata’ must contain all predictor variables, which were used to specify the model.
I fit a model using the example on the help(gamboost) page. I want to use this model to predict on a new dataset, cars_new, but encountered the above error. How can I fix this?
predict() looks for a variable called speed, but when you extract the column with $ you get a bare vector that no longer carries that name.
So this variant of the prediction works:
predict(cars.gb, newdata = data.frame(speed = cars_new$speed))
Or keep the original column name by subsetting with single brackets:
predict(cars.gb, newdata = cars_new['speed'])
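To see the difference without fitting anything, you can compare what the two ways of subsetting return. This is only a base R sketch (by_dollar and by_bracket are names I made up); it does not need mboost.

```r
set.seed(1)
cars_new <- cars + rnorm(nrow(cars))

by_dollar  <- cars_new$speed      # plain numeric vector, column name is lost
by_bracket <- cars_new["speed"]   # one-column data frame, name is kept

is.data.frame(by_dollar)   # FALSE
names(by_dollar)           # NULL
is.data.frame(by_bracket)  # TRUE
names(by_bracket)          # "speed"
```

Anything that matches predictors by name, as predict() does here, needs the second form or an explicit data.frame(speed = ...).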
Whenever I run the predict function multiple times on a bsts model using the same prediction data, I get different answers. So my question is, is there a way to return consistent answers given I keep my predictor dataset the same?
Example using the iris data set (I know it's not time series but it will illustrate my point)
iris_train <- iris[1:100,1:3]
iris_test <- iris[101:150,1:3]
ss <- AddLocalLinearTrend(list(), y = iris_train$Sepal.Length)
iris_bsts <- bsts(formula = Sepal.Length ~ ., data = iris_train,
state.specification = ss,
family = 'gaussian', seed = 1, niter = 500)
burn <- SuggestBurn(0.1,iris_bsts)
Now if I run the following line, say, 10 times, each result is different:
iris_predict <- predict(iris_bsts, newdata = iris_test, burn = burn)
iris_predict$mean
I understand that it is running MCMC simulations, but I require consistent results and have therefore tried:
Setting the seed in bsts and before predict
Setting the state space standard deviation to near 0, which just creates unstable results.
Neither seems to work. Any help would be appreciated!
I encountered the same problem. To fix it, you need to set the random seed in the embedded C code. I forked the package and made the modifications here: BSTS.
For package installation only, download bsts_0.7.1.1.tar.gz in the build folder. If you already have bsts installed, replace it with this version via:
remove.packages("bsts")
# assumes the working directory is where the file is located
install.packages("bsts_0.7.1.1.tar.gz", repos = NULL, type = "source")
If you do not have bsts installed, please install it first to ensure all dependencies are there. (This may require installing Rtools, Boom, and BoomSpikeSlab individually.)
This package version only modifies the predict function from bsts; all other code should work as is. It automatically sets the random seed to 1 each time predict is called. If you want predictions to vary, you'll need to explicitly set the predict parameter each time.
You can make a function to specify seed each time (set.seed was unnecessary...):
reproducible_predict <- function(S) {
iris_bsts <- bsts(formula = Sepal.Length ~ ., data = iris_train, state.specification = ss, seed = S, family = 'gaussian', niter = 500)
burn <- SuggestBurn(0.1,iris_bsts)
iris_predict <- predict(iris_bsts, newdata = iris_test, burn = burn)
return(iris_predict$mean)
}
reproducible_predict(1)
[1] 7.043592 6.212780 6.789205 6.563942 6.746156
reproducible_predict(1)
[1] 7.043592 6.212780 6.789205 6.563942 6.746156
reproducible_predict(200)
[1] 7.013679 6.173846 6.763944 6.567651 6.715257
reproducible_predict(200)
[1] 7.013679 6.173846 6.763944 6.567651 6.715257
I have come across the same issue.
The problem comes from setting the seed within the model definition only.
To solve your problem, you have to set a seed within the predict function such as:
iris_predict <- predict(iris_bsts, newdata = iris_test, burn = burn, seed=X)
Hope this helps.
I am new to the MCMCglmm package in R, and rather new to glm models in general. I have a dataset of species traits and whether or not they have been introduced outside of their native range.
I would like to test whether being introduced (as a binary 0/1 response variable) can be explained by any of the species traits. I would also like to correct for phylogeny between species.
I was told that for a binary response I could use family = "threshold" and I should fix the residual variance at 1. But I am having some trouble with the other parameters needed for the prior.
I've specified the R value for the random effects, but if I specify R I must also specify G and it is not clear to me how to decide the values for this parameter. I've tried putting default values but I get error messages:
Error in MCMCglmm(fixed, random = ~species, data = data2, family = "threshold", :
prior$G has the wrong number of structures
I have read the help vignettes and course but have not found an example with a binary response, and it is not clear to me how to decide the values for the priors. This is what I have so far:
fixed=Intro_binary ~ Trait1+ Trait2 + Trait3
Ainv=inverseA(redTree1)$Ainv
binary_model = MCMCglmm(fixed, random=~species, data = data, family = "threshold", ginverse=list(species=Ainv),
prior = list(
G = list(), #not sure about the parameters for random effects.
R = list(V = 1, fix = 1)), #to fix the residual variance at one
nitt = 60000, burnin = 10000)
Any help or feedback would be greatly appreciated!
This one is a bit tricky with the information you provide. I'd say you can define G as a "weak" prior using:
# R: residual variance fixed at 1; G: one weak prior structure (G1) per random term
priors <- list(R = list(V = 1, fix = 1),
               G = list(G1 = list(V = 1, nu = 0.002)))
binary_model <- MCMCglmm(fixed, random = ~species, data = data,
family = "threshold",
ginverse = list(species = Ainv),
prior = priors,
nitt = 60000, burnin = 10000)
However, without more information on your analysis, I strongly suggest you plot your posteriors to have a look at the results and see if anything looks wrong. Have a look at the MCMCglmm package Course Notes for more info on how to set these priors (especially on what not to do in section 1.5 - you can also find more specific info on how to tune it to your model if it fits in the categories of the tutorial).
I am using the party package.
When I run:
tree1 <- mob(incarcerated~priors+opens+concrearr+postrearr+anyrearr+postconvfel+postconvmis+
ag_vfo+ag_cla2+in_custody |PRIOR_FELONY_ARREST ,
data = jamaal,
control = ctrl,
model = glinearModel,
family = binomial)
I get the error
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
But I checked and every factor variable has at least 2 levels.
I then tried a much simpler tree
treetest <- mob(incarcerated~priors|in_custody,
data = jamaal,
control = ctrl,
model = glinearModel,
family = binomial)
and got one of the infamous R error messages
Error: object of type 'closure' is not subsettable
Any help appreciated
UPDATE
I found the source of the first error (it was a problem with how I was using factor()) but not the second. Also, rpart works on the same data with no problem.
The data are confidential, but I will check with the client if posting a small subset is OK
FURTHER UPDATE
Here is a small example with made-up data:
priors <- c(rep('Y', 5), rep('N', 5))
incarcerated <- rep(c('Y', 'N'), 5)
in_custody <- rep(c(rep('Y', 3), rep('N', 2)),2)
testdata <- data.frame(cbind(priors, incarcerated, in_custody))
treetest <- mob(incarcerated~priors|in_custody, data = testdata,
model = glinearModel, family = binomial)
gives the same error.
party is looking for the results of a binomial() call, rather than the function binomial or the string "binomial". (In my opinion the glm() function in base R has made things very confusing by accepting any of these three as acceptable variants.)
priors <- c(rep('Y', 5), rep('N', 5))
incarcerated <- rep(c('Y', 'N'), 5)
in_custody <- rep(c(rep('Y', 3), rep('N', 2)),2)
testdata <- data.frame(cbind(priors, incarcerated, in_custody))
library(party)
treetest <- mob(incarcerated~priors|in_custody, data = testdata,
model = glinearModel, family = binomial())
In hindsight, this error message is at least somewhat informative -- it tells us to look for a function that is being passed somewhere that R expects an object with elements that can be extracted ...
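A quick way to see the distinction in plain base R, without party (fam_fun, fam_obj and res are just illustrative names):

```r
fam_fun <- binomial     # the function itself, i.e. a closure
fam_obj <- binomial()   # the family object returned by calling it

is.function(fam_fun)    # TRUE
class(fam_obj)          # "family"
fam_obj$family          # "binomial"

# extracting an element from the function instead of the object fails
res <- tryCatch(fam_fun$family, error = function(e) e)
inherits(res, "error")  # TRUE
conditionMessage(res)   # the "not subsettable" message from the question
```

So any code that does something like family$linkinv internally will fail in exactly this way when it is handed binomial rather than binomial().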