I am trying to create a raster of predictions from a model fitted with glmmTMB, using a RasterStack as the source of predictor values.
I converted the RasterStack to a data frame, as I think predict.glmmTMB requires a data frame for its new data.
The model
model6 <- glmmTMB(Used ~ scale(Road_density) + scale(nonforprop) + scale(devprop) +
                    scale(forprop) + scale(nonfordist_cap3000) + scale(fordist_cap3000) +
                    scale(agridist_cap3000) + scale(devdist_cap3000) + (1 | animal_ID),
                  data = rasterpoints3, na.action = na.omit, family = binomial(link = "logit"))
The data frame containing the RasterStack values to predict for
predstack <- as.data.frame(stack2)
The error
glmmTMB:::predict.glmmTMB(model6, predstack, re.form = NA)
Error in eval(predvars, data, env) : object 'animal_ID' not found
I was hoping someone more experienced could help me resolve this. animal_ID is the random intercept in my glmmTMB object model6. I am using this package, and not e.g. raster::predict, precisely because it should be able to deal with random effects. To my understanding, re.form = NA should take care of this?
There's an open issue about this, but the workaround should be easy: define
predstack$animal_ID <- NA
The random effect variable has to exist in the data, but it's not used. (Due to the internal structure of glmmTMB it's not completely trivial to fix this at the package level.)
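Spelled out, the workaround could look like this (a minimal sketch; type = "response" is an optional addition here, returning probabilities rather than values on the logit scale):
# the random-effect column must exist, but with re.form = NA it is ignored
predstack$animal_ID <- NA
pred <- predict(model6, newdata = predstack, re.form = NA, type = "response")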
Given Ben's answer, this ought to work as well with either the raster or terra package:
p <- predict(stack2, model6, const = data.frame(animal_ID = NA), re.form = NA)
(But in the absence of an example I cannot check it)
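For completeness, a hedged terra equivalent (also untested; it assumes stack2 converts cleanly to a SpatRaster via rast()):
library(terra)
r <- rast(stack2)  # convert the RasterStack to a SpatRaster
p <- predict(r, model6, const = data.frame(animal_ID = NA), re.form = NA)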
Related
'bst' is the name of an xgboost model that I built in R. It gives me predicted values for the test dataset using this code, so it is definitely an xgboost model:
pred.xgb <- predict(bst, xdtest)  # get predictions on the test sample
cor(ytestdata, pred.xgb)
Now, I would like to save the model so that someone else can use it with their data set, which has the same predictor variables and the same variable to be predicted.
Consistent with page 4 of xgboost.pdf, the documentation for the xgboost package, I use the xgb.save command:
xgb.save(bst, 'xgb.model')
which produces the error:
Error in xgb.save(bst, "xgb.model") : model must be xgb.Booster.
Any insight would be appreciated. I searched Stack Overflow and could not locate relevant advice.
Mike
It's hard to know exactly what's going on without a fully reproducible example. But just because your model can make predictions on the test data doesn't mean it's an xgboost model; it could be any type of model with a predict method.
You can try class(bst) to see the class of your bst object. It should return "xgb.Booster", though I suspect it won't here (hence your error).
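A quick check along those lines:
class(bst)                    # should include "xgb.Booster"
inherits(bst, "xgb.Booster")  # TRUE for a genuine xgboost model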
On another note, if you want to pass your model to another person using R, you can just save the R object rather than exporting to binary, via:
save(bst, file = "model.RData")
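The recipient can then restore the object in their own session (their_xdtest below is a placeholder for their data):
load("model.RData")                  # restores `bst` into the workspace
pred <- predict(bst, their_xdtest)   # predict on their data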
Good afternoon, all--thank you in advance for your help! I'm somewhat new to R, so my apologies if this is a trivial or otherwise inappropriate question.
TL;DR: I'm trying to determine variable importance (VIM) for factor variables with a random forest model built in RandomForestSRC, which is not a built-in feature of that package. Using both the LIME and DALEX packages, I encounter the same error: cannot coerce class 'c("rfsrc", "predict", "class")' to a data.frame. Any assistance resolving this error, or alternate approaches, would be greatly appreciated!
I have a random forest model I've built in R, using the RandomForestSRC package. The model seems to work great: training and testing went fine, I got the predicted output I needed, and the results seem in line with what I would expect. Unfortunately, one of the requirements is that I need to be able to indicate how the model arrived at its conclusions (e.g., I need to also include variable importance as part of the output), for both continuous and factor variables.
This doesn't seem to be a built-in feature of the RandomForestSRC package, so I've looked into both the LIME and DALEX packages, both of which should be able to break out VIM from the existing RF model. Unfortunately, neither has native support for the RFSRC package, which means I've needed to supply the prediction functions myself, as recommended by this vignette: https://uc-r.github.io/dalex
model_type.rfsrc <- function(x, ...) {
  return('classification')
}
predict_model.rfsrc <- function(x, newdata, type, ...) {
  as.data.frame(predict(x, newdata, ...))
}
Unfortunately, in running the VIM section of the model (in both LIME and DALEX), I'm asked to pass both the predicted output and the model that created that output. In doing so, it hits an error with the above predict_model function:
Error in as.data.frame.default(predict(model, (newdata))):
cannot coerce class 'c("rfsrc", "predict", "class")' to a data.frame
And, of course, it can't: it's trying to coerce the entire prediction object, not a rectangular table, into a data frame. Unfortunately, while I think I understand why R is giving me that error, that's about as far as I've been able to figure out on my own.
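One possible fix, sketched on the assumption that predict.rfsrc() returns an object whose $predicted component holds the class-probability matrix (as the randomForestSRC documentation describes):
predict_model.rfsrc <- function(x, newdata, type, ...) {
  # coerce only the probability matrix, not the whole prediction object
  as.data.frame(predict(x, newdata, ...)$predicted)
}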
Additionally, I'm using the RandomForestSRC package for two reasons: it doesn't put a limit on the number of factor variables, and it can handle imbalanced data. I'm working with medical data, so both of these are necessary (e.g., there are ~100,000 different medical codes that can be encoded in a single data variable, and the ratio of "people-who-don't-have-this-condition" to "people-who-do-have-this-condition" is frequently 100 to 1). If anyone has suggestions for alternative packages that handle these issues, though, and have built-in VIM functionality (or integrate with DALEX / LIME), that would be fantastic as well.
Thank you all very much for your help!
Ran into this problem while trying to get the empirical distribution of the Kenward-Roger (K-R) degrees of freedom...
This seems like fairly dangerous behaviour? Does it constitute a bug?
Reproducible example:
## import lmerTest package
library(lmerTest)
## an object of class merModLmerTest
m <- lmer(Informed.liking ~ Gender + Information + Product + (1 | Consumer), data = ham)
# simulate data from the fitted model
simData <- ham
simData$Informed.liking <- unlist(simulate(m))
# fit the model to the simulated data
m1 <- lmer(Informed.liking ~ Gender + Information + Product + (1 | Consumer), data = simData)
stats:::anova(m1)
lmerTest:::anova(m1)
# simulate again, WITHOUT refitting
simData$Informed.liking <- unlist(simulate(m))
stats:::anova(m1)    # same as before
lmerTest:::anova(m1) # not the same as before!
My response does not constitute a solid answer, rather an extended comment:
This looks pretty bad. In fact, I discovered today that almost all the analyses I conducted in a project on the verge of submission have to be redone because of a related behaviour of lmerTest.
The problem I ran into appeared when I used a short function that fits a model with lmer and then returns coef(summary(model)); simple stuff, two lines of code. However, the input to this function was named data, and I also had a data frame called data in the workspace. It seems that although the local variable from the function scope was correctly used during fitting with lmer, the workspace data variable was used during summary (and it often was not the same as the data frame passed to the function), leading to invalid t values and degrees of freedom, and hence incorrect p values (the estimates and their standard errors were fine, however).
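To make the failure mode concrete, here is a minimal sketch of that pattern (a hypothetical wrapper, reusing the ham variables from the example above; under the behaviour described, the result can silently reflect the global object):
fit_coefs <- function(data) {
  m <- lmer(Informed.liking ~ Gender + (1 | Consumer), data = data)
  coef(summary(m))  # reported to pick up a global `data` when computing t and d.f.
}
data <- ham             # a global object that happens to share the name `data`
fit_coefs(ham[1:100, ]) # may silently use the global `data` instead of the argument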
So, answering your question:
This seems like fairly dangerous behaviour? Does it constitute a bug?
It seems dangerous indeed, and I would definitely call this a bug.
In the demo for ROC, there are models that when plotted have a spread, like hiv.svm$predictions, which contains 10 estimates of the response. Can someone remind me how to calculate N estimates from a model? I'm using rpart and a neural network to estimate a single output (true/false). How can I run 10 different samplings of the training data to get 10 different model responses to the input? I think the technique is called bootstrapping, but I don't know how to implement it.
I need to do this outside of caret, because when I use caret I keep getting the message "Error in tab[1:m, 1:m] : subscript out of bounds". Is there a "simple" bootstrap function?
Obviously this answer is too late, but you could have used caret simply by renaming the levels of your factor, because caret doesn't work if your binary response is of type logical. For example:
factor(responseWithTrueFalseLevel,
       levels = c(TRUE, FALSE),
       labels = c("myTrueLevel", "myFalseLevel"))
I am trying to impute missing values using regression, and a thorough online search hasn't been of much help. I read the FNN package documentation for the knn.reg function and find it difficult to interpret. I have a column of missing values in the test data which I want to predict using my training data, with code like this:
regress <- knn.reg(data.train[data.train[, 4] == 1, c(1, 2, 3)],
                   test = data.test[c(1, 2, 3)],
                   y = data.test[c(2)], k = 5)
But I get the following error: Error in get.knnx(train, test, k, algorithm) : Data include NAs. The column which contains missing values is column #2. When I exclude the column which has the NA values, i.e.
regress <- knn.reg(data.train[data.train[, 4] == 1, c(1, 2, 3)],
                   test = data.test[c(1, 3)],
                   y = data.test[c(2)], k = 5)
I get the error: Error in get.knnx(train, test, k, algorithm) : Number of columns must be same!. Please help!
You might want to consider the mice package (and read part of the paper).
Using standard settings, which have proven to be a good starting point:
library(mice)
mi <- mice(dataset)                                # multiply impute each NA value
mi.reg <- with(data = mi, expr = glm(y ~ x + z))   # fit the model to each imputed data set
Here, simply calling mice() on your data will fill in each NA value. Finer tuning is of course possible (and needed if it takes too long to converge, or if you have reason to believe the imputations are not accurate). Many different types of imputation are possible; they are listed on page 16.
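To complete the standard mice workflow, the per-imputation fits are then usually pooled across the imputed data sets (y, x, and z above stand in for your actual variables):
pooled <- pool(mi.reg)  # combine estimates across imputations (Rubin's rules)
summary(pooled)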