I have an older random forest model built with the Rborist package, and I recently stopped being able to make predictions with it. As I understand it, older versions of the package produced models of class Rborist, and starting with version 0.3-1 it produces models of class rfArb, so the predict method also changed from predict.Rborist to predict.rfArb.
I was originally getting a message saying there was "no applicable method for 'predict' applied to an object of class 'Rborist'" (since that class has been deprecated, I think). Then I manually changed the class of my old model to rfArb and started getting a new error message:
Error in predict.rfArb(surv_mod_rf, newdata = trees) :
Sampler state needed for prediction
It looks to me like this has to do with the way rfArb objects are constructed (they carry a sampler vector recording how many times each observation in the sample was sampled, or some such), which differs from the way Rborist objects are constructed. But please let me know if I'm misunderstanding this.
Is there any way to update my old model so I can use the recent version of Rborist to make predictions, or are the only options using an older version of the package or rebuilding the model?
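In case it helps anyone in the same spot: if rebuilding isn't an option, pinning an older Rborist release seems workable. This is a sketch, and the version string below is an assumption on my part -- check the CRAN archive for the release you actually trained with.

```r
# A sketch: pin an older Rborist release so predict.Rborist still exists.
# The version string "0.2-3" is a guess -- pick the release you trained with.
# install.packages("remotes")
remotes::install_version("Rborist", version = "0.2-3")

library(Rborist)
# surv_mod_rf and trees are the objects from the question;
# with the old package loaded, predict() dispatches to predict.Rborist.
pred <- predict(surv_mod_rf, newdata = trees)
```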
I’m hoping to get some guidance on how, or even whether, to update an XGBoost model to the current version. The model object was saved with an old JSON model format. When I apply the model in R to new data via predict(), the following warnings are generated.
WARNING: amalgamation/../src/learner.cc:1040: If you are loading a serialized model (like pickle in Python, RDS in R) generated by older XGBoost, please export the model by calling Booster.save_model from that version first, then load it back in current version. See: https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html for more details about differences between saving model and serializing.

WARNING: amalgamation/../src/learner.cc:749: Found JSON model saved before XGBoost 1.6, please save the model using current version again. The support for old JSON model will be discontinued in XGBoost 2.3.
The suggested website recommends re-saving the model object using
xgb.save(bst, 'model_file_name.json')
Unfortunately, I cannot do this directly because I did not fit my model with the xgboost package itself but via the tidymodels package parsnip, using the boost_tree() function with xgboost as the engine. I've searched the package reference at https://parsnip.tidymodels.org/reference/details_boost_tree_xgboost.html, but I don't see how to find and update the xgboost model within the model_fit object.
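For what it's worth, here is a sketch of what I imagine might work, assuming parsnip's extract_fit_engine() hands back the raw xgb.Booster (model_fit stands for my fitted parsnip object):

```r
library(parsnip)
library(xgboost)

# A sketch, assuming `model_fit` is the fitted parsnip model_fit object.
# extract_fit_engine() should return the underlying engine fit, which
# for the xgboost engine is an xgb.Booster.
booster <- extract_fit_engine(model_fit)

# Re-save in the current JSON format, as the XGBoost docs recommend.
xgb.save(booster, "model_file_name.json")
```

The engine fit also appears to live in model_fit$fit, so after reloading the re-saved booster one could presumably slot it back in there, but I haven't verified that round trip.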
I want to avoid reaching a point in the future where the model can no longer be applied to new data. That day may be far off, though, since the release date of XGBoost 2.3 is unknown. I have considered refitting the model, but I could also just load an older version of xgboost whenever I apply this model.
Thank you for your thoughts.
I am doing an SEM analysis in R using lavaan. My analysis is complete, but I am not satisfied with the diagrams of the measurement model and structural model provided by semPlot and lavaanPlot. In lavaanPlot, the path arrows are curly, giving a hand-drawn impression; in semPlot, the variables and paths are a bit cluttered.
Is it possible to mimic diagrams produced by AMOS in R (not necessarily the colours etc., but at least the overall structure)? I tried using semPlotModel_AMOS(model.fit) (where model.fit is the object I got by fitting my SEM model to data).
But it throws an error:
Error in file.info(object) : invalid filename argument
In addition: Warning message:
In semPlotModel_Amos(model3.fit) :
(Residual) variances of Amos model is not yet supported
You can use the Onyx package to build a lavaan (or OpenMx) model starting with the path diagram, similar to how Amos works. I think the WebSEM site still provides similar functionality if you want to fit your model using a web interface.
To automatically generate a graphic from a fitted lavaan model, I am unaware of options other than the semPlot and lavaanPlot packages you already mentioned. But you could fairly easily build your own custom path diagram using the DiagrammeR package (see their video demonstration). Unfortunately, there is always a trade-off between ease-of-use/automation and flexibility/control to get exactly the picture you want.
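For example, here is a minimal DiagrammeR sketch of an AMOS-style layout in Graphviz dot syntax. All variable names and the path estimate are placeholders, not taken from your model:

```r
library(DiagrammeR)

# A minimal sketch of an AMOS-style path diagram.
# All variable names and the 0.45 estimate are placeholders.
grViz("
digraph sem {
  rankdir = LR

  // observed indicators drawn as rectangles, as in AMOS
  node [shape = box]
  x1; x2; x3; y1; y2; y3

  // latent variables drawn as ellipses
  node [shape = ellipse]
  F1; F2

  // measurement model: straight arrows, unlike lavaanPlot's curves
  F1 -> {x1 x2 x3}
  F2 -> {y1 y2 y3}

  // structural path, labelled with its (placeholder) estimate
  F1 -> F2 [label = 0.45]
}
")
```

You would fill in the labels and estimates from parameterEstimates() on your fitted lavaan model; the point is just that dot gives you full control over node shapes and straight arrows.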
Good afternoon, all--thank you in advance for your help! I'm somewhat new to R, so my apologies if this is a trivial or otherwise inappropriate question.
TL;DR: I'm trying to determine variable importance (VIM) for factor variables with a random forest model built in randomForestSRC, which is not a built-in feature of that package. Using both the LIME and DALEX packages, I encounter the same error: cannot coerce class 'c("rfsrc", "predict", "class")' to a data.frame. Any assistance resolving this error, or alternate approaches, would be greatly appreciated!
I have a random forest model I've built in R using the randomForestSRC package. The model seems to work great--training and testing went fine, I got the predicted output I needed, and the results are in line with what I would expect. Unfortunately, one of the requirements is that I need to be able to indicate how the model arrived at its conclusions (e.g., I need to also include variable importance as part of the output), for both continuous and factor variables.
This doesn't seem to be a built-in feature of the randomForestSRC package, so I've looked into both the LIME and DALEX packages, both of which should be able to break out VIM from the existing RF model. Unfortunately, neither has native support for the rfsrc class, which means I've needed to build the prediction functions myself, as recommended by this vignette: https://uc-r.github.io/dalex
model_type.rfsrc <- function(x, ...) {
  return('classification')
}

predict_model.rfsrc <- function(x, newdata, type, ...) {
  as.data.frame(predict(x, newdata, ...))
}
Unfortunately, in running the VIM section of the model (in both LIME and DALEX), I'm asked to pass both the predicted output and the model that created that output. In doing so, it hits an error with the above predict_model function:
error in as.data.frame.default(predict(model, (newdata))):
cannot coerce class 'c("rfsrc", "predict", "class")' to a data.frame
And, like...of course it can't; it's trying to turn the whole rfsrc prediction object into a data frame. Unfortunately, while I think I understand why R is giving me that error, that's about as far as I've been able to figure out on my own.
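In case it helps, here is my best guess at a fix, assuming predict.rfsrc stores the class probabilities in a $predicted element of the prediction object (which is what str() on the prediction suggests):

```r
# A sketch of a corrected wrapper, assuming the rfsrc prediction object
# stores class probabilities in its $predicted element.
predict_model.rfsrc <- function(x, newdata, type, ...) {
  pred <- predict(x, newdata = newdata, ...)
  # $predicted is a matrix with one probability column per class;
  # coerce that, not the whole prediction object.
  as.data.frame(pred$predicted)
}
```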
Additionally, I'm using the randomForestSRC package for two reasons: it doesn't put a limit on the number of factor levels, and it can handle imbalanced data. I'm working with medical data, so both of these are necessary (e.g., there are ~100,000 different medical codes that can be encoded in a single data variable, and the ratio of "people-who-don't-have-this-condition" to "people-who-do-have-this-condition" is frequently 100 to 1). If anyone has suggestions for alternative packages that handle these issues and have built-in VIM functionality (or integrate with DALEX / LIME), that would be fantastic as well.
Thank you all very much for your help!
I built a Caret ensemble model by stacking models together.
The model ran successfully and I got encouraging results.
The challenge came when I tried to use lime to interpret the black-box predictions. I got an error saying "The class of model must have a model_type method".
The only time I encountered such an error was when using lime with H2O. Subsequently, the folks behind lime released an update that supports H2O.

Does anyone know if any work has been done to support caretStack objects in lime? Or know of a workaround for this issue?
According to the Lime documentation, these are the supported models
Out of the box, lime supports the following model objects:
train from caret
WrappedModel from mlr
xgb.Booster from xgboost
H2OModel from h2o
keras.engine.training.Model from keras
lda from MASS (used for low-dependency examples)
If your model is not one of the above you'll need to implement support yourself. If the model has a predict interface mimicking that of predict.train() from caret, it will be enough to wrap your model in as_classifier()/as_regressor() to gain support.
Otherwise you'll need to implement a predict_model() method and potentially a model_type() method (if the latter is omitted the model should be wrapped in as_classifier()/as_regressor(), every time it is used in lime()).
Solution to your question:
For your case, caretStack has a predict interface mimicking that of predict.train(), so wrapping your model in as_classifier() or as_regressor() should suffice.
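A sketch of what that could look like (stack_model, train_x, and new_x are placeholder names for your stacked model, training predictors, and new data):

```r
library(lime)

# `stack_model` is the caretStack object; `train_x` holds the training
# predictors (placeholder names). as_classifier() tells lime to treat
# the model as a classifier with a predict.train()-style interface.
explainer <- lime(train_x, as_classifier(stack_model))

# Explain a few new cases, keeping the top 5 features per case.
explanation <- explain(new_x[1:3, ], explainer,
                       n_labels = 1, n_features = 5)
plot_features(explanation)
```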
I'm using the glmulti package to do variable selection on the fixed effects of a mixed model in lme4. I had the same problem retrieving coefficients and confidence intervals that was solved by the author of the package in this thread: using coef or coef.multi gives a check.names error, and the coefficients are listed as NULL when calling the predict method. So I tried the solution listed in the thread linked above:
setMethod('getfit', 'merMod', function(object, ...) {
  summ <- summary(object)$coef
  summ1 <- summ[, 1:2]
  if (length(dimnames(summ)[[1]]) == 1) {
    summ1 <- matrix(summ1, nrow = 1,
                    dimnames = list(c("(Intercept)"), c("Estimate", "Std. Error")))
  }
  cbind(summ1, df = rep(10000, length(fixef(object))))
})
I fixed the missing " in the original post, and the code ran. But now, instead of getting
Error in data.frame(..., check.names = FALSE) : arguments imply
differing number of rows: 1, 0
I get this error for every single model...
Error in calculation of the Satterthwaite's approximation. The output of lme4 package is returned
summary from lme4 is returned
some computational error has occurred in lmerTest
I'm using lmerTest, and it doesn't surprise me that it would fail if glmulti can't pull the correct info from the model. So really, the first two lines of the error are probably what should be focused on.
A description of the original fix is on the developer's website here. Clearly the package hasn't been updated in a while, and yes, I should probably learn a new package... but until then I'm hoping for a fix. I'll contact the developer directly through his website, but in the meantime, has anyone tried this and found a fix?
lme4, glmulti, rJava, and the other related packages have all been updated to their latest versions.
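One thing I'm experimenting with, in case it's useful to others: lmerTest's summary() method accepts a ddf argument, so asking for plain lme4 output inside getfit should sidestep the Satterthwaite computation that's failing. A sketch (the fixed df = 10000 is the same placeholder the original fix used):

```r
# A variant of the getfit method that bypasses lmerTest's Satterthwaite
# machinery by requesting plain lme4 output via ddf = "lme4".
setMethod('getfit', 'merMod', function(object, ...) {
  summ <- summary(object, ddf = "lme4")$coef
  # drop = FALSE keeps the matrix shape (and row names) even for
  # a single-coefficient model, replacing the explicit matrix() branch.
  summ1 <- summ[, 1:2, drop = FALSE]
  # df = 10000 is the placeholder from the original workaround
  cbind(summ1, df = rep(10000, length(fixef(object))))
})
```

On a plain merMod fit, summary() should simply ignore the ddf argument, so this ought to be safe either way, though I haven't tested it across every model in the glmulti run.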