'bst' is the name of an xgboost model that I built in R. It gives me predicted values for the test dataset using this code, so it is definitely an xgboost model:
pred.xgb <- predict(bst, xdtest)  # get predictions in the test sample
cor(ytestdata, pred.xgb)
Now, I would like to save the model so that someone else can use it with their own data set, which has the same predictor variables and the same variable to be predicted.
Consistent with page 4 of xgboost.pdf, the documentation for the xgboost package, I use the xgb.save command:
xgb.save(bst, 'xgb.model')
which produces the error:
Error in xgb.save(bst, "xgb.model") : model must be xgb.Booster.
Any insight would be appreciated. I searched Stack Overflow and could not locate relevant advice.
Mike
It's hard to know exactly what's going on without a fully reproducible example, but just because your model can make predictions on the test data doesn't mean it's an xgboost model; any type of model with a predict method can do that.
You can try class(bst) to see the class of your bst object. It should return "xgb.Booster", though I suspect it won't here (hence your error).
On another note, if you want to pass your model to another person using R, you can just save the R object rather than exporting to binary:
save(bst, file = "model.RData")
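For completeness, a sketch of the round trip (the file name is just an example):
save(bst, file = "model.RData")     # on your machine
# on the other person's machine:
load("model.RData")                 # restores an object named bst in the workspace
pred <- predict(bst, their_xdtest)  # their_xdtest: their own predictor matrix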
Hi, I am a newbie in the deep learning field.
I ran a neural network model (regression) with 2 hidden layers in R (neuralnet package), then used the compute function to get the predicted values. Now I want to regenerate the predicted output from the equation used in the neural net. For example, the following are the weights taken from the model object:
Intercept.to.1layhid1   4.55725020215
Var1.to.1layhid1      -13.61221477737
Var2.to.1layhid1        0.30686384857
Var1.to.1layhid2        0.23527690062
Var2.to.1layhid2        0.67345678
1layhid.1.to.target     1.95414397785
1layhid.2.to.target     3.68009136857
Can anyone help me derive an equation from the above weights so that I can replicate the output?
Thanks
To get the output for new data, you can always use the predict function with the fitted model, i.e. the object returned by the neuralnet function.
For instance, if your model is fitted along these lines (formula and data names are illustrative):
neuralFit <- neuralnet(target ~ Var1 + Var2, data = trainData, hidden = 2)
Then you reproduce the output with the following:
predict(neuralFit, newdata)
Otherwise, you'll need to compute the result manually. But you need to understand your network architecture first.
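For the weights posted in the question, a manual forward pass might look like the sketch below. It assumes the neuralnet defaults (logistic activation in the hidden layer, linear output with linear.output = TRUE). Note that those weight names describe a single hidden layer with two units, and the intercepts for the second hidden unit and for the output node are missing from the posted list, so they appear as placeholders here:
sigmoid <- function(x) 1 / (1 + exp(-x))

# Placeholders: these two intercepts are not in the posted list;
# read them off the fitted object (e.g. neuralFit$weights)
b.hid2 <- 0
b.out  <- 0

manual.predict <- function(var1, var2) {
  h1 <- sigmoid(4.55725020215 - 13.61221477737 * var1 + 0.30686384857 * var2)
  h2 <- sigmoid(b.hid2 + 0.23527690062 * var1 + 0.67345678 * var2)
  b.out + 1.95414397785 * h1 + 3.68009136857 * h2
}

manual.predict(0.5, 0.5)   # compare against the compute()/predict() output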
I have been unable to find information on how exactly predict.cv.glmnet works.
Specifically, when a prediction is made, is it based on a fit that uses all the available data, or on a fit where some data has been discarded as part of the cross-validation procedure when running cv.glmnet?
I would strongly assume the former, but I was unable to find a sentence in the documentation that clearly states that, after cross-validation is finished, the model is fitted on all available data for new predictions.
If I have overlooked a statement along those lines, I would also appreciate a hint on where to find this.
Thanks!
In the documentation for predict.cv.glmnet:
"This function makes predictions from a cross-validated glmnet model, using the stored "glmnet.fit" object ... "
In the documentation for cv.glmnet (under value):
"glmnet.fit a fitted glmnet object for the full data."
So I'm looking for a quick way to create a table of DIC scores from MCMCglmm models in R. I've run 10 different models and could extract the DIC from each separately using the following code, where the model is called m1:
m1.DIC <- m1$DIC
But then I have to do this for each model and then build the data frame, which is tedious. I've looked at the documentation for the MCMCglmm package and haven't found any hint of a built-in function that summarises DIC across models. Is there one? Is there another package that can do this? I know the rethinking package uses compare to get quick and easy model comparisons, but this doesn't appear to work with MCMCglmm outputs, as I get the following error message:
> compare(m1, m2, WAIC=FALSE)
Error in UseMethod("logLik") :
no applicable method for 'logLik' applied to an object of class "MCMCglmm"
In addition: Warning message:
In DIC(z, n = n) :
  No specific DIC method for object of class "MCMCglmm". Returning AIC instead.
Is there a similar method that will work to compare MCMCglmm models?
EDIT: Also note that the compare function in rethinking calculates weights for the models, from the DIC. Maybe this just doesn't exist in a form that works with the MCMCglmm package.
If you want to generate and compare a list of possible models from scratch, you can use the dredge function in the MuMIn package (http://cran.at.r-project.org/web/packages/MuMIn/MuMIn.pdf), which supports MCMCglmm objects.
First you need to make the MCMCglmm call updateable, though (so that dredge can change the model composition):
MCMCglmm.updateable <- updateable(MCMCglmm)
Then you can run your global model:
global.model <- MCMCglmm.updateable(y ~ x1 + ... etc.)
The call to dredge would then be something like:
dredge.MCMCglmm <- dredge(global.model, rank = "DIC", ...)
You can also get standardized coefficients and adjusted R^2.
If you already have a list of fitted models, you can use model.sel (in the same package) to generate a ranked table with model weights etc.:
model.sel(model1, model2, model3, rank = "DIC")
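And if you only need the DIC table itself, here is a plain base-R sketch without MuMIn (model names m1, m2, ... assumed; the weights are computed the same way as Akaike weights, just from DIC):
models <- list(m1 = m1, m2 = m2, m3 = m3)    # extend to all 10 models
dic <- sapply(models, function(m) m$DIC)     # pull DIC from each fit
delta <- dic - min(dic)                      # difference from the best model
data.frame(DIC = dic, delta = delta,
           weight = exp(-0.5 * delta) / sum(exp(-0.5 * delta)))[order(dic), ]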
Good luck!
Best,
Adrian
I'm exporting an R randomForest model to PMML. The resulting PMML always has the class as the first element of the DataDictionary element, which does not always match my data.
Is there some way to fix this, or at least to augment the PMML with custom Extension elements? That way I could put the class index there.
I've looked in the pmml package documentation, as well as in the pmmlTransformations packages, but couldn't find anything there that could help me solve this issue.
By PMML class I assume you mean the model type (classification vs regression) in the PMML model attributes?
If so, it is not true that the model type is determined from the data type of the first element of the DataDictionary; the two are completely independent. The model type is determined by what R thinks the model is: the R random forest object records the type it thinks it is (model$type), and that is the model type exported by the pmml function. If you want your model to be a certain type, just make sure you let R know. For example, with the iris data set, if your predicted variable is Sepal.Length, R will correctly assume it is a regression model; if you insist on treating it as a classification model, use as.factor(Sepal.Length) instead.
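A quick sketch of that difference, following the iris example above (the pmml and XML packages are assumed to be installed, and the output file name is illustrative):
library(randomForest)
library(pmml)
library(XML)

fit.reg <- randomForest(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)
fit.cls <- randomForest(as.factor(Sepal.Length) ~ Sepal.Width + Petal.Length, data = iris)
fit.reg$type    # "regression"     -> exported as a regression model
fit.cls$type    # "classification" -> exported as a classification model
saveXML(pmml(fit.reg), "rf_regression.pmml")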
I'm using randomForest to find the most significant variables. I was expecting output that reports the accuracy of the model and also ranks the variables by their importance, but I am a bit confused now. I ran randomForest and then importance() to extract the importance of the variables.
But then I saw another function, rfcv (Random Forest Cross-Validation for feature selection), which I suppose should be the most appropriate for this purpose. My question is: how do I get the list of the most important variables? How do I see the output after running it? Which command should I use?
Another thing: what is the difference between randomForest and predict.randomForest?
I am not very familiar with randomForest or R, so any help would be appreciated.
Thank you in advance!
After you have built a randomForest model, you use predict.randomForest to apply that model to new data: e.g. build a random forest with your training data, then run your validation data through it with predict.randomForest.
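A minimal sketch (iris as stand-in data; the train/validation split is illustrative):
library(randomForest)
set.seed(1)
train.idx <- sample(nrow(iris), 100)
rf <- randomForest(Species ~ ., data = iris[train.idx, ], importance = TRUE)
pred <- predict(rf, newdata = iris[-train.idx, ])   # dispatches to predict.randomForest
table(pred, iris$Species[-train.idx])               # accuracy on the held-out data
importance(rf)                                      # ranks variables by importance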
As for rfcv, there is an option, recursive, which (from the help) controls
whether variable importance is (re-)assessed at each step of variable reduction
It's all in the help file.
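For example, a short sketch of rfcv in use (iris again as stand-in data):
library(randomForest)
cv <- rfcv(trainx = iris[, 1:4], trainy = iris$Species, cv.fold = 5)
cv$n.var       # number of variables kept at each step
cv$error.cv    # cross-validated error at each step
# rfcv reports error as variables are dropped; for the actual ranking
# of variables, use importance() on a fitted randomForest as above.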