I used H2O version 3.26.0.5 to train a GBM model on a binary classification problem, predicting the probability of the positive class. I saved the model as a MOJO file and used that file to generate predictions on new data:
## first, restart R session ##
# load the model
library(h2o)
library(readr)  # provides read_csv(), used below
h2o.init(nthreads = -1)
model <- h2o.import_mojo("path_to_mojo_file")
# load the new data input
input <- read_csv("path_to_new_data")
input_h2o <- as.h2o(input)
# predictions
predictions <- predict(model, input_h2o)
When I run this on my computer, I get different predictions than when I use the same MOJO file to predict in a production environment.
Should this happen with a MOJO file? I believed that once a model was saved in MOJO format, you could make predictions in any environment and get the same results. Does anyone know why this is happening?
When I run this on my computer, I get different predictions than when I use the same MOJO file to predict in a production environment.
Is the production environment running the exact same R script?
In the end I found out there was an error in the script for the production environment. After it was fixed, the predictions became pretty close.
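One way to narrow down where two environments diverge is to score the same few rows in both, export the results, and compare them numerically. The following is a minimal sketch in base R, not part of the original script; `local_pred` and `prod_pred` are hypothetical stand-ins for the exported positive-class probabilities.

```r
# Hypothetical stand-ins for the positive-class probabilities exported
# from each environment (same rows, same order).
local_pred <- c(0.120, 0.845, 0.333, 0.910)
prod_pred  <- c(0.120, 0.845, 0.333, 0.910) + 1e-9

# all.equal() compares with a tolerance; isTRUE() is needed because it
# returns a character description (not FALSE) when the vectors differ.
same_within_tol <- isTRUE(all.equal(local_pred, prod_pred, tolerance = 1e-6))

# Largest absolute disagreement and the row where it occurs.
abs_diff <- abs(local_pred - prod_pred)
max_abs_diff <- max(abs_diff)
worst_row <- which.max(abs_diff)
```

If the largest difference is far above floating-point noise, the input preparation (column types, factor levels, missing-value handling) is usually a better first suspect than the MOJO file itself.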
'bst' is the name of an xgboost model that I built in R. It gives me predicted values for the test dataset using this code. So it is definitely an xgboost model.
pred.xgb <- predict(bst , xdtest) # get prediction in test sample
cor(ytestdata, pred.xgb)
Now, I would like to save the model so that someone else can use it with their own data set, which has the same predictor variables and the same variable to be predicted.
Consistent with page 4 of xgboost.pdf, the documentation for the xgboost package, I use the xgb.save command:
xgb.save(bst, 'xgb.model')
which produces the error:
Error in xgb.save(bst, "xgb.model") : model must be xgb.Booster.
Any insight would be appreciated. I searched Stack Overflow and could not find relevant advice.
Mike
It's hard to know exactly what's going on without a fully reproducible example. But just because your model can make predictions on the test data, it doesn't mean it's an xgboost model. It can be any type of model with a predict method.
You can try class(bst) to see the class of your bst object. It should return "xgb.Booster", though I suspect it won't here (hence your error).
On another note, if you want to pass your model to another person using R, you can just save the R object rather than exporting to binary, via:
save(bst, file = "model.RData")
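The check-then-save idea above can be sketched as follows. This is only an illustration: an `lm` fit stands in for `bst` so the snippet runs without xgboost installed, and the temporary file path is an assumption made for the demo.

```r
# Stand-in model: an lm fit is used here only so the sketch runs without
# xgboost. With a real booster, inherits() below would return TRUE.
bst <- lm(mpg ~ wt + hp, data = mtcars)

# Check the class before reaching for a package-specific export function.
is_booster <- inherits(bst, "xgb.Booster")

# save() needs the file name passed as a string via the `file` argument.
path <- tempfile(fileext = ".RData")
save(bst, file = path)

# load() restores the object under its original name; loading into a
# fresh environment avoids overwriting anything in the workspace.
env <- new.env()
load(path, envir = env)
restored <- env$bst

# The restored object should predict identically on the same rows.
same_preds <- isTRUE(all.equal(predict(bst, mtcars), predict(restored, mtcars)))
```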
I have trained a model with keras/tensorflow in R Studio and stored it.
I now reload the model with
my_model <- load_model_hdf5("my_model.h5")
with
summary(my_model)
I get a summary of the model, including number and size of layers.
Yet, I don't see what the activation functions are.
Is there a way to access the activation functions?
Secondly, is there also a way to access the hyperparameters, such as the number of epochs and the batch size, that were used in training this model?
You can open your .h5 file in Netron to inspect the model. Netron is very useful for this.
I created a model based on a very large dataset and had the program save the results using
saveRDS(featVarLogReg.mod, file="featVarLogReg.mod.RDS")
Now I'm trying to load the model to evaluate, but readRDS runs out of memory.
featVarLR.mod <- readRDS(file = "featVarLogReg.mod.RDS")
Is there a way to load the file that takes less memory? Or at least the same amount of memory that was used to save it?
The RDS file ended up being 1.5GB in size for this logistic regression using caret. My other models using the same dataset and very similar caret models were 50MB in size so I can load them.
The caret linear model saves the training data in the model object. You could try to use returnData = FALSE in the trainControl argument to train. I don't recall if this fixed my issue in the past.
https://www.rdocumentation.org/packages/caret/versions/6.0-77/topics/trainControl
You could also try to just export the coefficients into a dataframe and use a manual formula to score new data.
Use coef(model_object)
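A minimal sketch of scoring from coefficients alone, assuming a logistic regression with numeric predictors. A plain `glm` fit on `mtcars` stands in for the real model here; for a caret `train` object the underlying fit usually lives in `$finalModel`, and factor predictors would need their dummy columns built by hand.

```r
# Stand-in for the fitted model: a small logistic regression on mtcars.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Keep only the coefficients: a short named numeric vector that is cheap
# to store and load, unlike the full model object.
beta <- coef(fit)

# Score new rows by hand: linear predictor, then the inverse logit.
new_data <- mtcars[1:5, c("wt", "hp")]
eta <- beta["(Intercept)"] + beta["wt"] * new_data$wt + beta["hp"] * new_data$hp
manual_prob <- plogis(eta)

# Reference values from predict() on the full model, for comparison.
reference_prob <- predict(fit, newdata = new_data, type = "response")
```

Saving `beta` with saveRDS() instead of the whole model keeps the file to a few hundred bytes, at the cost of losing predict() and the other model methods.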
Stack Overflowers, I need some help from TensorFlow experts. I've built a multi-layer perceptron, trained it, and tested it, and everything seemed OK. However, when I restored the model and tried to use it again, its accuracy did not match that of the trained model, and the predictions are quite different from the real labels. The code I am using for restoring and predicting is the following (I'm using R):
pred <- multiLayerPerceptron(test_data)
init <- tf$global_variables_initializer()
with(tf$Session() %as% sess, {
sess$run(init)
model_saver$restore(sess, "log_files/model_MLP1")
test_pred_1 <- sess$run(pred, feed_dict= dict(x = test_data))
})
Is everything OK with the code? FYI, this part of the code is meant to get my model's predictions for test_data.
Your code does not show where model_saver is initialized, but it should be created after you create the computational graph. If not, it does not know which variables to restore/save. So create your model_saver after pred <- multiLayerPerceptron(test_data).
Note that, if you made the same mistake during training, your checkpoint will be empty and you will need to retrain your model first.
I'm working on an R package where I need to call predict.lm on a model I've already fit. I've saved the linear model as a file which I can put in the data folder of the package. I'm worried about slowing things down if I load the model every time the function is called. The function that uses this model is the meat of the package and gets called on every iteration of a simulation, so I'd prefer to read the saved model once when the package is loaded. Is there a way to do that?
Why not just save the coefficients and then "predict" with them?
c.vec <- coef(fit) # Intercept + terms
Yhat <- sum(c.vec * c(1, data.vec)) # dot product: intercept plus each coefficient times its predictor
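If the full model object is still needed, one possible pattern for reading it only once is to cache it in an environment and reuse it on later calls. This is a sketch under assumptions: `get_model()` and `.model_cache` are hypothetical names, the temporary file stands in for the model shipped with the package, and the `reads` counter exists only to show that the file is read a single time. A package could equally fill such a cache from its `.onLoad()` hook.

```r
# Private cache; in a package this would sit at the top level of an R/ file.
.model_cache <- new.env(parent = emptyenv())
.model_cache$reads <- 0L

# Hypothetical helper: reads the model from disk on first use only.
get_model <- function(path) {
  if (is.null(.model_cache$fit)) {
    .model_cache$fit <- readRDS(path)
    .model_cache$reads <- .model_cache$reads + 1L
  }
  .model_cache$fit
}

# Demo setup: write a small stand-in model to a temporary file.
path <- tempfile(fileext = ".rds")
saveRDS(lm(mpg ~ wt, data = mtcars), path)

# Repeated calls, as in a simulation loop, reuse the cached object.
for (i in 1:3) {
  yhat <- predict(get_model(path), newdata = data.frame(wt = 3))
}
```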