Get activation functions (and hyperparameters) of trained model keras/tensorflow in R Studio - r

I have trained a model with keras/tensorflow in R Studio and stored it.
I now reload the model with
my_model <- load_model_hdf5("my_model.h5")
with
summary(my_model)
I get a summary of the model, including number and size of layers.
Yet, I don't see what the activation functions are.
Is there a way to access the activation functions?
Secondly, is there also a way to access the hyperparameters, such as the number of epochs or the batch size, that were used in training this model?

You can inspect the model with Netron by loading your .h5 file into it. Netron shows each layer's attributes, including its activation, and is highly useful.
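You can also read the activations from R itself. The sketch below assumes the R `keras` package with a TensorFlow 2.x backend, where each layer's `get_config()` returns a list whose `activation` element names the activation (layers without one, such as dropout, give `NULL`). Epochs and batch size are arguments to `fit()` and, as far as I know, are not written to the .h5 file, so the sketch records them in a separate .rds file next to the model; the `my_model_params.rds` naming is hypothetical. A tiny throwaway model is built first so the example runs on its own; with your own file you would start from `load_model_hdf5("my_model.h5")`.

```r
library(keras)

# A tiny throwaway model so the example is self-contained.
model <- keras_model_sequential() %>%
  layer_dense(units = 8, activation = "relu", input_shape = c(4)) %>%
  layer_dense(units = 1, activation = "sigmoid")
model %>% compile(optimizer = "adam", loss = "binary_crossentropy")

# Training settings are fit() arguments, not part of the model,
# so keep your own record of them.
train_params <- list(epochs = 3, batch_size = 16)
set.seed(1)
x <- matrix(runif(64 * 4), ncol = 4)
y <- rbinom(64, 1, 0.5)
# history$metrics$loss holds the per-epoch training loss for this session.
history <- model %>% fit(x, y,
                         epochs = train_params$epochs,
                         batch_size = train_params$batch_size,
                         verbose = 0)

h5_path <- file.path(tempdir(), "my_model.h5")
params_path <- file.path(tempdir(), "my_model_params.rds")  # hypothetical naming
save_model_hdf5(model, h5_path)
saveRDS(train_params, params_path)

# Reload and read the activation from every layer's config.
my_model <- load_model_hdf5(h5_path)
activations <- vapply(my_model$layers, function(layer) {
  act <- layer$get_config()$activation
  if (is.null(act)) NA_character_ else as.character(act)
}, character(1))
names(activations) <- vapply(my_model$layers, function(layer) layer$name, character(1))
print(activations)

# The recorded training settings come back from the sidecar file.
print(readRDS(params_path))
```

The sidecar file is a design choice: anything you pass to `fit()` has to be saved deliberately if you want it later.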


How should trained ML models be incorporated into a package that USES those trained models

I have been working on an ML project (done inside an R project) that resulted in some ML models (built with caret) ALONG WITH code that uses those models for additional analysis.
As the next phase, I am "deploying" these models by creating an R package that my collaborators can use for analysis of new data, where that analysis includes USING the trained ML models. This package includes functions that generate reports in which the trained ML models are applied to the new data sets.
I am trying to identify the "right" way to include those trained models in the package. (Note: currently each model is saved in its own .rds file.)
I want to be able to use those models inside of package functions.
I also want to consider the possibility of "updating" the models to a new version at a later date.
So ... should I:
1. Include the .rda files in inst/extdata
2. Include them as part of sysdata.rda
3. Put them in an external data package (which seems reasonable, except that almost all examples in tutorials expect a data package to include data.frame-ish objects.)
With regard to that third option ... I note that these models likely imply some additional "NAMESPACE" issues, as the models will require a whole bunch of caret-related stuff to be usable. Is that NAMESPACE modification required in the "data" package or in the package I am building that will "use" the models?
My first choice would be option 1. There is no need for other formats such as PMML, since you only want to run the models within R, so I consider .rda files the natural choice. As long as your models are not huge, this should be fine for sharing with collaborators (though maybe not for a CRAN package). I see that option 3 sounds convenient, but why separate the models from the functions? With option 1, freshly trained models would simply ship with a new package version, which you would need anyway with a data package. I don't see much to gain that way, but I don't have much experience with data packages.
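For option 1, a package function can locate the stored file with `system.file()` and cache it after the first read. This is a minimal sketch: the package name `mypkg`, the helper `get_model()` and the model name `mpg_lm` are hypothetical, and an `lm` fit stands in for a caret model so the example runs without extra packages. For a real caret model, `predict()` dispatches to caret's `predict.train()`, so the package that uses the models needs caret (and the package behind the model's method) in its Imports. Updating a model then means replacing the file under `inst/extdata` and releasing a new package version.

```r
# R/models.R in the hypothetical package "mypkg".
# Files in inst/extdata/ are installed as extdata/ inside the package.
.model_cache <- new.env(parent = emptyenv())

get_model <- function(name, dir = system.file("extdata", package = "mypkg")) {
  # Read each model from disk only once per session.
  if (!exists(name, envir = .model_cache, inherits = FALSE)) {
    path <- file.path(dir, paste0(name, ".rds"))
    if (!file.exists(path)) {
      stop("No stored model called '", name, "' in '", dir, "'")
    }
    assign(name, readRDS(path), envir = .model_cache)
  }
  get(name, envir = .model_cache, inherits = FALSE)
}

# Demo outside a package: point `dir` at a temporary folder.
model_dir <- file.path(tempdir(), "extdata")
dir.create(model_dir, showWarnings = FALSE)
saveRDS(lm(mpg ~ wt + hp, data = mtcars), file.path(model_dir, "mpg_lm.rds"))

m <- get_model("mpg_lm", dir = model_dir)
preds <- predict(m, newdata = head(mtcars))
print(preds)
```

Loading lazily keeps package start-up fast even when several large models are bundled.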

R XGBoost - xgb.save or xgb.load loss of data

After training an XGBoost model in R, I am presented with a model object called xgb which is a list of 7.
When I save the model using xgb.save and then reload using xgb.load, I am presented with what seems to be a 'smaller' model object which is a list of 2.
Obviously I can't share the code as you would need the training data which is massive, so all I can really show is a picture of the variable editor.
Below is model object xgb which is the original model after training, vs. the model object test1 which is the same model but saved and reloaded:
Why does this happen and am I losing valuable information upon saving/loading my models?
Any help is appreciated.
Maybe late, but I was having the same problem and found a solution.
Saving the xgb model as .rds doesn't lose any information, and the reloaded model xgb_ generated the same forecast values as the original xgb when I tested it. Hope that helps!
saveRDS(xgb, "model.rds")
xgb_ <- readRDS("model.rds")
all.equal(xgb, xgb_)
You are losing information due to rounding errors after save/load. See this issue. I believe it is currently a bug.
As to why the loaded model is a smaller list, see here. So again, you are losing information such as the callbacks and parameters, but these are not essential for prediction and are not portable to, e.g., Python.
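To check whether a round trip changes anything that matters for prediction, compare predictions rather than the object structure. A minimal sketch, assuming a recent xgboost R package and its bundled `agaricus.train` data; the `.json` extension asks `xgb.save()` for the JSON model format, and the file names are arbitrary. Which R-side fields the in-memory object carries (such as parameters or callbacks) depends on the xgboost version, so the sketch does not rely on them. As I understand the xgboost documentation, its own format is preferred over `saveRDS()` when the file must load under a different xgboost version.

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

bst <- xgb.train(
  params = list(objective = "binary:logistic", max_depth = 2, eta = 1),
  data = dtrain,
  nrounds = 5
)

# Round trip 1: xgboost's own format (the booster only, no R-side extras).
json_path <- file.path(tempdir(), "model.json")
xgb.save(bst, json_path)
bst_xgb <- xgb.load(json_path)

# Round trip 2: R serialization of the whole object.
rds_path <- file.path(tempdir(), "model.rds")
saveRDS(bst, rds_path)
bst_rds <- readRDS(rds_path)

p_orig <- predict(bst, dtrain)
p_xgb  <- predict(bst_xgb, dtrain)
p_rds  <- predict(bst_rds, dtrain)

# Largest absolute difference in predicted probabilities per round trip.
max_diff_xgb <- max(abs(p_orig - p_xgb))
max_diff_rds <- max(abs(p_orig - p_rds))
print(c(xgb.save = max_diff_xgb, saveRDS = max_diff_rds))
```

Differences at zero or at floating-point rounding level mean the prediction-relevant part of the model survived.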

Used saveRDS to save a model but not enough memory to readRDS?

I created a model based on a very large dataset and had the program save the results using
saveRDS(featVarLogReg.mod, file="featVarLogReg.mod.RDS")
Now I'm trying to load the model to evaluate, but readRDS runs out of memory.
featVarLR.mod <- readRDS(file = "featVarLogReg.mod.RDS")
Is there a way to load the file that takes less memory? Or at least the same amount of memory that was used to save it?
The RDS file ended up being 1.5GB in size for this logistic regression using caret. My other models using the same dataset and very similar caret models were 50MB in size so I can load them.
caret's train() stores the training data in the returned model object. You could try setting returnData = FALSE in trainControl() and passing that to the trControl argument of train(). I don't recall whether this fixed my issue in the past.
https://www.rdocumentation.org/packages/caret/versions/6.0-77/topics/trainControl
You could also export just the coefficients into a data frame and score new data with a manual formula.
Use coef() on the fitted model; for a caret train object, that is coef(model_object$finalModel).
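A minimal sketch of both suggestions, using `mtcars` (with `am` as a hypothetical binary outcome) in place of the original large data set. `returnData = FALSE` only drops caret's own copy in `fit$trainingData`; the underlying `glm` object may still hold its model frame, which is one reason exporting the coefficients can be the more reliable way to shrink what you load. The helper `score_manual()` and the file name are illustrative.

```r
library(caret)

df <- mtcars
df$am <- factor(df$am, labels = c("auto", "manual"))

# returnData = FALSE stops train() from keeping fit$trainingData;
# method = "none" fits once without resampling.
ctrl <- trainControl(method = "none", returnData = FALSE)
fit <- train(am ~ wt + hp, data = df, method = "glm",
             family = binomial, trControl = ctrl)

# Export only the coefficients of the underlying glm.
coefs <- coef(fit$finalModel)
coef_path <- file.path(tempdir(), "am_coefs.rds")
saveRDS(coefs, coef_path)

# Score rows by hand: the glm models P(second level), i.e. P("manual").
score_manual <- function(coefs, newdata) {
  eta <- coefs[["(Intercept)"]] +
    as.matrix(newdata[, names(coefs)[-1], drop = FALSE]) %*% coefs[-1]
  plogis(as.vector(eta))
}

b <- readRDS(coef_path)
p_manual <- score_manual(b, df)
p_caret <- predict(fit, newdata = df, type = "prob")[, "manual"]
print(max(abs(p_manual - p_caret)))

# Size in bytes of the full caret object vs. the coefficient vector.
print(c(full_model = as.numeric(object.size(fit)),
        coefficients = as.numeric(object.size(b))))
```

This only works as written for numeric predictors; factor predictors would need the same dummy coding caret applies before multiplying by the coefficients.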

How to export an R Random Forest model for use in Excel VBA without API calls

Problem:
I have a Random Forest model trained in R. I need to deploy this model in a standalone Excel tool that will be used by 350 people across a sales network to perform real-time predictions based on data entered into the spreadsheet by users.
How can I do this?
Constraints:
It is not an option to require users to install R on their local machines.
It is not an option to have a server (physical or cloud) providing a scoring API.
What have I done so far?
1. PMML
I can export the model in PMML (XML structure). From research I can see there are libraries for loading and executing PMML inputs in Python and Java. However, I haven't found anything implemented in VBA / VB.
2. Zementis
I looked into a solution called Zementis, which offers an Excel add-in to deploy PMML models. However, from my understanding this requires web-service calls to a cloud server (e.g. AWS) where the actual model execution happens. My IT security department will not allow this.
3. Others
The most common recommendation seems to be to call R to load the model and run the predict function. As noted above, this is not a viable option.
Detailed Context:
The Random Forest model is trained in R, with c. 30 variables. The model is used to recommend "personalised" prices for products as part of a sales process.
The model needs to be distributed to the sales network, with about 350 users. The business's preference is to integrate the model into an existing spreadsheet tool that sales teams currently use to calculate deal profitability.
This means that I need to be able to export the model in a way that it can be implemented in Excel VBA.
Given timescales, the implementation needs to be self-contained with no IT infrastructure or additional application installs. We are working with the organisation's IT team on a server based solution, however their deployment timescales are 12 months+ which means we need a tactical solution in the short-term.
Here's one approach to get the "rules" for the trees (example using the mtcars dataset)
# install.packages("randomForest")  # run once if not yet installed
library(randomForest)
head(mtcars)
set.seed(1)
fit <- randomForest(mpg ~ ., data=mtcars, importance=TRUE, proximity=TRUE)
print(fit)
## Look at variable importance:
importance(fit)
# Print the rules for each tree in the forest
# install.packages("rattle")  # run once if not yet installed
library(rattle)
printRandomForests(fit)
It is probably unrealistic to use the rules for 500 trees, but maybe you could implement 100 trees in your VBA and then take the average of the results (for a continuous response) or predict the class with the most votes across the trees (for a categorical response).
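Instead of printed rules, each tree can also be exported as a node table with randomForest's `getTree()`, which a VBA routine could read from a worksheet. The sketch below, which assumes the same `mtcars` regression, writes one CSV per tree and then walks the tables in plain R the way a VBA loop would, so you can confirm the exported tables reproduce `predict()` before porting the loop. The file names and the helpers `predict_tree()` / `predict_forest()` are hypothetical; `ntree = 50` is arbitrary.

```r
library(randomForest)

set.seed(1)
fit <- randomForest(mpg ~ ., data = mtcars, ntree = 50)
predictors <- rownames(fit$importance)  # predictor order used inside the forest

# Export every tree as a node table (one row per node).
out_dir <- file.path(tempdir(), "rf_trees")
dir.create(out_dir, showWarnings = FALSE)
trees <- lapply(seq_len(fit$ntree), function(k) getTree(fit, k = k, labelVar = FALSE))
for (k in seq_along(trees)) {
  write.csv(trees[[k]], file.path(out_dir, sprintf("tree_%03d.csv", k)),
            row.names = FALSE)
}

# The traversal a VBA routine would need: start at node 1, go to the left
# daughter when x <= split point, stop at a leaf (left daughter == 0).
predict_tree <- function(tree, x) {
  node <- 1
  while (tree[node, "left daughter"] != 0) {
    var <- tree[node, "split var"]
    node <- if (x[var] <= tree[node, "split point"]) {
      tree[node, "left daughter"]
    } else {
      tree[node, "right daughter"]
    }
  }
  tree[node, "prediction"]
}

# Regression forest: average the individual tree predictions.
predict_forest <- function(trees, x) {
  mean(vapply(trees, predict_tree, numeric(1), x = x))
}

x_new <- unlist(mtcars[1, predictors])
manual <- predict_forest(trees, x_new)
builtin <- unname(predict(fit, newdata = mtcars[1, ]))
print(c(manual = manual, randomForest = builtin))
```

For a classification forest the last step would be a majority vote over the tree predictions instead of a mean.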
Maybe you could recreate the model on a worksheet.
As far as I know, Excel can import XML structures (from the Developer tab).
Edit:
1) Save the PMML structure as an .xml file in a plain-text editor.
2) Open the file in Excel 2013 (other versions may also work).
3) Click through the error message and open the file anyway. The trees open as a table; it looks a bit odd, but it is recognizable.
4) Write a prediction calculation (a generic function in VBA) that operates on the tree.
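The .xml file for step 1 can also be written straight from R. A minimal sketch, assuming the `pmml` package (whose `pmml()` returns an `XML::XMLNode`) and the `XML` package's `saveXML()`; the file name is hypothetical, and `ntree = 10` is kept small because every tree becomes a nested segment in the XML.

```r
library(randomForest)
library(pmml)  # pmml() converts supported models to an XML::XMLNode
library(XML)   # saveXML() writes that node to disk

set.seed(1)
fit <- randomForest(mpg ~ ., data = mtcars, ntree = 10)

# Write the forest as PMML to a file Excel can open.
xml_path <- file.path(tempdir(), "rf_mtcars.xml")
saveXML(pmml(fit), file = xml_path)
```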

How to retrain model using old model + new data chunk in R?

I'm currently working on trust prediction in social networks; for obvious reasons I model this problem as a data stream. What I want to do is "update" my trained model using the old model plus a new chunk of the data stream. The classifiers I am using are SVM, NB (e1071 implementation), neural network (nnet) and the C5.0 decision tree.
Sidenote: I know this is possible with the RMOA package by defining the "model" argument in the trainMOA function, but I don't think I can use it with these classifier implementations (if I am wrong, please correct me).
According to the strange SO rules, I can't post this as a comment, so here it is.
The classifiers you've listed need the full data set at training time, so whenever new data comes in, you should combine it with the previous data and retrain the model. What you are probably looking for is online machine learning. One very popular implementation is Vowpal Wabbit, which also has bindings for R.
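A minimal sketch contrasting the two approaches on a simulated stream. The accumulate-and-refit part uses e1071's `naiveBayes()`, as in the question. The online part is a swapped-in, hand-rolled logistic regression updated by one stochastic-gradient pass per chunk; it only illustrates the idea of online learning and is not Vowpal Wabbit or any of its R bindings. The helper names, chunk sizes and learning rate are all hypothetical.

```r
library(e1071)

# Batch learners have no update step: keep all rows and refit on old + new.
retrain_with_chunk <- function(old_data, new_chunk, formula) {
  all_data <- rbind(old_data, new_chunk)
  list(model = naiveBayes(formula, data = all_data), data = all_data)
}

# Online alternative: one SGD pass over the chunk updates the weights,
# so old observations never need to be stored.
sgd_update <- function(w, X, y, lr = 0.1) {
  for (i in seq_len(nrow(X))) {
    xi <- c(1, X[i, ])                 # prepend the intercept term
    p <- 1 / (1 + exp(-sum(w * xi)))   # current predicted probability
    w <- w - lr * (p - y[i]) * xi      # gradient step on the log-loss
  }
  w
}

# Simulated stream with a linear class boundary x1 + x2 = 0.
set.seed(42)
make_chunk <- function(n) {
  X <- matrix(rnorm(n * 2), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
  data.frame(X, y = factor(as.integer(X[, 1] + X[, 2] > 0)))
}

w <- c(0, 0, 0)
state <- NULL
for (k in 1:5) {
  chunk <- make_chunk(200)
  w <- sgd_update(w, as.matrix(chunk[, c("x1", "x2")]),
                  as.integer(as.character(chunk$y)))
  state <- if (is.null(state)) {
    list(model = naiveBayes(y ~ x1 + x2, data = chunk), data = chunk)
  } else {
    retrain_with_chunk(state$data, chunk, y ~ x1 + x2)
  }
}

# Accuracy of both models on fresh data from the same stream.
test_chunk <- make_chunk(500)
X_test <- cbind(1, as.matrix(test_chunk[, c("x1", "x2")]))
acc_sgd <- mean(as.vector(X_test %*% w > 0) == (test_chunk$y == "1"))
acc_nb <- mean(predict(state$model, test_chunk) == test_chunk$y)
print(c(sgd = acc_sgd, naive_bayes = acc_nb))
```

The trade-off is memory versus flexibility: the refit approach keeps the full history but works with any classifier, while the online update keeps only the weight vector.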
