I am currently building an R package and I want to use a trained model in one of my R scripts. Is it possible to load the model (saved as an .rds file)?
Yes, it works exactly the way you described: save the object with saveRDS and load it with readRDS. You just have to remember to load every package the model depends on, and to prepare the new data in exactly the same way as the training data before predicting.
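A minimal sketch of that round trip (the file name, model object, and model package are placeholders):

# At training time: persist the fitted model to disk.
saveRDS(fitted_model, "model.rds")

# In the script or package that consumes it: load whichever package
# the model class comes from, then restore the object and predict.
library(randomForest)  # placeholder; use the package that fit the model
model <- readRDS("model.rds")
preds <- predict(model, newdata = new_data)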
Related
I am writing an R package with very large internal data consisting of several models created with caret, which add up to almost 2 GB. The idea is that this package will live on my computer exclusively, and several other packages I built will be able to use it to make predictions. My computer has plenty of memory, but I can't figure out how to set up the package so that the models work efficiently.
I can install the package successfully if I store the large models in inst/extdata, set lazy loading to false in the DESCRIPTION file, and load the models inside the function that uses them. (I think I could also do this by putting the models in the data directory, turning off lazy loading, and loading them inside the function.) But this is very slow, since my other packages call the prediction function repeatedly and it has to load the models every time. It would work much better if the models were loaded along with the package and just stayed in memory, as in the sketch below.
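A minimal sketch of that idea (my illustration, not the asker's code), assuming the .rds files sit in inst/extdata of a package here called "bigmodels": cache each model in a package-local environment so it is read from disk only on the first call.

# Package-local cache; lives as long as the package namespace is loaded.
.model_cache <- new.env(parent = emptyenv())

# Hypothetical accessor: read a model from inst/extdata once, then reuse it.
get_model <- function(name) {
  if (!exists(name, envir = .model_cache, inherits = FALSE)) {
    path <- system.file("extdata", paste0(name, ".rds"), package = "bigmodels")
    assign(name, readRDS(path), envir = .model_cache)
  }
  get(name, envir = .model_cache, inherits = FALSE)
}

# Prediction functions then call get_model("model_a") instead of readRDS().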
Other things I have tried made it so that the package couldn't be installed at all; when I try, I get the error "long vectors not supported yet." These include:
storing the models in inst/extdata with lazy loading
storing the models in R/sysdata.rda (with or without lazy loading)
storing the models in the data directory (so they're exported) with lazy loading
Is there a better way to do this that keeps the models loaded when the package is loaded? Or is there some better alternative to using an R package?
I have been working on an ML project whose work (done inside an R project) resulted in some ML models (built with caret) ALONG WITH code that uses those models for additional analysis.
As the next phase, I am "deploying" these models by creating an R package that my collaborators can use for analysis of new data, where that analysis includes USING the trained ML models. This package includes functions that generate reports; embedded in each report is the application of the trained ML models to the new data sets.
I am trying to identify the "right" way to include those trained models in the package. (Note, currently each model is saved in its own .rds file).
I want to be able to use those models inside of package functions.
I also want to consider the possibility of "updating" the models to a new version at a later date.
So ... should I:
Include the .rds files in inst/extdata
Include as part of sysdata.rda
Put them in an external data package (which seems reasonable, except almost all examples in tutorials expect a data package to include data.frame-ish objects)
With regard to that third option ... I note that these models likely imply some additional NAMESPACE issues, as the models will require a whole bunch of caret-related stuff to be usable. Is that NAMESPACE modification required to be in the "data" package or in the package that I am building that will "use" the models?
My first intention would be to go for 1. There is no need for other formats such as PMML, as you only want to run the models within R, so I consider .rds/.rda natively best. As long as your models are not huge, it should be fine to share them with collaborators (but maybe not for a CRAN package). I see that 3. sounds convenient, but why separate models and functions? Freshly trained models would then come with a new package version anyway, just as they would with a data package. I don't see much gain this way, but I don't have much experience with data packages. A sketch of option 1 follows.
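A minimal sketch of option 1 (file and package names are placeholders): ship each model as an .rds under inst/extdata and resolve it with system.file() inside the package function.

# At build time (run once, from the package source directory):
saveRDS(trained_model, file = "inst/extdata/model_v1.rds")

# Inside the package that uses the model:
predict_new_data <- function(newdata) {
  path <- system.file("extdata", "model_v1.rds", package = "mypackage")
  model <- readRDS(path)
  predict(model, newdata = newdata)  # dispatches to caret's predict method
}

Updating a model later then just means replacing the .rds file and bumping the package version.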
How do I save multiple formulas in an R package? Currently I have to run the as.formula code in order to retrieve the models; it would be great if I could package them all in R.
I have saved package data as .rda before but don't know how to deal with the "formula" class. Thanks!
Update: I saved the formulas in a named list as an .rda file, and now I can call them from the package. Only the formulas are needed, not the environment, in my case. A sketch of this approach is below.
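A minimal sketch of what that update describes (the formula names are made up): collect the formulas in a named list, strip their environments since they are not needed here, and save the list as package data.

# Build the formulas once, e.g. in a data-raw/ script:
formulas <- list(
  model_a = as.formula("y ~ x1 + x2"),
  model_b = as.formula("y ~ x1 * x3")
)

# Formulas capture their creation environment; drop it so nothing
# extra gets serialized (the asker noted the environment is not needed).
formulas <- lapply(formulas, function(f) {
  environment(f) <- emptyenv()
  f
})

save(formulas, file = "data/formulas.rda")
# After loading the package: formulas$model_a, formulas$model_b, ...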
I am working on the "randomForest" R package to change the sampling method for feature subset selection at the nodes of trees in the forest. Currently randomForest uses simple random sampling for that. I tried to look at the R code using the commands
library(randomForest)
getAnywhere(randomForest.default)
but could not find the relevant code chunk where the "mtry" features are selected. How can I make this change in the source code?
I also tried using the S3 and S4 methods described in this SO question, but did not see all the functions in the randomForest package, and, more importantly, did not see the randomForest() method listed.
However, if you navigate to the CRAN page for randomForest, you will see a link to the source code for the package:
https://cran.r-project.org/web/packages/randomForest/index.html
You can download a tar file with all the source code for the package from the above link. The compiled C source code is in the src folder; rf.c looks like it might be the file you want to refactor. A sketch of fetching and unpacking it from within R is below.
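A minimal sketch of getting at those files from R itself (the assumption that rf.c is the relevant file is mine, based on the file name):

# Fetch the source tarball for randomForest and unpack it.
info <- download.packages("randomForest", destdir = tempdir(),
                          type = "source")
untar(info[1, 2], exdir = tempdir())  # second column is the tarball path

# The C implementation (including the per-node feature sampling) is
# under src/; rf.c is a likely starting point.
list.files(file.path(tempdir(), "randomForest", "src"))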
I created an XML file using the pmml function from the pmml library in R:
library(ada)  # provides ada(); ntrees and defctrl are defined earlier in my script
library(pmml) # pmml() converts the fitted model to PMML
library(XML)  # saveXML() writes the PMML document to disk

adamodel_iOS <- ada(label ~ ., data = train_iOS, iter = ntrees, verbose = TRUE,
                    loss = "ada", bag.frac = 0.7, nu = 0.1, control = defctrl,
                    type = "real")
Ptrain_iOS <- predict(adamodel_iOS, newdata = train_iOS, type = "prob")
adapmml_iOS <- pmml(adamodel_iOS)
saveXML(adapmml_iOS, "model_iOS.xml")
save.image()
After training the model in the ada() call, I found the corresponding probabilities for the training data.
Now I want to use this XML file to generate predictions on a set of data (basically the training set again). How do I do that in R? I see that in Java and Spark we can load the XML file generated by the pmml function, and there are functions that can make predictions from it.
Basically, I am looking for a function in R that can take this XML file as input and return an object which, in turn, takes some data points as input and returns their probabilities of having label 0 and 1.
I found a link:
Can PMML models be read in R?
but it does not help.
Check this link for the list of PMML producers and consumers. As you can see, R is listed as a producer, not a consumer. The algorithms for which R can produce the corresponding PMML files are also listed.
The most comprehensive tool for validating and converting PMML, and for scoring data using PMML models, is ADAPA, which is not free.
KNIME is an open-source drag-and-drop analytics tool which supports both import and export of PMML files (not for all models, and the features are limited). It supports R, Python, and Java too.
Although it's a long time ago, I still want to share that you can use "reticulate" to call the Python pypmml package to implement your idea in R. To be more friendly, so that the prediction looks more like the predict function in R, I have encapsulated it in a package; the address of the package is here: enter link description here
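A minimal sketch of that reticulate route (the wrapper package itself is not shown; this assumes pypmml is available to the active Python environment, e.g. after reticulate::py_install("pypmml")):

library(reticulate)

# Load the PMML file written earlier by saveXML().
pypmml <- import("pypmml")
model <- pypmml$Model$load("model_iOS.xml")

# Score an R data frame; reticulate converts it to a pandas DataFrame,
# and pypmml returns the per-row label probabilities.
preds <- model$predict(r_to_py(train_iOS))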