Getting class labels from a keras model in R

I am developing an image classification workflow that uses keras through R. The workflow will likely be run multiple times, potentially by multiple users. I save a custom-trained version of keras' Inception V3 (Iv3) model as a .h5 file.
Once the file is saved and loaded back in with load_model_hdf5(), is there a way to see the class labels with which the model has been trained?
I understand that the classes are the alphabetized names of the folders in the training directory, but there will be cases where the model is loaded on a new machine without access to the training directory.
Right now I am manually loading the list of class labels (as strings), which is not a good solution.
Ideally, I would load in my trained model and then access a list of class labels...
Pseudocode might look like this:
model_fn <- "my_model.h5"   # some example model file (.h5)
model <- load_model_hdf5(model_fn)
classes <- model$classes    # desired accessor -- does something like this exist?
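A saved .h5 file does not carry the class labels itself, so one common workaround is to persist the labels in a small file that travels with the model and load the two together. Below is a minimal sketch of that idea; the helper names (save_model_with_classes, load_model_with_classes) and the file name "my_model.h5" are hypothetical and not part of keras.

library(keras)

# Hypothetical helpers: keep the class labels in an .rds file next to the .h5
# model, so a new machine never needs access to the training directory.
save_model_with_classes <- function(model, class_names, model_fn) {
  save_model_hdf5(model, model_fn)
  saveRDS(class_names, sub("\\.h5$", "_classes.rds", model_fn))
}

load_model_with_classes <- function(model_fn) {
  list(
    model   = load_model_hdf5(model_fn),
    classes = readRDS(sub("\\.h5$", "_classes.rds", model_fn))
  )
}

# Usage (class names can be taken from the training generator,
# e.g. names(train_gen$class_indices)):
# save_model_with_classes(model, class_names, "my_model.h5")
# loaded <- load_model_with_classes("my_model.h5")
# loaded$classes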

Related

How to save my trained Random Forest model and apply it to test data files one by one?

This is a long shot and more of a code-design question from a rookie like me, but I think it has real value for real-world applications.
The core questions are:
Can I save a trained ML model, such as Random Forest (RF), in R and call/use it later without the need to reload all the data used for training it?
When, in real life, I have a massive folder with hundreds of thousands of data files to be tested, can I load the model I saved somewhere in R and ask it to read the unknown files one by one (so I am not limited by RAM size), perform regression/classification analysis on each file read in, and store ALL the output together in one file?
For example,
Say I have 100,000 csv files of data in a folder, and I want to use 30% of them as a training set and the rest as a test set for a Random Forest (RF) classification.
I can select the files of interest and call them "control files". Then I use fread(), randomly sample 50% of the data in those files, call the caret or randomForest library, and train my "model":
model <- train(x = x, y = y, method = "rf")
Now can I save the model somewhere? So I don't have to load all the control files each time I want to use the model?
Then I want to apply this model to all the remaining csv files in the folder, and I want it to read those files one by one when applying the model, instead of reading them all in at once, because of RAM limits. A sketch of this workflow follows below.
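A minimal sketch of how this could look. The file and object names (x, y already built from the control files, "data_folder", "rf_model.rds", "all_predictions.csv") are placeholders, not anything from the original post:

library(caret)
library(data.table)

# Train once on the control files, then persist the fitted model to disk.
model <- train(x = x, y = y, method = "rf")
saveRDS(model, "rf_model.rds")

# Later (possibly in a fresh R session): reload the model without the training data.
model <- readRDS("rf_model.rds")

# Score the remaining csv files one at a time so only one file is in RAM at once.
test_files <- list.files("data_folder", pattern = "\\.csv$", full.names = TRUE)
out_file <- "all_predictions.csv"
for (f in test_files) {
  dat <- fread(f)
  preds <- predict(model, newdata = dat)
  fwrite(data.table(file = f, prediction = preds), out_file, append = TRUE)
}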

How should trained ML models be incorporated into a package that USES those trained models

I have been working on an ML project (done inside an R project) that resulted in some ML models (built with caret) ALONG WITH code that uses those models for additional analysis.
As the next phase, I am "deploying" these models by creating an R package that my collaborators can use for analysis of new data, where that analysis includes USING the trained ML models. This package includes functions that generate reports, and embedded in those reports is the application of the trained ML models to the new data sets.
I am trying to identify the "right" way to include those trained models in the package. (Note, currently each model is saved in its own .rds file).
I want to be able to use those models inside of package functions.
I also want to consider the possibility of "updating" the models to a new version at a later date.
So ... should I:
1. Include the .rda files in inst/extdata
2. Include them as part of sysdata.rda
3. Put them in an external data package (which seems reasonable, except that almost all examples in tutorials expect a data package to include data.frame-ish objects)?
With regard to that third option ... I note that these models likely imply some additional NAMESPACE issues, as the models will require a whole bunch of caret-related machinery to be usable. Does that NAMESPACE modification belong in the "data" package or in the package that I am building that will "use" the models?
My first intention is to go for option 1. There is no need for other formats such as PMML, since you only want to run the models within R, so I consider .rda natively best. As long as your models are not huge, it should be fine to share them with collaborators (but maybe not for a CRAN package). I see that option 3 sounds convenient, but why separate models and functions? Freshly trained models would then come with a new package version anyway, as you would still need a new release of the data package. I don't see much gained this way, but I do not have much experience with data packages.
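For option 1, a package function can locate a model shipped under inst/extdata via system.file() at run time. A minimal sketch, assuming a hypothetical package name (mypredictor) and model file (rf_model.rds), neither of which comes from the original post:

# R/predict_new_data.R in the hypothetical 'mypredictor' package
# (caret would be listed in Imports so that predict() dispatches correctly)

#' Apply the bundled caret model to new data
#' @param newdata a data.frame with the predictors the model expects
#' @export
predict_new_data <- function(newdata) {
  model_path <- system.file("extdata", "rf_model.rds",
                            package = "mypredictor", mustWork = TRUE)
  model <- readRDS(model_path)
  predict(model, newdata = newdata)
}

Updating the models later then just means replacing the .rds files under inst/extdata and bumping the package version.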

R - is it possible to save model definition without data?

I have an app that can fit some predefined models (let's say an lm model). The app is hosted on a server. Now I would like to add functionality so that a user could define any model "on the side" (arima, or anything user-defined), add it to the app, and then calculate estimates using that model.
The best solution would be if the user could define the model in their own R instance, export it to a file, and import it via the front end on the server. That would be best for me, because the user does not need any permissions on the server.
I was thinking about saving the model definition as an RDS file and then importing it into the app. However, if the model is saved via:
modelTest <- glm(y ~ x, data = df)
saveRDS(modelTest, file = "modelTest.rds")
And then after import:
modelTest2 <- readRDS("modelTest.rds")
df2$prediction <- predict(modelTest2, newdata=df2)
In the above example, the whole glm object is saved. That means the predicted (fitted) values are also saved, so the file can be large when many of them are stored. Is it possible to use another method and save only the model definition, without the data?
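A commonly used workaround for glm specifically is to null out the components that hold copies of the data before saving, keeping only what predict() needs for new data. A rough sketch of that idea (not from the original thread; dropping qr$qr means se.fit will no longer work, so verify predictions on a small example before relying on it):

# Strip the data-sized components from a fitted glm before serialising it.
strip_glm <- function(fit) {
  fit$y                 <- NULL
  fit$model             <- NULL
  fit$data              <- NULL
  fit$residuals         <- NULL
  fit$fitted.values     <- NULL
  fit$effects           <- NULL
  fit$linear.predictors <- NULL
  fit$weights           <- NULL
  fit$prior.weights     <- NULL
  fit$qr$qr             <- NULL   # keep the pivot info, drop the big matrix
  # formulas/terms capture their environment, which can drag the data along
  attr(fit$terms, ".Environment")   <- NULL
  attr(fit$formula, ".Environment") <- NULL
  fit
}

modelTest <- glm(y ~ x, data = df)
saveRDS(strip_glm(modelTest), file = "modelTest.rds")

modelTest2 <- readRDS("modelTest.rds")
df2$prediction <- predict(modelTest2, newdata = df2, type = "response")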

Split a pre-trained CoreML model into two

I have a Sound Classification model from the turicreate example here:
https://apple.github.io/turicreate/docs/userguide/sound_classifier/
I am trying to split this model into two and save the two parts as separate Core ML models using the coremltools library. Can anyone please guide me on how to do this?
I am able to load the model and even print out its spec, but I don't know where to go from here.
import coremltools
mlmodel = coremltools.models.MLModel('./EnvSceneClassification.mlmodel')
# Get spec from the model
spec = mlmodel.get_spec()
The output should be two Core ML models, i.e. the above model split into two parts.
I'm not 100% sure on what the sound classifier model looks like. If it's a pipeline, you can just save each sub-model from the pipeline as its own separate mlmodel file.
If it's not a pipeline, it requires some model surgery. You will need to delete layers from the spec (with del spec.neuralNetworkClassifier.layers[a:b]).
You'll also need to change the inputs of the first model and the outputs of the second model to account for the deleted layers.

Large neural network model size in neuralnet R

I see very large file sizes when I try to save my trained NN model built with neuralnet in R. I know one (main) reason for this is that the model object contains extra information that I may not need in order to reuse the model, such as the results (net$net.result) and some other elements that are saved with the network.
Is there any way I can remove them from the model before saving?
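One possible approach (a sketch, not taken from the original thread): set the bulky list elements to NULL before saving, keeping the pieces that compute()/predict() actually uses (the weights, activation function, and model description). Which elements are safe to drop depends on the neuralnet version, so check that predictions still match on a small example:

library(neuralnet)

# Drop the training-data-sized pieces from a fitted neuralnet object.
slim_nn <- function(net) {
  net$net.result          <- NULL  # fitted values for the training data
  net$generalized.weights <- NULL
  net$data                <- NULL  # copy of the training data
  net$covariate           <- NULL
  net$response            <- NULL
  net$startweights        <- NULL
  net
}

# net <- neuralnet(y ~ x1 + x2, data = train_df, hidden = c(5, 3))
# saveRDS(slim_nn(net), "nn_model.rds")
# net2 <- readRDS("nn_model.rds")
# preds <- compute(net2, new_df[, c("x1", "x2")])$net.result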

Resources