I see very large file sizes when I try to save my trained neural network model built with neuralnet in R. I know one main reason for that is that the model object contains extra information I may not need in order to reuse the model, such as the fitted results (net$net.result) and some other elements that are saved with the network.
Is there any way to remove them from the model before saving?
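A minimal sketch of one possible approach, assuming the dropped components are not needed later by compute()/predict() (worth verifying on a small sample first):
library(neuralnet)

slim_net <- net                       # `net` is the fitted neuralnet object
slim_net$net.result          <- NULL  # fitted outputs for the training data
slim_net$response            <- NULL  # training responses
slim_net$covariate           <- NULL  # training inputs
slim_net$data                <- NULL  # copy of the training data
slim_net$generalized.weights <- NULL  # per-observation generalized weights
slim_net$startweights        <- NULL  # initial random weights

saveRDS(slim_net, "slim_net.RDS")     # hypothetical file name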
Related
I am developing an image classification workflow which uses keras through R. The workflow will likely be run multiple times, potentially by multiple users. I save a custom-trained version of keras' Inception v3 (Iv3) model as a .h5 file.
Once the file is saved and loaded back in with load_model_hdf5(), is there a way to see the class labels with which the model has been trained?
I understand that the classes are the alphabetized names of the folders in the training directory but there will be cases where the model is loaded on a new machine without access to the training directory.
Right now I am manually loading in the list of class labels (as strings) which is not a good solution.
Ideally, I would load in my trained model and then access a list of class labels...
Pseudocode might look like this:
model_fn <- "path/to/model.h5"      # some example model file (.h5)
model <- load_model_hdf5(model_fn)
classes <- model$classes            # the list of class labels I would like to access
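A workaround sketch (this is not part of the keras API): save the class labels in a companion file next to the .h5 and load the pair back together. Here class_labels is assumed to come from the training generator, e.g. names(train_generator$class_indices).
library(keras)

save_model_with_classes <- function(model, class_labels, model_fn) {
  save_model_hdf5(model, model_fn)                         # the model itself
  saveRDS(class_labels, paste0(model_fn, ".classes.RDS"))  # its class labels
}

load_model_with_classes <- function(model_fn) {
  list(model   = load_model_hdf5(model_fn),
       classes = readRDS(paste0(model_fn, ".classes.RDS")))
}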
This is a long shot and more of a code-design question from a rookie like me, but I think it has real value for real-world applications.
The core questions are:
Can I save a trained ML model, such as Random Forest (RF), in R and call/use it later without the need to reload all the data used for training it?
When, in real life, I have a massive folder with hundreds of thousands of data files to be tested, can I load that saved model somewhere in R and have it read the unknown files one by one (so I am not limited by RAM size), perform regression/classification analysis on each file as it is read, and store ALL the output together in one file?
For example,
If I have 100,000 csv files of data in a folder and I want to use 30% of them as the training set and the rest as the test set for a Random Forest (RF) classification,
I can select the files of interest (call them "control files"), read them with fread(), randomly sample 50% of the data in those files, and then call the caret or randomForest library to train my model:
model <- train(x, y, method = "rf")
Now can I save the model somewhere? So I don't have to load all the control files each time I want to use the model?
Then I want to apply this model to all the remaining csv files in the folder, and I want it to read those csv files one by one while applying the model, instead of reading them all in at once, because of RAM limitations.
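A minimal sketch of both steps, assuming hypothetical file paths and a model trained as above: save the model once with saveRDS(), then score the remaining csv files one at a time so only a single file is in memory at any point.
library(data.table)
library(caret)

saveRDS(model, "rf_model.RDS")            # train once, save once

model <- readRDS("rf_model.RDS")          # later, or on another machine
test_files <- list.files("data_folder", pattern = "\\.csv$", full.names = TRUE)

for (f in test_files) {
  dt   <- fread(f)                        # read a single csv file
  pred <- predict(model, newdata = dt)    # apply the saved model
  fwrite(data.table(file = f, prediction = pred),
         "all_predictions.csv", append = TRUE)   # append results to one output file
}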
I have a Sound Classification model from turicreate example here:
https://apple.github.io/turicreate/docs/userguide/sound_classifier/
I am trying to split this model into two and save the two parts as separate CoreML Models using coremltools library. Can anyone please guide me on how to do this?
I am able to load the model and even print out the spec of the model, but I don't know where to go from here.
import coremltools
mlmodel = coremltools.models.MLModel('./EnvSceneClassification.mlmodel')
# Get spec from the model
spec = mlmodel.get_spec()
Output should be two CoreML Models i.e. the above model split into two parts.
I'm not 100% sure on what the sound classifier model looks like. If it's a pipeline, you can just save each sub-model from the pipeline as its own separate mlmodel file.
If it's not a pipeline, it requires some model surgery. You will need to delete layers from the spec (with del spec.neuralNetworkClassifier.layers[a:b]).
You'll also need to change the inputs of the first model and the outputs of the second model to account for the deleted layers.
I created a model based on a very large dataset and had the program save the results using
saveRDS(featVarLogReg.mod, file="featVarLogReg.mod.RDS")
Now I'm trying to load the model to evaluate, but readRDS runs out of memory.
featVarLR.mod <- readRDS(file = "featVarLogReg.mod.RDS")
Is there a way to load the file that takes less memory? Or at least the same amount of memory that was used to save it?
The RDS file ended up being 1.5GB in size for this logistic regression using caret. My other models using the same dataset and very similar caret models were 50MB in size so I can load them.
The caret linear model saves the training data in the model object. You could try setting returnData = FALSE in the trainControl object passed to train. I don't recall whether this fixed my issue in the past.
https://www.rdocumentation.org/packages/caret/versions/6.0-77/topics/trainControl
You could also try to just export the coefficients into a dataframe and use a manual formula to score new data.
Use coef(model_object)
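A short sketch of both suggestions, assuming a caret logistic regression like the one above (object, data frame, and column names are hypothetical):
library(caret)

ctrl <- trainControl(method = "cv", number = 5, returnData = FALSE)
featVarLogReg.mod <- train(outcome ~ ., data = train_df,
                           method = "glm", family = "binomial",
                           trControl = ctrl)
saveRDS(featVarLogReg.mod, file = "featVarLogReg.mod.RDS")  # should now be far smaller

# Or export only the coefficients and score new data manually, e.g.:
coefs <- coef(featVarLogReg.mod$finalModel)
# p <- 1 / (1 + exp(-(coefs["(Intercept)"] + as.matrix(new_df[, names(coefs)[-1]]) %*% coefs[-1])))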
I'm currently working on trust prediction in social networks; for obvious reasons I model this problem as a data stream. What I want to do is "update" my trained model using the old model plus a new chunk of the data stream. The classifiers I am using are SVM and Naive Bayes (e1071 implementations), a neural network (nnet), and a C5.0 decision tree.
Sidenote: I know this is possible using the RMOA package by defining the "model" argument in the trainMOA function, but I don't think I can use it with these classifier implementations (if I am wrong, please correct me).
According to strange SO rules, I can't post this as a comment, so be it.
The classifiers you've listed need the full data set at the time you train a model, so whenever new data comes in, you should combine it with the previous data and retrain the model. What you are probably looking for is online machine learning. One of the very popular implementations is Vowpal Wabbit, which also has bindings to R.
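For the classifiers listed above, the "update" step is effectively combine-and-refit; a minimal sketch with e1071's svm(), using hypothetical data frame and column names:
library(e1071)

model <- svm(label ~ ., data = old_data)      # initial model on the data so far

# when a new chunk of the stream arrives:
all_data <- rbind(old_data, new_chunk)        # combine old and new observations
model    <- svm(label ~ ., data = all_data)   # retrain from scratch
old_data <- all_data                          # keep the combined data for the next update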