Loading multiple .rda files into a list in R

I have run various models (glm, rpart, earth, etc.) and exported the model object from each one into a folder on my computer, so I now have a folder with ~60 different models stored as separate .rda files.
This was done by writing a model function and applying it to a vector of model types with purrr's map(), wrapped in possibly() so that one failing model does not stop the whole run.
I now want to load them back into R and compare them. Unfortunately, when I wrote my initial model script, every model was stored under the same name, "Model.Object" (I didn't know how to do otherwise), so when I load the files one at a time into R, each one overwrites the previous one. The files are named glm.rda, rpart.rda, earth.rda, etc., but the object inside each of them is called Model.Object.
So I have two questions:
1. Is it possible to load multiple .rda files into R as a list that can then be indexed?
2. How can I alter the model function so that the saved object is named after the model type (e.g. glm, rpart) instead of Model.Object?
Code:
library(caret)
library(purrr)

Model.Function = function(Model.Type){
set.seed(0)
# Train one caret model of the requested type
Model.Object = train(x = Pred.Vars.RVC.Data, y = RVC, trControl = Tcontrolparam,
preProcess = Preprocessing.Options, tuneLength = 1, metric = "RMSE",
method = Model.Type)
# Every file stores the object under the same name, Model.Object
save(Model.Object, file = paste("./RVC Models/",Model.Type,".rda", sep = ""))
return(Model.Object)
}
# possibly() returns the 'otherwise' value instead of stopping on an error
Possibly.Model.Function = possibly(Model.Function, otherwise = "something wrong here")
result.possible = map(c("glm","rpart","earth"), Possibly.Model.Function)

For now, a rescue operation for your existing files might look something like this (following @nicola's comment about using the envir argument to load()):
rda2list <- function(file) {
e <- new.env()
load(file, envir = e)
as.list(e)
}
folder <- "./RVC Models"
files <- list.files(folder, pattern = "\\.rda$")  # escape the dot so it matches a literal "."
models <- Map(rda2list, file.path(folder, files))
names(models) <- tools::file_path_sans_ext(files)
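A self-contained sketch that exercises rda2list() on a throwaway folder. The lm()/glm() fits on mtcars stand in for the caret models, and a temporary folder replaces ./RVC Models; both are assumptions made only so the example runs anywhere. The last step drops the extra list level, assuming every file holds exactly one object named Model.Object.

```r
# Recreate the situation: two files, both storing an object called Model.Object
folder <- tempfile("RVC_Models")
dir.create(folder)
Model.Object <- glm(mpg ~ wt, data = mtcars)
save(Model.Object, file = file.path(folder, "glm.rda"))
Model.Object <- lm(mpg ~ hp, data = mtcars)
save(Model.Object, file = file.path(folder, "lm.rda"))
rm(Model.Object)

# Load each file into its own environment, so nothing gets overwritten
rda2list <- function(file) {
  e <- new.env()
  load(file, envir = e)
  as.list(e)
}
files <- list.files(folder, pattern = "\\.rda$")
models <- Map(rda2list, file.path(folder, files))
names(models) <- tools::file_path_sans_ext(files)

# Each element is a one-item list; pull Model.Object out of each
flat <- lapply(models, `[[`, "Model.Object")
class(flat$glm)  # "glm" "lm"
```

With this, `flat$glm` or `flat[["lm"]]` index the models by file name.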
Going forward, it would be easier to save your models as .rds files with saveRDS() rather than with save(). Because readRDS() returns the object itself, you choose its name when you load the file. See e.g. this question and answer for more details on the matter.
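A minimal sketch of both routes, again with lm()/glm() fits on mtcars standing in for the caret models. `save_as_type()` is a hypothetical helper, not part of any package. The second route also answers question 2: `save(list = ...)` writes an object under whatever name the character string holds.

```r
# Stand-in models; in the question these would be caret train() results
fit_models <- list(
  glm = glm(mpg ~ wt, data = mtcars),
  lm  = lm(mpg ~ hp + wt, data = mtcars)
)
out_dir <- tempfile("models")
dir.create(out_dir)

# Route 1: one .rds file per model; the name is chosen when reading back
for (type in names(fit_models)) {
  saveRDS(fit_models[[type]], file.path(out_dir, paste0(type, ".rds")))
}
rds_files <- list.files(out_dir, pattern = "\\.rds$", full.names = TRUE)
reloaded <- lapply(rds_files, readRDS)
names(reloaded) <- tools::file_path_sans_ext(basename(rds_files))

# Route 2 (hypothetical helper): keep save()/load(), but store the object
# under its model type, e.g. an object called `lm` inside lm.rda
save_as_type <- function(object, type, dir) {
  path <- file.path(dir, paste0(type, ".rda"))
  assign(type, object)               # local variable named after the type
  save(list = type, file = path)     # save() looks it up in this function's frame
}
save_as_type(fit_models$lm, "lm", out_dir)

e <- new.env()
load(file.path(out_dir, "lm.rda"), envir = e)
ls(e)  # "lm"
```

Inside Model.Function the same idea would be `assign(Model.Type, Model.Object)` followed by `save(list = Model.Type, file = ...)`.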

Related

Where to put a sample(, J) function in R (inside or outside?) a regsubsets() which is the FUN in an lapply() so it only runs J possible models?

The end goal here is to run a random sample (without replacement) of J different possible regression models rather than all 2^k - 1 possible models as in a traditional All Subsets Regression aka Best Subset Regression (also sometimes called Exhaustive Regression) on each of I different csv file formatted datasets all located within the same file folder.
Here is my code (it is in my GitHub Repository for this project, it is called 'EER script'):
# Load all libraries needed for this script.
# The library specifically needed to run a basic ASR is the 'leaps' library.
library(dplyr)
library(tidyverse)
library(stats)
library(ggplot2)
library(lattice)
library(caret)
library(leaps)
library(purrr)
directory_path <- "~/DAEN_698/sample obs"
filepath_list <- list.files(path = directory_path, full.names = TRUE, recursive = TRUE)
# reformat the names of each of the csv file formatted datasets
DS_names_list <- basename(filepath_list)
DS_names_list <- tools::file_path_sans_ext(DS_names_list)
datasets <- lapply(filepath_list, read.csv)
# code to run a normal All Subsets Regression
ASR_fits <- lapply(datasets, function(i)
regsubsets(x = as.matrix(select(i, starts_with("X"))),
y = i$Y, data = i, nvmax = 15,
intercept = TRUE, method = "exhaustive"))
# summary() on the whole list only describes the list; apply it to each fit
ASR_fits_summary <- lapply(ASR_fits, summary)
This is the part where I am completely stuck. I got the above to run, and ASR_fits is a list with I elements, all of class 'regsubsets', which is exactly what I was hoping for. But that is still just a list of the estimates made by a traditional ASR. What I need to figure out is where to insert a sample(, J) call within this lapply() function so that each regsubsets() chooses the optimal model out of just J randomly selected models from the 2^k - 1 possible models, to make it computationally feasible.
I am guessing I will have to either nest another lapply within my current lapply function, or write a simple custom function that takes J random samples without replacement, but I just don't know at what step to put it.
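One way to read "where does sample() go" is: inside the function applied to each dataset, before any model is fit. regsubsets()'s method options (exhaustive, forward, backward, seqrep) do not include evaluating a random sample of subsets, so this sketch swaps in lm() on each sampled subset and keeps the fit with the lowest BIC. The helper name `random_subset_fit`, BIC as the criterion, and the limit of 30 predictors are assumptions; the X*/Y column convention comes from the question's code.

```r
# Hypothetical helper: evaluate J randomly chosen predictor subsets for one
# dataset (predictors start with "X", response is "Y"); return the best lm fit.
random_subset_fit <- function(df, J, criterion = BIC) {
  predictors <- grep("^X", names(df), value = TRUE)
  k <- length(predictors)
  stopifnot(k <= 30, J <= 2^k - 1)  # keeps subset codes within integer range
  # Each integer in 1..(2^k - 1) encodes one non-empty subset in its bits;
  # sample() draws J of these codes without replacement.
  codes <- sample(2^k - 1, J)
  bits <- as.integer(2^(seq_len(k) - 1))
  fits <- lapply(codes, function(code) {
    in_model <- bitwAnd(as.integer(code), bits) > 0
    lm(reformulate(predictors[in_model], response = "Y"), data = df)
  })
  scores <- vapply(fits, criterion, numeric(1))
  fits[[which.min(scores)]]
}

# Small synthetic demo with 5 predictors
set.seed(1)
demo <- data.frame(matrix(rnorm(100 * 5), ncol = 5,
                          dimnames = list(NULL, paste0("X", 1:5))))
demo$Y <- 2 * demo$X1 - demo$X3 + rnorm(100)
best <- random_subset_fit(demo, J = 10)
formula(best)

# In the question's workflow this would become something like:
# best_fits <- lapply(datasets, random_subset_fit, J = 500)
# names(best_fits) <- DS_names_list
```

Encoding subsets as integers means "without replacement" is handled by a single sample() call, so no nested lapply() is needed.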

How do I save my model to use in another project in mlr3?

I would like to divide my working pipeline into two parts:
One place (internal) where I benchmark and auto-tune the algorithms to select the final model.
Apply the selected models to new datasets (external).
For the second part, I need to save the resulting model object so that I can later call
model$predict_newdata() on it, transporting it without retraining the algorithm and without carrying the original training data along.
The following example illustrates the problem:
library("mlr3")
task = tsk("iris")
learner = lrn("classif.rpart")
learner$train(task, row_ids = 1:120)
predictions = learner$predict(task, row_ids = 121:150)
predictions
So far so good, but now I have to save this model into an object outside the R session, and of course this won't work:
store_model = learner$model  # only the bare rpart fit, without the mlr3 learner around it
save(store_model, file = 'model_rpart.RData')
The solution is to save the whole learner object as an .rds file, as Brian suggested:
saveRDS(learner, 'learner_rpart.rds')
model <- readRDS('learner_rpart.rds')
predictions = model$predict(task, row_ids = 121:150)
predictions$confusion
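For the external part, a sketch that calls predict_newdata() on the reloaded learner with a plain data frame, so no task built from the training data is needed. It assumes a recent mlr3 in which the trained learner keeps enough task metadata for predict_newdata() to work without a task argument; iris rows 121:150 stand in for the external dataset and a temporary file replaces 'learner_rpart.rds'.

```r
library(mlr3)

# Internal project: train and store the whole learner, not just learner$model
task <- tsk("iris")
learner <- lrn("classif.rpart")
learner$train(task, row_ids = 1:120)
path <- tempfile(fileext = ".rds")
saveRDS(learner, path)

# External project: only the file and new feature data are needed
learner_ext <- readRDS(path)
new_obs <- iris[121:150, setdiff(names(iris), "Species")]  # no target column
pred <- learner_ext$predict_newdata(new_obs)
head(pred$response)
```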

Create list of predefined S3 objects

I am comparing different machine learning techniques in R. This is the situation: I wrote several functions that each automatically create a different prediction model (e.g. logistic regression, random forest, neural network, hybrid ensemble, etc.), along with its predictions, confusion matrix, several statistics (e.g. AUC and F-score), and different plots.
I managed to create an S3 object that is able to store the required data.
However, when I try to create a list of my defined object, this fails and all data is stored sequentially in 1 big list.
This is my S3 object (as this is the first time I have created an S3 class, I am not sure the code is 100% correct):
modelObject <- function(modelName , modelObject, modelPredictions , rocCurve , aUC , confusionMatrix )
{
modelObject <- list(
model.name = modelName,
model.object = modelObject,
model.predictions = modelPredictions,
roc.curve = rocCurve,
roc.auc = aUC,
confusion.matrix = confusionMatrix
)
## Set the name for the class
class(modelObject) <- "modelObject"
return(modelObject)
}
At the end of each machine learning function, I define and return the object.
Shortened example:
NeuralNetworkAnalysis<- function() {
#I removed the unnecessary code, as only the end of the code is relevant
nn.model <- modelObject(modelName = "Neural.Network" , modelObject = NN , modelPredictions = predNN , rocCurve = roc , aUC = auc , confusionMatrix = confu )
return(nn.model)
}
Finally, in my 'script' function, I create a list and try to append the different objects:
#function header and arguments before this part are irrelevant
# Build predictive model(s)
modelList = list("model" = modelObject)
modelList <- append(modelList , NeuralNetworkAnalysis())
modelList <- append(modelList, RandomForestAnalysis())
mod <<- RandomForestAnalysis() #this is to test what the outcome is when I do not put it in a list
return(modelList) } #end of the function ModelBuilding
models <- ModelBuilding( '01/01/2013' , '01/01/2014' , '02/01/2014' , '02/01/2015' )
Now, when I take a look at the models list, I don't have a list of objects, I just have a list with all the data of each algorithm.
class(models)
[1] "list"
class(mod)
[1] "modelObject"
How can I fix this problem, so that I can have a list that contains, for example:
list$random.forest$variable.I.want.to.access (most favorable)
or
list[i]$variable.of.random.forest.that.I.want.to.access
Thanks in advance!
Olivier
Not sure if I understand correctly, but maybe the issue is only how your model list is built: append() concatenates the elements of each modelObject into the list instead of adding the object as a single element. If you try
modelList <- list()
modelList[["neural.network"]] <- NeuralNetworkAnalysis()
modelList[["random.forest"]] <- RandomForestAnalysis()
etc., does that give you the access methods you are looking for?
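A self-contained sketch of the difference, using a cut-down version of the question's constructor (three fields instead of six). An lm() fit on mtcars is a placeholder for the real models; the point is only how the list is built.

```r
# Cut-down version of the question's constructor
modelObject <- function(modelName, modelObject, modelPredictions) {
  obj <- list(model.name = modelName,
              model.object = modelObject,
              model.predictions = modelPredictions)
  class(obj) <- "modelObject"
  obj
}

fit <- lm(mpg ~ wt, data = mtcars)                   # placeholder model
nn <- modelObject("Neural.Network", fit, fitted(fit))
rf <- modelObject("Random.Forest", fit, fitted(fit))

# append() uses c(), which splices the *elements* of nn into the list
flat <- append(list(), nn)
length(flat)  # 3: loose fields, and the "modelObject" class is gone

# Assigning with [[ ]], or wrapping the object in list(), keeps it whole
modelList <- list()
modelList[["neural.network"]] <- nn
modelList <- append(modelList, list(random.forest = rf))
class(modelList$random.forest)  # "modelObject"
modelList$random.forest$model.name
```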

Random forest object not loading

I am saving two random forest objects as .rda files. When I load them, one loads as a character vector and the other loads as a randomForest object! Can someone explain this?
Here is my code snippet :
fit1 <- load("rfModel_pw2.rda")
fit2 <- load("rfModel_pw3.rda")
Pred1 <- predict(get(fit1), test, "prob")
#Error in get(fit1) : invalid first argument
Pred2 <- predict(get(fit2), test, "prob")
class(fit1)
#[1] "randomForest.formula" "randomForest"
class(fit2)
#[1] "character"
load() places the objects from an .rda file into the calling environment (the global environment when called at top level) and returns only a character vector of their names. Instead of using get(name), simply use the same object name before saving and after loading, as in the example. If you would rather have the loader return the loaded object, replace save()/load() with saveRDS()/readRDS().
library(randomForest)
X = replicate(2,rnorm(1000))
y = apply(X,1,sum)
rf = randomForest(X,y)
save(rf,file="./rf.rda")
rm(list=ls())
load(file="./rf.rda") # object is restored in the global environment under its former name
predict(rf,replicate(2,rnorm(1000)))
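If you want to keep .rda files but still get the object back as a return value, a small wrapper around load() with its envir argument does that. `load_single()` is a hypothetical helper, and an lm() fit on mtcars stands in for the random forest so the sketch runs without extra packages.

```r
# Hypothetical helper: load an .rda file that holds exactly one object and
# return that object, whatever name it was saved under.
load_single <- function(file) {
  e <- new.env()
  obj_names <- load(file, envir = e)  # load() returns names, not objects
  if (length(obj_names) != 1) stop("expected exactly one object in ", file)
  e[[obj_names]]
}

fit <- lm(mpg ~ wt, data = mtcars)
path <- tempfile(fileext = ".rda")
save(fit, file = path)

model_a <- load_single(path)  # the model itself, under a name of your choosing
class(model_a)                # "lm"

# The pitfall from the question: the value of load() is only the name
name_only <- load(path)
name_only                     # "fit"
```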

Saving jags.model in RData file

Is there any way to save a jags.model() object into an RData or txt file?
To perform MCMC on a better computer, I have to save my model on one machine and use it in a new workspace on the other. But I'm having difficulty using the save() and load() functions in R.
Thanks for your advice.
Added:
I tried:
library(rjags)
jags <- jags.model('regression.bug',
data = my.data,
n.chains = 4,
n.adapt = 1000)
Then I would like to save "jags":
save( jags , file="jags.RData")
It looks like it was saved. But when I try:
ld.jags <- load( "jags.RData" )
ld.jags
[1] "jags"
And I don't know how I could use "ld.jags" to perform my analysis.
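As in the randomForest answer above, `ld.jags` only holds the name "jags"; load() has already restored the model under that name. A further wrinkle, to verify against your rjags version: the compiled JAGS model lives outside R, so save() cannot store it, and the object returned by jags.model() carries a recompile() function for rebuilding it after load(). The toy model written inline replaces the question's 'regression.bug' file, and JAGS itself must be installed.

```r
library(rjags)

# Toy model written inline instead of the question's 'regression.bug' file
model_string <- "
model {
  for (i in 1:N) { y[i] ~ dnorm(mu, 1) }
  mu ~ dnorm(0, 0.001)
}"
set.seed(1)
jags <- jags.model(textConnection(model_string),
                   data = list(y = rnorm(20), N = 20),
                   n.chains = 2, n.adapt = 100, quiet = TRUE)

path <- tempfile(fileext = ".RData")
save(jags, file = path)
rm(jags)

ld.jags <- load(path)     # "jags": the object is restored under its saved name
jags_model <- get(ld.jags)
jags_model$recompile()    # rebuild the compiled model that save() cannot store
samples <- coda.samples(jags_model, variable.names = "mu", n.iter = 200)
summary(samples)
```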
