Random forest object not loading in R

I am saving two random forest objects as .rda files. When I load them, one loads as a character and the other loads as a randomForest object! Can someone explain this?
Here is my code snippet:
fit1 <- load("rfModel_pw2.rda")
fit2 <- load("rfModel_pw3.rda")
Pred1 <- predict(get(fit1), test, "prob")
#Error in get(fit1) : invalid first argument
Pred2 <- predict(get(fit2), test, "prob")
class(fit1)
#[1] "randomForest.formula" "randomForest"
class(fit2)
#[1] "character"

load() restores the objects from an .rda file into the global environment automatically and returns only the character names of the loaded objects. Instead of using get([name]), simply use the same object name before saving and after loading, as in the example. Alternatively, if you would like the loader function to return the loaded object itself, replace save() / load() with saveRDS() / readRDS().
library(randomForest)
X = replicate(2,rnorm(1000))
y = apply(X,1,sum)
rf = randomForest(X,y)
save(rf,file="./rf.rda")
rm(list=ls())
load(file="./rf.rda") # object is restored in the global environment under its former name
predict(rf,replicate(2,rnorm(1000)))
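A minimal sketch of the saveRDS() / readRDS() alternative (the file name rf.rds is just an example):
library(randomForest)
X <- replicate(2, rnorm(1000))
y <- apply(X, 1, sum)
rf <- randomForest(X, y)
saveRDS(rf, file = "./rf.rds") # serializes the single object
rm(list = ls())
fit <- readRDS("./rf.rds")     # returns the object itself, under any name you choose
predict(fit, replicate(2, rnorm(1000)))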


Error in class(x) while creating panel data using plm function

I'm trying to estimate a pooled model with the plm function on a balanced panel dataset that I imported from Excel.
When I run the code I get the following error:
Error in class(x) <- setdiff(class(x), "pseries") : invalid to set the class to matrix unless the dimension attribute is of length 2 (was 0)
library(plm)
library(readxl)
library(tidyr)
library(rJava)
library(xlsx)
library(xlsxjars)
all_met<- read_excel("data.xlsx", sheet = "all_met")
attach(all_met)
Y_all_met <- cbind(methane)
X_all_met <- cbind(gdp, ecogr, trade)
pdata_all_met <- plm.data(all_met, index=c("id","time"))
pooling_all_met <- plm(Y_all_met ~ X_all_met, data=pdata_all_met, model= "pooling")
I expected to get summary statistics for a pooled OLS regression of my data. Can someone tell me how I can fix this issue? Thanks in advance.
1st:
Avoid plm.data and use pdata.frame instead:
pdata_all_met <- pdata.frame(all_met, index=c("id","time"))
If plm.data does not give you a deprecation warning, you are running an old version; update the package.
2nd (and addressing the question):
If you use the data argument of plm, specify the column names in the formula rather than variables from the global environment, i.e., try this:
plm(methane ~ gdp + ecogr + trade, data=pdata_all_met, model= "pooling")
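Assuming the fit succeeds, wrapping the call above in an assignment and calling summary() gives the pooled OLS output the question was after (a sketch, reusing the object name from the original code):
pooling_all_met <- plm(methane ~ gdp + ecogr + trade,
                       data = pdata_all_met, model = "pooling")
summary(pooling_all_met) # coefficients, R-squared, F-statistic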
Check the structure of your data to see whether the variables used in the regression are declared as factors; you can do that by typing str(all_met).
If they are, you should declare them as double or numeric (try not to use the as.numeric() function directly on a factor, as it can change the values in your data).
Personally, I fixed that with the following specification in the import code:
data <- read_csv("C:/Users/Uness/Desktop/Mydata.csv",
                 col_types = cols(variable1 = col_double(),
                                  variable2 = col_double()))
View(data)
where variable1 and variable2 are the names of the variables I use; make sure you change those if you copy the code ;)
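For reference, a minimal sketch of the safe factor-to-numeric conversion mentioned above (the column gdp is just an example). Going through as.character avoids the trap where as.numeric() on a factor returns the internal level codes instead of the values:
str(all_met) # inspect the column types first
# safe conversion: factor -> character -> numeric
all_met$gdp <- as.numeric(as.character(all_met$gdp))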

Loading in multiple .rda files into a list in r

I have run various models (glm, rpart, earth, etc.) and exported the model object from each respective one into a folder on my computer. So I now have a folder with ~60 different models stored as separate .rda files.
This was done by creating a model function and then applying it to a list of model types through map from the purrr package (to avoid errors and termination).
I now want to load them back into R and compare them. Unfortunately, when I wrote my initial model script, each model was stored under the same name, i.e. "Model.Object" (I didn't know how to do otherwise), so when I try to load each one individually into R they just override each other. Each file is saved as glm.rda, rpart.rda, earth.rda, etc., but the model within is labelled Model.Object (for clarification).
So I guess I have a few questions:
1. Is it possible to load multiple .rda files into R into a list that can then be indexed?
2. How can I alter the model function so that the 'Model.Object' name reads as the model type (e.g. glm, rpart, etc.)?
Code:
Model.Function = function(Model.Type){
  set.seed(0)
  # train() is from the caret package; the data objects are defined elsewhere
  Model.Object = train(x = Pred.Vars.RVC.Data, y = RVC, trControl = Tcontrolparam,
                       preProcess = Preprocessing.Options, tuneLength = 1, metric = "RMSE",
                       method = Model.Type)
  save(Model.Object, file = paste("./RVC Models/", Model.Type, ".rda", sep = ""))
  return(Model.Object)
}
Possibly.Model.Function = possibly(Model.Function, otherwise = "something wrong here")
result.possible = map(c("glm","rpart","earth"), Possibly.Model.Function)
For now, a rescue operation of your existing files might look something like this (following #nicola's comment about using the envir argument to load()):
rda2list <- function(file) {
  e <- new.env()
  load(file, envir = e)
  as.list(e)
}
folder <- "./RVC Models"
files <- list.files(folder, pattern = "\\.rda$")
models <- Map(rda2list, file.path(folder, files))
names(models) <- tools::file_path_sans_ext(files)
Going forward, it would be easier to save your models as .Rds files with saveRDS() rather than using save(). Then reassignment is easy upon loading the file. See e.g. this question and answer for more details on the matter.
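A sketch of that going-forward workflow, under the same assumptions as the function above (caret's train() and the poster's data objects):
Model.Function <- function(Model.Type) {
  set.seed(0)
  Model.Object <- train(x = Pred.Vars.RVC.Data, y = RVC, trControl = Tcontrolparam,
                        preProcess = Preprocessing.Options, tuneLength = 1, metric = "RMSE",
                        method = Model.Type)
  # .rds stores a single object; readRDS() returns it for assignment to any name
  saveRDS(Model.Object, file = file.path("./RVC Models", paste0(Model.Type, ".rds")))
  Model.Object
}
# Later, read them all back into a named, indexable list:
files <- list.files("./RVC Models", pattern = "\\.rds$", full.names = TRUE)
models <- lapply(files, readRDS)
names(models) <- tools::file_path_sans_ext(basename(files))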

How to correctly `dput` a fitted linear model (by `lm`) to an ASCII file and recreate it later?

I want to persist an lm object to a file and reload it into another program. I know I can do this by writing/reading a binary file via saveRDS/readRDS, but I'd like to have an ASCII file instead of a binary file. At a more general level, I'd like to know why my idioms for reading dput output back in are not behaving as I'd expect.
Below are examples of making a simple fit, and successful and unsuccessful recreations of the model:
dat_train <- data.frame(x=1:4, z=c(1, 2.1, 2.9, 4))
fit <- lm(z ~ x, dat_train)
rm(dat_train) # Just to make sure fit does not depend on `dat_train` existing
dat_score <- data.frame(x=c(1.5, 3.5))
## This works (of course)
predict(fit, dat_score)
# 1 2
# 1.52 3.48
Saving to binary file works:
## http://stackoverflow.com/questions/5118074/reusing-a-model-built-in-r
saveRDS(fit, "model.RDS")
fit2 <- readRDS("model.RDS")
predict(fit2, dat_score)
# 1 2
# 1.52 3.48
So does this (dput-ing it in the R session, not to a file):
fit2 <- eval(dput(fit))
predict(fit2, dat_score)
# 1 2
# 1.52 3.48
But if I persist the fit to a file on disk, I cannot figure out how to get it back into its normal shape:
dput(fit, file = "model.R")
fit3 <- source("model.R")$value
# Error in is.data.frame(data): object 'dat_train' not found
predict(fit3, dat_score)
# Error in predict(fit3, dat_score): object 'fit3' not found
Trying to be explicit with the eval does not work either:
## http://stackoverflow.com/questions/9068397/import-text-file-as-single-character-string
dput(fit, file="model.R")
fit4 <- eval(parse(text=paste(readLines("model.R"), collapse=" ")))
# Error in is.data.frame(data): object 'dat_train' not found
predict(fit4, dat_score)
# Error in predict(fit4, dat_score): object 'fit4' not found
In both cases above, I expect fit3 and fit4 to work, but they don't recompile into an lm object that I can use with predict().
Can anyone advise me on how I can persist a model to a file as an ASCII structure(...) representation, and then re-read it back in as an lm object I can use in predict()? And why are my current methods not working?
Step 1:
You need to control the deparsing options:
dput(fit, control = c("quoteExpressions", "showAttributes"), file = "model.R")
You can read more on all possible options in ?.deparseOpts.
The "quoteExpressions" wraps all calls / expressions / languages with quote, so that they are not evaluated when you later re-parse it. Note:
source is doing parsing;
call field in your fitted "lm" object is a call:
fit$call
# lm(formula = z ~ x, data = dat_train)
So, without "quoteExpressions", R will try to evaluate the lm call when the file is sourced. Evaluating it means fitting a linear model, and R will try to find dat_train, which does not exist in your new R session.
The "showAttributes" is another mandatory option, as "lm" object has class attributes. You certainly don't want to discard all class attributes and only export a plain "list" object, right? Moreover, many elements in a "lm" object, like model (the model frame), qr (the compact QR matrix) and terms (terms info), etc all have attributes. You want to keep them all.
If you don't set control, the default setting with:
control = c("keepNA", "keepInteger", "showAttributes")
will be used. As you can see, there is no "quoteExpressions", so you will get into trouble.
The default also includes "keepInteger" and "keepNA", but I don't see the need for them with an "lm" object.
------
Step 2:
The above step will get source working correctly. You can recover your model:
fit1 <- source("model.R")$value
However, it is not yet ready for generic functions like summary and predict to work. Why?
The critical issue is that the terms component of fit1 is not really a "terms" object, but only a formula (in fact, not even a formula, just a "language" object without the "formula" class!). Just compare fit$terms and fit1$terms, and you will see the difference. Don't be surprised; we set "quoteExpressions" earlier. While that is definitely helpful in preventing evaluation of call, it has a side effect on terms. So we need to reconstruct terms as best we can.
Fortunately, it is sufficient to do:
fit1$terms <- terms.formula(fit1$terms)
Though this still does not recover all the information in fit$terms (for example, the variable classes are missing), it is nonetheless a valid "terms" object.
Why is a "terms" object critical? Because all generic functions rely on it. You may not need to know more on this, as it is really technical, so I will stop here.
Once this is done, we can successfully use predict (and summary, too):
predict(fit1) ## no `newdata` given, using model frame `fit1$model`
# 1 2 3 4
#1.03 2.01 2.99 3.97
predict(fit1, dat_score) ## with `newdata`
# 1 2
#1.52 3.48
-------
Concluding remark:
Although I have shown you how to get things to work, I don't really recommend doing this in general. An "lm" object can be pretty large when you fit a model to a large dataset; for example, residuals and fitted.values are long vectors, and qr and model are huge matrices / data frames. So think about this.
This is an important update!
As mentioned in the previous answer, the most challenging bit is to recover $terms as best as we can. The suggested method using terms.formula works for the OP's example, but not for the following one with bs() and poly():
dat <- data.frame(x1 = runif(20), x2 = runif(20), x3 = runif(20), y = rnorm(20))
library(splines)
fit <- lm(y ~ bs(x1, df = 3) + poly(x2, degree = 3) + x3, data = dat)
rm(dat)
If we follow the previous answer:
dput(fit, control = c("quoteExpressions", "showAttributes"), file = "model.R")
fit1 <- source("model.R")$value
fit1$terms <- terms.formula(fit1$terms)
We will see that summary.lm and anova.lm work correctly, but not predict.lm:
predict(fit1, newdata = data.frame(x1 = 0.5, x2 = 0.5, x3 = 0.5))
Error in bs(x1, df = 3) : could not find function "bs"
This is because ".Environment" attribute of $terms is missing. We need
environment(fit1$terms) <- .GlobalEnv
Now, running the above predict again, we see a different error:
Error in poly(x2, degree = 3) :
'degree' must be less than number of unique points
This is because the "predvars" attribute, which is needed for safe / correct prediction with bs() and poly(), is missing.
The remedy is to dput this special attribute additionally:
dput(attr(fit$terms, "predvars"), control = "quoteExpressions", file = "predvars.R")
then read it back and attach it:
attr(fit1$terms, "predvars") <- source("predvars.R")$value
Now running predict works correctly.
Note that "dataClass" attribute of $terms is also missing, but this does not seem to cause any problem for any generic functions.

Using jags.parallel from within a function (R language Error in get(name, envir = envir) : object 'y' not found)

Using jags.parallel from the command line or a script works fine; I can run this modified example from http://www.inside-r.org/packages/cran/R2jags/docs/jags without problems.
# An example model file is given in:
model.file <- system.file(package="R2jags", "model", "schools.txt")
#=================#
# initialization #
#=================#
# data
J <- 8.0
y <- c(28.4,7.9,-2.8,6.8,-0.6,0.6,18.0,12.2)
sd <- c(14.9,10.2,16.3,11.0,9.4,11.4,10.4,17.6)
jags.data <- list("y","sd","J")
jags.params <- c("mu","sigma","theta")
jags.inits <- function(){
list("mu"=rnorm(1),"sigma"=runif(1),"theta"=rnorm(J))
}
#===============================#
# RUN jags and postprocessing #
#===============================#
# jagsfit <- jags(data=jags.data, inits=jags.inits, jags.params,
# n.iter=5000, model.file=model.file)
# Run jags in parallel; no progress bar. R may be frozen for a while,
# be patient. Currently the update afterward does not run in parallel.
print("Running Parallel")
jagsfit <- jags.parallel(data=jags.data, inits=jags.inits, jags.params,
n.iter=5000, model.file=model.file)
However if I wrap it in a function
testparallel <- function(out){
# An example model file is given in:
.
.
.
jagsfit <- jags.parallel(data=jags.data, inits=jags.inits, jags.params,
n.iter=5000, model.file=model.file)
print(out)
return(jagsfit)
}
Then I get the error:
Error in get(name, envir = envir) : object 'y' not found
Based on what I found here I know that it is an issue with the environment exported to the cluster and I have fixed it by changing
J <- 8.0
y <- c(28.4,7.9,-2.8,6.8,-0.6,0.6,18.0,12.2)
sd <- c(14.9,10.2,16.3,11.0,9.4,11.4,10.4,17.6)
to
assign("J",8.0,envir=globalenv())
assign("y",c(28.4,7.9,-2.8,6.8,-0.6,0.6,18.0,12.2),envir=globalenv())
assign("sd",c(14.9,10.2,16.3,11.0,9.4,11.4,10.4,17.6),envir=globalenv())
Is there a better way to get around this?
Thank you,
Greg
P.S.
I am working on this code for someone else, so I don't really want to change things in the R2jags package to let me pass in the environment to export, though I plan on suggesting it to the authors of the package.
So I have contacted the author of R2jags and he has added an additional argument to jags.parallel that lets you pass envir, which is then passed on to clusterExport.
This works well, except that it allows clashes between the names of my data and the variables inside the jags.parallel function.
If you use JAGS intensively in parallel, I can suggest you look at the package rjags combined with the package dclone. I think dclone is really powerful because its syntax is exactly the same as rjags.
I have never seen your problem with this package.
If you want to use R2jags I think you need to pass your variables and your init function to the workers with the function:
clusterExport(cl, list("jags.data", "jags.params", "jags.inits"))
Without changing the code of R2jags, you can still assign those data variables to the global environment in an easier way by using list2env.
Obviously, there is a concern that list2env could overwrite existing objects in the global environment, but you can probably control for that (see the sketch below).
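For instance, a minimal guard (my own sketch, not part of the original answer; it assumes the jags.data.v2 list defined in the function below) that warns before list2env() clobbers anything:
clashes <- intersect(names(jags.data.v2), ls(globalenv()))
if (length(clashes) > 0)
  warning("list2env() will overwrite: ", paste(clashes, collapse = ", "))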
Below is the same code as the example given in the original post, except that I put the data into a list and sent that list's contents into the global environment using the list2env function. (I also took out the unused "out" variable in the function.) This currently runs fine for me; you may have to add more chains and/or more iterations to see the parallelism in action, though.
testparallel <- function(){
library(R2jags)
model.file <- system.file(package="R2jags", "model", "schools.txt")
# Make a list of the data with named items.
jags.data.v2 <- list(
J=8.0,
y=c(28.4,7.9,-2.8,6.8,-0.6,0.6,18.0,12.2),
sd=c(14.9,10.2,16.3,11.0,9.4,11.4,10.4,17.6) )
# Store all of that data explicitly in the globalenv(), as was
# previously suggested with the assign(...) calls;
# list2env() does that for you.
# Now R2jags will have access to the data without you having
# to explicitly assign() each variable to the globalenv.
list2env( jags.data.v2, envir=globalenv() )
jags.params <- c("mu","sigma","theta")
jags.inits <- function(){
list("mu"=rnorm(1),"sigma"=runif(1),"theta"=rnorm(J))
}
jagsfit <- jags.parallel(
data=names(jags.data.v2),
inits=jags.inits,
jags.params,
n.iter=5000,
model.file=model.file)
return(jagsfit)
}

Using predict in a function call with NLME objects and a formula

I have a problem with the package nlme using the following code:
library(nlme)
x <- rnorm(100)
z <- rep(c("a","b"),each=50)
y <- rnorm(100)
test.data <- data.frame(x,y,z)
test.fun <- function(test.dat)
{
form <- as.formula("y~x")
ran.form <- as.formula("~1|z")
modell <- lme(fixed = form, random=ran.form, data=test.dat)
pseudo.newdata <- test.dat[1,]
predict(modell, newdata= pseudo.newdata) ###THIS CAUSES THE ERROR!
}
test.fun(test.data)
The predict causes an error, and I have already found what basically causes it.
The modell object saves how it was called, and predict seems to use that call to make predictions, but it is unable to find the formula objects form and ran.form because it does not look for them in the right environment. In fact, I can avoid the problem by doing this:
attach(environment(form), warn.conflicts = FALSE)
predict(modell, newdata= pseudo.newdata)
detach()
My long-term goal, however, is to save the models to disk and use them later. I suppose I could try saving the formula objects as well, but this strikes me as a very annoying and cumbersome way to deal with the problem.
I work with automatically generated formula objects instead of writing them down explicitly, because I create many models with different definitions in a sort of batch process, so I cannot avoid them. So my ideal solution would be a way to create the lme object such that I can forget about the formula objects afterwards and predict "just works". Thanks for any help.
Try replacing lme(arg1, arg2, arg3) with do.call(lme, list(arg1, arg2, arg3)).
library(nlme)
x <- rnorm(100)
z <- rep(c("a","b"),each=50)
y <- rnorm(100)
test.data <- data.frame(x,y,z)
test.fun <- function(test.dat)
{
form <- as.formula("y~x")
ran.form <- as.formula("~1|z")
## JUST NEED TO CHANGE THE FOLLOWING LINE
## modell <- lme(fixed = form, random=ran.form, data=test.dat)
modell <- do.call(lme, list(fixed=form, random=ran.form, data=test.dat))
pseudo.newdata <- test.dat[1,]
predict(modell, newdata= pseudo.newdata) ###THIS CAUSES THE ERROR!
}
test.fun(test.data)
# a
# 0.07547742
# attr(,"label")
# [1] "Predicted values"
This works because do.call() evaluates its argument list in the calling frame, before evaluating the call to lme() that it constructs. To see why that helps, type debug(predict), and then run your code and mine, comparing the debugging messages printed when you are popped into the browser.
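To illustrate the difference (my own sketch, assuming it is run where the objects inside test.fun are in scope): the stored call differs in whether the formula arguments are kept as symbols or as the formulas themselves.
# Plain lme() records the argument symbols, which predict() cannot
# resolve once test.fun() has returned:
m1 <- lme(fixed = form, random = ran.form, data = test.dat)
m1$call
# lme.formula(fixed = form, random = ran.form, data = test.dat)
# do.call() evaluates the argument list first, so the stored call
# embeds the formula objects themselves:
m2 <- do.call(lme, list(fixed = form, random = ran.form, data = test.dat))
m2$call
# the fixed and random parts now print as y ~ x and ~1 | z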
