An error while trying to use glm model for prediction on another computer - r

I would like to save a glm object on one R machine and use it for prediction on another data set, located on another machine, that has newer data. I tried to use save and load but with no success. What am I doing wrong?
Here is a toy example:
# on machine 1:
glm<-glm(y~x1+x2,data=dat1, family=binomial(link="logit"))
save(glm,file="glm.Rdata") # the file is stored in a folder.
# on machine 2:
load(glm.RData) # got an error:"Error in load(glm.RData) : object 'glm.RData' not found"
#I tried :
load(file='glm.RData') # no error was displayed
print(glm) # got an error:"Error in load(glm.RData) : object 'glm.RData' not found"
Any help will be great.

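As per @user3710546's advice, I would avoid saving your model under the name glm, as it masks (i.e. shadows) the name of the glm() function. R can still find the function when you call glm(...), but having an object and a function share a name makes your session confusing to work in.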
Using save() and load()
save() is generally used to save several objects to one file, rather than a single object. Besides bare object names, save() accepts a list argument, 'A character vector containing the names of objects to be saved.' (Emphasis mine.) So you could use it like this:
# On machine 1:
save(list = 'glm', file = '/path/to/glm.RData')
# On machine 2:
load(file = '/path/to/glm.RData')
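Note that file names are often case-sensitive: you saved to a file with the extension .Rdata but loaded from one with the extension .RData, which is a different file name on a case-sensitive file system. This may explain why the file isn't found. Also note that the file name must be quoted: load(glm.RData) makes R look for an object called glm.RData, which is what the first error message says.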
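A minimal sketch of the save()/load() round trip. The original dat1 is not shown, so this assumes a stand-in model fitted on the built-in mtcars data and a temporary directory in place of the real path; the names fit and model_path are made up for illustration. The point is that the path is quoted and spelled identically in both calls.

```r
# Stand-in for the original model; mtcars and this formula are assumptions.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial(link = "logit"))

# Keep the path in one variable so its spelling (including case) cannot drift.
model_path <- file.path(tempdir(), "fit.RData")
save(fit, file = model_path)

# "Machine 2": start without the object, then restore it from the file.
rm(fit)
stopifnot(file.exists(model_path))  # fails early if the path is wrong
restored <- load(model_path)        # returns the names of the restored objects

print(restored)                     # character vector of object names
print(class(fit))                   # the model is back under its original name
```

On a real second machine, model_path would point at wherever the copied file lives.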
Using saveRDS() and readRDS()
An alternative to using save() and load() is to use saveRDS() and readRDS(), which are designed to be used with one object. They're used slightly differently:
# On machine 1
saveRDS(glm, file = '/path/to/glm.rds')
# On machine 2
glm = readRDS(file = '/path/to/glm.rds')
Note the .rds file extension and the fact that the object returned by readRDS() isn't automatically put in the environment (it needs to be assigned to something).
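As a rough end-to-end sketch of the saveRDS()/readRDS() route followed by prediction on newer data: the model, the mtcars data and the new_data values below are all stand-ins, since the original data sets are not shown.

```r
# "Machine 1": fit a stand-in model and write it to an .rds file.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial(link = "logit"))
rds_path <- file.path(tempdir(), "fit.rds")
saveRDS(fit, file = rds_path)

# "Machine 2": read the model back under any name you like.
model2 <- readRDS(rds_path)

# Hypothetical newer data; it must contain the predictors used in the formula.
new_data <- data.frame(wt = c(2.5, 3.5), hp = c(110, 180))

# type = "response" returns predicted probabilities for a binomial model.
pred <- predict(model2, newdata = new_data, type = "response")
print(pred)
```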
Saving parts of a GLM
If you just want the formula saved (that is, the actual text string), you can find it in glm$formula, where glm is the name of your object. It comes back as a formula object. as.character(glm$formula) splits it into a character vector of its parts (the operator, the left-hand side and the right-hand side), whereas deparse(glm$formula) gives the formula as text, which can then be written to a text file or whatever.
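A small sketch of saving only the formula as text, again assuming a stand-in model on mtcars (the object names here are illustrative, not from the original post):

```r
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# as.character() splits a formula into operator, left side and right side.
parts <- as.character(formula(fit))

# deparse() gives the text of the formula; paste() joins the pieces in case
# a long formula is deparsed onto several lines.
formula_text <- paste(deparse(formula(fit)), collapse = " ")

# Write the text out and rebuild a formula object from it later.
formula_file <- tempfile(fileext = ".txt")
writeLines(formula_text, formula_file)
rebuilt <- as.formula(readLines(formula_file))

print(parts)
print(formula_text)
```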
If, however, you want the model itself without the dataset it was created from (to cut down on disk space), have a look at this article, which discusses which parts of a glm object can be safely deleted.

Related

How do I create a data frame from inbuilt data set 'iris'?

I am a beginner at using Rstudio and have been working through the exercises outlined as part of our course notes.
We are to work with the 'iris' dataset; however, I haven't been able to successfully save it as a valid data.frame. The best I have done is create an empty data frame in the global environment with 0 obs. of 0 variables.
Here is some of the code I have worked through, with the outputs. I am very new to R and am struggling a little with loading and using inbuilt data sets, although I am OK with importing and creating data.
data()
> View(iris)
> iris<-write.csv(iris)
""
> iris
NULL
> str(iris)
NULL
> iris<-data.frame(iris)
> iris<-read.csv(iris.csv)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
object 'iris.csv' not found
> library(datasets)
> data.frame(iris)
data frame with 0 columns and 0 rows
I have tried > write.csv(iris, 'iris.csv') # no luck
First check if iris is already a data.frame by running the following command:-
is.data.frame(iris)
If the answer is TRUE, then run the following command to write it to a .csv file:-
write.csv(iris, "/location/at/which/you/want/to/save/the/file")
If you want to save objects as R objects, use save(); by convention the file extension is .RData. In your case you can run the following command:-
save(iris, file = '/location/iris.RData')
And you can load an .RData file with the load() function in R. In your case it could be :-
load('/location/iris.RData')
Some mistakes that you've made:-
In your second code line where you run
> iris<-write.csv(iris)
you provided write.csv() with only its first argument, x, and never specified the file argument. Without file, the CSV text is printed to the console instead of being written to disk. Also, one should not assign the result of write.csv() to an object with <-, because write.csv() is called for its side effect and returns NULL rather than a useful value. Another function that is called mainly for its side effect is library().
So the way your code flows is: you used the wrong syntax by running the following line
> iris<-write.csv(iris)
and hence, you got a NULL object. And the str of a NULL object is itself NULL.
Then you created a data.frame object by passing iris as the data object, but since iris had already become NULL, data.frame() of a NULL object is an empty data frame with 0 columns and 0 rows. Since there never was an iris.csv file written, R won't be able to read it either.
Also, in your read.csv() call, you passed the file argument as a data object and not as a path. This is why you got the error object 'iris.csv' not found rather than cannot open file 'iris.csv': No such file or directory. To pass it as a path, always put the location of your file in quotes, either single or double.
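The steps above can be sketched as follows. This assumes the accidental NULL copy of iris is still sitting in the workspace; removing it with rm() lets R find the built-in data set again. The temporary path and the names iris_df and iris_back are only for illustration.

```r
iris <- NULL                 # reproduces the accidental overwrite from the question
str(iris)                    # prints NULL

rm(iris)                     # remove the masking copy from the workspace
iris_df <- datasets::iris    # the built-in data set, referenced explicitly
stopifnot(is.data.frame(iris_df))

# Write to a quoted path, then read the file back from the same quoted path.
csv_path <- file.path(tempdir(), "iris.csv")
write.csv(iris_df, csv_path, row.names = FALSE)
iris_back <- read.csv(csv_path, stringsAsFactors = TRUE)

print(dim(iris_back))
```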
If you ever don't understand how to pass objects to a function, run the command ?function_name, for example ?write.csv, ?library, ?read.csv. This will provide you with documentation on the function. It will also provide you with usage examples.
I hope this helps.

rxDataStep in RevoScaleR package crashing

I am trying to create a new factor column on an .xdf data set with the rxDataStep function in RevoScaleR:
rxDataStep(nyc_lab1
, nyc_lab1
, transforms = list(RatecodeID_desc = factor(RatecodeID, levels=RatecodeID_Levels, labels=RatecodeID_Labels))
, overwrite=T
)
where nyc_lab1 is a pointer to a .xdf file. I know that the file is fine because I imported it into a data table and successfully created the new factor column.
However, I get the following error message:
Error in doTryCatch(return(expr), name, parentenv, handler) :
ERROR: The sample data set for the analysis has no variables.
What could be wrong?
First, RevoScaleR has some warts when it comes to replacing data. In particular, overwriting the input file with the output can sometimes cause rxDataStep to fail for unknown reasons.
Even if it works, you probably shouldn't do it anyway. If there is a mistake in your code, you risk destroying your data. Instead, write to a new file each time, and only delete the old file once you've verified you no longer need it.
Second, any object you reference that isn't part of the dataset itself has to be passed in via the transformObjects argument. See ?rxTransform. Basically, the rx* functions are meant to be portable to distributed computing contexts, where the R session that runs the code isn't the same as your local session. In this scenario, you can't assume that objects in your global environment will exist in the session where the code executes.
Try something like this:
# write to a new .xdf file instead of overwriting the input
nyc_lab2 <- RxXdfData("nyc_lab2.xdf")
nyc_lab2 <- rxDataStep(nyc_lab1, nyc_lab2,
    transforms=list(
        RatecodeID_desc=factor(RatecodeID, levels=.levs, labels=.labs)
    ),
    # objects from the local session must be passed in explicitly
    transformObjects=list(
        .levs=RatecodeID_Levels,
        .labs=RatecodeID_Labels
    )
)
Or, you could use dplyrXdf which will handle all this file management business for you:
nyc_lab2 <- nyc_lab1 %>% factorise(RatecodeID)

R: How far does it go? (Plus venting)

I have an object called defaultPacks, containing the names of packages installed on all the computers I use. Much abbreviated:
defaultPacks <- c(
"AER",
"plyr",
"dplyr"
)
I want to save this object to file in a shared directory all of them can reach. I am using Dropbox for this, with sync always paused when R is running.
save(defaultPacks,
file = file.path("C:","Users","andrewH","Dropbox","R_PROJ","sharedSettings.rdata"))
Then I want to load the object and install the packages the names of which are in the object defaultPacks.
SyncPacks <- function(fileString){
defaultPacks <- load(file=fileString)
install.packages(defaultPacks, repos="http://cran.us.r-project.org")
}
SyncPacks(file.path("C:","Users","andrewH","Dropbox","R_PROJ","sharedSettings.rdata"))
If I do this, I get a warning:
Warning in install.packages: package ‘defaultPacks’ is not available (for R version 3.2.1)
I look at what is in defaultPacks immediately after I load and assign it: the string "defaultPacks". So it seems to be loading just a string rather than an object.
So I go back to my save, and try
save(get(defaultPacks), file.path(etc.))
This gives me a different error:
Error in save(get("defaultPacks"), file = file.path("C:", "Users", "andrewH", :
object ‘get("defaultPacks")’ not found.
Then I tried dynGet() with the same result.
So where before it was treating a symbol as a string, now it is treating a function as a string.
So I try the list option for save:
save(list = defaultPacks, file = file.path(etc))
And get yet another error:
Error in save(list = defaultPacks, file = file.path("C:", "Users", "andrewH", :
objects ‘AER’, ‘plyr’, ‘dplyr’, (etc.) not found
So where before I couldn't get to my character vector, now I am shooting right past it, evaluating defaultPacks to find the strings, and then treating each string as a symbol, and evaluating it to its (nonexistent) object.
So, I want to know how to make this work. But I am asking for something more than that. I have this problem, or an analogous problem, all the time. After several years of using R, I still have it a couple of times a week. I don't know how many steps of evaluation R is going to take on any given occasion. I hand a function an object name, and the function treats it as a string. I hand a function a string, and the R function converts it to a symbol and tries to evaluate it. Here, I don't understand why the save function does not save the object I gave it, and then give it back with load.
I've read the discussions on scoping in ten different R books, from Chambers "Software for Data Analysis" to Wickham's "Advanced R." Twice. Four times in some cases. I know about the four environments of a function, and the difference between the call stack and the chain of environmental parents. And yet, it is clear that I am missing something basic. It is not just that I don't know why save does not take a name in its ... argument and save it as an object (unless the problem is at the load end). I don't know how I can know. The function description says, of the ...s, "the names of the objects to be saved (as symbols or character strings)." So why is it saving a name as a string? Or why is load returning a string, if save saved an object? And how could I predict that?
Experienced R programmers, I know you can tell in advance how a given R function is going to treat one of its arguments. You know how far it will be evaluated. You can make it go as far as you want it to, and then STOP. You don't have to write str()'s into your functions every time you want to figure out what the heck it thinks its arguments mean. How do you do it?
Bloody "R Inferno". It's an understatement.
One way of seeing the problem is to note that the value of defaultPacks changes from before to after these operations.
> fname = tempfile()
> orig = defaultPacks = c("AER", "plyr", "dplyr")
> save(defaultPacks, file=fname)
> defaultPacks = load(fname)
> identical(orig, defaultPacks)
[1] FALSE
The problem starts with an understanding of what save() does. From ?save, the object that is saved is named defaultPacks and it has value c("AER", "plyr", "dplyr"). save() could save multiple objects, each with a name and associated value, so it somehow has to save the name of each object.
load() restores the objects that save() has written, and returns (from ?load) a "character vector of the names of objects created". In this case load() restores (creates in the global environment) the symbol defaultPacks, populates it with the character vector of default packages, and returns the name (i.e., character vector of length 1 "defaultPacks") of the object(s) it has restored. The return value then overwrites the restored value, and we have defaultPacks = "defaultPacks".
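A short sketch of the behaviours described so far, using a temporary file and a separate environment (env is an illustrative name) so the workspace is not touched. It shows which forms of save() refer to the object named defaultPacks, why list = defaultPacks fails, and what load() returns.

```r
defaultPacks <- c("AER", "plyr", "dplyr")
fname <- tempfile(fileext = ".RData")

# These three calls all save the object *named* defaultPacks.
save(defaultPacks, file = fname)
save("defaultPacks", file = fname)
save(list = "defaultPacks", file = fname)

# list = defaultPacks is evaluated first, so save() looks for objects
# named "AER", "plyr" and "dplyr", which do not exist, and stops.
res <- try(save(list = defaultPacks, file = tempfile()), silent = TRUE)
print(inherits(res, "try-error"))

# load() restores the object into env and returns only its name.
env <- new.env()
restored_names <- load(fname, envir = env)
print(restored_names)
print(env$defaultPacks)
```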
install.packages doesn't do anything fancy with its first argument, which from ?install.packages is a "character vector of the names of packages whose current versions should be downloaded". The character vector is supplied through the symbol defaultPacks, but the warning comes from the value of the symbol, which by then is the character vector "defaultPacks".
save() and load() more or less have to work the way they do to support multiple objects. On the other hand saveRDS() and readRDS() (ok, why read instead of load?) have a contract to save a single object. The name of the saved object does not need to be stored to be able to recover the values associated with it. So saveRDS(defaultPacks, fname); defaultPacks = readRDS(fname) works, and in particular the value of defaultPacks before and after this series of operations remains unchanged.
> orig = defaultPacks = c("AER", "plyr", "dplyr")
> saveRDS(defaultPacks, fname)
> defaultPacks = readRDS(fname)
> identical(orig, defaultPacks)
[1] TRUE
Without meaning to be too much of a jerk, the answer to the question "Experienced R programmers...how do you do it?" is implied by the ? above: by carefully reading the manual. Also, there are not that many places in base R code where evaluation is non-standard (formulas and library are the main culprits), so recognizing what the problem is not can help to focus on what is actually going on.
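One possible way to rewrite the question's SyncPacks() along these lines, as a sketch only. The helper names save_packs and sync_packs, the install flag and the CRAN mirror URL are assumptions for illustration, and a temporary directory stands in for the shared Dropbox folder.

```r
# Save the character vector itself; no object name is stored with it.
save_packs <- function(packs, path) {
  saveRDS(packs, file = path)
}

# Read the vector back and install whichever packages are not yet installed.
sync_packs <- function(path, install = TRUE,
                       repos = "https://cloud.r-project.org") {
  wanted <- readRDS(path)
  to_install <- setdiff(wanted, rownames(installed.packages()))
  if (install && length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  invisible(wanted)
}

settings_path <- file.path(tempdir(), "sharedSettings.rds")
save_packs(c("AER", "plyr", "dplyr"), settings_path)

# Dry run: reads the list back without installing anything.
packs <- sync_packs(settings_path, install = FALSE)
print(packs)
```

Calling sync_packs(settings_path) with the default install = TRUE would attempt the actual installation.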

Why does load(...) return character name of object instead of the object itself?

The svm model is created with the package e1071 in R. To use the model, I need to save it and read as needed. The package has write.svm, but does not have read.svm. If I use
model <- svm(x, y)
save(model, file = 'modelfile.rdata')
M <- load('modelfile.rdata')
object M contains just the word 'model'.
How to save the svm model and read back later, to apply to some new data?
Look at the return value for the function load in the help file:
Value:
A character vector of the names of objects created, invisibly.
So "model" is indeed the expected value of M. Your svm has been restored under its original name, which is model.
If you find it a bit confusing that load does not return the object loaded but instead restores it under the name used in saving it, consider using saveRDS and readRDS.
saveRDS(model, 'modelfile.rds')
M <- readRDS('modelfile.rds')
and M should contain your svm model.
I prefer saveRDS and readRDS because with them I know what objects I'm creating in my workspace - see the blog post of Gavin Simpson (linked in his answer) for a detailed discussion.
You misunderstand what load does. It restores an object to the same name it had when you save()d it. What you are seeing in M is the return value of the load() function. Calling load() has the additional side effect of loading the object back under the same name that it was saved with.
Consider:
require("e1071")
data(iris)
## classification mode
# default with factor response:
model <- svm (Species~., data=iris)
## Save it
save(model, file = "my-svm.RData")
## delete model
rm(model)
## load the model
M <- load("my-svm.RData")
Now look at the workspace
> ls()
[1] "iris" "M" "model"
Hence model was restored as a side effect of load().
From ?load we see the reason M contains the name of the objects created (and hence saved originally)
Value:
A character vector of the names of objects created, invisibly.
If you want to restore an object to a new name, use saveRDS() and readRDS():
saveRDS(model, "svm-model.rds")
newModel <- readRDS( "svm-model.rds")
> ls()
[1] "iris" "M" "model" "newModel"
If you want to know more about saveRDS() and readRDS() see the relevant help ?saveRDS() and you might be interested in a blog post I wrote on this topic.

saving and loading compressed R object

save(something, file="something.RData", compress="xz")
then when I load for reuse
load("something.RData")
print(something)
Error in print(something) : object 'something' not found
It is a random forest object.
Am I missing the unzip code?
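This works at the console but not inside a function, because of the way load() uses environments. By default load() restores objects into the environment it was called from. At the console that is the global environment, but inside a function it is that function's local environment, which is discarded when the function returns.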
Two simple alternatives:
Use saveRDS() and readRDS() for single objects.
Create an environment and use it as shown below.
Here is a short example of the second approach:
ne <- new.env()
load(somefile, ne) # now ls(ne) will show what was loaded
foo <- ne$something
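The environment approach can be wrapped in a small helper, sketched here under the assumption that you know the name the object was saved under. load_one is a made-up name, and the list below is only a stand-in for the random forest object.

```r
# Load one named object from an .RData file and return it, without
# creating anything in the caller's workspace.
load_one <- function(path, name) {
  env <- new.env()
  loaded <- load(path, envir = env)
  if (!name %in% loaded) {
    stop("object '", name, "' not found in ", path)
  }
  get(name, envir = env, inherits = FALSE)
}

# Stand-in object saved with xz compression; load() detects the
# compression itself, so no separate unzip step is needed.
demo_path <- tempfile(fileext = ".RData")
something <- list(ntree = 500)
save(something, file = demo_path, compress = "xz")
rm(something)

foo <- load_one(demo_path, "something")
print(foo)
```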

Resources