I want to save within a function, using the input object's name as the file name
saveNew <- function(dat){
# Collect the original name
originalName <- deparse(substitute(dat))
#Do lots of Fun and Interesting Things!
# Now let's save it. First I have to get it
newToSave <- get(originalName, envir = .GlobalEnv)
save(newToSave, file = paste0(originalName, '.Rdata') )
}
But the problem is that when I save it, the newly created data is stored as newToSave. This becomes apparent when loading the file with
load('funData.Rdata'): the object is no longer funData but newToSave.
How can I get this function to save the object as funData (as in the example below) and load it back as funData, not newToSave?
Example:
funData <- sample(seq(1,1000,.01))
saveNew(funData)
load("funData.Rdata")
You can use assign to assign dat to originalName
saveNew <- function(dat){
# Collect the original name
originalName <- deparse(substitute(dat))
#Do lots of Fun and Interesting Things!
assign(originalName, dat)
save(list = originalName, file = paste0(originalName, '.Rdata') )
}
# Sample data
funData <- 1:10
# Save
saveNew(funData)
# Remove funData from the current environment
remove(funData)
# Load the RData object
load("funData.Rdata")
# Confirm that funData is in our current environment
funData
# [1] 1 2 3 4 5 6 7 8 9 10
Note that we need to call save with the list argument so that it writes the object whose name is stored in originalName, rather than looking for an object literally called originalName.
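To check which name actually ends up in the file, you can load it into a scratch environment and look at what load returns. This is only a sketch: save_under_own_name and its dir argument are hypothetical names, and it writes to tempdir() rather than the working directory.

```r
# Hypothetical variant of saveNew() that writes into a directory of your choice
save_under_own_name <- function(dat, dir = tempdir()) {
  original_name <- deparse(substitute(dat))
  # Bind the value to the original name inside this function's environment
  assign(original_name, dat)
  out_file <- file.path(dir, paste0(original_name, ".Rdata"))
  # save() looks the name up in its caller's frame, i.e. this function
  save(list = original_name, file = out_file)
  invisible(out_file)
}

funData <- 1:10
saved_path <- save_under_own_name(funData)

# Load into a scratch environment so the workspace is left untouched
scratch <- new.env()
stored_names <- load(saved_path, envir = scratch)
stored_names
```

load returns the names of the objects it restored, so stored_names shows the name that was written to the file.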
Disclaimer: This isn't really an answer, but as the OP wanted more clarification on the pros and cons of saveRDS, I thought I could put those under an answer. If you consider it should be deleted, please state so in a comment (before downvoting) and I'll be happy to withdraw it.
From ?saveRDS:
Details:
These functions provide the means to save a single R object to a connection (typically a file) and to restore the object, quite possibly under a different name. This differs from ‘save’ and ‘load’, which save and restore one or more named objects into an environment. They are widely used by R itself, for example to store metadata for a package and to store the ‘help.search’ databases: the ‘".rds"’ file extension is most often used.
saveRDS is specifically aimed at saving one object, while save can save one or more. For me, though, the main difference is that save and load restore each object under the name it had when it was saved, so load can silently overwrite an object that already exists in the environment. saveRDS and its companion readRDS do not store the name, so you can read the saved object into any variable you like.
From ?load:
Warning:
...
‘load()’ replaces all existing objects with the same names in the current environment (typically your workspace, ‘.GlobalEnv’) and hence potentially overwrites important data. It is considerably safer to use ‘envir = ’ to load into a different environment, or to ‘attach(file)’ which ‘load()’s into a new entry in the ‘search’ path.
Consider this:
save(iris, file = "save_file.rdat") # file must be named, otherwise it is treated as an object to save
iris[1, 2] <- 20000 # implement a change to iris
load("save_file.rdat") # overwrites iris
saveRDS(iris, "my_file.RDS")
iris[1, 2] <- 20000 # introduce a change to iris
new_iris <- readRDS("my_file.RDS") # modified-iris is kept. New object is created
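If you have to stay with save and load, the envir argument mentioned in the warning above avoids the overwrite. A minimal sketch, assuming a copy of iris called iris_copy and a temporary file (both names are just for illustration):

```r
# Work on a copy so the built-in dataset is not shadowed
iris_copy <- iris
rdata_file <- tempfile(fileext = ".RData")
save(iris_copy, file = rdata_file)

iris_copy[1, 2] <- 20000  # change the object after saving

# Load into a separate environment: the modified iris_copy is not overwritten
saved_env <- new.env()
load(rdata_file, envir = saved_env)

iris_copy[1, 2]            # the modified value
saved_env$iris_copy[1, 2]  # the value as it was saved
```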
I am a noob in R and am having a lot of trouble with the following:
I have to read in over 200 datasets and I want to do this automatically. I wrote some code that works perfectly for Rdata extensions but if I try it for SAS-files it always blocks...
library(haven) # read_sas comes from the haven package
path <- "road"
# I make a list of all the different paths of all the files in my folder
File_pathnames <- list.files(path = path, pattern = "\\.sas7bdat$", full.names = TRUE)
# I create an empty list
list.data<-list()
# I try to run a loop to load all the SAS files:
for (i in 1: length(File_pathnames))
{
list.data[[i]] <- read_sas(File_pathnames[i])
}
Problem: it does not load the tables into my global environment (when I used the rdata files I used the load function and all the data appeared in the global environment). How Can I solve this?
many thanks!
Actually, your data ARE in the global environment, as elements of list.data (check list.data[[1]], list.data[[2]], ...)
The issue you have is linked to the fact that load loads an object in the environment using the name it had when it was saved. As an example
x <- 10
save(x, file='tmp')
rm(x)
x
load('tmp')
x
saves x and reloads it under the same name, whereas read_sas only returns the data, which you then have to assign to a variable yourself.
If you want to assign specifically each data set, you have to define a name for each of them and assign the data. Your loop would look like
for (i in seq_along(File_pathnames))
{
namei <- paste0("name",i)
data <- read_sas(File_pathnames[i])
assign(namei, data)
}
and your data would be stored in "name1", "name2", ...
You should then assign each SAS file read from File_pathnames[i] to an object named FilenamesS[i], where FilenamesS is a character vector of object names that you define yourself. Try
for (i in 1: length(File_pathnames))
{
data <- read_sas(File_pathnames[i])
assign (FilenamesS[i], data)
}
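An alternative to assign is to keep everything in one list and name its elements after the files. The sketch below uses two small CSV files and read.csv as a stand-in, since no SAS files are available here; with real data you would swap in read_sas and the .sas7bdat pattern. The folder and file names are made up for the demo.

```r
# Stand-in data: two small CSV files in a temporary folder (assumption for the demo)
demo_dir <- file.path(tempdir(), "named_list_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(a = 1:3), file.path(demo_dir, "first.csv"), row.names = FALSE)
write.csv(data.frame(a = 4:6), file.path(demo_dir, "second.csv"), row.names = FALSE)

file_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file; swap read.csv for read_sas when working with .sas7bdat files
list_data <- lapply(file_paths, read.csv)

# Name each element after its file, without folder or extension
names(list_data) <- tools::file_path_sans_ext(basename(file_paths))

names(list_data)
list_data[["first"]]

# Optional: copy every element into the global environment as its own object
# list2env(list_data, envir = .GlobalEnv)
```

A named list is usually easier to loop over than a set of separately assigned objects, and list2env is there if you really do want them as individual variables.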
I realize this is a pretty basic question, but I want to make sure I do it right, so I wanted to ask just to confirm. I have a vector in one project that I want to be able to use in another project, and I was wondering if there was a simple way to export the vector in a form that I can easily import it to another project.
The way that I figured out how to do it so far is to convert it to a df, then export the df as a csv, then import and unpack it to vector form, but that seems needlessly complicated. It's just a simple numeric vector.
There are a number of ways to read and write data/files in R. For reading, you may want to look at: read.table, read.csv, readLines, source, dget, load, unserialize, and readRDS. For writing, you will want to look at write.table, writeLines, dump, dput, save, serialize, and saveRDS.
x <- 1:3
# [1] 1 2 3
save(x, file = "myvector.rda")
# Change x to prove a point.
x <- 4:6
x
# [1] 4 5 6
# Better yet, we could remove it entirely
rm(x)
x
# Error: object 'x' not found
# Now load what we saved to get us back to where we started.
load("myvector.rda")
x
# [1] 1 2 3
Alternatively, you can use saveRDS and readRDS -- best practice/convention is to use the .rds extension; note, however, that loading the object is slightly different as saveRDS does not save the object name:
saveRDS(x, file = "myvector_serialized.rds")
x <- readRDS("myvector_serialized.rds")
Finally, saveRDS is a lower-level function and therefore can only save one object at a time. The traditional save approach allows you to save multiple objects at the same time, but can become a nightmare if you re-use the same names in different projects/files/scripts...
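If you want several objects in one .rds file, one common workaround is to bundle them into a named list first, since a list counts as a single object. A small sketch (the variable names and the temporary file are only for illustration):

```r
x <- 1:3
y <- c(2.5, 3.5)

rds_file <- tempfile(fileext = ".rds")
# Bundle the objects into one named list, which is a single R object
saveRDS(list(x = x, y = y), file = rds_file)

# Restore under whatever name you like and pull out the pieces
bundle <- readRDS(rds_file)
bundle$x
bundle$y
```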
I agree that saveRDS is a good way to go, but I also recommend the save and save.image functions, which I will demonstrate below.
# save.image
x <- c(5,6,8)
y <- c(8,9,11)
save.image(file="~/vectors.Rdata") # saves all workspace objects
Or alternatively choose which objects you want to save
x <- c(5,6,8)
y <- c(8,9,11)
save(x, y, file="~/vectors.Rdata") # saves only the selected objects
One (minor) advantage of using .Rdata over .Rda is that you can click on the file in the file explorer (e.g. on Windows) and it will be loaded into the R environment. This doesn't work with .Rda files in, say, RStudio on Windows.
I've got a function that has a list output. Every time I run it, I want to export the results with save. After a couple of runs I want to read the files in and compare the results. I do this, because I don't know how many tasks there will be, and maybe I'll use different computers to calculate each task. So how should I name the archived objects, so later I can read them all in?
My best guess would be to dynamically name the variables before saving, and keep track of the object names, but I've read everywhere that this is a big no-no.
So how should I approach this problem?
You might want to use the saveRDS and readRDS functions instead of save and load. The RDS version functions will save and read single objects without the attached name. You would create your object and save it to a file (using paste0 or sprintf to create unique names), then when processing the results you can read in one object at a time, or read several into a list to work with them.
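Roughly, that workflow could look like the sketch below. run_task, the results folder and the result_001.rds naming scheme are all made-up placeholders; replace them with your own function and paths.

```r
results_dir <- file.path(tempdir(), "task_results")  # hypothetical output folder
dir.create(results_dir, showWarnings = FALSE)

# Hypothetical stand-in for the function that returns a list
run_task <- function(task_id) list(task = task_id, value = task_id^2)

for (task_id in 1:3) {
  result <- run_task(task_id)
  # Zero-padded names such as result_001.rds sort in task order
  saveRDS(result, file.path(results_dir, sprintf("result_%03d.rds", task_id)))
}

# Later, possibly in another session: read everything back into one list
result_files <- list.files(results_dir, pattern = "^result_[0-9]+\\.rds$",
                           full.names = TRUE)
all_results <- lapply(result_files, readRDS)
names(all_results) <- basename(result_files)

sapply(all_results, function(r) r$value)
```

Because the objects come back without names attached, nothing has to be tracked apart from the file names themselves.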
You can use scope to hide the retrieved name inside a function, so first you might save a list to a file:
mybiglist <- list(fred=1, john='dum di dum', mary=3)
save(mybiglist, file='mybiglist1.RData')
Then you can load it back in through a function and give it whatever name you like be it inside another list or just a plain object:
# Use the fact that load returns the name of the object loaded
# and that scope will hide this object
myspecialload <- function(RD.fnam) {
return(eval(parse(text=load(RD.fnam))))
}
# now lets reload that file but put it in another object
mynewbiglist <- myspecialload('mybiglist1.RData')
mynewbiglist
$fred
[1] 1
$john
[1] "dum di dum"
$mary
[1] 3
Note that this is not really a generic 'use it anywhere' type function, as for an RData file with multiple objects it appears to return the last object saved... so best stick with one list object per file for now!
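To make the multiple-object case fail loudly instead of silently returning one of the objects, a slightly more defensive loader can check what load reports. This is a sketch; load_single is a hypothetical name, not an existing function.

```r
# Hypothetical helper: load an .RData file that must contain exactly one object
load_single <- function(rdata_file) {
  scratch <- new.env()
  object_names <- load(rdata_file, envir = scratch)
  if (length(object_names) != 1L) {
    stop("Expected exactly one object in ", rdata_file,
         ", found ", length(object_names))
  }
  get(object_names, envir = scratch)
}

mybiglist <- list(fred = 1, john = "dum di dum", mary = 3)
one_file <- tempfile(fileext = ".RData")
save(mybiglist, file = one_file)

mynewbiglist <- load_single(one_file)
str(mynewbiglist)
```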
One time I was given several RData files, and each of them contained only one variable, called x. In order to read all of them into my workspace, I loaded each file in turn into its own environment and used get() to read its value.
tenv <- new.env()
load("file_1.RData", envir = tenv)
ls(tenv) # x
myvar1 <- get(ls(tenv), tenv)
rm(tenv)
....
This code can be repeated for each file.
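Instead of repeating the block by hand, the same steps can be wrapped in lapply. The sketch below first creates three demo files that each contain a variable x, to mirror the situation described; the file names are temporary and only for illustration.

```r
# Demo files: each holds one variable called x
demo_files <- character(3)
for (i in 1:3) {
  x <- i * 10
  demo_files[i] <- tempfile(fileext = ".RData")
  save(x, file = demo_files[i])
}

# Load each file into its own throwaway environment and keep only the value
values <- lapply(demo_files, function(f) {
  tenv <- new.env()
  load(f, envir = tenv)
  get("x", envir = tenv)
})

unlist(values)
```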
I have made a data frame from a set of tweets in the following way:
rdmTweets <- userTimeline("rdatamining", n=200)
df <- do.call("rbind", lapply(rdmTweets, as.data.frame))
Now I am saving the data frame with save in this way:
save(df, file="data")
How I can load that saved data frame for future use? When I use:
df2 <- load("data")
and I apply dim(df2) it should return the quantity of tweets that data frame has, but it only shows 1.
As @mrdwab points out, save saves the names as well as the data/structure (and in fact can save a number of different R objects in a single file). There is another pair of storage functions that behave more as you expect. Try this:
saveRDS(df, file="mytweets.rds")
df2 <- readRDS("mytweets.rds")
These functions can only handle a single object at a time.
Another option is to save your data frame as a csv file. The benefit of this option is that it provides long term storage, i.e. you will (likely) be able to open your csv file on any platform in ten years time. With an RData file, you can only open it with R and I wouldn't like to bet money on opening it between versions.
To save the file as a csv, just use: read.csv and write.csv, so:
write.csv(df, file="out.csv", row.names=FALSE)
df = read.csv("out.csv", header=TRUE)
Gavin's comment below raised a couple of points:
The CSV route only works for tabular-style data.
Completely correct. But if you are saving a data frame (as the OP is), then your data is in tabular form.
With R you'll always have the ability to fire up an old version to read the data and export if for some reason they change save format and don't allow the old format to be loaded by another function.
To play devil's advocate, you could use this argument with Excel and save your data as an xls. However, saving your data in a csv format means we never need to worry about this.
R's file format is documented so one could reasonably easily read the binary data in another system using that open info.
I completely agree - although "easily" is a bit strong. This is why saving as an RData file isn't such a big deal. But if you are saving tabular data, why not use a csv file?
For the record, there are some reasons for saving tabular data as an RData file. For example, the speed in reading/writing the file or file size.
save saves the name of the dataset as well as the data. Thus, you should not assign the result of load("data") to a name, and you should be fine. In other words, simply use:
load("data")
and it will load an object named df (or whatever is contained in the file "data") into your current workspace.
I would suggest a more original name for your file though, and consider adding an extension to help you remember what your script files are, your data files are, and so on.
Work your way through this simple example:
rm(list = ls()) # Remove everything from your current workspace
ls() # Anything there? Nope.
# character(0)
a <- 1:10 # Create an object "a"
save(a, file="myData.Rdata") # Save object "a"
ls() # Anything there? Yep.
# [1] "a"
rm(a) # Remove "a" from your workspace
ls() # Anything there? Nope.
# character(0)
load("myData.Rdata") # Load your "myData.Rdata" file
ls() # Anything there? Yep. Object "a".
# [1] "a"
str(a) # Is "a" what we expect it to be? Yep.
# int [1:10] 1 2 3 4 5 6 7 8 9 10
a2 <- load("myData.Rdata") # What about your approach?
ls() # Now we have 2 objects
# [1] "a" "a2"
str(a2) # "a2" stores the object names from your data file.
# chr "a"
As you can see, save allows you to save and load multiple objects at once, which can be convenient when working on projects with multiple sets of data that you want to keep together.
On the other hand, saveRDS (from the accepted answer) only lets you save single objects. In some ways, this is more "transparent" since load() doesn't let you preview the contents of the file without first loading it.
I am repeatedly applying a function to read and process a bunch of csv files. Each time it runs, the function creates a data frame (this.csv.data) and uses save() to write it to a .RData file with a unique name. Problem is, later when I read these .RData files using load(), the loaded variable names are not unique, because each one loads with the name this.csv.data....
I'd like to save them with unique tags so that they come out properly named when I load() them. I've created the following code to illustrate.
this.csv.data = list(data=c(1:9), unique_tag = "some_unique_tag")
assign(this.csv.data$unique_tag,this.csv.data$data)
# I want to save the data,
# with variable name of <unique_tag>,
# at a file named <unique_tag>.dat
saved_file_name <- paste(this.csv.data$unique_tag,"RData",sep=".")
save(get(this.csv.data$unique_tag), saved_file_name)
but the last line returns:
"Error in save(get(this_unique_tag), file = data_tag) :
object ‘get(this_unique_tag)’ not found"
even though the following returns the data just fine:
get(this.csv.data$unique_tag)
Just name the arguments you use. With your code the following works fine:
save(list = this.csv.data$unique_tag, file=saved_file_name)
My preference is to avoid the name in the RData file on load:
obj = local(get(load('myfile.RData')))
This way you can load various RData files and name the objects whatever you want, or store them in a list etc.
You really should use saveRDS/readRDS to serialize your objects.
save and load are meant for saving and restoring named objects, up to a whole workspace, under their original names.
saveRDS(this.csv.data, saved_file_name)
# later
mydata <- readRDS(saved_file_name)
You can also use save.image, which saves every object in the workspace:
save.image("myfile.RData")
This worked for me:
env <- new.env()
env[[varname]] <- object_to_save
save(list=c(varname), envir=env, file='out.Rda')
You could probably do it without a new env (but I didn't try this):
.GlobalEnv[[varname]] <- object_to_save
save(list=c(varname), envir=.GlobalEnv, file='out.Rda')
You might even be able to drop the envir argument.
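For what it's worth, the new-environment version can be wrapped in a small helper and checked by loading the file into a scratch environment. This is only a sketch; save_as is a hypothetical name and the output path is a temporary file.

```r
# Hypothetical helper: save `object` so that load() restores it as `varname`
save_as <- function(object, varname, file) {
  env <- new.env()
  assign(varname, object, envir = env)
  save(list = varname, envir = env, file = file)
  invisible(file)
}

out_file <- tempfile(fileext = ".Rda")
save_as(1:9, "some_unique_tag", out_file)

# Check the stored name in a scratch environment
check_env <- new.env()
load(out_file, envir = check_env)
ls(check_env)
```

Using a throwaway environment keeps the helper from creating or overwriting anything in the global environment.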