I have found myself needing to update one or two data objects in an Rdata file previously created using save. Unless I'm careful to load the file first, I can forget to re-save some of the objects already in it. As an example, I'm working on a package with some objects stored in sysdata.rda (look-up tables for internal use which I do not want to export) and only want to worry about updating individual objects.
I haven't managed to work out if there is a standard way to do this, so I created my own function.
resave <- function (..., list = character(), file = stop("'file' must be specified")) {
# create a staging environment to load the existing R objects
stage <- new.env()
load(file, envir=stage)
# get the list of objects to be "resaved"
names <- as.character(substitute(list(...)))[-1L]
list <- c(list, names)
# copy the objects to the staging environment
lapply(list, function(obj) assign(obj, get(obj), stage))
# save everything in the staging environment
save(list=ls(stage, all.names=TRUE), file=file, envir=stage)
}
It does seem like overkill though. Is there a better/easier way to do this?
As an aside, am I right in assuming that a new environment created in the scope of a function is destroyed after the function call?
Here is a slightly shorter version:
resave <- function(..., list = character(), file) {
previous <- load(file)
var.names <- c(list, as.character(substitute(list(...)))[-1L])
for (var in var.names) assign(var, get(var, envir = parent.frame()))
save(list = unique(c(previous, var.names)), file = file)
}
I took advantage of the fact that the load function returns the names of the loaded variables, so I could use the function's own environment instead of creating one. And when using get, I was careful to look only in the environment from which the function is called, i.e. parent.frame().
Here is a simulation:
x1 <- 1
x2 <- 2
x3 <- 3
save(x1, x2, x3, file = "abc.RData")
x1 <- 10
x2 <- 20
x3 <- 30
resave(x1, x3, file = "abc.RData")
load("abc.RData")
x1
# [1] 10
x2
# [1] 2
x3
# [1] 30
I have added a refactored version of @flodel's answer in the stackoverflow package. It uses environments explicitly to be a bit more defensive.
resave <- function(..., list = character(), file) {
e <- new.env()
load(file, e)
list <- union(list, as.character(substitute((...)))[-1L])
copyEnv(parent.frame(), e, list)
save(list = ls(e, all.names=TRUE), envir = e, file = file)
}
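For readers who do not have that package installed, the same idea can be sketched in base R. This is only an illustration: copy_env and resave2 are hypothetical names, and copy_env is a stand-in written for this example, not the package's implementation of copyEnv.

```r
# Hypothetical stand-in for copyEnv(); not the package's implementation.
copy_env <- function(from, to, names = ls(from, all.names = TRUE)) {
  for (nm in names) {
    assign(nm, get(nm, envir = from), envir = to)
  }
  invisible(to)
}

# Same shape as the function above, using the stand-in helper.
resave2 <- function(..., list = character(), file) {
  e <- new.env()
  load(file, envir = e)                       # existing objects go into e
  list <- union(list, as.character(substitute(list(...)))[-1L])
  copy_env(parent.frame(), e, list)           # overwrite/add the updated ones
  save(list = ls(e, all.names = TRUE), envir = e, file = file)
}

# Usage with a temporary file
tmp <- tempfile(fileext = ".RData")
a <- 1
b <- 2
save(a, b, file = tmp)
a <- 100
resave2(a, file = tmp)

chk <- new.env()
load(tmp, envir = chk)
chk$a  # the updated value
chk$b  # untouched by resave2
```

Because save is told explicitly which environment to read from, nothing depends on what happens to be defined in the function's own frame.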
When you save a variable in an R data file using save, it is saved under whatever name it had in the session that saved it. When I later go to load it from another session, it is loaded with the same name, which the loading script cannot possibly know. This name could overwrite an existing variable of the same name in the loading session. Is there a way to safely load an object from a data file into a specified variable name without risk of clobbering existing variables?
Example:
Saving session:
x = 5
save(x, file="x.Rda")
Loading session:
x = 7
load("x.Rda")
print(x) # This will print 5. Oops.
How I want it to work:
x = 7
y = load_object_from_file("x.Rda")
print(x) # should print 7
print(y) # should print 5
If you're just saving a single object, don't use an .Rdata file, use an .RDS file:
x <- 5
saveRDS(x, "x.rds")
y <- readRDS("x.rds")
all.equal(x, y)
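The same approach can cover a handful of related objects if they are bundled into one named list first. A minimal sketch, where lookup and its contents are made-up example data:

```r
# Bundle several related objects into one named list and store it as RDS.
rds_path <- tempfile(fileext = ".rds")
lookup <- list(codes = c(a = 1, b = 2), labels = c("low", "high"))
saveRDS(lookup, rds_path)

# The reader chooses the variable name, so nothing can be clobbered.
restored <- readRDS(rds_path)
same <- identical(lookup, restored)
names(restored)
```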
I use the following:
loadRData <- function(fileName){
#loads an RData file, and returns it
load(fileName)
get(ls()[ls() != "fileName"])
}
d <- loadRData("~/blah/ricardo.RData")
You can create a new environment, load the .rda file into that environment, and retrieve the object from there. However, this does impose some restrictions: either you know what the original name for your object is, or there is only one object saved in the file.
This function returns an object loaded from a supplied .rda file. If there is more than one object in the file, an arbitrary one is returned.
load_obj <- function(f)
{
env <- new.env()
nm <- load(f, env)[1]
env[[nm]]
}
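If silently returning one of several objects is a concern, a variant can require the caller to name the object whenever the file holds more than one. This is a sketch only; load_named and its name argument are hypothetical and not part of the answer above.

```r
# Hypothetical variant: fail loudly instead of picking an object silently.
load_named <- function(f, name = NULL) {
  env <- new.env()
  nms <- load(f, envir = env)
  if (is.null(name)) {
    if (length(nms) != 1L) {
      stop("file holds ", length(nms), " objects; supply 'name'")
    }
    name <- nms
  }
  if (!name %in% nms) stop("no object called '", name, "' in ", f)
  env[[name]]
}

# Usage with a temporary file holding two objects
demo_path <- tempfile(fileext = ".rda")
first <- 1:3
second <- "b"
save(first, second, file = demo_path)

picked <- load_named(demo_path, "second")
ambiguous <- tryCatch(load_named(demo_path),
                      error = function(e) conditionMessage(e))
```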
You could also try something like:
# Load the data, and store the name of the loaded object in x
x = load('data.Rsave')
# Get the object by its name
y = get(x)
# Remove the originally loaded object, now that it is stored in y
rm(list = x)
Similar to the other solutions above, I load the variables into a new environment. This way, if I load multiple variables from the .Rda, they will not clutter my own environment.
load("x.Rda", dt <- new.env())
Demo:
x <- 2
y <- 1
save(x, y, file = "mydata.Rda")
rm(x, y)
x <- 123
# Load 'x' and 'y' into a new environment called 'dt'
load("mydata.Rda", dt <- new.env())
dt$x
#> [1] 2
x
#> [1] 123
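If a plain list is more convenient than an environment, the names returned by load can be passed to mget. A small sketch under that assumption; alpha, beta and rda_path are example names only:

```r
rda_path <- tempfile(fileext = ".Rda")
alpha <- 2
beta <- 1
save(alpha, beta, file = rda_path)
rm(alpha, beta)

dt <- new.env()
loaded <- load(rda_path, envir = dt)   # character vector of object names
as_list <- mget(loaded, envir = dt)    # named list of the loaded objects
str(as_list)
```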
Rdata file with one object
assign('newname', get(load('~/oldname.Rdata')))
In case anyone is looking to do this with a plain source file, rather than a saved Rdata/RDS/Rda file, the solution is very similar to the one provided by @Hong Ooi.
load_obj <- function(fileName) {
local_env = new.env()
source(file = fileName, local = local_env)
return(local_env[[names(local_env)[1]]])
}
my_loaded_obj = load_obj(fileName = "TestSourceFile.R")
my_loaded_obj(7)
Prints:
[1] "Value of arg is 7"
And in the separate source file TestSourceFile.R
myTestFunction = function(arg) {
print(paste0("Value of arg is ", arg))
}
Again, this solution only works if the source file defines exactly one object. If it defines more, the function will just return one of them (probably the first, but that is not guaranteed).
I'm extending the answer from @ricardo to allow selection of a specific variable if the .Rdata file contains multiple variables (I don't have enough reputation to edit the answer). It adds some lines to read user input after listing the variables contained in the .Rdata file.
loadRData <- function(fileName) {
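# loads an RData file, and returns the variable the user selects
# keep the loaded names so the printed positions stay valid after 'n' is created
vars <- load(fileName)
print(vars)
n <- readline(prompt="Which variable to load? \n")
get(vars[as.integer(n)])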
}
select_var <- loadRData('Multiple_variables.Rdata')
Following from @ricardo, another example of using (effectively) a separate environment:
load_rdata <- function(file_path) {
res <- local({
load(file_path)
return(get(ls()))
})
return(res)
}
The same caveat applies: this expects the file to contain exactly one object.
Setup:
Say I have two R functions, x() and y().
# Defining function x
# Simple, but what it does is not really important.
x <- function(input)
{output <- input * 10
return(output)}
x() is contained within an .R file and stored in the same directory as y(), but within a different file.
# Defining function y;
# What's important is that Function y's output depends on function x
y <- function(variable){
source('x.R')
output <- x(input = variable)/0.5
return(output)
}
When y() is defined in R, the environment contains only y().
However, after we actually run y()...
# Demonstrating that it works
> y(100)
[1] 2000
the environment now contains x as well.
Question:
Can I add code within y to prevent x from populating the R environment after it has run? I've built a function that depends on several source files which I don't want to keep in the environment after the function has run. I'd like to avoid unnecessarily crowding the R environment when people use the primary function, but adding a simple rm(SubFunctionName) has not worked and I haven't found any other threads on the topic. Any ideas? Thanks for your time!
1) Replace the source line with the following to cause it to be sourced into the local environment.
source('x.R', local = TRUE)
2) Another possibility is to write y like this so that x.R is only read when y.R is sourced rather than each time y is called.
y <- local({
source('x.R', local = TRUE)
function(variable) x(input = variable) / 0.5
})
3) If you don't mind having x defined in y.R then y.R could be written as follows. Note that this removes all source statements from the code, separating the file processing from the code.
y <- function(variable) {
x <- function(input) input * 10
x(input = variable) / 0.5
}
4) Yet another possibility for separating the file processing and code is to remove the source statement from y and read x.R and y.R into the same local environment so that outside of e they can only be accessed via e. In that case they can both be removed by removing e.
e <- local({
source("x.R", local = TRUE)
source("y.R", local = TRUE)
environment()
})
# test
ls(e)
## [1] "x" "y"
e$y(3)
## [1] 60
4a) A variation of this having similar advantages but being even shorter is:
e <- new.env()
source("x.R", local = e)
source("y.R", local = e)
# test
ls(e)
## [1] "x" "y"
e$y(3)
## [1] 60
5) Yet another approach is to use the CRAN modules package or the klmr/modules package referenced in its README.
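Option 1 can be checked end to end with a throwaway file. The sketch below writes a stand-in x.R into the session's temporary directory (that path is an assumption made so the example is self-contained) and then confirms that x was not created in the global environment.

```r
# Write a throwaway x.R into the session's temp directory (illustrative path).
x_path <- file.path(tempdir(), "x.R")
writeLines("x <- function(input) input * 10", x_path)

y <- function(variable) {
  source(x_path, local = TRUE)  # x is created in this call's frame only
  x(input = variable) / 0.5
}

result <- y(100)

# Did x escape into the global environment? (FALSE in a fresh session)
leaked <- exists("x", envir = globalenv(), inherits = FALSE)
print(result)
print(leaked)
```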
I am trying to load a sequence of files into a list in R. Below is an example and the code I used.
## data
val <- c(1:5)
save(val, file='test1.rda')
val <- c(6:10)
save(val, file='test2.rda')
## file names
files = paste0('test',c(1:2), '.rda')
# "test1.rda" "test2.rda"
## use apply to load data into a list
res <- lapply(files, function(x) load(x))
res
# [[1]]
# [1] "val" # ??? supposed to be 1,2,3,4,5
#
# [[2]]
# [1] "val" # ??? supposed to be 6,7,8,9,10
## use for loops to load data
for (i in c(1:2)){
load(files[i])
}
# data sets are loaded as expected
I cannot see why the lapply + load approach is not returning the correct list. I would appreciate it if anyone could point me in the right direction.
Bottom line up front: load loads data into the calling environment, and that is very different when run from a for loop and from lapply. You can override this to force into which environment the data is loaded.
If you read ?load, you'll see the envir= argument:
Usage:
load(file, envir = parent.frame(), verbose = FALSE)
Arguments:
file: a (readable binary-mode) connection or a character string
giving the name of the file to load (when tilde expansion is
done).
envir: the environment where the data should be loaded.
verbose: should item names be printed during loading?
Since the default is parent.frame(), that means it is being loaded into the environment defined within lapply, not the global environment.
Demonstration:
for (i in 1:2) { print(environment()); }
# <environment: R_GlobalEnv>
# <environment: R_GlobalEnv>
ign <- lapply(1:2, function(ign) print(environment()))
# [[1]]
# <environment: 0x000000006f54b838> # not R_GlobalEnv, aka .GlobalEnv
# [[2]]
# <environment: 0x000000006f54de58>
Also, since
Value:
A character vector of the names of objects created, invisibly.
this means that res <- lapply(files, load) will only ever return character vectors of names, not the values themselves.
While I agree with Samet Sökel's premise that readRDS provides a more functional interface (meaning: it returns something, it doesn't operate solely on side-effect), the workaround is not too difficult:
Load into the global environment:
res <- lapply(files, load, envir = .GlobalEnv)
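This returns the names of all loaded variables in res, while the data itself appears in the global environment. (In this example both files store an object called val, so the second load overwrites the first.)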
Load into a user-defined environment:
e <- new.env(parent = emptyenv())
res <- lapply(files, load, envir = e)
# all data is now in 'e'
res will again contain just the names, but this is a little bit closer to a functional interface in that the data is going into a very specific place you define.
Don't dismiss this quickly: if you ever choose to "productionize" your code that loads all of the .rda files, it might be nice for it to load the data into an environment other than .GlobalEnv. Loading while inside a function and putting the data in the global environment is really bad practice, and it might not always work smoothly for your function. More generally, side-effects in a production-type function or package are a bad thing (imo): they often break reproducibility, and they can really mess with users who happen to have same-named variables in their environment, since overwriting them is an irreversible operation that can quickly lead to anger and lost productivity. Side-effects are also very difficult to troubleshoot when something goes wrong.
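To get the list of values the question was after while still using load, each file can be loaded into its own throwaway environment and its contents returned. A minimal sketch; load_values is a hypothetical helper name and the files are recreated in a temporary directory so the example is self-contained:

```r
# Hypothetical helper: load one file privately and return its objects as a list.
load_values <- function(path) {
  e <- new.env(parent = emptyenv())
  nms <- load(path, envir = e)   # names of the objects stored in this file
  mget(nms, envir = e)           # named list of their values
}

tmp_files <- file.path(tempdir(), paste0("test", 1:2, ".rda"))
val <- 1:5
save(val, file = tmp_files[1])
val <- 6:10
save(val, file = tmp_files[2])

res <- lapply(tmp_files, load_values)
res[[1]]$val
res[[2]]$val
```

Using one environment per file means same-named objects from different files do not overwrite each other.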
The load function is not a good way to assign saved R objects, because it loads the object directly into your environment (as happened in your for loop) without assigning it to a new named object.
saveRDS and readRDS let you assign a saved file to a new object in your environment:
val <- c(1:5)
saveRDS(val, file='test1.rds')
val <- c(6:10)
saveRDS(val, file='test2.rds')
files = paste0('test',c(1:2), '.rds')
res <- lapply(files, function(x) readRDS(x))
res
output:
# [[1]]
# [1] 1 2 3 4 5
#
# [[2]]
# [1]  6  7  8  9 10
I am fairly new to R and will try my best to make myself understood.
Suppose I have an existing Rdata file with multiple objects.
Now I want to add a data frame to it. How do I do that?
I tried the following:
write.data.loc <- 'users/Jim/Objects'
rdataPath <- 'users/Jim/Objects.Rda'
myFile<- read.csv("myFile.csv")
loadObjects <- load(rdataPath)
save(loadObjects,myFile,file=paste(write.data.loc,".Rda",sep=""))
But this does not seem to work?
I'm not certain of your actual use-case, but if you must "append" a new object to an rda file, here is one method. It loads all of the objects from the rda file into a new environment. (Many tutorials and guides discuss the use and relevance of environments; Hadley's "Advanced R" is one that does a good job, I think.)
This first step loads all of the objects into a new (empty) environment. It's useful to use an otherwise-empty environment so that we can get all of the objects from it rather easily using ls.
e <- new.env(parent = emptyenv())
load("path/to/.rda", envir = e)
The object you want to add should be loaded into a variable within the environment. Note that the dollar-sign access looks the same as lists, which makes it both (1) easy to confuse the two, and (2) easy to understand the named indexing that $ provides.
e$myFile <- read.csv("yourFile.csv")
This last piece, re-saving the rda file, is an indirect method. The ls(envir = e) returns the variable names of all objects within the environment. This is good, because save can deal with objects or with their names.
do.call("save", c(ls(envir = e), list(envir = e, file = "newsave.rda")))
Realize that this is not technically appending the data.frame to the rda file; it overwrites the rda file with a new one that happens to contain all the previous objects plus the new data.frame.
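The three steps can be wrapped into one helper. This is only a sketch: append_to_rda, new_objects and out are hypothetical names chosen for the example, and it passes the names through save's list argument rather than do.call.

```r
# Hypothetical wrapper around the load / add / re-save steps.
# new_objects must be a named list; out defaults to overwriting the input file.
append_to_rda <- function(file, new_objects, out = file) {
  stopifnot(is.list(new_objects), !is.null(names(new_objects)))
  e <- new.env(parent = emptyenv())
  load(file, envir = e)               # existing objects
  list2env(new_objects, envir = e)    # add (or replace) the new ones
  nms <- ls(envir = e, all.names = TRUE)
  save(list = nms, envir = e, file = out)
  invisible(nms)
}

# Usage with a temporary file
rda <- tempfile(fileext = ".rda")
old_table <- data.frame(id = 1:2)
save(old_table, file = rda)
append_to_rda(rda, list(myFile = data.frame(v = c(3.5, 4.5))))

check <- new.env()
contents <- load(rda, envir = check)
sort(contents)
```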
I wrote this solution that can add data frames, matrices or lists. By default it will not overwrite an existing object of the same name; set overwrite=TRUE to replace it.
add_object_to_rda <- function(obj, rda_file, overwrite = FALSE) {
.dummy <- NULL
if (!file.exists(rda_file)) save(.dummy, file = rda_file)
old_e <- new.env()
new_e <- new.env()
load(file = rda_file, envir = old_e)
name_obj <- deparse(substitute(obj)) # get the name of the object
# new_e[[name_obj]] <- get(name_obj) # use this only outside a function
new_e[[name_obj]] <- obj
# merge object from old environment with the new environment
# ls(old_e) is a character vector of the object names
if (overwrite) {
# the new variables take precedence over the old ones
invisible(sapply(ls(new_e), function(x)
assign(x, get(x, envir = new_e), envir = old_e)))
# And finally we save the variables in the environment
save(list = ls(old_e), file = rda_file, envir = old_e)
}
else {
# the old variables take precedence over the new ones
invisible(sapply(ls(old_e), function(x)
assign(x, get(x, envir = old_e), envir = new_e)))
# And finally we save the variables in the environment
save(list = ls(new_e), file = rda_file, envir = new_e)
}
}