When you save a variable in an R data file using save, it is saved under whatever name it had in the session that saved it. When I later load it in another session, it comes back under that same name, which the loading script cannot possibly know in advance, and so it may overwrite an existing variable of the same name in the loading session. Is there a way to safely load an object from a data file into a specified variable name, without the risk of clobbering existing variables?
Example:
Saving session:
x = 5
save(x, file="x.Rda")
Loading session:
x = 7
load("x.Rda")
print(x) # This will print 5. Oops.
How I want it to work:
x = 7
y = load_object_from_file("x.Rda")
print(x) # should print 7
print(y) # should print 5
If you're just saving a single object, don't use an .Rdata file; use an .RDS file:
x <- 5
saveRDS(x, "x.rds")
y <- readRDS("x.rds")
all.equal(x, y)
I use the following:
loadRData <- function(fileName) {
  # loads an RData file, and returns it
  load(fileName)
  get(ls()[ls() != "fileName"])
}
d <- loadRData("~/blah/ricardo.RData")
You can create a new environment, load the .rda file into that environment, and retrieve the object from there. However, this does impose some restrictions: either you know what the original name for your object is, or there is only one object saved in the file.
This function returns an object loaded from a supplied .rda file. If there is more than one object in the file, an arbitrary one is returned.
load_obj <- function(f) {
  env <- new.env()
  nm <- load(f, env)[1]
  env[[nm]]
}
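A quick usage sketch, using the x.Rda saved in the question:
x <- 7
y <- load_obj("x.Rda")
x # still 7
y # 5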
You could also try something like:
# Load the data, and store the name of the loaded object in x
x = load('data.Rsave')
# Get the object by its name
y = get(x)
# Remove the loaded object under its original name (x holds that name), now that it is stored in y
rm(list = x)
# Optionally drop the helper variable too
rm(x)
Similar to the other solutions above, I load the variables into a separate environment. This way, if I load multiple variables from the .Rda file, they will not clutter my environment.
load("x.Rda", dt <- new.env())
Demo:
x <- 2
y <- 1
save(x, y, file = "mydata.Rda")
rm(x, y)
x <- 123
# Load 'x' and 'y' into a new environment called 'dt'
load("mydata.Rda", dt <- new.env())
dt$x
#> [1] 2
x
#> [1] 123
For an Rdata file with a single object:
assign('newname', get(load('~/oldname.Rdata')))
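Note that this one-liner still loads the object into the current environment under its old name, as a side effect of load. A sketch of a variant that confines that side effect to a throwaway environment:
e <- new.env()
newname <- get(load('~/oldname.Rdata', envir = e), envir = e)
rm(e)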
In case anyone is looking to do this with a plain source file, rather than a saved Rdata/RDS/Rda file, the solution is very similar to the one provided by @Hong Ooi:
load_obj <- function(fileName) {
  local_env <- new.env()
  source(file = fileName, local = local_env)
  return(local_env[[names(local_env)[1]]])
}
my_loaded_obj = load_obj(fileName = "TestSourceFile.R")
my_loaded_obj(7)
Prints:
[1] "Value of arg is 7"
And in the separate source file TestSourceFile.R:
myTestFunction = function(arg) {
  print(paste0("Value of arg is ", arg))
}
Again, this solution only works if the source file defines exactly one object; if it defines more, it will just return one of them (probably the first, but that is not guaranteed).
I'm extending the answer from @ricardo to allow selection of a specific variable when the .Rdata file contains multiple variables (my reputation is too low to edit that answer). It adds a few lines to read user input after listing the variables contained in the .Rdata file.
loadRData <- function(fileName) {
  # loads an RData file, lists its contents, and returns the variable selected by the user
  vars <- load(fileName)
  print(vars)
  n <- readline(prompt = "Which variable to load? \n")
  get(vars[as.integer(n)])
}
select_var <- loadRData('Multiple_variables.Rdata')
Following on from @ricardo, another example of using (effectively) a separate environment:
load_rdata <- function(file_path) {
  res <- local({
    load(file_path)
    get(ls())
  })
  res
}
The same caveat applies: it expects the file to contain only one object.
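Usage would mirror the earlier loadRData example:
d <- load_rdata("~/blah/ricardo.RData")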
I am trying to load a sequence of files into a list in R. Below is an example and the code I used.
## data
val <- c(1:5)
save(val, file='test1.rda')
val <- c(6:10)
save(val, file='test2.rda')
## file names
files = paste0('test',c(1:2), '.rda')
# "test1.rda" "test2.rda"
## use apply to load data into a list
res <- lapply(files, function(x) load(x))
res
# [[1]]
# [1] "val" # ??? supposed to be 1,2,3,4,5
#
# [[2]]
# [1] "val" # ??? supposed to be 6,7,8,9,10
## use for loops to load data
for (i in c(1:2)) {
  load(files[i])
}
# data sets are loaded as expected
I cannot see why the lapply + load combination is not returning the expected list. I would appreciate it if anyone could point me in the right direction.
Bottom line up front: load loads data into the calling environment, and that environment is very different when load is run from a for loop versus from lapply. You can override this by specifying the environment into which the data should be loaded.
If you read ?load, you'll see the envir= argument:
Usage:
load(file, envir = parent.frame(), verbose = FALSE)
Arguments:
file: a (readable binary-mode) connection or a character string
giving the name of the file to load (when tilde expansion is
done).
envir: the environment where the data should be loaded.
verbose: should item names be printed during loading?
Since the default is parent.frame(), that means it is being loaded into the environment defined within lapply, not the global environment.
Demonstration:
for (i in 1:2) { print(environment()); }
# <environment: R_GlobalEnv>
# <environment: R_GlobalEnv>
ign <- lapply(1:2, function(ign) print(environment()))
# [[1]]
# <environment: 0x000000006f54b838> # not R_GlobalEnv, aka .GlobalEnv
# [[2]]
# <environment: 0x000000006f54de58>
Also, since
Value:
A character vector of the names of objects created, invisibly.
this means that res <- lapply(files, load) will only ever return the object names (a list of character vectors), not the values themselves.
While I agree with Samet Sökel's premise that readRDS provides a more functional interface (meaning: it returns something rather than operating solely by side effect), the workaround is not too difficult:
Load into the global environment:
res <- lapply(files, load, envir = .GlobalEnv)
This returns the names of all loaded variables into res, while the data itself appears in the global environment.
Load into a user-defined environment:
e <- new.env(parent = emptyenv())
res <- lapply(files, load, envir = e)
# all data is now in 'e'
res will still contain just the names, but this is a little closer to a functional interface in that the data goes into a very specific place that you define.
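Coming back to the original goal of getting the values themselves into a list, here is a sketch that loads each file into its own throwaway environment and pulls out the single object it contains (this assumes one object per file, and it also keeps the two 'val' objects from overwriting each other):
res <- lapply(files, function(f) {
  e <- new.env()
  get(load(f, envir = e)[1], envir = e)
})
res
# [[1]]
# [1] 1 2 3 4 5
#
# [[2]]
# [1]  6  7  8  9 10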
Don't dismiss this too quickly: if you ever choose to "productionize" your code that loads all of the .rda files, it might be nice for it to load the data into an environment other than .GlobalEnv. For one, loading data into the global environment from inside a function is really bad practice and might not always work smoothly for your function. And it isn't just that one reason: side effects in a production-type function or package are a bad thing (imo). They often break reproducibility, and they can really mess with users who happen to have same-named variables in their environment; overwriting those is an irreversible operation that can quickly lead to anger and lost productivity. Side effects are also very difficult to troubleshoot when something goes wrong.
The load function is not a good way to put saved R objects into new names, because it loads each object directly into your environment under its original name (as in your for loop, with no new named object being assigned).
saveRDS and readRDS, by contrast, let you assign a saved object to any name in your environment:
val <- c(1:5)
saveRDS(val, file='test1.rds')
val <- c(6:10)
saveRDS(val, file='test2.rds')
files = paste0('test',c(1:2), '.rds')
res <- lapply(files, function(x) readRDS(x))
res
Output:
# [[1]]
# [1] 1 2 3 4 5
#
# [[2]]
# [1]  6  7  8  9 10
I am trying to determine all the objects created in a script (specifically, to get all the data frames, but I'll settle for all the assigned objects, i.e. vectors, lists, etc.).
Is there a way of doing this? Should I make the script run in its own session and then somehow get the objects from that session, rather than relying on the global environment?
Use the second argument to source() when you execute the script. For example, here's a script:
x <- y + 1
z <- 2
which I can put in script.R. Then I will execute it in its own environment using the following code:
x <- 1 # This value will *not* change
y <- 2 # This value will be visible to the script
env <- new.env()
source("script.R", local = env)
Now I can print the values, and see that the comments are correct
x # the original one
# [1] 1
ls(env) # what was created?
# [1] "x" "z"
env$x # this is the one from the script
# [1] 3
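Since the original goal was specifically the data frames, a short follow-up sketch (using the env created above; the toy script here creates none, but with a real script the data frames would be returned):
# Keep only the data frames among the objects the script created
dfs <- Filter(is.data.frame, as.list(env))
names(dfs)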
I had a similar question and found an answer. I am copying the answer from my other post here.
I wrote the following function, get.objects(), that returns all the objects created in a script:
get.objects <- function(path2file = NULL, exception = NULL, source = FALSE, message = TRUE) {
  library("utils")
  library("tools")
  # Step 0-1: Possibility to leave path2file = NULL if using RStudio.
  # We are using rstudioapi to get the path to the current file
  if (is.null(path2file)) path2file <- rstudioapi::getSourceEditorContext()$path
  # Check that file exists
  if (!file.exists(path2file)) {
    stop("couldn't find file ", path2file)
  }
  # Step 0-2: If .Rmd file, need to extract the code in R chunks first
  # Use code in https://felixfan.github.io/extract-r-code/
  if (file_ext(path2file) == "Rmd") {
    require("knitr")
    tmp <- purl(path2file)
    path2file <- paste(getwd(), tmp, sep = "/")
    source <- TRUE # Must be changed to TRUE here
  }
  # Step 0-3: Start by running the script if you are calling an external script.
  if (source) source(path2file)
  # Step 1: screen the script
  summ_script <- getParseData(parse(path2file, keep.source = TRUE))
  # Step 2: extract the objects
  list_objects <- summ_script$text[which(summ_script$token == "SYMBOL")]
  # List unique
  list_objects <- unique(list_objects)
  # Step 3: find where the objects are.
  src <- paste(as.vector(sapply(list_objects, find)))
  src <- tapply(list_objects, factor(src), c)
  # List of the objects in the Global Environment
  # They can be in both the Global Environment and some packages.
  src_names <- names(src)
  list_objects <- NULL
  for (i in grep("GlobalEnv", src_names)) {
    list_objects <- c(list_objects, src[[i]])
  }
  # Step 3bis: if any exception, remove from the list
  if (!is.null(exception)) {
    list_objects <- list_objects[!list_objects %in% exception]
  }
  # Step 4: done!
  # If message, print message:
  if (message) {
    cat(paste0(" ", length(list_objects), " objects were created in the script \n ", path2file, "\n"))
  }
  return(list_objects)
}
To run it, you need a saved script. Here is an example of a script:
# This must be saved as a script, e.g, "test.R".
# Create a bunch of objects
temp <- LETTERS[1:3]
data <- data.frame(x = 1:10, y = 10:1)
p1 <- ggplot(data, aes(x, y)) + geom_point()
# List the objects. If you want to list all the objects except some, you can use the argument exception. Here, I listed as exception "p1.
get.objects()
get.objects(exception = "p1", message = FALSE)
Note that the function also works for external scripts and R Markdown files.
If you run it on an external script, the script has to be run first; to do so, set the argument source to TRUE.
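For an external script, a usage sketch (assuming the example above was saved as test.R):
# Source the script first, then list the objects it creates
get.objects(path2file = "test.R", source = TRUE)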
I am fairly new to R and will try my best to make myself understood.
Suppose I have an existing Rdata file with multiple objects. Now I want to add a data frame to it; how do I do that?
I tried the following:
write.data.loc <- 'users/Jim/Objects'
rdataPath <- 'users/Jim/Objects.Rda'
myFile<- read.csv("myFile.csv")
loadObjects <- load(rdataPath)
save(loadObjects,myFile,file=paste(write.data.loc,".Rda",sep=""))
But this does not seem to work?
I'm not certain of your actual use-case, but if you must "append" a new object to an rda file, here is one method. This tries to be clever by loading all of the objects from the rda file into a new environment (there are many tutorials and guides that discuss the use and relevance of environments, Hadley's "Advanced R" is one that does a good job, I think).
This first step loads all of the objects into a new (empty) environment. It's useful to use an otherwise-empty environment so that we can get all of the objects from it rather easily using ls.
e <- new.env(parent = emptyenv())
load("path/to/.rda", envir = e)
The object you want to add should be loaded into a variable within the environment. Note that the dollar-sign access looks the same as lists, which makes it both (1) easy to confuse the two, and (2) easy to understand the named indexing that $ provides.
e$myFile <- read.csv("yourFile.csv")
This last piece, re-saving the rda file, is an indirect method. The ls(envir = e) returns the variable names of all objects within the environment. This is good, because save can deal with objects or with their names.
do.call("save", c(ls(envir = e), list(envir = e, file = "newsave.rda")))
Realize that this is not technically appending the data.frame to the rda file, it's over-writing the rda file with a new one that happens to contain all the previous objects and the new one data.frame.
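If the do.call construction feels indirect, an equivalent call (a sketch relying on save's list and envir arguments) would be:
# Save every (non-hidden) object in 'e' directly
save(list = ls(envir = e), envir = e, file = "newsave.rda")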
I wrote this solution, which can add data frames, lists, matrices, or any other object. By default it will not overwrite an existing object of the same name; pass overwrite = TRUE to replace it.
add_object_to_rda <- function(obj, rda_file, overwrite = FALSE) {
  .dummy <- NULL
  if (!file.exists(rda_file)) save(.dummy, file = rda_file)
  old_e <- new.env()
  new_e <- new.env()
  load(file = rda_file, envir = old_e)
  name_obj <- deparse(substitute(obj)) # get the name of the object
  # new_e[[name_obj]] <- get(name_obj) # use this only outside a function
  new_e[[name_obj]] <- obj
  # merge the objects from the old environment with the new environment
  # ls(old_e) is a character vector of the object names
  if (overwrite) {
    # the new object replaces any existing object with the same name
    invisible(sapply(ls(new_e), function(x)
      assign(x, get(x, envir = new_e), envir = old_e)))
    # And finally we save the variables in the environment
    save(list = ls(old_e), file = rda_file, envir = old_e)
  } else {
    # existing objects take precedence over the new one
    invisible(sapply(ls(old_e), function(x)
      assign(x, get(x, envir = old_e), envir = new_e)))
    # And finally we save the variables in the environment
    save(list = ls(new_e), file = rda_file, envir = new_e)
  }
}
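A hypothetical usage sketch (the data frame new_df is made up for illustration; the path comes from the question):
new_df <- data.frame(a = 1:3, b = letters[1:3])
add_object_to_rda(new_df, "users/Jim/Objects.Rda")                   # added only if not already present
add_object_to_rda(new_df, "users/Jim/Objects.Rda", overwrite = TRUE) # replaces an existing 'new_df'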
I have found myself in the position of needing to update one or two data objects in an Rdata file previously created using save. If I'm not careful to load the file first, I can forget to re-save some of the objects it contains. As an example, I'm working on a package with some objects stored in sysdata.rda (look-up tables for internal use which I do not want to export), and I only want to worry about updating individual objects.
I haven't managed to work out if there is a standard way to do this, so I created my own function.
resave <- function(..., list = character(), file = stop("'file' must be specified")) {
  # create a staging environment to load the existing R objects
  stage <- new.env()
  load(file, envir = stage)
  # get the list of objects to be "resaved"
  names <- as.character(substitute(list(...)))[-1L]
  list <- c(list, names)
  # copy the objects to the staging environment
  lapply(list, function(obj) assign(obj, get(obj), stage))
  # save everything in the staging environment
  save(list = ls(stage, all.names = TRUE), file = file, envir = stage)
}
It does seem like overkill though. Is there a better/easier way to do this?
As an aside, am I right in assuming that a new environment created in the scope of a function is destroyed after the function call?
Here is a slightly shorter version:
resave <- function(..., list = character(), file) {
  previous <- load(file)
  var.names <- c(list, as.character(substitute(list(...)))[-1L])
  for (var in var.names) assign(var, get(var, envir = parent.frame()))
  save(list = unique(c(previous, var.names)), file = file)
}
I took advantage of the fact that the load function returns the names of the loaded variables, so I could use the function's own environment instead of creating a new one. And when using get, I was careful to look only in the environment from which the function is called, i.e. parent.frame().
Here is a simulation:
x1 <- 1
x2 <- 2
x3 <- 3
save(x1, x2, x3, file = "abc.RData")
x1 <- 10
x2 <- 20
x3 <- 30
resave(x1, x3, file = "abc.RData")
load("abc.RData")
x1
# [1] 10
x2
# [1] 2
x3
# [1] 30
I have added a refactored version of @flodel's answer to the stackoverflow package. It uses environments explicitly, to be a bit more defensive.
resave <- function(..., list = character(), file) {
  e <- new.env()
  load(file, e)
  list <- union(list, as.character(substitute((...)))[-1L])
  copyEnv(parent.frame(), e, list)
  save(list = ls(e, all.names = TRUE), envir = e, file = file)
}
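copyEnv here comes from the stackoverflow package. For readers without that package, a minimal stand-in with the assumed behaviour (copying the named objects from one environment to another) might look like this sketch:
# Assumed stand-in for stackoverflow::copyEnv: copy the objects named in 'names'
# from environment 'from' to environment 'to'
copyEnv <- function(from, to, names = ls(from, all.names = TRUE)) {
  for (nm in names) assign(nm, get(nm, envir = from), envir = to)
  invisible(to)
}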