add a data frame to an existing rdata file - r

I am fairly new to R and will try my best to make myself understood. Suppose I have an existing RData file with multiple objects, and I want to add a data frame to it. How do I do that?
I tried the following:
write.data.loc <- 'users/Jim/Objects'
rdataPath <- 'users/Jim/Objects.Rda'
myFile<- read.csv("myFile.csv")
loadObjects <- load(rdataPath)
save(loadObjects,myFile,file=paste(write.data.loc,".Rda",sep=""))
But this does not seem to work.
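The attempt above fails because load() returns only the names of the objects it restores. As a result, save(loadObjects, myFile, ...) stores a character vector called loadObjects rather than the objects themselves. Below is a minimal sketch of a direct fix. It assumes the objects are loaded into the current environment (load()'s default) and uses a tempfile() path and a small hand-built data frame as stand-ins for the real .Rda file and read.csv("myFile.csv"):

```r
# Stand-in for the existing .Rda file with multiple objects (hypothetical contents)
rdataPath <- tempfile(fileext = ".Rda")
a <- 1:3
b <- "hello"
save(a, b, file = rdataPath)
rm(a, b)

# Stand-in for myFile <- read.csv("myFile.csv")
myFile <- data.frame(id = 1:2, value = c(10, 20))

# load() restores the objects here and returns their names as a character vector
loadObjects <- load(rdataPath)

# Hand save() the names via `list =`, adding the new data frame's name
save(list = c(loadObjects, "myFile"), file = rdataPath)
```

Because load() drops the objects into the current environment, this can overwrite existing variables that share their names. The environment-based answers below avoid that problem.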

I'm not certain of your actual use case, but if you must "append" a new object to an rda file, here is one method. It loads all of the objects from the rda file into a new environment. (Many tutorials and guides discuss the use and relevance of environments; Hadley's "Advanced R" does a good job, I think.)
This first step loads all of the objects into a new (empty) environment. It's useful to use an otherwise-empty environment so that we can get all of the objects from it rather easily using ls.
e <- new.env(parent = emptyenv())
load("path/to/.rda", envir = e)
The object you want to add should be assigned to a variable within the environment. Note that dollar-sign access on an environment looks the same as on a list. This makes it both (1) easy to confuse the two and (2) easy to understand the named indexing that $ provides.
e$myFile <- read.csv("yourFile.csv")
This last piece, re-saving the rda file, is an indirect method. ls(envir = e) returns the names of all objects in the environment. That is convenient, because save accepts either the objects themselves or their names.
do.call("save", c(ls(envir = e), list(envir = e, file = "newsave.rda")))
Realize that this does not technically append the data.frame to the rda file. Instead, it writes a new rda file (here newsave.rda; pass the original path to overwrite that file) that happens to contain all the previous objects plus the new data.frame.
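Put together, the steps above might look like the sketch below. It assumes a tempfile() path and a hand-built data frame in place of the real file and read.csv(). It writes back to the same path, so the file is replaced, and it uses save()'s list = argument, which does the same job as the do.call() form. Passing all.names = TRUE to ls() is a deliberate choice: without it, objects whose names start with a dot would be silently dropped.

```r
# Stand-in for the existing .rda file (hypothetical contents)
rda_path <- tempfile(fileext = ".rda")
old1 <- 42
.hidden <- "dot-prefixed names are easy to lose"
save(old1, .hidden, file = rda_path)
rm(old1, .hidden)

# 1. Load every existing object into an otherwise-empty environment
e <- new.env(parent = emptyenv())
load(rda_path, envir = e)

# 2. Add the new data frame to the same environment
e$myFile <- data.frame(x = 1:3, y = letters[1:3])

# 3. Re-save everything from e; all.names = TRUE keeps dot-prefixed objects
save(list = ls(envir = e, all.names = TRUE), envir = e, file = rda_path)
```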

I wrote this function, which can add data frames, lists, matrices, or other objects. By default it does not overwrite an existing object with the same name; pass overwrite = TRUE to replace it.
add_object_to_rda <- function(obj, rda_file, overwrite = FALSE) {
  # create the file with a placeholder if it does not exist yet; dot-prefixed
  # names are hidden from ls(), so .dummy (and any other dot-prefixed object
  # already in the file) is not re-saved below
  .dummy <- NULL
  if (!file.exists(rda_file)) save(.dummy, file = rda_file)
  old_e <- new.env()
  new_e <- new.env()
  load(file = rda_file, envir = old_e)
  name_obj <- deparse(substitute(obj)) # get the name of the object
  # new_e[[name_obj]] <- get(name_obj) # use this only outside a function
  new_e[[name_obj]] <- obj
  # merge objects from the old environment with the new environment
  # ls(old_e) is a character vector of the object names
  if (overwrite) {
    # copy new into old: the new object replaces an old one with the same name
    invisible(sapply(ls(new_e), function(x)
      assign(x, get(x, envir = new_e), envir = old_e)))
    # and finally we save the variables in the environment
    save(list = ls(old_e), file = rda_file, envir = old_e)
  } else {
    # copy old into new: existing objects take precedence over the new one
    invisible(sapply(ls(old_e), function(x)
      assign(x, get(x, envir = old_e), envir = new_e)))
    # and finally we save the variables in the environment
    save(list = ls(new_e), file = rda_file, envir = new_e)
  }
}
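The two-environment merge can also be collapsed into a single environment plus an exists() check. Below is a sketch of that variant under a hypothetical name, add_to_rda(). It keeps the same rule (an existing object wins unless overwrite = TRUE). It also avoids the .dummy placeholder, because it only calls load() when the file exists, and it keeps dot-prefixed objects via all.names = TRUE.

```r
# Hypothetical single-environment variant of add_object_to_rda()
add_to_rda <- function(obj, rda_file, overwrite = FALSE) {
  name_obj <- deparse(substitute(obj))  # name of the object as passed in
  e <- new.env()
  # Stage whatever the file already holds (nothing if it does not exist yet)
  if (file.exists(rda_file)) load(rda_file, envir = e)
  # Keep an existing object of the same name unless overwrite = TRUE
  if (overwrite || !exists(name_obj, envir = e, inherits = FALSE)) {
    assign(name_obj, obj, envir = e)
  }
  save(list = ls(e, all.names = TRUE), envir = e, file = rda_file)
  invisible(name_obj)
}

# Usage with a throwaway file
f <- tempfile(fileext = ".rda")
df1 <- data.frame(a = 1:2)
add_to_rda(df1, f)                    # file created, df1 stored
df1 <- data.frame(a = 3:4)
add_to_rda(df1, f)                    # stored df1 kept (still 1:2)
add_to_rda(df1, f, overwrite = TRUE)  # stored df1 replaced (now 3:4)
```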

Related

Load multiple rda files and keep their names in the global environment in R [duplicate]

When you save a variable in an R data file using save, it is saved under whatever name it had in the session that saved it. When I later go to load it from another session, it is loaded with the same name, which the loading script cannot possibly know. This name could overwrite an existing variable of the same name in the loading session. Is there a way to safely load an object from a data file into a specified variable name without risk of clobbering existing variables?
Example:
Saving session:
x = 5
save(x, file="x.Rda")
Loading session:
x = 7
load("x.Rda")
print(x) # This will print 5. Oops.
How I want it to work:
x = 7
y = load_object_from_file("x.Rda")
print(x) # should print 7
print(y) # should print 5
If you're just saving a single object, don't use an .Rdata file; use an .RDS file:
x <- 5
saveRDS(x, "x.rds")
y <- readRDS("x.rds")
all.equal(x, y)
I use the following:
loadRData <- function(fileName){
#loads an RData file, and returns it
load(fileName)
get(ls()[ls() != "fileName"])
}
d <- loadRData("~/blah/ricardo.RData")
You can create a new environment, load the .rda file into that environment, and retrieve the object from there. However, this does impose some restrictions: either you know what the original name for your object is, or there is only one object saved in the file.
This function returns an object loaded from a supplied .rda file. If there is more than one object in the file, an arbitrary one is returned.
load_obj <- function(f)
{
env <- new.env()
nm <- load(f, env)[1]
env[[nm]]
}
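If silently returning an arbitrary object is a concern, one variation is to fail loudly when the file holds anything other than exactly one object. Here is a sketch, with load_single() as a hypothetical name:

```r
load_single <- function(f) {
  env <- new.env()
  nms <- load(f, envir = env)   # names of everything the file contained
  if (length(nms) != 1L) {
    stop("expected exactly one object in '", f, "', found ", length(nms),
         ": ", paste(nms, collapse = ", "))
  }
  env[[nms]]
}

# Usage, mirroring the question: x in the session is left untouched
f <- tempfile(fileext = ".Rda")
x <- 5
save(x, file = f)
x <- 7
y <- load_single(f)
```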
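You could also try something like:
# Load the data, and store the name of the loaded object in x
# (note: load() still overwrites any existing object with the same name)
x = load('data.Rsave')
# Get the object by its name
y = get(x)
# Remove the loaded object, now stored in y; list = x removes the object
# named by x, whereas rm(x) would only remove the name vector x itself
rm(list = x)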
Similar to the other solutions above, I load variables into a new environment. This way, if I load multiple variables from the .Rda, they will not clutter my environment.
load("x.Rda", dt <- new.env())
Demo:
x <- 2
y <- 1
save(x, y, file = "mydata.Rda")
rm(x, y)
x <- 123
# Load 'x' and 'y' into a new environment called 'dt'
load("mydata.Rda", dt <- new.env())
dt$x
#> [1] 2
x
#> [1] 123
For an Rdata file with one object:
assign('newname', get(load('~/oldname.Rdata')))
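The one-liner above still restores the object under its old name in the current environment, so it can clobber an existing variable. Wrapping it in local() evaluates load() in a throwaway environment instead. Here is a sketch using a temporary file:

```r
# Hypothetical file holding one object named `oldname`
f <- tempfile(fileext = ".Rdata")
oldname <- c(3, 1, 4)
save(oldname, file = f)

# An unrelated variable that happens to share the saved name
oldname <- "already in use"

# load() runs in local()'s temporary environment, and get() finds the
# restored object there, so the outer `oldname` is not overwritten
newname <- local(get(load(f)))
```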
In case anyone is looking to do this with a plain source file, rather than a saved Rdata/RDS/Rda file, the solution is very similar to the one provided by @Hong Ooi:
load_obj <- function(fileName) {
local_env = new.env()
source(file = fileName, local = local_env)
return(local_env[[names(local_env)[1]]])
}
my_loaded_obj = load_obj(fileName = "TestSourceFile.R")
my_loaded_obj(7)
Prints:
[1] "Value of arg is 7"
And in the separate source file TestSourceFile.R
myTestFunction = function(arg) {
print(paste0("Value of arg is ", arg))
}
Again, this solution only works if the source file defines exactly one object. If it defines more, it will just return one of them (probably the first, but that is not guaranteed).
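I'm extending the answer from @ricardo to allow selecting a specific variable when the .Rdata file contains several (my reputation is too low to edit the answer). After listing the variables contained in the .Rdata file, it reads the user's choice from the console.
loadRData <- function(fileName) {
  # loads an RData file and returns the selected object
  # load() returns the names of the restored objects; keep them so the
  # numbers shown to the user do not shift when `n` is created below
  vars <- load(fileName)
  print(vars)
  # readline() only waits for input in an interactive session
  n <- readline(prompt = "Which variable to load? (enter its number) \n")
  get(vars[as.integer(n)])
}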
select_var <- loadRData('Multiple_variables.Rdata')
Following from @ricardo, here is another example of using (effectively) a separate environment:
load_rdata <- function(file_path) {
res <- local({
load(file_path)
return(get(ls()))
})
return(res)
}
The same caveat applies: it expects the file to contain exactly one object.
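When a file holds several objects, an alternative to picking one is to return all of them as a named list. Here is a sketch with the hypothetical name load_all(), using mget() on the names that load() returns:

```r
load_all <- function(f) {
  env <- new.env()
  nms <- load(f, envir = env)  # load() returns the restored names
  mget(nms, envir = env)       # named list: one element per object
}

# Usage: nothing in the calling environment is overwritten
f <- tempfile(fileext = ".Rda")
x <- 2
y <- 1
save(x, y, file = f)
x <- 123
objs <- load_all(f)
```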

Reading global environment from within a function in R

I need to access (i.e., read and save) the items of the environment I'm working in. I have written the following function to save all objects in my (global) environment:
save_vars <- function(list.of.vars = NULL,
prefix = "StatusQuo",
path = "data") {
if(is.null(list.of.vars)) list.of.vars <- ls()
date_time <- Sys.time()
if (!is.null(path))
path <- paste0(path, "/")
file_name <- paste0(path, prefix, "_", date_time, ".RData")
save(list = list.of.vars, file = file_name)
}
The idea was that if no list.of.vars argument is passed, ls() would list the variables of the environment calling save_vars. However, ls() only sees the variables within the scope of the function itself. I know I can call the function as save_vars(ls()) to do the job, but is there a neater way around it?
Probably cleanest to pass the environment:
fun <- function(envir = parent.frame()) ls(envir = envir)
fun()
This lists the objects in the caller but also lets the user change which environment is used. For example, they could force the global environment to be used:
fun(.GlobalEnv)
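Applied to the save_vars() function from the question, this means adding an envir argument that defaults to parent.frame() and passing it to both ls() and save(). Below is a sketch under that assumption. The colon-free timestamp is an extra change: the default text form of Sys.time() contains ':' characters, which Windows does not allow in file names.

```r
save_vars <- function(list.of.vars = NULL, prefix = "StatusQuo",
                      path = "data", envir = parent.frame()) {
  # With no explicit list, take every object in `envir` (by default, the caller)
  if (is.null(list.of.vars)) list.of.vars <- ls(envir = envir)
  # Colon-free timestamp for the file name
  stamp <- format(Sys.time(), "%Y-%m-%d_%H%M%S")
  file_name <- paste0(prefix, "_", stamp, ".RData")
  if (!is.null(path)) file_name <- file.path(path, file_name)
  # Look the objects up in `envir`, not in this function's own frame
  save(list = list.of.vars, envir = envir, file = file_name)
  invisible(file_name)
}
```

Called at the top level, save_vars() saves the global environment. save_vars(envir = .GlobalEnv) forces that choice from anywhere.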

function to clean current workspace apart from some variables

How can I write a generic R function that cleans the current workspace apart from some self-defined variables? I can already achieve this in a single script with the following code:
prj = '/path/to/project'
src = 'string'
data_to_clean = head(iris)
rm(list = ls()[ !ls() %in% c('prj', 'src') ] )
# only prj and src remain
However I want this to be a function, so that it's applicable for multiple scripts and I can change the variables which should not be cleaned, in one place. Is this possible?
If you wrap this in a function, keep in mind that the function creates its own environment when it is executed. Therefore, you need to specify the environment explicitly in every call (in each ls as well as rm). You probably want to remove the objects from .GlobalEnv.
clean_workspace <- function(not_to_be_removed) {
rm(list =
setdiff(ls(envir = .GlobalEnv), c("clean_workspace", not_to_be_removed)),
envir = .GlobalEnv)
}
prj = '/path/to/project'
src = 'string'
data_to_clean = head(iris)
clean_workspace(c('prj', 'src'))
In order not to remove the function itself, it should be added to the values not to be removed.
If you want to read more about environments, have a look at this overview.
I think you want to remove the function itself as well. The important bit is to tell rm which environment to remove these objects from:
clean_workspace <- function(not_to_be_removed, envir = globalenv()) {
objs <- ls(envir = envir)
rm(list = objs[ !objs %in% not_to_be_removed], envir = envir)
}
prj = '/path/to/project'
src = 'string'
data_to_clean = head(iris)
clean_workspace(c('prj', 'src'))
ls()
#> [1] "prj" "src"

Updating an existing Rdata file

I have found myself in the position of needing to update one or two data objects in an Rdata file previously created using save. If I'm not careful to load the file I can forget to re-save some objects in the file. As an example, I'm working on a package with some objects stored in sysdata.rda (look-up tables for internal use which I do not want to export) and only want to worry about updating individual objects.
I haven't managed to work out if there is a standard way to do this, so created my own function.
resave <- function (..., list = character(), file = stop("'file' must be specified")) {
  # create a staging environment to load the existing R objects
  stage <- new.env()
  load(file, envir=stage)
  # get the list of objects to be "resaved"
  names <- as.character(substitute(list(...)))[-1L]
  list <- c(list, names)
  # copy the objects from the calling environment to the staging environment
  caller <- parent.frame()
  lapply(list, function(obj) assign(obj, get(obj, envir = caller), stage))
  # save everything in the staging environment (envir = stage is required,
  # otherwise save() looks the names up outside the staging environment)
  save(list=ls(stage, all.names=TRUE), envir=stage, file=file)
}
It does seem like overkill though. Is there a better/easier way to do this?
As an aside, am I right in assuming that a new environment created in the scope of a function is destroyed after the function call?
Here is a slightly shorter version:
resave <- function(..., list = character(), file) {
previous <- load(file)
var.names <- c(list, as.character(substitute(list(...)))[-1L])
for (var in var.names) assign(var, get(var, envir = parent.frame()))
save(list = unique(c(previous, var.names)), file = file)
}
I took advantage of the fact the load function returns the name of the loaded variables, so I could use the function's environment instead of creating one. And when using get, I was careful to only look in the environment from which the function is called, i.e. parent.frame().
Here is a simulation:
x1 <- 1
x2 <- 2
x3 <- 3
save(x1, x2, x3, file = "abc.RData")
x1 <- 10
x2 <- 20
x3 <- 30
resave(x1, x3, file = "abc.RData")
load("abc.RData")
x1
# [1] 10
x2
# [1] 2
x3
# [1] 30
I have added a refactored version of @flodel's answer in the stackoverflow package. It uses environments explicitly to be a bit more defensive. Note that copyEnv() is not a base R function.
resave <- function(..., list = character(), file) {
e <- new.env()
load(file, e)
list <- union(list, as.character(substitute((...)))[-1L])
copyEnv(parent.frame(), e, list)
save(list = ls(e, all.names=TRUE), envir = e, file = file)
}
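Because copyEnv() is not in base R, a dependency-free version can do the copy step with mget() and list2env(). Below is a sketch under the hypothetical name resave_base(), checked against the same simulation as above:

```r
resave_base <- function(..., list = character(), file) {
  caller <- parent.frame()
  e <- new.env()
  # Stage every object already in the file
  load(file, envir = e)
  # Names given as bare symbols in ... plus any given as strings in `list`
  list <- union(list, as.character(substitute(list(...)))[-1L])
  # Base-R stand-in for copyEnv(): copy those objects from the caller into e
  list2env(mget(list, envir = caller, inherits = TRUE), envir = e)
  save(list = ls(e, all.names = TRUE), envir = e, file = file)
}
```

Here mget(..., inherits = TRUE) mirrors get()'s default of also searching enclosing environments, as in @flodel's version.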
