Preserve values given by identical object names from multiple RData images - r

I have multiple RData images saved. Each file contains the same number of objects, with identical names across files. How can I add a prefix to the name of every object in every file so that I can load every file into my global environment without overwriting the objects from the previously loaded file?
For example, if I load "image1.RData", I get two objects in my global environment:
Object name    Value
object1        a
object2        b
If I load "image2.RData", I get another two objects in my global environment:
Object name    Value
object1        c
object2        d
The values for object1 and object2 given by "image1.RData" have been overwritten by the values from "image2.RData".
My goal is to be able to load each RData file and preserve the values for each object given by their respective file. Ideally, the object names from each file would be prepended with the name of their data file, such that my global environment would look something like this:
Object name       Value
image1_object1    a
image1_object2    b
image2_object1    c
image2_object2    d
Is there a feasible way to make this happen? Prepending the object names isn't a necessary requirement as long as my goal is obtained, that's just what I thought made the most sense but I can't figure out how to do it.
Thanks in advance for the help.

Here is one solution. I'm sure there is a simpler approach out there, but this does the job nicely. It's based on three simple steps:
1. Load each file into a separate environment, so there is no overlap or overwriting of variables.
2. Rename the variables to remove any duplicate names.
3. Put them all into the global environment (now that we know there are no duplicate names that will overwrite each other).
Here's my setup of your situation.
object1 <- "a"
object2 <- "b"
save(object1, object2, file = "image1.rdata")
object1 <- "c"
object2 <- "d"
save(object1, object2, file = "image2.rdata")
rm(list = ls()) # remove all from global env
Here is my solution:
files <- fs::dir_ls(glob = "*.rdata") # get character vector of paths to .rdata files
for (i in seq_along(files)) {
  f_nm <- sub("\\.rdata$", "", basename(files[i])) # get filename without extension
  e <- new.env() # initialise a new env so loaded objects don't overwrite anything
  obj_nm <- load(files[i], envir = e) # load objects into the new env; returns their names
  obj <- mget(obj_nm, envir = e, inherits = FALSE) # collect objects from the new env into a list
  names(obj) <- paste(f_nm, names(obj), sep = "_") # prefix names with the file name
  list2env(obj, globalenv()) # copy into global environment
}
rm(i, f_nm, e, obj_nm, obj) # clean up global env
Hope this helps.
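Since the question says that prefixing is not strictly required, a variation on the same idea is to keep each file's objects in its own named list instead of copying them into the global environment. This is only a minimal sketch: `load_as_list` and `images` are names made up for illustration, and the two example files are recreated in a temporary directory so that the snippet runs on its own.

```r
# Hypothetical helper: load one .rdata file into a private environment
# and return its contents as a named list.
load_as_list <- function(path) {
  e <- new.env()
  load(path, envir = e)
  as.list(e, all.names = TRUE)
}

# Recreate the two example files in a temporary directory
tmp <- tempfile("images")
dir.create(tmp)
object1 <- "a"; object2 <- "b"
save(object1, object2, file = file.path(tmp, "image1.rdata"))
object1 <- "c"; object2 <- "d"
save(object1, object2, file = file.path(tmp, "image2.rdata"))

# One list element per file, named after the file (without extension)
files <- list.files(tmp, pattern = "\\.rdata$", full.names = TRUE)
images <- lapply(files, load_as_list)
names(images) <- tools::file_path_sans_ext(basename(files))

images$image1$object1
images$image2$object2
```

Each value is then reached as images$<file>$<object>, so nothing is overwritten and the global environment stays small.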

Related

Using a function to repeat function over a collection of filenames

I would like to write a function to repeat a chunk of code over a collection of file names (names of files present in my working directory) in r. I would also like to save the outputs of each line as a global environment object if possible. General structure of what I am trying to do is given below. Function name is made up.
global_environment_object_1 <- anyfunction("filename and extension")
# Repeat this over a set of filenames in the working directory and save each as a separate
# global environment object with separate names.
A real-life example:
sds22 <- get_subdatasets("EVI_2017_June_2.hdf")
sds23 <- get_subdatasets("EVI_2016_June_1.hdf")
where the object names and file names change and there are 48 files in total.
Thanks for the help in advance!
Try using:
#Get all the filenames
all_files <- list.files(full.names = TRUE)
#To be more specific, match only the .hdf files
#all_files <- list.files(pattern = "\\.hdf$", full.names = TRUE)
#apply get_subdatasets to each one of them
all_data <- lapply(all_files, get_subdatasets)
#Assign name to list output
names(all_data) <- paste0('sds', seq_along(all_data))
#Get the data in global environment
list2env(all_data, .GlobalEnv)
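If names derived from the file names are easier to track than sds1, sds2, and so on, the same pattern can be adapted. The following is a sketch under assumptions: `process_file` is a hypothetical stand-in for get_subdatasets, and the .hdf files are empty dummies created in a temporary directory so the snippet is runnable anywhere.

```r
# Stand-in for get_subdatasets(); hypothetical, for illustration only
process_file <- function(path) toupper(basename(path))

# Dummy files so the example runs anywhere
tmp <- tempfile("hdf")
dir.create(tmp)
file.create(file.path(tmp, c("EVI_2016_June_1.hdf", "EVI_2017_June_2.hdf")))

all_files <- list.files(tmp, pattern = "\\.hdf$", full.names = TRUE)
all_data <- lapply(all_files, process_file)

# Name each result after its file, minus the extension;
# make.names() guards against names that are not valid R identifiers
names(all_data) <- make.names(tools::file_path_sans_ext(basename(all_files)))

# Collect the results in their own environment (or pass .GlobalEnv instead)
results <- list2env(all_data, envir = new.env())
ls(results)
```

Keeping the results in a separate environment is a design choice, not a requirement; passing .GlobalEnv to list2env gives the behaviour the question asked for.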

Saving several variables in a single RDS file

I want to pass a list of variables to saveRDS() to save their values, but instead it saves their names:
variables <- c("A", "B", "C")
saveRDS(variables, "file.R")
it saves the single vector "variables".
I also tried:
save(variables, "file.RData")
with no success
You need to use the list argument of the save function. For example:
var1 = "foo"
var2 = 2
var3 = list(a="abc", z="xyz")
ls()
save(list=c("var1", "var2", "var3"), file="myvariables.RData")
rm(list=ls())
ls()
load("myvariables.RData")
ls()
Please note that the saveRDS function creates an .RDS file, which stores a single R object (the value only, not its name). The save function creates an .RData file (the same format as an .RDA file). .RData files are used to store an entire R workspace, or whichever objects in the workspace are named in the list argument.
YiHui has a nice blogpost on this topic.
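The difference between the two functions can be seen in a short round trip. This is an illustrative sketch only; the object names and temporary file paths are made up for the example.

```r
x <- c(1, 2, 3)
rds_path   <- tempfile(fileext = ".rds")
rdata_path <- tempfile(fileext = ".RData")

saveRDS(x, rds_path)          # stores the value only, not the name
save(x, file = rdata_path)    # stores the value together with the name "x"

rm(x)

y <- readRDS(rds_path)        # the caller chooses the new name
loaded <- load(rdata_path)    # recreates `x`; returns the names it restored
loaded
```

So readRDS returns the object itself, while load restores objects under their original names and returns those names as a character vector.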
If you have several data tables and need them all saved in a single R object, then you can go the saveRDS route. As an example:
datalist = list(mtcars = mtcars, pressure=pressure)
saveRDS(datalist, "twodatasets.RDS")
rm(list=ls())
datalist = readRDS("twodatasets.RDS")
datalist
Another option is to store all your variables within a new environment and save this as an Rds file. You can then move the objects of this environment to the global environment (or leave them where they are).
e <- new.env()
with(e, {
var1 = "foo"
var2 = 2
var3 = list(a="abc", z="xyz")
})
saveRDS(e, "my_obj.Rds")
## new Session
my_obj <- readRDS("my_obj.Rds")
list2env(as.list(my_obj), globalenv())

R identify workspace objects used in source

Is there a way to identify all the workspace objects created, modified or referenced in a sourced script? I have hundreds of randomly-named objects in my workspace and am 'cleaning house' - I would like to be able to be more proactive about this in the future and just rm() the heck out of the sourced script at the end.
The simplest way is to store the names of your environment's objects before sourcing, source the script, and then compare the new set of names with the old one.
Here is some pseudo-code.
old_objects <- ls()
source(file)
new_objects <- setdiff(ls(), c(old_objects, "old_objects"))
This will identify the created objects. To identify whether an object was modified, I don't see another solution than to store all your objects in a list beforehand and then run identical() on each of them afterwards.
# rm(list = ls(all = TRUE))
a <- 1
b <- 1
old_obj_names <- ls()
old_objects <- lapply(old_obj_names, get)
names(old_objects) <- old_obj_names
# source should go here
a <- 2
c <- 3
# I add "old_obj_names" and "old_objects" in the setdiff as these
# were created after the call to ls but before the source
new_objects <- setdiff(ls(), c(old_obj_names, "old_obj_names", "old_objects"))
modified_objects <- sapply(old_obj_names, function(x) !identical(old_objects[[x]], get(x)),
USE.NAMES = TRUE)
modified_objects <- names(modified_objects[modified_objects])
new_objects is indeed "c" and modified_objects is indeed "a" in this example. Obviously, for this to work, you need to ensure that neither old_objects nor old_obj_names are in any way created or modified in the sourced file!
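A related variation, offered as a sketch rather than as the method described above: source() accepts an environment in its local argument, so the script can be evaluated in a throwaway environment whose contents are then listed. This only catches ordinary assignments; anything the script assigns with <<- or assign(..., envir = globalenv()) escapes the sandbox. The script here is a made-up stand-in written to a temporary file.

```r
# Stand-in for the real script; written to a temp file for the example
script <- tempfile(fileext = ".R")
writeLines(c("a <- 2", "c <- 3"), script)

# Evaluate the script in its own environment; its parent is the global
# environment, so existing objects are still visible to the script
sandbox <- new.env(parent = globalenv())
source(script, local = sandbox)

# Everything the script assigned, whether new or a changed copy
created <- ls(sandbox, all.names = TRUE)
created

# Copy over only what is wanted, e.g.
# list2env(mget("c", envir = sandbox), globalenv())
# and then drop the rest in one go
rm(sandbox)
```

Objects that the script modifies show up in the sandbox as well, because the assignment creates a new copy there instead of touching the global one.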

Assigning Directory as a Variable in R

I need to create a function called PollutantMean with the following arguments: directory, pollutant, and id = 1:332.
I have most of the code written but I can't figure out how to assign my directory as a variable. My current working directory is C:/Users/User/Documents. I tried writing the variable as:
directory <- "C:/Users/User/specdata" and that didn't work.
Next I tried the following:
directory <- list.files("specdata", full.names=TRUE) and that didn't work either.
Any ideas on how to change this?
If you are trying to assign the path of your current working directory to the variable "directory", why not take the simple route and add:
directory <- getwd()
This assigns the path of the working directory (a character string, not the directory's contents) to the variable "directory".
I have worked with directories as variables before. I usually declare them like this, to reuse your example:
directory <- "C:/Users/User/specdata/"
Then, if I want to read a specific file in this directory, I just do:
read.table(paste(directory, "myfile.txt", sep = ""), ...)
Writing to a file works the same way:
write.table(res, file = paste(directory, "myfile.txt", sep = ""), ...)
Does this help?
EDIT: you can then use read.csv and it will work fine.
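As an alternative to pasting the pieces together by hand, base R's file.path builds the path and inserts the separator itself. A small sketch; the file names are hypothetical and nothing is read from disk:

```r
directory <- "C:/Users/User/specdata"

# file.path() inserts the separator, so no trailing slash is needed
path <- file.path(directory, "myfile.txt")
path

# The same call works on a vector of (hypothetical) file names
paths <- file.path(directory, c("001.csv", "002.csv"))
paths
```

The result can be passed straight to read.table, read.csv or write.table in place of the paste(...) call.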
I think you are confused by the assignment operation in R. The following line
directory <- "C:/Users/User/specdata"
assigns a string to a new object that just happened to be called directory. It has the same effect on your working environment as
elephant <- "C:/Users/User/specdata"
To change where R reads its files, use the function setwd (short for set working directory):
setwd("C:/Users/User/specdata")
You can also specify full path names to functions that read in data (like read.table). For your specific problem,
# creates a character vector with the full paths of all files ending in `.csv`
all.specdata.files <- list.files(path = "C:/Users/User/specdata", pattern = "\\.csv$", full.names = TRUE)
# creates a list resulting from the application of `read.csv` to
# each of these files (which may be slow!!)
all.specdata.list <- lapply(all.specdata.files, read.csv)
Then we use dplyr::bind_rows (the replacement for the deprecated rbind_all) to row-bind them into one data frame.
library(dplyr)
all.specdata <- bind_rows(all.specdata.list)
Then use colMeans to determine the grand means. Not sure how to do this without seeing the data.
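To make the last step concrete, here is a rough sketch of how the function from the question could look. Everything about the data is an assumption, since the real files are not shown: the files are assumed to be named by zero-padded monitor id (001.csv, 002.csv, ...), the pollutant is assumed to be a column name, and the two tiny csv files are invented stand-ins created in a temporary directory.

```r
# Build a tiny stand-in for the specdata directory (hypothetical data)
directory <- tempfile("specdata")
dir.create(directory)
write.csv(data.frame(ID = 1, sulfate = c(1, NA, 3)),
          file.path(directory, "001.csv"), row.names = FALSE)
write.csv(data.frame(ID = 2, sulfate = c(5, 7)),
          file.path(directory, "002.csv"), row.names = FALSE)

# Signature follows the argument list given in the question
PollutantMean <- function(directory, pollutant, id = 1:332) {
  # Assumes files are named by zero-padded monitor id: 001.csv, 002.csv, ...
  files <- file.path(directory, sprintf("%03d.csv", id))
  # Pull the requested column out of every file and pool the values
  values <- unlist(lapply(files, function(f) read.csv(f)[[pollutant]]))
  # Missing readings are dropped before averaging
  mean(values, na.rm = TRUE)
}

PollutantMean(directory, "sulfate", id = 1:2)
```

Because directory is an ordinary function argument, it can hold any path string; there is no need to change the working directory first.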
Assuming that the columns in each of the 300+ csv files are the same, that is, column j contains the same type of data in all files, then the following example should be of use:
# let's use a temp directory for storing the files
tmpdr <- tempdir()
# Let's create a large matrix of values and then split it into many different
# files
original_data <- data.frame(matrix(rnorm(10000L), nrow = 1000L))
# write each row to a file
for (i in seq_len(nrow(original_data))) {
  write.csv(original_data[i, ],
            file = paste0(tmpdr, "/", formatC(i, format = "d", width = 4, flag = "0"), ".csv"),
            row.names = FALSE)
}
# get a character vector with the full path of each of the files
files <- list.files(path = tmpdr, pattern = "\\.csv$", full.names = TRUE)
# read each file into a list
read_data <- lapply(files, read.csv)
# bind the read_data into one data.frame,
read_data <- do.call(rbind, read_data)
# check that our two data.frames are the same.
all.equal(read_data, original_data)
# [1] TRUE

Combine multiple .RData files containing objects with the same name into one single .RData file

I have many .RData files, each containing one data frame that I saved in a previous analysis, and the data frame has the same name in every file. So, for example, using load("file1.RData") I get a data frame called 'df', then using load("file2.RData") I get a data frame with the same name 'df'. I was wondering if it is at all possible to combine all these .RData files into one big .RData file, since I need to load them all at once, with the name of each df equal to the file name so I can then use the different data frames.
I can do this using the code below, but it is very intricate, there must be a simpler way to do this… Thank you for your suggestions.
Say I have 3 .RData files and want to save all in a file called "main.RData" with their specific name (now they all come out as 'df'):
all.files = c("/Users/fra/file1.RData", "/Users/fra/file2.RData", "/Users/fra/file3.RData")
assign(gsub("/Users/fra/", "", all.files[1]), local(get(load(all.files[1]))))
rm(list= ls()[!(ls() %in% (ls(pattern = "file")))])
save.image(file="main.RData")
all.files = c("/Users/fra/file1.RData", "/Users/fra/file2.RData", "/Users/fra/file3.RData")
for (f in all.files[-1]) {
assign(gsub("/Users/fra/", "", f), local(get(load(f))))
rm(list= ls()[!(ls() %in% (ls(pattern = "file")))])
save.image(file="main.RData")
}
Here's an option that incorporates several existing posts
all.files = c("file1.RData", "file2.RData", "file3.RData")
Read multiple dataframes into a single named list (How can I load an object into a variable name that I specify from an R data file?)
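mylist <- lapply(all.files, function(x) {
  load(file = x) # loads into the function's own environment
  get(ls()[ls() != "x"]) # the only object besides the argument `x` is the loaded data frame
})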
names(mylist) <- all.files #Note, the names here don't have to match the filenames
You can save the list, or transfer the dataframes into the global environment prior to saving (Unlist a list of dataframes)
list2env(mylist ,.GlobalEnv)
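To end up with one .RData file, as the question asks, the named list can also be turned into an environment and handed to save() without touching the global environment. A sketch, assuming mylist is a named list of data frames like the one built above; the data frames and the output path here are small stand-ins.

```r
# Stand-in for the list built with lapply() above
mylist <- list(file1 = data.frame(x = 1:2), file2 = data.frame(x = 3:4))

# Put the list elements into a fresh environment, one object per element
e <- list2env(mylist, envir = new.env())

# Save every object in that environment into one file
main_path <- file.path(tempdir(), "main.RData")
save(list = ls(e), envir = e, file = main_path)

# Check: loading the file restores one object per original data frame
check <- new.env()
restored <- load(main_path, envir = check)
restored
```

Loading main.RData later recreates each data frame under its list name, so the names chosen for mylist decide the object names in the combined file.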
Alternatively, if the dataframes were identical and you wanted to create a single big dataframe, you could collapse the list and add a variable with names of contributing files (Dataframes in a list; adding a new variable with name of dataframe).
all <- do.call("rbind", mylist)
all$id <- rep(all.files, sapply(mylist, nrow))
I think the best answer I saw was the code below, which I copied from an SO answer which I can't track down right now. Apologies to the original author.
resave <- function(..., list = character(), file) {
  previous <- load(file)
  var.names <- c(list, as.character(substitute(list(...)))[-1L])
  for (var in var.names) assign(var, get(var, envir = parent.frame()))
  save(list = unique(c(previous, var.names)), file = file)
}
#I took advantage of the fact the load function
#returns the name of the loaded variables, so
#I could use the function's environment instead of creating one.
#And when using get, I was careful to only look in the
#environment from which the function is called, i.e. parent.frame()
