Saving several variables in a single RDS file - r

I want to pass a list of variables to saveRDS() to save their values, but instead it saves their names:
variables <- c("A", "B", "C")
saveRDS(variables, "file.R")
but this just saves the single character vector variables, not the objects it names.
I also tried:
save(variables, "file.RData")
with no success

You need to use the list argument of the save function, e.g.:
var1 = "foo"
var2 = 2
var3 = list(a="abc", z="xyz")
ls()
save(list=c("var1", "var2", "var3"), file="myvariables.RData")
rm(list=ls())
ls()
load("myvariables.RData")
ls()
Please note that the saveRDS function creates an .RDS file, which is used to save a single R object. The save function creates an .RData file (the same thing as an .RDA file). .RData files are used to store an entire R workspace, or whichever names in an R workspace are passed to the list argument.
Yihui Xie has a nice blog post on this topic.
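To make the difference concrete, here is a minimal sketch (the file names are illustrative): saveRDS() stores one object without its name, so readRDS() lets you bind the value to any name you like, while save()/load() restore the original name into the current environment.

```r
x <- 1:3
rds <- file.path(tempdir(), "x.RDS")
saveRDS(x, rds)        # stores the value only, not the name "x"
y <- readRDS(rds)      # you pick the name at read time
identical(x, y)        # TRUE

rda <- file.path(tempdir(), "x.RData")
save(x, file = rda)    # stores the name *and* the value
rm(x)
load(rda)              # silently recreates an object called "x"
```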
If you have several data tables and need them all saved in a single R object, then you can go the saveRDS route. As an example:
datalist = list(mtcars = mtcars, pressure=pressure)
saveRDS(datalist, "twodatasets.RDS")
rm(list=ls())
datalist = readRDS("twodatasets.RDS")
datalist

Another option is to store all your variables within a new environment and save this as an Rds file. You can then move the objects of this environment to the global environment (or leave them where they are).
e <- new.env()
with(e, {
var1 = "foo"
var2 = 2
var3 = list(a="abc", z="xyz")
})
saveRDS(e, "my_obj.Rds")
## new Session
my_obj <- readRDS("my_obj.Rds")
list2env(as.list(my_obj), globalenv())

Related

Preserve values given by identical object names from multiple RData images

I have multiple RData images saved. Each file contains the same number of objects with identical names across each file. How can I prepend the names of every object in every file so that I can load every file into my global environment without overwriting the objects from the previously loaded file?
For example, if I load "image1.RData", I get two objects in my global environment:
Object name   Value
object1       a
object2       b
If I load "image2.RData", I get another two objects in my global environment:
Object name   Value
object1       c
object2       d
The values for object1 and object2 given by "image1.RData" have been overwritten by the values from "image2.RData".
My goal is to be able to load each RData file and preserve the values for each object given by their respective file. Ideally, the object names from each file would be prepended with the name of their data file, such that my global environment would look something like this:
Object name      Value
image1_object1   a
image1_object2   b
image2_object1   c
image2_object2   d
Is there a feasible way to make this happen? Prepending the object names isn't a necessary requirement as long as my goal is obtained, that's just what I thought made the most sense but I can't figure out how to do it.
Thanks in advance for the help.
Here is one solution. I'm sure there is a simpler approach out there, but this does the job nicely. It's based on the simple idea of:
loading each data file into a separate environment, so there is no overlap or overwriting of variables;
renaming the variables to remove any duplicate names;
then putting them all into the global environment (now that we know there are no duplicate names that will overwrite each other).
Here's my setup of your situation.
object1 <- "a"
object2 <- "b"
save(object1, object2, file = "image1.rdata")
object1 <- "c"
object2 <- "d"
save(object1, object2, file = "image2.rdata")
rm(list = ls()) # remove all from global env
Here is my solution:
files <- fs::dir_ls(glob = "*.rdata") # get character vector of paths to .rdata files
for (i in seq_along(files)) {
  f_nm <- sub(".rdata$", "", basename(files[i])) # get file name without extension
  e <- new.env() # initialise a new envir so loaded objects don't overwrite anything
  obj_nm <- load(files[i], envir = e) # load objects into e and capture their names
  obj <- mget(obj_nm, envir = e, inherits = FALSE) # pull the objects from e into a list
  names(obj) <- paste(f_nm, names(obj), sep = "_") # prepend the file name
  list2env(obj, globalenv()) # copy into the global environment
}
rm(i, f_nm, e, obj_nm, obj) # clean up global env
Hope this helps.

Load multiple .csv files in R, edit them and save as new .csv files named by a list of character strings

I am pretty new to R and programming, so I apologise if this question has been asked elsewhere.
I'm trying to load multiple .csv files, edit them and save them again, but I cannot work out how to handle more than one .csv file, or how to name the new files from a list of character strings.
So I have .csv file and can do:
species_name<-'ace_neg'
species<-read.csv('species_data/ace_neg.csv')
species_1_2<-species[,1:2]
species_1_2$species<-species_name
species_3_2_1<-species_1_2[,c(3,1,2)]
write.csv(species_3_2_1, file='ace_neg.csv',row.names=FALSE)
But I would like to run this code for all .csv files in the folder and add text to a new column based on .csv file name.
So I can load all the .csv files and make a list of character strings to use as the new column text and as the new file names.
NDOP_files <- list.files(path="species_data", pattern="*.csv$", full.names=TRUE, recursive=FALSE)
short_names<- substr(NDOP_files, 14,20)
Then I tried:
lapply(NDOP_files, function(x){
species<-read.csv(x)
species_1_2<-species[,1:2]
species_1_2$species<-'name' # how do I insert the first character string of short_names instead of 'name', then the second string of short_names for the second csv file, etc.?
Then I would continue in the code to change the order of columns:
species_3_2_1<-species_1_2[,c(3,1,2)]
And then write all the new, modified .csv files, naming them again by the list of short_names.
I'm sorry if the text is somewhat confusing.
Any help or suggestions would be great.
You are actually quite close, and using lapply() is a really good idea. As you state, the issue is that it only takes one list as an argument, but you want to work with two. mapply() is a function in base R that you can feed multiple lists into and cycle through synchronously. However, lapply() and mapply() are both designed to create or manipulate objects in R, whereas you want to write files and are not interested in the output within R. The purrr package has the walk*() functions, which are useful when you want to cycle through lists and are only interested in creating side effects (in your case, saving files).
purrr::walk2() takes two lists, so you can provide the data and the file names at the same time.
library(purrr)
First I create some example data (I’m basically already using the same concept here as I will below):
test_data <- map(1:5, ~ data.frame(
a = sample(1:5, 3),
b = sample(1:5, 3),
c = sample(1:5, 3)
))
walk2(test_data,
paste0("species_data/", 1:5, "test.csv"),
~ write.csv(.x, .y))
Instead of getting the file paths and then stripping away the path to get the file names, I just call list.files() twice: once with full.names = TRUE and once with full.names = FALSE.
NDOP_filepaths <-
list.files(
path = "species_data",
pattern = "*.csv$",
full.names = TRUE,
recursive = FALSE
)
NDOP_filenames <-
list.files(
path = "species_data",
pattern = "*.csv$",
full.names = FALSE,
recursive = FALSE
)
Now I feed the two lists into purrr::walk2(). Using the ~ before the curly brackets I can define the anonymous function a bit more elegantly, and then use .x and .y to refer to the entries of the first and the second list.
walk2(NDOP_filepaths,
NDOP_filenames,
~ {
species <- read.csv(.x)
species <- species[, 1:2]
species$species <- gsub("\\.csv$", "", .y)
write.csv(species, .x)
})
Learn more about purrr at purrr.tidyverse.org.
Alternatively, you could just extract the file name in the loop and stick to lapply() or use purrr::map()/purrr::walk(), like this:
lapply(NDOP_filepaths,
function(x) {
species <- read.csv(x)
species <- species[, 1:2]
species$species <- gsub("species_data/|\\.csv$", "", x)
write.csv(species, gsub("species_data/", "", x))
})
NDOP_files <- list.files(path="species_data", pattern="*.csv$",
full.names=TRUE, recursive=FALSE)
# Get name of each file (without the extension)
# basename() removes all of the path up to and including the last path separator
# file_path_sans_ext() removes the .csv extension
csvFileNames <- tools::file_path_sans_ext(basename(NDOP_files))
Then, I would write a function that takes one csv file, manipulates it, and outputs a data frame. Since you have a list of csv files from list.files(), you can use the map() function in the purrr package to apply your function to each csv file.
doSomething <- function(NDOP_file){
  # your code here to manipulate NDOP_file to your liking
  return(NDOP_file)
}
NDOP_files <- map(NDOP_files, ~ doSomething(.x))
Lastly, you can manipulate the file names when you write the new csv files using csvFileNames and a custom function you write to change the file name to your liking. Essentially, use the same architecture of defining your custom function and using map to apply to each of your files.
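That last renaming-and-writing step might look roughly like this, sketched with purrr::walk2() and stand-in data (the "_clean" suffix, the second file name and the output folder are my own illustrative choices, not part of the question):

```r
library(purrr)

# stand-ins for the processed data frames and the stripped file names
processed    <- list(data.frame(a = 1:2), data.frame(a = 3:4))
csvFileNames <- c("ace_neg", "spe_two")   # "spe_two" is hypothetical

out_dir <- tempdir()
# write each processed data frame under its modified name
walk2(processed, csvFileNames,
      ~ write.csv(.x, file.path(out_dir, paste0(.y, "_clean.csv")),
                  row.names = FALSE))
```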

R identify workspace objects used in source

Is there a way to identify all the workspace objects created, modified or referenced in a sourced script? I have hundreds of randomly-named objects in my workspace and am 'cleaning house' - I would like to be able to be more proactive about this in the future and just rm() the heck out of the sourced script at the end.
The simplest way is to store the names of your environment's objects before sourcing, then source, and then compare the new object names with the old list.
Here is some pseudo-code.
old_objects <- ls()
source(file)
new_objects <- setdiff(ls(), c(old_objects, "old_objects"))
This will identify the created objects. To identify whether an object was modified, I don't see another solution than to store all your objects in a list beforehand and then running identical afterwards.
# rm(list = ls(all = TRUE))
a <- 1
b <- 1
old_obj_names <- ls()
old_objects <- lapply(old_obj_names, get)
names(old_objects) <- old_obj_names
# source should go here
a <- 2
c <- 3
# I add "old_obj_names" and "old_objects" in the setdiff as these
# were created after the call to ls but before the source
new_objects <- setdiff(ls(), c(old_obj_names, "old_obj_names", "old_objects"))
modified_objects <- sapply(old_obj_names, function(x) !identical(old_objects[[x]], get(x)),
USE.NAMES = TRUE)
modified_objects <- names(modified_objects[modified_objects])
new_objects is indeed "c" and modified_objects is indeed "a" in this example. Obviously, for this to work, you need to ensure that neither old_objects nor old_obj_names are in any way created or modified in the sourced file!

Assigning Directory as a Variable in R

I need to create a function called PollutantMean with the following arguments: directory, pollutant, and id = 1:332.
I have most of the code written but I can't figure out how to assign my directory as a variable. My current working directory is C:/Users/User/Documents. I tried writing the variable as:
directory <- "C:/Users/User/specdata" and that didn't work.
Next I tried the following:
directory <- list.files("specdata", full.names=TRUE) and that didn't work either.
Any ideas on how to change this?
If you are trying to assign your current working directory to the variable "directory", why not take the simple route and add:
directory <- getwd()
This assigns the path of the working directory to the variable "directory".
I've already worked with directories as variables; I usually declare them like this (to take back your example):
directory<-"C:/Users/User/specdata/"
Then, if I want to read a specific file in this directory, I will just go like :
read.table(paste(directory,"myfile.txt",sep=""),...)
It's the same process to write in a file
write.table(res,file=paste(directory,"myfile.txt",sep=""),...)
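As a side note, file.path() builds the path for you, so you don't have to manage the trailing separator in the directory string yourself (tempdir() here is just an illustrative folder):

```r
directory <- tempdir()   # illustrative; any existing folder works
# write a small table, then read it back using file.path()
write.table(data.frame(x = 1:3), file.path(directory, "myfile.txt"))
res <- read.table(file.path(directory, "myfile.txt"))
```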
Does this help?
EDIT: you can then use read.csv() and it will work fine.
I think you are confused by the assignment operation in R. The following line
directory <- "C:/Users/User/specdata"
assigns a string to a new object that just happened to be called directory. It has the same effect on your working environment as
elephant <- "C:/Users/User/specdata"
To change where R reads its files, use the function setwd (short for set working directory):
setwd("C:/Users/User/specdata")
You can also specify full path names to functions that read in data (like read.table). For your specific problem,
# creates a list of all files ending with `csv` (i.e. all csv files)
all.specdata.files <- list.files(path = "C:/Users/User/specdata", pattern = "csv$")
# creates a list resulting from the application of `read.csv` to
# each of these files (which may be slow!!)
all.specdata.list <- lapply(all.specdata.files, read.csv)
Then we use dplyr::bind_rows() (the successor of the now-removed rbind_all()) to row-bind them into one data frame.
library(dplyr)
all.specdata <- bind_rows(all.specdata.list)
Then use colMeans() to determine the grand means; I'm not sure how to do this without seeing the data.
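A hedged sketch of that last step, assuming all.specdata is the combined data frame (the two columns here are made-up stand-ins; na.rm = TRUE skips missing values):

```r
# stand-in for the row-bound data; real columns depend on the csv files
all.specdata <- data.frame(sulfate = c(1, 2, NA), nitrate = c(4, NA, 6))
num_cols <- sapply(all.specdata, is.numeric)  # keep only numeric columns
grand_means <- colMeans(all.specdata[num_cols], na.rm = TRUE)
grand_means   # sulfate = 1.5, nitrate = 5
```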
Assuming that the columns in each of the 300+ csv files are the same, that is, column j contains the same type of data in all files, then the following example should be of use:
# let's use a temp directory for storing the files
tmpdr <- tempdir()
# Let's create a large data frame of values and then split it into many
# different files
original_data <- data.frame(matrix(rnorm(10000L), nrow = 1000L))
# write each row to a file
for(i in seq(1, nrow(original_data), by = 1)) {
write.csv(original_data[i, ],
file = paste0(tmpdr, "/", formatC(i, format = "d", width = 4, flag = 0), ".csv"),
row.names = FALSE)
}
# get a character vector with the full path of each of the files
files <- list.files(path = tmpdr, pattern = "\\.csv$", full.names = TRUE)
# read each file into a list
read_data <- lapply(files, read.csv)
# bind the read_data into one data.frame,
read_data <- do.call(rbind, read_data)
# check that our two data.frames are the same.
all.equal(read_data, original_data)
# [1] TRUE

Combine multiple .RData files containing objects with the same name into one single .RData file

I have many many .RData files containing one dataframe that I had saved in a previous analysis and the data frame has the same name for each file loaded. So for example using load(file1.RData) I get a data frame called 'df', then using load(file2.RData) I get a data frame with the same name 'df'. I was wondering if it is at all possible to combine all these .RData files into one big .RData file since I need to load them all at once, with the name of each df equal to the file name so I can then use the different data frames.
I can do this using the code below, but it is very intricate, there must be a simpler way to do this… Thank you for your suggestions.
Say I have 3 .RData files and want to save all in a file called "main.RData" with their specific name (now they all come out as 'df'):
all.files = c("/Users/fra/file1.RData", "/Users/fra/file2.RData", "/Users/fra/file3.RData")
assign(gsub("/Users/fra/", "", all.files[1]), local(get(load(all.files[1]))))
rm(list= ls()[!(ls() %in% (ls(pattern = "file")))])
save.image(file="main.RData")
all.files = c("/Users/fra/file1.RData", "/Users/fra/file2.RData", "/Users/fra/file3.RData")
for (f in all.files[-1]) {
assign(gsub("/Users/fra/", "", f), local(get(load(f))))
rm(list= ls()[!(ls() %in% (ls(pattern = "file")))])
save.image(file="main.RData")
}
Here's an option that incorporates several existing posts
all.files = c("file1.RData", "file2.RData", "file3.RData")
Read multiple dataframes into a single named list (How can I load an object into a variable name that I specify from an R data file?)
mylist<- lapply(all.files, function(x) {
load(file = x)
get(ls()[ls() != "x"]) # return the loaded object (everything in the function env except the argument x)
})
names(mylist) <- all.files #Note, the names here don't have to match the filenames
You can save the list, or transfer the dataframes into the global environment prior to saving (Unlist a list of dataframes)
list2env(mylist ,.GlobalEnv)
Alternatively, if the dataframes were identical and you wanted to create a single big dataframe, you could collapse the list and add a variable with names of contributing files (Dataframes in a list; adding a new variable with name of dataframe).
all <- do.call("rbind", mylist)
all$id <- rep(all.files, sapply(mylist, nrow))
I think the best answer I saw was the code below, which I copied from an SO answer which I can't track down right now. Apologies to the original author.
resave <- function(..., list = character(), file) {
previous <- load(file)
var.names <- c(list, as.character(substitute(list(...)))[-1L])
for (var in var.names) assign(var, get(var, envir = parent.frame()))
save(list = unique(c(previous, var.names)), file = file)
}
#I took advantage of the fact the load function
#returns the name of the loaded variables, so
#I could use the function's environment instead of creating one.
#And when using get, I was careful to only look in the
#environment from which the function is called, i.e. parent.frame()
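A quick usage sketch (the file name and objects are illustrative, and the function definition is repeated so the snippet runs on its own): resave() re-saves everything already in the file plus the new objects you pass in.

```r
resave <- function(..., list = character(), file) {
  previous <- load(file)
  var.names <- c(list, as.character(substitute(list(...)))[-1L])
  for (var in var.names) assign(var, get(var, envir = parent.frame()))
  save(list = unique(c(previous, var.names)), file = file)
}

f <- file.path(tempdir(), "main.RData")   # illustrative file name
old <- "original"
save(old, file = f)
extra <- 42
resave(extra, file = f)   # appends "extra" without dropping "old"
rm(old, extra)
load(f)                   # both "old" and "extra" come back
```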
