Save multiple objects using saveRDS and lapply - r

I'm trying to write a function that can take objects and save each of them individually. This is what I have so far:
# Objects
x = 1:10
y = letters[1:10]

# Save location
folder = "Output_Data"

# Save a single object
ObjSave <- function(object, folder) {
  filename = paste0(folder, "/", deparse(substitute(object)), ".rds")
  saveRDS(object, filename)
}
ObjSave(x, folder) # Works fine. Output: x.rds
# Save multiple objects
ObjSave <- function(..., folder) {
  invisible(lapply(
    list(...),
    function(object) {
      filename = paste0(folder, "/", deparse(substitute(object)), ".rds")
      saveRDS(object, filename)
    }
  ))
}
ObjSave(x, y, folder = folder)
# Creates a single object "X[[i]].rds"
# When I use readRDS, it gives the last object i.e. y
# I'm trying to get separate x.rds and y.rds containing x and y respectively
Any help would be much appreciated! I think it's just the deparse(substitute(object)) that is giving me issues, but I haven't worked it out yet.

You need to be careful when you deparse an object.
If you're looking for the variable name passed into the function, it's easiest to capture it on the first line of the function. If you deparse later, after the object has been re-referenced (e.g., inside the lapply loop), the parse tree has changed, and therefore the deparsed name changes.
x = 1:10
y = letters[1:10]
# Save location
folder = "output_data"
# Save multiple objects
ObjSave <- function(..., folder) {
  objects <- list(...)
  object_names <- sapply(substitute(list(...))[-1], deparse)
  sapply(seq_along(objects), function(i) {
    filename <- paste0(folder, "/", object_names[i], ".rds")
    saveRDS(objects[[i]], filename)  # [[i]] saves the object itself, not a one-element list
  })
}
ObjSave(x, y, folder = folder)
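The name capture can be verified in isolation: the deparse must run on the promise expressions, before the objects are re-referenced. A minimal sketch of the idiom (same x and y as above):

```r
# Capture the expressions passed through ..., then deparse them to names.
arg_names <- function(...) sapply(substitute(list(...))[-1], deparse)

x <- 1:10
y <- letters[1:10]
arg_names(x, y)  # c("x", "y")
```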

Related

Loading Multiple RDS Files in R as Multiple Objects in a Custom Function

I'm trying to write a custom function to load multiple RDS files and assign them to separate objects within my environment. The code for the function is below:
read_multi_rds <- function(filepath, regrex) {
  ## grab all files in filepath matching the regrex provided
  files <- list.files(path = filepath, pattern = regrex)
  var_names <- character(0)
  for (i in 1:length(files)) {
    name <- substr(files[i], 1, nchar(files[i]) - 4)  ## -4 to remove the .rds from the var name
    var_names[i] <- name
  }
  for (i in 1:length(files)) {
    file <- readRDS(paste0(filepath, files[i]))
    assign(var_names[i], file)
  }
}
When I test this function by running each bit of the function separately:
filepath <- "I:/Data Sets/"
regrex <- "^cleaned"
files <- list.files(path = filepath, pattern = regrex)
var_names <- character(0)
...followed by...
for (i in 1:length(files)) {
  name <- substr(files[i], 1, nchar(files[i]) - 4)  ## -4 to remove the .rds from the var name
  var_names[i] <- name
}
...and finally...
for (i in 1:length(files)) {
  file <- readRDS(paste0(filepath, files[i]))
  assign(var_names[i], file)
}
...the objects are loaded into the environment.
But when I try to load the objects using the function:
read_multi_rds(filepath = "I:/Data Sets/", regrex = "^cleaned")
Nothing loads. I've added the line:
print('done')
at the end of the function to make sure it's running in its entirety, and it seems to be. I'm not getting any error messages or warnings, either.
Is there something I need to add into the function to properly load these items into my environment? Or is this just not possible to do as a function in R? I'm happy just using the code as is within my scripts, but being able to use it as a function would be much neater if I could pull it off.
assign, when used in a function, assigns in the environment of the function. You have to tell assign to assign in the global environment, as the following code illustrates:
data(mtcars)
tmp <- tempfile(fileext = ".csv")
write.csv(mtcars, tmp)
read_wrong <- function(file_name = tmp) {
  f <- read.csv(file_name)
  assign("my_data", f)
  ls()  # shows that my_data is in the current (function) environment
}

read_correct <- function(file_name = tmp) {
  f <- read.csv(file_name)
  assign("my_data", f, envir = .GlobalEnv)
  ls()  # shows that my_data is not in the current environment
}
read_wrong()
# [1] "f" "file_name" "my_data"
ls() # no my_data
# [1] "mtcars" "read_correct" "read_wrong" "tmp"
read_correct()
# [1] "f" "file_name"
ls()
# [1] "mtcars" "my_data" "read_correct" "read_wrong" "tmp"
Having said that, I would not use assign in the first place, but instead return a list of data frames from the function.
read_better <- function(file_name = tmp) {
  parsed_name <- basename(file_name)  # do some parsing here to get a proper object name
  f <- read.csv(file_name)
  setNames(list(f), parsed_name)
}
all_data <- read_better()
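Applied to the read_multi_rds function from the question, the same idea becomes a sketch like the following (the commented usage line and object names are illustrative):

```r
# Return a named list of the loaded objects instead of assigning them
# inside the function's (soon-discarded) environment.
read_multi_rds <- function(filepath, regrex) {
  files <- list.files(path = filepath, pattern = regrex)
  var_names <- sub("\\.rds$", "", files, ignore.case = TRUE)  # strip the extension
  setNames(lapply(files, function(f) readRDS(file.path(filepath, f))),
           var_names)
}
# cleaned <- read_multi_rds("I:/Data Sets/", "^cleaned")
```

If you really do want them as separate variables afterwards, `list2env(cleaned, envir = globalenv())` moves the whole named list into the global environment in one call.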

Remove path from variable name in a dataframe

I've put together a function that looks like this, with the first comment lines being an example. Most important here is the set.path variable, which I use to set the path initially for the function.
# igor.import(set.path = "~/Desktop/Experiment1 Folder/SCNavigator/Traces",
#             set.pattern = "StepsCrop.ibw",
#             remove.na = TRUE)
igor.multifile.import <- function(set.path, set.pattern, remove.na) {
  require("IgorR")
  require("reshape2")
  raw_list <- list.files(path = set.path,
                         pattern = set.pattern,
                         recursive = TRUE,
                         full.names = TRUE)
  multi.read <- function(f) {  # Note that "temp_data" is just a placeholder in the function
    temp_data <- as.vector(read.ibw(f))  # Change extension to match your data type
  }
  my_list <- sapply(X = raw_list, FUN = multi.read)  # Applies multi.read() to all files gathered in raw_list
  my_list_combined <- as.data.frame(do.call(rbind, my_list))
  my_list_rotated <- t(my_list_combined[nrow(my_list_combined):1, ])  # Matrix form
  data_out <- melt(my_list_rotated)  # "Long form", readable by ggplot2
  data_out$frame <- gsub("V", "", data_out$Var1)
  data_out$name <- gsub(set.path, "", data_out$Var2)  # FIX THIS
  if (remove.na == TRUE) {
    set_name <- na.omit(data_out)
  } else {
    set_name <- data_out
  }
}
When I run this function I'll get a large dataframe, where each file that matched the pattern will show up with a name like
/Users/Joh/Desktop/Experiment1 Folder/SCNavigator/Traces/Par994/StepsCrop.ibw
that includes the entire filepath, and is a bit unwieldy to look at and deal with.
I've tried to remove the path part with the line that says
data_out$name <- gsub(set.path, "", data_out$Var2)
Similar to the command above that removes the dataframe auto-named V1, V2, V3... (which works). I can't remove the string part matching the set.path = "my/path/" though.
Regardless of what your set.path is, you can eliminate it by
gsub(".*/", "", mypath)
For example:
mypath <- "/Users/Joh/Desktop/Experiment1 Folder/SCNavigator/Traces/Par994/StepsCrop.ibw"
gsub(".*/", "", mypath)
# [1] "StepsCrop.ibw"
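Base R also has helpers for exactly this, which sidestep the regex entirely (and with it any regex metacharacters lurking in set.path):

```r
mypath <- "/Users/Joh/Desktop/Experiment1 Folder/SCNavigator/Traces/Par994/StepsCrop.ibw"
basename(mypath)                             # "StepsCrop.ibw" -- drops the directory part
tools::file_path_sans_ext(basename(mypath))  # "StepsCrop" -- also drops the extension
```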

Applying a function on all csv files from a certain folder

I am reading csv files from a certain folder, all of which have the same structure. Furthermore, I have created a function which adds a certain value to a data frame.
I have created the "folder reading" part and the function itself. However, I now need to connect the two, and this is where I am having problems:
Here is my code:
addValue <- function(valueToAdd, df.file, writterPath) {
  df.file$result <- df.file$Value + valueToAdd
  x <- x + 1
  df.file <- as.data.frame(do.call(cbind, df.file))
  fullFilePath <- paste(writterPath, x, "myFile.csv", sep = "")
  write.csv(as.data.frame(df.file), fullFilePath)
}
# 1. reading R files
path <- "C:/Users/RFiles/files/"
files <- list.files(path = path, pattern = "*.csv")
for (file in files) {
  perpos <- which(strsplit(file, "")[[1]] == ".")
  assign(
    gsub(" ", "", substr(file, 1, perpos - 1)),
    read.csv(paste(path, file, sep = "")))
}

# 2. applying function
writterPath <- "C:/Users/RFiles/files/results/"
addValue(2, sys, writterPath)
How to apply the addValue() function in my #1.reading R files construct? Any recommendations?
I appreciate your answers!
UPDATE
When trying out the example code, I get:
+ }
+ ## If you really need to change filenames with numbers,
+ newfname <- file.path(npath, paste0(x, basename(fname)))
+ ## otherwise just use `file.path(npath, basename(fname))`.
+
+ ## (4) Write back to a different file location:
+ write.csv(newdat, file = newfname, row.names = FALSE)
+ }
Error in `$<-.data.frame`(`*tmp*`, "results", value = numeric(0)) :
replacement has 0 rows, data has 11
Any suggestions?
There are several problems with your code (e.g., x in your function is never defined and is not retained between calls to addValue), so I'm guessing that this is a chopped-down version of the real code and you still have remnants remaining. Instead of picking it apart verbosely, I'll just offer my own suggested code and a few pointers.
The function addValue looks like it is good for changing a data.frame, but I would not have guessed (by the name, at least) that it would also write the file to disk (and potentially over-write an existing file).
I'm guessing you are trying to (1) read a file, (2) "add value" to it, (3) assign it to a global variable, and (4) write it to disk. The third can be problematic (and contentious with some programmers), but I'll leave it for now.
Unless writing to disk is inherent to your idea of "adding value" to a data.frame, I recommend you keep #2 separate from #4. Below is a suggested alternative to your code:
addValue <- function(valueToAdd, df) {
  df$results <- df$Value + valueToAdd
  ## more stuff here?
  return(df)
}

opath <- 'c:/Users/RFiles/files/raw'       # notice the difference
npath <- 'c:/Users/RFiles/files/adjusted'
files <- list.files(path = opath, pattern = '*.csv', full.names = TRUE)

x <- 0
for (fname in files) {
  x <- x + 1
  ## (1) read in and (2) "add value" to it
  dat <- read.csv(fname)
  newdat <- addValue(2, dat)
  ## (3) Conditionally assign to a global variable:
  varname <- gsub('\\.[^.]*$', '', basename(fname))
  if (! exists(varname)) {
    assign(x = varname, value = newdat)
  } else {
    warning('variable exists, did not overwrite: ', varname)
  }
  ## If you really need to change filenames with numbers,
  newfname <- file.path(npath, paste0(x, basename(fname)))
  ## otherwise just use `file.path(npath, basename(fname))`.
  ## (4) Write back to a different file location:
  write.csv(newdat, file = newfname, row.names = FALSE)
}
Notice that it will not overwrite global variables. This may be an annoying check, but will keep you from losing data if you accidentally run this section of code.
An alternative to assigning numerous variables to the global address space is to save all of them to a single list. Assuming they are the same format, you will likely be dealing with them with identical (or very similar) analytical methods, so putting them all in one list will facilitate that. The alternative of tracking disparate variable names can be tiresome.
## addValue as defined previously
opath <- 'c:/Users/RFiles/files/raw'
npath <- 'c:/Users/RFiles/files/adjusted'
ofiles <- list.files(path = opath, pattern = '*.csv', full.names = TRUE)
nfiles <- file.path(npath, basename(ofiles))

dats <- mapply(function(ofname, nfname) {
  dat <- read.csv(ofname)
  newdat <- addValue(2, dat)
  write.csv(newdat, file = nfname, row.names = FALSE)
  newdat
}, ofiles, nfiles, SIMPLIFY = FALSE)

length(dats)  # number of files
names(dats)   # one for each file

function that returns a value stored as a variable in an RData file (without global vars)

I want to get a specific variable value from a stored RData file. Often, R sample code loads the data set by way of global variables.
I want to avoid any global variables and instead write a function that returns the value of a variable stored in an RData file. (This also makes it more explicit which variable is needed.)
How can I write a function that returns a value stored as a variable in an RData file, without using any global variables?
(My attempt is the function getVariableFromRData below, but it is a bit cumbersome and perhaps not correct.)
xx <- pi # to ensure there is some data
save(list = ls(all = TRUE), file= "all.RData")
rm(xx)
getVariableFromRData <- function(dataName, varName) {
  e <- new.env()
  load(dataName, envir = e)
  if (varName %in% ls(e)) {
    resultVar <- e[[varName]]
    return(resultVar)
  } else {
    stop(paste0("!! Error: varname (", varName,
                ") not found in RData (", dataName, ")!"))
  }
}
yy <- getVariableFromRData("all.RData", "xx")
Your solution looks decent. Compare with a function I wrote (based on an old SO question) to modify an .RData file:
resave <- function(..., list = character(), file) {
  previous <- load(file)
  var.names <- c(list, as.character(substitute(list(...)))[-1L])
  for (var in var.names) assign(var, get(var, envir = parent.frame()))
  save(list = unique(c(previous, var.names)), file = file)
}
So strictly speaking you don't need a new environment: you can just query the output of load to see if the desired variable name is there.
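To make that concrete, here is a variant of the question's function (renamed to keep it distinct) that still uses a scratch environment but checks membership via load()'s return value, which is a character vector of the names it restored:

```r
getVariableFromRData2 <- function(dataName, varName) {
  e <- new.env()
  loaded <- load(dataName, envir = e)  # load() returns the restored names
  if (!varName %in% loaded)
    stop("varname (", varName, ") not found in RData (", dataName, ")")
  e[[varName]]
}
# With the all.RData file saved above:
# getVariableFromRData2("all.RData", "xx")  # returns pi
```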

Loading many files at once?

So let's say I have a directory with a bunch of .rdata files
file_names <- as.list(dir(pattern = "stock_*"))
# [[1]]
# [1] "stock_1.rdata"
# [[2]]
# [1] "stock_2.rdata"
Now, how do I load these files with a single call?
I can always do:
for(i in 1:length(file_names)) load(file_names[[i]])
but why can't I do something like do.call(load, file_names)?
I suppose none of the apply functions would work, because most of them return lists, and nothing needs to be returned here; the files just need to be loaded. I cannot get the get function to work in this context either. Ideas?
lapply works, but you have to specify that you want the objects loaded to the .GlobalEnv otherwise they're loaded into the temporary evaluation environment created (and destroyed) by lapply.
lapply(file_names,load,.GlobalEnv)
For what it's worth, the above didn't exactly work for me, so I'll post how I adapted that answer:
I have files in folder_with_files/ that are prefixed by prefix_pattern_, are all of type .RData, and are named what I want them to be named in my R environment. For example, if I had saved var_x = 5, I would save it as prefix_pattern_var_x.RData in folder_with_files.
I get the list of the file names, then generate their full path to load them, then gsub out the parts that I don't want: taking it (for object1 as an example) from folder_with_files/prefix_pattern_object1.RData to object1 as the objname to which I will assign the object stored in the RData file.
file_names <- as.list(dir(path = 'folder_with_files/', pattern = "prefix_pattern_*"))
file_names <- lapply(file_names, function(x) paste0('folder_with_files/', x))

out <- lapply(file_names, function(x) {
  env <- new.env()
  nm <- load(x, envir = env)[1]
  objname <- gsub(pattern = 'folder_with_files/', replacement = '', x = x, fixed = TRUE)
  objname <- gsub(pattern = 'prefix_pattern_|.RData', replacement = '', x = objname)
  # print(str(env[[nm]]))
  assign(objname, env[[nm]], envir = .GlobalEnv)
  0  # succeeded
})
Loading many files in a function?
Here's a modified version of Joshua Ulrich's answer that will work both interactively and if placed within a function, by replacing .GlobalEnv with environment():
lapply(file_names, load, environment())
or
foo <- function(file_names) {
  lapply(file_names, load, environment())
  ls()
}
Working example below. It will write files to your current working directory.
invisible(sapply(letters[1:5], function(l) {
  assign(paste0("ex_", l), data.frame(x = rnorm(10)))
  do.call(save, list(paste0("ex_", l), file = paste0("ex_", l, ".rda")))
}))
file_names <- paste0("ex_", letters[1:5], ".rda")
foo(file_names)