Loading many files at once in R?

So let's say I have a directory with a bunch of .rdata files
file_names=as.list(dir(pattern="stock_*"))
[[1]]
[1] "stock_1.rdata"
[[2]]
[1] "stock_2.rdata"
Now, how do I load these files with a single call?
I can always do:
for(i in 1:length(file_names)) load(file_names[[i]])
but why can't I do something like do.call(load, file_names)?
I suppose none of the apply functions would work, because most of them return lists, whereas nothing needs to be returned here; the files just need to be loaded. I cannot get the get function to work in this context either. Ideas?

lapply works, but you have to specify that you want the objects loaded into .GlobalEnv; otherwise they're loaded into the temporary evaluation environment created (and destroyed) by lapply.
lapply(file_names,load,.GlobalEnv)
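To see the difference concretely, here is a small self-contained sketch (file names invented for the demo): without the extra argument, load targets lapply's throwaway frame; with .GlobalEnv, the objects survive.

```r
# create two small .rdata files in a scratch directory
old_wd <- setwd(tempdir())
x1 <- 1:5;  save(x1, file = "stock_1.rdata")
x2 <- 6:10; save(x2, file = "stock_2.rdata")
rm(x1, x2)

file_names <- as.list(dir(pattern = "^stock_.*\\.rdata$"))

# without envir: objects land in lapply's temporary frame and are lost
invisible(lapply(file_names, load))
exists("x1")  # FALSE

# with .GlobalEnv: objects end up in the workspace
invisible(lapply(file_names, load, envir = .GlobalEnv))
exists("x1")  # TRUE
setwd(old_wd)
```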

For what it's worth, the above didn't exactly work for me, so I'll post how I adapted that answer:
I have files in folder_with_files/ that are prefixed by prefix_pattern_, are all of type .RData, and are named what I want them to be named in my R environment. For example: if I had saved var_x = 5, I would save it as prefix_pattern_var_x.RData in folder_with_files.
I get the list of file names, then build each file's full path to load it, then gsub out the parts I don't want: taking it (for object1, say) from folder_with_files/prefix_pattern_object1.RData to object1, the objname to which I assign the object stored in the RData file.
file_names = as.list(dir(path = 'folder_with_files/', pattern = '^prefix_pattern_'))
file_names = lapply(file_names, function(x) paste0('folder_with_files/', x))
out = lapply(file_names, function(x) {
  env = new.env()
  nm = load(x, envir = env)[1]
  objname = gsub(pattern = 'folder_with_files/', replacement = '', x = x, fixed = TRUE)
  objname = gsub(pattern = 'prefix_pattern_|\\.RData$', replacement = '', x = objname)
  # print(str(env[[nm]]))
  assign(objname, env[[nm]], envir = .GlobalEnv)
  0  # succeeded
})
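An alternative sketch that skips assign entirely: load each file into a scratch environment and collect everything into a named list (the folder and prefix names mirror the example above and are placeholders):

```r
load_prefixed <- function(dir_path = "folder_with_files", prefix = "prefix_pattern_") {
  files <- list.files(dir_path, pattern = paste0("^", prefix, ".*\\.RData$"),
                      full.names = TRUE)
  objs <- lapply(files, function(f) {
    env <- new.env()
    nm <- load(f, envir = env)[1]  # name of the (first) object stored in the file
    env[[nm]]
  })
  # derive list names from the file names, stripping prefix and extension
  names(objs) <- gsub(paste0("^", prefix, "|\\.RData$"), "", basename(files))
  objs
}
```

Each object is then reachable as, e.g., out$object1, and nothing is written to the global environment unless you later call list2env() yourself.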

Loading many files in a function?
Here's a modified version of Joshua Ulrich's answer that will work both interactively and inside a function, by replacing .GlobalEnv with environment():
lapply(file_names, load, environment())
or
foo <- function(file_names) {
  lapply(file_names, load, environment())
  ls()
}
Working example below. It will write files to your current working directory.
invisible(sapply(letters[1:5], function(l) {
  assign(paste0("ex_", l), data.frame(x = rnorm(10)))
  do.call(save, list(paste0("ex_", l), file = paste0("ex_", l, ".rda")))
}))
file_names <- paste0("ex_", letters[1:5], ".rda")
foo(file_names)

Related

Loading Multiple RDS Files in R as Multiple Objects in a Custom Function

I'm trying to write a custom function to load multiple RDS files and assign them to separate objects within my environment. The code for the function is below:
read_multi_rds <- function(filepath, regrex) {
  ## grab all files in filepath matching the regrex provided
  files <- list.files(path = filepath, pattern = regrex)
  var_names <- character(0)
  for (i in 1:length(files)) {
    name <- substr(files[i], 1, nchar(files[i]) - 4) ## -4 to remove the .rds from the var name
    var_names[i] <- name
  }
  for (i in 1:length(files)) {
    file <- readRDS(paste0(filepath, files[i]))
    assign(var_names[i], file)
  }
}
When I test this function by running each bit of the function separately:
filepath <- "I:/Data Sets/"
regrex <- "^cleaned"
files <- list.files(path = filepath, pattern = regrex)
var_names <- character(0)
...followed by...
for (i in 1:length(files)) {
  name <- substr(files[i], 1, nchar(files[i]) - 4) ## -4 to remove the .rds from the var name
  var_names[i] <- name
}
...and finally...
for (i in 1:length(files)) {
  file <- readRDS(paste0(filepath, files[i]))
  assign(var_names[i], file)
}
...the objects are loaded into the environment.
But when I try to load the objects using the function:
read_multi_rds(filepath = "I:/Data Sets/", regrex = "^cleaned")
Nothing loads. I've added the line:
print('done')
at the end of the function to make sure it's running in its entirety, and it seems to be. I'm not getting any error messages or warnings, either.
Is there something I need to add into the function to properly load these items into my environment? Or is this just not possible to do as a function in R? I'm happy just using the code as is within my scripts, but being able to use it as a function would be much neater if I could pull it off.
assign, when used in a function, assigns into the environment of the function. You have to tell assign to target the global environment, as the following code illustrates:
data(mtcars)
tmp <- tempfile(fileext = ".csv")
write.csv(mtcars, tmp)
read_wrong <- function(file_name = tmp) {
  f <- read.csv(file_name)
  assign("my_data", f)
  ls() # shows that my_data is in the current (function) environment
}
read_correct <- function(file_name = tmp) {
  f <- read.csv(file_name)
  assign("my_data", f, envir = .GlobalEnv)
  ls() # shows that my_data is not in the current (function) environment
}
read_wrong()
# [1] "f" "file_name" "my_data"
ls() # no my_data
# [1] "mtcars" "read_correct" "read_wrong" "tmp"
read_correct()
# [1] "f" "file_name"
ls()
# [1] "mtcars" "my_data" "read_correct" "read_wrong" "tmp"
Having said that, I would not use assign in the first place but instead return a list of data frames from the function.
read_better <- function(file_name = tmp) {
  parsed_name <- basename(file_name) # do some parsing here to get a proper object name
  f <- read.csv(file_name)
  setNames(list(f), parsed_name)
}
all_data <- read_better()
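The same list-returning idea scales to many files; a sketch assuming CSV inputs, with names derived from the file names:

```r
read_all <- function(file_names) {
  # strip directory and extension to get clean list names
  obj_names <- tools::file_path_sans_ext(basename(file_names))
  setNames(lapply(file_names, read.csv), obj_names)
}
```

If you really do want the objects in your workspace, list2env(read_all(file_names), envir = globalenv()) makes that one explicit step.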

Save multiple objects using saveRDS and lapply

I'm trying to write a function that can take several objects and save each of them individually. This is what I have so far:
# Objects
x = 1:10
y = letters[1:10]
# Save location
folder = "Output_Data"
# Save a single object
ObjSave <- function(object, folder) {
  filename = paste0(folder, "/", deparse(substitute(object)), ".rds")
  saveRDS(object, filename)
}
ObjSave(x, folder) # Works fine. Output: x.rds
# Save multiple objects
ObjSave <- function(..., folder) {
  invisible(lapply(
    list(...),
    function(object) {
      filename = paste0(folder, "/", deparse(substitute(object)), ".rds")
      saveRDS(object, filename)
    }
  ))
}
ObjSave(x, y, folder = folder)
# Creates a single object "X[[i]].rds"
# When I use readRDS, it gives the last object i.e. y
# I'm trying to get separate x.rds and y.rds containing x and y respectively
Any help would be much appreciated! I think it's just the deparse(substitute(object)) that is giving me issues, but I haven't worked it out yet.
You need to be careful when you deparse an object.
If you're looking for the variable names in the function input, it's easiest to capture them on the first line of the function; if you deparse later, after the objects are referenced differently (e.g. inside the lapply loop), the parse tree has changed, and therefore the deparsed name changes.
x = 1:10
y = letters[1:10]
# Save location
folder = "output_data"
# Save multiple objects
ObjSave <- function(..., folder) {
  objects <- list(...)
  object_names <- sapply(substitute(list(...))[-1], deparse)
  sapply(1:length(objects), function(i) {
    filename = paste0(folder, "/", object_names[i], ".rds")
    saveRDS(objects[[i]], filename)
  })
}
ObjSave(x, y, folder = folder)
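To see why the deparse has to happen before the loop, compare these two hypothetical helpers: inside sapply the elements are referenced as X[[i]], and substitute picks that reference up instead of the caller's names.

```r
show_names_wrong <- function(...) {
  # deparsing inside the loop sees the loop's own reference, X[[i]]
  sapply(list(...), function(object) deparse(substitute(object)))
}

show_names_right <- function(...) {
  # deparsing the original call first recovers the caller's names
  sapply(substitute(list(...))[-1], deparse)
}

x <- 1:10
y <- letters[1:10]
show_names_wrong(x, y)  # "X[[i]]" "X[[i]]"
show_names_right(x, y)  # "x" "y"
```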

Function to read in multiple delimited text files

Using this answer, I have created a function that should read in all the text datasets in a directory:
read.delims = function(dir, sep = "\t") {
  # Make a list of all data frames in the "data" folder
  list.data = list.files(dir, pattern = "*.(txt|TXT|csv|CSV)")
  # Read them in
  for (i in 1:length(list.data)) {
    assign(list.data[i],
           read.delim(paste(dir, list.data[i], sep = "/"), sep = sep))
  }
}
However, even though there are .txt and .csv files in the specified directory, no R objects get created (I'm guessing this happens because I'm using the read.delim within a function). How to correct this?
You can add the parameter envir in your assignment, like this :
read.delims = function(dir, sep = "\t") {
  # Make a list of all data frames in the "data" folder
  list.data = list.files(dir, pattern = "*.(txt|TXT|csv|CSV)")
  # Read them in
  for (i in 1:length(list.data)) {
    assign(list.data[i],
           read.delim(paste(dir, list.data[i], sep = "/"), sep = sep),
           envir = .GlobalEnv)
  }
}
Doing this, your objects will be created in the global environment and not just in the function environment.
As I said in my comment, it is necessary to return() a value after assigning. I don't really see the point in using assign() though, so here it is with a simple for-loop, assuming you want your output to be a list of data frames.
Note that I changed the reading function to read.table() for personal convenience. You might want to adjust that.
read.delims <- function(dir, sep = "\t") {
  # Make a list of all data frames in the "data" folder
  list.data <- list.files(dir, pattern = "*.(txt|TXT|csv|CSV)")
  list.out <- vector("list", length(list.data))
  # Read them in
  for (i in 1:length(list.data)) {
    list.out[[i]] <- read.table(paste(dir, list.data[i], sep = "/"), sep = sep)
  }
  return(list.out)
}
Maybe you should also add a $ to your regular expression.
Cheers.
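A small refinement of the list approach (assumption: you want to look the data frames up by file name afterwards) is to name the output list and anchor the extension match, as suggested above:

```r
read.delims2 <- function(dir, sep = "\t") {
  # anchored pattern: only files whose name *ends* in a known extension
  files <- list.files(dir, pattern = "\\.(txt|TXT|csv|CSV)$")
  out <- lapply(files, function(f) read.table(file.path(dir, f), sep = sep))
  setNames(out, files)
}
```

Then results[["mydata.txt"]] gets you a specific table directly.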

function that returns a value stored as a variable in an RData file (without global vars)

I want to get the value of a specific variable from a stored RData file. In R sample code, loading a data set often involves global variables.
I want to avoid any global variables and instead write a function that returns the value of a variable stored in an RData file. (This also makes it more explicit which variable is needed.)
How can I write a function that returns a value stored as a variable in an RData file, without using any global variables?
(My attempt is the function getVariableFromRData below, but it is a bit cumbersome and perhaps not correct.)
xx <- pi # to ensure there is some data
save(list = ls(all = TRUE), file= "all.RData")
rm(xx)
getVariableFromRData <- function(dataName, varName) {
  e <- new.env()
  load(dataName, envir = e)
  if (varName %in% ls(e)) {
    resultVar <- e[[varName]]
    return(resultVar)
  } else {
    stop(paste0("!! Error: varname (", varName,
                ") not found in RData (", dataName, ")!"))
  }
}
yy <- getVariableFromRData("all.RData", "xx")
Your solution looks decent. Compare with a function I wrote (based on an old SO question) to modify an .RData file:
resave <- function(..., list = character(), file) {
  previous <- load(file)
  var.names <- c(list, as.character(substitute(list(...)))[-1L])
  for (var in var.names) assign(var, get(var, envir = parent.frame()))
  save(list = unique(c(previous, var.names)), file = file)
}
So strictly speaking you don't need a new environment: you can just query the output of load to see if the desired variable name is there.
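Following that remark, a trimmed-down sketch (named getVariableFromRData2 here to avoid clashing with the question's function) that leans on load's return value and the function's own frame instead of a new environment:

```r
getVariableFromRData2 <- function(dataName, varName) {
  loaded <- load(dataName)  # loads into this function's frame; returns the restored names
  if (!varName %in% loaded) {
    stop(sprintf("variable '%s' not found in RData file '%s'", varName, dataName))
  }
  get(varName)  # the frame is discarded on return, so nothing leaks
}
```

One caveat: if the file happens to contain objects named dataName, varName, or loaded, they would shadow the function's own variables, which is why the new.env() version in the question is arguably the safer form.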

R 3.1 sapply to a list of files

I want to apply the read.table() function to a list of .txt files. These files are in my current directory.
my.txt.list <- list("subject_test.txt", "subject_train.txt", "X_test.txt", "X_train.txt")
Before applying read.table() to the elements of this list, I want to check whether the data table has already been computed and is in a cache directory. Data tables from the cache directory are already in my environment, named in the form file_name.dt:
R> ls()
"subject_test.dt" "subject_train.dt"
In this example, I only want to compute "X_test.txt" and "X_train.txt". I wrote a small function that tests whether the data table has already been cached and applies read.table() if not.
my.rt <- function(x, ...) {
  # apply read.table to txt files if the data table is not already cached
  # x is a character vector
  y <- strsplit(x, '.txt')
  y <- paste(y, '.dt', sep = '')
  if (y %in% ls() == FALSE) {
    rt <- read.table(x, header = F, sep = "", dec = '.')
  }
}
This function works if I take one element this way :
subject_test.dt <- my.rt('subject_test.txt')
Now I want to sapply it to my file list this way:
my.res <- sapply(my.txt.list, my.rt)
I get my.res as a list of data frames, but the issue is that the function computes all files and does not take the already computed ones into account.
I must be missing something, but I can't see why.
TY for suggestions.
I think it has to do with the use of strsplit in your example. strsplit returns a list.
What about this?
my.txt.files <- c("subject_test.txt", "subject_train.txt", "X_test.txt", "X_train.txt")
> ls()
[1] "subject_test.dt" "subject_train.dt"
my.rt <- function(x) {
  y <- gsub(".txt", ".dt", x, fixed = TRUE)
  # ls() must inspect the global environment; called with no arguments inside
  # the function, it would only list the function's own locals
  if (!(y %in% ls(envir = .GlobalEnv))) {
    read.table(x, header = FALSE, sep = "", dec = '.')
  }
}
my.res <- sapply(my.txt.files, FUN = my.rt)
Note that I'm replacing .txt with .dt and I'm doing a "not in". You will get NULL entries in the result list if a file is not processed.
This is untested, but I think it should work...