Do not collect results in a list in lapply [duplicate]

This question already has answers here:
Stop lapply from printing to console
(3 answers)
Closed 5 years ago.
Suppose I have the following file structure:
There are three folders: A, B and C. Each folder contains a file called file_demo.csv. I would like to read the file from each folder, do some operation on it, and export the results to three new files outside those folders. To do this, I use lapply().
Here's some code for a demo:
# packages: readr for read_csv()/write_csv(), magrittr for the %>% pipe
library(readr)
library(magrittr)
# the folder list
folder_list <- c('A', 'B', 'C')
# create the demo folders (no-op if they already exist)
for (f in folder_list) dir.create(f, showWarnings = FALSE)
# creating demo data frames
set.seed(1)
file_demo_a <- data.frame(X = rnorm(5),
                          Y = rpois(5, lambda = 2))
write_csv(file_demo_a, 'A/file_demo.csv')
set.seed(2)
file_demo_b <- data.frame(X = rnorm(5),
                          Y = rpois(5, lambda = 2))
write_csv(file_demo_b, 'B/file_demo.csv')
set.seed(3)
file_demo_c <- data.frame(X = rnorm(5),
                          Y = rpois(5, lambda = 2))
write_csv(file_demo_c, 'C/file_demo.csv')
# defining a function
df_mod_func <- function(folder_name){
  path_name <- paste(folder_name, 'file_demo.csv', sep = "/")
  new_demo <- read_csv(path_name)
  new_demo <- new_demo + 1 # do a new operation
  csv_file_name <- paste(folder_name, 'new_file_demo.csv', sep = "_")
  # write_csv() returns new_demo invisibly, so lapply() collects every data frame
  new_demo %>% write_csv(csv_file_name)
  # return(NULL)
}
lapply(folder_list, df_mod_func)
Now the problem I am facing is that when I call lapply(), each of the final data frames is printed to the console. This is a problem because the data files I will load are huge and I do not want R to crash. I also do not want to store the result as an object because of its size. I have tried returning NULL from the function, but that seems hacky, and it still fills up my console with useless output.
Is there a way to stop lapply (or some other function) from collecting the output in this case, so that it just executes silently?

If it's just about not printing the result, you can always use invisible(), as in:
invisible( lapply( folder_list, df_mod_func ) )
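Note that invisible() only suppresses auto-printing: lapply() still builds the full list of return values in memory, and since write_csv() returns its input, that list holds every data frame. If memory is the real concern, a minimal sketch under that assumption is to make the function return invisible(NULL) and drive it with a plain for loop, which keeps no results. process_one() is a hypothetical stand-in for df_mod_func, and it writes to tempdir() rather than the real folders.

```r
# Hypothetical stand-in for df_mod_func(): builds a small data frame,
# writes it out, and deliberately returns NULL instead of the data
process_one <- function(folder_name) {
  out <- data.frame(X = rnorm(5) + 1)
  out_file <- file.path(tempdir(), paste0(folder_name, "_new_file_demo.csv"))
  write.csv(out, out_file, row.names = FALSE)
  invisible(NULL)
}

folder_list <- c("A", "B", "C")

# A for loop keeps no per-iteration results and prints nothing
for (f in folder_list) process_one(f)

# invisible() hides the printout, but lapply() still returns a list
# (here a list of three NULLs, so it costs almost no memory)
res <- invisible(lapply(folder_list, process_one))
```

The design choice is simply that whatever the function returns is what lapply() keeps; returning NULL is what actually frees the memory, while invisible() only affects the console.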

Related

Looping over lists, extracting certain elements and delete the list?

I am trying to write efficient code that opens data files, each containing a list, extracts one element from the list, stores it in a data frame, and then deletes the loaded object before opening the next file.
My idea is to do this with loops. Unfortunately, I am quite new to loops and don't know how to write the code.
I have managed to open the data-sets using the following code:
for(i in 1995:2015){
  objects = paste("C:/Users/...",i,"agg.rda", sep=" ")
  load(objects)
}
The problem is that each dataset is extremely large and R cannot open all of them at once. Therefore, I am now trying to extract an element within each list called tab_<<i value >>_agg[["A"]] (for example tab_1995_agg[["A"]]), then delete the object, and iterate over each i (the years).
I have tried the following code, but it does not work:
for(i in unique(1995:2015)){
  objects = paste("C:/Users/...",i,"agg.rda", sep=" ")
  load(objects)
  tmp = cat("tab",i,"_agg[[\"A\"]]" , sep = "")
  y <- rbind(y, tmp)
  rm(list=objects)
}
I apologize for any silly mistake (or question) and greatly appreciate any help.
Here’s a possible solution using a function to rename the object you’re loading in. I got loadRData from here. The loadRData function makes this a bit more approachable because you can load in the object with a different name.
Create some data for a reproducible example.
tab2000_agg <- list(
  A = 1:5,
  b = 6:10
)
tab2001_agg <- list(
  A = 1:5,
  d = 6:10
)
save(tab2000_agg, file = "2000_agg.rda")
save(tab2001_agg, file = "2001_agg.rda")
rm(tab2000_agg, tab2001_agg)
Using your loop idea.
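# load() the file inside the function, then return the loaded object
# (the only name in the local environment besides the argument)
loadRData <- function(fileName){
  load(fileName)
  get(ls()[ls() != "fileName"])
}
y <- list()
for(i in 2000:2001){
  objects <- paste0(i, "_agg.rda")
  data_list <- loadRData(objects)
  tmp <- data_list[["A"]]
  # index by name: y[[i]] with i = 2000 would pad the list with 1999 NULL slots
  y[[as.character(i)]] <- tmp
  rm(data_list)
}
y <- do.call(rbind, y)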
You could also turn it into a function rather than use a loop.
# named get_A rather than getElement, which would mask base::getElement()
get_A <- function(year){
  objects <- paste0(year, "_agg.rda")
  data_list <- loadRData(objects)
  data_list[["A"]]
}
y <- lapply(2000:2001, get_A)
y <- do.call(rbind, y)
Created on 2022-01-14 by the reprex package (v2.0.1)
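A variation on loadRData, sketched here as an assumption rather than the answerer's code: load() returns the names of the objects it restored, so you can load into a fresh environment and pull out element A without relying on ls(). extract_A() and data_dir are hypothetical names, and the example files are written to a temporary folder.

```r
# Example files mirroring the reprex above, written to a temp folder
data_dir <- file.path(tempdir(), "agg_demo")
dir.create(data_dir, showWarnings = FALSE)
tab2000_agg <- list(A = 1:5, b = 6:10)
tab2001_agg <- list(A = 1:5, d = 6:10)
save(tab2000_agg, file = file.path(data_dir, "2000_agg.rda"))
save(tab2001_agg, file = file.path(data_dir, "2001_agg.rda"))
rm(tab2000_agg, tab2001_agg)

# Load one year's file into a throwaway environment and keep only element "A";
# the environment (and the large list in it) can be garbage-collected afterwards
extract_A <- function(year, data_dir) {
  env <- new.env()
  loaded <- load(file.path(data_dir, paste0(year, "_agg.rda")), envir = env)
  env[[loaded[1]]][["A"]]
}

y <- do.call(rbind, lapply(2000:2001, extract_A, data_dir = data_dir))
```

Because each large list only lives inside extract_A(), no explicit rm() is needed between iterations.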

Saving CSV files inside a for-loop [duplicate]

This question already has answers here:
Writing multiple data frames into .csv files using R
(4 answers)
Closed 3 years ago.
I'm trying to use a for loop with a function that takes each data file read by the loop, modifies it, and saves the result as a CSV file. I can't figure out how to apply the loop so that each result is saved as a separate CSV file.
Below are examples of my data frames:
df1 <- data.frame("age" = c(1.5, 5.5, 10), "group" = rep("A", 3))
df2 <- data.frame("age" = c(1, 5.5, 9, 15), "group" = rep("B", 4))
dffiles <- list(df1, df2)
Right now my code looks like this:
addone <- function(id,df){
  new <- df[,1]+1
  df$index <- id
  name <- paste0("df",i)
  assign(name, df)
  write.csv(new, paste0("added", id, ".csv"))
}
for (i in 1:2){
  dffile <- dffiles[[i]]
  addone(i, dffile)
}
So the for loop reads each original file, which should get a new index column and be saved as a CSV file. There seems to be a problem with my loop, though: I get the correct file when I run the lines of the function individually, but the output from the loop doesn't include the index column. Can anyone point out which part of the loop I messed up?
You need to create a new file name and then use the newly created name in the write.csv function.
Something like this should work:
#function takes an ID and creates the file name
func <- function(id, df) {
  new <- df[,1]+1
  write.csv(new, paste0("new", id, ".csv"))
}
#the call to the function passes the identifier.
dffiles <- list(df1, df2)
for (i in 1:2){
  dffile <- dffiles[[i]]
  func(i, dffile)
}
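One thing neither version changes: write.csv() is given new, which is only the incremented first column, so the index column can never appear in the output (and assign() inside the function only creates a local copy). A minimal sketch that adds the index and writes the whole data frame, assuming the intent is to increment age in place; out_dir is a hypothetical parameter defaulting to tempdir() so the example does not touch the working directory.

```r
df1 <- data.frame(age = c(1.5, 5.5, 10), group = rep("A", 3))
df2 <- data.frame(age = c(1, 5.5, 9, 15), group = rep("B", 4))
dffiles <- list(df1, df2)

# Add one to the first column, tag every row with the id,
# and write the whole data frame (not just the modified column)
addone <- function(id, df, out_dir = tempdir()) {
  df[, 1] <- df[, 1] + 1
  df$index <- id
  out_file <- file.path(out_dir, paste0("added", id, ".csv"))
  write.csv(df, out_file, row.names = FALSE)
  invisible(df)
}

for (i in seq_along(dffiles)) {
  addone(i, dffiles[[i]])
}
```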

use name of dataframe on a list of dataframes

I am trying to solve a problem from a question I posted previously (looping inside list in r).
Is there a way to get the name of a dataframe that is on a list of dataframes?
I have put a series of data frames into a list, and I want to apply myfunction to each of them. But I do not know how to get the name of each data frame in order to pass it as nameofprocesseddf to myfunction.
Here is how I build the list of my data frames, and the code I have so far. Any suggestions on how I can make this work?
library(missForest)
library(dplyr)
library(plyr) # for llply()
myfunction <- function (originaldf, proceseddf, nonproceseddf, nameofprocesseddf=character){
  NRMSE <- nrmse(proceseddf, nonproceseddf, originaldf)
  comment(nameofprocesseddf) <- nameofprocesseddf
  results <- as.data.frame(list(comment(nameofprocesseddf), NRMSE))
  names(results) <- c("Dataset", "NRMSE")
  return(results)
}
a <- data.frame(value = rnorm(100), cat = c(rep(1,50), rep(2,50)))
da1 <- data.frame(value = rnorm(100,4), cat2 = c(rep(2,50), rep(3,50)))
dataframes <- dir(pattern = "\\.txt$")
list_dataframes <- llply(dataframes, read.table, header = T, dec=".", sep=",")
n <- length(dataframes)
# Here is where I do not know how to get the name of the `i` dataframe
for (i in 1:n){
  modified_list <- llply(list_dataframes, myfunction, originaldf = a, nonproceseddf = da1, proceseddf = list_dataframes[i], nameofprocesseddf = names(list_dataframes[i]))
  write.table(file = sprintf("myfile/%s_NRMSE20%02d.txt", dataframes[i]), modified_list[[i]], row.names = F, sep=",")
}
As a matter of fact, the name of a data frame is not an attribute of the data frame; it is just the expression used to refer to the object. Hence the "name" of the data frame in your loop is indeed 'list_dataframes[i]'.
Since I assume you want to name each data frame after its text file without the extension, I propose something like the following (it requires the stringr package):
nameofprocesseddf = substr(dataframes[i],start = 1,stop = str_length(dataframes[i])-4)
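An alternative sketch, assuming base R's tools::file_path_sans_ext() instead of stringr: name the list elements after their files when you read them, then loop over names(list_dataframes) so each name is available inside the loop. The file names are hypothetical, and nrow() is a placeholder for the nrmse() call from the question; in the real code you would pass nm as nameofprocesseddf.

```r
# Hypothetical example files standing in for the .txt data sets
data_dir <- file.path(tempdir(), "txt_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(value = 1:3), file.path(data_dir, "imputed_a.txt"), row.names = FALSE)
write.csv(data.frame(value = 4:6), file.path(data_dir, "imputed_b.txt"), row.names = FALSE)

files <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)

# Read every file and name each list element after its file, minus the extension
list_dataframes <- lapply(files, read.csv)
names(list_dataframes) <- tools::file_path_sans_ext(basename(files))

# Iterate over the names so each data frame's name is available inside the loop
summaries <- lapply(names(list_dataframes), function(nm) {
  data.frame(Dataset = nm, n_rows = nrow(list_dataframes[[nm]]))
})
result <- do.call(rbind, summaries)
```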

How to address several objects named in the same structure in R?

I'd like to know how to include every object that fulfils certain naming requirements in my arguments in R. Let's say the objects are all named like this:
var01 var02 var03 var04 varnn
What I would do in Stata for instance would be simply this
tab1 var*
and it would tabulate every variable whose name starts with "var".
In an earlier version of this post I was quite vague about what I actually want to do in my R project, so here goes. I've got a for loop that iterates over 650 instances, with the goal of appending 6 datasets for each of these instances. However, for some instances (I don't know which), not all 6 datasets exist, which is why an rbind call written like this fails:
rbind(data01, data02, data03, data04, data05, data06)
I'd therefore like to run something like this
rbind(data*)
So as to account for missing datasets.
Sorry for the confusion, I wasn't being clear enough when I originally wrote the question.
Just for reference, here is the whole loop:
for(i in 1:650){
  try(part1 <- read.csv(file = paste0("Twitter Scrapes/searchTwitter/09July/",MP.ID[i],".csv")))
  try(part2 <- read.csv(file = paste0("Twitter Scrapes/userTimeline/08July/",MP.ID[i],".csv")))
  try(part3 <- read.csv(file = paste0("Twitter Scrapes/userTimeline/16July/",MP.ID[i],".csv")))
  try(part4 <- read.csv(file = paste0("Twitter Scrapes/searchTwitter/17July/",MP.ID[i],".csv")))
  try(part5 <- read.csv(file = paste0("Twitter Scrapes/userTimeline/24July/",MP.ID[i],".csv")))
  try(part6 <- read.csv(file = paste0("Twitter Scrapes/searchTwitter/24July/",MP.ID[i],".csv")))
  allParts <- ls(pattern = "^part*")
  allNames <- paste(allParts, collapse = ", ") # this is just what I tried just now, didn't work though
  combined.df <- rbind(ALL THE DATASETS WITH PART))
}
Data
var01 <- sample(2, 10, TRUE)
var02 <- sample(2, 10, TRUE)
var03 <- sample(2, 10, TRUE)
vvv01 <- sample(2, 10, TRUE) # variable which should not be tabulated
Code
allV <- ls(pattern = "^var.*") # vector of all variables starting with 'var'
lapply(allV, function(.) table(get(.)))
Explanation
With ls you get all variables which are named according to the pattern you provide. Then, you loop over all these variables, retrieve the variable by its name and tabulate it.
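A closely related sketch: mget() fetches all matching objects at once as a named list, so the results keep the variable names. The stricter pattern "^var\\d+$" is an assumption that the suffix is always digits.

```r
set.seed(1)
var01 <- sample(2, 10, TRUE)
var02 <- sample(2, 10, TRUE)
var03 <- sample(2, 10, TRUE)
vvv01 <- sample(2, 10, TRUE) # should not be tabulated

# mget() returns a list named after the matched variables,
# so each table can be traced back to its variable
tabs <- lapply(mget(ls(pattern = "^var\\d+$")), table)
```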
Update
With your recent changes what I would do is the following:
allV <- lapply(ls(pattern = "^part.*"), get) #stores all part variables in a list
combined.df <- do.call(rbind, allV) # rbinds all of them
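One caveat with ls(pattern = "^part"): when read.csv() fails inside try(), the assignment never happens, so a partN left over from the previous iteration is silently picked up again. A minimal sketch that avoids loose variables altogether, assuming a missing dataset simply means a missing file: build the paths, keep only those that exist, and rbind whatever was read. combine_for_id(), the folder names, and the ID "mp1" are hypothetical stand-ins for the six Twitter Scrapes folders and MP.ID[i].

```r
# Hypothetical folder layout standing in for the Twitter Scrapes folders
base_dir <- file.path(tempdir(), "scrapes")
folders <- file.path(base_dir, c("search_09July", "timeline_08July", "timeline_16July"))
for (d in folders) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# Only two of the three folders contain a file for this ID
write.csv(data.frame(id = "mp1", n = 1), file.path(folders[1], "mp1.csv"), row.names = FALSE)
write.csv(data.frame(id = "mp1", n = 2), file.path(folders[3], "mp1.csv"), row.names = FALSE)

# Read whichever files exist for one ID and rbind them
combine_for_id <- function(id, folders) {
  paths <- file.path(folders, paste0(id, ".csv"))
  paths <- paths[file.exists(paths)] # skip missing data sets
  parts <- lapply(paths, read.csv)
  do.call(rbind, parts)
}

combined.df <- combine_for_id("mp1", folders)
```

Inside the original loop this would become something like combine_for_id(MP.ID[i], folders), with folders holding the six real directories.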

Efficiency in assigning programmatically in R

In summary, I have a script for importing lots of data stored in several txt files. Within a single file, not all rows belong in the same table (a DF, now switching to a DT), so for each file I select all the rows belonging to the same DF, get that DF and append the rows to it.
The first time I create a DF named, say, table1, I do:
name <- "table1" # in my code the value of name will depend on different factors
# and **not** known in advance
assign(name, someRows)
Then, during execution, my code may find (in other files) other lines that belong in the table1 data frame, so:
name <- "table1"
assign(name, rbind.fill(get(name), someRows))
My question is: is combining assign() and get() like this the best way to do assignment programmatically? Thanks
EDIT:
Here is a simplified version of my code (each item in dataSource is the result of read.table(), i.e. one single text file):
set.seed(1)
#
dataSource <- list(data.frame(fileType = rep(letters[1:2], each=4),
                              id = rep(LETTERS[1:4], each=2),
                              var1 = as.integer(rnorm(8))),
                   data.frame(fileType = rep(letters[1:2], each=4),
                              id = rep(LETTERS[1:4], each=2),
                              var1 = as.integer(rnorm(8))))
# # #
#
library(plyr)
#
tablesnames <- unique(unlist(lapply(dataSource,function(x) as.character(unique(x[,1])))))
for(l in tablesnames){
  temp <- lapply(dataSource, function(x) x[x[,1]==l, -1])
  if(exists(l)) assign(l, rbind.fill(get(l), rbind.fill(temp))) else assign(l, rbind.fill(temp))
}
#
#
# now two data frames a and b are created
#
#
# different method using rbindlist in place of rbind.fill (faster and, so far,
# I don't have missing columns to fill)
#
rm(a,b)
library(data.table)
#
tablesnames <- unique(unlist(lapply(dataSource,function(x) as.character(unique(x[,1])))))
for(l in tablesnames){
  temp <- lapply(dataSource, function(x) x[x[,1]==l, -1])
  if(exists(l)) assign(l, rbindlist(list(get(l), rbindlist(temp)))) else assign(l, rbindlist(temp))
}
I would recommend using a named list and skipping assign and get. Many of the convenient R features (lapply, for example) work very well on lists but do not work with variables created via assign and get. In addition, you can easily pass a list into a function, whereas passing a group of variables handled with assign and get is cumbersome.
If you want to read a set of files into one big data.frame, I'd use something like this (assuming CSV-like text files):
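Applied to the dataSource example from the question, a minimal sketch of the named-list approach, assuming every file has the same columns (otherwise data.table::rbindlist(..., fill = TRUE) or plyr::rbind.fill would be needed for the stacking step): stack the files once and split() the rows by fileType, so tables$a and tables$b take the place of the a and b variables created with assign().

```r
set.seed(1)
dataSource <- list(
  data.frame(fileType = rep(letters[1:2], each = 4),
             id = rep(LETTERS[1:4], each = 2),
             var1 = as.integer(rnorm(8))),
  data.frame(fileType = rep(letters[1:2], each = 4),
             id = rep(LETTERS[1:4], each = 2),
             var1 = as.integer(rnorm(8)))
)

# Stack all files once, then split the rows (minus fileType) by table name;
# the result is a named list with one data frame per table
all_rows <- do.call(rbind, dataSource)
tables <- split(all_rows[, -1], all_rows$fileType)
```

The list can then be passed to lapply() or a function as a whole, which is the main advantage the answer describes.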
library(plyr)
list_of_files = list.files(pattern = "\\.csv$")
big_dataframe = ldply(list_of_files, read.csv)
or if you want to keep the result in a list:
big_list = lapply(list_of_files, read.csv)
and possibly use rbind.fill:
big_dataframe = do.call("rbind.fill", big_list)
