Storing multiple data frames into one data structure - R

Is it possible to store multiple data frames in one data structure and then process each data frame later? For example:
df1 <- data.frame(c(1,2,3), c(4,5,6))
df2 <- data.frame(c(11,22,33), c(44,55,66))
Then I would like to add them to a data structure so that I can loop through it, retrieving one data frame at a time and processing it, something like:
for ( iterate through the data structure) # this gives df1, then df2
{
write data frame to a file
}
I cannot find any such data structure in R. Can anyone point me to any code that illustrates the same functionality?

Just put the data.frames in a list. A plus is that a list works really well with apply-style loops. For example, if you want to save the data.frames, you can use mapply:
l = list(df1, df2)
mapply(write.table, x = l, file = c("df1.txt", "df2.txt"))
If you like apply-style loops (and you will, trust me :)), please take a look at the epic plyr package. It might not be the fastest package (look at data.table for speed), but it drips with syntactic sugar.
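The loop sketched in the question maps directly onto a named list. This is a minimal sketch, not the answerer's own code: the column names, the `dfs` and `out_dir` variables, and the choice of a temporary directory are all assumptions made for illustration.

```r
df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
df2 <- data.frame(a = c(11, 22, 33), b = c(44, 55, 66))

# A named list keeps each data frame together with a label
dfs <- list(df1 = df1, df2 = df2)

# Assumption: write into a temporary directory so no existing file is touched
out_dir <- tempdir()

for (nm in names(dfs)) {
  # dfs[[nm]] is df1 on the first pass and df2 on the second
  out_file <- file.path(out_dir, paste0(nm, ".txt"))
  write.table(dfs[[nm]], file = out_file, row.names = FALSE)
}
```

Looping over the names rather than the elements keeps the label available for building the file name.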

Lists can be used to hold almost anything, including data.frames:
## Versatility of lists
l <- list(file(), new.env(), data.frame(a=1:4))
For writing out multiple data objects stored in a list, lapply() is your friend:
ll <- list(df1=df1, df2=df2)
## Write out as *.csv files
lapply(names(ll), function(X) write.csv(ll[[X]], file=paste0(X, ".csv")))
## Save in *.Rdata files
lapply(names(ll), function(X) {
assign(X, ll[[X]])
save(list=X, file=paste0(X, ".Rdata"))
})

What you are looking for is a list.
You can use a function like lapply to treat each of your data frames separately in the same manner. However, there might be cases where you need to pass your list of data frames to a function that handles the data frames in relation to each other. In this case lapply doesn't help you.
That's why it is important to note how you can access and iterate the data frames in your list. It's done like this:
mylist[[data frame]][row,column]
Note the double brackets around your data frame index.
So for your example it would be
df1 <- data.frame(c(1,2,3), c(4,5,6))
df2 <- data.frame(c(11,22,33), c(44,55,66))
mylist<-list(df1,df2)
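mylist[[1]][1,2] would return 4, whereas mylist[1][1,2] fails with an "incorrect number of dimensions" error, because mylist[1] is a list of length one rather than a data frame. It took a while for me to find this, so I thought it might be helpful to post here.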
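The difference between the two bracket forms can be checked directly. The sketch below reuses the data frames from the example; the names `first_value`, `single_bracket` and `row_counts` are introduced only for illustration.

```r
df1 <- data.frame(c(1, 2, 3), c(4, 5, 6))
df2 <- data.frame(c(11, 22, 33), c(44, 55, 66))
mylist <- list(df1, df2)

# Double brackets extract the element itself: a data frame
class(mylist[[1]])
# Single brackets return a sub-list that still wraps the data frame
class(mylist[1])

# Row 1, column 2 of the first data frame
first_value <- mylist[[1]][1, 2]

# tryCatch() records either the result or the error message of the
# single-bracket form, so the script keeps running in both cases
single_bracket <- tryCatch(mylist[1][1, 2],
                           error = function(e) conditionMessage(e))

# Visiting the data frames by position, e.g. to relate them to each other
row_counts <- integer(length(mylist))
for (i in seq_along(mylist)) {
  row_counts[i] <- nrow(mylist[[i]])
}
```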

Related

R. Apply same function to similar data frames. Generate new data frames with same name pattern

I have seen similar posts, but none that addresses this question specifically. I have 22 data frames called chr1,chr2,chr3,...,chr22. I want to apply a home-made function "diff_set" to all these data frames and generate 22 new data frames with names chr1.1,chr2.1,chr3.1,...,chr22.1. The function is applied to one of the columns. For instance, for chr1, I apply diff_set and generate chr1.1:
chr1.1 = diff_set(chr1$POSITION, 200000)
Any suggestion is welcome !
Simply lapply your function, diff_set, over a list of the data frames and rename the output list. If really needed (though it is better avoided), run list2env to save the individual data frames as separate objects:
output_list <- lapply(mget(paste0("chr", seq(1,22))),
function(df) diff_set(df$POSITION, 200000))
output_list <- setNames(output_list, paste0("chr", seq(1,22), ".1"))
# FIRST THREE DFS
output_list$chr1.1
output_list$chr2.1
output_list$chr3.1
# OUTPUT EACH DF AS SEPARATE OBJECT
# (BUT CONSIDER AVOIDING THIS AS YOU FLOOD GLOBAL ENVIRONMENT)
list2env(output_list, envir=.GlobalEnv)
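To see the pattern run end to end, here is a small sketch on toy data. The `diff_set` defined here is a hypothetical stand-in (the asker's real function is not shown in the post), and the three toy data frames replace the 22 real ones.

```r
# Hypothetical stand-in for the asker's diff_set(): keeps the first position
# and every position at least `gap` away from the one directly before it
diff_set <- function(pos, gap) {
  keep <- c(TRUE, diff(pos) >= gap)
  data.frame(POSITION = pos[keep])
}

# Toy input: three small data frames held in a list from the start
chr_list <- list(
  chr1 = data.frame(POSITION = c(100, 150000, 400000)),
  chr2 = data.frame(POSITION = c(5, 300000, 350000)),
  chr3 = data.frame(POSITION = c(10, 20, 900000))
)

# Apply the function to the POSITION column of every data frame
output_list <- lapply(chr_list, function(df) diff_set(df$POSITION, 200000))

# Derive the new names from the old ones
names(output_list) <- paste0(names(chr_list), ".1")
```

Keeping the results in `output_list` means later steps can again use lapply instead of referring to 22 separate objects.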

Easy way to mention all the objects inside a function without explicitly writing all of them

Suppose I have created 100 data frames by the name of v1, v2, v3, v4, v5,...,v99,v100.
All these data frames are of the same size, I mean they all have the same set of columns.
I now want to export all these data frames appended in a single csv file using rbind within write.csv.
So I am using the function
write.csv(rbind(v1, v2, v3,v4,v5), "myfilename.csv")
The above command does the job, but as you can see only 5 data frames are appended. I want to append all the data frames, i.e. from v1 to v100 (in sequential order), but writing all of their names individually would be a painful task. Is there an easy way to refer to all the objects without writing out every name? Thanks in advance.
If you must have separate objects, get their names and order them by number, then copy them into a list L and rbind them together:
nms <- ls(pattern = "^v\\d+$")
nms <- nms[order(as.numeric(sub("v", "", nms)))]
L <- mget(nms)
DF <- do.call("rbind", L)
However, as @MrFlick mentioned, it would have been better to create them in a list L in the first place, in which case only the last statement would be needed.
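Creating the data frames inside a list from the start could look roughly like this. `make_piece` is a hypothetical stand-in for whatever code produced v1 to v100, and the output path in a temporary directory is an assumption.

```r
# Hypothetical generator: returns one small data frame per index
make_piece <- function(i) data.frame(id = i, value = i * 10)

# Build all pieces directly inside a list, already in sequential order
L <- lapply(1:100, make_piece)

# A single call binds every element of the list
DF <- do.call(rbind, L)

# Assumption: write to a temporary directory for the demo
out_file <- file.path(tempdir(), "myfilename.csv")
write.csv(DF, out_file, row.names = FALSE)
```

Because the list is built in index order, no sorting of object names is needed before binding.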

How to loop over several lists to make dataframes in R?

I need a loop that converts several lists into data frames and then writes each data frame as a CSV. That is, I want to (i) loop over all my lists, unlisting each one and converting it into a data.frame, and (ii) write each result as a CSV.
I ran the following script, which works for one of my lists, but I need to do the same for many of them.
Script to convert a nested list (e.g., list1) into a data frame and write it as a CSV:
data <- as.data.frame(t(do.call(rbind,unlist(list1,recursive = FALSE))))
write.csv(data,"list1.csv")
Please note that "list1" is just one of my lists, used here as an example. I wrote a line (done <- ls(pattern="list")) to get a vector with the names of all my lists loaded in the R environment, so I should apply steps (i) and (ii) to every name in the "done" vector. Is that clearer now?
I would really appreciate it if you could help me create the loop.
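# done is a character vector of object names, so iterate over it directly
for(list_name in done){
  # get() retrieves the list whose name is stored in list_name
  data <- as.data.frame(t(do.call(rbind,unlist(get(list_name),recursive = FALSE))))
  write.csv(data,paste0(list_name,".csv"))
}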
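fun <- function(x){
  # get() looks up the list by name; unlist() on the bare string would not find it
  data <- as.data.frame(t(do.call(rbind,unlist(get(paste0("list",x)),recursive = FALSE))))
  write.csv(data,paste0("list",x,".csv"))
}
# call fun once per list number, where n is the number of lists
invisible(lapply(1:n, fun))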
I believe this is the most efficient way.
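An alternative that avoids repeated name lookups is to collect the lists once with mget and loop over the result. This is only a sketch: the two toy nested lists, the anchored pattern and the temporary output directory are assumptions, since the asker's real lists are not shown.

```r
# Toy nested lists standing in for the asker's list1, list2, ...
list1 <- list(list(a = 1:3, b = 4:6))
list2 <- list(list(a = 7:9, b = 10:12))

# Anchor the pattern so that only names like list1, list2, ... match
done <- ls(pattern = "^list[0-9]+$")

# Collect the objects themselves into one named list
all_lists <- mget(done)

# Assumption: write into a temporary directory for the demo
out_dir <- tempdir()

for (nm in names(all_lists)) {
  # Same conversion as in the question, applied to each list in turn
  flat <- unlist(all_lists[[nm]], recursive = FALSE)
  data <- as.data.frame(t(do.call(rbind, flat)))
  write.csv(data, file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}
```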

Quick Read and Merge with Data.Table's Fread and Rbindlist

I am looking for a way to quickly read and merge a bunch of data files using data.table's fread and rbindlist functions. I think if fread could take a vector of file names as an argument, it could be one elegant line like
mergeddata = rbindlist(fread(list.files("my/data/directory/")))
but since that doesn't seem to be an option, I've taken the more awkward approach of looping through the files to read them in and assign them to temporary names and then put together a list of the temporary data table names created. However I get tripped up whenever I am trying to call the list of data.table names. So my questions are (1) how can I pass a list of datatable names to rbindlist in this context, and (2) more broadly is there a better approach to this problem?
Thanks in advance for the time and help!
datafiles = list.files()
datatablelist = c()
for(i in 1:length(datafiles)){
assign(paste("dt",i,sep=""),fread(datafiles[i]))
datatablelist = append(datatablelist ,paste("dt",i,sep=""))
}
mergeddata = rbindlist(list(datatablelist))
Here is a simple way to bind multiple data frames into one single data frame using fread
# Load library
library(data.table)
# Get a List of all files named with a key word, say all `.csv` files
filenames <- list.files("C:/your/folder", pattern=glob2rx("*.csv"), full.names=TRUE)
# Load and bind all data sets
data <- rbindlist(lapply(filenames,fread))
And in case you want to bind all data files into a list of data frames, it's as simple as
# Load data sets
list.DFs <- lapply(filenames,fread)
You could do datatablelist = lapply(list.files("my/data/directory/", full.names = TRUE), fread) and then pass the resulting list of data tables to rbindlist.
Although lapply is cleaner than an explicit loop, your loop will work if you read the files directly into a list.
datatablelist = list()
for(i in 1:length(datafiles)){
datatablelist[[datafiles[i]]] = fread(datafiles[i])
}
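When the tables are merged it is often useful to remember which file each row came from. The sketch below assumes the data.table package is installed; the demo folder, file names and the `source_file` column name are invented for illustration.

```r
library(data.table)

# Assumption: two small CSV files created in a temporary folder for the demo
demo_dir <- file.path(tempdir(), "fread_demo")
dir.create(demo_dir, showWarnings = FALSE)
fwrite(data.table(x = 1:2, y = c("a", "b")), file.path(demo_dir, "part1.csv"))
fwrite(data.table(x = 3:4, y = c("c", "d")), file.path(demo_dir, "part2.csv"))

# full.names = TRUE returns paths that fread can open from any working directory
filenames <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file and name each list element after its file
tables <- lapply(filenames, fread)
names(tables) <- basename(filenames)

# idcol adds a column holding the list names, i.e. the source file of each row
merged <- rbindlist(tables, idcol = "source_file")
```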

Count the number of data frames beginning with prefix in R

I have a collection of data frames that I have generated in R. I need to count the number of data frames whose names begin with "entry_". I'd like to generate a number to then use for a function that rbinds all of these data frames and these data frames only.
So far, I have tried using grep to identify the data frames, however, this just returns where they are indexed in my object list (e.g., 16:19 --- objects 16-19 begin with "entry_"):
count_entry <- (grep("entry_", objects()))
Eventually I would like to rbind all of these data frames like so:
list.make <- function() {
sapply(paste('entry_', seq(1:25), sep=''), get, environment(), simplify = FALSE)
}
all.entries <- list.make()
final.data <- rbind.fill(all.entries)
I don't want to have to enter the sequence manually every time (for example (1:25) in the code above), which is why I'm hoping to be able to automatically count the data frames beginning with "entry_".
If anyone has any ideas of how to solve this, or how to go about this in a better way, I'm all ears!
Per comment by docendo: The ls function will list objects in an environment that match a regex pattern. You can then use mget to retrieve those objects as a list:
mylist <- mget(ls(pattern = "^entry_"))
That will then work with rbind.fill. You can then remove the original objects using something similar: rm(list = ls(pattern = "^entry_"))
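The count the asker wanted falls out of the same ls call. Below is a minimal sketch that works in a separate environment so the demo leaves the workspace alone; the toy objects are invented, and base rbind is used instead of rbind.fill on the assumption that all data frames share the same columns.

```r
# Separate environment so the demo does not touch the global workspace
env <- new.env()
assign("entry_1", data.frame(id = 1, score = 10), envir = env)
assign("entry_2", data.frame(id = 2, score = 20), envir = env)
assign("other", data.frame(id = 99, score = 0), envir = env)

# Names that start with the prefix, and how many there are
entry_names <- ls(envir = env, pattern = "^entry_")
n_entries <- length(entry_names)

# Fetch the objects and keep only those that really are data frames
entries <- Filter(is.data.frame, mget(entry_names, envir = env))

# Bind them row-wise (assumes identical columns)
final.data <- do.call(rbind, entries)

# Names held in a character vector go through rm()'s `list` argument
rm(list = entry_names, envir = env)
```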
