Binding data frames stored in a list in R

I have several data frames in my R workspace, along with several other objects.
What they have in common is that they are all named following the pattern "Station_Year.df".
I want to combine all these data frames into one.
I tried:
df_list <- ls(pattern=".df")
dataset <- rbind(df_list)
But I get a data frame with the names of the data frames...

You should use mget to retrieve the data frames named in df_list, because ls only returns their names as a character vector. So you can do:
dataset <- do.call(rbind, mget(df_list))
Note that this requires all the data frames to have the same columns (same number and same names). You may also find the merge function useful.
Thanks alexis_laz, I forgot the do.call.
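As a minimal sketch of the approach above: the two data frames and their column names below are invented for illustration and only follow the "Station_Year.df" naming scheme from the question. The pattern is also tightened to "\\.df$", since an unescaped "." in a regular expression matches any character.

```r
# Hypothetical data frames following the "Station_Year.df" naming scheme
A_2001.df <- data.frame(station = "A", year = 2001, value = c(1.2, 3.4))
B_2002.df <- data.frame(station = "B", year = 2002, value = c(5.6, 7.8))

# "\\.df$" matches names that end in a literal ".df"
df_names <- ls(pattern = "\\.df$")

# mget() turns the character vector of names into a named list of objects
dataset <- do.call(rbind, mget(df_names))
```

The result has one row per input row; the row names are prefixed with the name of the source data frame, which can help trace where each row came from.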

Related

using rbind to combine all data sets the names of all data set start with common characters

I want to combine all rows of different data sets. The names of all the data sets start with test, and all data sets have the same number of observations. I know I can combine them using rbind(), but typing the name of every data set takes a lot of time. Is there a better approach?
rbind(test1,test2,test3,test4)
Try first obtaining a list of all matching objects, using ls() with the pattern ^test to find their names:
library(data.table)  # provides rbindlist()
dfs <- lapply(ls(pattern="^test"), function(x) get(x))
result <- rbindlist(dfs)
I am taking the suggestion by @Rohit to use rbindlist, which makes it easier to row-bind a list of data frames.
The second line of the code above works only if the data sets are data tables or data frames. If the data sets are in xts/zoo format, a slight change is needed: use do.call() with rbind() instead.
## First make a list of all your data sets as suggested above
list_xts <- lapply(ls(pattern="^test"), function(x) get(x))
## then use do call and rbind()
xts_results<-do.call(rbind,list_xts)

Name list of data frames from data frame

I usually read a bunch of .csv files into a list of data frames and name their columns manually, like this:
#...code for creating the list named "datos" with files from library
# Naming the columns of the data frames
names(datos$v1r1)<-c("estado","tiempo","x1","x2","y1","y2")
names(datos$v1r2)<-c(...)
names(datos$v1r3)<-c(...)
I want to do this renaming operation automatically. To do so, I created a data frame with the names I want for each of the data frames in my datos list.
Here is how I generate this data frame:
pru<-rbind(c("UT","TR","UT+","TR+"),
c("UT","TR","UT+","TR+"),
c("TR","UT","TR+","UT+"),
c("TR","UT","TR+","UT+"))
vec<-paste("v1r",seq(1,20,1),sep="")
tor<-paste("v1s",seq(1,20,1),sep="")
nombres<-do.call("rbind", replicate(10, pru, simplify = FALSE))
nombres_df<-data.frame(corrida=c(vec,tor),nombres)
Because nombres_df$corrida[1] is v1r1, I have to name the datos$v1r1 columns ("estado","tiempo", nombres_df[1,2:5]), and so on for the other 40 elements.
I want to do this renaming automatically. I was thinking I could use something that uses regular expressions.
Just for the record, I don't know why but the order of the list of data frames is not the same as the 1:20 sequence (by this I mean 10 comes before 2,3,4...)
Here's a toy example of a list with a similar structure but fewer and shorter data frames.
toy<-list(a=replicate(6,1:5),b=replicate(6,10:14))
You have a data frame where the variable corrida is the name of the data frame to be renamed and the remaining columns are the desired variable names for that data frame. You could use a loop to do all the renaming operations:
for (i in seq_len(nrow(nombres_df))) {
  # as.character() guards against factor columns; unlist() flattens the one-row data frame of names
  names(datos[[as.character(nombres_df$corrida[i])]]) <- c("estado","tiempo",as.character(unlist(nombres_df[i,-1])))
}
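The loop can be tried on the toy list from the question. This is only a sketch: the matrices are converted to data frames first, and the lookup table (called lookup here, with columns n1 to n4) is a made-up stand-in for nombres_df.

```r
# Toy list from the question, converted from matrices to data frames
toy <- list(a = as.data.frame(replicate(6, 1:5)),
            b = as.data.frame(replicate(6, 10:14)))

# Hypothetical lookup table: one row per list element,
# remaining columns hold the desired column names
lookup <- data.frame(corrida = c("a", "b"),
                     n1 = c("UT", "TR"), n2 = c("TR", "UT"),
                     n3 = c("UT+", "TR+"), n4 = c("TR+", "UT+"),
                     stringsAsFactors = FALSE)

for (i in seq_len(nrow(lookup))) {
  key <- lookup$corrida[i]
  # Elements are looked up by name, not by position
  names(toy[[key]]) <- c("estado", "tiempo",
                         unlist(lookup[i, -1], use.names = FALSE))
}

names(toy$b)
```

Because each element is selected by name, the order of the list does not matter, so a list sorted alphabetically (where "v1r10" comes before "v1r2") is renamed correctly.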

subset multiple data tables using lapply

I have multiple data tables and all have a common column called ID. I have a vector vec that contains a set of ID values.
I would like to use lapply to subset all data tables using vec
I understand how to use lapply to subset the data tables but my question is how to assign the subsetted results back to original data tables
Here is what I tried :
tables<-c("dt1","dt2","dt3","dt4")
lapply(mget(tables),function(x)x[ID %in% vec,])
The above gives subsets of all data tables but how do I assign them back to dt1,dt2,dt3,dt4 ?
I would keep the datasets in the list rather than updating the dataset objects in the global environment as most of the operations can be done within the list (including reading the files and writing to output files ). But, if you insist, we can use list2env which will update the original dataset objects with the subset dataset
lst <- lapply(mget(tables),function(x)x[ID %in% vec,])
list2env(lst, envir=.GlobalEnv)
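You could also just keep the datasets in a named list (mget already names the elements after the objects, so setNames here only makes the naming explicit):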
tables <- c("dt1","dt2","dt3","dt4")
dflist <- lapply(mget(tables),function(x)x[ID %in% vec,])
dflist <- setNames(dflist, tables)
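A small self-contained sketch of the list2env approach follows. The data are invented, and plain data frames are used instead of data tables, so the subsetting is written as x[x$ID %in% vec, ]. The call uses environment() so that it writes to whichever environment the code runs in; at top level that is .GlobalEnv, as in the answer.

```r
# Hypothetical data frames sharing an ID column
dt1 <- data.frame(ID = 1:5, x = letters[1:5])
dt2 <- data.frame(ID = 3:7, y = LETTERS[3:7])
vec <- c(3, 4)

tables <- c("dt1", "dt2")

# mget() returns a named list, and lapply() keeps those names
lst <- lapply(mget(tables), function(x) x[x$ID %in% vec, ])

# list2env() assigns each list element to an object of the same name,
# overwriting dt1 and dt2 with their subsets
list2env(lst, envir = environment())
```

After this runs, dt1 and dt2 contain only the rows whose ID is in vec.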

Binding data frames created by loops in R

I have a number of R scripts that create data frames of the same length and I am trying to aggregate all the data frames into one.
I used a for loop to run those R scripts:
for(i in sample){
source(i)
}
This does create all the data frames I need. But is there a good way to include a function that binds those data frames together within that for loop?
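source(i) returns a list whose value element holds the result of the last expression evaluated in the script. Assuming each script ends with an expression that evaluates to its data frame, you can combine all the data frames with something like:
do.call(rbind, lapply(sample, function(f) source(f)$value))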
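To see how this behaves, here is a sketch that writes two throwaway scripts to temporary files and then binds their results. The script contents and the names scripts, pieces and combined are made up for the example; it assumes each script ends with an expression that evaluates to a data frame.

```r
# Two hypothetical scripts, each ending with a data frame expression
scripts <- c(tempfile(fileext = ".R"), tempfile(fileext = ".R"))
writeLines("data.frame(id = 1:2, src = 'first')", scripts[1])
writeLines("data.frame(id = 3:4, src = 'second')", scripts[2])

# source() returns list(value, visible); $value is the result of
# the last expression evaluated in the script
pieces <- lapply(scripts, function(f) source(f, local = TRUE)$value)

# Row-bind the collected data frames
combined <- do.call(rbind, pieces)
```

If a script instead assigns its data frame to a variable and ends with some other expression, $value will not be the data frame, and the object would have to be fetched by name after sourcing.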

merging list of multiple data frames

I'm trying to merge all the data frames in my current environment into one data frame. Initially I tried
Reduce(function(x,y) merge(x,y,by="Date"),list(ls()))
But this didn't work; it just returned a list of the data frame names.
I know it will work if I do
Reduce(function(x,y) merge(x,y,by="Date"),list(df1,df2,df3....))
But why doesn't the initial attempt work?
Both
typeof(list(ls()))
typeof(list(df1,df2,df3))
Return type "list"
What can I do if there are so many data frames I can't input them all into the Reduce function?
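ls() returns only the names of the objects as a character vector, so list(ls()) is a one-element list holding that vector rather than the data frames themselves. Fetch the objects first and keep only the data frames:
lst = Filter(is.data.frame, mget(ls(envir=globalenv()), envir=globalenv()))
Reduce(function(x,y) merge(x,y,by="Date"),lst)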
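The difference between a list of names and a list of objects can be checked on a small example. The three data frames below are invented for illustration; only the Date key column comes from the question.

```r
# Hypothetical data frames sharing a Date column
df1 <- data.frame(Date = as.Date("2020-01-01") + 0:2, a = 1:3)
df2 <- data.frame(Date = as.Date("2020-01-01") + 1:3, b = 4:6)
df3 <- data.frame(Date = as.Date("2020-01-01") + 0:3, c = 7:10)

df_names <- c("df1", "df2", "df3")

# A list wrapping the names has a single element: the character vector
n_wrapped <- length(list(df_names))

# mget() fetches the objects, giving one list element per data frame
lst <- mget(df_names)

# Merge pairwise from left to right on the shared key
merged <- Reduce(function(x, y) merge(x, y, by = "Date"), lst)
```

With the default all = FALSE, merge keeps only the dates present in every data frame, so merged has two rows here; passing all = TRUE to merge would keep unmatched dates and fill the gaps with NA.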
