I have a number of R scripts that create data frames of the same length and I am trying to aggregate all the data frames into one.
I used a for loop to run those R scripts:
for(i in sample){
source(i)
}
This does create all the data frames I need. But is there a good way to include a function that binds those data frames together within that for loop?
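source(i) does not return the data frame directly: it returns a list whose value element holds the result of the last expression evaluated in the script. Assuming each script ends by evaluating its data frame, you can combine all the data frames with something like:
do.call(rbind, lapply(sample, function(f) source(f)$value))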
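A minimal, self-contained sketch of that idea follows. The three temporary scripts and the names script_dir, script_files, frames and combined are hypothetical stand-ins, and the sketch assumes every script ends with an expression that evaluates to a data frame with the same columns.

```r
# Hypothetical stand-ins for the real scripts: each file ends with a data frame.
script_dir <- tempfile("scripts")
dir.create(script_dir)
for (k in 1:3) {
  writeLines(
    sprintf("data.frame(id = %d, value = c(%d, %d))", k, k * 10L, k * 10L + 1L),
    file.path(script_dir, sprintf("script%d.R", k))
  )
}
script_files <- list.files(script_dir, pattern = "\\.R$", full.names = TRUE)

# source() returns list(value = , visible = ); keep only the value,
# which is the result of the last expression in each script.
frames <- lapply(script_files, function(f) source(f, local = TRUE)$value)

# Stack all data frames row-wise.
combined <- do.call(rbind, frames)
```

If a script instead assigns its data frame to a variable and ends with something else, $value will not be the data frame, and the script would need a final line that evaluates it.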
This is the code I am currently using to move data from multiple data frames into a time-ordered vector which I then perform analysis on and graph:
TotalLoans <- c(
sum(as.numeric(HCD2001$loans_all)), sum(as.numeric(HCD2002$loans_all)),
sum(as.numeric(HCD2003$loans_all)), sum(as.numeric(HCD2004$loans_all)),
sum(as.numeric(HCD2005$loans_all)), sum(as.numeric(HCD2006$loans_all)),
sum(as.numeric(HCD2007$loans_all)), sum(as.numeric(HCD2008$loans_all)),
sum(as.numeric(HCD2009$loans_all)), sum(as.numeric(HCD2010$loans_all)),
sum(as.numeric(HCD2011$loans_all)), sum(as.numeric(HCD2012$loans_all)),
sum(as.numeric(HCD2013$loans_all)), sum(as.numeric(HCD2014$loans_all)),
sum(as.numeric(HCD2015$loans_all)), sum(as.numeric(HCD2016$loans_all))
)
I do this four more times with similar data frames that also are similarly formatted as:
Varname$year
Is there a way to loop through these 16 data frames, select an individual column, perform a function on it, and put it into a vector? This is what I have tried so far:
AllList <- list(HCD2001, HCD2002, HCD2003, HCD2004, HCD2005, HCD2006, HCD2007, HCD2008, HCD2009, HCD2010, HCD2011, HCD2012, HCD2013, HCD2014, HCD2015, HCD2016)
TotalLoans <- lapply(AllList,
function(df){
sum(as.numeric(df$loans_all))
return(df)
}
)
However, it returns a Large List with every column from the data frames. All the other posts related to this were for modifying data frames, not creating a new vector with modified values of the data frames.
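The attempt above computes the sum and then discards it, because the function ends with return(df). A possible fix, sketched here with toy data, is to return the sum itself and collect the results with vapply, which yields a numeric vector rather than a list. The three small data frames and the name frame_names are hypothetical; with the real data the range would be 2001:2016.

```r
# Toy stand-ins for HCD2001 ... HCD2016 (hypothetical data).
for (yr in 2001:2003) {
  assign(paste0("HCD", yr),
         data.frame(loans_all = as.character(c(yr, 1, 2)),
                    stringsAsFactors = FALSE))
}

# Build the list from the object names instead of typing every name out.
frame_names <- paste0("HCD", 2001:2003)
AllList <- mget(frame_names)

# Return the sum itself; vapply() collects one number per data frame
# into a named numeric vector, in the order of frame_names.
TotalLoans <- vapply(AllList,
                     function(df) sum(as.numeric(df$loans_all)),
                     numeric(1))
```

Because the vector follows the order of frame_names, it stays time-ordered as long as the names are generated in year order.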
I usually read a bunch of .csv files into a list of data frames and name the columns manually, like this:
#...code for creating the list named "datos" with files from library
# Naming the columns of the data frames
names(datos$v1r1)<-c("estado","tiempo","x1","x2","y1","y2")
names(datos$v1r2)<-c(...)
names(datos$v1r3)<-c(...)
I want to do this renaming operation automatically. To do so, I created a data frame with the names I want for each of the data frames in my datos list.
Here is how I generate this data frame:
pru<-rbind(c("UT","TR","UT+","TR+"),
c("UT","TR","UT+","TR+"),
c("TR","UT","TR+","UT+"),
c("TR","UT","TR+","UT+"))
vec<-paste("v1r",seq(1,20,1),sep="")
tor<-paste("v1s",seq(1,20,1),sep="")
nombres<-do.call("rbind", replicate(10, pru, simplify = FALSE))
nombres_df<-data.frame(corrida=c(vec,tor),nombres)
Because nombres_df$corrida[1] is v1r1, I have to name the datos$v1r1 columns ("estado","tiempo", nombres_df[1,2:5]), and so on for the other 40 elements.
I want to do this renaming automatically. I was thinking I could use something that uses regular expressions.
Just for the record, I don't know why but the order of the list of data frames is not the same as the 1:20 sequence (by this I mean 10 comes before 2,3,4...)
Here's a toy example of a list with a similar structure but fewer and shorter data frames.
toy<-list(a=replicate(6,1:5),b=replicate(6,10:14))
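You have a data frame where the variable corrida holds the name of the data frame to be renamed and the remaining columns hold the desired variable names for that data frame. Because each element of datos is looked up by name, the order of the list does not matter. You could use a loop to do all the renaming operations:
for (i in seq_len(nrow(nombres_df))) {
# as.character() guards against corrida being stored as a factor
key <- as.character(nombres_df$corrida[i])
# unlist() turns the one-row data frame into a plain vector of names
names(datos[[key]]) <- c("estado", "tiempo", as.character(unlist(nombres_df[i, -1])))
}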
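The same lookup-by-name loop can be tried on the toy list from the question. This is only a sketch: the elements are converted to data frames here, and the small table lookup is a hypothetical stand-in for nombres_df, deliberately listed in a different order from the list.

```r
# Toy list of two data frames with six columns each (hypothetical data).
toy <- list(a = as.data.frame(replicate(6, 1:5)),
            b = as.data.frame(replicate(6, 10:14)))

# Lookup table: one row per data frame; the remaining columns are the wanted names.
lookup <- data.frame(corrida = c("b", "a"),
                     rbind(c("TR", "UT", "TR+", "UT+"),
                           c("UT", "TR", "UT+", "TR+")),
                     stringsAsFactors = FALSE)

for (i in seq_len(nrow(lookup))) {
  key <- lookup$corrida[i]
  # Match by name, so the order of the list and of the table may differ.
  names(toy[[key]]) <- c("estado", "tiempo",
                         unlist(lookup[i, -1], use.names = FALSE))
}
```

No regular expression is needed, since the names in the lookup table are matched exactly against the names of the list.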
I'm trying to merge all the data frames in my current environment into one data frame. Initially I tried:
Reduce(function(x,y) merge(x,y,by="Date"),list(ls()))
But this didn't work; it just returned the names of the data frames.
I know it will work if I do
Reduce(function(x,y) merge(x,y,by="Date"),list(df1,df2,df3....))
But why doesn't the initial attempt work?
Both
typeof(list(ls()))
typeof(list(df1,df2,df3))
Return type "list"
What can I do if there are so many data frames I can't input them all into the Reduce function?
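ls() returns a character vector of object names, so list(ls()) is a list with a single element (that vector), not a list of the data frames themselves. Reduce has nothing to merge and simply returns the names. Fetch the objects with mget and keep only the data frames:
lst <- Filter(is.data.frame, mget(ls(envir = globalenv()), envir = globalenv()))
Reduce(function(x,y) merge(x,y,by="Date"),lst)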
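As a small worked sketch of the difference between names and objects, the following uses three hypothetical data frames that share a Date column. The pattern "^df[0-9]+$" is an assumption about how the frames are named, used here to avoid picking up unrelated objects.

```r
# Three hypothetical data frames sharing a Date column.
df1 <- data.frame(Date = as.Date("2020-01-01") + 0:2, a = 1:3)
df2 <- data.frame(Date = as.Date("2020-01-01") + 0:2, b = 4:6)
df3 <- data.frame(Date = as.Date("2020-01-01") + 0:2, c = 7:9)

# ls() gives names only; mget() looks the names up and returns the objects.
frame_names <- ls(pattern = "^df[0-9]+$")
lst <- Filter(is.data.frame, mget(frame_names))

# Merge pairwise on Date: ((df1 merge df2) merge df3).
merged <- Reduce(function(x, y) merge(x, y, by = "Date"), lst)
```

merge keeps only the dates present in every data frame by default; passing all = TRUE inside the merge call would keep unmatched dates as well.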
I have several data frames stored in R memory among several other objects.
Their particularity is that they are all named as "Station_Year.df".
I want to merge all these data frames into one.
I tried:
df_list <- ls(pattern=".df")
dataset <- rbind(df_list)
But I get a data frame with the names of the data frames...
You should use mget to get the data of each dataframe of the df_list. So you can do:
dataset <- do.call(rbind, mget(df_list))
Note that rbind requires all the data frames to have the same columns. You may also find the merge function useful.
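A short sketch with made-up data shows the whole sequence. The object names and values are hypothetical. One detail worth noting: the pattern argument of ls is a regular expression, so the sketch uses "\\.df$" to match a literal ".df" at the end of the name, whereas a bare ".df" would also match a name such as "pdf_path".

```r
# Hypothetical objects following the "Station_Year.df" naming scheme.
Oslo_2019.df   <- data.frame(station = "Oslo",   year = 2019, temp = c(1.5, 2.0))
Oslo_2020.df   <- data.frame(station = "Oslo",   year = 2020, temp = c(1.8, 2.2))
Bergen_2019.df <- data.frame(station = "Bergen", year = 2019, temp = c(3.1, 3.4))

# Names of all objects ending in ".df".
df_list <- ls(pattern = "\\.df$")

# mget() turns the names into a named list of data frames; rbind stacks them.
dataset <- do.call(rbind, mget(df_list))
```

The row names of the result are built from the object names, which makes it possible to trace each row back to its source data frame.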
Thanks alexis_laz, I forgot the do.call.
I'm trying to use various data frames within a single for loop, i.e.:
#after loading the 5 data frames
for(i in 1:5){
dframe <- dataframe[i]
print(sprintf("This is data frame %s", dframe))
}
However this only passes the variable name and not the data frame itself. Thanks.
To obtain the data use the get function.
dframe <- get(dataframe[i])
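Put together, the loop might look like the sketch below. It assumes that dataframe is a character vector holding the names of the data frames, which is what the behaviour described in the question suggests. The two small data frames and the vector row_counts are hypothetical.

```r
# Two hypothetical data frames and a character vector of their names.
sales_q1 <- data.frame(x = 1:2)
sales_q2 <- data.frame(x = 1:5)
dataframe <- c("sales_q1", "sales_q2")

row_counts <- integer(0)
for (i in seq_along(dataframe)) {
  # get() looks up the object whose name is stored in dataframe[i].
  dframe <- get(dataframe[i])
  # Print the name (a string) and use the data frame itself for the row count.
  print(sprintf("This is data frame %s with %d rows", dataframe[i], nrow(dframe)))
  row_counts[dataframe[i]] <- nrow(dframe)
}
```

Note that sprintf is given the name rather than dframe, since formatting a whole data frame with %s would not produce a readable message.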