How to combine multiple dataframes with serial numbers in R [duplicate]

This question already has answers here:
R - use rbind on multiple variables with similar names
(2 answers)
Closed 2 years ago.
I have a series of numbered datasets generated from a program, like data1, data2, ..., data100. They have identical column names and I can use rbind(data1, data2, ...,data100) to combine them into one dataset. Is there a more efficient way to do it without a loop?
I created a vector of names by data_names<-paste('data',1:100,sep='') but the result is a vector of strings. The rbind(data_names) command didn't work. Is there a simple and elegant way to combine them?

You can use mget to get all data frames in a list, and then use do.call and rbind.
# Create 3 data frames in the work space as an example
set.seed(1)
data1 <- data.frame(a = runif(2), b = runif(2))
data2 <- data.frame(a = runif(2), b = runif(2))
data3 <- data.frame(a = runif(2), b = runif(2))
# Create the names of the data frame
data_names <- paste0("data", 1:3)
# Get the data frames based on data_names as a list
data_list <- mget(data_names)
# Combine all data frames using do.call and rbind
data_all <- do.call("rbind", data_list)
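After this rbind, the only trace of where each row came from is the row name (data1.1, data1.2, and so on). The following is a minimal sketch, not part of the original answer, that stores the origin in a column instead; the column name source is an illustrative assumption.

```r
# Example data frames, as in the answer above
set.seed(1)
data1 <- data.frame(a = runif(2), b = runif(2))
data2 <- data.frame(a = runif(2), b = runif(2))
data3 <- data.frame(a = runif(2), b = runif(2))

data_names <- paste0("data", 1:3)
data_list <- mget(data_names)

# Tag each data frame with the name of the object it came from
# ("source" is a hypothetical column name chosen for this sketch)
tagged <- Map(function(df, nm) cbind(source = nm, df), data_list, data_names)

# Stack the tagged data frames and drop the "data1.1"-style row names
data_all <- do.call(rbind, tagged)
rownames(data_all) <- NULL

print(data_all)
```

Keeping the origin as a column makes it possible to filter or group by the source dataset later.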

Related

How to aggregate multiple data.frames from a list in R? [duplicate]

This question already has answers here:
Combine a list of data frames into one data frame by row
(10 answers)
Closed 5 years ago.
I am sorry if this question has been answered already. Also, this is my first time on stackoverflow.
I have a beginner R question concerning lists, data frames, and merge() and/or rbind().
I started with a Panel that looks like this
COUNTRY YEAR VAR
A 1
A 2
B 1
B 2
For efficiency purposes, I created a list that consists of one data frame for each country and performed a variety of calculations on each individual data.frame. However, I cannot seem to combine the individual data frames into one large frame again.
rbind() and merge() both tell me that only replacement of elements is allowed.
Could someone tell me what I am doing wrong and how to actually recombine the data frames?
Thank you
Maybe you want to do something like:
do.call("rbind", my.df.list)
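A likely reason rbind(my.df.list) fails is that rbind then receives the whole list as a single argument, whereas do.call passes each list element as its own argument. Here is a hedged sketch of the full round trip for a panel like the one in the question; the values and the CHANGE column are made up for illustration.

```r
# A small panel like the one in the question (values are invented)
panel <- data.frame(
  COUNTRY = c("A", "A", "B", "B"),
  YEAR    = c(1, 2, 1, 2),
  VAR     = c(10, 12, 20, 26)
)

# One data frame per country
my.df.list <- split(panel, panel$COUNTRY)

# Example per-country calculation: change in VAR since the first year
my.df.list <- lapply(my.df.list, function(df) {
  df$CHANGE <- df$VAR - df$VAR[1]
  df
})

# Recombine: each list element becomes a separate argument to rbind
combined <- do.call("rbind", my.df.list)
rownames(combined) <- NULL

print(combined)
```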
dplyr provides the bind_rows function for that:
library(dplyr)
foo <- list(df1 = data.frame(x=c('a', 'b', 'c'),y = c(1,2,3)),
df2 = data.frame(x=c('d', 'e', 'f'),y = c(4,5,6)))
bind_rows(foo)
Note that the basic solution
do.call("rbind", my.df.list)
will be slow if we have many dataframes. A scalable solution is:
library(data.table)
rbindlist(my.df.list)
which, from the docs, is the same as do.call("rbind", l) on data.frames, but much faster.
plyr is probably best. Another useful approach if the data frames can be different is to use reshape:
library(reshape)
data <- merge_recurse(listofdataframes)
Look at my answer to this related question on merging data frames.
There might be a better way to do this, but this seems to work and it's straightforward. (My code has four lines so that it's easier to see the steps; these four could easily be combined.)
# first re-create your data frame:
A = matrix( ceiling(10*runif(8)), nrow=4)
colnames(A) = c("country", "year_var")
dfa = data.frame(A)
# now re-create the list you made from the individual rows of the data frame:
df1 = dfa[1,]
df2 = dfa[2,]
df3 = dfa[3,]
df4 = dfa[4,]
df_all = list(df1, df2, df3, df4)
# to recreate your original data frame:
x = unlist(df_all) # from your list create a single 1D array
A = matrix(x, nrow=4, byrow=TRUE) # unlist() returns the values row by row, so fill the matrix by row
colnames(A) = c("country", "year_var") # put the column names back on
dfa = data.frame(A) # from the matrix, create your original data frame
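Note that unlist() flattens everything into one vector, so with mixed column types (for example a character country and a numeric year) every value would be coerced to character. A minimal sketch of the same round trip using do.call and rbind, which keeps the column types; the example values are assumptions.

```r
# A data frame with one character and one numeric column (invented values)
dfa <- data.frame(country = c("A", "A", "B", "B"),
                  year_var = c(1, 2, 1, 2))

# Break it into a list of one-row data frames
df_all <- lapply(seq_len(nrow(dfa)), function(i) dfa[i, ])

# Stack the rows again; each column keeps its own type
restored <- do.call(rbind, df_all)

str(restored)
```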

Running a function that renames dataframes per intermediate step, for a list of dataframes

I have been given instructions to do an analysis in R with the vegan package (concerning DCAs).
The instructions for a single dataframe are pretty straightforward, but I would like to apply the analysis to a set of dataframes.
I know this can be done with a for-loop or lapply or sapply, but I have trouble dealing with the fact that at each step of the analysis a new extension is added to the name of the dataframe.
An example below
Say I have a dataframe DF, then it goes as follows:
DF.t1 <- decostand(DF, "total")
DF.t2 <- decostand(DF.t1, "max")
DF.t2.dca <- decorana(DF.t2)
DF.t2.dca.DW <- decorana(DF.t2, iweigh=1)
names(DF.t2.dca)
summary(DF.t2.dca)
DF.t2.dca.taxonscores <- scores(DF.t2.dca, display=c("species"), choices=c(1,2))
DF.t2.dca.taxonscores <- DF.t2.dca$cproj[ ,1:2]
DF.t2.dca.samplescores <- scores(DF.t2.dca, display=c("sites"), choices=1)
What I want to achieve is to run several dataframes through this analysis without writing it all out separately.
Let's say I have a set of dataframes called "DF_1", "DF_2" & "DF_3" which I want to do this analysis on.
I probably need to put the dataframes in a list, and get all the steps in a for-loop or one of the apply methods.
But how do I approach the problem with the extensions added (.ra, .t1, .t2, .t2.dca, .t2.dca.DW etc.) to the dataframe names?
Edit: I need to retain the original dataframes after the analysis, in order to do follow-up analysis on them.
Unless you have a very limited number of data frames, I would advise against defining about 8 new objects per data frame in the global environment, as this can become very messy.
One approach you might consider is creating a nested list where the first level is the data frame and the second level are the modified data frames.
library(vegan) # provides decostand(), decorana() and scores()
# some example data sets
DF1 <- mtcars
DF2 <- mtcars*2
DF3 <- mtcars*3
all_dfs <- list(DF1 = DF1, DF2 = DF2, DF3 =DF3)
some_stuff <- function(df) {
  DF.t1 <- decostand(df, "total")
  DF.t2 <- decostand(DF.t1, "max")
  DF.t2.dca <- decorana(DF.t2)
  DF.t2.dca.DW <- decorana(DF.t2, iweigh=1)
  names(DF.t2.dca)
  summary(DF.t2.dca)
  DF.t2.dca.taxonscores <- scores(DF.t2.dca, display=c("species"), choices=c(1,2))
  DF.t2.dca.taxonscores <- DF.t2.dca$cproj[ ,1:2]
  DF.t2.dca.samplescores <- scores(DF.t2.dca, display=c("sites"), choices=1)
  # return every intermediate result under its own name
  return(list(DF.t1 = DF.t1, DF.t2 = DF.t2,
              DF.t2.dca = DF.t2.dca,
              DF.t2.dca.DW = DF.t2.dca.DW,
              DF.t2.dca.taxonscores = DF.t2.dca.taxonscores,
              DF.t2.dca.samplescores = DF.t2.dca.samplescores
  ))
}
nested_list <- lapply(all_dfs, some_stuff)
# To obtain any of the objects for a specific data.frame you could, for example, run
nested_list$DF1$DF.t2.dca.DW
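The nested-list pattern can be tried without vegan installed. The sketch below uses base R stand-ins for the transformation steps (row proportions, then scaling columns by their maximum); run_steps and its element names are hypothetical and are not the vegan functions from the question.

```r
# Original data frames stay untouched in this named list
all_dfs <- list(DF1 = mtcars, DF2 = mtcars * 2, DF3 = mtcars * 3)

# Stand-in pipeline (base R only, NOT the vegan analysis):
# every intermediate result is returned under a short name
run_steps <- function(df) {
  m  <- as.matrix(df)
  t1 <- m / rowSums(m)                        # each row divided by its total
  t2 <- sweep(t1, 2, apply(t1, 2, max), "/")  # each column divided by its maximum
  list(t1 = t1, t2 = t2, col_means = colMeans(t2))
}

nested_list <- lapply(all_dfs, run_steps)

# Pull the same element out of every result at once
all_col_means <- sapply(nested_list, `[[`, "col_means")

print(dim(all_col_means))
```

Because the function only works on its argument, the data frames in all_dfs remain available for follow-up analysis.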

Creating Subset data frames in R within For loop [duplicate]

This question already has answers here:
Split a large dataframe into a list of data frames based on common value in column
(3 answers)
Closed 4 years ago.
What I am trying to do is filter a larger data frame into 78 unique data frames based on the value of the first column in the larger data frame. The only way I can think of doing it properly is by applying the filter() function inside a for() loop:
for (i in 1:nrow(plantline))
{x1 = filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])}
The issue is I don't know how to create a new data frame, say x2, x3, x4... every time the loop runs.
Can someone tell me if that is possible or if I should be trying to do this some other way?
There must be many duplicates for this question
split(rawdta.df, rawdta.df$Plant_Line)
will create a list of data.frames.
However, depending on your use case, splitting the large data.frame into pieces might not be necessary as grouping can be used.
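As a hedged illustration of the grouping alternative, the sketch below computes a per-group summary directly on the large data frame with base R. The height column and its values are invented; only the Plant_Line column name comes from the question.

```r
# Stand-in for the large data frame (invented values)
rawdta.df <- data.frame(
  Plant_Line = rep(c("PL1", "PL2", "PL3"), each = 4),
  height     = c(10, 12, 11, 13, 20, 22, 21, 23, 30, 32, 31, 33)
)

# Mean height per plant line, returned as a data frame
line_means <- aggregate(height ~ Plant_Line, data = rawdta.df, FUN = mean)

# The same summary as a named vector
line_means_vec <- tapply(rawdta.df$height, rawdta.df$Plant_Line, mean)

print(line_means)
```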
You could use split -
# splits the larger data frame into a list of 78 data frames,
# one for each unique value of its first column
lst = split(large_data_frame, large_data_frame$first_column)
# takes the dataframes out of the list into the global environment
# although it is not suggested since it is difficult to work with 78
# dataframes
list2env(lst, envir = .GlobalEnv)
The names of the dataframes will be the same as the value of the variables in the first column.
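If separate objects are really needed, one option is to unpack the list into a dedicated environment rather than the global one. A minimal sketch with made-up data; plant_env is a hypothetical name.

```r
# Small stand-in for the large data frame (invented values)
large_data_frame <- data.frame(
  first_column = rep(c("a", "b", "c"), each = 2),
  value        = 1:6
)

lst <- split(large_data_frame, large_data_frame$first_column)

# Unpack the list into its own environment; list2env() returns that environment
plant_env <- list2env(lst, envir = new.env())

# The objects are named after the values of the first column
print(ls(plant_env))
print(get("b", envir = plant_env))
```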
It would be easier if we could see the dataframes....
I propose something nevertheless. You can create a list of dataframes:
dataframes <- vector("list", nrow(plantline))
for (i in 1:nrow(plantline)){
dataframes[[i]] = filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])
}
You can use assign :
for (i in 1:nrow(plantline))
{assign(paste0("x", i), filter(rawdta.df, Plant_Line == plantline$Plant_Line[i]))}
alternatively you can save your results in a list :
X <- list()
for (i in 1:nrow(plantline))
{X[[i]] = filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])}
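A variation on the list approach is to name the list elements after the plant lines, so each subset can be looked up by name rather than by position. This sketch uses base R subsetting instead of dplyr's filter() so it runs without extra packages; the yield column is invented.

```r
# Stand-ins for the objects in the question (invented values)
rawdta.df <- data.frame(
  Plant_Line = rep(c("PL1", "PL2", "PL3"), times = 2),
  yield      = 1:6,
  stringsAsFactors = FALSE
)
plantline <- data.frame(Plant_Line = unique(rawdta.df$Plant_Line),
                        stringsAsFactors = FALSE)

# One subset per plant line, built without an explicit loop
X <- lapply(plantline$Plant_Line, function(pl) {
  rawdta.df[rawdta.df$Plant_Line == pl, ]
})
names(X) <- plantline$Plant_Line

# Look up a subset by name
print(X[["PL2"]])
```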
Would be easier with sample data. by would be my favorite.
d <- data.frame(plantline = rep(LETTERS[1:3], 4),
x = 1:12,
stringsAsFactors = FALSE)
l <- by(d, d$plantline, data.frame)
print(l$A)
print(l$B)
Solution using plyr:
ma <- cbind(x = 1:10, y = (-4:5)^2, z = 1:2)
ma <- as.data.frame(ma)
library(plyr)
dlply(ma, "z") # you split ma by the column named z

cbind dataframes in loop [duplicate]

This question already has answers here:
Using cbind on an arbitrarily long list of objects
(4 answers)
Closed 4 years ago.
I have n number of dataframes named "s.dfx" where x=1:n. All the dataframes have 7 columns with different names. Now I want to cbind all the dataframes.
I know the command
t<-cbind.data.frame(s.df1,s.df2,...,s.dfn)
But I want to optimize and cbind them in a loop, since n is a large number.
I have tried
for(t2 in 1:n){
t<-cbind.data.frame(s.df[t2])
}
But I get this error "Error in [.data.frame(s.df, t2) : undefined columns selected"
Can anyone help?
I don't think that a for-loop would be any faster than do.call(cbind, dfs), but it wasn't clear to me that you actually had such a list yet. I thought you might need to build such list from a character object. This answer assumes you don't have a list yet but that you do have all your dataframes numbered in an ascending sequence that ends in n where the decimal representation might have multiple digits.
t <- do.call( cbind, mget( paste0("s.df", 1:n) ) )
Pasqui uses ls inside mget and a pattern to capture all the numbered dataframes. I would have used a slightly different one, since you suggested that the number was higher than 9 which is all that his pattern would capture:
ls(pattern = "^s\\.df[0-9]+") # any number of digits
# ^ need double escapes to make '.' a literal period or fixed=TRUE
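One thing to watch with the ls() approach is column order: ls() returns names sorted as strings, which is not numeric order once the suffix has more than one digit. The sketch below reorders the names by their numeric suffix before binding. The twelve one-column data frames and the environment env are assumptions made for the demonstration.

```r
# Create twelve tiny data frames named s.df1 ... s.df12 in their own environment
env <- new.env()
for (i in 1:12) {
  assign(paste0("s.df", i),
         setNames(data.frame(i), paste0("v", i)),
         envir = env)
}

# Capture all numbered data frames, whatever the number of digits
found <- ls(envir = env, pattern = "^s\\.df[0-9]+$")

# Reorder by the numeric suffix rather than by string order
suffix <- as.integer(sub("^s\\.df", "", found))
ordered_names <- found[order(suffix)]

# unname() keeps the list names from being used as argument names in cbind
t_all <- do.call(cbind, unname(mget(ordered_names, envir = env)))

print(names(t_all))
```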
library(purrr) #to be redundant
library(dplyr) # bind_cols() comes from dplyr
#generating dummy data frames
df1 <- data.frame(x = c(1,2), y = letters[1:2])
df2 <- data.frame(x = c(10,20), y = letters[c(10, 20)])
df3 <- data.frame(x = c(100, 200), y = letters[c(11, 22)])
#' DEMO [to be adapted]: capturing the EXAMPLE data frames in a list
dfs <- mget(ls(pattern = "^df[1-3]"))
#A Tidyverse (purrr) Solution
t <- purrr::reduce(.x = dfs, .f = dplyr::bind_cols)
#Base R
do.call(cbind,dfs)
# or
Reduce(cbind,dfs)
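The two base R variants can differ in the resulting column names when the list is named, as it is after mget. As far as I can tell, do.call passes the list names on as argument names and the data frame method of cbind then prefixes them to the column names, while Reduce calls cbind positionally. A small sketch to compare the three outcomes:

```r
df1 <- data.frame(x = c(1, 2), y = letters[1:2])
df2 <- data.frame(x = c(10, 20), y = letters[c(10, 20)])

# mget() returns a list named after the objects
dfs <- mget(c("df1", "df2"))

with_names  <- do.call(cbind, dfs)          # list names become argument names
by_position <- do.call(cbind, unname(dfs))  # original column names only
reduced     <- Reduce(cbind, dfs)           # pairwise cbind, positional

print(names(with_names))
print(names(by_position))
print(names(reduced))
```

Dropping the list names with unname() is one way to keep the original column names when using do.call.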

How to cbind many data frames with a loop?

I have 105 data frames with xts, zoo class and I want to combine their 6th columns into a data frame.
So, I created a data frame that contains all the data frame names to use it with a 'for' function:
mydata <- AAL
for (i in 2:105) {
k <- top100[i,1] # The first column contains all the data frame names
mydata <- cbind(mydata, k)
}
It's obviously wrong, but I have no idea either how to cbind so many data frames with completely different names (my data frame names are NASDAQ Symbols) nor how to pick the 6th column of all.
Thank you in advance
Try the foreach package. There may be a more elegant way to do this task, but this approach will work.
library(foreach)
#create simple one-column objects (t() returns matrices) with columns named 'A' and 'B'
df1<-t(data.frame(1,2,3))
df2<-t(data.frame(4,5,6))
colnames(df1)<-c('A')
colnames(df2)<-c('B')
#make a list
dfs<-list(df1,df2)
#join data frames column by column, this will preserve their names
foreach(x=1:2
,.combine=cbind)%do% # don't forget this directive
{
dfs[[x]]
}
The result will be:
A B
X1 1 4
X2 2 5
X3 3 6
To pick column number 6:
df[,6]
First, you should store all of your data.frames in a list. You can then use a combination of lapply and do.call to extract and recombine the sixth columns of each of the data.frames:
# Create sample data
df_list <- lapply(1:105, function(x) {
as.data.frame(matrix(sample(1:1000, 100), ncol = 10))
})
# Extract the sixth column from each data.frame
extracted_cols <- lapply(df_list, function(x) x[6])
# Combine all of the columns together into a new data.frame
result <- do.call("cbind", extracted_cols)
One way to get all of your preexisting data.frames into a list would be to use lapply along with get:
df_list <- lapply(top100[[1]], get)
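Putting both steps together, the sketch below fetches the objects by name with mget and keeps each symbol as the column name of the result. It assumes plain data frames with equal row counts rather than xts objects, and the three symbols and their values are made up.

```r
# Three stand-in objects named after ticker symbols (invented data)
set.seed(42)
AAL  <- as.data.frame(matrix(rnorm(30), ncol = 6))
AAPL <- as.data.frame(matrix(rnorm(30), ncol = 6))
MSFT <- as.data.frame(matrix(rnorm(30), ncol = 6))

# Data frame whose first column holds the object names, as in the question
top100 <- data.frame(symbol = c("AAL", "AAPL", "MSFT"),
                     stringsAsFactors = FALSE)

# Fetch all objects by name; as.character() guards against a factor column
df_list <- mget(as.character(top100[[1]]))

# Take the sixth column of each as a plain vector
sixth <- lapply(df_list, function(x) x[[6]])

# One column per symbol, named after the symbol
result <- as.data.frame(sixth)

print(names(result))
```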
