Rbind data frames with names in a list [duplicate] - r

I have a problem that I thought would be easy to solve, but I have not managed to find a solution.
I have a large number of data frames that I want to bind by rows. To avoid listing the names of all the data frames, I used "paste0" to quickly create a vector of their names. The problem is that I cannot get the rbind function to identify the data frames from this vector of names.
More explicitly:
df1 <- data.frame(x1 = sample(1:5,5), x2 = sample(1:5,5))
df2 <- data.frame(x1 = sample(1:5,5), x2 = sample(1:5,5))
idvec <- noquote(c(paste0("df",c(1,2))))
> [1] df1 df2
What I would like to get:
dftot <- rbind(df1,df2)
x1 x2
1 4 1
2 5 2
3 1 3
4 3 4
5 2 5
6 5 3
7 1 4
8 2 2
9 3 5
10 4 1
What I get instead:
dftot <- rbind(idvec)
> [,1] [,2]
> idvec "df1" "df2"

If there are multiple objects in the global environment whose names match the pattern df followed by digits, one option is to use ls with the pattern argument to find all of those objects. Wrapping the result in mget returns the objects themselves as a named list, which we can then rbind with do.call.
v1 <- ls(pattern='^df\\d+')
`row.names<-`(do.call(rbind,mget(v1)), NULL)
If we know the objects, another option is paste to create a vector of object names and then do as before.
v1 <- paste0('df', 1:2)
`row.names<-`(do.call(rbind,mget(v1)), NULL)
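A self-contained sketch of the approach above, assuming two small data frames named df1 and df2 (the values are made up for illustration). It uses the replacement form row.names(x) <- NULL, which is equivalent to the backtick call shown above:

```r
# Illustrative data; the values are arbitrary.
df1 <- data.frame(x1 = 1:2, x2 = 5:6)
df2 <- data.frame(x1 = 3:4, x2 = 7:8)

# Build the object names as character strings.
v1 <- paste0("df", 1:2)

# mget() looks the names up and returns a named list of data frames;
# do.call() then passes the list elements to rbind() as separate arguments.
dftot <- do.call(rbind, mget(v1))

# rbind() builds row names from the list names (df1.1, df1.2, ...);
# reset them to the default integer sequence.
row.names(dftot) <- NULL
dftot
```

Note that noquote() is not needed: mget() expects ordinary character strings.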

This should give the result:
dfcount <- 2
dftot <- df1 #initialise
for(n in 2:dfcount){dftot <- rbind(dftot, eval(as.name(paste0("df", as.character(n)))))}
eval(as.name(variable_name)) looks up the object whose name is stored in the string variable_name, so each data frame is retrieved from its name.
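The same lookup can also be written without an explicit loop. This is only a sketch, assuming df1 and df2 are defined as below (illustrative values): get() retrieves an object by name, much like eval(as.name(...)), and Reduce() applies rbind pairwise. The helper names dfnames and env are introduced here for illustration.

```r
# Illustrative data; the values are arbitrary.
df1 <- data.frame(x1 = 1:2, x2 = 3:4)
df2 <- data.frame(x1 = 5:6, x2 = 7:8)

dfcount <- 2
dfnames <- paste0("df", seq_len(dfcount))

# Remember where the data frames live, so that get() searches the right
# environment even when it is called from inside lapply().
env <- environment()

# Look up each data frame by name, then bind them one after another.
dftot <- Reduce(rbind, lapply(dfnames, get, envir = env))
dftot
```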

Related

How to combine a list of data.frames when all columns are unique?

I'm trying to combine 10 separate data frames that are stored in a list I created with a standard for loop. However, every column name in each data frame is unique. I don't want to bind any columns into other columns; I simply want to place all the columns next to each other. So rbind didn't work for me.
> do.call(rbind, data)
Error in match.names(clabs, names(xi)) :
names do not match previous names
Any help would be appreciated thank you.
Maybe you can try the code below if you would like to use rbind
do.call(rbind,Map(as.matrix,data))
Example
df1 <- data.frame(a = 1:2, b = 1:2)
df2 <- data.frame(c = 1:3, d = 1:3)
data <- list(df1,df2)
such that
> do.call(rbind,Map(as.matrix,data))
a b
[1,] 1 1
[2,] 2 2
[3,] 1 1
[4,] 2 2
[5,] 3 3
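Note that the rbind output above keeps only the column names of the first data frame. Since the question asks for the columns to be placed next to each other, cbind may be closer to what is wanted. A minimal sketch, assuming every data frame in the list has the same number of rows (cbind fails for data frames with differing row counts, such as the 2-row and 3-row example above):

```r
# Illustrative data frames with unique column names and equal row counts.
df1 <- data.frame(a = 1:3, b = 4:6)
df2 <- data.frame(c = 7:9, d = 10:12)
data <- list(df1, df2)

# Pass the list elements to cbind() as separate arguments;
# the columns are placed side by side and keep their names.
combined <- do.call(cbind, data)
names(combined)
```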

How to longitudinally concatenate/append columns of a data frame in r

There are many examples of how to concatenate columns element by element, but I can't find an example where columns are concatenated sequentially. I can write an example with a loop:
tst <- cbind.data.frame(c(1,2,3),c(4,5,6))
names(tst) <- c("A","B")
A B
1 1 4
2 2 5
3 3 6
vec <- c()
for (i in names(tst)){
vec <- c(vec,tst[,i])
}
vec
[1] 1 2 3 4 5 6
In other words, I want to create a vector with all the columns of the data frame appended one after the other.
The solution above works, but my question is: is there a way to do this without a loop?
Here, we can use unlist to convert to a vector
vec1 <- unlist(tst, use.names = FALSE)
identical(vec, vec1)
#[1] TRUE
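unlist works here because a data frame is a list of columns, so the columns are concatenated in order. As a cross-check, and assuming all columns are numeric as in the question, going through a matrix gives the same vector, since matrices are stored column by column:

```r
tst <- data.frame(A = c(1, 2, 3), B = c(4, 5, 6))

# A data frame is a list of columns; unlist() appends them in order.
vec1 <- unlist(tst, use.names = FALSE)

# Matrices are stored column-major, so flattening one gives the same order.
vec2 <- as.vector(as.matrix(tst))

identical(vec1, vec2)
```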

How do I split a list containing data frames into the individual data frames?

I have 6 lists (l1,l2,l3,l4,l5,l6) in total and in each list, I have 12 dataframes (df1,df2,df3,...,df10,df11,df12). I would like to split all the lists. This is what I have tried.
split_df<-function(list){
for (i in 1:length(list)){
assign(paste0("df",i),list[[i]])}
}
It works if I run the for loop on its own, but it doesn't work when the loop is inside the function.
Let's look at the following list, l1:
l1<-list(data.frame(matrix(1:10,nrow=2)),data.frame(matrix(1:4,nrow=2)))
split_df(l1)
df1
Error: object 'df1' not found
df2
Error: object 'df2' not found
But without the function:
for (i in 1:length(l1)){
assign(paste0("df",i),l1[[i]])}
df1
# X1 X2 X3 X4 X5
# 1 1 3 5 7 9
# 2 2 4 6 8 10
df2
# X1 X2
# 1 1 3
# 2 2 4
How do I rectify this?
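By default, assign works in the local environment. So inside the function you do create the data.frames df1 and df2, but they only exist in the function's environment and disappear when the function returns. You can assign them to the global environment instead: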
split_df<-function(list){
for (i in 1:length(list)){
assign(paste0("df",i), list[[i]], envir = .GlobalEnv)
}
}
Alternatively, you can name the list elements and use list2env:
l1<-list(data.frame(matrix(1:10,nrow=2)),data.frame(matrix(1:4,nrow=2)))
names(l1) <- paste0("df", seq_along(l1))
list2env(l1, .GlobalEnv)
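Writing into .GlobalEnv from inside a function is a side effect that is easy to lose track of. As a hedged variation, the sketch below writes into the caller's environment by default. The prefix and envir arguments are illustrative additions, not part of the answers above.

```r
# Sketch: create one object per list element in the caller's environment.
# `prefix` and `envir` are hypothetical arguments added for illustration.
split_df <- function(lst, prefix = "df", envir = parent.frame()) {
  force(envir)  # resolve the target environment up front
  names(lst) <- paste0(prefix, seq_along(lst))
  list2env(lst, envir = envir)
  invisible(names(lst))  # return the names that were created
}

l1 <- list(data.frame(matrix(1:10, nrow = 2)),
           data.frame(matrix(1:4, nrow = 2)))
created <- split_df(l1)
created
df1
```

With 6 lists this still creates many loose objects, and each call overwrites df1, df2, ... unless a different prefix is used, so keeping the data frames inside their lists is often simpler.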

Looping over csv files

So, I created a list of csv files:
tbl = list.files(pattern="*.csv")
Then I separated them into two different lists:
tbl1 <- tbl[c(1,3:7,10:12,14:18,20)]
tbl2 <- tbl[c(2,19,8:9,13)]
Then loaded them:
list_of_data1 = lapply(tbl1, read.csv)
list_of_data2 = lapply(tbl2, read.csv)
And now I want to create a master file. I just want to select some data from each csv file and store it in one table. To do that I wrote the following loop:
gdata1 = lapply(list_of_data1,function(x) x[3:nrow(x),10:13])
for( i in 1:length(list_of_data1)){
rownames(gdata1[[i]]) = list_of_data1[[i]][3:nrow(list_of_data1[[i]]),1]
}
tmp = lapply(gdata1,function(x) matrix(as.numeric(x),ncol=4))
final.table1=c()
for(i in 1:length(gnames)){
print(i)
tmp=gnames[i]
f1 = function(x) {x[tmp,]}
tmp2 = lapply(gdata1,f1)
tmp3 = c()
for(j in 1:length(tmp2)){
tmp3=rbind(tmp3,tmp2[[j]])
}
tmp4 = as.vector(t(tmp3))
final.table1 = rbind(final.table1,tmp4)
}
rownames(final.table1) = gnames
I created two different lists of data because in first one list_of_data1 there are four interesting columns for me (10:13) and in the other one list_of_data2 there are only 3 columns (10:12). I want to put all of the data in one table. Is there any way to do it in one loop ?
I have an idea how to solve that problem: I could write a second loop for list_of_data2 and then bind both results using cbind. I want to do it in a more elegant way, so that's why I came here!
I would suggest looking into do.call. You can rbind your first list of tables, then rbind your second list of tables, and then cbind the two results as you stated. Below is a trivial use of do.call:
#creating a list of tables that we are interested in appending
#together in one master dataframe
ts<-lapply(c(1,2,3),function(x) data.frame(c1=rep(c("a","b"),2),c2=(1:4)*x,c3=rnorm(4)))
#you could of course subset ts to the set of columns
#you find of interest ts[,colsOfInterest]
master<-do.call(rbind,ts)
After seeing your complication of various row/columns of interest in each file, I think you could do something like this. Seems a bit hackerish but could get the job done. I assume you merge the files based on a column named id, you could of course generalize this to multiple columns etc
#creating a series of data frames for which we only want a subset of row/cols
> df1<-data.frame(id=1:10,val1=rnorm(10),val2=rnorm(10))
> df2<-data.frame(id=5:10,val3=rnorm(6))
> df3<-data.frame(id=1:3,val4=rnorm(3), val5=rnorm(3), val6=rnorm(3))
#specifying which rows/cols we are interested in
#i assume you have some way of doing this programmatically or you defined elsewhere
> colsofinterest<-list(df1=c("id","val1"),df2=c("id","val3"),df3=c("id","val5","val6"))
> rowsofinterest<-list(df1=1:5,df2=5:8,df3=2:3)
#create a list of data frames where each has only the row/cols combination we want
> ts<-lapply(c("df1","df2","df3"),
function(x) get(x)[rowsofinterest[[x]],colsofinterest[[x]]])
> ts
[[1]]
id val1
1 1 0.24083489
2 2 -0.50140019
3 3 -0.24509033
4 4 1.41865350
5 5 -0.08123618
[[2]]
id val3
5 9 -0.1862852
6 10 0.5117775
NA NA NA
NA.1 NA NA
[[3]]
id val5 val6
2 2 0.2056010 -0.6788145
3 3 0.2057397 0.8416528
#now merge these based on a key column "id", and we want to keep all.
> final<-Reduce(function(x,y) merge(x,y,by="id",all=T), ts)
> head(final)
id val1 val3 val5 val6
1 1 0.24083489 NA NA NA
2 2 -0.50140019 NA 0.2056010 -0.6788145
3 3 -0.24509033 NA 0.2057397 0.8416528
4 4 1.41865350 NA NA NA
5 5 -0.08123618 NA NA NA
6 9 NA -0.1862852 NA NA
Is this what you are thinking about or did I misinterpret?
Note that ldply() from plyr works in much the same way as do.call() in JPC's answer. I just happen to use plyr more; if you want to manipulate R data structures in a vectorised way, there is a lot of useful stuff in there.
library(plyr)
d1 <- ldply(list_of_data1, rbind)
d2 <- ldply(list_of_data2, rbind)
# select cols of d1 and d2
d1 <- d1[,c(10:13)]
d2 <- d2[,c(10:12)]
final.df <- cbind(d1,d2)

How to use a string to refer to a data frame in R?

I have 3 data frames called 'first', 'second' and 'third'
I also have a string vector with their names in it
frames <- c("first","second","third")
I would like to loop through the vector and refer to the data frames
for (i in frames) {
#set first value be 0 in each data frame
i[1,1] <- 0
}
This does not work, what am I missing?
This is really not the optimal way to do this but this is one way to make your specific example work.
first <- data.frame(x = 1:5)
second <- data.frame(x = 1:5)
third <- data.frame(x = 1:5)
frames <- c("first","second","third")
for (i in frames) {
df <- get(i)
df[1,1] <- 45
assign(as.character(i), df, envir= .GlobalEnv)
}
> first
x
1 45
2 2
3 3
4 4
5 5
> second
x
1 45
2 2
3 3
4 4
5 5
> third
x
1 45
2 2
3 3
4 4
5 5
As Justin mentioned, the R way would be to use a list. So given that you only have the data frame names as strings, you can copy the data frames into a list.
frames <- lapply(c("first", "second", "third"), get)
(frames <- lapply(frames, function(x) {x[1,1] <- 0; x}))
Note, however, that frames holds copies of first, second and third, so the original data frames are not modified.
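If the modified copies should replace the originals, one possible approach is to keep the names on the list and write it back with list2env. This is a sketch that assumes the three data frames live in the current environment; frame_names is a name introduced here for illustration.

```r
first  <- data.frame(x = 1:5)
second <- data.frame(x = 1:5)
third  <- data.frame(x = 1:5)

frame_names <- c("first", "second", "third")

# mget() returns a named list, so each copy remembers which object it came from.
frames <- mget(frame_names)

# Set the first value to 0 in each copy; lapply() preserves the list names.
frames <- lapply(frames, function(d) { d[1, 1] <- 0; d })

# Write the modified copies back over the original objects.
invisible(list2env(frames, envir = environment()))
first
```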
