Joining nested lists and adding column - r

I have a list containing two matrices:
a <- list("m1"=matrix(1:9, nrow = 3, ncol = 3),
"m2"=matrix(1:9, nrow = 3, ncol = 3))
I want to bind the two matrices (row-bind), and to distinguish the rows, I want to add a column that contains the name of the matrix. I can bind the rows using rbind:
b <- do.call(rbind, a) %>% as.data.frame
which yields
V1 V2 V3
1 1 4 7
2 2 5 8
3 3 6 9
4 1 4 7
5 2 5 8
6 3 6 9
But how do I add a column containing the names? I can do b$id <- c("m1","m1","m1","m2","m2","m2"), but there must be an easier way than this.

Here's how to do it in dplyr / purrr
a %>% purrr::map(as.data.frame) %>% dplyr::bind_rows(.id = "origin")
origin V1 V2 V3
1 m1 1 4 7
2 m1 2 5 8
3 m1 3 6 9
4 m2 1 4 7
5 m2 2 5 8
6 m2 3 6 9
That converts the matrices to data-frames before row-binding them.
You can also call bind_rows directly on a list of matrices, but it doesn't return what you expect:
a %>% bind_rows(.id = "origin")
# A tibble: 9 x 3
origin m1 m2
<chr> <int> <int>
1 1 1 1
2 1 2 2
3 1 3 3
4 1 4 4
5 1 5 5
6 1 6 6
7 1 7 7
8 1 8 8
9 1 9 9
This happens because m1 and m2 are vectors (a matrix is a vector with a dim attribute) of the same length, and bind_rows treats a list of equal-length vectors as a single data frame. So the latter call is equivalent to
bind_rows(data.frame(m1 = as.vector(a$m1), m2 = as.vector(a$m2)), .id = "origin")
So, make sure you convert your matrices to data frames before you bind them together.
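The point that a matrix is just a vector with dimensions can be checked in base R. This is a small sketch; the names m and flat are only illustrative.

```r
# A 3 x 3 matrix holds 9 values in column-major order
m <- matrix(1:9, nrow = 3, ncol = 3)

n_values <- length(m)   # number of elements, not number of rows
flat <- as.vector(m)    # drops the dim attribute

# The flattened matrix is the plain vector 1:9
same <- identical(flat, 1:9)
```

So anything that only looks at the length of each list element sees two length-9 vectors rather than two 3 x 3 tables.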

You can do:
b <- do.call(rbind.data.frame, a)
# V1 V2 V3
#m1.1 1 4 7
#m1.2 2 5 8
#m1.3 3 6 9
#m2.1 1 4 7
#m2.2 2 5 8
#m2.3 3 6 9
or, if you are not happy with this,
b <- do.call(rbind.data.frame, a)
b$id <- sub("[.].+", "", rownames(b))
# V1 V2 V3 id
#m1.1 1 4 7 m1
#m1.2 2 5 8 m1
#m1.3 3 6 9 m1
#m2.1 1 4 7 m2
#m2.2 2 5 8 m2
#m2.3 3 6 9 m2
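If you would rather not parse row names, a base R sketch along the same lines is to repeat each list name once per row of its matrix. This assumes every element of the list is named and is a matrix.

```r
a <- list(m1 = matrix(1:9, nrow = 3, ncol = 3),
          m2 = matrix(1:9, nrow = 3, ncol = 3))

# Row-bind the matrices, then convert to a data frame
b <- as.data.frame(do.call(rbind, a))

# Repeat each name as many times as its matrix has rows
b$id <- rep(names(a), times = vapply(a, nrow, integer(1)))
```

Because the repeat counts come from nrow, this also works when the matrices have different numbers of rows.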

Related

join columns recursively in R

Hello, I have a data frame with 245 columns. To sum some sets of columns and generate new columns, I tried to do it in a loop as follows:
cl1<-sample(1:4,10,replace=TRUE)
cl2<-sample(1:4,10,replace=TRUE)
cl3<-sample(1:4,10,replace=TRUE)
cl4<-sample(1:4,10,replace=TRUE)
cl5<-sample(1:4,10,replace=TRUE)
cl6<-sample(1:4,10,replace=TRUE)
dat<-data.frame(cl1,cl2,cl3,cl4,cl5,cl6)
My intention is to add column 1 to columns 3 and 5, and likewise column 2 to columns 4 and 6, and in the end obtain a data frame with two columns. That is the kind of result it should give me.
I have written the following code:
revisar<- function(a){
todos = list()
i=1
j=3
l=5
k=1
while(i<=2 ){
cl<-a[,i]
cl2<-a[,j]
cl3<-a[,l]
cl[is.na(cl)] <- 0
cl2[is.na(cl2)] <- 0
cl3[is.na(cl3)] <- 0
colu<-cl+cl2+cl3
col<-cbind(colu,colu)
i<-i+1
j<-j+1
l<-l+1
k<-k+1
}
return(col)
}
It turns out that it only returns column 2 repeated twice, and I need to replicate the same thing to combine those 245 columns.
I would like to know what is failing in the example.
base R
Literal programming:
with(dat, data.frame(s1 = cl1+cl3+cl5, s2 = cl2+cl4+cl6))
# s1 s2
# 1 7 11
# 2 7 7
# 3 4 11
# 4 4 10
# 5 9 8
# 6 12 5
# 7 7 6
# 8 7 10
# 9 4 9
# 10 6 5
Programmatically,
L <- list(s1 = c(1,3,5), s2 = c(2,4,6))
out <- data.frame(lapply(L, function(z) do.call(rowSums, list(as.matrix(dat[,z])))))
out
# s1 s2
# 1 7 11
# 2 7 7
# 3 4 11
# 4 4 10
# 5 9 8
# 6 12 5
# 7 7 6
# 8 7 10
# 9 4 9
# 10 6 5
dplyr
library(dplyr)
dat %>%
transmute(
s1 = rowSums(cbind(cl1, cl3, cl5)),
s2 = rowSums(cbind(cl2, cl4, cl6))
)
or programmatically using purrr:
purrr::map_dfc(L, ~ rowSums(dat[, .]))
Data
set.seed(42)
# your `dat` above
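As for what is failing in the loop from the question: col is overwritten on every pass, cbind(colu, colu) binds the same sum to itself, and the todos list is never filled, so only the last sum (columns 2, 4, 6) comes back, twice. A minimal corrected sketch, assuming the columns to add are always spaced n_groups apart (n_groups is a parameter introduced here for illustration):

```r
set.seed(42)
dat <- data.frame(replicate(6, sample(1:4, 10, replace = TRUE)))
names(dat) <- paste0("cl", 1:6)

revisar <- function(a, n_groups = 2) {
  todos <- vector("list", n_groups)          # one slot per output column
  for (i in seq_len(n_groups)) {
    idx <- seq(i, ncol(a), by = n_groups)    # e.g. 1, 3, 5 and then 2, 4, 6
    block <- a[, idx, drop = FALSE]
    block[is.na(block)] <- 0                 # treat NA as 0, as in the question
    todos[[i]] <- rowSums(block)             # store the result of each pass
  }
  names(todos) <- paste0("s", seq_len(n_groups))
  as.data.frame(todos)
}

res <- revisar(dat)
```

The key change is storing each pass in its own list slot instead of reassigning a single variable.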
Here is an alternative general approach:
Here we sum all odd-numbered columns into s1 and
all even-numbered columns into s2:
library(dplyr)
dat %>%
rowwise() %>%
mutate(s1 = sum(c_across(seq(1,ncol(dat),2)), na.rm = TRUE),
s2 = sum(c_across(seq(2,ncol(dat),2)), na.rm = TRUE))
cl1 cl2 cl3 cl4 cl5 cl6 s1 s2
<int> <int> <int> <int> <int> <int> <int> <int>
1 1 1 3 2 3 2 7 5
2 2 4 1 4 2 3 5 11
3 2 2 2 2 1 3 5 7
4 2 4 4 3 1 4 7 11
5 2 4 4 3 2 2 8 9
6 3 3 3 2 2 2 8 7
7 2 1 1 2 1 4 4 7
8 2 4 1 3 2 3 5 10
9 3 1 1 2 3 4 7 7
10 2 4 1 3 4 4 7 11
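The same odd/even grouping can be sketched in base R by splitting the columns into groups and summing each group by row. The small dat below is made-up illustration data, not the data from the question.

```r
dat <- data.frame(cl1 = 1:3, cl2 = 4:6, cl3 = 7:9,
                  cl4 = 10:12, cl5 = 13:15, cl6 = 16:18)

# Label each column by position: odd -> "s1", even -> "s2"
grp <- ifelse(seq_along(dat) %% 2 == 1, "s1", "s2")

# split.default() splits a data frame by columns; rowSums() sums each piece
out <- as.data.frame(sapply(split.default(dat, grp), rowSums, na.rm = TRUE))
```

Changing how grp is built is enough to handle other groupings across all 245 columns.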

Add column to each data frame within list with function rowSums and range of columns

SO. The following might serve as a small example of the real list.
a <- data.frame(
x = c("A","A","A","A","A"),
y = c(1,2,3,4,5),
z = c(1,2,3,4,5))
b <- data.frame(
x = c("A","A","A","A","A"),
y = c(1,2,3,4,5),
z = c(1,2,3,4,5))
c <- data.frame(
x = c("A","A","A","A","A"),
y = c(1,2,3,4,5),
z = c(1,2,3,4,5))
l <- list(a,b,c)
For every data frame, I want to sum the second through the last column by row and add the sums as a new column to that data frame.
I tried:
lapply(l, function(x) rowSums(x[2:ncol(x)]))
which returns the correct sums, but doesn't add them to the data frames.
I also tried:
lapply(l, transform, sum = y + z)
which gives me the correct results but is not flexible enough, because I don't always know how many columns each data frame has or what they are named. The only thing I know is that I have to sum from the second column to the last. I tried to combine these two approaches, but I can't figure out exactly how to do it.
Thanks
Try this. You can index the columns by position and exclude the first variable, so it does not matter how many additional variables there are when computing the row sums. Here is the code:
#Compute rowsums
l1 <- lapply(l,function(x) {x$RowSum<-rowSums(x[,-1],na.rm=T);return(x)})
Output:
l1
[[1]]
x y z RowSum
1 A 1 1 2
2 A 2 2 4
3 A 3 3 6
4 A 4 4 8
5 A 5 5 10
[[2]]
x y z RowSum
1 A 1 1 2
2 A 2 2 4
3 A 3 3 6
4 A 4 4 8
5 A 5 5 10
[[3]]
x y z RowSum
1 A 1 1 2
2 A 2 2 4
3 A 3 3 6
4 A 4 4 8
5 A 5 5 10
Here's how to combine your attempts. I used data[-1] instead of data[2:ncol(data)] because it seems simpler, but either should work.
lapply(l, function(data) transform(data, sum = rowSums(data[-1])))
Unfortunately, transform will be confused if the name of the argument to your anonymous function is the same as a column name - data[-1] needs to look at the data frame, not a particular column. (I originally used function(x) instead of function(data), and this caused an error because there is a column named x. From this perspective, Duck's answer is a little safer.)
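The name clash can be reproduced with a cut-down list. This is only a sketch: the one-element list and the names clash and ok are illustrative.

```r
l <- list(data.frame(x = c("A", "A"), y = c(1, 2), z = c(3, 4)))

# Argument named x: inside transform(), x resolves to the column x,
# so rowSums() receives a character vector and fails
clash <- tryCatch(
  lapply(l, function(x) transform(x, sum = rowSums(x[-1]))),
  error = function(e) conditionMessage(e)
)

# Argument named data: no column has that name, so data[-1] is the data frame
ok <- lapply(l, function(data) transform(data, sum = rowSums(data[-1])))
```

Here clash ends up holding the error message, while ok contains the data frame with the new sum column.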
Does this work:
> add_col <- function(df){
+ df[(ncol(df)+1)] = rowSums(df[2:ncol(df)])
+ df
+ }
> lapply(l, add_col)
[[1]]
x y z V4
1 A 1 1 2
2 A 2 2 4
3 A 3 3 6
4 A 4 4 8
5 A 5 5 10
[[2]]
x y z V4
1 A 1 1 2
2 A 2 2 4
3 A 3 3 6
4 A 4 4 8
5 A 5 5 10
[[3]]
x y z V4
1 A 1 1 2
2 A 2 2 4
3 A 3 3 6
4 A 4 4 8
5 A 5 5 10
With sum as column name:
> add_col <- function(df){
+ df['sum'] = rowSums(df[2:ncol(df)])
+ df
+ }
> lapply(l, add_col)
[[1]]
x y z sum
1 A 1 1 2
2 A 2 2 4
3 A 3 3 6
4 A 4 4 8
5 A 5 5 10
[[2]]
x y z sum
1 A 1 1 2
2 A 2 2 4
3 A 3 3 6
4 A 4 4 8
5 A 5 5 10
[[3]]
x y z sum
1 A 1 1 2
2 A 2 2 4
3 A 3 3 6
4 A 4 4 8
5 A 5 5 10
use tidyverse
library(tidyverse)
map(l, ~.x %>% mutate(Sum := apply(.x[-1], 1, sum)))
#> [[1]]
#> x y z Sum
#> 1 A 1 1 2
#> 2 A 2 2 4
#> 3 A 3 3 6
#> 4 A 4 4 8
#> 5 A 5 5 10
#>
#> [[2]]
#> x y z Sum
#> 1 A 1 1 2
#> 2 A 2 2 4
#> 3 A 3 3 6
#> 4 A 4 4 8
#> 5 A 5 5 10
#>
#> [[3]]
#> x y z Sum
#> 1 A 1 1 2
#> 2 A 2 2 4
#> 3 A 3 3 6
#> 4 A 4 4 8
#> 5 A 5 5 10
Created on 2020-09-30 by the reprex package (v0.3.0)
We can use map with mutate
library(purrr)
library(dplyr)
map(l, ~ .x %>%
mutate(sum = rowSums(select(., -1))))
Or with c_across
map(l, ~ .x %>%
rowwise() %>%
mutate(sum = sum(c_across(-1), na.rm = TRUE)) %>%
ungroup)

sorting a list of data frame on a condition

I have a list of data frames containing different number of columns.
Say Y is a list of 3 data frames containing 4,10 and 5 columns respectively
I want to sort each data frame in the list by a given sequence of columns (which column is sorted first, which second, and so on). For that I have another list and a vector:
i1 = list(c(0),c(4,5,2,3),c(3))
i2 = c(0,4,1)
In the first data frame I don't want to sort anything, and for the second and third data frames I want to follow the order given in i1 and i2.
I have tried writing this loop, which works for one data frame but does not work for a list:
for (i in 1:length(i1){
if (i2[i] < 1) {
sorted[[i]]=y[[i]]
} else {
for(j in i1[[i]]){
sorted[[i]] <- y[[i]][order(y[[i]][j],]
}}}
We can do this with Map
Map(function(x,y, z) if(z < 1) x else x[do.call(order, x[y]),], Y, i1, i2)
#[[1]]
# V1 V2 V3 V4
#1 3 10 7 10
#2 3 3 4 2
#3 8 8 7 1
#4 6 9 7 6
#5 7 3 4 2
#[[2]]
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
#1 1 7 4 3 5 1 5 10 5 4
#3 8 6 4 7 3 4 5 3 3 10
#4 2 7 2 7 3 3 8 2 2 8
#2 6 1 3 8 4 4 9 5 3 10
#5 3 1 10 10 1 4 6 2 8 5
#[[3]]
# V1 V2 V3 V4 V5
#2 3 6 2 3 8
#4 10 1 3 4 2
#5 7 8 4 9 5
#1 2 4 6 4 4
#3 1 9 6 9 10
data
set.seed(24)
Y <- list(as.data.frame(matrix(sample(1:10, 4*5, replace=TRUE), 5, 4)),
as.data.frame(matrix(sample(1:10, 10*5, replace=TRUE), 5, 10)),
as.data.frame(matrix(sample(1:10, 5*5, replace=TRUE), 5, 5)))
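The core of the Map call is do.call(order, x[y]), which passes the chosen columns to order() as successive sort keys. A small sketch on made-up data (df, keys and sorted are illustrative names; unname() is an extra precaution added here so that a column name can never be mistaken for one of order()'s own arguments):

```r
df <- data.frame(V1 = c(2, 1, 2, 1),
                 V2 = c("b", "b", "a", "a"),
                 V3 = c(10, 20, 30, 40))

keys <- c(1, 2)   # sort by column 1 first, then break ties with column 2

# Equivalent to order(df$V1, df$V2), built from the key vector
ord <- do.call(order, unname(as.list(df[keys])))
sorted <- df[ord, ]
```

The original row names are kept, which makes it easy to see how the rows moved.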

Loop function for comparing the columns

I have a very large data set with 400 string and numeric variables. I want to compare each pair of consecutive columns (3 & 4, 5 & 6, etc.): the third variable (.x) with the fourth (.y), the fifth with the sixth, the seventh with the eighth, and so on. If the .y value is NA, it should be replaced with the value from the corresponding row of .x. For example, if number.y is NA, we replace the NA with the corresponding value from number.x, which would be 5. Likewise, if day.y is NA, we replace it with the corresponding value from day.x, which would be 3. How can I write a loop function to do that?
A<-c(1,2,3,4,5,6,7,NA,NA,5,5,6)
B<-c(3,4,5,6,1,2,7,6,7,NA,NA,6)
number.x<-c(1,2,3,4,5,6,7,NA,NA,5,5,6)
number.y<-c(3,4,5,6,1,2,7,6,7,NA,NA,6)
day.x<-c(1,3,4,5,6,7,8,1,NA,3,5,3)
day.y<-c(4,5,6,7,8,7,8,1,2,3,5,NA)
school.x<-c("a","b","b","c","n","f","h","NA","F","G","z","h")
school.y<-c("a","b","b","c","m","g","h","NA","NA","G","H","T")
city.x<- c(1,2,3,7,5,8,7,5,6,7,5,1)
city.y<- c(1,2,3,5,5,7,7,NA,NA,3,4,5)
df<-data.frame(A,B,number.x,number.y,day.x,day.y,school.x,school.y,city.x,city.y)
This is a hacked approach to your question and it requires that every two columns are going to be compared against one another.
library(dplyr)
start_group <- seq(3, length(df), by = 2)
df2 <- data.frame(id = 1:nrow(df))
for(i in start_group){
i <- i
j <- i + 1
dnames <- df[, c(i, j)] %>%
names
df_ <- data.frame(col1 = df[, i],
col2 = df[, j]) %>%
mutate(col1 = ifelse(is.na(col1), col2 %>% paste, col1 %>% paste)) %>%
mutate(col2 = ifelse(is.na(col2), col1 %>% paste, col2 %>% paste))
names(df_) <- dnames
df2 <- cbind(df2, df_)
}
df2[, -1]
number.x number.y day.x day.y school.x school.y city.x city.y
1 1 3 1 4 a a 1 1
2 2 4 3 5 b b 2 2
3 3 5 4 6 b b 3 3
4 4 6 5 7 c c 7 5
5 5 1 6 8 n m 5 5
6 6 2 7 7 f g 8 7
7 7 7 8 8 h h 7 7
8 6 6 1 1 NA NA 5 5
9 7 7 2 2 F F 6 6
10 5 5 3 3 G G 7 3
11 5 5 5 5 z H 5 4
12 6 6 3 3 h T 1 5
Consider the following base R solution. Essentially, it loops through a distinct list of column stem names (number, day, school, city) and replaces NA values in .x columns with the corresponding values in .y columns and vice versa. NOTE: The school columns require conversion from factor to character, and one of their rows has NA in both the .x and .y columns.
# CONVERT TO CHARACTER (NOTE: NA VALUE BECOME "NA" STRINGS)
df[,c('school.x', 'school.y')] <-
sapply(df[,c('school.x', 'school.y')], as.character)
# SET UP FINAL DF
finaldf <- df
# OBTAIN UNIQUE LIST OF COLUMNS STEM (W/O x AND y SUFFIXES)
distinctcols <- unique(gsub("[.][x]|[.][y]", "", names(df)[3:ncol(df)]))
# LOOP THROUGH COLUMN STEM REPLACING NA VALUES
for (col in distinctcols) {
# REPLACE NA .x COLUMN VALUES
finaldf[is.na(finaldf[paste0(col,'.x')])|finaldf[paste0(col,'.x')]=="NA",
paste0(col,'.x')] <-
finaldf[is.na(finaldf[paste0(col,'.x')])|finaldf[paste0(col,'.x')]=="NA",
paste0(col,'.y')]
# REPLACE NA .y COLUMN VALUES
finaldf[is.na(finaldf[paste0(col,'.y')])|finaldf[paste0(col,'.y')]=="NA",
paste0(col,'.y')] <-
finaldf[is.na(finaldf[paste0(col,'.y')])|finaldf[paste0(col,'.y')]=="NA",
paste0(col,'.x')]
}
OUTPUT
number.x number.y day.x day.y school.x school.y city.x city.y
1 1 3 1 4 a a 1 1
2 2 4 3 5 b b 2 2
3 3 5 4 6 b b 3 3
4 4 6 5 7 c c 7 5
5 5 1 6 8 n m 5 5
6 6 2 7 7 f g 8 7
7 7 7 8 8 h h 7 7
8 6 6 1 1 NA NA 5 5
9 7 7 2 2 F F 6 6
10 5 5 3 3 G G 7 3
11 5 5 5 5 z H 5 4
12 6 6 3 3 h T 1 5
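If only the .y columns need filling, as the question describes, a shorter base R sketch is to loop over the stems and copy values across where .y is missing. It assumes every stem has both a .x and a .y column and that missing values are real NA rather than the string "NA". The small df is made-up illustration data.

```r
df <- data.frame(number.x = c(1, NA, 5), number.y = c(3, 6, NA),
                 day.x    = c(1, NA, 3), day.y    = c(NA, 2, 3))

# Column stems without the .x / .y suffix: "number", "day"
stems <- unique(sub("[.][xy]$", "", names(df)))

for (s in stems) {
  x <- paste0(s, ".x")
  y <- paste0(s, ".y")
  missing_y <- is.na(df[[y]])            # rows where .y needs a value
  df[[y]][missing_y] <- df[[x]][missing_y]
}
```

Rows where both columns are NA simply stay NA.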

Generate combination of data frame and vector

I know expand.grid is to create all combinations of given vectors. But is there a way to generate all combinations of a data frame and a vector by taking each row in the data frame as unique. For instance,
df <- data.frame(a = 1:3, b = 5:7)
c <- 9:10
how to create a new data frame that is the combination of df and c without expanding df:
df.c:
a b c
1 5 9
2 6 9
3 7 9
1 5 10
2 6 10
3 7 10
Thanks!
As for me the simplest way is merge(df, as.data.frame(c))
a b c
1 1 5 9
2 2 6 9
3 3 7 9
4 1 5 10
5 2 6 10
6 3 7 10
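This works because merge() returns the Cartesian product when the two inputs share no columns to join on. A sketch that makes this explicit with by = NULL (the vector is named v here only to avoid masking the c() function):

```r
df <- data.frame(a = 1:3, b = 5:7)
v <- 9:10

# by = NULL asks merge() for every row of df paired with every row of the second frame
crossed <- merge(df, data.frame(c = v), by = NULL)
```

Each row of df appears once per value of v, so crossed has nrow(df) * length(v) rows.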
This may not scale when your dataframe has more than two columns per row, but you can just use expand.grid on the first column and then merge the second column in.
df <- data.frame(a = 1:3, b = 5:7)
c <- 9:10
combined <- expand.grid(a=df$a, c=c)
combined <- merge(combined, df)
> combined[order(combined$c), ]
a c b
1 1 9 5
3 2 9 6
5 3 9 7
2 1 10 5
4 2 10 6
6 3 10 7
You could also do something like this
do.call(rbind, lapply(9:10, function(x, d) data.frame(d, c = x), d = df))
# or using rbindlist as a fast alternative to do.call(rbind,list)
library(data.table)
rbindlist(lapply(9:10, function(x, d) data.frame(d, c = x), d = df))
or
rbindlist(Map(data.frame, c = 9:10, MoreArgs = list(a= 1:3,b=5:7)))
This question is really old but I found one more answer.
Use tidyr's expand_grid().
library(tidyr)
expand_grid(df, c)
# A tibble: 6 × 3
a b c
<int> <int> <int>
1 1 5 9
2 1 5 10
3 2 6 9
4 2 6 10
5 3 7 9
6 3 7 10
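For completeness, here is a base R sketch that builds the row order asked for in the question (all rows of df for the first value, then all rows for the next) by repeating row indices. The names v, idx and out are illustrative.

```r
df <- data.frame(a = 1:3, b = 5:7)
v <- 9:10

# Repeat the whole block of row indices once per value of v
idx <- rep(seq_len(nrow(df)), times = length(v))

# Pair each repeated block with one value of v
out <- cbind(df[idx, ], c = rep(v, each = nrow(df)))
rownames(out) <- NULL   # drop the duplicated-index row names
```

This treats each row of df as a unit regardless of how many columns it has.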

Resources