Calling a list index within a loop using names() - r

I have a list aa in which each item holds the name of an element of another list bb plus one other value (call it cm). The elements of bb are character vectors. I have a loop that goes through bb and, for every element that contains one of the strings I've specified, adds it as a new row in a dataframe. What I need is to also add the corresponding cm value to that row.
Example:
library("tidyverse")
aa <- list(c(123, 1), c(234, 1), c(345, 2), c(456, 3))
bb <- list("123" = c("a", "b", "c"), "234" = c("b", "c", "d"), "345" = c("c", "d", "e"), "456" = c("f", "g", "h"))
cc <- c("a", "b", "c")
tbl <- NULL
for (a in aa) {
  for (b in bb) {
    if (any(cc %in% b)) {
      tb <- tibble(cm = a[2], n1 = b[1], n2 = b[2], n3 = b[3])
      tbl <- bind_rows(tbl, tb)
    }
  }
}
This iterates over every combination of aa and bb and pairs each matching bb element with every cm, which is no good. My output should look something like this:
output <- tibble(cm = c(1, 1, 2), n1 = c("a", "b", "c"),
n2 = c("b", "c", "d"), n3 = c("c", "d", "e"))
> output
# A tibble: 3 x 4
cm n1 n2 n3
<dbl> <chr> <chr> <chr>
1 1 a b c
2 1 b c d
3 2 c d e
I thought maybe something like this would work, as at least then I could loop through tbl later and use nm to replace it with the appropriate cm values:
tbl <- NULL
for (a in aa) {
  for (b in bb) {
    if (any(cc %in% b)) {
      tb <- tibble(nm = names(bb)[b], n1 = b[1], n2 = b[2], n3 = b[3])
      tbl <- bind_rows(tbl, tb)
    }
  }
}
I don't really understand why this doesn't work, because names(bb)[1] returns 123 so I figured it would work the same in a loop with names(bb)[b].

If you're happy with a base R solution without the explicit loops, would this work?
# generate data
aa <- list(c(123, 1), c(234, 1), c(345, 2), c(456, 3))
# cm is an element of bb
bb <- list("123" = c("a", "b", "c"), "234" = c("b", "c", "d"),
"345" = c("c", "d", "e"), "456" = c("f", "g", "h"),
cm = c(1, 1, 2))
cc <- c("a", "b", "c")
tbl <- data.frame(
  bb[["cm"]],
  # apply to each element of aa
  do.call(rbind, lapply(aa, function(x, y, c) { # function takes 3 args
    # names of bb that also appear in this element of aa
    names_y <- as.character(intersect(x, names(y)))
    # turn subset of bb into data.frame
    out <- as.data.frame(do.call(rbind, y[names_y]))
    # keep rows in which any element is %in% cc (the comma selects rows, not columns)
    out <- out[apply(out, 1, function(x, c) any(x %in% c), c), , drop = FALSE]
    return(out)
  }, bb, cc))) # pass bb and cc as args to the function in lapply()
names(tbl) <- c("cm", paste0("n", 1:(ncol(tbl) - 1)))
gives
> tbl
cm n1 n2 n3
123 1 a b c
234 1 b c d
345 2 c d e
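As for why names(bb)[b] does not work in the question: inside for (b in bb), b is the element itself (a character vector such as c("a", "b", "c")), not its position, so names(bb)[b] indexes by "a", "b", "c" and returns NA. A minimal loop-based sketch in base R, assuming the first value of each aa item is always the name of the bb element to look up, is to iterate over aa only and fetch the matching bb element by name:

```r
aa <- list(c(123, 1), c(234, 1), c(345, 2), c(456, 3))
bb <- list("123" = c("a", "b", "c"), "234" = c("b", "c", "d"),
           "345" = c("c", "d", "e"), "456" = c("f", "g", "h"))
cc <- c("a", "b", "c")

rows <- list()
for (a in aa) {
  key <- as.character(a[1])  # e.g. 123 -> "123"
  b <- bb[[key]]             # NULL if bb has no element with that name
  if (!is.null(b) && any(cc %in% b)) {
    # one row per matching bb element, carrying its own cm value
    rows[[key]] <- data.frame(cm = a[2], n1 = b[1], n2 = b[2], n3 = b[3],
                              stringsAsFactors = FALSE)
  }
}
tbl <- do.call(rbind, rows)
tbl
```

If a tibble is needed, the result can be passed through tibble::as_tibble() afterwards.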

Related

How to join same named sublist elements and add an ID variable

Suppose we have a list like this:
l <- list()
l[[1]] <- list()
l[[2]] <- list()
l[[3]] <- list()
names(l) <- c("A", "B", "C")
l[[1]][[1]] <- data.frame(6)
l[[1]][[2]] <- data.frame(3)
l[[1]][[3]] <- data.frame(8)
l[[1]][[4]] <- data.frame(7)
l[[2]][[1]] <- data.frame(5)
l[[2]][[2]] <- data.frame(4)
l[[2]][[3]] <- data.frame(7)
l[[2]][[4]] <- data.frame(9)
l[[3]][[1]] <- data.frame(1)
l[[3]][[2]] <- data.frame(6)
l[[3]][[3]] <- data.frame(2)
l[[3]][[4]] <- data.frame(8)
names(l[[1]]) <- c("aa", "bb", "cc", "dd")
names(l[[2]]) <- c("aa", "bb", "cc", "dd")
names(l[[3]]) <- c("aa", "bb", "cc", "dd")
I want to create a list l2 which contains 4 elements: aa, bb, cc and dd. Each of these elements would be the dataframe which would contain the values of aa, bb, cc and dd from list l and also an ID variable which would indicate if the element came from the A, B or C element of list l. So if we built the end result from scratch, it would look like this:
l2 <- list()
l2[[1]] <- data.frame(Value = c(6, 5, 1), ID = c("A", "B", "C"))
l2[[2]] <- data.frame(Value = c(3, 4, 6), ID = c("A", "B", "C"))
l2[[3]] <- data.frame(Value = c(8, 7, 2), ID = c("A", "B", "C"))
l2[[4]] <- data.frame(Value = c(7, 9, 8), ID = c("A", "B", "C"))
names(l2) <- c("aa", "bb", "cc", "dd")
However, I cannot build it from scratch, but instead I must "reshape" l to l2. What is the best way to do this? Preferred solution is in purrr.
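The key is transpose(), which turns the list inside out so that the top level is aa, bb, cc, dd and each of those holds the A, B and C entries. Setting .id = "ID" in the inner map_dfr() then creates a new column ID recording the name of the sub-list each value comes from when the elements are row-bound together.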
library(purrr)
l %>%
transpose() %>%
map(~ map_dfr(.x, set_names, "Value", .id = "ID"))
Output
$aa
ID Value
1 A 6
2 B 5
3 C 1
$bb
ID Value
1 A 3
2 B 4
3 C 6
$cc
ID Value
1 A 8
2 B 7
3 C 2
$dd
ID Value
1 A 7
2 B 9
3 C 8
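If purrr is not available, the same reshape can be sketched in base R. This is only an illustration, assuming every sub-list of l has the same names and each data frame holds a single numeric value, as in the question:

```r
# Rebuild the example list from the question in compact form
l <- list(
  A = list(aa = data.frame(6), bb = data.frame(3), cc = data.frame(8), dd = data.frame(7)),
  B = list(aa = data.frame(5), bb = data.frame(4), cc = data.frame(7), dd = data.frame(9)),
  C = list(aa = data.frame(1), bb = data.frame(6), cc = data.frame(2), dd = data.frame(8))
)

inner_names <- names(l[[1]])

# For each inner name, pull that entry out of A, B and C and label it with the outer name
l2 <- lapply(setNames(inner_names, inner_names), function(nm) {
  data.frame(
    Value = vapply(l, function(sub) sub[[nm]][[1]], numeric(1), USE.NAMES = FALSE),
    ID = names(l),
    stringsAsFactors = FALSE
  )
})
l2$aa
```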

R script to generate all combinatorics of two identical lists including incomplete lists

I think this problem can be solved in many different ways, but I basically want to find a function that will give me a dataframe with every combination of values from a list into its columns, including the incomplete sets and excluding some, but not all, redundant combinations (order isn't important for now).
So I might start out with a list like this:
List = c("A","B","C")
and I want to get a dataframe that looks like
C1 = c("A","B","C","A","A","B","A")
C2 = c("","","","B","C","C","B")
C3 = c("","","","","","","C")
df <- cbind(C1, C2, C3)
row.names(df) <- c("A", "B", "C", "AB", "AC", "BC", "ABC")
colnames(df) <- c("First_Item", "Second_Item","Third_Item")
And then it fills in each cell with the corresponding letter.
e.g. position A1 in the df would be "A", positions A2 and A3 would be empty.
Any idea how to do this?
I tried with tidyr:
library(tidyr)
list_1 = c("A", "B", "C", "NA")
list_2 = c("A", "B", "C", "NA")
list_3 = c("A", "B", "C", "NA")
list_4 = c("A", "B", "C", "NA")
test <- crossing(list_1, list_2,list_3,list_4)
test <- test[apply(test, MARGIN = 1, FUN = function(x) !(duplicated(x) | !any = "NA")),]
But I want to keep all the values with multiple NAs in them, so this doesn't quite work.
expand.grid has the same problem
expand.grid(list_1 = c("A", "B", "C", "NA"),list_2 = c("A", "B", "C", "NA"),list_3 = c("A", "B", "C", "NA"),list_4 = c("A", "B", "C", "NA"))
That's basically Roland's answer:
library(magrittr) # just for the pipe-operator
List %>%
seq_along() %>%
lapply(combn, x = List, simplify = FALSE) %>%
unlist(recursive = FALSE) %>%
sapply(`length<-`, length(List)) %>%
t() %>%
data.frame()
returns
X1 X2 X3
1 A <NA> <NA>
2 B <NA> <NA>
3 C <NA> <NA>
4 A B <NA>
5 A C <NA>
6 B C <NA>
7 A B C
Furthermore, you could use the dplyr and tidyr packages to replace the NAs. Just add one more function to the pipe:
mutate(across(everything(), ~ replace_na(.x, "")))
Here is my approach:
library(purrr)
List <- c("xA","xB","xC") # arbitrary as per request in comments
seq_along(List) %>% # h/t @MartinGal
map(~ combn(List, m = .x) %>%
apply(2, paste, collapse = "<!>")) %>%
unlist() %>%
tibble::tibble() %>%
tidyr::separate(1, into = c("First_Item", "Second_Item", "Third_Item"),
sep = "<!>")
Returns:
# A tibble: 7 x 3
First_Item Second_Item Third_Item
<chr> <chr> <chr>
1 xA NA NA
2 xB NA NA
3 xC NA NA
4 xA xB NA
5 xA xC NA
6 xB xC NA
7 xA xB xC
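For the exact layout asked for in the question (empty strings instead of NA, and row names such as "AB"), a base R sketch along the same combn() lines might look like this. The padding step and the hard-coded three column names are assumptions that fit a three-element List only:

```r
List <- c("A", "B", "C")

# All combinations of size 1, 2, ..., length(List), as a flat list of character vectors
combos <- unlist(
  lapply(seq_along(List), function(m) combn(List, m, simplify = FALSE)),
  recursive = FALSE
)

# Pad every combination with "" up to length(List), one combination per row
padded <- t(vapply(
  combos,
  function(x) c(x, rep("", length(List) - length(x))),
  character(length(List))
))

rownames(padded) <- vapply(combos, paste, character(1), collapse = "")
colnames(padded) <- c("First_Item", "Second_Item", "Third_Item")
df <- as.data.frame(padded, stringsAsFactors = FALSE)
df
```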

Filter rows which has at least two of particular values

I have a data frame like this.
df
Languages Order Machine Company
[1] W,X,Y,Z,H,I D D B
[2] W,X B A G
[3] W,I E B A
[4] H,I B C B
[5] W G G C
I want to get the number of rows where Languages contains at least 2 of the 3 values W, H, I.
The result should be 3, because rows 1, 3 and 4 each contain at least 2 of the 3 values W, H, I.
You can use strsplit on df$Languages and take the intersect of each result with W, H, I. Then get the lengths of these intersections and count how many are greater than 1.
sum(lengths(sapply(strsplit(df$Languages, ",", TRUE), intersect, c("W","H","I"))) > 1)
#[1] 3
You can use:
sum(sapply(strsplit(df$Languages, ','), function(x)
sum(c("W","H","I") %in% x) >= 2))
#[1] 3
data
df<- structure(list(Languages = c("W,X,Y,Z,H,I", "W,X", "W,I", "H,I",
"W"), Order = c("D", "B", "E", "B", "G"), Machine = c("D", "A",
"B", "C", "G"), Company = c("B", "G", "A", "B", "C")),
class = "data.frame", row.names = c(NA, -5L))
A tidyverse approach, which returns the matching rows rather than their count:
library(tidyverse)
df %>% filter(map_int(str_split(Languages, ','), ~ sum(.x %in% c('W', 'H', 'I'))) >= 2)
Languages Order Machine Company
1 W,X,Y,Z,H,I D D B
2 W,I E B A
3 H,I B C B
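To see what the answers above are doing step by step, here is a small base R sketch that keeps the intermediate per-row counts. The names wanted, tokens and n_hits are made up for illustration, and only the Languages column is rebuilt:

```r
df <- data.frame(
  Languages = c("W,X,Y,Z,H,I", "W,X", "W,I", "H,I", "W"),
  stringsAsFactors = FALSE
)
wanted <- c("W", "H", "I")

# Split each row's string into its individual values
tokens <- strsplit(df$Languages, ",", fixed = TRUE)

# For each row, count how many of the wanted values are present
n_hits <- vapply(tokens, function(x) sum(wanted %in% x), integer(1))
n_hits

# Number of rows with at least two of the wanted values
sum(n_hits >= 2)

# The rows themselves
df[n_hits >= 2, , drop = FALSE]
```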

Find first match to column 1 in columns 2:4 - R

Comparing "x1", "x2", and "x3" to "target", how do I return the index of the first column that matches "target"? An NA can result for no match.
pop <- c("A", "B", "C", "D")
target <- pop
x1 <- sample(pop)
x2 <- sample(pop)
x3 <- sample(pop)
df <- data.frame(target,x1,x2,x3)
> df
target x1 x2 x3
1 A B B D
2 B D C C
3 C C A A
4 D A D B
I have tried using something along the lines of:
min(which(df[3, 1] == df[3, 2:ncol(df)]))
...(row 3 being used as an example), but I don't know how to gracefully handle cases where there is no match, which is probably why I am having trouble using this in a function with apply(). The goal is either a new column on df or a vector of the returned values.
Thanks!
Here's a solution using match -
> df
target x1 x2 x3
1 A C A C
2 B A B B
3 C D D D
4 D B C A
apply(df, 1, function(x) match(TRUE, x[-1] == x[1]))
[1] 2 2 NA NA
Data -
df <- structure(list(target = c("A", "B", "C", "D"), x1 = c("C", "A",
"D", "B"), x2 = c("A", "B", "D", "C"), x3 = c("C", "B", "D",
"A")), .Names = c("target", "x1", "x2", "x3"), row.names = c(NA,
-4L), class = "data.frame")
There are many ways to do this. Loop through the columns 2:4, compare with the target and get the index of first match with which
sapply(df[-1], function(x) which(x == df$target)[1])
x1 x2 x3
#1 3 NA
If it is for comparing the rows
m1 <- df$target == df[-1]
max.col(m1, 'first') * NA^!rowSums(m1)
Or
apply(m1, 1, function(x) which(x)[1])
data
df <- data.frame(target,x1,x2,x3, stringsAsFactors = FALSE)
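Putting the row-wise variant together as a new column: the sketch below uses the fixed data from the match answer above (not the random sample) and relies on match(TRUE, ...) returning NA when a row has no match, which is the case the question struggled with. The column name first_match is made up for illustration:

```r
df <- data.frame(
  target = c("A", "B", "C", "D"),
  x1 = c("C", "A", "D", "B"),
  x2 = c("A", "B", "D", "C"),
  x3 = c("C", "B", "D", "A"),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where a cell equals the target of its row.
# The target vector is recycled down each column, so it lines up with the rows.
hits <- as.matrix(df[-1]) == df$target

# Position of the first TRUE in each row; NA when the row has none
df$first_match <- apply(hits, 1, function(row) match(TRUE, row))
df
```

The index counts from x1, so 2 means the first match was found in x2.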

How to assign a value to a data.frame filtered by dplyr?

I am trying to modify a data.frame filtered by dplyr but I don't quite seem to grasp what I need to do. In the following example, I am trying to filter the data frame z and then assign a new value to the third column -- I give two examples, one with "9" and one with "NA".
require(dplyr)
z <- data.frame(w = c("a", "a", "a", "b", "c"), x = 1:5, y = c("a", "b", "c", "d", "e"))
z %>% filter(w == "a" & x == 2) %>% select(y)
z %>% filter(w == "a" & x == 2) %>% select(y) <- 9 # Should be similar to z[z$w == "a" & z$x == 2, 3] <- 9
z %>% filter(w == "a" & x == 3) %>% select(y) <- NA # Should be similar to z[z$w == "a" & z$x == 3, 3] <- NA
Yet, it doesn't work: I get the following error message:
Error in z %>% filter(w == "a" & x == 3) %>% select(y) <- NA : could not find function "%>%<-"
I know that I can use the old data.frame notation, but what would be the solution for dplyr?
Thanks!
Filtering will subset the data frame. If you want to keep the whole data frame, but modify part of it, you can, for example use mutate with ifelse. I've added stringsAsFactors=FALSE to your sample data so that y will be a character column.
z <- data.frame(w = c("a", "a", "a", "b", "c"), x = 1:5, y = c("a", "b", "c", "d", "e"),
stringsAsFactors=FALSE)
z %>% mutate(y = ifelse(w=="a" & x==2, 9, y))
w x y
1 a 1 a
2 a 2 9
3 a 3 c
4 b 4 d
5 c 5 e
Or with replace:
z %>% mutate(y = replace(y, w=="a" & x==2, 9),
y = replace(y, w=="a" & x==3, NA))
w x y
1 a 1 a
2 a 2 9
3 a 3 <NA>
4 b 4 d
5 c 5 e
It is my impression that the dplyr package is philosophically opposed to modifying your underlying data. You might find the data.table package friendlier for this operation:
library(data.table)
z <- data.table(w = c("a", "a", "a", "b", "c"), x = 1:5, y = c("a", "b", "c", "d", "e"))
m <- data.table(w = c("a","a"), x = c(2,3), new_y = c("9", NA))
z[m, y := new_y, on=c("w","x")]
w x y
1: a 1 a
2: a 2 9
3: a 3 NA
4: b 4 d
5: c 5 e
I'm sure there's a way in base R as well, but I don't know it. In particular, I can't get merge or match to do the job.
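For what it's worth, the base R route is the logical-index assignment already hinted at in the comments of the question. A minimal sketch, with stringsAsFactors = FALSE so that y is a character column and the value 9 written as the string "9":

```r
z <- data.frame(w = c("a", "a", "a", "b", "c"), x = 1:5,
                y = c("a", "b", "c", "d", "e"),
                stringsAsFactors = FALSE)

# Assign into the subset of y selected by the condition; the rest of z is untouched
z$y[z$w == "a" & z$x == 2] <- "9"
z$y[z$w == "a" & z$x == 3] <- NA
z
```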
