Subset a vector of lists in R

Let's say I have a vector of lists:
library(tidyverse)
d <- tribble(
~x,
c(10, 20, 64),
c(22, 11),
c(5, 9, 99),
c(55, 67),
c(76, 65)
)
How can I subset this vector so that, for example, I keep only the rows whose lists have a length greater than 2? Here is my unsuccessful attempt using the tidyverse:
filter(d, length(x) > 2)
# A tibble: 5 x 1
x
<list>
1 <dbl [3]>
2 <dbl [2]>
3 <dbl [3]>
4 <dbl [2]>
5 <dbl [2]>

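Use lengths() rather than length(), because x is a list column: length(x) returns the number of elements in the list (here 5, one per row), while lengths(x) returns the length of each element.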
library(dplyr)
d %>%
filter(lengths(x) > 2)

You can use subset() + lengths()
subset(d,lengths(x)>2)
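To see why the original attempt kept every row, it can help to compare length() and lengths() directly. The following is a minimal base R sketch that uses a plain list with the same values as the x column; the names x_list, n_rows, elem_lengths and long_ones are only illustrative.

```r
# Same values as the list column d$x, stored as a plain list
x_list <- list(c(10, 20, 64), c(22, 11), c(5, 9, 99), c(55, 67), c(76, 65))

# length() counts the elements of the list itself (one per row)
n_rows <- length(x_list)

# lengths() returns the length of each individual element
elem_lengths <- lengths(x_list)

# One logical value per element, so it can be used for subsetting
long_ones <- x_list[elem_lengths > 2]
```

Inside filter(), length(x) > 2 evaluates to a single TRUE (5 > 2), which is recycled to every row, so nothing is dropped.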


dplyr mutate based on columns condition and an external vector

I am trying to add a list column to a tibble data frame. The resulting list column is calculated from two columns contained in the data frame and a vector which is external / independent.
Suppose that the data frame and the vector are the following:
library(dplyr)
library(magrittr)
dat <- tibble(A = c(12, 27, 22, 1, 15, 30, 20, 28, 19),
B = c(68, 46, 69, 7, 44, 76, 72, 50, 51))
vec <- c(12, 25, 28, 58, 98)
Now, I would like to add (mutate) the column y so that for each row y is a list containing the elements of vec between A and B (inclusive).
The not-so-proper way to do this would be with a loop. I initialize the column y as a list and update it row by row based on the condition A <= vec & vec <= B:
dat %<>%
mutate(y = list(vec))
for (i in 1:nrow(dat)){
dat[i,]$y[[1]] <- (vec[dat[i,]$A <= vec & vec <= dat[i,]$B])
}
The result is a data frame with y being a list of dbl of variable length:
> dat
# A tibble: 9 x 3
A B y
<dbl> <dbl> <list>
1 12 68 <dbl [4]>
2 27 46 <dbl [1]>
3 22 69 <dbl [3]>
4 1 7 <dbl [0]>
5 15 44 <dbl [2]>
6 30 76 <dbl [1]>
7 20 72 <dbl [3]>
8 28 50 <dbl [1]>
9 19 51 <dbl [2]>
The first four values of y are:
[[1]]
[1] 12 25 28 58
[[2]]
[1] 28
[[3]]
[1] 25 28 58
[[4]]
numeric(0)
Note: the 4-th list is empty, because no value of vec is between A=1 and B=7.
I have tried as an intermediate step with getting the subscripts via which using mutate(y = list(which(A <= vec & vec <= B))) or with a combination of seq and %in%, for instance mutate(y = list(vec %in% seq(A, B))). These both give an error. However, I don't need the subscripts, I need a subset of vec.
Create a small helper function with the logic that you want to implement.
return_values_in_between <- function(vec, A, B) {
vec[A <= vec & vec <= B]
}
and call the function for each row (using rowwise) -
library(dplyr)
result <- dat %>%
rowwise() %>%
mutate(y = list(return_values_in_between(vec, A, B))) %>%
ungroup()
result
# A tibble: 9 × 3
# A B y
# <dbl> <dbl> <list>
#1 12 68 <dbl [4]>
#2 27 46 <dbl [1]>
#3 22 69 <dbl [3]>
#4 1 7 <dbl [0]>
#5 15 44 <dbl [2]>
#6 30 76 <dbl [1]>
#7 20 72 <dbl [3]>
#8 28 50 <dbl [1]>
#9 19 51 <dbl [2]>
Checking the first 4 values in result$y -
result$y
#[[1]]
#[1] 12 25 28 58
#[[2]]
#[1] 28
#[[3]]
#[1] 25 28 58
#[[4]]
#numeric(0)
#...
#...
With the help of @Ronak Shah, I was able to come up with a solution that doesn't require a dedicated function and also makes sure that vec is pulled from the global environment (in case there is a column named vec in the data frame):
library(tidyverse)
dat |>
rowwise() |>
mutate(y = list(.GlobalEnv$vec[.GlobalEnv$vec >= A & .GlobalEnv$vec <= B])) |>
ungroup()
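The same row-by-row logic can also be written without rowwise(), by iterating over A and B in parallel. The following is a sketch in base R using Map(), assuming dat is an ordinary data frame with the values from the question; it is an alternative formulation, not the method used in the answers above.

```r
# Data from the question, as a plain data frame
dat <- data.frame(A = c(12, 27, 22, 1, 15, 30, 20, 28, 19),
                  B = c(68, 46, 69, 7, 44, 76, 72, 50, 51))
vec <- c(12, 25, 28, 58, 98)

# For each (A, B) pair, keep the elements of vec inside [A, B]
dat$y <- Map(function(a, b) vec[a <= vec & vec <= b], dat$A, dat$B)

# Number of matches per row
n_matches <- lengths(dat$y)
```

With dplyr and purrr loaded, the equivalent should be mutate(y = map2(A, B, ~ vec[.x <= vec & vec <= .y])).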

Concatenate/merge dataframes in R into vector type cells

I would like to merge two dataframe into one, each cell becoming a vector or a list.
Columns have the same name in both dataframes. Some columns are made of numerical values that I want to keep as numerical values in the merged dataframe. Some columns are made of characters.
For example I would like from these two dataframes:
DF1 <- data.frame(
xx = c(1:5),
yy = c(2:6),
zz = c("a","b","c","d","e"))
DF2 <- data.frame(
xx = c(3:7),
yy = c(5:9),
zz = c("a","i","h","g","f"))
Which look like this:
DF1
xx  yy  zz
1   2   a
2   3   b
3   4   c
4   5   d
5   6   e
DF2
xx  yy  zz
3   5   a
4   6   i
5   7   h
6   8   g
7   9   f
To get a dataframe looking like this:
xx      yy      zz
c(1,3)  c(2,5)  c(a,a)
c(2,4)  c(3,6)  c(b,i)
c(3,5)  c(4,7)  c(c,h)
c(4,6)  c(5,8)  c(d,g)
c(5,7)  c(6,9)  c(e,f)
I have tried with paste() or str_c() but it always transforms my numerical values into char and it does not create a list or a vector like I want.
Do you know of any functions that could help me do that?
Using some tidyverse, you can invert the lists and then build it all back together.
library(purrr)
library(dplyr)
as_tibble(map2(DF1, DF2, ~ map(transpose(list(.x, .y)), unlist)))
This gets you your data frame of vectors.
# A tibble: 5 x 3
xx yy zz
<list> <list> <list>
1 <int [2]> <int [2]> <chr [2]>
2 <int [2]> <int [2]> <chr [2]>
3 <int [2]> <int [2]> <chr [2]>
4 <int [2]> <int [2]> <chr [2]>
5 <int [2]> <int [2]> <chr [2]>
Breaking this down...
1. transpose(list(.x, .y)) flips a paired list of columns inside-out, from a list of two vectors to a list of 5 elements (one for each row, each with two list elements in it).
2. map(transpose(list(.x, .y)), unlist) iterates over each of the 5 lists and unlists them from a list of 2 back to a vector of 2.
3. map2(DF1, DF2, ~ map(transpose(list(.x, .y)), unlist)) iterates over each column pair from DF1 and DF2 (xx, yy, zz), doing steps 1 and 2.
4. as_tibble(map2(DF1, DF2, ~ map(transpose(list(.x, .y)), unlist))) converts the resulting list to a tibble (basically a data.frame).
Another thing you can do is stack the data and then nest() it. You again need a few steps to do it. This would scale better because you could do this with more than 2 data frames.
library(dplyr)
library(tibble)
library(tidyr)
bind_rows(rowid_to_column(DF1),
rowid_to_column(DF2)) %>%
group_by(rowid) %>%
nest(nest_data = -rowid) %>%
unnest_wider(nest_data) %>%
ungroup() %>%
select(-rowid)
This also gets you your data frame of vectors.
# A tibble: 5 x 3
xx yy zz
<list> <list> <list>
1 <int [2]> <int [2]> <chr [2]>
2 <int [2]> <int [2]> <chr [2]>
3 <int [2]> <int [2]> <chr [2]>
4 <int [2]> <int [2]> <chr [2]>
5 <int [2]> <int [2]> <chr [2]>
This gives you matrices in a list:
res <- setNames(
lapply( colnames(DF1), function(x) cbind(DF1[[x]], DF2[[x]]) ),
colnames(DF1) )
To convert the result into a data frame you can use this:
data.frame( sapply(
names(res), function(x){ sapply(
1:nrow(res$xx), function(y){ list(res[[x]][y,1:ncol(res$xx)]) }
) }
) )
xx yy zz
1 1, 3 2, 5 a, a
2 2, 4 3, 6 b, i
3 3, 5 4, 7 c, h
4 4, 6 5, 8 d, g
5 5, 7 6, 9 e, f
Put together in a function:
EDIT: Added functionality to apply any number of DFs
(against what the question demands, but seemed to be necessary)
morph <- function(...){
abc <- list(...)
res <- sapply( colnames(abc[[1]]), function(col) list(
sapply( abc, function(dfr) dfr[[col]] ) ) )
data.frame( sapply(
names(res), function(x){ sapply(
1:nrow(res[[1]]), function(y){ list(res[[x]][y,1:ncol(res[[1]])]) }
) }
) )
}
morph(DF1, DF2, DF2)
xx yy zz
1 1, 3, 3 2, 5, 5 a, a, a
2 2, 4, 4 3, 6, 6 b, i, i
3 3, 5, 5 4, 7, 7 c, h, h
4 4, 6, 6 5, 8, 8 d, g, g
5 5, 7, 7 6, 9, 9 e, f, f
As your data consists of different types, there is no straightforward answer. However, I put together a solution that might do the trick by creating a nested list. Let me know if this is what you need:
library(BBmisc)
library(dplyr)
colvec <- c("xx2","yy2","zz2")
colnames(DF2) <- colvec
DF <- bind_cols(DF1,DF2)
cols.num <- c("xx","xx2","yy","yy2")
DF[cols.num] <- sapply(DF[cols.num],as.character)
DF <- DF[,c(1,4,2,5,3,6)]
xx <- convertRowsToList(DF[,1:2])
yy <- convertRowsToList(DF[,3:4])
zz <- convertRowsToList(DF[,5:6])
final_list <- list(xx,yy,zz)
Try the following base R option
> data.frame(Map(function(x, y) asplit(cbind(x, y), 1), DF1, DF2))
xx yy zz
1 1, 3 2, 5 a, a
2 2, 4 3, 6 b, i
3 3, 5 4, 7 c, h
4 4, 6 5, 8 d, g
5 5, 7 6, 9 e, f
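Since the question asks for numeric columns to stay numeric, here is a small base R sketch that pairs the values element by element with Map(c, ...) and then checks the resulting types. The names merged and out are illustrative, and stringsAsFactors = FALSE is set explicitly on the assumption that zz should stay character on older R versions.

```r
DF1 <- data.frame(xx = 1:5, yy = 2:6, zz = c("a", "b", "c", "d", "e"),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(xx = 3:7, yy = 5:9, zz = c("a", "i", "h", "g", "f"),
                  stringsAsFactors = FALSE)

# Outer Map() walks over matching columns; inner Map() pairs the values row by row.
# unname() drops the names that Map() adds when the first argument is character.
merged <- Map(function(x, y) unname(Map(c, x, y)), DF1, DF2)

# Wrap each list in I() so data.frame() keeps it as a single list column
out <- as.data.frame(lapply(merged, I))

# Integer columns stay integer, character columns stay character
xx_type <- class(merged$xx[[1]])
zz_type <- class(merged$zz[[1]])
```

Each cell of out is then a length-2 vector, for example out$xx[[1]] holds 1 and 3.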

Unnesting a combination variable (combn) as a vector

With the following code, I manage to get a fine combination :
tibble(
x = list(c(1, 2, 3), c(4,5,6))
) %>%
mutate(
combination =
x %>%
map(
.f = combn
, 2
) %>%
map(.f = t)
) %>%
unnest(combination)
# A tibble: 6 x 2
x combination[,1] [,2]
<list> <dbl> <dbl>
1 <dbl [3]> 1 2
2 <dbl [3]> 1 3
3 <dbl [3]> 2 3
4 <dbl [3]> 4 5
5 <dbl [3]> 4 6
6 <dbl [3]> 5 6
However, when viewed with the View() function, I get: (screenshot not included)
How can I get combination displayed as a vector, i.e.: (screenshot not included)
We can specify the simplify = FALSE in combn to return a list instead of coercing to matrix
library(purrr)
library(dplyr)
library(tidyr)
tbl1 <- tibble(
x = list(c(1, 2, 3), c(4,5,6))
) %>%
mutate(
combination =
x %>%
map(
.f = combn
, 2, simplify = FALSE
))
Now, do the unnest
out <- tbl1 %>%
unnest(combination)
out
# A tibble: 6 x 2
# x combination
# <list> <list>
#1 <dbl [3]> <dbl [2]>
#2 <dbl [3]> <dbl [2]>
#3 <dbl [3]> <dbl [2]>
#4 <dbl [3]> <dbl [2]>
#5 <dbl [3]> <dbl [2]>
#6 <dbl [3]> <dbl [2]>
check the View
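The difference between the two forms of combn() is easy to see on a single vector. This is a minimal base R sketch; the names as_matrix and as_list are illustrative only.

```r
v <- c(1, 2, 3)

# Default: the combinations are simplified to a matrix, one combination per column
as_matrix <- combn(v, 2)

# simplify = FALSE: a list with one length-2 vector per combination
as_list <- combn(v, 2, simplify = FALSE)
```

Because as_list is a list of vectors, unnest() produces one row per combination with a list column, rather than a matrix column.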
Here is a data.table option that might help (df is assumed to be the data frame from the question with the list column x):
library(data.table)
library(tidyr)
unnest(setDT(df)[, combination := lapply(x, function(v) combn(v, 2, simplify = FALSE))], combination)

Remove duplicates from lists within a vector in R

I have a vector of lists like the following sample:
library(tidyverse)
z <- tribble(
~x,
c(10, 10, 64),
c(22, 22),
c(5, 9, 9),
c(55, 55),
c(76, 65)
)
I'm trying to reduce each list to include only cases with unique values. Here's the output I'm looking for:
y <- tribble(
~x,
c(10, 64),
c(22),
c(5, 9),
c(55),
c(76, 65)
)
Of course I can't post the actual output and have to write it out as a new data set for this example because it looks like this otherwise:
# A tibble: 5 x 1
x
<list>
1 <dbl [3]>
2 <dbl [2]>
3 <dbl [3]>
4 <dbl [2]>
5 <dbl [2]>
We can loop over the list with map and apply unique
library(dplyr)
library(purrr)
z %>%
mutate(x = map(x, unique))
In base R, it would be
z$x <- lapply(z$x, unique)
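Because printing the tibble only shows the type and length of each element, it may be useful to inspect the values directly. The following is a base R sketch on a plain list with the same values as z$x; the name deduped is illustrative.

```r
# Same values as the list column z$x
x <- list(c(10, 10, 64), c(22, 22), c(5, 9, 9), c(55, 55), c(76, 65))

# Apply unique() to each element; order of first appearance is kept
deduped <- lapply(x, unique)

# str() prints the actual values instead of a summary such as <dbl [2]>
str(deduped)
```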

R: Join two tables (tibbles) by *list* columns

Seems like there should be a simple answer for this but I haven't been able to find one:
tib1 <- tibble(x = list(1, 2, 3), y = list(4, 5, 6))
tib1
# A tibble: 3 × 2
x y
<list> <list>
1 <dbl [1]> <dbl [1]>
2 <dbl [1]> <dbl [1]>
3 <dbl [1]> <dbl [1]>
tib2 <- tibble(x = list(1, 2, 4, 5), y = list(4, c(5, 10), 6, 7))
tib2
# A tibble: 4 × 2
x y
<list> <list>
1 <dbl [1]> <dbl [1]>
2 <dbl [1]> <dbl [2]>
3 <dbl [1]> <dbl [1]>
4 <dbl [1]> <dbl [1]>
dplyr::inner_join(tib1, tib2)
Joining, by = c("x", "y")
Error in inner_join_impl(x, y, by$x, by$y, suffix$x, suffix$y) :
Can't join on 'x' x 'x' because of incompatible types (list / list)
So is there a way to perform a join based on list columns (before I start writing my own)?
Basically if the list of both key variables is identical, I want the row to be included in the final table, and if not - not. In the above example there are two key variables x and y and the result should be only the first row in the two tibbles since it's the only identical one in both key variables:
tibble(x = list(1), y = list(4))
# A tibble: 1 × 2
x y
<list> <list>
1 <dbl [1]> <dbl [1]>
We could use hashes from digest:
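library(dplyr)
library(purrr)
tib1 <- tibble(x = list(1, 2, 3), y = list(4, 5, 6))
tib2 <- tibble(x = list(1, 2, 4, 5), y = list(4, c(5, 10), 6, 7))
# Add one hash column per list column (x_hash, y_hash).
# Note: funs() is deprecated in recent dplyr versions; across() is its replacement.
tib1 <- mutate_all(tib1, funs(hash = map_chr(., digest::digest)))
tib2 <- mutate_all(tib2, funs(hash = map_chr(., digest::digest)))
# Join on the hashes; x.x is x from tib1 and x.y is x from tib2
inner_join(tib1, tib2, c('x_hash', 'y_hash')) %>%
select(x.x, x.y)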
# A tibble: 1 × 2
x.x x.y
<list> <list>
1 <dbl [1]> <dbl [1]>
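If adding the digest dependency is not wanted, the same idea of turning each list element into a comparable key can be sketched in base R with deparse(). This is a different technique from the hashing answer above, and the helper name key is hypothetical. One assumption to keep in mind: deparse() writes doubles with a limited number of significant digits, so values that differ only beyond that precision would get the same key.

```r
# Key columns from the question, as plain lists
x1 <- list(1, 2, 3);    y1 <- list(4, 5, 6)
x2 <- list(1, 2, 4, 5); y2 <- list(4, c(5, 10), 6, 7)

# Hypothetical helper: one text key per list element
key <- function(col) {
  vapply(col, function(el) paste(deparse(el), collapse = ""), character(1))
}

# Combine the per-column keys into one key per row
keys1 <- paste(key(x1), key(y1), sep = "|")
keys2 <- paste(key(x2), key(y2), sep = "|")

# Rows of the first table whose key also occurs in the second
keep <- keys1 %in% keys2
```

Here only the first row matches, which agrees with the expected result in the question.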
