Find indices of vector elements in another vector - r

This extends a previous question I asked.
I have 2 vectors:
a <- c("a","b","c","d","e","d,e","f")
b <- c("a","b","c","d,e","f")
I created b from a by eliminating elements of a that are contained in other, comma separated, elements in a (e.g., "d" and "e" in a are contained in "d,e" and therefore only "d,e" is represented in b).
I am looking for an efficient way to map between indices of the elements of a and b.
Specifically, I would like to have a list of the length of b where each element is a vector with the indices of the elements in a that map to that b element.
For this example the output should be:
list(1, 2, 3, c(4,5,6), 7)

Modifying slightly from my answer at your previous question, try:
a <- c("a","b","c","d","e","d,e","f")
b <- c("a","b","c","d,e","f")
B <- setNames(lapply(b, gsub, pattern = ",", replacement = "|"), seq_along(b))
lapply(B, function(x) which(grepl(x, a)))
# $`1`
# [1] 1
#
# $`2`
# [1] 2
#
# $`3`
# [1] 3
#
# $`4`
# [1] 4 5 6
#
# $`5`
# [1] 7
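Note that grepl does pattern (substring) matching, so a pattern such as "d|e" would also match a longer element like "de". If that matters for your data, a minimal sketch of an exact-match variant could look like the following. It assumes the elements are separated by plain commas without spaces; parts_a, parts_b and idx are illustrative names, not part of the answer above.

```r
a <- c("a", "b", "c", "d", "e", "d,e", "f")
b <- c("a", "b", "c", "d,e", "f")

# Split every element into its comma-separated parts
parts_a <- strsplit(a, ",", fixed = TRUE)
parts_b <- strsplit(b, ",", fixed = TRUE)

# For each element of b, keep the indices of those elements of a
# whose parts are all contained in the parts of that b element
idx <- lapply(parts_b, function(pb) {
  which(vapply(parts_a, function(pa) all(pa %in% pb), logical(1)))
})
idx
```

Because the comparison is done on whole parts with %in%, no regular expression escaping is needed.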

Related

How to name list after character vector and add atomic vectors to list

I have four atomic vectors that I want to use to create a named list of atomic vectors, where the names of the list are based on the first two vectors, and the contents of the list elements are the last two vectors, repeated once for each name in the first two vectors.
# Naming vectors
v1 <- c("a", "b", "c")
v2 <- c("g", "h", "i")
# Content vectors
a <- 1:5
b <- LETTERS[1:5]
# I want to add each of the vectors a and b as repeated list elements
# in list l and name the list element after vectors v1 and v2, respectively.
l <- list()
# Step 1: Adding vector 'a' to list
l <- lapply(v1, function(x) {
append(l, a) %>%
unlist()
})
# Step 2: Adding vector 'b' to list
l <- lapply(v2, function(x) {
append(l, b) %>%
unlist()
})
# Step 3: Naming vector
names(l) <- c(v1, v2)
l
# The desired output (not working)
$a
1 2 3 4 5
$b
1 2 3 4 5
$c
1 2 3 4 5
$g
'A' 'B' 'C' 'D' 'E'
$h
'A' 'B' 'C' 'D' 'E'
$i
'A' 'B' 'C' 'D' 'E'
The last statement should have added names to the list, but it didn't work. Instead I get the error message:
Error in names(l) <- c(v1, v2): 'names' attribute [6] must be the same length as the vector [3]
Troubleshooting
Commenting out step 2 and removing 'v2' from step 3, I get the desired output:
# Step 1: Adding vector 'a' to list
> l <- lapply(v1, function(x) {
append(l, a) %>%
unlist()
})
> names(l) <- v1
> l
$a
1 2 3 4 5
$b
1 2 3 4 5
$c
1 2 3 4 5
Commenting out step 3 and adding step 2 returns
# Step 1: Adding vector 'a' to list
> l <- lapply(v1, function(x) {
append(l, a) %>%
unlist()
})
# Step 2: Adding vector 'b' to list
> l <- lapply(v2, function(x) {
append(l, b) %>%
unlist()
})
> l
1. '1''2''3''4''5''1''2''3''4''5''1''2''3''4''5''A''B''C''D''E'
2. '1''2''3''4''5''1''2''3''4''5''1''2''3''4''5''A''B''C''D''E'
3. '1''2''3''4''5''1''2''3''4''5''1''2''3''4''5''A''B''C''D''E'
For some reason, adding step 2 appends its output to the list elements created in step 1 instead of creating new ones, and the elements from step 1 are repeated 3 times.
How can I add the list elements from the second step and name them appropriately?
Is this what you are looking for?
# Naming vectors
v1 <- c("a", "b", "c")
v2 <- c("g", "h", "i")
# Content vectors
a <- 1:5
b <- LETTERS[1:5]
# Create lists of length v1/v2 and put in contents
v1_list <- lapply(v1, function(x) a)
v2_list <- lapply(v2, function(x) b)
# Combine the lists
l <- c(v1_list, v2_list)
#Rename
names(l) <- c(v1, v2)
l
#> $a
#> [1] 1 2 3 4 5
#>
#> $b
#> [1] 1 2 3 4 5
#>
#> $c
#> [1] 1 2 3 4 5
#>
#> $g
#> [1] "A" "B" "C" "D" "E"
#>
#> $h
#> [1] "A" "B" "C" "D" "E"
#>
#> $i
#> [1] "A" "B" "C" "D" "E"
Created on 2022-01-05 by the reprex package (v2.0.1)
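The same result can also be built without lapply. The following is only a sketch of one alternative, using rep() on a one-element list to repeat each content vector and setNames() to attach the names in the same expression:

```r
# Naming vectors
v1 <- c("a", "b", "c")
v2 <- c("g", "h", "i")
# Content vectors
a <- 1:5
b <- LETTERS[1:5]

# rep(list(a), n) repeats the whole vector as n separate list elements
l <- setNames(
  c(rep(list(a), length(v1)), rep(list(b), length(v2))),
  c(v1, v2)
)
str(l)
```

Wrapping the vector in list() is what keeps rep() from flattening it into one long vector.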

How to generate a named list containing vectors with repeating integers from a named list with character vectors in R?

Given is the list below. This list contains character vectors of variable length.
l1 <- list("a" = c("x1", "x2", "x3"),
"b" = c("x4", "x5"),
"c" = c("x6", "x7", "x8", "x9"))
> l1
$a
[1] "x1" "x2" "x3"
$b
[1] "x4" "x5"
$c
[1] "x6" "x7" "x8" "x9"
The desired output, let's call it l2, is the following:
$a
[1] 1 1 1
$b
[1] 2 2
$c
[1] 3 3 3 3
This output has the following characteristics:
l2 is a named list in which the names of the original list l1 are preserved.
The length of list l2 is the same as list l1.
The order of list elements in l1 is preserved in l2.
l2 contains vectors with repeating integers. The length of each vector in l2 is the same as the corresponding character vector in l1.
Part of solution
I found this post in which the statement below helped me to construct a partial solution.
The usual work-around is to pass it the names or indices of the vector instead of the vector itself.
l2 <- lapply(X = seq_along(l1),
FUN = function(x) rep(x, times = length(l1[[x]])))
l2
[[1]]
[1] 1 1 1
[[2]]
[1] 2 2
[[3]]
[1] 3 3 3 3
All criteria are met, except that the names are not preserved in l2.
How can I fix this in one go (not using a separate statement after the lapply statement)?
After you run your code above, just add the line below:
names(l2) <- names(l1)
This assigns the names of l1 to l2, so both lists have the same names.
Edit: lapply over seq_along(l1) returns an unnamed list, but you can get the names in one step with sapply as follows:
l2 <- sapply(X = names(l1),
FUN = function(x) rep(which(names(l1) == x), times = length(l1[[x]])))
l2
$a
[1] 1 1 1
$b
[1] 2 2
$c
[1] 3 3 3 3
It turns out that if the X argument of sapply is a character vector, sapply uses X as the names of the returned list (USE.NAMES = TRUE is the default).
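One caveat with the sapply approach: it returns a list here only because the results have different lengths. When all results have the same length, sapply simplifies them to a matrix by default. A hedged sketch that avoids this by passing simplify = FALSE (l_eq is a made-up example list, and match() is swapped in for which(names(l1) == x)):

```r
l1 <- list(a = c("x1", "x2", "x3"),
           b = c("x4", "x5"),
           c = c("x6", "x7", "x8", "x9"))

# simplify = FALSE always returns a list; names come from the character X
l2 <- sapply(names(l1),
             function(nm) rep(match(nm, names(l1)), length(l1[[nm]])),
             simplify = FALSE)
l2

# Made-up list whose elements all have the same length
l_eq <- list(a = c("x1", "x2"), b = c("x3", "x4"))
m <- sapply(names(l_eq),
            function(nm) rep(match(nm, names(l_eq)), length(l_eq[[nm]])))
is.matrix(m)  # the default simplification produced a matrix, not a list
```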
You can try the following base R option, using lengths + rep + relist:
> relist(rep(seq_along(l1), lengths(l1)), l1)
$a
[1] 1 1 1
$b
[1] 2 2
$c
[1] 3 3 3 3
You can use [] to preserve the names of the list.
l1[] <- lapply(seq_along(l1), function(x) rep(x, times = length(l1[[x]])))
l1
#$a
#[1] 1 1 1
#$b
#[1] 2 2
#$c
#[1] 3 3 3 3
Another solution with Map.
l1[] <- Map(rep, seq_along(l1), lengths(l1))
If you want a separate object l2 and want to keep l1 as it is, first create a copy with l2 <- l1 and assign into l2[] instead.
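If overwriting l1 or copying it first is not wanted, another option is to pass the named list itself as the first argument to Map, since Map takes its names from the first argument. This is a sketch of that idea, not one of the answers above:

```r
l1 <- list(a = c("x1", "x2", "x3"),
           b = c("x4", "x5"),
           c = c("x6", "x7", "x8", "x9"))

# x is the character vector, i its position in l1;
# the result inherits the names of l1 because l1 is the first argument
l2 <- Map(function(x, i) rep(i, length(x)), l1, seq_along(l1))
l2
```

l1 is left unchanged, and the names are preserved in a single statement.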

How to extract the minimum values from a list of lists and give them a name in r

I have a list of lists with highly complicated data. I would like to compare the values of each list and extract the smallest values. For simplicity, I provide a similar example.
s <- c(1,2,3)
ss <- c(4,5,6)
S <- list(s,ss)
h <- c(4,8,7)
hh <- c(0,3,4)
H <- list(h,hh)
HH <- list(S,H)
I would like to compare the element of each list with the element of the corresponding list and extract the smallest values. For example, the following are the values of HH list.
> HH
[[1]]
[[1]][[1]]
[1] 1 2 3
[[1]][[2]]
[1] 4 5 6
[[2]]
[[2]][[1]]
[1] 4 8 7
[[2]][[2]]
[1] 0 3 4
Now, I would like to compare
[[1]]
[[1]][[1]]
[1] 1 2 3
with
[[2]]
[[2]][[1]]
[1] 4 8 7
For example, 1 < 4, so I will select 1. For the second element, 2 < 8, so I will select 2. So, I would like to compare the elements of [[1]][[1]] with the elements of [[2]][[1]], and [[1]][[2]] with [[2]][[2]].
Then, I would like to print the name of the list that each selected value came from. For example, I expect output similar to the following:
1 < 4, the first element of the first model is selected.
A general solution (one that also works when there are more than two list elements) is to use transpose from purrr to rearrange the list elements, and then max.col on the negated values to get the index of the smallest value:
library(magrittr)
library(purrr)
HH %>%
transpose %>%
map(~ .x %>%
invoke(cbind, .) %>%
multiply_by(-1) %>%
max.col )
#[[1]]
#[1] 1 1 1
#[[2]]
#[1] 2 2 2
Or using base R
do.call(Map, c(f = function(...) max.col(-1 * cbind(...)), HH))
#[[1]]
#[1] 1 1 1
#[[2]]
#[1] 2 2 2
Maybe you can try this -
Map(function(x, y) as.integer(x > y) + 1, HH[[1]], HH[[2]])
#[[1]]
#[1] 1 1 1
#[[2]]
#[1] 2 2 2
This gives the position of the element selected.
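The answers above return only the position of the selected element. To also get the smallest values themselves, pmin can be applied in the same way. The sketch below additionally passes ties.method = "first" to max.col, because its default breaks ties at random; the names mins, which_model and msg are illustrative.

```r
s <- c(1, 2, 3); ss <- c(4, 5, 6)
h <- c(4, 8, 7); hh <- c(0, 3, 4)
HH <- list(list(s, ss), list(h, hh))

# Element-wise minima across the models, one vector per inner position
mins <- do.call(Map, c(f = pmin, HH))

# Index of the model that supplied each minimum; ties go to the first model
which_model <- do.call(
  Map,
  c(f = function(...) max.col(-cbind(...), ties.method = "first"), HH)
)

# A readable message for the first pair of vectors
msg <- sprintf("element %d: value %g from model %d",
               seq_along(mins[[1]]), mins[[1]], which_model[[1]])
mins
which_model
msg
```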

is there a way I can recycle elements of the shorter list in purrr:: map2 or purrr::walk2?

purrr does not seem to support recycling the elements of a vector when one of the two is shorter than the other (while using purrr::map2 or purrr::walk2), unlike base R, where we just get a warning if the length of the longer vector is not a multiple of the length of the shorter one.
Consider this toy example:
This works:
map2(1:3,4:6,sum)
#
#[[1]]
#[1] 5
#[[2]]
#[1] 7
#[[3]]
#[1] 9
And this doesn't work:
map2(1:3,4:9,sum)
Error: .x (3) and .y (6) are different lengths
I understand very well why this is not allowed - as it can make catching bugs very difficult. But is there any way in purrr I can force this to happen? Perhaps using some base R trick with purrr?
You can put both vectors in a data frame and let data.frame() recycle the shorter one (this works only when the longer length is a whole multiple of the shorter length):
input <- data.frame(a = 1:3, b = 4:9)
purrr::map2(input$a, input$b, sum)
It's by design with purrr, but you can use Map:
Map(sum,1:3,4:9)
# [[1]]
# [1] 5
#
# [[2]]
# [1] 7
#
# [[3]]
# [1] 9
#
# [[4]]
# [1] 8
#
# [[5]]
# [1] 10
#
# [[6]]
# [1] 12
And here's how I would recycle if I had to :
x <- 1:3
y <- 4:9
l <- max(length(y), length(x))
map2(rep(x,len = l), rep(y,len = l),sum)
# [[1]]
# [1] 5
#
# [[2]]
# [1] 7
#
# [[3]]
# [1] 9
#
# [[4]]
# [1] 8
#
# [[5]]
# [1] 10
#
# [[6]]
# [1] 12
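If you want recycling but still want protection against the silent bugs mentioned in the question, the recycling step can be wrapped in a small helper that refuses lengths that do not divide evenly. This is a sketch only: recycle_pair is a hypothetical helper, not a purrr or base R function, and Map is used here so the example runs without purrr.

```r
# Hypothetical helper: recycle x and y to a common length, but only when
# the longer length is a whole multiple of the shorter one
recycle_pair <- function(x, y) {
  n <- max(length(x), length(y))
  if (length(x) == 0L || length(y) == 0L ||
      n %% length(x) != 0L || n %% length(y) != 0L) {
    stop("lengths ", length(x), " and ", length(y),
         " cannot be recycled cleanly")
  }
  list(x = rep_len(x, n), y = rep_len(y, n))
}

rec <- recycle_pair(1:3, 4:9)
res <- Map(sum, rec$x, rec$y)
res
# The recycled vectors can be passed to purrr::map2(rec$x, rec$y, sum)
# in the same way, assuming purrr is installed
```

Calling recycle_pair(1:3, 1:4) stops with an error instead of recycling partially.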

Count the occurrence of specific combinations of characters in a list

My question is very simple, but I can't manage to work it out.
I have run a variable selection method in R on 2000 genes using 1000 iterations and in each iteration I got a combination of genes. I would like to count the number of times each combination of genes occurs in R.
For example I have
# for iteration 1
genes[1] "a" "b" "c"
# for iteration 2
genes[2] "a" "b"
# for iteration 3
genes[3] "a" "c"
# for iteration 4
genes [4] "a" "b"
and this would give me
"a" "b" "c" 1
"a" "b" 2
"a" "c" 1
I have unlisted the list and counted how often each gene occurs, but what I am interested in is the combinations. I tried to create a table, but the gene vectors have unequal lengths. Thanks in advance.
The way I could immediately think of is to paste them and then use table as follows:
genes_p <- sapply(my_genes, paste, collapse=";")
freq <- as.data.frame(table(genes_p))
#   genes_p Freq
# 1 a;b 2
# 2 a;b;c 1
# 3 c 1
The above solution assumes that the genes are sorted by names and the same gene id doesn't occur more than once within an element of the list. If you want to account for both, then:
# sort genes before pasting
genes_p <- sapply(my_genes, function(x) paste(sort(x), collapse=";"))
# sort + unique
genes_p <- sapply(my_genes, function(x) paste(sort(unique(x)), collapse=";"))
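As a self-contained sketch of the sort + unique variant, here it is applied to the four iterations listed in the question. The definition of my_genes is an assumption based on that example (with the second vector deliberately given in a different order), and key and freq are illustrative names.

```r
# Assumed input, modelled on the question's example; order within a
# vector should not matter, so iteration 2 is written as "b", "a"
my_genes <- list(c("a", "b", "c"), c("b", "a"), c("a", "c"), c("a", "b"))

# One canonical string per iteration: sorted, de-duplicated, pasted
key <- vapply(my_genes,
              function(x) paste(sort(unique(x)), collapse = ";"),
              character(1))

# Naming the table argument names the resulting column
freq <- as.data.frame(table(combination = key), stringsAsFactors = FALSE)
freq
```

vapply is used instead of sapply so that the result is guaranteed to be a character vector with one string per iteration.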
Edit: Following the OP's question in the comments, the idea is to get all combinations of 2 genes (pairs) wherever possible and then tabulate them. First I'll break the code down into separate steps for clarity. Then I'll combine them into a one-liner.
# you first want all possible combinations of length 2 here
# that is, if vector is:
v <- c("a", "b", "c")
combn(v, 2)
# [,1] [,2] [,3]
# [1,] "a" "a" "b"
# [2,] "b" "c" "c"
This gives all the combinations taken 2 at a time. Now, you can just paste them in the same way as before. combn also accepts a function argument (FUN) that is applied to each combination.
combn(v, 2, function(y) paste(y, collapse=";"))
# [1] "a;b" "a;c" "b;c"
So, for each set of genes in your list, you can do the same by wrapping this around a sapply as follows:
sapply(my_genes, function(x) combn(x, min(length(x), 2), function(y)
paste(y, collapse=";")))
The min(length(x), 2) is required because some elements of your gene list may contain just 1 gene.
# [[1]]
# [1] "a;b" "a;c" "b;c"
# [[2]]
# [1] "a;b"
# [[3]]
# [1] "c"
# [[4]]
# [1] "a;b"
Now, you can unlist this to get a vector and then use table to get frequency:
table(unlist(sapply(my_genes, function(x) combn(x, min(length(x), 2), function(y)
paste(y, collapse=";")))))
# a;b a;c b;c c
# 3 1 1 1
You can wrap this in turn with as.data.frame(.) to get a data.frame:
as.data.frame(table(unlist(sapply(my_genes, function(x) combn(x, min(length(x), 2),
function(y) paste(y, collapse=";"))))))
# Var1 Freq
# 1 a;b 3
# 2 a;c 1
# 3 b;c 1
# 4 c 1
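For reference, here is a runnable sketch of the pair-counting steps. The answer never shows my_genes, so the definition below is an assumption reconstructed from the printed outputs above (the third element being the single gene "c"). lapply is used in place of sapply so that the intermediate result is always a list, and the genes are sorted so that "a;b" and "b;a" would be counted together.

```r
# Assumed input, reconstructed from the outputs shown above
my_genes <- list(c("a", "b", "c"), c("a", "b"), "c", c("a", "b"))

# All pairs per iteration (or the single gene when only one is present);
# extra arguments to combn are passed on to paste
pairs <- lapply(my_genes, function(x) {
  combn(sort(x), min(length(x), 2), paste, collapse = ";")
})

# Frequency of each pair across all iterations
counts <- table(unlist(pairs))
counts
```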
