Use `[` in *apply - r

I'm sure I'm missing something obvious in base R itself. I want to subset a matrix by its columns based on indices stored in a list.
m <- matrix(1:50L, ncol = 5)
l <- lapply(0:3, `+`, 1:2)
> l
[[1]]
[1] 1 2
[[2]]
[1] 2 3
[[3]]
[1] 3 4
[[4]]
[1] 4 5
I want a list of matrices as below -
list(m[, l[[1]]], m[, l[[2]]], m[, l[[3]]], m[, l[[4]]])
I tried lapply(l, '[', m), but that obviously didn't work because '[' works on m like a vector. How can I make it work on m as a matrix and pass each list element as the second (column) index? I also tried apply(X = do.call(rbind, l), MARGIN = 1, FUN = '[', m) but that didn't work either.
I know I can define a function f as below; but that feels hacky.
f <- function(ind){m[, ind]}
lapply(l, f)
Is there any clever way I can avoid defining f or am I being unrealistic in not working with f?
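For reference, a minimal sketch of the inline version of f, assuming R >= 4.1 for the \(j) shorthand. The name res and the drop = FALSE argument are additions for illustration and are not part of the question:

```r
m <- matrix(1:50, ncol = 5)
l <- lapply(0:3, `+`, 1:2)

# Anonymous-function shorthand (R >= 4.1); drop = FALSE keeps a matrix
# even if an index vector happens to have length one
res <- lapply(l, \(j) m[, j, drop = FALSE])
```

This does not avoid the helper function, it only inlines it, which is arguably the most readable option here.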

Related

concatenation sublists of two different lists [duplicate]

I have two lists
first = list(a = 1, b = 2, c = 3)
second = list(a = 2, b = 3, c = 4)
I want to merge these two lists so the final product is
$a
[1] 1 2
$b
[1] 2 3
$c
[1] 3 4
Is there a simple function to do this?
If lists always have the same structure, as in the example, then a simpler solution is
mapply(c, first, second, SIMPLIFY=FALSE)
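If the two lists share the same names but not the same order, one possible variation (a sketch, not part of the original answer) is to reorder second by the names of first before combining. Map is used here as a shorthand for mapply with SIMPLIFY = FALSE:

```r
first  <- list(a = 1, b = 2, c = 3)
second <- list(c = 4, a = 2, b = 3)   # same names, different order

# Align second to the name order of first, then combine element-wise
merged <- Map(c, first, second[names(first)])
```

This assumes every name in first also exists in second; a missing name would contribute NULL and silently leave that element unmerged.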
This is a very simple adaptation of the modifyList function by Sarkar. Because it is recursive, it will handle more complex situations than mapply would. It also handles mismatched names: items that appear only in 'second' are added to the result, and items that appear only in 'first' are kept unchanged.
appendList <- function (x, val)
{
stopifnot(is.list(x), is.list(val))
xnames <- names(x)
for (v in names(val)) {
x[[v]] <- if (v %in% xnames && is.list(x[[v]]) && is.list(val[[v]]))
appendList(x[[v]], val[[v]])
else c(x[[v]], val[[v]])
}
x
}
> appendList(first,second)
$a
[1] 1 2
$b
[1] 2 3
$c
[1] 3 4
Here are two options, the first:
both <- list(first, second)
n <- unique(unlist(lapply(both, names)))
names(n) <- n
lapply(n, function(ni) unlist(lapply(both, `[[`, ni)))
and the second, which works only if they have the same structure:
apply(cbind(first, second),1,function(x) unname(unlist(x)))
Both give the desired result.
Here's some code that I ended up writing, based upon @Andrei's answer but without the elegance/simplicity. The advantage is that it allows a more complex recursive merge and also distinguishes between elements that should be connected with rbind and those that are just connected with c:
# Decided to move this outside the mapply, not sure this is
# that important for speed but I imagine redefining the function
# might be somewhat time-consuming
mergeLists_internal <- function(o_element, n_element){
if (is.list(n_element)){
# Fill in non-existent elements with NA elements
if (length(n_element) != length(o_element)){
n_unique <- names(n_element)[! names(n_element) %in% names(o_element)]
if (length(n_unique) > 0){
for (n in n_unique){
if (is.matrix(n_element[[n]])){
o_element[[n]] <- matrix(NA,
nrow=nrow(n_element[[n]]),
ncol=ncol(n_element[[n]]))
}else{
o_element[[n]] <- rep(NA,
times=length(n_element[[n]]))
}
}
}
o_unique <- names(o_element)[! names(o_element) %in% names(n_element)]
if (length(o_unique) > 0){
for (n in o_unique){
if (is.matrix(o_element[[n]])){
n_element[[n]] <- matrix(NA,
nrow=nrow(o_element[[n]]),
ncol=ncol(o_element[[n]]))
}else{
n_element[[n]] <- rep(NA,
times=length(o_element[[n]]))
}
}
}
}
# Now merge the two lists
return(mergeLists(o_element,
n_element))
}
if(length(n_element)>1){
new_cols <- ifelse(is.matrix(n_element), ncol(n_element), length(n_element))
old_cols <- ifelse(is.matrix(o_element), ncol(o_element), length(o_element))
if (new_cols != old_cols)
stop("Your length doesn't match on the elements,",
" new element (", new_cols , ") !=",
" old element (", old_cols , ")")
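# Multi-element vectors and matrices are stacked row-wise
return(rbind(o_element,
n_element,
deparse.level=0))
}
# Single elements are simply concatenated
return(c(o_element,
n_element))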
}
mergeLists <- function(old, new){
if (is.null(old))
return (new)
m <- mapply(mergeLists_internal, old, new, SIMPLIFY=FALSE)
return(m)
}
Here's my example:
v1 <- list("a"=c(1,2), b="test 1", sublist=list(one=20:21, two=21:22))
v2 <- list("a"=c(3,4), b="test 2", sublist=list(one=10:11, two=11:12, three=1:2))
mergeLists(v1, v2)
This results in:
$a
[,1] [,2]
[1,] 1 2
[2,] 3 4
$b
[1] "test 1" "test 2"
$sublist
$sublist$one
[,1] [,2]
[1,] 20 21
[2,] 10 11
$sublist$two
[,1] [,2]
[1,] 21 22
[2,] 11 12
$sublist$three
[,1] [,2]
[1,] NA NA
[2,] 1 2
Yeah, I know - perhaps not the most logical merge but I have a complex parallel loop that I had to generate a more customized .combine function for, and therefore I wrote this monster :-)
merged = map(names(first), ~c(first[[.x]], second[[.x]]))
merged = set_names(merged, names(first))
Using purrr. Also solves the problem of your lists not being in order.
In general one could,
merge_list <- function(...) by(v<-unlist(c(...)),names(v),base::c)
Note that the by() solution returns an attributed list, so it will print differently, but will still be a list. But you can get rid of the attributes with attr(x,"_attribute.name_")<-NULL. You can probably also use aggregate().
We can do a lapply with c(), and use setNames to assign the original name to the output.
setNames(lapply(1:length(first), function(x) c(first[[x]], second[[x]])), names(first))
$a
[1] 1 2
$b
[1] 2 3
$c
[1] 3 4
Following the answers by @Aaron left Stack Overflow and @Theo, the elements of the merged list are vectors built with c.
But if you want to bind rows and columns use rbind and cbind.
merged = map(names(first), ~rbind(first[[.x]], second[[.x]]))
merged = set_names(merged, names(first))
Using dplyr, I found that this line works for named lists using the same names:
as.list(bind_rows(first, second))

row.names(x) versus row.names<- in lapply on a list of data frames

While attempting to answer this question, I noticed that row.names(x) <- vector does not work when using lapply on a list of data frames, while row.names<- does.
Here's some sample data, I'm attempting to append the names of the list to the rownames of the dataframes:
l <- list(a=data.frame(col = c(1,2,3),row.names = c("k","l","m")),
b=data.frame(col = c(4,5,6), row.names = c("o","p","r")))
$a
col
k 1
l 2
m 3
$b
col
o 4
p 5
r 6
For example when looping over the names of the list and setting row.names in lapply, this is the result:
lapply(names(l), \(x) row.names(l[[x]]) <- paste0(x, ".", rownames(l[[x]])))
[[1]]
[1] "a.k" "a.l" "a.m"
[[2]]
[1] "b.o" "b.p" "b.r"
When we run the same code outside lapply, it works:
> row.names(l[["a"]]) <- paste0("a.", rownames(l[["a"]]))
> l
$a
col
a.k 1
a.l 2
a.m 3
The solution I have for this problem is to use row.names<- (as suggested by the OP of the original question):
setNames(lapply(names(l),
\(x) l[[x]] %>% `row.names<-` (paste(x, rownames(l[[x]]), sep = "."))),
names(l))
$a
col
a.k 1
a.l 2
a.m 3
$b
col
b.o 4
b.p 5
b.r 6
But I'm not sure why the row.names(x) <- vector syntax failed and why row.names<- worked.
The assignment is local, i.e. it creates a modified object l with assigned row names inside the anonymous function.
The result of the <- assignment is returned from that anonymous function. However, the result of <- is always the RHS, even for replacement functions. You can make this visible by putting parentheses around the assignment in the global scope:
(row.names(l$a) <- paste0('a.', row.names(l$a)))
# [1] "a.k" "a.l" "a.m"
So in your second piece of code, the value that lapply collects is just the result of the assignment, which is the RHS of the assignment, which is the transformed row names. To fix the code, you need to return the modified object, l[[x]], itself:
lapply(
names(l),
\(x) {
row.names(l[[x]]) <- paste0(x, ".", row.names(l[[x]]))
l[[x]]
}
)
Or you can use row.names<- as a function call:
lapply(names(l), \(x) `row.names<-`(l[[x]], paste0(x, ".", row.names(l[[x]]))))
And to make this a bit more readable use Map instead of lapply:
Map(\(x, n) `row.names<-`(x, paste0(n, ".", row.names(x))), l, names(l))
Or, without anonymous function (although I’m not convinced this is more readable):
Map(`row.names<-`, l, Map(paste0, names(l), ".", Map(row.names, l)))
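As a small check of the points above, here is a sketch under the question's sample data. The names res and l2 are introduced only for illustration. It shows that the original l is untouched by the assignment inside the anonymous function, and that the result of Map has to be stored to have any effect:

```r
l <- list(a = data.frame(col = c(1, 2, 3), row.names = c("k", "l", "m")),
          b = data.frame(col = c(4, 5, 6), row.names = c("o", "p", "r")))

# The assignment inside the function modifies a local copy of l;
# each call returns only the RHS (the new row names)
res <- lapply(names(l), \(x) row.names(l[[x]]) <- paste0(x, ".", row.names(l[[x]])))

# The global l still has its original row names at this point
unchanged <- row.names(l$a)

# Storing the result of Map() is what actually yields the renamed data frames
l2 <- Map(\(x, n) `row.names<-`(x, paste0(n, ".", row.names(x))), l, names(l))
```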

How to store multidimensional subscript as variable in R

Suppose I have a matrix,
mat <- matrix((1:9)^2, 3, 3)
I can slice the matrix like so
> mat[2:3, 2]
[1] 25 36
How does one store the subscript as a variable? That is, what should my_sub be, such that
> mat[my_sub]
[1] 25 36
A list gets "invalid subscript type" error. A vector will lose the multidimensionality. Seems like such a basic operation to not have a primitive type that fits this usage.
I know I can access the matrix via vector addressing, which means converting from [2:3, 2] to c(5, 6), but that mapping presumes knowledge of matrix shape. What if I simply want [2:3, 2] for any matrix shape (assuming it is at least those dimensions)?
Here are some alternatives. They both generalize to higher dimensional arrays.
1) matrix subscripting If the indexes are all scalar except possibly one, as in the question, then:
mi <- cbind(2:3, 2)
mat[mi]
# test
identical(mat[mi], mat[2:3, 2])
## [1] TRUE
In higher dimensions:
a <- array(1:24, 2:4)
mi <- cbind(2, 2:3, 3)
a[mi]
# test
identical(a[mi], a[2, 2:3, 3])
## [1] TRUE
It would be possible to extend this to eliminate the scalar restriction using:
L <- list(2:3, 2:3)
array(mat[as.matrix(do.call(expand.grid, L))], lengths(L))
however, in light of (2) which also uses do.call but avoids the need for expand.grid it seems unnecessarily complex.
2) do.call This approach does not have the scalar limitation. mat and a are from above:
L2 <- list(2:3, 1:2)
do.call("[", c(list(mat), L2))
# test
identical(do.call("[", c(list(mat), L2)), mat[2:3, 1:2])
## [1] TRUE
L3 <- list(2, 2:3, 3:4)
do.call("[", c(list(a), L3))
# test
identical(do.call("[", c(list(a), L3)), a[2, 2:3, 3:4])
## [1] TRUE
This could be made prettier by defining:
`%[%` <- function(x, indexList) do.call("[", c(list(x), indexList))
mat %[% list(2:3, 1:2)
a %[% list(2, 2:3, 3:4)
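One detail the do.call approach leaves open is how to store "all rows" or "all columns" in the index list. A possible sketch, assuming that a logical TRUE is an acceptable stand-in for an empty subscript (it is recycled over the whole dimension); idx, sub_all_cols and sub_keep_dim are illustrative names:

```r
mat <- matrix((1:9)^2, 3, 3)

# TRUE stands in for an empty subscript, so this mirrors mat[2:3, ]
idx <- list(2:3, TRUE)
sub_all_cols <- do.call("[", c(list(mat), idx))

# Named arguments such as drop can be appended to the same argument list
sub_keep_dim <- do.call("[", c(list(mat), list(2, 1:2), list(drop = FALSE)))
```

Because idx does not depend on the shape of mat, the same list can be reused on any matrix with at least three rows.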
Use which() with the argument arr.ind = TRUE.
x <- c(25, 36)
inx <- which(mat == x, arr.ind = TRUE)
Warning message:
In mat == x :
longer object length is not a multiple of shorter object length
mat[inx]
#[1] 25 36
This is an interesting question. The subset function can actually help. You cannot subset directly your matrix using a vector or a list, but you can store the indexes in a list and use subset to do the trick.
mat <- matrix(1:12, nrow=4)
mat[2:3, 1:2]
# example using subset
subset(mat, subset = 1:nrow(mat) %in% 2:3, select = 1:2)
# double check
identical(mat[2:3, 1:2],
subset(mat, subset = 1:nrow(mat) %in% 2:3, select = 1:2))
# TRUE
Actually, we can write a custom function if we want to store the row- and column- indexes in the same list.
cust.subset <- function(mat, dim.list){
subset(mat, subset = 1:nrow(mat) %in% dim.list[[1]], select = dim.list[[2]])
}
# initialize a list that includes your sub-setting indexes
sbdim <- list(2:3, 1:2)
sbdim
# [[1]]
# [1] 2 3
# [[2]]
# [1] 1 2
# subset using your custom f(x) and your list
cust.subset(mat, sbdim)
# [,1] [,2]
# [1,] 2 6
# [2,] 3 7

apply a list of functions to a single argument in R

Hi I am trying to apply a list of functions to a single argument in R. For example,
flist <- list(F,G,H) #F,G,H are function objects
and say I want as a result a list or vector
(F(x),G(x),H(x)) where x is a scalar number.
Do you know how I can achieve that?
The most efficient way (it seems) to achieve this would be using a single lapply (instead of 3 different functions), such as
flist <- list(mean, unique, max) # Example functions list
MyScalar <- 1 # Some scalar
lapply(flist, function(f) f(MyScalar))
# [[1]]
# [1] 1
#
# [[2]]
# [1] 1
#
# [[3]]
# [1] 1
Though, if all the functions give the same size/class result, you could improve it even more using vapply
vapply(flist, function(x) x(MyScalar), FUN.VALUE = double(1))
## [1] 1 1 1
f <- function(x) x^1
g <- function(x) x^2
h <- function(x) x^3
l <- list(f, g, h)
sapply(l, do.call, list(2))
## [1] 2 4 8
do.call allows for function delegation with variable-length argument lists.
For example, c(1, 2, 3) can be called like so: do.call(c, list(1, 2, 3)).
(s|l)apply just iterates through a list and applies the specified function to each item. So the first iteration through l will be: do.call(l[[1]], list(2)), which is equivalent to l[[1]](2), which is equivalent to f(2).
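A related sketch, assuming each function returns a single number: naming the functions in the list carries the names through to the result, and vapply checks the return type. The functions and the names flist, x and res are chosen purely for illustration:

```r
# Named list of functions; the names label the results
flist <- list(sqrt = sqrt, abs = abs, square = function(v) v^2)
x <- 4

# Call every function on the same scalar and collect a named numeric vector
res <- vapply(flist, function(f) f(x), FUN.VALUE = numeric(1))
```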

Check that a vector is contained in a matrix in R

I can't believe this is taking me this long to figure out, and I still can't figure it out.
I need to keep a collection of vectors, and later check that a certain vector is in that collection. I tried lists combined with %in% but that doesn't appear to work properly.
My next idea was to create a matrix and rbind vectors to it, but now I don't know how to check if a vector is contained in a matrix. %in% appears to compare sets and not exact rows. Same appears to apply to intersect.
Help much appreciated!
Do you mean like this:
wantVec <- c(3,1,2)
myList <- list(A = c(1:3), B = c(3,1,2), C = c(2,3,1))
sapply(myList, function(x, want) isTRUE(all.equal(x, want)), wantVec)
## or, is the vector in the set?
any(sapply(myList, function(x, want) isTRUE(all.equal(x, want)), wantVec))
We can do a similar thing with a matrix:
myMat <- matrix(unlist(myList), ncol = 3, byrow = TRUE)
## As the vectors are now in the rows, we use apply over the rows
apply(myMat, 1, function(x, want) isTRUE(all.equal(x, want)), wantVec)
## or
any(apply(myMat, 1, function(x, want) isTRUE(all.equal(x, want)), wantVec))
Or by columns:
myMat2 <- matrix(unlist(myList), ncol = 3)
## As the vectors are now in the cols, we use apply over the cols
apply(myMat2, 2, function(x, want) isTRUE(all.equal(x, want)), wantVec)
## or
any(apply(myMat2, 2, function(x, want) isTRUE(all.equal(x, want)), wantVec))
If you need to do this a lot, write your own function
vecMatch <- function(x, want) {
isTRUE(all.equal(x, want))
}
And then use it, e.g. on the list myList:
> sapply(myList, vecMatch, wantVec)
A B C
FALSE TRUE FALSE
> any(sapply(myList, vecMatch, wantVec))
[1] TRUE
Or even wrap the whole thing:
vecMatch <- function(x, want) {
out <- sapply(x, function(x, want) isTRUE(all.equal(x, want)), want)
any(out)
}
> vecMatch(myList, wantVec)
[1] TRUE
> vecMatch(myList, 5:3)
[1] FALSE
EDIT: Quick comment on why I used isTRUE() wrapped around the all.equal() calls. This is due to the fact that where the two arguments are not equal, all.equal() doesn't return a logical value (FALSE):
> all.equal(1:3, c(3,2,1))
[1] "Mean relative difference: 1"
isTRUE() is useful here because it returns TRUE iff its argument is TRUE, whilst it returns FALSE if it is anything else.
> M
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
v <- c(2, 5, 8)
Check each column against the corresponding element of v:
c1 <- which(M[, 1] == v[1])
c2 <- which(M[, 2] == v[2])
c3 <- which(M[, 3] == v[3])
Here is a way to still use intersect() on more than 2 elements
> intersect(intersect(c1, c2), c3)
[1] 2
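The column-by-column check above can be generalized to any number of columns. A sketch, assuming M is the 3 x 3 matrix printed above (built here with matrix(1:9, nrow = 3)) and that exact equality with == is acceptable; row_hits is an illustrative name:

```r
M <- matrix(1:9, nrow = 3)
v <- c(2, 5, 8)

# t(M) puts the rows of M into columns, so v is recycled along each row of M;
# a row matches when all ncol(M) comparisons are TRUE
row_hits <- which(colSums(t(M) == v) == ncol(M))

# TRUE if v occurs as a row of M
v_in_M <- length(row_hits) > 0
```

Unlike all.equal, == applies no tolerance, so this suits integer-like data better than floating-point results.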

Resources