Using dplyr functions inside apply - r

I want to apply dplyr functions to every element of BRCK, which is a matrix of data frames.
I tried something like this:
apply(BRCK, c(1,2), function(x) dplyr::select(x, dplyr::contains("_01_"), 1) %>%
dplyr::filter((month(`BRCK[[la, lo]]`) == 1)) %>%
dplyr::select(-contains("BRCK")))
But it returns
Error: Variable context not set
And the traceback:
13. stop(cnd)
12. abort("Variable context not set")
11. cur_vars_env$selected %||% abort("Variable context not set")
10. current_vars()
9. tolower(vars)
8. dplyr::contains("_01_")
7. select.list(x, dplyr::contains("_01_"), 1)
6. dplyr::select(x, dplyr::contains("_01_"), 1)
5. eval(lhs, parent, parent)
4. eval(lhs, parent, parent)
3. dplyr::select(x, dplyr::contains("_01_"), 1) %>% dplyr::filter(x,
(month(`BRCK[[la, lo]]`) == 1)) %>% dplyr::select(x, -contains("BRCK"))
2. FUN(newX[, i], ...)
1. apply(BRCK, c(1, 2), function(x) dplyr::select(x, dplyr::contains("_01_"), 1) %>% dplyr::filter(x, (month(`BRCK[[la, lo]]`) == 1)) %>%
dplyr::select(x, -contains("BRCK")))
BRCK is a very large object. It works with for loops, but I'm trying to replace them with apply functions.

With apply, x is passed to the function as a list, and dplyr verbs only work on data frames:
apply(BRCK, c(1,2), is.data.frame)
[,1] [,2] [,3]
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE FALSE FALSE
but:
apply(BRCK, c(1,2), function(x) is.data.frame(x[[1]]))
[,1] [,2] [,3]
[1,] TRUE TRUE TRUE
[2,] TRUE TRUE TRUE
[3,] TRUE TRUE TRUE
so:
library(tidyverse)
apply(BRCK, c(1,2),
function(x) {
x[[1]] %>%
dplyr::select(dplyr::contains("_01_"), 1) %>%
dplyr::filter(lubridate::month(`BRCK[[la, lo]]`) == 1) %>%
dplyr::select(-contains("BRCK"))
}
)

One problem is that each element in the loop is a list of a single data frame, not an actual data frame. Compare:
apply(BRCK, c(1,2), function(x) {
class(x)
})
[,1] [,2] [,3]
[1,] "list" "list" "list"
[2,] "list" "list" "list"
[3,] "list" "list" "list"
apply(BRCK, c(1,2), function(x) {
class(x[[1]])
})
[,1] [,2] [,3]
[1,] "data.frame" "data.frame" "data.frame"
[2,] "data.frame" "data.frame" "data.frame"
[3,] "data.frame" "data.frame" "data.frame"
I would suggest not using an apply loop here (use lapply on the indices instead), since the way apply subsets objects and modifies them is not well documented.
I would also suggest not storing data frames in a matrix. You could store them in a list and set attributes for the metadata that is implied by the matrix indices.
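The suggestion to loop over indices could look like the following minimal sketch. The toy BRCK, the helper make_df, and the columns id and value are made up for illustration; the real object and the dplyr pipeline would take their place.

```r
# Hypothetical 2 x 2 matrix of data frames standing in for BRCK
make_df <- function(k) data.frame(id = 1:3, value = k * (1:3))
BRCK <- matrix(lapply(1:4, make_df), nrow = 2, ncol = 2)

# All (row, column) pairs; expand.grid varies the first column fastest,
# which matches the column-major order in which R stores a matrix
idx <- expand.grid(la = seq_len(nrow(BRCK)), lo = seq_len(ncol(BRCK)))

res <- lapply(seq_len(nrow(idx)), function(k) {
  # [[la, lo]] returns the data frame itself, not a length-one list
  df <- BRCK[[idx$la[k], idx$lo[k]]]
  # placeholder for the real per-cell processing
  df[df$value > 2, , drop = FALSE]
})

# Restore the matrix layout of the input
dim(res) <- dim(BRCK)
```

Each cell of res can then be read back with res[[la, lo]], in the same way as the input.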

Related

Using R `outer` with `%in%` operator

I am trying to perform the following outer operation:
x <- c(1, 11)
choices <- list(1:10, 10:20)
outer(x, choices, FUN=`%in%`)
I expect the following matrix:
[,1] [,2]
[1,] TRUE FALSE
[2,] FALSE TRUE
which would correspond to the following operations:
outer(x, choices, FUN=paste, sep=" %in% ")
[,1] [,2]
[1,] "1 %in% 1:10" "1 %in% 10:20"
[2,] "11 %in% 1:10" "11 %in% 10:20"
But for some reason I am getting:
[,1] [,2]
[1,] FALSE FALSE
[2,] FALSE FALSE
What is happening?
As noted in the comments, the table argument of match (the function called by %in%) isn't intended to be a list; if it is, it gets coerced to character. You should use vapply instead:
vapply(choices,function(y) x %in% y,logical(length(x)))
# [,1] [,2]
#[1,] TRUE FALSE
#[2,] FALSE TRUE
Another way, closer to your train of thought, is to use expand.grid() to create the combinations and then apply %in% across the two columns with mapply, i.e.
d1 <- expand.grid(x, choices)
matrix(mapply(`%in%`, d1$Var1, d1$Var2), nrow = length(x))
#or you can use Map(`%in%`, ...) in order to keep results in a list
OR
As @nicola suggests, in order to make things better,
d1 <- expand.grid(list(x), choices)
mapply(`%in%`, d1$Var1, d1$Var2)
both giving,
[,1] [,2]
[1,] TRUE FALSE
[2,] FALSE TRUE
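If you would rather keep the call to outer, one possible sketch is to wrap %in% so that it is applied pair by pair. This relies on base R's Vectorize; the helper name is_in is invented here and is not part of any answer above.

```r
x <- c(1, 11)
choices <- list(1:10, 10:20)

# outer() calls FUN once with the fully expanded arguments, so the function
# must handle one (value, set) pair per position; Vectorize() arranges that
is_in <- Vectorize(function(a, b) a %in% b)

result <- outer(x, choices, FUN = is_in)
result
#       [,1]  [,2]
# [1,]  TRUE FALSE
# [2,] FALSE  TRUE
```

This is mostly a convenience; the vapply version avoids the extra layer of wrapping.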

Extract elements from matrix diagonal saved in multiple lists in R

I'm trying to extract elements from the diagonals of several matrices stored in a list. My data looks something like this:
res <- list()
res[[1]] <- matrix(c(0.04770856,0.02854005,0.02854005,0.03260190), nrow=2, ncol=2)
res[[2]] <- matrix(c(0.05436957,0.04887182,0.04887182, 0.10484454), nrow=2, ncol=2)
> res
[[1]]
[,1] [,2]
[1,] 0.04770856 0.02854005
[2,] 0.02854005 0.03260190
[[2]]
[,1] [,2]
[1,] 0.05436957 0.04887182
[2,] 0.04887182 0.10484454
> diag(res[[1]])
[1] 0.04770856 0.03260190
> diag(res[[2]])
[1] 0.05436957 0.10484454
I would like to save the first and second elements of each diagonal of a given list into a vector similar to this:
d.1st.el <- c(0.04770856, 0.05436957)
d.2nd.el <- c(0.03260190, 0.10484454)
My difficulty is writing a function that runs over all the matrices in the list and gets the diagonals. For some reason, when I use unlist() to extract the values of each matrix for a given level, it doesn't give me the number but the full matrix.
Does anyone have a simple solution?
sapply(res, diag)
[,1] [,2]
[1,] 0.04770856 0.05436957
[2,] 0.03260190 0.10484454
# or
lapply(res, diag)
[[1]]
[1] 0.04770856 0.03260190
[[2]]
[1] 0.05436957 0.10484454
If you want the vectors for some reason in your global environment:
alld <- lapply(res, diag)
names(alld) <- sprintf("d.%d.el", 1:length(alld))
list2env(alld, globalenv())
In two steps you can do:
# Step 1 - Get the diagonals
all_diags <- sapply(res, function(x) diag(t(x)))
print(all_diags)
[,1] [,2]
[1,] 0.04770856 0.05436957
[2,] 0.03260190 0.10484454
# Step 2 - Append to vectors
d.1st.el <- all_diags[1,]
d.2nd.el <- all_diags[2,]
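If only one diagonal position is needed at a time, a small helper built on vapply is another option. This is a sketch; the name nth_diag is made up here, and it assumes every matrix has at least k diagonal entries.

```r
res <- list(
  matrix(c(0.04770856, 0.02854005, 0.02854005, 0.03260190), nrow = 2, ncol = 2),
  matrix(c(0.05436957, 0.04887182, 0.04887182, 0.10484454), nrow = 2, ncol = 2)
)

# Take the k-th diagonal entry of every matrix in the list;
# vapply checks that each result is a single numeric value
nth_diag <- function(mats, k) {
  vapply(mats, function(m) diag(m)[k], numeric(1))
}

d.1st.el <- nth_diag(res, 1)
d.2nd.el <- nth_diag(res, 2)
```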

Apply an operation across multiple lists

I know in R if I have a list of matrices, I can use the Reduce function to apply an operation across all the matrices. For example:
l <- list(matrix(rnorm(16), 4, 4), matrix(rnorm(16), 4, 4))
Reduce(`*`, l)
But what if I want to apply this operation across multiple lists? I could do a brute-force approach with a for loop but I feel like there should be a better way. I can do two lists with mapply
l2 <- l
mapply(`*`, l, l2, SIMPLIFY = FALSE)
But if I have more than two, I'm not sure how to solve that.
The following thoughts all result in errors:
l3 <- l2
mapply(`*`, l, l2, l3, SIMPLIFY = FALSE)
Error in .Primitive("*")(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]) :
operator needs one or two arguments
Reduce(`*`, list(l, l2, l3))
Error in f(init, x[[i]]) : non-numeric argument to binary operator
The desired output is a list of length 2 with the elementwise products of each matrix within each list. The brute-force loop would look like this:
out <- vector("list", length = 2)
for(i in 1:2){
out[[i]] <- l[[i]] * l2[[i]] * l3[[i]]
}
This combination of Reduce and Map will produce the desired result in base R.
# copy the matrix list
l3 <- l2 <- l
out2 <- Reduce(function(x, y) Map(`*`, x, y), list(l, l2, l3))
which returns
out2
[[1]]
[,1] [,2] [,3] [,4]
[1,] -5.614351e-01 -0.06809906 -0.16847839 0.8450600
[2,] -1.201886e-05 0.02008037 5.64656727 -2.4845526
[3,] 5.587296e-02 -0.54793853 0.02254552 0.4608697
[4,] -9.732049e-04 11.73020448 1.83408770 -1.4844601
[[2]]
[,1] [,2] [,3] [,4]
[1,] -4.7372339865 -0.398501528 0.8918474 0.12433983
[2,] 0.0007413892 0.151864126 -0.2138688 -0.10223482
[3,] -0.0790846342 -0.413330364 2.0640126 -0.01549591
[4,] -0.1888032661 -0.003773035 -0.9246891 -2.30731237
We can check that this is the same as the for loop in the OP.
identical(out, out2)
[1] TRUE
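For a check that does not depend on three identical copies, here is a self-contained sketch with three different lists. The seed and the way l2 and l3 are derived are arbitrary choices made for illustration.

```r
set.seed(1)
l  <- list(matrix(rnorm(16), 4, 4), matrix(rnorm(16), 4, 4))
l2 <- lapply(l, function(m) m + 1)  # arbitrary second list
l3 <- lapply(l, function(m) m * 2)  # arbitrary third list

# Reduce folds over the outer list of lists; Map multiplies matching elements
out_reduce <- Reduce(function(x, y) Map(`*`, x, y), list(l, l2, l3))

# Reference result, computed position by position
out_loop <- lapply(seq_along(l), function(i) l[[i]] * l2[[i]] * l3[[i]])

all.equal(out_reduce, out_loop)
```

The same Reduce call works for any number of lists, as long as they all have the same length and conformable matrices.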

R matrix of lists

a=matrix(list(),2,2)
a[1,2]=list(2) ##works
a=matrix(list(),2,2)
a[1,2]=list(2,3) ##doesn't work
Error in a[1, 2] = list(2, 3) : number of items to replace is not a
multiple of replacement length
That's the error message from the fourth line. If I try
x=list()
x=list(2,4)
it works. I don't see the difference, as a[1,2] is a NULL list.
Thanks in advance.
When you replace with list(2), look at the output:
a=matrix(list(),2,2)
a[1,2]=list(2)
sapply(a, class)
# [1] "NULL" "NULL" "numeric" "NULL"
That is, the [1,2] element is not a list.
list(2,3) can't be coerced like that to a single element; as RichardScriven points out, the replacement must be length-1. The following works (a bit silly, I agree):
a = matrix(list( ), 2, 2)
a[1, 2] = list(list(2, 3))
a
# [,1] [,2]
# [1,] NULL List,2
# [2,] NULL NULL
(just for reference, I figured this out by playing with dput, like so:)
#What happens if we declare 'a' as a
# matrix of appropriately-sized lists to start with?
a <- matrix(replicate(4, vector("list", 2), simplify = FALSE), 2, 2)
a
# [,1] [,2]
# [1,] List,2 List,2
# [2,] List,2 List,2
#
# can we replace now?
a[1,2] <- list(2,3)
# (same error; what IS 'a[1,2]' for this matrix?)
dput(a[1, 2])
# list(list(NULL, NULL))
# BINGO! we must replace 'a[1,2]' with a length-one list.
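A possibly tidier alternative, offered as a sketch rather than as part of the answer above, is to assign with double brackets, which set the content of a single cell directly:

```r
a <- matrix(list(), 2, 2)
# [[<- stores the value as the content of one cell, so no extra wrapping is needed
a[[1, 2]] <- list(2, 3)

# The single-bracket form needs the length-one wrapper instead
b <- matrix(list(), 2, 2)
b[1, 2] <- list(list(2, 3))

a[[1, 2]]  # the two-element list
b[[1, 2]]  # the same two-element list
```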

positions of non-NA cells in a matrix

Consider the following matrix,
m <- matrix(letters[c(1,2,NA,3,NA,4,5,6,7,8)], 2, byrow=TRUE)
## [,1] [,2] [,3] [,4] [,5]
## [1,] "a" "b" NA "c" NA
## [2,] "d" "e" "f" "g" "h"
I wish to obtain the column indices corresponding to all non-NA elements, merged with the NA elements immediately following:
result <- c(list(1), list(2:3), list(4,5),
list(1), list(2), list(3), list(4), list(5))
Any ideas?
The column (and row) indices of non-NA elements can be obtained with
which(!is.na(m), TRUE)
A full answer:
Since you want to work row-wise, but R stores matrices column-wise, it is easier to work on the transpose of m.
t_m <- t(m)
n_cols <- ncol(m)
We get the array indices as mentioned above, which gives the start point of each list.
ind_non_na <- which(!is.na(t_m), TRUE)
Since we are working on the transpose, we want the row indices, and we need to deal with each column separately.
start_points <- split(ind_non_na[, 1], ind_non_na[, 2])
The length of each list is given by the difference between starting points, or the difference between the last point and the end of the row (+1). Then we just call seq to get a sequence.
unlist(
lapply(
start_points,
function(x)
{
len <- c(diff(x), n_cols - x[length(x)] + 1L)
mapply(seq, x, length.out = len, SIMPLIFY = FALSE)
}
),
recursive = FALSE
)
This will get you close:
cols <- col(m)
cbind(cols[which(is.na(m))-1],cols[is.na(m)])
[,1] [,2]
[1,] 2 3
[2,] 4 5
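Another way to get the grouping, sketched here under the assumption that every row starts with a non-NA cell, is to number the groups with a running count of non-NA entries and split the column indices on it:

```r
m <- matrix(letters[c(1,2,NA,3,NA,4,5,6,7,8)], 2, byrow=TRUE)

groups <- unlist(
  lapply(seq_len(nrow(m)), function(i) {
    # cumsum() starts a new group at each non-NA cell, so the NA cells
    # that follow inherit the group of the preceding non-NA cell
    unname(split(seq_len(ncol(m)), cumsum(!is.na(m[i, ]))))
  }),
  recursive = FALSE
)

str(groups)
```

If a row could begin with NA, those leading cells would form a group of their own and would need separate handling.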
