I have an array in R, created by a function like this:
A <- array(data=NA, dim=c(2,4,4), dimnames=list(c("x","y"),NULL,NULL))
And I would like to select along one dimension, so for the example above I would have:
A["x",,]
dim(A["x",,]) #[1] 4 4
Is there a way to generalize if I do not know in advance how many dimensions (in addition to the named one I want to select by) my array might have? I would like to write a function that takes input that might be formatted as A above, or as:
B <- c(1,2)
names(B) <- c("x", "y")
C <- matrix(1, 2, 2, dimnames=list(c("x","y"),NULL))
Background
The general background is that I am working on an ODE model, so for deSolve's ODE function it must take a single named vector with my current state. For some other functions, like calculating phase-planes/direction fields, it would be more practical to have a higher-dimensional array to apply the differential equation to, and I would like to avoid having many copies of the same function, simply with different numbers of commas after the dimension I want to select.
I spent quite a lot of time figuring out the fastest way to do this for plyr, and the best I could come up with was manually constructing the call to [:
index_array <- function(x, dim, value, drop = FALSE) {
# Create list representing arguments supplied to [
# bquote() creates an object corresponding to a missing argument
indices <- rep(list(bquote()), length(dim(x)))
indices[[dim]] <- value
# Generate the call to [
call <- as.call(c(
list(as.name("["), quote(x)),
indices,
list(drop = drop)))
# Print it, just to make it easier to see what's going on
print(call)
# Finally, evaluate it
eval(call)
}
(You can find more information about this technique at https://github.com/hadley/devtools/wiki/Computing-on-the-language)
You can then use it as follows:
A <- array(data=NA, dim=c(2,4,4), dimnames=list(c("x","y"),NULL,NULL))
index_array(A, 2, 2)
index_array(A, 2, 2, drop = TRUE)
index_array(A, 3, 2, drop = TRUE)
It would also generalise in a straightforward way if you want to extract based on more than one dimension, but you'd need to rethink the arguments to the function.
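As a rough sketch of that generalisation, the same call-building idea can take a vector of dimensions and a list of values. The name index_array_multi and its argument layout are hypothetical, chosen only for illustration, and quote(expr = ) is used here as another way of producing the missing argument:

```r
# Hypothetical multi-dimension variant of index_array(); the name and the
# dims/values argument layout are assumptions made for this sketch.
index_array_multi <- function(x, dims, values, drop = FALSE) {
  stopifnot(length(dims) == length(values))
  # Start with one missing argument per dimension, i.e. x[, , ]
  indices <- rep(list(quote(expr = )), length(dim(x)))
  # Fill in only the dimensions we want to select on
  for (i in seq_along(dims)) {
    indices[[dims[i]]] <- values[[i]]
  }
  # Build and evaluate the call to [
  call <- as.call(c(list(as.name("["), quote(x)), indices, list(drop = drop)))
  eval(call)
}

A <- array(1:32, dim = c(2, 4, 4),
           dimnames = list(c("x", "y"), LETTERS[1:4], letters[1:4]))
res <- index_array_multi(A, dims = c(1, 2),
                         values = list("x", c("C", "D")), drop = TRUE)
```

Here res should hold the same values as A["x", c("C", "D"), ].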
I wrote this general function. Not necessarily super fast but a nice application for arrayInd and matrix indexing:
extract <- function(A, .dim, .value) {
val.idx <- match(.value, dimnames(A)[[.dim]])
all.idx <- arrayInd(seq_along(A), dim(A))
keep.idx <- all.idx[all.idx[, .dim] == val.idx, , drop = FALSE]
array(A[keep.idx], dim = dim(A)[-.dim], dimnames = dimnames(A)[-.dim])
}
Example:
A <- array(data=1:32, dim=c(2,4,4),
dimnames=list(c("x","y"), LETTERS[1:4], letters[1:4]))
extract(A, 1, "x")
extract(A, 2, "D")
extract(A, 3, "b")
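To make the arrayInd step less opaque, here is a small sketch on a 2 x 3 matrix. The object names are made up for the example; it only illustrates how linear positions map to (row, column) coordinates and how a matrix of coordinates can then be used as a subscript:

```r
m <- matrix(11:16, nrow = 2)

# One row per element of m, one column per dimension
idx <- arrayInd(seq_along(m), dim(m))

# Keep the coordinates whose first index is 2, i.e. the second row of m
keep <- idx[idx[, 1] == 2, , drop = FALSE]

# Indexing with a coordinate matrix picks out exactly those elements
second_row <- m[keep]
```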
The abind package has a function, asub, to do this in addition to other very useful array manipulation functions:
library(abind)
A <- array(data=1:32, dim=c(2,4,4),
dimnames=list(c("x","y"), LETTERS[1:4], letters[1:4]))
asub(A, 'x', 1)
asub(A, 'D', 2)
asub(A, 'b', 3)
And it allows indexing in multiple dimensions:
asub(A, list('x', c('C', 'D')), c(1,2))
Perhaps there is an easier way, but this works:
do.call("[",c(list(A,"x"),lapply(dim(A)[-1],seq)))
[,1] [,2] [,3] [,4]
[1,] NA NA NA NA
[2,] NA NA NA NA
[3,] NA NA NA NA
[4,] NA NA NA NA
Let's generalize it into a function that can extract from any dimension, not necessarily the first one:
extract <- function(A, .dim, .value) {
idx.list <- lapply(dim(A), seq_len)
idx.list[[.dim]] <- .value
do.call(`[`, c(list(A), idx.list))
}
Example:
A <- array(data=1:32, dim=c(2,4,4),
dimnames=list(c("x","y"), LETTERS[1:4], letters[1:4]))
extract(A, 1, "x")
extract(A, 2, "D")
extract(A, 3, "b")
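The question also asked about a named vector and a matrix. A quick check suggests the same function handles them, on the assumption that the names of a plain vector play the role of its first (and only) dimension; for a vector dim() is NULL, so the index list starts out empty:

```r
# Same function as above, repeated so the example runs on its own
extract <- function(A, .dim, .value) {
  idx.list <- lapply(dim(A), seq_len)
  idx.list[[.dim]] <- .value
  do.call(`[`, c(list(A), idx.list))
}

A <- array(NA, dim = c(2, 4, 4), dimnames = list(c("x", "y"), NULL, NULL))
B <- c(x = 1, y = 2)
C <- matrix(1, 2, 2, dimnames = list(c("x", "y"), NULL))

from_vector <- extract(B, 1, "x")   # behaves like B["x"]
from_matrix <- extract(C, 1, "x")   # behaves like C["x", ]
from_array  <- extract(A, 1, "x")   # behaves like A["x", , ]
```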
My question is about getting rid of a for loop while retaining the functionality of the code.
I have a matrix of pairwise orderings of elements A_1, A_2, ... A_N. Each ordering is represented as a row of a matrix. The code below shows an example.
# Matrix representing the relations
# A1 < A2, A1 < A5, A2 < A4
(mat <- matrix(c(1, 2, 1, 5, 2, 4), ncol = 2, byrow = TRUE))
#> [,1] [,2]
#> [1,] 1 2
#> [2,] 1 5
#> [3,] 2 4
I want this whole matrix as a set of ordered pairs. The reason is that I later need to generate the transitive closure of these relations. I have been using the sets package and created the function below.
create_sets <- function(mat){
# Empty set
my_set <- sets::set()
# For loop for adding pair elements to the set, one at a time
for(i in seq(from = 1, to = nrow(mat), by = 1)){
my_set <- sets::set_union(my_set,
sets::pair(mat[[i, 1]], mat[[i, 2]]))
}
return(my_set)
}
create_sets(mat)
#> {(1, 2), (1, 5), (2, 4)}
This function works well, but I believe the for loop is unnecessary, and I have not managed to replace it. For the particular example matrix above with exactly three rows, I could instead have used the following code:
my_set2 <- sets::set(
sets::pair(mat[[1, 1]], mat[[1, 2]]),
sets::pair(mat[[2, 1]], mat[[2, 2]]),
sets::pair(mat[[3, 1]], mat[[3, 2]])
)
my_set2
#> {(1, 2), (1, 5), (2, 4)}
The reason why this works, is that sets::set takes any number of pairs.
args(sets::set)
#> function (...)
#> NULL
However, the matrix mat will have an arbitrary number of rows, and I want the function to be able to handle all possible cases. This is why I have not been able to get rid of the for loop.
My question is hence: Given a matrix mat in which each row represents an ordered pair, is there some generic way of passing the pairs in each row as separate arguments to sets::set, without looping?
The OP has asked
[...] is there some generic way of passing the pairs in each row as separate arguments to sets::set, without looping?
Yes, the do.call() function is probably what you are looking for. From help(do.call):
do.call constructs and executes a function call from a name or a function and a list of arguments to be passed to it.
So, OP's create_sets() function can be replaced by
do.call(sets::set, apply(mat, 1, function(x) sets::pair(x[1], x[2])))
{(1, 2), (1, 5), (2, 4)}
The second argument to do.call() requires a list. This is created by
apply(mat, 1, function(x) sets::pair(x[1], x[2]))
which returns the list
[[1]]
(1, 2)
[[2]]
(1, 5)
[[3]]
(2, 4)
apply(mat, 1, FUN) is a kind of implied for loop which loops over the rows of a matrix mat and takes the vector of row values as argument when calling function FUN.
Edit: as.tuple() instead of pair()
The pair() function requires exactly two arguments. This is why we were forced to define an anonymous function function(x) sets::pair(x[1], x[2]).
The as.tuple() function coerces an object into a tuple, so the code can be simplified further:
do.call(sets::set, apply(mat, 1, sets::as.tuple))
{(1, 2), (1, 5), (2, 4)}
Here, as.tuple() takes the whole vector of row values and coerces it to a tuple.
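If you do not have the sets package at hand, the mechanism can be illustrated in base R. The sketch below uses rbind purely as a stand-in for any function that accepts an arbitrary number of arguments through "..."; the variable names are made up for the example:

```r
mat <- matrix(c(1, 2, 1, 5, 2, 4), ncol = 2, byrow = TRUE)

# One list element per row of the matrix
rows <- lapply(seq_len(nrow(mat)), function(i) mat[i, ])

# do.call(f, list(a, b, c)) is equivalent to calling f(a, b, c)
via_do_call <- do.call(rbind, rows)
by_hand     <- rbind(rows[[1]], rows[[2]], rows[[3]])
```

Both calls give the same matrix, but only the do.call() version works unchanged for any number of rows.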
Option 1: do nothing
for loops aren't always the end of the world; this doesn't look too bad if your matrices aren't enormous.
Option 2: the split, apply, combine way (by way of a new function)
Write a function that combines the row things (there is a shorter way to do this, but this makes your task explicit)
f <- function(x) {
sets::pair(x[1], x[2])
}
Reduce(sets::set_union, lapply(split(mat, 1:nrow(mat)), f))
## {(1, 2), (1, 5), (2, 4)}
The Reduce does the same thing as the for loop (it repeatedly applies set_union), and the lapply turns the matrix into a list of pairs (also like a for loop would).
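The equivalence between Reduce and the accumulating for loop can be seen with base R's union, used here only as a stand-in for sets::set_union so the example needs no extra package:

```r
pieces <- list(c(1, 2), c(2, 3), c(3, 4, 5))

# Accumulating for loop, as in the question
acc <- numeric(0)
for (p in pieces) {
  acc <- union(acc, p)
}

# The same accumulation expressed with Reduce
reduced <- Reduce(union, pieces)
```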
Suppose I have a matrix,
mat <- matrix((1:9)^2, 3, 3)
I can slice the matrix like so
> mat[2:3, 2]
[1] 25 36
How does one store the subscript as a variable? That is, what should my_sub be, such that
> mat[my_sub]
[1] 25 36
A list gets "invalid subscript type" error. A vector will lose the multidimensionality. Seems like such a basic operation to not have a primitive type that fits this usage.
I know I can access the matrix via vector addressing, which means converting from [2:3, 2] to c(5, 6), but that mapping presumes knowledge of matrix shape. What if I simply want [2:3, 2] for any matrix shape (assuming it is at least those dimensions)?
Here are some alternatives. They both generalize to higher dimensional arrays.
1) matrix subscripting If the indexes are all scalar except possibly one, as in the question, then:
mi <- cbind(2:3, 2)
mat[mi]
# test
identical(mat[mi], mat[2:3, 2])
## [1] TRUE
In higher dimensions:
a <- array(1:24, 2:4)
mi <- cbind(2, 2:3, 3)
a[mi]
# test
identical(a[mi], a[2, 2:3, 3])
## [1] TRUE
It would be possible to extend this to eliminate the scalar restriction using:
L <- list(2:3, 2:3)
array(mat[as.matrix(do.call(expand.grid, L))], lengths(L))
however, in light of (2) which also uses do.call but avoids the need for expand.grid it seems unnecessarily complex.
2) do.call This approach does not have the scalar limitation. mat and a are from above:
L2 <- list(2:3, 1:2)
do.call("[", c(list(mat), L2))
# test
identical(do.call("[", c(list(mat), L2)), mat[2:3, 1:2])
## [1] TRUE
L3 <- list(2, 2:3, 3:4)
do.call("[", c(list(a), L3))
# test
identical(do.call("[", c(list(a), L3)), a[2, 2:3, 3:4])
## [1] TRUE
This could be made prettier by defining:
`%[%` <- function(x, indexList) do.call("[", c(list(x), indexList))
mat %[% list(2:3, 1:2)
a %[% list(2, 2:3, 3:4)
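To connect this back to the question, the sketch below stores the subscript once and reuses it on matrices of different shapes. The helper name index_with is hypothetical; it is just the do.call idea from (2) wrapped in an ordinary function:

```r
# Hypothetical helper: apply a stored list of indices to any array
index_with <- function(x, index_list) do.call("[", c(list(x), index_list))

my_sub <- list(2:3, 2)          # the stored subscript, i.e. [2:3, 2]

m1 <- matrix((1:9)^2, 3, 3)
m2 <- matrix(1:20, 4, 5)

from_m1 <- index_with(m1, my_sub)
from_m2 <- index_with(m2, my_sub)
```

No knowledge of the matrix shape is needed, as long as each matrix has at least 3 rows and 2 columns.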
Use which() with the argument arr.ind = TRUE.
x <- c(25, 36)
# mat == x only works here through vector recycling (hence a warning);
# %in% is safer, but it drops the dim attribute, so restore it first
inx <- which(array(mat %in% x, dim(mat)), arr.ind = TRUE)
mat[inx]
#[1] 25 36
This is an interesting question. The subset function can actually help. You cannot subset directly your matrix using a vector or a list, but you can store the indexes in a list and use subset to do the trick.
mat <- matrix(1:12, nrow=4)
mat[2:3, 1:2]
# example using subset
subset(mat, subset = 1:nrow(mat) %in% 2:3, select = 1:2)
# double check
identical(mat[2:3, 1:2],
subset(mat, subset = 1:nrow(mat) %in% 2:3, select = 1:2))
# TRUE
Actually, we can write a custom function if we want to store the row- and column- indexes in the same list.
cust.subset <- function(mat, dim.list){
subset(mat, subset = 1:nrow(mat) %in% dim.list[[1]], select = dim.list[[2]])
}
# initialize a list that includes your sub-setting indexes
sbdim <- list(2:3, 1:2)
sbdim
# [[1]]
# [1] 2 3
# [[2]]
# [1] 1 2
# subset using your custom f(x) and your list
cust.subset(mat, sbdim)
# [,1] [,2]
# [1,] 2 6
# [2,] 3 7
I've created code that will take an input vector, create a dataframe based on the input, optimise some values and return some of these values. I'm now turning this into a function that will apply the calculations rowwise on an input dataframe. Below is a minimum working example of what I would like to achieve (my actual function would be too long to share here!):
# Randomly generated dataframe
df <- data.frame(a = rnorm(10, 0, 1), x = rnorm(10, 1, 3), y = rnorm(10, 2, 3))
# Function that takes multiple arguments and returns multiple values in a list
zsummary <- function(x, y) {
if (y < 0) return(list(NA, NA))
z = rnorm(10, x, abs(y))
return(list(mean(z), sd(z)))
}
# Example of something that works using dplyr
# However, this results in a lot of function calls...
# especially if there were a lot of columns in the list...
library(dplyr)
df %>% rowwise() %>%
mutate(mean = zsummary(x,y)[[1]], sd = zsummary(x,y)[[2]])
As you can see, I can't apply individual functions to each of the new df$mean and df$sd columns, as they depend on a z vector that can only be generated once. I've looked around on SO already, but I haven't been able to find an answer yet. I think a solution would be using one of the apply functions and not something from dplyr, but I've honestly never fully understood apply functions. I would also prefer to avoid solutions that use for loops with rbind, as I've tried this in previous projects and for large dataframes it becomes very slow!
We can use mapply for this. As zsummary takes two arguments, mapply is one option, since it takes the corresponding elements of 'x' and 'y' and applies zsummary to each pair.
t(mapply(zsummary, df$x, df$y))
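If the goal is to attach the results as new columns, one possible sketch is to have the function return a named numeric vector rather than a list, so that mapply simplifies to a plain numeric matrix. The name zsummary_vec is hypothetical, and the seed is set only to make the example reproducible:

```r
set.seed(1)
df <- data.frame(a = rnorm(10, 0, 1), x = rnorm(10, 1, 3), y = rnorm(10, 2, 3))

# Hypothetical variant of zsummary() returning a named numeric vector
zsummary_vec <- function(x, y) {
  if (y < 0) return(c(mean = NA_real_, sd = NA_real_))
  z <- rnorm(10, x, abs(y))     # z is generated once per row
  c(mean = mean(z), sd = sd(z))
}

# mapply gives a 2 x n matrix; transpose it to one row per data frame row
res <- t(mapply(zsummary_vec, df$x, df$y))
out <- cbind(df, res)
```

Each row of out then carries the mean and sd computed from the same z vector.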
We can also change the function slightly and get the output with dplyr
zsummary <- function(x, y) {
if (y < 0) return(data.frame(mean = NA, sd = NA))
z = rnorm(10, x, abs(y))
data.frame(mean = mean(z), sd = sd(z))
}
df %>%
rowwise() %>%
do(data.frame(., zsummary(.$x, .$y)))
Or as we discussed in the comments, instead of having the function taking multiple arguments, have a single argument and use apply with MARGIN=1 for applying it on each row.
zsummary2 <- function(v1){
if(v1[2] < 0) return(c(mean = NA, sd = NA))
z <- rnorm(10, v1[1], abs(v1[2]))
c(mean = mean(z), sd = sd(z))
}
t(apply(df[-1], 1, zsummary2))
# mean sd
# [1,] 1.403066 0.8757504
# [2,] 5.058188 5.1401507
# [3,] 4.288365 1.4194393
# [4,] 1.932829 6.7587054
# [5,] -1.864236 3.7587462
# [6,] NA NA
# [7,] 3.328629 1.3711950
# [8,] -2.347699 5.0449958
# [9,] 2.936615 1.7332283
#[10,] NA NA
NOTE: The values will be different in each run as we didn't set any seed for the rnorm.
I wish to combine equivalent, deeply-nested columns from all elements of a reasonably long list. What I would like to do, though it's not possible in R, is this:
combined.columns <- my.list[[1:length(my.list)]]$my.matrix[,"my.column"]
The only thing I can think of is to manually type out all the elements in cbind() like this:
combined.columns <- cbind(my.list[[1]]$my.matrix[,"my.column"], my.list[[2]]$my.matrix[,"my.column"], . . . )
This answer is pretty close to what I need, but I can't figure out how to make it work for the extra level of nesting.
There must be a more elegant way of doing this, though. Any ideas?
Assuming all your matrices have the same column name you wish to extract you could use sapply
set.seed(123)
my.list <- vector("list")
my.list[[1]] <- list(my.matrix = data.frame(A=rnorm(10,sd=0.3), B=rnorm(10,sd=0.3)))
my.list[[2]] <- list(my.matrix = data.frame(C=rnorm(10,sd=0.3), B=rnorm(10,sd=0.3)))
my.list[[3]] <- list(my.matrix = data.frame(D=rnorm(10,sd=0.3), B=rnorm(10,sd=0.3)))
sapply(my.list, FUN = function(x) x$my.matrix[,"B"])
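If you specifically want the cbind() result from the question, a sketch along the same lines is to extract the column from each element with lapply and hand the resulting list to cbind through do.call. The toy data below is made up for illustration and uses the element and column names from the question:

```r
# Toy data: two list elements, each holding a matrix with a "my.column" column
my.list <- list(
  list(my.matrix = matrix(1:4, 2, dimnames = list(NULL, c("my.column", "other")))),
  list(my.matrix = matrix(5:8, 2, dimnames = list(NULL, c("my.column", "other"))))
)

# Pull the column out of every element, then bind the pieces side by side
cols <- lapply(my.list, function(el) el$my.matrix[, "my.column"])
combined.columns <- do.call(cbind, cols)
```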
Free data:
myList <- list(list(myMat = matrix(1:10, 2, dimnames=list(NULL, letters[1:5])),
myVec = 1:10),
list(myMat = matrix(10:1, 2, dimnames=list(NULL, letters[1:5])),
myVec = 10:1))
We can get column a of myMat a few different ways. Here's one that uses with.
sapply(myList, with, myMat[,"a"])
# [,1] [,2]
# [1,] 1 10
# [2,] 2 9
This mapply one might be better for a more recursive type problem. It works too and might be faster than sapply.
mapply(function(x, y, z) x[[y]][,z] , myList, "myMat", "a")
# [,1] [,2]
# [1,] 1 10
# [2,] 2 9
I can't believe this is taking me this long to figure out, and I still can't figure it out.
I need to keep a collection of vectors, and later check that a certain vector is in that collection. I tried lists combined with %in% but that doesn't appear to work properly.
My next idea was to create a matrix and rbind vectors to it, but now I don't know how to check if a vector is contained in a matrix. %in% appears to compare sets and not exact rows. The same appears to apply to intersect.
Help much appreciated!
Do you mean like this:
wantVec <- c(3,1,2)
myList <- list(A = c(1:3), B = c(3,1,2), C = c(2,3,1))
sapply(myList, function(x, want) isTRUE(all.equal(x, want)), wantVec)
## or, is the vector in the set?
any(sapply(myList, function(x, want) isTRUE(all.equal(x, want)), wantVec))
We can do a similar thing with a matrix:
myMat <- matrix(unlist(myList), ncol = 3, byrow = TRUE)
## As the vectors are now in the rows, we use apply over the rows
apply(myMat, 1, function(x, want) isTRUE(all.equal(x, want)), wantVec)
## or
any(apply(myMat, 1, function(x, want) isTRUE(all.equal(x, want)), wantVec))
Or by columns:
myMat2 <- matrix(unlist(myList), ncol = 3)
## As the vectors are now in the cols, we use apply over the cols
apply(myMat2, 2, function(x, want) isTRUE(all.equal(x, want)), wantVec)
## or
any(apply(myMat2, 2, function(x, want) isTRUE(all.equal(x, want)), wantVec))
If you need to do this a lot, write your own function
vecMatch <- function(x, want) {
isTRUE(all.equal(x, want))
}
And then use it, e.g. on the list myList:
> sapply(myList, vecMatch, wantVec)
A B C
FALSE TRUE FALSE
> any(sapply(myList, vecMatch, wantVec))
[1] TRUE
Or even wrap the whole thing:
vecMatch <- function(x, want) {
out <- sapply(x, function(x, want) isTRUE(all.equal(x, want)), want)
any(out)
}
> vecMatch(myList, wantVec)
[1] TRUE
> vecMatch(myList, 5:3)
[1] FALSE
EDIT: Quick comment on why I used isTRUE() wrapped around the all.equal() calls. This is due to the fact that where the two arguments are not equal, all.equal() doesn't return a logical value (FALSE):
> all.equal(1:3, c(3,2,1))
[1] "Mean relative difference: 1"
isTRUE() is useful here because it returns TRUE iff its argument is TRUE, whilst it returns FALSE if it is anything else.
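A related point is why all.equal() is used here instead of identical(). The short sketch below shows the difference when an integer vector is compared with a double vector holding the same values:

```r
x <- 1:3          # integer vector
y <- c(1, 2, 3)   # double vector with the same values

strict  <- identical(x, y)            # types differ, so not identical
lenient <- isTRUE(all.equal(x, y))    # equal up to numeric tolerance

# When the values differ, all.equal() returns a description, not FALSE
msg <- all.equal(1:3, c(3, 2, 1))
```

So identical() is the stricter test, and it may reject vectors you would consider the same if one was created with the : operator and the other with c().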
> M
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
v <- c(2, 5, 8)
check each column:
c1 <- which(M[, 1] == v[1])
c2 <- which(M[, 2] == v[2])
c3 <- which(M[, 3] == v[3])
Here is a way to still use intersect() on more than 2 elements
> intersect(intersect(c1, c2), c3)
[1] 2
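For a matrix with any number of columns, the nested intersect() calls can be folded with Reduce. This is only a sketch of the same idea, with illustrative variable names:

```r
M <- matrix(1:9, nrow = 3)
v <- c(2, 5, 8)

# For each column, the rows where that column matches the corresponding value
per_column <- lapply(seq_along(v), function(j) which(M[, j] == v[j]))

# Rows that match in every column
matching_rows <- Reduce(intersect, per_column)

# The vector is contained in the matrix if at least one row matches
is_row <- length(matching_rows) > 0
```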