How to pass an arbitrary number of arguments to an R function without a for loop?

My question is about getting rid of a for loop while retaining the functionality of the code.
I have a matrix of pairwise orderings of elements A_1, A_2, ... A_N. Each ordering is represented as a row of a matrix. The code below shows an example.
# Matrix representing the relations
# A1 < A2, A1 < A5, A2 < A4
(mat <- matrix(c(1, 2, 1, 5, 2, 4), ncol = 2, byrow = TRUE))
#> [,1] [,2]
#> [1,] 1 2
#> [2,] 1 5
#> [3,] 2 4
I want this whole matrix as a set of ordered pairs. The reason is that I later need to generate the transitive closure of these relations. I have been using the sets package and created the function below.
create_sets <- function(mat){
# Empty set
my_set <- sets::set()
# For loop for adding pair elements to the set, one at a time
for(i in seq(from = 1, to = nrow(mat), by = 1)){
my_set <- sets::set_union(my_set,
sets::pair(mat[[i, 1]], mat[[i, 2]]))
}
return(my_set)
}
create_sets(mat)
#> {(1, 2), (1, 5), (2, 4)}
This function works well, but I believe the for loop is unnecessary, and I have not been able to replace it. For the particular example matrix above, with exactly three rows, I could instead have used the following code:
my_set2 <- sets::set(
sets::pair(mat[[1, 1]], mat[[1, 2]]),
sets::pair(mat[[2, 1]], mat[[2, 2]]),
sets::pair(mat[[3, 1]], mat[[3, 2]])
)
my_set2
#> {(1, 2), (1, 5), (2, 4)}
This works because sets::set takes any number of arguments, so any number of pairs can be passed to it.
args(sets::set)
#> function (...)
#> NULL
However, the matrix mat will have an arbitrary number of rows, and I want the function to be able to handle all possible cases. This is why I have not been able to get rid of the for loop.
My question is hence: Given a matrix mat in which each row represents an ordered pair, is there some generic way of passing the pairs in each row as separate arguments to sets::set, without looping?

The OP has asked
[...] is there some generic way of passing the pairs in each row as separate arguments to sets::set, without looping?
Yes, the do.call() function is probably what you are looking for. From help(do.call):
do.call constructs and executes a function call from a name or a function and a list of arguments to be passed to it.
So, OP's create_sets() function can be replaced by
do.call(sets::set, apply(mat, 1, function(x) sets::pair(x[1], x[2])))
{(1, 2), (1, 5), (2, 4)}
The second argument to do.call() requires a list. This is created by
apply(mat, 1, function(x) sets::pair(x[1], x[2]))
which returns the list
[[1]]
(1, 2)
[[2]]
(1, 5)
[[3]]
(2, 4)
apply(mat, 1, FUN) is a kind of implied for loop which loops over the rows of a matrix mat and takes the vector of row values as argument when calling function FUN.
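As a minimal base-R sketch of the same mechanism (rbind stands in for sets::set here, and the names rows, via_do_call and explicit are purely illustrative), do.call() spreads the elements of a list into separate arguments. The list is built with lapply() over row indices in this sketch, because apply() can return a matrix rather than a list when FUN returns plain atomic vectors.

```r
mat <- matrix(c(1, 2, 1, 5, 2, 4), ncol = 2, byrow = TRUE)

# One list element per row of mat
rows <- lapply(seq_len(nrow(mat)), function(i) mat[i, ])

# do.call(f, list(a, b, c)) builds and evaluates the call f(a, b, c)
via_do_call <- do.call(rbind, rows)
explicit    <- rbind(rows[[1]], rows[[2]], rows[[3]])

# Both routes should produce the same matrix
identical(via_do_call, explicit)
```

The explicit call only works for exactly three rows, whereas the do.call() version does not depend on the number of list elements.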
Edit: as.tuple() instead of pair()
The pair() function requires exactly two arguments. This is why we were forced to define the anonymous function function(x) sets::pair(x[1], x[2]).
The as.tuple() function coerces an object, here the vector of row values, into a tuple. So, the code can be simplified even further:
do.call(sets::set, apply(mat, 1, sets::as.tuple))
{(1, 2), (1, 5), (2, 4)}
Here, as.tuple() takes the whole vector of row values and coerces it to a tuple.

Option 1: do nothing
for loops aren't always the end of the world. This one doesn't look too bad if your matrices aren't enormous.
Option 2: the split, apply, combine way (by way of a new function)
Write a function that turns one row into a pair (there is a shorter way to do this, but this makes the task explicit):
f <- function(x) {
sets::pair(x[1], x[2])
}
Reduce(sets::set_union, lapply(split(mat, 1:nrow(mat)), f))
## {(1, 2), (1, 5), (2, 4)}
Reduce does the same thing as the for loop (it repeatedly applies set_union), and lapply turns the rows of the matrix into a list of pairs (also as a for loop would).
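To make the correspondence between Reduce and the loop concrete, here is a small base-R sketch. It uses base::union on plain numeric vectors instead of sets::set_union, and the names chunks, acc and folded are only illustrative.

```r
chunks <- list(c(1, 2), c(2, 3), c(3, 4, 5))

# Explicit loop: fold each remaining element into the accumulator
acc <- chunks[[1]]
for (chunk in chunks[-1]) {
  acc <- union(acc, chunk)
}

# Reduce performs the same left-to-right fold
folded <- Reduce(union, chunks)

identical(acc, folded)
```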

Related

outer reuses first element of X instead of doing its job

I have a two argument function that takes as its first input a triple of pairs of numbers in the form "(a, b)(c, d)(e, f)" (as a character string) and as second argument a pair of numbers (also written as a character string of the form "(a, b)") and outputs a logical that states if the pair (the second argument) is one of the three pairs in the triple (the first argument). I actually wrote two versions:
version1 <- function(x, y){#x is a triple of pairs, y is a pair
pairsfromthistriple <- paste(c("", "(", "("), strsplit(x, split = ")(", fixed = T)[[1]], c(")", ")", ""), sep = "")
y %in% pairsfromthistriple
}
version2 <- function(x, y){#x is triple of pairs, y is pair
y == substr(x, 1, 6) | y == substr(x, 7, 12) | y == substr(x, 13, 18)
}
I want to apply this function to every triple of pairs from a vector of triples and every pair from some vector of pairs using outer. Here I'll use the following very short vectors:
triples <- c("(1, 2)(3, 4)(5, 6)", "(1, 2)(3, 5)(4, 6)")
names(triples) <- triples
pairs <- c("(5, 6)", "(3, 5)")
names(pairs) <- pairs
So here we go:
test1 <- outer(X = triples, Y = pairs, FUN = version1)
test2 <- outer(X = triples, Y = pairs, FUN = version2)
test2 evaluates to exactly what you would expect, but test1 gives nonsensical output:
> test1
(5, 6) (3, 5)
(1, 2)(3, 4)(5, 6) TRUE FALSE
(1, 2)(3, 5)(4, 6) TRUE FALSE
> test2
(5, 6) (3, 5)
(1, 2)(3, 4)(5, 6) TRUE FALSE
(1, 2)(3, 5)(4, 6) FALSE TRUE
The natural conclusion is that there is an error in version1, but it is not as simple as that. 'Manually' computing the terms in the matrix using version1 gives:
> version1(triples[1], pairs[1])
[1] TRUE
> version1(triples[1], pairs[2])
[1] FALSE
> version1(triples[2], pairs[1])
[1] FALSE
> version1(triples[2], pairs[2])
[1] TRUE
exactly as it should! So at least part of the fault is with the function outer. In fact what happens (in this small example it is not so clear, but this is very visible in larger examples) is that outer correctly computes the first row of its output matrix, but then copies this first row over and over to make up the subsequent rows. Obviously this is not what I want. If I only wanted to compute version1(x, y) for all y in some vector but just one single x, I would have used sapply rather than outer.
What is going on here?
Note this detail from the documentation for ?outer:
X and Y must be suitable arguments for FUN. Each will be extended by rep to length the products of the lengths of X and Y before FUN is called.
FUN is called with these two extended vectors as arguments (plus any arguments in ...). It must be a vectorized function (or the name of one) expecting at least two arguments and returning a value with the same length as the first (and the second).
Your version1 function is not vectorized the way version2 is. You can see this by calling both on the original triples and pairs vectors: each pair occurs in the triple at the same position, so both calls should return TRUE TRUE.
version1(triples, pairs)
#> [1] TRUE FALSE
version2(triples, pairs)
#> (5, 6) (3, 5)
#> TRUE TRUE
Your version1 function seems designed for use with apply(): strsplit() returns a list with one element per input string, but you keep only the first element with [[1]]. If you want to keep the approach of splitting the strings, you would have to use the apply family of functions. Without them, every element of y is compared against the pieces of the first triple only, so no element-wise comparison between x and y takes place.
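However, I would just use something very simple. stringr::str_detect is already vectorized over both string and pattern, so you can use it directly. Note that it treats each pattern as a regular expression by default, so the parentheses in a pair act as a group rather than as literal characters; that gives the same result here, and wrapping the pattern in stringr::fixed() would force a literal match.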
library(stringr)
outer(X = triples, Y = pairs, FUN = str_detect)
#> (5, 6) (3, 5)
#> (1, 2)(3, 4)(5, 6) TRUE FALSE
#> (1, 2)(3, 5)(4, 6) FALSE TRUE
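If you would rather keep version1 as written, one option (a sketch, not the only way) is to wrap it in base R's Vectorize(), which calls the function once per pair of elements through mapply(). The definitions are repeated from the question so the snippet runs on its own.

```r
version1 <- function(x, y) {
  pairsfromthistriple <- paste(c("", "(", "("),
                               strsplit(x, split = ")(", fixed = TRUE)[[1]],
                               c(")", ")", ""), sep = "")
  y %in% pairsfromthistriple
}

triples <- c("(1, 2)(3, 4)(5, 6)", "(1, 2)(3, 5)(4, 6)")
names(triples) <- triples
pairs <- c("(5, 6)", "(3, 5)")
names(pairs) <- pairs

# Vectorize() makes version1 accept the expanded vectors that outer() passes in
test1 <- outer(X = triples, Y = pairs, FUN = Vectorize(version1))
test1
```

This is usually slower than a truly vectorized function such as version2, because the wrapped function is still called once per cell of the result.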

Extracting sub-elements from list of lists vs list of vectors using lapply

How does lapply extract sub-elements from a list? More specifically, how does lapply extract sub-elements from a list of lists versus a list of vectors? Even more specifically, suppose I have the following:
my_list_of_lists <- list(list(a = 1, b = 2), list(a = 2, c = 3), list(b = 4, c = 5))
my_list_of_lists[[1]][["a"]] # just checking
# [1] 1
# that's what I expected
and apply the following:
lapply(my_list_of_lists, function(x) x[["a"]])
# [[1]]
# [1] 1
#
# [[2]]
# [1] 2
#
# [[3]]
# NULL
So lapply extracts the a element from each of the 3 sublists, returning each in its own list, contained in the length=3 list. At this point, my mental model is the following: lapply applies FUN to each element of my_list, returning FUN(my_list[[i]]) for i in 1:3. Great! So I expect my mental model should work for lists of vectors as well. For example,
my_list_of_vecs <- list(c(a = 1, b = 2), c(a = 2, c = 3), c(b = 4, c = 5))
my_list_of_vecs[[1]][["a"]] # Just checking
# [1] 1
# that's what I expected
and apply the following:
lapply(my_list_of_vecs, function(x) x[["a"]])
# Error in x[["a"]] : subscript out of bounds
# Wait...What!?
What's going on here!? Shouldn't this just work? I found a section in help(lapply) which might be relevant:
For historical reasons, the calls created by lapply are unevaluated,
and code has been written (e.g., bquote) that relies on this. This
means that the recorded call is always of the form FUN(X[[i]], ...),
with i replaced by the current (integer or double) index. This is not
normally a problem, but it can be if FUN uses sys.call or match.call
or if it is a primitive function that makes use of the call. This
means that it is often safer to call primitive functions with a
wrapper, so that e.g. lapply(ll, function(x) is.numeric(x)) is
required to ensure that method dispatch for is.numeric occurs
correctly.
I really don't know how to make sense of this.
I think it's related to the fact that you can use both [[ and [ extraction of single elements from a vector but you can ONLY use [ extraction of ranges of elements. For example,
my_list_of_vecs[[1]][1:2]
# a b
# 1 2
my_list_of_vecs[[1]][[1:2]]
# Error in my_list_of_vecs[[1]][[1:2]] :
# attempt to select more than one element in vectorIndex
So under the hood, lapply must be using function(x) x[["a"]] over a range. Is that right?
Debugging doesn't help me here since these functions rely on .Internal functions.
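A likely explanation that does not involve the internals of lapply at all: the third vector has no element named a, and [[ treats a missing name differently on lists and on atomic vectors. The sketch below (the names outcome and a_values are illustrative) shows the difference, together with single-bracket indexing as one possible workaround.

```r
my_list_of_vecs <- list(c(a = 1, b = 2), c(a = 2, c = 3), c(b = 4, c = 5))

# On a list, [[ returns NULL when the name is absent
is.null(list(b = 4, c = 5)[["a"]])

# On an atomic vector, [[ signals an error when the name is absent
outcome <- tryCatch(c(b = 4, c = 5)[["a"]], error = function(e) "error")
outcome

# Single-bracket indexing returns NA for the missing name instead
a_values <- vapply(my_list_of_vecs, function(x) unname(x["a"]), numeric(1))
a_values
```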

Changing every element in vector or list by the same number

I have a list (that could be changed to a vector by unlist()) and I want to increment every number by 1.
For example, if I have x <- list(1, 2, 3, 6, 4)
I want to end up with a list or a vector that will be 2 3 4 7 5.
I was thinking about using apply or writing a function that loops through every element, but I feel like that would be messy and there must be an easier way to do it.
I don't really see how this would be messy. You can add 1 to every element of a list with lapply(). In R, this is the standard method for applying a function over a list.
x <- list(1, 2, 3, 6, 4)
lapply(x, "+", 1) ## for a list result; unlist(x) + 1 for atomic result
Or if you are entering the International Code Golf Championships, you can use
Map("+", x, 1)
Or you can use a for() loop
for(i in seq_along(x)) x[[i]] <- x[[i]] + 1
Or if you have an unlisted list, you can relist it with x as its skeleton.
relist(unlist(x) + 1, x) ## same as as.list(unlist(x) + 1) here
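As a quick check, the sketch below runs the list-based approaches side by side and compares them; the variable names are illustrative only.

```r
x <- list(1, 2, 3, 6, 4)

via_lapply <- lapply(x, "+", 1)
via_map    <- Map("+", x, 1)

via_loop <- x
for (i in seq_along(via_loop)) via_loop[[i]] <- via_loop[[i]] + 1

# Atomic result: arithmetic is already vectorized once the list is flattened
as_vector <- unlist(x) + 1

identical(via_lapply, via_map)
identical(via_lapply, via_loop)
```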

Select along one of n dimensions in array

I have an array in R, created by a function like this:
A <- array(data=NA, dim=c(2,4,4), dimnames=list(c("x","y"),NULL,NULL))
And I would like to select along one dimension, so for the example above I would have:
A["x",,]
dim(A["x",,]) #[1] 4 4
Is there a way to generalize this if I do not know in advance how many dimensions (in addition to the named one I want to select by) my array might have? I would like to write a function that takes input that might be formatted as A above, or as:
B <- c(1,2)
names(B) <- c("x", "y")
C <- matrix(1, 2, 2, dimnames=list(c("x","y"),NULL))
Background
The general background is that I am working on an ODE model, so for deSolve's ODE function it must take a single named vector with my current state. For some other functions, like calculating phase-planes/direction fields, it would be more practical to have a higher-dimensional array to apply the differential equation to, and I would like to avoid having many copies of the same function, simply with different numbers of commas after the dimension I want to select.
I spent quite a lot of time figuring out the fastest way to do this for plyr, and the best I could come up with was manually constructing the call to [:
index_array <- function(x, dim, value, drop = FALSE) {
# Create list representing arguments supplied to [
# bquote() creates an object corresponding to a missing argument
indices <- rep(list(bquote()), length(dim(x)))
indices[[dim]] <- value
# Generate the call to [
call <- as.call(c(
list(as.name("["), quote(x)),
indices,
list(drop = drop)))
# Print it, just to make it easier to see what's going on
print(call)
# Finally, evaluate it
eval(call)
}
(You can find more information about this technique at https://github.com/hadley/devtools/wiki/Computing-on-the-language)
You can then use it as follows:
A <- array(data=NA, dim=c(2,4,4), dimnames=list(c("x","y"),NULL,NULL))
index_array(A, 2, 2)
index_array(A, 2, 2, drop = TRUE)
index_array(A, 3, 2, drop = TRUE)
It would also generalise in a straightforward way if you want to extract based on more than one dimension, but you'd need to rethink the arguments to the function.
I wrote this general function. Not necessarily super fast but a nice application for arrayInd and matrix indexing:
extract <- function(A, .dim, .value) {
val.idx <- match(.value, dimnames(A)[[.dim]])
all.idx <- arrayInd(seq_along(A), dim(A))
keep.idx <- all.idx[all.idx[, .dim] == val.idx, , drop = FALSE]
array(A[keep.idx], dim = dim(A)[-.dim], dimnames = dimnames(A)[-.dim])
}
Example:
A <- array(data=1:32, dim=c(2,4,4),
dimnames=list(c("x","y"), LETTERS[1:4], letters[1:4]))
extract(A, 1, "x")
extract(A, 2, "D")
extract(A, 3, "b")
The abind package has a function, asub, to do this in addition to other very useful array manipulation functions:
library(abind)
A <- array(data=1:32, dim=c(2,4,4),
dimnames=list(c("x","y"), LETTERS[1:4], letters[1:4]))
asub(A, 'x', 1)
asub(A, 'D', 2)
asub(A, 'b', 3)
And it allows indexing in multiple dimensions:
asub(A, list('x', c('C', 'D')), c(1,2))
Perhaps there is an easier way, but this works:
do.call("[",c(list(A,"x"),lapply(dim(A)[-1],seq)))
[,1] [,2] [,3] [,4]
[1,] NA NA NA NA
[2,] NA NA NA NA
[3,] NA NA NA NA
[4,] NA NA NA NA
Let's generalize it into a function that can extract from any dimension, not necessarily the first one:
extract <- function(A, .dim, .value) {
idx.list <- lapply(dim(A), seq_len)
idx.list[[.dim]] <- .value
do.call(`[`, c(list(A), idx.list))
}
Example:
A <- array(data=1:32, dim=c(2,4,4),
dimnames=list(c("x","y"), LETTERS[1:4], letters[1:4]))
extract(A, 1, "x")
extract(A, 2, "D")
extract(A, 3, "b")
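The same helper also appears to cover the other inputs from the question. The sketch below repeats the function so it runs on its own and applies it to A, B and C as defined in the question; for the plain named vector B, dim() is NULL, so the index list ends up containing only the requested name.

```r
# Same helper as above, repeated so this snippet is self-contained
extract <- function(A, .dim, .value) {
  idx.list <- lapply(dim(A), seq_len)
  idx.list[[.dim]] <- .value
  do.call(`[`, c(list(A), idx.list))
}

A <- array(data = NA, dim = c(2, 4, 4), dimnames = list(c("x", "y"), NULL, NULL))
B <- c(x = 1, y = 2)
C <- matrix(1, 2, 2, dimnames = list(c("x", "y"), NULL))

dim(extract(A, 1, "x"))  # three dimensions: equivalent to A["x", , ]
extract(C, 1, "x")       # two dimensions:   equivalent to C["x", ]
extract(B, 1, "x")       # no dim attribute: equivalent to B["x"]
```

Because drop is left at its default, the selected dimension is dropped from the result.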

R - how to get a value of a multi-dimensional array by a vector of indices

Let's say I have a multi-dimensional array called pi, and its number of dimensions isn't known until the runtime:
dims <- rep(3, dim_count)
pi <- array(0, dims)
As you can see the dimension count depends on dim_count. How do I retrieve a value from the array when I have a vector of the indexes? For example when I have:
dim_count <- 5
indexes <- c(1, 2, 3, 3, 3)
I want to retrieve
pi[1, 2, 3, 3, 3]
Is there a short, effective and hopefully elegant way of doing this?
Making use of a little known usage of [:
When indexing arrays by [ a single argument i can be a matrix with as many columns as there are dimensions of x; the result is then a vector with elements corresponding to the sets of indices in each row of i.
you can simply do:
pi[matrix(indexes, 1)]
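A small self-contained sketch of matrix indexing follows. The array is named arr here, rather than pi as in the question, and is filled with its own linear positions so the result is easy to check; the index values are chosen to stay within the dimensions.

```r
dims <- rep(3, 5)
arr <- array(seq_len(prod(dims)), dims)  # arr stands in for the question's pi

indexes <- c(1, 2, 2, 2, 3)

# One row of indices selects one element, like arr[1, 2, 2, 2, 3]
arr[matrix(indexes, nrow = 1)]

# Several positions at once: one row of indices per element wanted
arr[rbind(indexes, c(3, 3, 3, 3, 3))]
```

The second call shows why this form is convenient: each additional row of the index matrix retrieves one more element in a single subsetting operation.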
do.call("[",...) seems to work.
indexes <- c(1,2,3,3,3)
pi[1,2,3,3,3] <- 17 ## so we know if we succeeded or not
do.call("[",c(list(pi),as.list(indexes)))
Note that your example wouldn't work -- your dimensions were all 3, but some of your index elements were >3 ...
do.call() is an option:
dim_count <- 5
indexes <- c(1, 2, 2, 2, 3)
dims <- rep(3, dim_count)
pi <- array(seq_len(prod(dims)), dims)
do.call(`[`, c(list(x = pi), as.list(indexes)))
Which gives:
> do.call(`[`, c(list(x = pi), as.list(indexes)))
[1] 202
> pi[1, 2, 2, 2, 3]
[1] 202
The tricky bit is getting the list of arguments in the right format. pi should be the first argument to "[" (or named as argument x, see ?"["), whilst we want each element of indexes itself to be a component of the supplied list, not a vector within that list. Hence the convoluted c(list(x = pi), as.list(indexes)).
An alternative way to construct the argument list which might be easier to follow is:
ARGS <- vector("list", length = dim_count + 1)
ARGS[[1]] <- pi
ARGS[2:length(ARGS)] <- indexes
do.call("[", ARGS)
which gives
> do.call("[", ARGS)
[1] 202
> pi[1, 2, 2, 2, 3]
[1] 202

Resources