I have a sublist of principal component rotation vectors computed by prcomp, where each list item is an Nx2 array (i.e., two column vectors), for each class.
Using those vectors, I'd like to project some data similarly structured into a list of classes, each class item containing arrays with dimension NxMxT, where T is the number of trials.
My problem is that I can write simple vectorized functions with apply and its variants, but I'm having trouble generalizing them to run over each list element.
Example data:
somedata <- list(array(rnorm(100),dim=c(5,4,5)),array(rnorm(100),dim=c(5,4,5)))
somevectors <- list(array(rnorm(10),dim=c(5,2)),array(rnorm(10),dim=c(5,2)))
Here is a simple example of the operation over each list element:
o.proj.1 <- apply(somedata[[1]],3,function(x){
t(somevectors[[1]]) %*% x
}) # returns an array where each projected trial is a column
I tried fitting this inside a call to lapply(), but didn't find much success:
lapply(somedata, y = somevectors, function(x,y){
apply(x,3,function(z){
t(y) %*% z
})
})
Error in t(y) %*% z : requires numeric/complex matrix/vector arguments
Basically my algorithm is to put the appropriate apply type (here lapply) around the more local function and remove the index that will be vectorized (here [[]]). What am I missing?
Of the *apply family of functions, mapply is the one to use when you want to loop simultaneously over two or more objects. Try:
o.proj <- mapply(function(x,y){
apply(x,3,function(z){
t(y) %*% z
})
}, somedata, somevectors, SIMPLIFY = FALSE)
I suppose you will want to use SIMPLIFY = FALSE to return a list, otherwise mapply will attempt to simplify your output into an array, a little like sapply does.
Also know that you can use Map as a shortcut for mapply(..., SIMPLIFY = FALSE).
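As a minimal sketch of the Map shortcut, reusing the question's example data (the helper name project_trials is made up for this illustration and is not part of the original code), the following also shows why the lapply attempt failed: there, y was the whole list somevectors rather than one matrix taken from it.

```r
set.seed(1)
somedata <- list(array(rnorm(100), dim = c(5, 4, 5)),
                 array(rnorm(100), dim = c(5, 4, 5)))
somevectors <- list(array(rnorm(10), dim = c(5, 2)),
                    array(rnorm(10), dim = c(5, 2)))

# Hypothetical helper: project every trial (third dimension) of x
# onto the two column vectors stored in y.
project_trials <- function(x, y) {
  apply(x, 3, function(z) t(y) %*% z)
}

# Map pairs somedata[[k]] with somevectors[[k]] and always returns a list.
o.proj <- Map(project_trials, somedata, somevectors)

# In lapply(somedata, y = somevectors, ...), y is the entire list,
# so t(y) %*% z is not a matrix product of numeric arguments.
is.list(somevectors)         # TRUE
is.matrix(somevectors[[1]])  # TRUE
```

Each element of o.proj should have one column per trial, as in the single-class example from the question.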
Related
Consider a function f(x,y), where x is a vector (1xn) and y a matrix (nxm), returning a numeric scalar.
Now, I have a matrix A and a three-dimensional array B and would like to apply f across the first dimension of A and B.
Specifically, I would like f to be evaluated at x=A[1,] y=B[1,,], followed by x=A[2,] y=B[2,,] and so on, returning a vector of numeric scalars.
Is there a way to use any function of the "apply" family to solve this problem, thus avoiding a loop?
You can do:
sapply(1:nrow(A), function(i) f(A[i,], B[i,,]))
This is loop hiding, because the looping is still done inside sapply(). I suppose in this case it is better to use an explicit loop:
result <- numeric(nrow(A))
for (i in seq_len(nrow(A))) result[i] <- f(A[i,], B[i,,])
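For comparison, here is a small sketch of both variants side by side. The objects A, B and f below are hypothetical stand-ins, since the question does not define them, and vapply is swapped in for sapply only because it checks that each call really returns a single number.

```r
# Hypothetical stand-ins for the question's objects.
set.seed(1)
A <- matrix(rnorm(12), nrow = 3)         # 3 x 4
B <- array(rnorm(24), dim = c(3, 4, 2))  # 3 x 4 x 2
f <- function(x, y) sum(x %*% y)         # length-4 vector times 4 x 2 matrix, summed

# Index-based apply: one call of f per row of A / first-dimension slice of B.
res_vapply <- vapply(seq_len(nrow(A)),
                     function(i) f(A[i, ], B[i, , ]),
                     numeric(1))

# The equivalent explicit loop.
res_loop <- numeric(nrow(A))
for (i in seq_len(nrow(A))) res_loop[i] <- f(A[i, ], B[i, , ])
```

Both give the same vector; which one reads better is largely a matter of taste.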
Assume you have an arbitrary number of lists as possible arguments for a function. For example, the following three can be picked (this example is as simple as possible, so vectors are stored in lists):
a <- list(c(1,2,3,4,5))
b <- list(c(3,6,7,2,1))
c <- list(c(3,9,8))
If I want to calculate the intersection of all three lists, this can be done as follows:
Map(intersect,c,Map(intersect,a,b))
# or equivalent:
mapply(intersect,c,mapply(intersect,a,b,SIMPLIFY=F))
# [1] 3
But how can I leave the number of arguments undefined? I read about ..., but I cannot get it to work. My first idea was to write a function that can take multiple list arguments through ...:
intersectio <- function(...){
Map(function(...){
intersect(...)
})
}
Q: But that doesn't work of course, because intersect must be applied recursively. Is there any way to achieve this in R?
Q2: Here is an updated example with a nested list structure. How can it be done in this case, i.e. intersect every sublist of the parent list with the associated sublist (same index) of the other parent lists?
a <- list(list(c(1,2,3,4,5)),list(c(3,6,7,2,1)),list(c(3,9,8)))
b <- list(list(c(1,2)),list(c(3,6,9,11,12)),list(c(3)))
c <- list(list(c(1,9)),list(c(65,23,12)),list(c(14,15)))
As @Roland suggested, you can use Reduce to solve your problem. In the case of flat lists (as in the first version of the question), you can use the following:
Reduce(intersect, c(a, b, c))
In the case of nested lists (as in the updated question), you just have to wrap that inside a mapply call:
mapply(function(...) Reduce(intersect, c(...)), a, b, c)
To generalize, you can define a function and then call it with as many arguments as you want.
list_intersect <- function(...){
mapply(function(...) Reduce(intersect, c(...)), ...)
}
list_intersect(a, b, c)
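A sketch of the same idea on the nested example data, with two changes that are assumptions of this illustration rather than part of the answer: the third list is named d instead of c to avoid shadowing the base function, and Map replaces mapply so the result is always a list, even when every intersection happens to have the same length.

```r
a <- list(list(c(1, 2, 3, 4, 5)), list(c(3, 6, 7, 2, 1)), list(c(3, 9, 8)))
b <- list(list(c(1, 2)), list(c(3, 6, 9, 11, 12)), list(c(3)))
d <- list(list(c(1, 9)), list(c(65, 23, 12)), list(c(14, 15)))

# For each index k, collect the k-th sublists of all arguments with c(...)
# and fold intersect over the resulting list of vectors.
list_intersect <- function(...) {
  Map(function(...) Reduce(intersect, c(...)), ...)
}

res <- list_intersect(a, b, d)
res[[1]]  # 1
```

The second and third elements come out empty here, because no value is shared by all three sublists at those positions.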
I am used to using the apply family of functions to avoid for loops in R. In this context, I was wondering if there is a way to avoid typing a bound variable. For example, say I want to perform an operation do.call(myfun, args) 100 times. With for I'd write:
res = seq(100)
for(i in seq(100)){res[i] = do.call(myfun, args)}
with apply I type:
res = sapply(seq(100), function(i) do.call(myfun, args))
I understand that sapply applies the function to one argument, which is an element of seq(100), but is there a way to avoid this, since the variable (here i) has neither meaning nor use?
Thanks for the insight.
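One possible approach, sketched here under the assumption that myfun does something that varies between calls (the myfun and args below are hypothetical placeholders), is base R's replicate, which re-evaluates an expression n times without exposing an index variable.

```r
# Hypothetical placeholders for the question's myfun and args.
myfun <- function(mean, sd) rnorm(1, mean = mean, sd = sd)
args <- list(mean = 0, sd = 1)

set.seed(42)
# replicate evaluates the expression 100 times; no bound variable is needed.
res <- replicate(100, do.call(myfun, args))
length(res)  # 100
```

Internally replicate is a thin wrapper around sapply, so this is a notational convenience rather than a different mechanism.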
I'm trying to figure out how to vectorize the following code block in R:
# X is an N x M matrix
# centers is a K x M matrix
phi <- matrix(0, nrow(X), nrow(centers))
for(i in 1:nrow(phi)) {
for(j in 1:ncol(phi)) {
phi[i, j] <- norm(as.matrix(X[i, ]) - as.matrix(centers[j, ]), type = 'F')
}
}
I'm constructing an N x K matrix, phi, which at each position, [i, j], contains the norm of the difference between the vectors at row i of X and row j of centers:
phi[i, j] = || X[i, ] - centers[j, ] ||
My approach so far has been to attempt to use R's outer() function. I'm new to outer(), so I've searched for several examples; however, the examples I've come across involve using outer() to apply some function to a pair of vectors of scalar values. As I'm dealing with the differences between pairs of rows from two matrices, outer() behaves differently than I expected. I'm not sure how to get it to recognize the matrices I'm passing it (X and centers) as vectors of vectors, where each row represents a vector to be involved in the computation of phi.
In addition, when I define a function to compute the norm of the difference between two M-length vectors, that function returns a scalar. It is my understanding that in order to vectorize a function using R's Vectorize(), that function must return a result of the same length as its arguments. I'm not sure how to define a function which, when used in conjunction with outer(), recognizes each row of a matrix as a single element (in spite of it being an M-length vector).
Below are a couple of my attempts to use outer() with toy examples of the matrices X and centers.
X <- matrix(c(7,8,9,1,2,3,4,5,6), 3, 3)
centers <- matrix(c(1,2,3,4,5,6), 2, 3)
fun <- function(y, x) norm(as.matrix(y) - as.matrix(x), type = 'F')
outer(X, centers, fun)
This was my first attempt. I was trying to use outer() in a manner analogous to the way it is used when it is passed a pair of vectors. I was (naively) hoping it would take one row from each matrix at a time, pass them as the two arguments to fun, and position the result appropriately in the product matrix. Instead, I get the following error message.
Error in outer(X, centers, fun) :
dims [product 54] do not match the length of object [1]
I also tried vectorizing my function using R's Vectorize() before calling outer().
Vecfun <- Vectorize(fun)
outer(X, centers, Vecfun)
In this case, I no longer get an error message, but the result is an erroneous matrix of matrices. I'm also new to the Vectorize() function, so I'm not too sure why it produces the result that it does as I don't have a real grasp on what it does; using it was sort of a shot in the dark.
I'll appreciate any help in vectorizing my original problem; I'm completely open to suggestions that do not involve outer().
Clarifications regarding outer() and Vectorize() also welcome.
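Two possible ways to do this are sketched below on the toy matrices; neither comes from the original thread, and the names row_dist, phi_outer and phi_alg are made up for the illustration. The first passes row indices to outer() instead of the matrices themselves, so each element handed to the function identifies a whole row, and Vectorize() wraps the scalar function so that it is called once per (i, j) pair. The second avoids per-pair calls entirely by expanding ||x - c||^2 = ||x||^2 + ||c||^2 - 2 x.c.

```r
X <- matrix(c(7, 8, 9, 1, 2, 3, 4, 5, 6), 3, 3)
centers <- matrix(c(1, 2, 3, 4, 5, 6), 2, 3)

# Reference: the double loop from the question (Euclidean norm of the row difference).
phi_loop <- matrix(0, nrow(X), nrow(centers))
for (i in seq_len(nrow(X))) {
  for (j in seq_len(nrow(centers))) {
    phi_loop[i, j] <- sqrt(sum((X[i, ] - centers[j, ])^2))
  }
}

# Option 1: outer() over row indices. Vectorize() turns the scalar function
# into one that loops over the expanded index vectors and returns one value per pair.
row_dist <- function(i, j) sqrt(sum((X[i, ] - centers[j, ])^2))
phi_outer <- outer(seq_len(nrow(X)), seq_len(nrow(centers)), Vectorize(row_dist))

# Option 2: pure matrix algebra. pmax() guards against tiny negative values
# caused by floating-point rounding before taking the square root.
sq <- outer(rowSums(X^2), rowSums(centers^2), "+") - 2 * tcrossprod(X, centers)
phi_alg <- sqrt(pmax(sq, 0))
```

Option 1 is still a hidden loop and mainly tidies the code; option 2 is the one likely to be faster for large N and K, at the cost of some numerical cancellation when points are very close together.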
I have some true and predicted labels
truth <- factor(c("+","+","-","+","+","-","-","-","-","-"))
pred <- factor(c("+","+","-","-","+","+","-","-","+","-"))
and I would like to build the confusion matrix.
I have a function that works on unary elements
f <- function(x,y){ sum(y==pred[truth == x])}
however, when I apply it to the outer product, to build the matrix, R seems unhappy.
outer(levels(truth), levels(truth), f)
Error in outer(levels(x), levels(x), f) :
dims [product 4] do not match the length of object [1]
What is the recommended strategy for this in R ?
I can always go through higher order stuff, but that seems clumsy.
I sometimes fail to understand where outer goes wrong, too. For this task I would have used the table function:
> table(truth,pred) # arguably a lot less clumsy than your effort.
     pred
truth - +
    - 4 2
    + 1 3
In this case, you are testing whether a multivalued vector is "==" to a scalar.
outer assumes that the function passed to FUN can take vector arguments and work properly with them. If m and n are the lengths of the two vectors passed to outer, it will first create two vectors of length m*n such that every combination of inputs occurs, and pass these as the two new vectors to FUN. In return, outer expects FUN to give back another vector of length m*n.
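The expansion can be observed directly. The sketch below uses made-up inputs and a recording function (the names x, y, seen and res are assumptions of this illustration) to capture what outer actually hands to FUN in a single call.

```r
x <- c("a", "b")        # m = 2
y <- c("p", "q", "r")   # n = 3

seen <- list()
res <- outer(x, y, function(u, v) {
  # Record the expanded arguments; FUN is called exactly once.
  seen$u <<- u
  seen$v <<- v
  paste0(u, v)          # vectorized: returns one string per pair
})

seen$u    # "a" "b" "a" "b" "a" "b"
seen$v    # "p" "p" "q" "q" "r" "r"
dim(res)  # 2 3
```

Because paste0 is vectorized, it returns m*n values and outer can reshape them into an m x n matrix.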
The function described in your example doesn't really do this. In fact, it doesn't handle vectors correctly at all.
One way is to define another function that can handle vector inputs properly. Alternatively, if your program only requires simple matching, you could use table() as in @DWin's answer.
If you're redefining your function, outer expects one that accepts vector inputs of this form:
f(c("+","+","-","-"), c("+","-","+","-"))
and, per your example, returns one result per pair:
c(3,1,2,4)
There is also the small matter of decoding the actual meaning of the error:
Again, if m and n are the lengths of the two vectors passed to outer, it will first create a vector of length m*n, and then reshape it using (basically)
dim(output) = c(m,n)
This is the line that gives an error, because outer is trying to shape the output into a 2x2 matrix (total 2*2 = 4 items) while the function f, assuming no vectorization, has given only 1 output. Hence,
Error in outer(levels(x), levels(x), f) :
dims [product 4] do not match the length of object [1]
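If you want to keep the original f, one way to make it acceptable to outer is to wrap it in Vectorize(), which calls f once per pair and collects the results into a vector of length m*n. This is only a sketch of that option (the name cm is made up here), and table() remains the simpler tool for this task.

```r
truth <- factor(c("+", "+", "-", "+", "+", "-", "-", "-", "-", "-"))
pred  <- factor(c("+", "+", "-", "-", "+", "+", "-", "-", "+", "-"))

# The scalar function from the question: count predictions equal to y
# among the observations whose true label is x.
f <- function(x, y) sum(y == pred[truth == x])

# Vectorize(f) loops over the expanded argument vectors, so outer
# receives m*n results and can reshape them.
cm <- outer(levels(truth), levels(pred), Vectorize(f))
dimnames(cm) <- list(truth = levels(truth), pred = levels(pred))
cm
```

The counts should agree with table(truth, pred), with rows indexed by the true label and columns by the predicted one.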