Lapply over several parameters, faster method - r

Suppose I have two vectors
a <- c(1,2,3,4,5)
b <- c(6,7,8,9,10)
and a function
calc <- function(x, y) {x + y}
I want to apply this function to every combination of a value in a and a value in b: the 1st value in a with each value in b, and so on. Suppose in my case calc only accepts a single value from a and a single value from b as input, so lapply(a, calc, b) wouldn't work because length(b) is not 1 (it gives me an error).
mapply doesn't give me the wanted solution either; it only applies the function to paired values, i.e. 1+6, 2+7, etc.
So I built a function that gave me the wanted solution
myfunc <- function(z){lapply(a,calc,z)}
and applied it on b
solution <- lapply(b,myfunc)
The difference from lapply(a, calc, b) or a nested lapply(a, lapply, calc, b) is that this returns every value in its own list element. That's what I wanted, or at least it gave me the right result with no error.
Is there a faster or simpler method? I was only experimenting here. With my real function, which is much larger than calc, this takes 10 minutes. Maybe I have to slim down my original function and there is no faster method.
EDIT:
In my function there is something like this,
calc <- function(x,y){
# ...
number <- x
example <- head(number,n=y)
# ...
}
where a vector as input for y no longer works. With lapply(a, lapply, calc, b) or lapply(a, calc, b) I get an error,
Error in head.default(number, n = y) : length(n) == 1L is not TRUE

As Florian says, outer() could be an option.
outer(a, b, calc)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 7 8 9 10 11
# [2,] 8 9 10 11 12
# [3,] 9 10 11 12 13
# [4,] 10 11 12 13 14
# [5,] 11 12 13 14 15
But as MichaelChirico mentions, with a function that isn't vectorized it won't work. In that case something else has to be hacked together. These might or might not be quicker than your current solution.
All combinations (so both calc(a[1], b[2]) and calc(a[2], b[1]) are performed, similar to outer())
Number of calculations: n^2
eg <- expand.grid(a, b)
m1 <- mapply(calc, eg[,1], eg[, 2])
matrix(m1, 5)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 7 8 9 10 11
# [2,] 8 9 10 11 12
# [3,] 9 10 11 12 13
# [4,] 10 11 12 13 14
# [5,] 11 12 13 14 15
Only unique combinations (so assumes your function is symmetric)
Number of calculations: (n^2 - n) / 2
cn <- t(combn(seq_along(a), 2))
m2 <- mapply(calc, a[cn[, 1]], b[cn[, 2]])
mat <- matrix(NA, length(a), length(a))
# index by (row, col) pairs; upper.tri() fills column-wise, which does not match combn()'s order
mat[cn] <- m2
mat
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA 8 9 10 11
# [2,] NA NA 10 11 12
# [3,] NA NA NA 12 13
# [4,] NA NA NA NA 14
# [5,] NA NA NA NA NA
This second one ignores the diagonal, but adding those values is easy, as that's what the OP's mapply() call returned.
diag(mat) <- mapply(calc, a, b)
mat
# [,1] [,2] [,3] [,4] [,5]
# [1,] 7 8 9 10 11
# [2,] NA 9 10 11 12
# [3,] NA NA 11 12 13
# [4,] NA NA NA 13 14
# [5,] NA NA NA NA 15
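If calc() only accepts scalars, one further option, not taken from the answer above, is to wrap it in Vectorize() before handing it to outer(). The sketch below uses a hypothetical stand-in, calc_scalar(), that rejects vector input much like head(number, n = y) does. It is an illustration under that assumption, and it is unlikely to be faster than the mapply() versions, since Vectorize() relies on mapply() internally.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(6, 7, 8, 9, 10)

# Hypothetical scalar-only function standing in for the real calc()
calc_scalar <- function(x, y) {
  stopifnot(length(x) == 1L, length(y) == 1L)
  x + y
}

# Vectorize() makes outer() see a function that accepts whole vectors,
# while calc_scalar() is still called once per (x, y) pair
res <- outer(a, b, Vectorize(calc_scalar))
res
```

Row i, column j of res holds calc_scalar(a[i], b[j]).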

This solved it for me: adding SIMPLIFY=FALSE to the mapply call, thanks to @AkselA.
eg <- expand.grid(a, b)
m1 <- mapply(calc, eg[,1], eg[, 2],SIMPLIFY=FALSE)
However, this method is only slightly faster than my own solution in my OP.
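For reference, a small sketch of how the flat list from mapply() lines up with the nested list from the original lapply() solution. It assumes the simple calc() from the question. The regrouping with split() is just one possible way to get the nesting back, not something from the answers above.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(6, 7, 8, 9, 10)
calc <- function(x, y) x + y

# Nested result from the question: one inner list per value of b
nested <- lapply(b, function(z) lapply(a, calc, z))

# Flat result: expand.grid() varies its first column fastest,
# so the order is (a1,b1), (a2,b1), ..., (a5,b1), (a1,b2), ...
eg <- expand.grid(a, b)
flat <- mapply(calc, eg[, 1], eg[, 2], SIMPLIFY = FALSE)

# Regroup the flat list into one chunk per value of b
regrouped <- split(flat, rep(seq_along(b), each = length(a)))
```

Both versions hold the same values in the same order, so the choice between them is mostly about which shape is more convenient downstream.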

Related

Column having maximum L2 norm

How can I find the column of a matrix that has the maximum L2 norm? The matrix has NA values in some columns, and we want to ignore those columns.
I am trying the following code, but it throws an error because of the NA values.
#The matrix is T
for(i in 1:ncol(T)){
if(norm(y,type='2') < norm(T[,i],type = '2'))
y = T[,i]
}
I think it would also be useful if we could somehow get the columns of T as a list, since we could use which.max function then, but I could not do that. Is that possible?
Please help
Maybe you can write your own L2 norm and find the column with the maximum, i.e.,
which.max(sqrt(colSums(T**2)))
Example
T <- matrix(c(1:10,NA,12:19,NA),nrow = 4)
> T
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 NA 15 19
[4,] 4 8 12 16 NA
> which.max(sqrt(colSums(T**2)))
[1] 4
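The one-liner works because which.max() skips NA entries. A sketch that makes the "ignore columns with NA" step explicit, and that also shows one way to get the columns as a list, might look like this. The matrix is named M here (an arbitrary choice) so that it does not mask T, which is R's shorthand for TRUE.

```r
M <- matrix(c(1:10, NA, 12:19, NA), nrow = 4)

# Columns without any NA
complete <- which(colSums(is.na(M)) == 0)

# L2 norm of each complete column, then map back to the original column index
norms <- sqrt(colSums(M[, complete, drop = FALSE]^2))
best <- complete[which.max(norms)]

# Columns as a list, if a list-based approach is preferred
cols <- lapply(seq_len(ncol(M)), function(j) M[, j])
best_from_list <- which.max(sapply(cols, function(v) sqrt(sum(v^2))))
```

Both best and best_from_list point at column 4 for this example matrix.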

Indexing matrices when some elements of the selector are missing (R)

When some elements of a vector used for row-indexing a matrix or a data.frame are missing (NA) in R, the indexing operation gives results that I find unexpected.
m = matrix(1:15,ncol = 3)
m[1,1] = NA
m[m[,1] < 4 ,]
Gives
[,1] [,2] [,3]
[1,] NA NA NA
[2,] 2 7 12
[3,] 3 8 13
While I would have expected
[,1] [,2] [,3]
[1,] NA 6 11
[2,] 2 7 12
[3,] 3 8 13
One option seems to be
m[m[,1] < 4 | is.na(m[,1]) ,]
But I find this unwieldy. I often lose data by mistake when indexing matrices and data.frames that contain missing values. Is there an easier and safer way to reach the desired result?
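The behaviour follows from how R treats NA in a logical index: an NA selects a row filled with NA rather than a real row. A minimal sketch of two ways to make the choice explicit is below. The helper na_to() is hypothetical (it is not a base R function) and only wraps the is.na() idea from the question.

```r
m <- matrix(1:15, ncol = 3)
m[1, 1] <- NA
sel <- m[, 1] < 4   # NA TRUE TRUE FALSE FALSE

# which() keeps only positions where the condition is TRUE,
# so rows with an NA condition are dropped
dropped <- m[which(sel), , drop = FALSE]

# Hypothetical helper: replace NA in a logical condition with a chosen value
na_to <- function(cond, value) {
  cond[is.na(cond)] <- value
  cond
}

# Treat "unknown" as TRUE, which keeps the row with the missing value intact
kept <- m[na_to(sel, TRUE), , drop = FALSE]
```

With na_to(sel, FALSE) the result matches the which() version, so the helper covers both choices.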

Apply function to each cell in DataFrame or matrix multithreadedly in R

Is it possible to apply a function to each cell in a DataFrame/matrix multithreadedly in R?
I'm aware of apply() but it doesn't seem to allow multithreading natively:
x <- cbind(x1 = 3, x2 = c(4:1, 2:5))
cave <- function(x, c1, c2) {
a = 1000
for (i in 1:100) { # Useless busy work
b=matrix(runif(a*a), nrow = a, ncol=a)
}
c1 + c2 * x
}
apply(x, 1, cave, c1 = 3, c2 = 4)
returns:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
x1 15 15 15 15 15 15 15 15
x2 19 15 11 7 11 15 19 23
Instead, I would like to use more than one core to perform the operation, since the applied function may be complex. For example, one can apply a function to each cell in DataFrame multithreadedly in pandas.
There are probably a few ways to do this, but I've always found it easiest to run parallel operations on list objects. If you convert the input matrix to a list, the function can be applied using parallel::parLapply as follows:
## convert the input object to a list
x.list <- split(t(x), rep(1:nrow(x), each = ncol(x)))
## parallelize the operation over e.g. 2 cores
cl <- parallel::makeCluster(2)
out <- parallel::parLapply(cl, x.list, cave, c1 = 3, c2 = 4)
parallel::stopCluster(cl)
## transform the output list back to a matrix
out <- t(matrix(unlist(out, use.names = FALSE), nrow = ncol(x)))
colnames(out) <- colnames(x)
This should work across platforms.
> x
x1 x2
[1,] 3 4
[2,] 3 3
[3,] 3 2
[4,] 3 1
[5,] 3 2
[6,] 3 3
[7,] 3 4
[8,] 3 5
> out
x1 x2
[1,] 15 19
[2,] 15 15
[3,] 15 11
[4,] 15 7
[5,] 15 11
[6,] 15 15
[7,] 15 19
[8,] 15 23

R: applying function over matrix and keeping matrix dimensions

So I want to apply a function over a matrix in R. This works really intuitively for simple functions:
> (function(x)x*x)(matrix(1:10, nrow=2))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 9 25 49 81
[2,] 4 16 36 64 100
...but clearly I don't understand all of its workings:
> m = (matrix(1:10, nrow=2))
> (function(x) if (x %% 3 == 0) { return(NA) } else { return(x+1) })(m)
[,1] [,2] [,3] [,4] [,5]
[1,] 2 4 6 8 10
[2,] 3 5 7 9 11
Warning message:
In if (x%%3 == 0) { :
the condition has length > 1 and only the first element will be used
I read up on this and found out about Vectorize and sapply, which both seemed great and just like what I wanted, except that both of them flatten my matrix into a plain vector:
> y = (function(x) if (x %% 3 == 0) { return(NA) } else { return(x+1) })
> sapply(m, y)
[1] 2 3 NA 5 6 NA 8 9 NA 11
> Vectorize(y)(m)
[1] 2 3 NA 5 6 NA 8 9 NA 11
...whereas I'd like to keep it in a matrix with its current dimensions. How might I do this? Thanks!
@Joshua Ulrich (and Dason) have a great answer. Doing it directly without the function y is the best solution. But if you really need to call a function, you can make it faster using vapply. It produces a vector without dimensions (like sapply, but faster), and you can then add them back using structure:
# Your function (optimized)
y = function(x) if (x %% 3) x+1 else NA
m <- matrix(1:1e6,1e3)
system.time( r1 <- apply(m,1:2,y) ) # 4.89 secs
system.time( r2 <- structure(sapply(m, y), dim=dim(m)) ) # 2.89 secs
system.time( r3 <- structure(vapply(m, y, numeric(1)), dim=dim(m)) ) # 1.66 secs
identical(r1, r2) # TRUE
identical(r1, r3) # TRUE
...As you can see, the vapply approach is about 3x faster than apply... And the reason vapply is faster than sapply is that sapply must analyse the result to figure out that it can be simplified to a numeric vector. With vapply, you specified the result type (numeric(1)), so it doesn't have to guess...
UPDATE I figured out another (shorter) way of preserving the matrix structure:
m <- matrix(1:10, nrow=2)
m[] <- vapply(m, y, numeric(1))
You simply assign the new values to the object using m[] <-. Then all other attributes are preserved (like dim, dimnames, class etc).
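A short sketch of that last point, using made-up row and column names (r1, r2, c1 to c5) to show that the dimnames survive the m[] <- assignment:

```r
y <- function(x) if (x %% 3) x + 1 else NA

# Illustrative dimnames, not part of the original example
m <- matrix(1:10, nrow = 2,
            dimnames = list(c("r1", "r2"), paste0("c", 1:5)))

out <- m
out[] <- vapply(m, y, numeric(1))  # fills the values, keeps dim and dimnames
out
```

Working on a copy (out) keeps the original m available for comparison. Assigning to m[] directly behaves the same way.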
One way is to use apply on both rows and columns:
apply(m,1:2,y)
[,1] [,2] [,3] [,4] [,5]
[1,] 2 NA 6 8 NA
[2,] 3 5 NA 9 11
You can also do it with subscripting because == is already vectorized:
m[m %% 3 == 0] <- NA
m <- m+1
m
[,1] [,2] [,3] [,4] [,5]
[1,] 2 NA 6 8 NA
[2,] 3 5 NA 9 11
For this specific example you can just do something like this
> # Create some fake data
> mat <- matrix(1:16, 4, 4)
> # Set all elements divisible by 3 to NA
> mat[mat %% 3 == 0] <- NA
> # Add 1 to all non NA elements
> mat <- mat + 1
> mat
[,1] [,2] [,3] [,4]
[1,] 2 6 NA 14
[2,] 3 NA 11 15
[3,] NA 8 12 NA
[4,] 5 9 NA 17
There's a slight refinement of Dason and Josh's solution using ifelse.
mat <- matrix(1:16, 4, 4)
ifelse(mat %% 3 == 0, NA, mat + 1)
[,1] [,2] [,3] [,4]
[1,] 2 6 NA 14
[2,] 3 NA 11 15
[3,] NA 8 12 NA
[4,] 5 9 NA 17

Simple question regarding the use of outer() and user-defined functions?

> fun1 <- function(x,y){x+y}
> outer(seq(1,5,length=5),seq(6,10,length=5),fun1)
[,1] [,2] [,3] [,4] [,5]
[1,] 7 8 9 10 11
[2,] 8 9 10 11 12
[3,] 9 10 11 12 13
[4,] 10 11 12 13 14
[5,] 11 12 13 14 15
> fun2 <- function(x,y){z<-c(x,y);z[1]+z[2]}
> outer(seq(1,5,length=5),seq(6,10,length=5),fun2)
Error in dim(robj) <- c(dX, dY) :
dims [product 25] do not match the length of object [1]
Why doesn't fun2() work? Aren't fun2() and fun1() essentially the same thing?
As an alternative, you can just replace fun2 with Vectorize(fun2) when passing it as argument to outer:
fun2 <- function(x,y){z<-c(x,y);z[1]+z[2]}
outer(seq(1,5,length=5),seq(6,10,length=5), Vectorize(fun2))
The answer becomes obvious if you read ?outer:
Details:
‘X’ and ‘Y’ must be suitable arguments for ‘FUN’. Each will be
extended by ‘rep’ to length the products of the lengths of ‘X’ and
‘Y’ before ‘FUN’ is called.
‘FUN’ is called with these two extended vectors as arguments.
Therefore, it must be a vectorized function (or the name of one),
expecting at least two arguments.
Think about what fun2() does: it concatenates the two vectors into one, then sums the first and second elements of that vector. fun1(), on the other hand, does the vectorised sum of its inputs, so the returned object has the same length as each input. In fun2(), the output is a vector of length 1, while outer() was expecting 25.
The way to make the idea behind fun2() work is to cbind() not c() the two inputs:
> fun3 <- function(x, y) { z <- cbind(x, y); z[,1] + z[,2]}
> outer(seq(1,5,length=5),seq(6,10,length=5),fun3)
[,1] [,2] [,3] [,4] [,5]
[1,] 7 8 9 10 11
[2,] 8 9 10 11 12
[3,] 9 10 11 12 13
[4,] 10 11 12 13 14
[5,] 11 12 13 14 15
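To see the length mismatch directly, the recycling described in ?outer can be reproduced by hand. This is a sketch of the idea rather than outer()'s actual source; the names X, Y and manual are chosen here for illustration.

```r
x <- seq(1, 5, length.out = 5)
y <- seq(6, 10, length.out = 5)
fun1 <- function(x, y) x + y
fun2 <- function(x, y) { z <- c(x, y); z[1] + z[2] }

# Both inputs are extended to length(x) * length(y) before FUN is called once
X <- rep(x, times = length(y))
Y <- rep(y, each = length(x))

length(fun1(X, Y))  # one value per (x, y) pair
length(fun2(X, Y))  # a single value, which cannot be reshaped to 5 x 5

# Reshaping the vectorised result gives the same matrix as outer()
manual <- matrix(fun1(X, Y), nrow = length(x))
all.equal(manual, outer(x, y, fun1))
```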

Resources