do.call in R with matrix and lists

In R, to generate a new matrix (N*6) from an older one (N*3), is there a better way than the following, one that does not require "unpacking"/unlisting the inner lists created in the apply function in order to "expand" the source matrix?
transformed <- matrix(byrow=T)
transformed <- as.matrix(
  do.call("rbind", as.list(
    apply(dataset, 1, function(x) {
      x <- list(x[1], x[2], x[3], x[2]*x[3], x[2]^2, x[3]^2)
    })
  ))
)
#Unpack all inner lists from the expanded matrix
ret_trans <- as.matrix( apply(transformed, 2, function(x) unlist(x)) )
EDIT: Here is an example:
dataset
[,1] [,2] [,3]
[1,] 1 6 11
[2,] 2 7 12
[3,] 3 8 13
[4,] 4 9 14
[5,] 5 10 15
Applying the code above, I want to expand it to N*6, here 5*6 (sorry, I initially mistyped the column dimension above, and the margin of the apply function). It should look like this:
transformed
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 6 11 66 36 121
[2,] 2 7 12 84 49 144
[3,] 3 8 13 104 64 169
[4,] 4 9 14 126 81 196
[5,] 5 10 15 150 100 225
The question is whether there is another way to do this without the last apply call, i.e. without having to coerce x to a list.
Thanks all for your replies.

As suggested in the comments, do:
cbind(dataset, dataset[,2] * dataset[,3], dataset[,c(2, 3)]^2)
It will be a lot faster than using apply, which would have needed to look like this (apply returns one column per input row, hence the t()):
transformed <- function(x) c(x[1], x[2], x[3], x[2]*x[3], x[2]^2, x[3]^2)
t(apply(dataset, 1, transformed))
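A quick check that the two approaches agree. It rebuilds the question's example dataset, which is assumed here to be matrix(1:15, ncol = 3) because that reproduces the printed values. Since apply() returns one column per input row, its result has to be transposed to get the N*6 layout:

```r
# Rebuild the 5x3 example matrix from the question (1..15, filled column by column)
dataset <- matrix(1:15, ncol = 3)

# Vectorized expansion: original columns, product of columns 2 and 3, squares of columns 2 and 3
vectorized <- cbind(dataset, dataset[, 2] * dataset[, 3], dataset[, c(2, 3)]^2)

# Row-wise expansion: apply() stacks each row's result as a column, so transpose back to 5x6
expand_row <- function(x) c(x[1], x[2], x[3], x[2] * x[3], x[2]^2, x[3]^2)
row_wise <- t(apply(dataset, 1, expand_row))

dim(vectorized)                  # 5 6
all.equal(vectorized, row_wise)  # TRUE
```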

Related

How can I multiply two large matrices by corresponding columns and rows

I have two matrices, for example:
> A
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> B
[,1] [,2]
[1,] 7 10
[2,] 8 11
[3,] 9 12
and I want a vector C whose element C[i] is the dot product of A[i,] and B[,i], i.e. C[i] = sum(A[i,]*B[,i]), so the outcome should be:
> C
[,1]
[1,] 76
[2,] 136
I used a for loop, for (i in 1:2) {C[i] = A[i,] %*% B[,i]}, but it is very slow.
I also tried computing A %*% B and taking the elements on the diagonal, but that crashes my computer when the matrices are large.
Could you please give me some suggestions? Thanks so much!
Element-wise multiplication (not matrix multiplication) works here. Row i of t(B) is column i of B, so A * t(B) contains exactly the products A[i,j]*B[j,i] we want; after that we just take the sum of each row. If we need the result as a column matrix, we can convert it with as.matrix.
> A <- matrix(1:6, nrow = 2)
> A
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> B <- matrix(7:12, ncol = 2)
> B
[,1] [,2]
[1,] 7 10
[2,] 8 11
[3,] 9 12
> rowSums(A * t(B))
[1] 76 136
> as.matrix(rowSums(A * t(B)))
[,1]
[1,] 76
[2,] 136
Alternatively, pair each row of A with the matching column of B using asplit:
mapply(function(a,b) sum(a*b), asplit(A, 1), asplit(B, 2))
# [1] 76 136
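This sketch shows why the row-sum version avoids the crash described in the question. diag(A %*% B) computes the full n x n product and then discards everything except the diagonal, whereas rowSums(A * t(B)) only forms an n x k element-wise product. The matrices are random, and the sizes n and k are arbitrary choices for illustration:

```r
set.seed(42)
n <- 200   # number of rows of A (and columns of B)
k <- 50    # shared inner dimension
A <- matrix(rnorm(n * k), nrow = n)  # n x k
B <- matrix(rnorm(k * n), nrow = k)  # k x n

# Element-wise product with t(B) pairs row i of A with column i of B; then sum each row
fast <- rowSums(A * t(B))

# Same numbers, but only after materializing the whole n x n matrix product
slow <- diag(A %*% B)

all.equal(fast, slow)  # TRUE (up to floating-point tolerance)
```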

More general or efficient approach for this matrix multiplication?

In R, is there a more efficient and/or general way to produce the desired output from the two matrices below? I'm suspicious that what I've done is just some esoteric matrix multiplication operation of which I'm not aware.
ff <- matrix(1:6,ncol=2)
# [,1] [,2]
# [1,] 1 4
# [2,] 2 5
# [3,] 3 6
bb <- matrix(7:10,ncol=2)
# [,1] [,2]
# [1,] 7 9
# [2,] 8 10
# DESIRE:
# 7 36
# 14 45
# 21 54
# 8 40
# 16 50
# 24 60
This works, but isn't the general solution I'm looking for:
rr1 <- t(t(ff) * bb[1,])
rr2 <- t(t(ff) * bb[2,])
rbind(rr1,rr2)
# [,1] [,2]
# [1,] 7 36
# [2,] 14 45
# [3,] 21 54
# [4,] 8 40
# [5,] 16 50
# [6,] 24 60
This next code block seems pretty efficient and is general. But is there a better way?
Something like kronecker(ffa,bba)? (which clearly doesn't work in this case)
ffa <- matrix(rep(t(ff),2), ncol=2, byrow=T)
bba <- matrix(rep(bb,each=3), ncol=2)
ffa * bba
# [,1] [,2]
# [1,] 7 36
# [2,] 14 45
# [3,] 21 54
# [4,] 8 40
# [5,] 16 50
# [6,] 24 60
This is related to my other questions:
Using apply function over the row margin with expectation of stacked results, where I'm trying to understand the behavior of apply itself and:
Is this an example of some more general matrix product?, where I'm asking about the theoretical math, specifically.
Use a kronecker product and pick off the appropriate columns:
kronecker(bb, ff)[, c(diag(ncol(bb))) == 1]
or using the infix operator for kronecker:
(bb %x% ff)[, c(diag(ncol(bb))) == 1]
Another approach is to convert the arguments to data frames and mapply kronecker across them. For the case in the question this performs the calculation cbind(bb[, 1] %x% ff[, 1], bb[, 2] %x% ff[, 2]) but in a more general manner without resorting to indices:
mapply(kronecker, as.data.frame(bb), as.data.frame(ff))
or using the infix operator for kronecker:
mapply(`%x%`, as.data.frame(bb), as.data.frame(ff))
The functionality you are looking for is available in the Matrix package as the function KhatriRao. Because the function comes from Matrix, the output is a sparse matrix of class "dgCMatrix"; you can convert it to an ordinary matrix of class "matrix" with as.matrix.
library(Matrix)
as.matrix(KhatriRao(bb, ff))
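If you would rather not use the Matrix package, the question's own rep() idea generalizes to any pair of matrices with the same number of columns. Below is a minimal base-R sketch of that column-wise Kronecker (Khatri-Rao) product. The helper name khatri_rao is made up for illustration:

```r
# Hypothetical helper: column-wise Kronecker product, column j = X[, j] %x% Y[, j]
khatri_rao <- function(X, Y) {
  stopifnot(ncol(X) == ncol(Y))
  # Repeat each row of X nrow(Y) times ...
  X_rep <- X[rep(seq_len(nrow(X)), each = nrow(Y)), , drop = FALSE]
  # ... and stack nrow(X) copies of Y, then multiply element-wise
  Y_rep <- Y[rep(seq_len(nrow(Y)), times = nrow(X)), , drop = FALSE]
  X_rep * Y_rep
}

ff <- matrix(1:6, ncol = 2)
bb <- matrix(7:10, ncol = 2)
res <- khatri_rao(bb, ff)
res
#      [,1] [,2]
# [1,]    7   36
# [2,]   14   45
# [3,]   21   54
# [4,]    8   40
# [5,]   16   50
# [6,]   24   60
```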

apply function to subsets of each row in R

I am struggling to find a way to use apply to apply a specific function to only a "chunk" of each row.
For instance, I have a matrix:
x <- matrix(c(5,12,4,3,2,8,10,7,9,1,11,6),nrow=3)
[,1] [,2] [,3] [,4]
[1,] 5 3 10 1
[2,] 12 2 7 11
[3,] 4 8 9 6
I would like to end up with a new matrix made up of the sum of the first two values and the sum of the last two values in each row, like so:
[,1] [,2]
[1,] 8 11
[2,] 14 18
[3,] 12 15
I have tried something like this:
chunks<-c("1:2","3:4")
sumchunks<-function(x,chunks){
  apply(x,1,
    function(row){
      for (i in chunks){
        v<-sum(row[chunks[i]])
      }})
}
But it doesn't work at all. Any suggestions for an approach that works?
Thank you.
You can do:
chunks <- list(1:2, 3:4)
sumchunks <- function(x, chunks) sapply(chunks, function(ch) sum(x[ch]))
x <- matrix(c(5,12,4,3,2,8,10,7,9,1,11,6),nrow=3)
apply(x, 1, sumchunks, chunks=chunks)
# [,1] [,2] [,3]
# [1,] 8 14 12
# [2,] 11 18 15
You will probably want to transpose the result with t().
Here is a vectorized variant:
chunks <- list(1:2, 3:4)
x <- matrix(c(5,12,4,3,2,8,10,7,9,1,11,6),nrow=3)
sapply(chunks, function(ch) rowSums(x[, ch, drop = FALSE]))  # drop = FALSE keeps one-column chunks as matrices for rowSums()
# [,1] [,2]
# [1,] 8 11
# [2,] 14 18
# [3,] 12 15
We can convert to array and then do
t(apply(array(x, c(3, 2, 2)), 1, colSums))
Or
sapply(seq(1, ncol(x), 2), function(i) rowSums(x[,i:(i+1)]))
# [,1] [,2]
#[1,] 8 11
#[2,] 14 18
#[3,] 12 15
Like this?
x <- matrix(sample(1:12),nrow=3)
f = function(s) {
  c(sum(s[1:2]), sum(s[3:4]))
}
t(apply(x, 1, f))
rowSums is built to sum over rows, so it should be quite fast. Restrict it to the columns you want to sum over, then cbind the results to get what you want:
cbind(rowSums(x[,c(1,2)]), rowSums(x[,c(3,4)]))
# [,1] [,2]
#[1,] 8 11
#[2,] 14 18
#[3,] 12 15
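One more base-R option, not taken from the answers above, is to label each column with the chunk it belongs to and let rowsum(), which sums rows by group, add up the transposed matrix. The chunk_id labels below assume consecutive pairs of columns, but any grouping of columns works:

```r
x <- matrix(c(5, 12, 4, 3, 2, 8, 10, 7, 9, 1, 11, 6), nrow = 3)

# Chunk label for each column: columns 1-2 are chunk 1, columns 3-4 are chunk 2
chunk_id <- rep(1:2, each = 2)

# rowsum() adds up rows that share a group label, so transpose to group the columns,
# then transpose back; the result's column names are the chunk labels
res <- t(rowsum(t(x), chunk_id))
res  # same values as above: 8 11 / 14 18 / 12 15
```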

How do I apply a multi-parameter function in R?

I have the following data frame and vector.
> y
v1 v2 v3
1 1 6 43
2 4 7 5
3 0 2 32
> v
[1] 1 2 3
I want to apply the following function to every ROW in that data frame such that v is added to every ROW of y:
x <- function(vector1,vector2) {
  x <- vector1 + vector2
}
... in order to get THESE results:
v1 v2 v3
1 2 8 46
2 5 9 8
3 1 4 35
mapply applies the function to COLUMNS:
> z <- mapply(x, y, MoreArgs=list(vector2=v))
> z
v1 v2 v3
[1,] 2 7 44
[2,] 6 9 7
[3,] 3 5 35
I've tried transposing the data frame so that the function will be applied to rows and not columns, but mapply gives me weird results after transposing:
> transposed <- t(y)
> transposed
[,1] [,2] [,3]
v1 1 4 0
v2 6 7 2
v3 43 5 32
> z <- mapply(x, transposed, MoreArgs=list(vector2=v))
> z
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 2 7 44 5 8 6 1 3 33
[2,] 3 8 45 6 9 7 2 4 34
[3,] 4 9 46 7 10 8 3 5 35
...Help?
############################ EDIT #########################
Thanks for all the answers! I'm learning tons of new R functions that I've never seen before, which is fantastic.
I want to clarify my earlier question a bit. What I'm really asking is a much more general question - how to apply a multi-parameter function to each row in R (at the moment, I'm tempted to conclude that I should just use a loop, but I would like to figure out if it IS possible, just for future reference...) (I also purposefully refrained from showing the code I'm working with since it's kind of messy).
I tried using the sweep function as was suggested, but I get the following error:
testsweep <- function(vector, z, n) {
  testsweep <- z
}
> n <- names(Na_exp)
> n
[1] "NaCl.10000.2hr.AVG_Signal" "NaCl.10000.4hr.AVG_Signal"
> t <- head(Li_fcs,n=1)
> t
LiCl.1000.1hr.FoldChange LiCl.2000.1hr.FoldChange LiCl.5000.1hr.FoldChange
[1,] -0.05371838 -0.1010928 -0.01939986
LiCl.10000.1hr.FoldChange LiCl.1000.2hr.FoldChange
[1,] 0.1275617 -0.107154
LiCl.2000.2hr.FoldChange LiCl.5000.2hr.FoldChange
[1,] -0.06760782 -0.09770226
LiCl.10000.2hr.FoldChange LiCl.1000.4hr.FoldChange
[1,] -0.1124188 -0.06140386
LiCl.2000.4hr.FoldChange LiCl.5000.4hr.FoldChange
[1,] -0.04323497 -0.04275953
LiCl.10000.4hr.FoldChange LiCl.1000.8hr.FoldChange
[1,] 0.03633496 0.01879461
LiCl.2000.8hr.FoldChange LiCl.5000.8hr.FoldChange
[1,] 0.257977 -0.06357423
LiCl.10000.8hr.FoldChange
[1,] 0.07214176
> z <- colnames(Li_fcs)
> z
[1] "LiCl.1000.1hr.FoldChange" "LiCl.2000.1hr.FoldChange"
[3] "LiCl.5000.1hr.FoldChange" "LiCl.10000.1hr.FoldChange"
[5] "LiCl.1000.2hr.FoldChange" "LiCl.2000.2hr.FoldChange"
[7] "LiCl.5000.2hr.FoldChange" "LiCl.10000.2hr.FoldChange"
[9] "LiCl.1000.4hr.FoldChange" "LiCl.2000.4hr.FoldChange"
[11] "LiCl.5000.4hr.FoldChange" "LiCl.10000.4hr.FoldChange"
[13] "LiCl.1000.8hr.FoldChange" "LiCl.2000.8hr.FoldChange"
[15] "LiCl.5000.8hr.FoldChange" "LiCl.10000.8hr.FoldChange"
But when I try to apply sweep...
> test <- sweep(t, 2, z, n, FUN="testsweep")
Error in if (check.margin) { : argument is not interpretable as logical
In addition: Warning message:
In if (check.margin) { :
the condition has length > 1 and only the first element will be used
When I remove the n parameter from this test example, sweep works fine. This suggests to me that sweep cannot be used unless all the parameters passed to it either have as many elements as t has columns or have length 1. Please correct me if I am mistaken...
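You are asking to "sweep" v across the rows of y with the "+" function, i.e. to add v[j] to every value in column j. Since v has one value per column, STATS lines up with MARGIN = 2:
sweep(y, 2, v, FUN="+")
v1 v2 v3
1 2 8 46
2 5 9 8
3 1 4 35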
If your actual problem really is no more complicated than this, you can take advantage of R's recycling rules. Because R matrices are stored in column-major order, a vector is recycled down the columns, so you need to transpose y first, then add, then transpose the result back:
t(t(y)+v)
v1 v2 v3
1 2 8 46
2 5 9 8
3 1 4 35
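The sweep() error in the question's edit looks like an argument-matching problem rather than a length restriction. The signature is sweep(x, MARGIN, STATS, FUN = "-", check.margin = TRUE, ...). When FUN is supplied by name, the unnamed fourth argument n is matched to check.margin, which is what the error message complains about. Extra arguments meant for FUN need to be passed by name so that they go through .... Here is a small sketch with toy data; m, stats and scale_and_shift are made up for illustration:

```r
m <- matrix(1:6, nrow = 2)
stats <- c(10, 20, 30)  # one value per column, hence MARGIN = 2

# sweep() calls FUN(x, S, ...) where S is STATS expanded to the shape of x,
# so FUN works element-wise on two same-shaped arrays plus any named extras
scale_and_shift <- function(x, s, shift) x * s + shift

# shift is passed by name, so it travels through sweep()'s ... to FUN
res <- sweep(m, 2, stats, FUN = scale_and_shift, shift = 1)
res
#      [,1] [,2] [,3]
# [1,]   11   61  151
# [2,]   21   81  181
```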
I don't think you need mapply here. Either use t() directly, or use rep() so that the recycling matches what you want:
> set.seed(1)  # values below come from R < 3.6.0; later versions of sample() draw different numbers
> mat <- matrix(sample(1:100, 9, TRUE), ncol = 3)
> vec <- 1:3
>
> mat
[,1] [,2] [,3]
[1,] 27 91 95
[2,] 38 21 67
[3,] 58 90 63
#Approach 1 using t()
> ans1 <- t(t(mat) + vec)
#Approach 2 using rep()
> ans2 <- mat + rep(vec, each = nrow(mat))
#Are they the same?
> identical(ans1, ans2)
[1] TRUE
#Hurray!
> ans1
[,1] [,2] [,3]
[1,] 28 93 98
[2,] 39 23 70
[3,] 59 92 66
How about using apply?
t(apply(y, 1, function(x) x + v))
[,1] [,2] [,3]
[1,] 2 8 46
[2,] 5 9 8
[3,] 1 4 35
apply returns the result for each row as a column (when FUN returns a vector of length n > 1, apply stacks the results into an n by nrow(y) matrix), so it needs to be transposed.
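The edit also asks the more general question of how to call a function of several arguments once per row. One base-R pattern is to split the rows into a list, lapply() over that list, and pass the fixed arguments by name. This sketch uses the question's y and v. The function is renamed add_vectors for clarity, and asplit() needs R 3.6.0 or later:

```r
y <- data.frame(v1 = c(1, 4, 0), v2 = c(6, 7, 2), v3 = c(43, 5, 32))
v <- c(1, 2, 3)

# The question's two-argument function, renamed for clarity
add_vectors <- function(vector1, vector2) vector1 + vector2

# asplit() turns each row of the (numeric) matrix into its own vector;
# lapply() then calls add_vectors(row, vector2 = v) once per row
rows <- asplit(as.matrix(y), 1)
res <- do.call(rbind, lapply(rows, add_vectors, vector2 = v))
res  # rows: 2 8 46 / 5 9 8 / 1 4 35
```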
I would definitely take a look at mdply from the plyr package. It does exactly what you want:
library(plyr)
mdply(data.frame(mean = 1:5, sd = 1:5), rnorm, n = 2)
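You can get a similar effect in base R, without plyr, by passing the data frame's columns to mapply() as named arguments so that each row's values become the arguments of one call. This is a sketch of an equivalent, not a description of what mdply() does internally. Unlike mdply(), which returns a data frame, it returns a matrix with one column per input row:

```r
params <- data.frame(mean = 1:5, sd = 1:5)

set.seed(1)
# c() merges the fixed arguments with the data frame's columns into one argument list;
# mapply() then calls rnorm(n = 2, mean = params$mean[i], sd = params$sd[i]) for each row i
draws <- do.call(mapply, c(list(FUN = rnorm, MoreArgs = list(n = 2)), params))

# mapply() simplifies to a 2 x 5 matrix (one column per row); transpose for one row per input row
t(draws)
```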

Form matrix from rows in 3-dimensional array

I have X, a three-dimensional array in R. I want to take a vector of indices indx (length equal to dim(X)[1]) and form a matrix where the first row is the first row of X[ , , indx[1]], the second row is the second row of X[ , , indx[2]], and so on.
For example, I have:
R> X <- array(1:18, dim = c(3, 2, 3))
R> X
, , 1
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
, , 2
[,1] [,2]
[1,] 7 10
[2,] 8 11
[3,] 9 12
, , 3
[,1] [,2]
[1,] 13 16
[2,] 14 17
[3,] 15 18
R> indx <- c(2, 3, 1)
My desired output is
R> rbind(X[1, , 2], X[2, , 3], X[3, , 1])
[,1] [,2]
[1,] 7 10
[2,] 14 17
[3,] 3 6
As of now I'm using the inelegant (and slow) sapply(1:dim(X)[2], function(x) X[cbind(1:3, x, indx)]). Is there any way to do this using the built-in indexing functions? I had no luck experimenting with the matrix indexing methods described in ?Extract, but I may just be doing it wrong.
Maybe like this:
t(sapply(1:3,function(x) X[,,indx][x,,x]))
I may be answering the wrong question (I can't reconcile your first description and your sample output)... This produces your sample output, but I can't say that it's much faster without running it on your data.
do.call(rbind, lapply(1:dim(X)[1], function(i) X[i, , indx[i]]))
Matrix indexing to the rescue! No applys needed.
Figure out which indices you want:
n <- dim(X)[2]
foo <- cbind(rep(seq_along(indx),n),
rep(seq.int(n), each=length(indx)),
rep(indx,n))
(the result is this)
[,1] [,2] [,3]
[1,] 1 1 2
[2,] 2 1 3
[3,] 3 1 1
[4,] 1 2 2
[5,] 2 2 3
[6,] 3 2 1
Then use it as an index, converting back to a matrix so it looks like your output.
> matrix(X[foo],ncol=n)
[,1] [,2]
[1,] 7 10
[2,] 14 17
[3,] 3 6
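The same matrix-indexing idea can be wrapped in a reusable function that builds the index matrix from dim(X) instead of hard-coded lengths. The name pick_rows is made up for illustration:

```r
# Hypothetical helper: row i of the result is X[i, , indx[i]]
pick_rows <- function(X, indx) {
  nr <- dim(X)[1]
  nc <- dim(X)[2]
  stopifnot(length(indx) == nr)
  # One (row, column, slice) triple per output element, in column-major order
  idx <- cbind(rep(seq_len(nr), times = nc),
               rep(seq_len(nc), each = nr),
               rep(indx, times = nc))
  matrix(X[idx], nrow = nr, ncol = nc)
}

X <- array(1:18, dim = c(3, 2, 3))
pick_rows(X, c(2, 3, 1))
#      [,1] [,2]
# [1,]    7   10
# [2,]   14   17
# [3,]    3    6
```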
