Operating on pairs of columns in R (or numpy)

I have two matrices: A (k rows, m columns) and B (k rows, n columns).
I want to operate on all pairs of columns (one from A and one from B). The result should be a matrix C (m rows, n columns) where C[i,j] = f(A[,i], B[,j]).
If f were the dot product, the whole thing would be a simple matrix multiplication (C = t(A) %*% B),
but my f is different. Specifically, I count the number of equal entries:
f = function(x,y) sum(x==y)
Is there a simple (and fast, because my matrices are big) way to compute the result?
Preferably in R, but possibly in Python (numpy). I thought about using outer(A,B,"==") but this results in a 4-dimensional array, and I haven't figured out what to do with it.
Any help is appreciated.

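In R, we can split the matrices into lists of columns with asplit and apply f with a nested lapply/sapply. This returns a list with one element per column of A, each holding the n values for that column; wrap the call in do.call(rbind, ...) to get the m x n matrix C.
lapply(asplit(A, 2), function(x) sapply(asplit(B, 2), function(y) f(x, y)))
Or use outer after converting to data.frame, so that the unit of iteration is a column. For a matrix the unit is a single element, because a matrix is a vector with a dim attribute.
outer(as.data.frame(A), as.data.frame(B), FUN = Vectorize(f))
Data: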
A <- cbind(1:5, 6:10)
B <- cbind(c(1:3, 1:2), c(5:7, 6:7))
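For large matrices, one possible way to keep the matrix-multiplication speed mentioned in the question is to count matches one value at a time: for each value v that occurs in both matrices, a cross product of the indicator matrices A == v and B == v counts, for every column pair, the rows where both entries equal v, and summing over v gives C. This is a sketch rather than a benchmarked solution; it assumes the matrices contain no NA and only a modest number of distinct values, and the names vals, C_fast and C_ref are made up for illustration.

```r
f <- function(x, y) sum(x == y)
A <- cbind(1:5, 6:10)
B <- cbind(c(1:3, 1:2), c(5:7, 6:7))

# Values that occur in both matrices; only these can produce matches
vals <- intersect(as.vector(A), as.vector(B))

# Accumulate one matrix product per shared value
C_fast <- matrix(0, ncol(A), ncol(B))
for (v in vals) {
  # (A == v) + 0 turns the logical indicator matrix into a numeric one
  C_fast <- C_fast + crossprod((A == v) + 0, (B == v) + 0)
}

# Reference result from the plain pairwise definition, for comparison
C_ref <- sapply(seq_len(ncol(B)), function(j)
  sapply(seq_len(ncol(A)), function(i) f(A[, i], B[, j])))

all(C_fast == C_ref)   # TRUE
```

The loop runs once per shared value, not once per column pair, so it is likely to pay off mainly when the number of distinct values is small compared with m * n.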

Related

Sapply function for element-wise calculation a matrix in R

I have a 5x4 matrix in R, named data, defined as follows:
set.seed(123)
data <- matrix(rnorm(5*4,mean=0,sd=1), 5, 4)
and I want to create 4 different matrices, one per column, that follow this formula. Assume that data[,1] = [A1,A2,A3,A4,A5]. I want to create the following matrix:
     A1-A1 A1-A2 A1-A3 A1-A4 A1-A5
     A2-A1 A2-A2 A2-A3 A2-A4 A2-A5
G1 = A3-A1 A3-A2 A3-A3 A3-A4 A3-A5
     A4-A1 A4-A2 A4-A3 A4-A4 A4-A5
     A5-A1 A5-A2 A5-A3 A5-A4 A5-A5
Similarly for the other columns: I want to calculate all the G matrices (G1, G2, G3, G4) at once. How can I achieve that with the sapply function?
We can do the elementwise subtraction for a single column with outer:
outer(data[,1], data[,1], `-`)
To do this for every column, split the matrix by column (asplit with MARGIN = 2), loop over the resulting list, and apply outer to each element:
lapply(asplit(data, 2), function(x) outer(x, x, `-`))
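Since the question asks for sapply specifically, a sketch along these lines may also work: sapply with simplify = "array" stacks the per-column matrices into a single 5 x 5 x 4 array, so that G[, , k] is the matrix for column k. The name G is chosen here for illustration only.

```r
set.seed(123)
data <- matrix(rnorm(5 * 4, mean = 0, sd = 1), 5, 4)

# One 5 x 5 difference matrix per column, stacked along a third dimension
G <- sapply(seq_len(ncol(data)),
            function(k) outer(data[, k], data[, k], "-"),
            simplify = "array")

dim(G)       # 5 5 4
G[, , 1]     # the matrix G1 built from data[, 1]
```

An array keeps all four matrices in one object; the lapply version is the better fit when a list of separate matrices is easier to work with.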

R: using mapply for a function of two vectors

I have an R function that calculates the Hamming distance of two vectors:
Hamming = function(x,y){
  get_dist = sum(x != y, na.rm=TRUE)
  return(get_dist)
}
that I would like to apply to every row of two matrices M1, M2 without using a for loop. What I currently have (where L is the number of rows in M1 and M2) is the very time-consuming loop:
xdiff = c()
for(i in 1:L){
  xdiff = c(xdiff, Hamming(M1[i,],M2[i,]))
}
I thought that this could be done by executing
mapply(Hamming, t(M1), t(M2))
(with the transpose because mapply works across columns), but this doesn't generate a length L vector of Hamming distances for each row, so perhaps I'm misunderstanding what mapply is doing.
Is there a straightforward application of mapply or something else in the R apply family that would work?
If dim(M1) and dim(M2) are identical, then you can simply do:
rowSums(M1 != M2, na.rm = TRUE)
Your attempt with mapply didn't work because m-by-n matrices are stored as m*n-length vectors, and mapply handles them as such. To accomplish this with mapply, you would need to split each matrix into a list of row vectors:
mapply(Hamming, asplit(M1, 1L), asplit(M2, 1L))
vapply would be better, though:
vapply(seq_len(nrow(M1)), function(i) Hamming(M1[i, ], M2[i, ]), 0L)
In any case, just use rowSums.
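A small check, using made-up matrices M1 and M2, suggests how the rowSums and mapply versions agree, including the NA handling:

```r
Hamming <- function(x, y) sum(x != y, na.rm = TRUE)

# Hypothetical 2 x 3 inputs; the NA is skipped because of na.rm = TRUE
M1 <- matrix(c(1, 0, 1,
               1, NA, 0), nrow = 2, byrow = TRUE)
M2 <- matrix(c(1, 1, 1,
               0, 1, 1), nrow = 2, byrow = TRUE)

d_vec <- rowSums(M1 != M2, na.rm = TRUE)                   # vectorized
d_map <- mapply(Hamming, asplit(M1, 1L), asplit(M2, 1L))   # row by row

d_vec                 # 1 2
all(d_vec == d_map)   # TRUE
```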

How to multiply columns of two matrix with all combinations

I would like to multiply all possible combinations of columns of two matrices that have the same number of rows. That is, two matrices, e.g. a [3x4] and b [3x3], will generate a 3x12 matrix with elements a[i,j]*b[i,k] (i is the row index ranging from 1 to 3, j is the column of a ranging from 1 to 4, and k is the column of b ranging from 1 to 3).
I have created an example that does the job, but I am looking for an elegant solution without a for loop.
a <- matrix(1:12,3,4)
b <- matrix(1:9,3,3)
comb<-matrix(NA,3,(ncol(a)*ncol(b)))
for (i in 1:nrow(a)){
  comb[i,]<-apply(expand.grid(a[i,],b[i,]),1,prod)
}
comb
Here a is a 3x4 matrix, b is a 3x3 matrix, and comb is the 3x12 output obtained by multiplying the column pairs. I am looking for an elegant solution that generalizes this multiplication to more than two matrices.
Here are a few one-liners in decreasing order of length:
t(sapply(1:3, function(i) tcrossprod(a[i, ], b[i, ])))
t(mapply(outer, split(a, 1:3), split(b, 1:3)))
matrix(apply(a %o% b, c(2, 4), diag), 3)
(b %x% a)[!!diag(3), ]
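For the generalization to more than two matrices, one option is to write the two-matrix step as a small function and fold it over a list with Reduce. The helper name rowwise_prod below is hypothetical, and the column order is assumed to follow the loop in the question (columns of the first matrix vary fastest):

```r
a <- matrix(1:12, 3, 4)
b <- matrix(1:9, 3, 3)

# Row-wise products of all column pairs:
# column (k - 1) * ncol(x) + j of the result holds x[, j] * y[, k]
rowwise_prod <- function(x, y) {
  x[, rep(seq_len(ncol(x)), times = ncol(y)), drop = FALSE] *
    y[, rep(seq_len(ncol(y)), each = ncol(x)), drop = FALSE]
}

comb2 <- rowwise_prod(a, b)                    # 3 x 12, two matrices
comb3 <- Reduce(rowwise_prod, list(a, b, a))   # 3 x 48, three matrices

all(comb2 == (b %x% a)[!!diag(3), ])           # TRUE
dim(comb3)                                     # 3 48
```

Because the helper only indexes and multiplies whole matrices, it avoids building the full Kronecker product and then discarding most of its rows.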

Returning head and tail means from list of vectors

I need to calculate the mean (or other summary functions) of the top x and bottom x portions of a list of vectors of varying lengths.
Here is a list of 3 vectors of different lengths, similar in format to what I am working with:
t <- list(a = exp(-4:3), b = exp(-2:12), c = exp(-5:3))
Ideally, I would like a single vector of numbers for each type of mean (I manually ran mean(head(t$a),2) and mean(tail(t$a),2) for each vector).
Ideal output, a nameless vector of means of the first two elements of each vector:
[1] 0.2516074 1.859141 0.09256118
Second vector, with the means of the last two entries of each vector:
[1] 1.859141 15064.77 1.859141
I am looking for a clever lapply-type construct to get a vector of numbers for each mean without the attached names (in this case a, b, c). Thanks!
What about:
n = 2
v = lapply(t, function(i) mean(head(i, n)))
The variable v is a list. To get a vector, use unlist:
v = unlist(v)
To drop the names and keep only the numbers, use as.vector:
as.vector(v)
For the tail, use tail in place of head:
lapply(t, function(i) mean(tail(i, n)))
Using sapply, you can compute both in one call:
sapply(t, function(x, n = 2)
  c(mean(head(x, n)), mean(tail(x, n))))
# Returns a 2 x 3 matrix with columns a, b, c:
# row 1 holds the head means (0.03405135 0.2516074 0.01252679),
# row 2 holds the corresponding tail means.
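If an unnamed numeric vector is the goal, a possible variant uses vapply, which also checks that each result is a single number, together with unname. The list is called vecs here (a name picked for this sketch) to avoid masking base R's t():

```r
vecs <- list(a = exp(-4:3), b = exp(-2:12), c = exp(-5:3))
n <- 2

# Mean of the first n and of the last n elements of every vector
head_means <- unname(vapply(vecs, function(x) mean(head(x, n)), numeric(1)))
tail_means <- unname(vapply(vecs, function(x) mean(tail(x, n)), numeric(1)))

head_means   # one value per vector, no names
tail_means
```

Note that n is passed to head and tail, not to mean; writing mean(head(x), n) would instead hand n to mean as its trim argument.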

How to vectorize indexing?

I average coordinates stored in a data frame as follows:
sapply(coords[N:M,],mean) # mean of coordinates N to M
I need the average of several sets of coordinates, so I made this loop, which finds the mean of coordinates 1-4, 5-11 and 20-30.
N <- c(1, 5,20)
M <- c(4,11,30)
for ( i in 1:length(N) ) {
  sapply(coords[N[i]:M[i],],mean)
}
How can I vectorize that loop? I've tried to pass a matrix to coords (coords[NM,]), but that doesn't give me what I want.
You can replace sapply(x, mean) with colMeans(x) for the sake of simplicity and efficiency.
To think in vectors, it helps to combine related variables (here N and M) into a single object, in this case a data frame, whenever that is possible and simple.
N <- data.frame(from=c(1,5,20), to=c(4,11,30))
apply(N, 1, function(x) colMeans(coords[x[1]:x[2],]))
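The same idea can be written with mapply, which walks N and M in parallel. The coords data frame below is made up so that the sketch runs on its own:

```r
# Hypothetical coordinates: 30 points with columns x and y
coords <- data.frame(x = 1:30, y = 2 * (1:30))

N <- c(1, 5, 20)
M <- c(4, 11, 30)

# One column of means per (N, M) range, one row per coordinate
means <- mapply(function(from, to) colMeans(coords[from:to, ]), N, M)

means["x", ]   # 2.5 8.0 25.0
```

Keeping N and M as separate vectors avoids the conversion of the data frame to a matrix that apply performs internally.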

Resources