Sorting a correlation matrix in R

I have created a correlation matrix in R, using the cor function.
I would like to extract the 10 largest (closest to 1) and 10 smallest (closest to -1) values from this matrix, together with their row and column indices.
Here is sample code showing how I obtain the correlation matrix:
xs = rnorm(10000)
ys = rnorm(10000)
zs = rnorm(10000)
cor1 <- cor(data.frame(xs,ys,zs))
I obtain:
            xs          ys          zs
xs  1.00000000 -0.01077785 -0.01308803
ys -0.01077785  1.00000000  0.01176254
zs -0.01308803  0.01176254  1.00000000
Any suggestions?
Thanks!

If mat is your correlation matrix, you can get the locations of the top and bottom 10 like this...
min10 <- which(mat<=sort(mat)[10], arr.ind = TRUE)
max10 <- which(-mat<=sort(-mat)[10], arr.ind = TRUE)
Each of these is an n x 2 matrix whose columns give the row and column numbers of the elements of mat that meet the criterion.
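Because a correlation matrix is symmetric with 1s on the diagonal, the top 10 found on the full matrix will include the diagonal and list every pair twice. A minimal sketch of one way around this, assuming it is acceptable to look only at the upper triangle; the example data and the names keep, pairs, smallest10 and largest10 are illustrative, not part of the original answer:

```r
# Hypothetical example data; `mat` plays the role of the correlation matrix.
set.seed(42)
mat <- cor(matrix(rnorm(200 * 8), ncol = 8))

# Keep each unordered pair once and drop the diagonal.
keep <- upper.tri(mat)
pairs <- data.frame(
  row = row(mat)[keep],
  col = col(mat)[keep],
  r   = mat[keep]
)

# Order by correlation and take 10 rows from each end.
smallest10 <- head(pairs[order(pairs$r), ], 10)
largest10  <- head(pairs[order(pairs$r, decreasing = TRUE), ], 10)
```

Each result is a data frame with the row index, the column index and the correlation, so the values come back together with their positions.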

Related

Tensorflow: Find greater than pairs and stack along axis

The problem I have using tensorflow is as follows:
For one tensor X with dims n X m
X = [[x11,x12...,x1m],[x21,x22...,x2m],...[xn1,xn2...,xnm]]
I want to get an n X m X m tensor, that is, n stacked m X m matrices.
Each m X m matrix is the result of:
tf.math.greater(tf.reshape(x,(-1,1)), x) where x is a row of X
In words, for every row k in X, I'm trying to get the pairs i,j where xki > xkj. This gives me one matrix per row, and I then want to stack those matrices along the first axis to get an n x m x m cube.
Example:
X = [[1,2],[4,3], [5,7]]
Result = [[[False, False],[True, False]],[[False, True],[False, False]], [[False, False],[True, False]]]
Result has shape 3 X 2 X 2
Reshaping each row is the same as reshaping all rows. Try this:
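import tensorflow as tf

def fun(X):
    n, m = X.shape
    # Shape (n, m, 1): each row becomes a column vector.
    X1 = tf.expand_dims(X, -1)
    # Shape (n, 1, m): each row stays a row vector.
    X2 = tf.reshape(X, (n, 1, m))
    # Broadcasting compares every x_ki with every x_kj, giving shape (n, m, m).
    return tf.math.greater(X1, X2)

X = tf.Variable([[1,2],[4,3], [5,7]])
print(fun(X))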
Output:
tf.Tensor(
[[[False False]
[ True False]]
[[False True]
[False False]]
[[False False]
[ True False]]], shape=(3, 2, 2), dtype=bool)

Simplifying matrix product with one unknown variable

I have to compute a product of 3 matrices D=ABC with:
A is a (1x3) matrix,
B is a (3x3) matrix,
C is a (3x1) matrix (and is equal to A', if it matters)
The result of this product is a simple value, and the calculation is very straightforward in R.
My problem is that there is one unknown, namely X, inside A and C, and I would like to get the result as a formula: D = ABC = f(X).
Is there any way I could achieve this with R ?
Define D as shown below where argument B is the square matrix and A is a function of x returning a vector.
D <- function(B, A) function(x) t(A(x)) %*% B %*% A(x)
# test
A <- function(x) seq(3) * x
B <- matrix(1:9, 3)
Dfun <- D(B, A)
Dfun(10)
## [1] 22800

How to quickly multiply a list of matrices by a list of vectors?

I have a (13*122) x (14) matrix (122 stacked 13x14's), which I made into a list of 122 individual 13 x 14 matrices.
set.seed(1)
mat = matrix(rnorm(13*122*14,0,1),(13*122),14)
I have another matrix that is 122 x 14.
beta = matrix(rnorm(122*14,0,1),122,14)
I want to multiply each stacked matrix by the corresponding row in beta, so the first 13 x 14 matrix would get multiplied by beta[1,] (which is 14x1), so I'd get a 13x1 matrix, etc.
Should I do this with a list or is it unnecessary? I would like it to be as fast as possible.
I want to return a 13 x 122 matrix.
We could split the matrix into a 'list' of length '122' and use mapply to do the %*% of corresponding elements of 'lst' and rows of 'beta'
lst <- lapply(split(1:nrow(mat),(1:nrow(mat)-1) %/%13+1),
function(i) mat[i,])
res <- mapply(`%*%`, lst, split(beta, row(beta)))
dim(res)
#[1] 13 122
Or we could convert the matrix to array and then do the multiplication, which I guess would be fast
mat1 <- mat #if we need a copy of the original matrix
dim(mat1) <- c(13, 122, 14)
mat2 <- aperm(mat1, c(1,3,2))
res2 <- matrix(, ncol=122, nrow=13)
for(i in 1:(dim(mat2)[3])){
res2[,i] <- mat2[,,i] %*%beta[i,]
}
all.equal(res, res2, check.attributes=FALSE)
#[1] TRUE
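One further option, offered as a sketch rather than a benchmarked recommendation: each entry of the result is a dot product between a row of mat and the matching row of beta, so the whole computation can be written with one elementwise product and rowSums. The names beta_rep, res3, k and block are illustrative only.

```r
set.seed(1)
mat  <- matrix(rnorm(13 * 122 * 14), 13 * 122, 14)
beta <- matrix(rnorm(122 * 14), 122, 14)

# Repeat each row of beta 13 times so it lines up with its block of mat.
beta_rep <- beta[rep(seq_len(nrow(beta)), each = 13), ]

# Row-wise dot products, then fold the long vector into a 13 x 122 matrix
# (column k holds the result for block k).
res3 <- matrix(rowSums(mat * beta_rep), nrow = 13)

# Check one block against the explicit matrix-vector product.
k <- 7
block <- mat[(13 * (k - 1) + 1):(13 * k), ]
all.equal(res3[, k], drop(block %*% beta[k, ]))
```

Whether this is actually faster than the loop or mapply versions depends on the data size, so it is worth timing on the real input.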
Try this:
mat <- lapply(1:122, function(x) matrix(data = rnorm(13*14,0,1), nrow = 13, ncol = 14))
mat2 <- lapply(1:122, function(x) mat[[x]] %*% beta[x,])
Look up the book Introduction to Algorithms, page 331; it gives pseudocode for this (the matrix-chain multiplication problem). You build a tree of matrix products, ordered so that the total cost of the multiplications is minimal. In short, if you have three matrices, M1 of size m x n, M2 of size n x v and M3 of size v x w, and you want to know whether (M1 * M2) * M3 or M1 * (M2 * M3) is better, count the scalar multiplications for each order: mnv + mvw for the first and nvw + mnw for the second. The order with the smaller count is better.
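As a rough illustration of that comparison, the sketch below counts all scalar multiplications for both orders of a three-matrix product. The function name chain_cost and the example dimensions are made up for this illustration.

```r
# Scalar multiplications needed for each way of bracketing M1 %*% M2 %*% M3,
# where M1 is m x n, M2 is n x v and M3 is v x w.
chain_cost <- function(m, n, v, w) {
  c(left_first  = m * n * v + m * v * w,   # (M1 M2) M3
    right_first = n * v * w + m * n * w)   # M1 (M2 M3)
}

# Hypothetical dimensions where the two orders differ noticeably.
costs <- chain_cost(m = 2, n = 1, v = 100, w = 3)
costs
# left_first right_first
#        800         306
names(which.min(costs))
# [1] "right_first"
```

Note that in the question each product is a single matrix times a vector, so there is only one order to choose from; the comparison only matters for longer chains.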

Computing the correlation between the auto-correlation and cross-correlation for each pair of rows in a matrix

I have a matrix with dimensions m by n. For example:
m = 4
n = 10
mat = matrix(rnorm(m*n), nrow = m, ncol=n)
For a certain pair of rows i, j:
i=1
j=2
I compute the correlation between the auto-correlation of row i and the cross-correlation of rows i and j. So given:
lag=5
The auto-correlation of row i would be:
acf.i = acf(mat[i,],lag.max=lag)
the cross-correlation of rows i and j would be:
ccf.i.j = ccf(mat[i,],mat[j,],lag.max=lag)
and the correlation between acf.i and ccf.i.j would be something like:
cor.acf.i.ccf.i.j = cor(acf.i$acf,ccf.i.j$acf[(lag+1):(2*lag+1)])
(since ccf computes the correlation with lag range of: -lag:lag and acf only in the range of 0:lag I arbitrarily choose to take the range 0:lag for ccf.i.j)
What I want is to do that efficiently for each row i and every other row j in mat. I guess this function should return a matrix with dimensions m by m.
Make sure you set plot to FALSE for acf, ccf. Then, you can just wrap your code in a call to outer to provide every pair of i and j values. Note that since outer expects a vectorized FUN (e.g. *), we need to vectorize your function:
set.seed(1)
m <- 4
n <- 10
mat <- matrix(rnorm(m*n), nrow = m, ncol=n)
lag <- 5
outer(1:nrow(mat), 1:nrow(mat),
      Vectorize(
        function(i, j) {
          acf.i <- acf(mat[i, ], lag.max = lag, plot = FALSE)
          ccf.i.j <- ccf(mat[i, ], mat[j, ], lag.max = lag, plot = FALSE)
          cor(acf.i$acf, ccf.i.j$acf[(lag + 1):(2 * lag + 1)])
        }))
#            [,1]        [,2]         [,3]        [,4]
# [1,]  1.0000000  0.47035200 -0.006371955 -0.85880247
# [2,]  0.4133899  1.00000000 -0.462744858 -0.13327111
# [3,] -0.3573965  0.01882691  1.000000000  0.09358042
# [4,] -0.8570117 -0.58359258  0.249930947  1.00000000
This is relatively efficient. There may be a better algorithm than the one you use to get the same answer, but I'm not familiar enough with this stuff to provide it.
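The index range (lag+1):(2*lag+1) relies on ccf storing its values for lags -lag to lag in that order. A small check of that assumption, using made-up vectors x and y rather than rows of mat:

```r
set.seed(1)
x <- rnorm(10)
y <- rnorm(10)
lag <- 5

cc <- ccf(x, y, lag.max = lag, plot = FALSE)

# All lags returned by ccf, as a plain vector: -5, -4, ..., 4, 5.
all_lags <- drop(cc$lag)

# The subset used in the answer corresponds to lags 0, 1, ..., 5.
kept_lags <- all_lags[(lag + 1):(2 * lag + 1)]
kept_lags
```

If the negative lags were wanted instead, the matching index range would be 1:(lag + 1), which runs from -lag up to 0.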

Weighted Pearson's Correlation?

I have a 2396x34 double matrix named y wherein each row (2396) represents a separate situation consisting of 34 consecutive time segments.
I also have a numeric[34] named x that represents a single situation of 34 consecutive time segments.
Currently I am calculating the correlation between each row in y and x like this:
crs[,2] <- cor(t(y),x)
What I need now is to replace the cor function in the above statement with a weighted correlation. The weight vector xy.wt is 34 elements long so that a different weight can be assigned to each of the 34 consecutive time segments.
I found the Weighted Covariance Matrix function cov.wt and thought that if I first scale the data it should work just like the cor function. In fact you can specify for the function to return a correlation matrix as well. Unfortunately it does not seem like I can use it in the same manner because I cannot supply my two variables (x and y) separately.
Does anyone know of a way I can get a weighted correlation in the manner I described without sacrificing much speed?
Edit: Perhaps some mathematical function could be applied to y prior to the cor function in order to get the same results that I'm looking for. Maybe if I multiply each element by xy.wt/sum(xy.wt)?
Edit #2 I found another function corr in the boot package.
corr(d, w = rep(1, nrow(d))/nrow(d))
d: A matrix with two columns corresponding to the two variables whose correlation we wish to calculate.
w: A vector of weights to be applied to each pair of observations. The default is equal weights for each pair. Normalization takes place within the function so sum(w) need not equal 1.
This also is not what I need but it is closer.
Edit #3
Here is some code to generate the type of data I am working with:
x<-cumsum(rnorm(34))
y<- t(sapply(1:2396,function(u) cumsum(rnorm(34))))
xy.wt<-1/(34:1)
crs<-cor(t(y),x) #this works but I want to use xy.wt as weight
Unfortunately the accepted answer is wrong when y is a matrix of more than one row. The error is in the line
vy <- rowSums( w * y * y )
We want to multiply the columns of y by w, but this will multiply the rows by the elements of w, recycled as necessary. Thus
> f(x, y[1, , drop = FALSE], xy.wt)
[1] 0.103021
is correct, because in this case the multiplication is performed element-wise, which is equivalent to column-wise multiplication here, but
> f(x, y, xy.wt)[1]
[1] 0.05463575
gives a wrong answer due to the row-wise multiplication.
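The recycling behaviour is easy to see on a toy example. The sketch below uses a small made-up matrix and weight vector (not the data from the question) to show how w * y lines the weights up with the wrong elements, and how transposing first fixes it:

```r
y <- matrix(1, nrow = 2, ncol = 3)   # toy stand-in for the 2396 x 34 matrix
w <- c(10, 20, 30)                   # one weight per column

# Plain `*` recycles w down the columns (column-major order), so the
# weights do not line up with the columns of y.
bad <- w * y
bad
#      [,1] [,2] [,3]
# [1,]   10   30   20
# [2,]   20   10   30

# After transposing, each column has exactly length(w) elements, so the
# recycling matches one weight to each original column.
good <- t(t(y) * w)
good
#      [,1] [,2] [,3]
# [1,]   10   20   30
# [2,]   10   20   30
```

With a single row, as in the first call above, the two forms coincide, which is why the error only shows up when y has more than one row.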
We can correct the function as follows
f2 <- function(x, y, w = rep(1, length(x))) {
  stopifnot(length(x) == dim(y)[2])
  w <- w / sum(w)
  # Center x and y, using the weighted means
  x <- x - sum(x * w)
  ty <- t(y - colSums(t(y) * w))
  # Compute the variance
  vx <- sum(w * x * x)
  vy <- colSums(w * ty * ty)
  # Compute the covariance
  vxy <- colSums(ty * x * w)
  # Compute the correlation
  vxy / sqrt(vx * vy)
}
and check the results against those produced by corr from the boot package:
> res1 <- f2(x, y, xy.wt)
> res2 <- sapply(1:nrow(y),
+ function(i, x, y, w) corr(cbind(x, y[i,]), w = w),
+ x = x, y = y, w = xy.wt)
> all.equal(res1, res2)
[1] TRUE
which in itself gives another way that this problem could be solved.
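A similar row-by-row route works without the boot package: cov.wt from base R's stats can return a weighted correlation matrix when cor = TRUE, and binding x to one row of y with cbind gets around the problem of not being able to pass the two variables separately. The sketch below uses fewer rows than the question to keep it quick; the names wcor and manual are illustrative.

```r
set.seed(1)
x     <- cumsum(rnorm(34))
y     <- t(sapply(1:50, function(u) cumsum(rnorm(34))))  # 50 rows instead of 2396
xy.wt <- 1 / (34:1)

# Weighted correlation of x with each row of y via stats::cov.wt.
wcor <- apply(y, 1, function(yi) {
  cov.wt(cbind(x, yi), wt = xy.wt, cor = TRUE)$cor[1, 2]
})

# Cross-check the first row against the definition of weighted correlation.
w  <- xy.wt / sum(xy.wt)
xc <- x - sum(w * x)
yc <- y[1, ] - sum(w * y[1, ])
manual <- sum(w * xc * yc) / sqrt(sum(w * xc^2) * sum(w * yc^2))
all.equal(unname(wcor[1]), manual)
```

This calls cov.wt once per row, so it is likely to be slower than the vectorized function on the full 2396-row matrix; it is mainly useful as an independent check.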
You can go back to the definition of the correlation.
f <- function(x, y, w = rep(1, length(x))) {
  stopifnot(length(x) == dim(y)[2])
  w <- w / sum(w)
  # Center x and y, using the weighted means
  x <- x - sum(x * w)
  y <- y - apply(t(y) * w, 2, sum)
  # Compute the variance
  vx <- sum(w * x * x)
  vy <- rowSums(w * y * y) # Incorrect: see Heather's remark, in the other answer
  # Compute the covariance
  vxy <- colSums(t(y) * x * w)
  # Compute the correlation
  vxy / sqrt(vx * vy)
}
f(x,y)[1]
cor(x,y[1,]) # Identical
f(x, y, xy.wt)
Here is a generalization to compute the weighted Pearson correlation between two matrices (instead of a vector and a matrix, as in the original question):
matrix.corr <- function(a, b, w = rep(1, nrow(a)) / nrow(a)) {
  # normalize weights
  w <- w / sum(w)
  # center matrices
  a <- sweep(a, 2, colSums(a * w))
  b <- sweep(b, 2, colSums(b * w))
  # compute weighted correlation
  t(w * a) %*% b / sqrt(colSums(w * a^2) %*% t(colSums(w * b^2)))
}
Using the above example and the correlation function from Heather, we can verify it:
> sum(matrix.corr(as.matrix(x, nrow=34),t(y),xy.wt) - f2(x,y,xy.wt))
[1] 1.537507e-15
In terms of calling syntax, this resembles the unweighted cor:
> a <- matrix( c(1,2,3,1,3,2), nrow=3)
> b <- matrix( c(2,3,1,1,7,3,5,2,8,1,10,12), nrow=3)
> matrix.corr(a,b)
     [,1]      [,2] [,3]      [,4]
[1,] -0.5 0.3273268  0.5 0.9386522
[2,]  0.5 0.9819805 -0.5 0.7679882
> cor(a, b)
     [,1]      [,2] [,3]      [,4]
[1,] -0.5 0.3273268  0.5 0.9386522
[2,]  0.5 0.9819805 -0.5 0.7679882
