I want to fill a 2x2 matrix for every row (N = 500) of my data.
N = 500 # Number of observations
S = 2 # Number of rows and columns of each output matrix
Let's assume this is my example data. It contains 500 observations of 5 covariates.
X <- data.frame(matrix(rexp(2500, rate=.1), ncol=5))
X
From my model, I retrieved 2 coefficients for each covariate.
beta <- data.frame(matrix(rexp(10, rate=.1), ncol=5))
beta
Because I want to fill a 2x2 matrix for each row of my data, I create an output array of size 2 x 2 x N.
output_array = array(NA, dim = c(S,S,N))
Now I want to fill this array in the following way:
If the position in the 2x2 matrix is [1,1] or [2,2], I want it to be 1.
If the position in the matrix is [1,2], I want it to be the product of the coefficients in the first row of beta and the first row of X
If the position in the matrix is [2,1], I want it to be the product of the coefficients in the second row of beta and the first row of X
I want to repeat this procedure for all 500 rows of data, moving through X row by row, which results in 500 2x2 matrices (one for each row of data).
My idea was the following loop, but there seems to be a mismatch in dimensions and I'm doing something wrong.
for(t in 1:N){
betarow = 1
for (k in 1:S){
for (j in 1:S){
if(k == j){
output_array[t,k,j] = 1;
} else {
output_array = X1[t,]*beta[betarow]
betarow = betarow + 1;
}
}
}
}
In R the product of a 5-element vector and a 5-element vector is another 5-element vector, with the values multiplied element-wise. You are trying to put five numbers into a single "cell". Presumably you meant to get the sum of X[i,] * beta[1,] as a scalar and put that into each cell.
Also, in the line output_array = X1[t,]*beta[betarow] you are over-writing the whole of output_array rather than just a single element of it.
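For reference, here is a minimal sketch of the loop with both problems fixed, assuming the intended cell value is the sum of the element-wise products. The index order [k, j, t] is chosen to match dim = c(S, S, N). The seed and the helper names Xm and bm are introduced here only for illustration.

```r
set.seed(42)  # arbitrary seed, only so the example is reproducible
N <- 500
S <- 2
X    <- data.frame(matrix(rexp(2500, rate = .1), ncol = 5))
beta <- data.frame(matrix(rexp(10,   rate = .1), ncol = 5))

# Plain numeric matrices make row extraction cheap inside the loop
Xm <- as.matrix(X)
bm <- as.matrix(beta)

output_array <- array(NA_real_, dim = c(S, S, N))

for (t in seq_len(N)) {
  betarow <- 1
  for (k in seq_len(S)) {
    for (j in seq_len(S)) {
      if (k == j) {
        output_array[k, j, t] <- 1
      } else {
        # One scalar per cell: sum of the element-wise products
        output_array[k, j, t] <- sum(Xm[t, ] * bm[betarow, ])
        betarow <- betarow + 1
      }
    }
  }
}
```

With this iteration order, [1,2] uses the first row of beta and [2,1] uses the second, as described in the question.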
Remember to take advantage of vectorization in R where possible. We can just create the matrices individually in an lapply, and create our whole array that way:
X <- data.frame(matrix(rexp(2500, rate=.1), ncol = 5))
beta <- data.frame(matrix(rexp(10, rate=.1), ncol = 5))
output_array <- `dim<-`(unlist(lapply(seq(nrow(X)), function(i) {
matrix(c(1, sum(X[i,] * beta[1,]), sum(X[i,] * beta[2,]), 1), nrow = 2)
})), c(2, 2, nrow(X)))
So the first three "slices" of output_array look like this:
output_array[,,1:3]
#> , , 1
#>
#> [,1] [,2]
#> [1,] 1.0000 184.826
#> [2,] 677.8113 1.000
#>
#> , , 2
#>
#> [,1] [,2]
#> [1,] 1.0000 263.7545
#> [2,] 335.3813 1.0000
#>
#> , , 3
#>
#> [,1] [,2]
#> [1,] 1.0000 156.0655
#> [2,] 235.1856 1.0000
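If the per-row function call ever becomes a bottleneck, the sums can also be computed in one matrix product. This is only a sketch under the same assumptions (cell value = sum of element-wise products); the name xb is introduced here. Note that matrix() fills column by column, so in the lapply version the first sum lands in [2,1]. The version below assigns the cells by index instead, following the layout described in the question.

```r
set.seed(42)  # arbitrary seed, only so the example is reproducible
X    <- data.frame(matrix(rexp(2500, rate = .1), ncol = 5))
beta <- data.frame(matrix(rexp(10,   rate = .1), ncol = 5))
N <- nrow(X)

# N x 2 matrix: column r holds sum(X[i, ] * beta[r, ]) for every row i
xb <- as.matrix(X) %*% t(as.matrix(beta))

output_array <- array(1, dim = c(2, 2, N))  # diagonal cells stay at 1
output_array[1, 2, ] <- xb[, 1]             # first row of beta
output_array[2, 1, ] <- xb[, 2]             # second row of beta

output_array[, , 1:3]
```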
I have a 4x100 matrix where I would like to multiply column 1 with row 1 of its transpose, and so on, and store these matrices somewhere so that I can take the sum of these new matrices later on.
I really don't know where to start, because each column-row multiplication gives me a 4x4 matrix, so I cannot store the results in a single matrix.
data:
mm num[1:4,1:100]
mm_t num[1:100,1:4]
I'm thinking of creating a list in some way
list1=list()
for(i in 1:100){
list1[i] <- mm[,i]%*%mm_t[i,]
}
but I need some more indices, I think, because this just leaves me with a single number in each element.
First, your description of the data is not clear. Second, are you trying to multiply each value by itself, or to do matrix multiplication?
We create a 4x100 matrix and its transpose:
mm <- matrix(1:400, nrow = 4, ncol = 100)
mm.t <- t(mm)
Then we can do the matrix multiplication (which is what you did, and you get a 4 x 4 matrix from the definition of matrix multiplication https://www.wikiwand.com/en/Matrix_multiplication)
If we want to multiply each element by itself (so mm[1,1] by mm[1,1]), then:
mm * mm
This will result in 4x100 matrix where each value is the square of the original value.
If we want the matrix multiplication of each column with itself, then:
sapply(1:100, function(x) {
mm[, x] %*% mm[, x]
})
This results in 100 values: each one is the dot product of a length-4 column with itself, i.e. a single number.
Let's start with some sample data. Please get in the habit of including things like this in your question:
nr = 4
nc = 100
set.seed(47)
mm = matrix(runif(nr * nc), nrow = nr)
Here's a working answer, very similar to your attempt:
result = list()
for (i in 1:ncol(mm)) result[[i]] = mm[, i] %*% t(mm[, i])
result[1:2]
# [[1]]
# [,1] [,2] [,3] [,4]
# [1,] 0.9544547 0.3653018 0.7439585 0.8035430
# [2,] 0.3653018 0.1398132 0.2847378 0.3075428
# [3,] 0.7439585 0.2847378 0.5798853 0.6263290
# [4,] 0.8035430 0.3075428 0.6263290 0.6764924
#
# [[2]]
# [,1] [,2] [,3] [,4]
# [1,] 0.3289532 0.3965557 0.2231443 0.2689613
# [2,] 0.3965557 0.4780511 0.2690022 0.3242351
# [3,] 0.2231443 0.2690022 0.1513691 0.1824490
# [4,] 0.2689613 0.3242351 0.1824490 0.2199103
As to why yours didn't work, we can experiment and see that indeed we get a number rather than a matrix. The reason is that when you subset a single row or column of a matrix, the dimensions are "dropped" and it is coerced to a plain vector. And when you matrix multiply two vectors, you get their dot product.
mmt = t(mm)
mm[, 1] %*% mmt[1, ]
# [,1]
# [1,] 2.350646
dim(mm[, 1])
# NULL
dim(mmt[1, ])
# NULL
We can avoid this by specifying drop = FALSE in the subset code
dim(mmt[1, , drop = FALSE])
# [1] 1 4
So a slight modification of your attempt, just adding drop = FALSE, makes it work:
res2 = list()
for (i in 1:ncol(mm)) res2[[i]] = mm[, i] %*% mmt[i, , drop = FALSE]
identical(result, res2)
# [1] TRUE
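Since the question mentions summing the matrices later on, here is a small sketch of that step. It uses outer() for the column-by-column products and checks the total against tcrossprod(mm), which computes mm %*% t(mm) directly. If only the sum is needed, the list can be skipped entirely. The name total is introduced here.

```r
set.seed(47)
mm <- matrix(runif(4 * 100), nrow = 4)

# One 4x4 outer product per column, collected in a list
result <- lapply(seq_len(ncol(mm)), function(i) outer(mm[, i], mm[, i]))

# Element-wise sum of all 100 matrices
total <- Reduce(`+`, result)

# The sum of the column outer products is the same as mm %*% t(mm)
all.equal(total, tcrossprod(mm))
```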
How to efficiently retrieve top K-similar vectors by cosine similarity using R? asks how to calculate top similar vectors for each vector of one matrix, relative to another matrix. It's satisfactorily answered, and I'd like to tweak it to operate on a single matrix.
That is, I'd like the top k similar other rows for each row in a matrix. I suspect the solution is very similar, but can be optimized.
This function is based on the linked answer:
CosineSimilarities <- function(m, top.k) {
# Computes cosine similarity between each row and all other rows in a matrix.
#
# Args:
# m: Matrix of values.
# top.k: Number of top rows to show for each row.
#
# Returns:
# Data frame with columns for pair of rows, and cosine similarity, for top
# `top.k` rows per row.
#
# Similarity computation
cp <- tcrossprod(m)
mm <- rowSums(m ^ 2)
result <- cp / sqrt(outer(mm, mm))
# Top similar rows from train (per row)
# Use `top.k + 1` to remove the self-reference (similarity = 1)
top <- apply(result, 2, order, decreasing=TRUE)[seq(top.k + 1), ]
result.df <- data.frame(row.id1=c(col(top)), row.id2=c(top))
result.df$cosine.similarity <- result[as.matrix(result.df[, 2:1])]
# Remove same-row records and return
return(result.df[result.df$row.id1 != result.df$row.id2, ])
}
For example:
(m <- matrix(1:9, nrow=3))
# [,1] [,2] [,3]
# [1,] 1 4 7
# [2,] 2 5 8
# [3,] 3 6 9
CosineSimilarities(m, 1)
# row.id1 row.id2 cosine.similarity
# 2 1 2 0.9956
# 4 2 3 0.9977
# 6 3 2 0.9977
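One possible simplification for the single-matrix case, offered as a sketch rather than a tuned solution: because every row is compared against the same matrix, the self-match can be masked on the diagonal up front instead of requesting top.k + 1 rows and filtering afterwards. The names norms, sim and top are introduced here for illustration.

```r
m <- matrix(1:9, nrow = 3)
top.k <- 1

# Cosine similarity between all pairs of rows
norms <- sqrt(rowSums(m^2))
sim <- tcrossprod(m) / outer(norms, norms)

# Mask the self-similarity so a row never selects itself
diag(sim) <- -Inf

# Column i of `top` holds the indices of the top.k rows most similar to row i
top <- apply(sim, 1, order, decreasing = TRUE)[seq_len(top.k), , drop = FALSE]
top
#      [,1] [,2] [,3]
# [1,]    2    3    2
```

This agrees with the pairs in the example output above (1 with 2, 2 with 3, 3 with 2).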
I am learning R and reading the book Guide to programming algorithms in r.
The book gives an example function:
# MATRIX-VECTOR MULTIPLICATION
matvecmult = function(A,x){
m = nrow(A)
n = ncol(A)
y = matrix(0,nrow=m)
for (i in 1:m){
sumvalue = 0
for (j in 1:n){
sumvalue = sumvalue + A[i,j]*x[j]
}
y[i] = sumvalue
}
return(y)
}
How do I call this function in the R console? And what exactly is being passed into this function as A and x?
The function takes an argument A, which should be a matrix, and x, which should be a numeric vector with as many elements as A has columns.
If
A <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
then you have 3 values (number of columns, ncol) per row, thus x needs to be something like
x <- c(4,5,6)
The function itself iterates over all rows, and in each row, each value is multiplied with a value from x: the value in the first column is multiplied with the first value in x, the value in A's second column is multiplied with the second value in x, and so on. This is repeated for each row, and the sum for each row is returned by the function.
matvecmult(A, x)
[,1]
[1,] 49 # 1*4 + 3*5 + 5*6
[2,] 64 # 2*4 + 4*5 + 6*6
To run this function, you first have to define it in your session (run or source its definition) and then run these three lines of code in order:
A <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
x <- c(4,5,6)
matvecmult(A, x)
This function is designed to return the product of a matrix A with a vector x; i.e. the result will be the matrix product A x (where the vector is treated as a column vector). An example should make things clear.
# define a matrix
mymatrix <- matrix(sample(12), nrow = 4)
# see what the matrix looks like
mymatrix
# [,1] [,2] [,3]
# [1,] 2 10 9
# [2,] 3 1 12
# [3,] 11 7 5
# [4,] 8 4 6
# define a vector where multiplication of our matrix times the vector will be defined
vec3 <- c(-1,0,1)
# apply the function to our matrix and vector
result <- matvecmult(mymatrix, vec3)
result
# [,1]
# [1,] 7
# [2,] 9
# [3,] -6
# [4,] -2
class(result)
# [1] "matrix" "array"   (R >= 4.0.0; older versions print just "matrix")
So matvecmult(mymatrix, vec3) is how you would call this function, and the result is an n by 1 matrix, where n is the number of rows in the matrix argument.
You can also get some insight by playing around and seeing what happens when you pass something other than a matrix-vector pair where the product is defined. In some cases, you will get an error; sometimes you get nonsense; and sometimes you get something you might not expect just from the function name. See what happens when you call matvecmult(mymatrix, mymatrix).
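As a hint for that last experiment, here is a sketch using a fixed matrix (the one printed above, entered by hand instead of drawn with sample()). Because the loop only ever reads x[1] to x[n], a matrix passed as x is indexed like a plain vector in column-major order, so only the first n entries of its first column are used and no error is raised.

```r
# Function definition as given in the question
matvecmult <- function(A, x) {
  m <- nrow(A)
  n <- ncol(A)
  y <- matrix(0, nrow = m)
  for (i in 1:m) {
    sumvalue <- 0
    for (j in 1:n) {
      sumvalue <- sumvalue + A[i, j] * x[j]
    }
    y[i] <- sumvalue
  }
  return(y)
}

# The example matrix printed above, entered column by column
mymatrix <- matrix(c(2, 3, 11, 8, 10, 1, 7, 4, 9, 12, 5, 6), nrow = 4)

res <- matvecmult(mymatrix, mymatrix)

# Same as multiplying by the first three entries of the first column
all.equal(res, mymatrix %*% mymatrix[1:3])
```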
The function calculates the product of a matrix and a column vector. It assumes that the number of columns of the matrix is equal to the number of elements in the vector.
It stores the number of columns of A in n and the number of rows in m.
It then initializes a matrix of m rows with all values set to 0.
It iterates along the rows of A and multiplies each value in each row with the corresponding value in x.
The sum for each row is then stored in y, and finally the function returns the single-column matrix y.
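In practice the built-in operator %*% does the same job. A quick check, assuming the function definition from the question and the example values A and x used earlier:

```r
# Function definition as given in the question
matvecmult <- function(A, x) {
  m <- nrow(A)
  n <- ncol(A)
  y <- matrix(0, nrow = m)
  for (i in 1:m) {
    sumvalue <- 0
    for (j in 1:n) {
      sumvalue <- sumvalue + A[i, j] * x[j]
    }
    y[i] <- sumvalue
  }
  return(y)
}

A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
x <- c(4, 5, 6)

by_loop  <- matvecmult(A, x)  # explicit double loop
built_in <- A %*% x           # R's matrix product; x is treated as a column

by_loop
#      [,1]
# [1,]   49
# [2,]   64
all.equal(by_loop, built_in)
# [1] TRUE
```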
I have a matrix with dimensions m by n. For example:
m = 4
n = 10
mat = matrix(rnorm(m*n), nrow = m, ncol=n)
For a certain pair of rows i, j:
i=1
j=2
I compute the correlation between the auto-correlation of row i and the cross-correlation of rows i and j. So given:
lag=5
The auto-correlation of row i would be:
acf.i = acf(mat[i,],lag.max=lag)
the cross-correlation of rows i and j would be:
ccf.i.j = ccf(mat[i,],mat[j,],lag.max=lag)
and the correlation between acf.i and ccf.i.j would be something like:
cor.acf.i.ccf.i.j = cor(acf.i$acf,ccf.i.j$acf[(lag+1):(2*lag+1)])
(since ccf computes the correlation with lag range of: -lag:lag and acf only in the range of 0:lag I arbitrarily choose to take the range 0:lag for ccf.i.j)
What I want is to do that efficiently for each row i and each other row j in mat, over all rows of mat. I guess this function should return a matrix with dimensions m by m.
Make sure you set plot to FALSE for acf, ccf. Then, you can just wrap your code in a call to outer to provide every pair of i and j values. Note that since outer expects a vectorized FUN (e.g. *), we need to vectorize your function:
set.seed(1)
m <- 4
n <- 10
mat <- matrix(rnorm(m*n), nrow = m, ncol=n)
lag <- 5
outer(1:nrow(mat), 1:nrow(mat),
Vectorize(
function(i, j) {
acf.i <- acf(mat[i,], lag.max = lag, plot = FALSE)
ccf.i.j <- ccf(mat[i,], mat[j,], lag.max = lag, plot = FALSE)
cor(acf.i$acf,ccf.i.j$acf[(lag+1):(2*lag+1)])
} ) )
# [,1] [,2] [,3] [,4]
# [1,] 1.0000000 0.47035200 -0.006371955 -0.85880247
# [2,] 0.4133899 1.00000000 -0.462744858 -0.13327111
# [3,] -0.3573965 0.01882691 1.000000000 0.09358042
# [4,] -0.8570117 -0.58359258 0.249930947 1.00000000
This is relatively efficient. There may be a better algorithm than the one you use to get the same answer, but I'm not familiar enough with this stuff to provide it.
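One small saving that does not change the algorithm: the auto-correlation of row i does not depend on j, so it can be computed once per row instead of once per pair. A sketch of that idea, with acfs, pair_cor and res as names introduced here:

```r
set.seed(1)
m <- 4
n <- 10
lag <- 5
mat <- matrix(rnorm(m * n), nrow = m, ncol = n)

# Auto-correlations depend only on i, so compute each one a single time
acfs <- lapply(seq_len(m), function(i) {
  as.vector(acf(mat[i, ], lag.max = lag, plot = FALSE)$acf)
})

# Correlation between the stored acf of row i and lags 0..lag of ccf(i, j)
pair_cor <- function(i, j) {
  cc <- ccf(mat[i, ], mat[j, ], lag.max = lag, plot = FALSE)$acf
  cor(acfs[[i]], cc[(lag + 1):(2 * lag + 1)])
}

res <- outer(seq_len(m), seq_len(m), Vectorize(pair_cor))
res
```

The cross-correlations still have to be computed for every pair, so the gain is modest for small m.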
I have a large mxn matrix, and I have identified the linearly dependent columns. However, I want to know if there's a way in R to write the linearly dependent columns in terms of the linearly independent ones. Since it's a large matrix, it's not possible to do based on inspection.
Here's a toy example of the type of matrix I have.
> mat <- matrix(c(1,1,0,1,0,1,1,0,0,1,1,0,1,1,0,1,0,1,0,1), byrow=TRUE, ncol=5, nrow=4)
> mat
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 0 1 0
[2,] 1 1 0 0 1
[3,] 1 0 1 1 0
[4,] 1 0 1 0 1
Here it's obvious that x3 = x1-x2, x5=x1-x4. I want to know if there's an automated way to get that for a larger matrix.
Thanks!
I'm sure there is a better way, but I felt like playing around with this. I first check whether the input matrix is of full column rank, to avoid unnecessary computation if it is. After that I start with the first two columns and check whether that submatrix is of full column rank; if it is, I check the first three columns, and so on. Once we find a submatrix that isn't of full column rank, I regress its last column on the previous ones, which tells us how to construct a linear combination of the earlier columns that gives the last column.
My function isn't very clean right now and could do some additional checking but at least it's a start.
mat <- matrix(c(1,1,0,1,0,1,1,0,0,1,1,0,1,1,0,1,0,1,0,1), byrow=TRUE, ncol=5, nrow=4)
linfinder <- function(mat){
# If the matrix is full rank then we're done
if(qr(mat)$rank == ncol(mat)){
print("Matrix is of full rank")
return(invisible(seq(ncol(mat))))
}
m <- ncol(mat)
# cols keeps track of which columns are linearly independent
cols <- 1
for(i in seq(2, m)){
ids <- c(cols, i)
mymat <- mat[, ids]
if(qr(mymat)$rank != length(ids)){
# Regress the column of interest on the previous
# columns to figure out the relationship
o <- lm(mat[,i] ~ mat[,cols] + 0)
# Construct the output message
start <- paste0("Column_", i, " = ")
# Which coefs are nonzero
nz <- !(abs(coef(o)) <= .Machine$double.eps^0.5)
tmp <- paste("Column", cols[nz], sep = "_")
vals <- paste(coef(o)[nz], tmp, sep = "*", collapse = " + ")
message <- paste0(start, vals)
print(message)
}else{
# If the matrix subset was of full rank
# then the newest column is linearly independent
# so add it to the cols list
cols <- ids
}
}
return(invisible(cols))
}
linfinder(mat)
which gives
> linfinder(mat)
[1] "Column_3 = 1*Column_1 + -1*Column_2"
[1] "Column_5 = 1*Column_1 + -1*Column_4"
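A shorter variant of the same idea, as a sketch only: it assumes the matrix is rank deficient and relies on the pivot component of qr() listing a set of linearly independent columns first, which is how the default (LINPACK) method is commonly used for this purpose. The names q, indep, dep and coefs are introduced here.

```r
mat <- matrix(c(1,1,0,1,0,1,1,0,0,1,1,0,1,1,0,1,0,1,0,1),
              byrow = TRUE, ncol = 5, nrow = 4)

q <- qr(mat)

# Columns the decomposition kept as independent, and the remaining ones
indep <- sort(q$pivot[seq_len(q$rank)])
dep   <- sort(q$pivot[-seq_len(q$rank)])

# Least-squares coefficients expressing each dependent column
# in terms of the independent columns
coefs <- qr.coef(qr(mat[, indep, drop = FALSE]), mat[, dep, drop = FALSE])
dimnames(coefs) <- list(paste0("Column_", indep), paste0("Column_", dep))

round(coefs, 10)
```

Each column of coefs holds the weights for one dependent column. For the toy matrix this should reproduce the relationships x3 = x1 - x2 and x5 = x1 - x4.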