I am learning R and reading the book Guide to programming algorithms in R.
The book gives an example function:
# MATRIX-VECTOR MULTIPLICATION
matvecmult = function(A,x){
  m = nrow(A)
  n = ncol(A)
  y = matrix(0,nrow=m)
  for (i in 1:m){
    sumvalue = 0
    for (j in 1:n){
      sumvalue = sumvalue + A[i,j]*x[j]
    }
    y[i] = sumvalue
  }
  return(y)
}
How do I call this function in the R console? And what exactly is being passed into this function as A and x?
The function takes an argument A, which should be a matrix, and x, which should be a numeric vector with as many elements as A has columns (that is, values per row).
If
A <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
then you have 3 values (number of columns, ncol) per row, thus x needs to be something like
x <- c(4,5,6)
The function itself iterates over all rows, and in each row, each value is multiplied with a value from x: the value in the first column is multiplied with the first value in x, the value in A's second column is multiplied with the second value in x, and so on. This is repeated for each row, and the sum for each row is returned by the function.
matvecmult(A, x)
[,1]
[1,] 49 # 1*4 + 3*5 + 5*6
[2,] 64 # 2*4 + 4*5 + 6*6
To run this function, you first have to define it in your session (paste it into the console or source the file that contains it) and then run these three lines in order:
A <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
x <- c(4,5,6)
matvecmult(A, x)
This function is designed to return the product of a matrix A with a vector x; i.e., the result will be the matrix product Ax (where, as is usual in R, the vector is treated as a column vector). An example should make things clear.
# define a matrix
mymatrix <- matrix(sample(12), nrow = 4)
# see what the matrix looks like
mymatrix
# [,1] [,2] [,3]
# [1,] 2 10 9
# [2,] 3 1 12
# [3,] 11 7 5
# [4,] 8 4 6
# define a vector where multiplication of our matrix times the vector will be defined
vec3 <- c(-1,0,1)
# apply the function to our matrix and vector
result <- matvecmult(mymatrix, vec3)
result
# [,1]
# [1,] 7
# [2,] 9
# [3,] -6
# [4,] -2
class(result)
# [1] "matrix" "array"   (R >= 4.0; older versions print just "matrix")
So matvecmult(mymatrix, vec3) is how you would call this function, and the result is an m by 1 matrix, where m is the number of rows in the matrix argument.
You can also get some insight by playing around and seeing what happens when you pass something other than a matrix-vector pair where the product is defined. In some cases, you will get an error; sometimes you get nonsense; and sometimes you get something you might not expect just from the function name. See what happens when you call matvecmult(mymatrix, mymatrix).
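The call matvecmult(mymatrix, mymatrix) does not fail because, inside the loop, x[j] indexes the second argument as if it were a plain vector, so only its first ncol(A) elements are ever used. If you would rather get an error in such cases, a minimal sketch of a guarded variant could look like this. The name matvecmult_checked is hypothetical and not from the book, and the arithmetic is delegated to the built-in %*% operator:

```r
# Hypothetical guarded variant (not from the book): refuse inputs for which
# the matrix-vector product is not defined.
matvecmult_checked <- function(A, x) {
  stopifnot(is.matrix(A), is.numeric(A), is.numeric(x))
  # x must be a plain vector (no dim attribute) with one element per column of A
  if (!is.null(dim(x)) || length(x) != ncol(A)) {
    stop("x must be a plain vector with length equal to ncol(A)")
  }
  # Delegate the arithmetic to the built-in matrix product
  A %*% x
}

A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
ok <- matvecmult_checked(A, c(4, 5, 6))            # 2 x 1 matrix
bad <- try(matvecmult_checked(A, A), silent = TRUE) # error, not a silent result
```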
The function calculates the product of a matrix and a column vector. It assumes that the number of columns of the matrix is equal to the number of elements in the vector.
It stores the number of rows of A in m and the number of columns in n.
It then initializes a single-column matrix y with m rows, with all values set to 0.
It iterates along the rows of A and multiplies each value in a row with the corresponding value in x, adding up the products.
Each row's sum is then stored in y, and finally the function returns the single-column matrix y.
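As a quick check of the description above, the loop can be compared with R's built-in matrix product %*%. This is only a sketch: the book's function is reproduced (with <- instead of =) so that the snippet runs on its own.

```r
# The book's function, reproduced so the snippet is self-contained
matvecmult <- function(A, x) {
  m <- nrow(A)
  n <- ncol(A)
  y <- matrix(0, nrow = m)
  for (i in 1:m) {
    sumvalue <- 0
    for (j in 1:n) {
      sumvalue <- sumvalue + A[i, j] * x[j]
    }
    y[i] <- sumvalue
  }
  y
}

A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
x <- c(4, 5, 6)

by_loop  <- matvecmult(A, x)  # explicit double loop
built_in <- A %*% x           # built-in matrix product
same <- isTRUE(all.equal(by_loop, built_in))
```

In everyday code you would normally just write A %*% x; the loop version is mainly useful for seeing what the operator does.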
Related
I want to fill a 2x2 matrix for every row (N = 500) of my data.
N = 500 # Number of observations
S = 2 # Number of rows and columns of each output matrix
Let's assume this is my example data. It contains 500 observations of 5 covariates.
X <- data.frame(matrix(rexp(2500, rate=.1), ncol=5))
X
From my model, I retrieved 2 coefficients for each covariate.
beta <- data.frame(matrix(rexp(10, rate=.1), ncol=5))
beta
Because I want to fill a 2x2 matrix for each row of my data, I create an output array of size 2 x 2 x N.
output_array = array(NA, dim = c(S,S,N))
Now I want to fill this array in the following way:
If the position in the 2x2 matrix is [1,1] or [2,2], I want it to be 1.
If the position in the matrix is [1,2], I want it to be the product of the coefficients in the first row of beta and the first row of X
If the position in the matrix is [2,1], I want it to be the product of the coefficients in the second row of beta and the first row of X
I want to follow this procedure for all 500 rows of data (...so it goes through the rows), resulting in 500 2x2 matrices (one for each row of data).
My idea was the following function, but it seems that there is a mismatch in dimensions and I'm doing something wrong.
for(t in 1:N){
  betarow = 1
  for (k in 1:S){
    for (j in 1:S){
      if(k == j){
        output_array[t,k,j] = 1;
      } else {
        output_array = X1[t,]*beta[betarow]
        betarow = betarow + 1;
      }
    }
  }
}
In R the product of a 5-element vector and a 5-element vector is another 5-element vector, with the values multiplied element-wise. You are trying to put five numbers into a single "cell". Presumably you meant to get the sum of X[i,] * beta[1,] as a scalar and put that into each cell.
Also, in the line output_array = X1[t,]*beta[betarow] you are over-writing the whole of output_array rather than just a single element of it.
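Putting those two points together, a corrected version of the loop might look like the sketch below. It assumes the layout described in the question (position [1,2] uses the first row of beta, position [2,1] the second), indexes the array as [row, column, observation] to match dim = c(S, S, N), and uses an arbitrary seed purely for reproducibility.

```r
set.seed(42)  # arbitrary seed, only so that the run is reproducible
N <- 500
S <- 2
X <- data.frame(matrix(rexp(N * 5, rate = 0.1), ncol = 5))
beta <- data.frame(matrix(rexp(S * 5, rate = 0.1), ncol = 5))

output_array <- array(NA_real_, dim = c(S, S, N))
for (t in seq_len(N)) {
  for (k in seq_len(S)) {
    for (j in seq_len(S)) {
      if (k == j) {
        output_array[k, j, t] <- 1
      } else {
        # Off-diagonal cell in row k: scalar sum of the element-wise
        # product of observation t with row k of beta
        output_array[k, j, t] <- sum(X[t, ] * beta[k, ])
      }
    }
  }
}
output_array[, , 1]
```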
Remember to take advantage of vectorization in R where possible. We can just create the matrices individually in an lapply, and create our whole array that way:
X <- data.frame(matrix(rexp(2500, rate=.1), ncol = 5))
beta <- data.frame(matrix(rexp(10, rate=.1), ncol = 5))
output_array <- `dim<-`(unlist(lapply(seq(nrow(X)), function(i) {
matrix(c(1, sum(X[i,] * beta[1,]), sum(X[i,] * beta[2,]), 1), nrow = 2)
})), c(2, 2, nrow(X)))
So the first three "slices" of output_array look like this:
output_array[,,1:3]
#> , , 1
#>
#> [,1] [,2]
#> [1,] 1.0000 184.826
#> [2,] 677.8113 1.000
#>
#> , , 2
#>
#> [,1] [,2]
#> [1,] 1.0000 263.7545
#> [2,] 335.3813 1.0000
#>
#> , , 3
#>
#> [,1] [,2]
#> [1,] 1.0000 156.0655
#> [2,] 235.1856 1.0000
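The per-row sums can also be computed in one step with a matrix product, which avoids the explicit iteration over rows. The following is a sketch under the assumption that the layout from the question is wanted. Note that matrix() fills column by column, so in the lapply version the beta[1,] sum ends up at position [2,1]; here it is placed at [1,2], as the question describes. The seed and the name prods are illustrative.

```r
set.seed(42)  # arbitrary seed for reproducibility
X <- data.frame(matrix(rexp(2500, rate = 0.1), ncol = 5))
beta <- data.frame(matrix(rexp(10, rate = 0.1), ncol = 5))

# N x 2 matrix: column r holds sum(X[i, ] * beta[r, ]) for every row i
prods <- as.matrix(X) %*% t(as.matrix(beta))

# Start with all ones, then overwrite the off-diagonal cells of every slice
output_array <- array(1, dim = c(2, 2, nrow(X)))
output_array[1, 2, ] <- prods[, 1]  # first row of beta
output_array[2, 1, ] <- prods[, 2]  # second row of beta
output_array[, , 1:2]
```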
I have a 4x100 matrix where I would like to multiply column 1 with row 1 of its transpose, and so on, and store the resulting matrices somewhere so that I can take their sum later on.
I really don't know where to start, because each column-row multiplication gives a 4x4 matrix, so I cannot store the results in a single matrix.
data:
mm num[1:4,1:100]
mm_t num[1:100,1:4]
I'm thinking of creating a list in some way
list1=list()
for(i in 1:100){
list1[i] <- mm[,i]%*%mm_t[i,]
}
but I think I need some more indices, because this just leaves me with a single number in each element.
First, your description of the data is not clear. Second, are you trying to multiply each value by itself, or to do matrix multiplication?
We create a 4x100 matrix and its transpose:
mm <- matrix(1:400, nrow = 4, ncol = 100)
mm.t <- t(mm)
Then we can do the matrix multiplication (which is what you did; you get a 4 x 4 matrix from the definition of matrix multiplication, https://www.wikiwand.com/en/Matrix_multiplication).
If we want to multiply each element by itself (so mm[1,1] by mm[1,1]) then:
mm * mm
This will result in 4x100 matrix where each value is the square of the original value.
If we want the matrix multiplication of each column with itself, then:
sapply(1:100, function(x) {
mm[, x] %*% mm[, x]
})
This results in 100 values: each one is the dot product of a column (a length-4 vector) with itself, because %*% applied to two plain vectors of equal length returns their inner product.
Let's start with some sample data. Please get in the habit of including things like this in your question:
nr = 4
nc = 100
set.seed(47)
mm = matrix(runif(nr * nc), nrow = nr)
Here's a working answer, very similar to your attempt:
result = list()
for (i in 1:ncol(mm)) result[[i]] = mm[, i] %*% t(mm[, i])
result[1:2]
# [[1]]
# [,1] [,2] [,3] [,4]
# [1,] 0.9544547 0.3653018 0.7439585 0.8035430
# [2,] 0.3653018 0.1398132 0.2847378 0.3075428
# [3,] 0.7439585 0.2847378 0.5798853 0.6263290
# [4,] 0.8035430 0.3075428 0.6263290 0.6764924
#
# [[2]]
# [,1] [,2] [,3] [,4]
# [1,] 0.3289532 0.3965557 0.2231443 0.2689613
# [2,] 0.3965557 0.4780511 0.2690022 0.3242351
# [3,] 0.2231443 0.2690022 0.1513691 0.1824490
# [4,] 0.2689613 0.3242351 0.1824490 0.2199103
As to why yours didn't work, we can experiment and see that indeed we get a number rather than a matrix. The reason is that when you subset a single row or column of a matrix, the dimensions are "dropped" and it is coerced to a plain vector. And when you matrix multiply two vectors, you get their dot product.
mmt = t(mm)
mm[, 1] %*% mmt[1, ]
# [,1]
# [1,] 2.350646
dim(mm[, 1])
# NULL
dim(mmt[1, ])
# NULL
We can avoid this by specifying drop = FALSE in the subset code
dim(mmt[1, , drop = FALSE])
# [1] 1 4
So a slight modification of your attempt, just adding drop = FALSE, will make it work.
res2 = list()
for (i in 1:ncol(mm)) res2[[i]] = mm[, i] %*% mmt[i, , drop = FALSE]
identical(result, res2)
# [1] TRUE
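Since the goal in the question is to sum these 4x4 matrices later on, it may help to know that the sum of the column-by-column outer products equals mm %*% t(mm). A short sketch, using the same sample data as above (the names outer_list and total are illustrative):

```r
set.seed(47)
mm <- matrix(runif(4 * 100), nrow = 4)

# One 4 x 4 outer product per column; tcrossprod(v) computes v %*% t(v)
outer_list <- lapply(seq_len(ncol(mm)), function(i) tcrossprod(mm[, i]))

# Sum the list of matrices element-wise
total <- Reduce(`+`, outer_list)

# The same sum, computed in a single call without the intermediate list
same <- isTRUE(all.equal(total, tcrossprod(mm)))
```

If only the sum is needed, tcrossprod(mm) alone is enough; the list is worth keeping only when the individual matrices are also of interest.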
Does anyone know of a way to add up combinations of numbers within a vector?
Suppose I am going through a for loop and each time I end up with a vector of different lengths, how could I combine each element of this vector such that I have the sum of 2, 3, etc elements?
For example if I have:
vector <- c(1:5)
And want to go through it as in:
element 1 + element 2; element 2 + element 3, etc
But also:
element 1 + element 2 + element 3
How would I do this? It's important to note that in many of the vectors the lengths will be different. So whilst one vector might contain 3 elements another might contain 12.
I know you can do vector[1]+vector[2], but I need some way to iterate throughout the vector wherein it takes into account the above note.
You can use combn:
> combn(vector, 3, FUN = NULL, simplify = TRUE)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 1 1 1 1 2 2 2 3
[2,] 2 2 2 3 3 4 3 3 4 4
[3,] 3 4 5 4 5 5 4 5 5 5
The trick here is that each call will return a matrix of results, and you will have to decide how you want to aggregate and store all the various combinations.
If you don't mind having a list, then the following should do the trick:
> sapply(c(1:length(vector)),
function(x) {
combn(vector, x, FUN = NULL, simplify = TRUE)
})
Generate pair IDs
In this case, we need to get the pairs:
combn(3, 2)
Output:
[,1] [,2] [,3]
[1,] 1 1 2
[2,] 2 3 3
Pairs are generated by column.
Sum Over Vector Elements (Using a Subset)
To access each element and perform a summation, we opt to define a helper function that takes the combination and the vector.
# Write a helper function
# sums of the index of the vector
comb_subset_sum = function(x, vec){
return(sum(vec[x]))
}
From this, we can use combn directly or use apply.
Summing for a single k:
combn directly:
# Input Vector
vec = 1:5
# Length of vector
n = length(vec)
# Generate pairwise combinations and obtain pair_sum
# Specify the k (m in R)
m = combn(n, m = 2, FUN = comb_subset_sum, vec = vec)
apply usage:
# Input Vector
vec = 1:5
# Number of Observations
n = length(vec)
# Combinations
# Specify the k (m in R)
combinations = combn(n, m = 2)
# Obtain vectorized sum over subset
subset_summed = apply(combinations, 2, comb_subset_sum, vec = vec)
Example Output:
combinations:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 1 1 2 2 2 3 3 4
[2,] 2 3 4 5 3 4 5 4 5 5
subset_summed:
[1] 3 4 5 6 5 6 7 7 8 9
Trace:
vec[1]+vec[2]=3
vec[1]+vec[3]=4
vec[1]+vec[4]=5
vec[1]+vec[5]=6
vec[2]+vec[3]=5
vec[2]+vec[4]=6
vec[2]+vec[5]=7
vec[3]+vec[4]=7
vec[3]+vec[5]=8
vec[4]+vec[5]=9
To obtain the trace output, add the following before return() in comb_subset_sum():
cat(paste0("vec[",x,"]", collapse = "+"), "=", sum(vec[x]), "\n")
Summing for multiple k:
Here, we apply the same logic, just in a way that enables the k value of the combination to take multiple values.
# Input Vector
vec = 1:5
# Length of Vec
n = length(vec)
# Store output
o = vector('list',n)
for(i in seq_along(vec)){
o[[i]] = combn(n, i, FUN = comb_subset_sum, vec = vec)
}
Note: The size of each element of o will vary as the number of combinations will increase and then decrease.
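The sizes mentioned in the note are the binomial coefficients, which can be confirmed with choose(). A small sketch, rebuilding o with lapply so that the snippet runs on its own:

```r
vec <- 1:5
n <- length(vec)

# Element k of o holds the sums of all k-element subsets of vec
o <- lapply(seq_len(n), function(k) {
  combn(n, k, FUN = function(idx) sum(vec[idx]))
})

sizes    <- lengths(o)      # number of sums for each k
expected <- choose(n, 1:n)  # number of k-element subsets of n items
```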
Summing over combinations
If we do not care about vector element values, we can then just sum over the actual combinations in a similar way to how we obtained the vector elements.
To generate pairs and then sum, use:
# Input Vector
vec = 1:5
# Length of Vec
n = length(vec)
# Generate all combinations (by column)
# Specify the k (m in R)
m = combn(n, m = 2)
# Obtain sum by going over columns
sum_m = apply(m, 2, sum)
Or do it in one go:
# Specify the k (m in R)
sum_inplace = combn(n, m = 2, FUN = sum)
Equality:
all.equal(sum_m,sum_inplace)
Summing for multiple k
And, as before, we can set it up to get all sums under different k by using:
# Input Vector
vec = 1:5
# Length of Vec
n = length(vec)
# Store output (varying lengths)
o = vector('list',n)
for(i in seq_along(vec)){
o[[i]] = combn(n, i, FUN = sum)
}
The following relies on the binary representation of numbers. Basically, you have 2^n combinations to check. By writing every number between 1 and 2^n - 1 in binary with n bits, you enumerate all the non-empty subsets of elements you might want.
The number2binary function comes from Paul Hiestra's answer in this thread: How to convert integer number into binary vector?
number2binary = function(number, noBits) {
binary_vector = rev(as.numeric(intToBits(number)))
if(missing(noBits)) {
return(binary_vector)
} else {
binary_vector[-(1:(length(binary_vector) - noBits))]
}
}
vector <- 1:5
n <- length(vector)
comp_sum <- function(x) {
binary <- number2binary(x, noBits = n)
result <- sum(vector[which(binary==1)])
names(result) <- paste(which(binary == 1), collapse = "+")
return(result)
}
binaries <- sapply(1:(2^n - 1), comp_sum)
Note: I only go up to 2^n - 1 as you do not need the "zero". By adding some conditions in your comp_sum function, you can pick only sums of two elements or of three elements...
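As an illustration of the kind of condition mentioned in the note, the sketch below keeps only the subsets with exactly k elements. It does not reuse number2binary; instead it extracts the bits with integer division and modulo, and the names masks, bits, keep and sums_k are illustrative.

```r
vec <- 1:5
n <- length(vec)
k <- 2  # keep only subsets with exactly k elements

# Every non-empty subset corresponds to one number between 1 and 2^n - 1
masks <- 1:(2^n - 1)

# bits[r, j] is 1 if element j belongs to subset number r, else 0
bits <- sapply(seq_len(n), function(j) (masks %/% 2^(j - 1)) %% 2)

# Keep the rows with exactly k ones and sum the selected elements
keep <- rowSums(bits) == k
sums_k <- as.vector(bits[keep, , drop = FALSE] %*% vec)
sort(sums_k)
```

For a fixed k, combn(vec, k, FUN = sum) gives the same values more directly; the bit approach is mostly of interest when all subset sizes are needed in one pass.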
You might be looking for rollsum from zoo package, where you can specify the number of elements you want to add up:
lapply(2:5, function(i) zoo::rollsum(1:5, i))
[[1]]
[1] 3 5 7 9 # two elements roll sum
[[2]]
[1] 6 9 12 # three elements roll sum
[[3]]
[1] 10 14 # four elements roll sum
[[4]]
[1] 15 # five elements roll sum
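If you prefer not to depend on zoo, the same rolling sums can be obtained in base R from differences of cumulative sums. This is a sketch only: roll_sum is a hypothetical helper name, and it assumes 1 <= k <= length(v).

```r
# Hypothetical helper: sums of every run of k consecutive elements of v.
# Assumes 1 <= k <= length(v).
roll_sum <- function(v, k) {
  cs <- c(0, cumsum(v))
  # Sum of elements i..(i + k - 1) equals cs[i + k] - cs[i]
  cs[(k + 1):length(cs)] - cs[1:(length(cs) - k)]
}

res <- lapply(2:5, function(k) roll_sum(1:5, k))
res
```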
The question "How to efficiently retrieve top K-similar vectors by cosine similarity using R?" asks how to calculate the top similar vectors for each vector of one matrix, relative to another matrix. It's satisfactorily answered, and I'd like to tweak it to operate on a single matrix.
That is, I'd like the top k similar other rows for each row in a matrix. I suspect the solution is very similar, but can be optimized.
This function is based on the linked answer:
CosineSimilarities <- function(m, top.k) {
# Computes cosine similarity between each row and all other rows in a matrix.
#
# Args:
# m: Matrix of values.
# top.k: Number of top rows to show for each row.
#
# Returns:
# Data frame with columns for pair of rows, and cosine similarity, for top
# `top.k` rows per row.
#
# Similarity computation
cp <- tcrossprod(m)
mm <- rowSums(m ^ 2)
result <- cp / sqrt(outer(mm, mm))
# Top similar rows from train (per row)
# Use `top.k + 1` to remove the self-reference (similarity = 1)
top <- apply(result, 2, order, decreasing=TRUE)[seq(top.k + 1), ]
result.df <- data.frame(row.id1=c(col(top)), row.id2=c(top))
result.df$cosine.similarity <- result[as.matrix(result.df[, 2:1])]
# Remove same-row records and return
return(result.df[result.df$row.id1 != result.df$row.id2, ])
}
For example:
(m <- matrix(1:9, nrow=3))
# [,1] [,2] [,3]
# [1,] 1 4 7
# [2,] 2 5 8
# [3,] 3 6 9
CosineSimilarities(m, 1)
# row.id1 row.id2 cosine.similarity
# 2 1 2 0.9956
# 4 2 3 0.9977
# 6 3 2 0.9977
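As a sanity check on the example output, the similarity of one pair can be recomputed directly from the definition. The sketch below also shows one possible way to skip the self-match when only the single most similar row is needed, by overwriting the diagonal before searching. The helper cos_sim and the names sim and nearest are illustrative and not part of the linked answer.

```r
m <- matrix(1:9, nrow = 3)

# Cosine similarity of two vectors, straight from the definition
cos_sim <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
s12 <- cos_sim(m[1, ], m[2, ])  # rows 1 and 2

# Full similarity matrix, as in CosineSimilarities()
sq <- rowSums(m^2)
sim <- tcrossprod(m) / sqrt(outer(sq, sq))

# Exclude each row's similarity with itself, then take the best remaining row
diag(sim) <- -Inf
nearest <- apply(sim, 1, which.max)
```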
I'm trying to write a function to determine the euclidean distance between x (one point) and y (a set of n points).
How should I pass y to the function? Until now, I used a matrix like that:
[,1] [,2] [,3]
[1,] 0 2 1
[2,] 1 1 1
Which would pass the points (0,2,1) and (1,1,1) to that function.
However, when I pass x as a normal (column) vector, the two variables don't match in the function.
I either have to transpose x or y, or save the vector of vectors another way.
My question: What is the standard way to save more than one vector in R? (my matrix y)
Is it just my y transposed or maybe a list or dataframe?
There is no standard way, so you should just pick the most effective one. Which one that is depends on what the vector of vectors looks like right after creation (it is better to avoid any unnecessary conversion) and on the speed of the function itself.
I believe that a data.frame with columns x, y and z should be a pretty good choice; the distance function will be quite simple and fast then:
d<-function(x,y) sqrt((y$x-x[1])^2+(y$y-x[2])^2+(y$z-x[3])^2)
The apply function with the margin argument = 1 seems the most obvious:
> x
[,1] [,2] [,3]
[1,] 0 2 1
[2,] 1 1 1
> apply(x , 1, function(z) crossprod(z, 1:length(z) ) )
[1] 7 6
> 2*2+1*3
[1] 7
> 1*1+2*1+3*1
[1] 6
So if you wanted distances, then the square root of the cross-product of the differences from a chosen point seems to work:
> apply(x , 1, function(z) sqrt(sum(crossprod(z -c(0,2,2), z-c(0,2,2) ) ) ) )
[1] 1.000000 1.732051
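If the points are kept as rows of a matrix, as in the question, the distances can also be computed without apply. A minimal sketch, where euclid_dist is a hypothetical name and y is assumed to hold one point per row:

```r
# Hypothetical helper: distance from point x to every row of matrix y
euclid_dist <- function(x, y) {
  stopifnot(is.matrix(y), length(x) == ncol(y))
  # sweep() subtracts x[j] from column j, so each row becomes (point - x)
  sqrt(rowSums(sweep(y, 2, x)^2))
}

y <- rbind(c(0, 2, 1), c(1, 1, 1))
d <- euclid_dist(c(0, 2, 2), y)
d
```

With this layout there is no need to transpose either argument: x stays a plain vector and each row of y is one point.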