Quickly accessing matrix elements where the indices are given by other matrices - r

I am given a matrix M. I now need to determine a matrix of the same dimension, which is defined by
N_{i,j} = M_{A(i,j),B(i,j)}
for two matrices A and B of the same dimension, which define indices.
As an example,
set.seed(1)
M <- matrix(LETTERS[1:(4*6)], ncol=6)
A <- matrix(sample(c(1:4), 4*6, replace=TRUE), ncol=6)
B <- matrix(sample(c(1:6), 4*6, replace=TRUE), ncol=6)
How do I now quickly determine N?

Try this:
replace(M, TRUE, M[cbind(c(A), c(B))])
or
array(M[cbind(c(A), c(B))], dim(M))
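Both one-liners rely on the fact that indexing a matrix with a two-column matrix picks one element per (row, column) pair. As a rough sanity check, the sketch below rebuilds N with an explicit double loop and compares it to the vectorised result. The name N_loop is introduced here only for illustration.

```r
set.seed(1)
M <- matrix(LETTERS[1:(4 * 6)], ncol = 6)
A <- matrix(sample(1:4, 4 * 6, replace = TRUE), ncol = 6)
B <- matrix(sample(1:6, 4 * 6, replace = TRUE), ncol = 6)

# Vectorised: each row of cbind(c(A), c(B)) is one (row, column) pair into M
N <- array(M[cbind(c(A), c(B))], dim(M))

# Reference: explicit loop over every cell (N_loop is a hypothetical name)
N_loop <- M
for (i in seq_len(nrow(M))) {
  for (j in seq_len(ncol(M))) {
    N_loop[i, j] <- M[A[i, j], B[i, j]]
  }
}

identical(N, N_loop)
```

c(A) and c(B) flatten the index matrices in column-major order, which is the same order array() uses to fill the result, so the positions line up.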

Related

Custom distance matrix with KNN

I need to get the k nearest neighbors from a distance matrix. Example:
I have two "training" vectors, "a" <- c(1,1) and "b" <- c(2,2), both two-dimensional. I have to classify c(3,3), and I do not have a regular distance because the numbers are codes for characteristics, with distance(2,3) > distance(1,3), so c(3,3) has "a" as its nearest neighbor. Later I have to generalize and output the n nearest neighbors, but only for one vector at a time.
This was most promising at first, but when I looked into the documentation for k.nearest.neighbors I realized it won't help me. I can't do this with Python's scikit-learn, but I have some hope for an R implementation. Any suggestions?
I need speed, so if I implement this in a high-level language I need to do it with some library. I can easily code this up in Python's numpy, but it will almost certainly be too slow.
EDIT:
library(FNN)
distance_matrix <- matrix( rep( 0, len=9), nrow = 3)
distance_matrix[1,3] <- 2
distance_matrix[3,1] <- 2
distance_matrix[2,3] <- 3
distance_matrix[3,2] <- 3
train <- rbind(c(1,1), c(2,2))
test <- rbind(c(3,3))
y <- c("one", "two")
fit <- knn(train, test, y, distance_matrix, k=1, prob=TRUE)
result <- data.frame(test, pred=fit, prob=attr(fit, "prob"))
But when I look at the data frame result, I see a result based on the Euclidean metric or something similar, not on my distance matrix.
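One possible way to use a code-level distance table directly is sketched below in base R. It assumes, and this is not stated in the question, that the distance between two vectors is the sum of the per-coordinate code distances. The helper nearest_labels and the name code_dist are hypothetical.

```r
# Code-level distances from the question: d(1,3) = 2, d(2,3) = 3
code_dist <- matrix(0, nrow = 3, ncol = 3)
code_dist[1, 3] <- code_dist[3, 1] <- 2
code_dist[2, 3] <- code_dist[3, 2] <- 3

train <- rbind(c(1, 1), c(2, 2))
y <- c("one", "two")
query <- c(3, 3)

# Hypothetical helper: labels of the k training rows closest to `query`.
# Assumption: vector distance = sum of per-coordinate code distances.
nearest_labels <- function(train, labels, query, code_dist, k = 1) {
  # Look up code_dist[train[i, j], query[j]] for every cell of `train`
  pairs <- cbind(c(train), rep(query, each = nrow(train)))
  per_coord <- matrix(code_dist[pairs], nrow = nrow(train))
  d <- rowSums(per_coord)
  labels[order(d)[seq_len(k)]]
}

nearest_labels(train, y, query, code_dist, k = 1)
```

The lookup is a single vectorised index into code_dist, so one query costs a table lookup per training cell plus a sort.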

MIC correlation between 2 matrices in R

The minerva package provides a function to compute the Maximal Information Coefficient (MIC). According to the package description, the function mine(x, y) works only with two matrices A and B of the same size.
Here, I would like to obtain the MIC values between two matrices A and B of different sizes: A is n by m and B is n by z, with n being the number of observations (rows).
In other words, my aim is to obtain an m x z matrix C in which each entry is the MIC between a column of A and a column of B (and, if possible, the associated p-value).
Here is an example of what I want, using the Pearson correlation.
set.seed(1)
x <- matrix(rnorm(5 * 10), nrow=5, ncol=10)
y <- matrix(rnorm(5 * 20), nrow=5, ncol=20)
P <- cor(x, y=y)
I mailed one author of the minerva package without success. Is there any way I can apply the mine function to obtain the desired m by z matrix?
Let me answer my own post. In the code below I use nested for loops, which may not be the smartest or fastest way to do it, but it works as expected.
library(minerva)
set.seed(1)
x <- matrix(rnorm(5 * 10), nrow=5, ncol=10)
y <- matrix(rnorm(5 * 20), nrow=5, ncol=20)
# Result[i, k] holds the MIC between column i of x and column k of y
Result = matrix(ncol = ncol(y), nrow = ncol(x))
for (i in 1:ncol(x)) {
  Thisvar = x[, i]
  print(i)  # progress indicator
  for (k in 1:ncol(y)) {
    Thisvar2 = y[, k]
    res = mine(Thisvar, Thisvar2, master=TRUE, use="all.obs")
    Result[i, k] = res$MIC
  }
}
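The same nested-loop pattern works for any statistic that takes two vectors. The sketch below wraps it in a hypothetical helper, pairwise_cols, and uses cor as a stand-in for the MIC call so that the result can be checked against cor(x, y) without needing minerva installed.

```r
# Hypothetical helper: apply `stat` to every (column of x, column of y) pair
pairwise_cols <- function(x, y, stat) {
  out <- matrix(NA_real_, nrow = ncol(x), ncol = ncol(y))
  for (i in seq_len(ncol(x))) {
    for (k in seq_len(ncol(y))) {
      out[i, k] <- stat(x[, i], y[, k])
    }
  }
  out
}

set.seed(1)
x <- matrix(rnorm(5 * 10), nrow = 5)
y <- matrix(rnorm(5 * 20), nrow = 5)

# With Pearson correlation as the statistic, this should match cor(x, y)
P <- pairwise_cols(x, y, cor)
isTRUE(all.equal(P, cor(x, y)))
```

To get MIC values, one would pass a function that returns the MIC element of the mine result for the two vectors, as in the loop above.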

Generating matrices with special restriction in R

I would like to create two randomly generated matrices in R such that the sum of each row of the first matrix equals the sum of the corresponding column of the second matrix.
I know how to create a randomly generated matrix, for example:
A=matrix(rnorm(n=9, 0, 1), nrow=3, ncol=3)
B=matrix(rnorm(n=9, 1, 5), nrow=3, ncol=3)
But how can I impose this restriction when generating the two matrices?
Would something like this work for you?
size <- 10
matrix_1 <- matrix(nrow = size, ncol = size)
matrix_2 <- matrix(nrow = size, ncol = size)
for (i in seq_len(size)) {
  values <- rnorm(n = size, 1, 5)
  matrix_1[i, ] <- values  # row i of the first matrix
  matrix_2[, i] <- sample(values, size)  # column i: the same values, shuffled
}
The only problem is that the second matrix is not fully random: each of its columns is a random permutation of the corresponding row of the first matrix.
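If reusing the same values in both matrices is a concern, another option is to draw both matrices independently and then shift each column of the second one by a constant so that the sums match. This is only a sketch of one possibility, not the approach from the answer above, and the shift changes the mean of each column of B, so B no longer follows exactly the distribution it was drawn from.

```r
set.seed(1)
n <- 3
A <- matrix(rnorm(n * n, 0, 1), nrow = n, ncol = n)
B <- matrix(rnorm(n * n, 1, 5), nrow = n, ncol = n)

# Gap between the target (row sums of A) and the current column sums of B
gap <- rowSums(A) - colSums(B)

# Add gap[j] / n to every element of column j, raising its sum by gap[j]
B <- sweep(B, 2, gap / n, "+")

all.equal(rowSums(A), colSums(B))
```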

combine upper tri and lower tri matrices into a single data frame

I wish to represent p-values and distances as lower triangular and upper triangular entries in a single matrix. While I managed to create a UT or LT matrix for each, I have been unable to merge them into a single data frame in R.
dist[(upper.tri(dist,diag=FALSE))]=0 #upper tri of distances
pval[(lower.tri(pval,diag=FALSE))]=0 #lower tri of p-values
I tried the following line, but it does not work:
dist[(upper.tri(dist,diag=FALSE))]=pval[(lower.tri(pval,diag=FALSE))]
Any possible way of doing this?
I'm sure this could be done more elegantly, but I think this does what you want:
a <- matrix(0, nrow = 10, ncol = 10)
b <- matrix(1, nrow = 10, ncol = 10)
a[upper.tri(a)]
b[lower.tri(b)]
new <- matrix(NA, nrow = 10, ncol = 10)
new[upper.tri(new)] <- a[upper.tri(a)]
new[lower.tri(new)] <- b[lower.tri(b)]
new
Since you did not supply a reproducible example, I can't be sure, but basically I just take the upper and lower of matrices (one of 0s and the other of 1s) and combine them in new. As proof of concept, new has 0s above the diagonal, 1s below, and NAs on the diagonal itself. Hopefully this gives you some insight into your issue.
Although this question has already been answered, I would like to add the following code for anyone who needs it in the future.
First, create two 10 by 10 matrices, one containing only 1s and the other only 2s. Then use the Matrix package to keep only the strict lower and upper triangles. Since they do not overlap, we can simply add the two matrices to combine them. Finally, convert the "dgeMatrix" first into a matrix and then into a data frame.
a <- matrix(1,10,10)
b <- matrix(2,10,10)
library(Matrix)
a <- tril(a, -1) # strict lower triangular matrix (omit diagonals)
b <- triu(b, 1) # strict upper triangular matrix
c <- a + b
c <- as.data.frame(as.matrix(c))
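Both answers use constant stand-in matrices. A sketch closer to the question's setup, with distances kept below the diagonal and p-values above it, could look like this. The matrices d and p are made-up stand-ins (the question did not supply data), and the names d, p, and combined are chosen here only for illustration.

```r
set.seed(1)
pts <- matrix(rnorm(8), nrow = 4)

# Stand-in symmetric distance matrix
d <- as.matrix(dist(pts))

# Stand-in symmetric "p-value" matrix (random numbers, for illustration only)
p <- matrix(runif(16), nrow = 4)
p <- (p + t(p)) / 2

# Start from the distances, then overwrite the upper triangle with p-values
combined <- d
combined[upper.tri(combined)] <- p[upper.tri(p)]
diag(combined) <- NA  # the diagonal belongs to neither triangle

combined_df <- as.data.frame(combined)
```

Using the same triangle (upper.tri) on both sides of the assignment keeps the positions aligned, because both sides are then enumerated in the same column-major order.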

R Self Organising Subset

I have a matrix with a hundred rows.
Is there a way to obtain a subset of the ten rows that are most similar to the first row?
res2 <- matrix(rexp(100 * 10, rate=.1), ncol=10, nrow=100)
set1 <- subset(res2, res2 >condition1)
set1[with(set1, order(condition)), ]
set2 <- head(set1,10)
Perhaps:
Generate data:
set.seed(101)
res2 <- matrix(rexp(100 * 10, rate=.1), ncol=10, nrow=100)
Calculate the distance matrix. This is wasteful in that we compute all of the pairwise distances when only those from row 1 are needed, but dist is implemented in compiled code, is easy to use, and offers many choices of distance metric (see ?dist, look for method). For a problem of this size it's very quick.
dd <- dist(res2)
rr <- rank(as.matrix(dd)[1,])
You'll notice that the rank of the first element of the first row (which is the distance between row 1 and itself) is 1, and its value (as.matrix(dd)[1,1]) is zero. So all we need now are the rows with the next ten smallest distances ...
res2[rr>1 & rr<=11,]
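If the full pairwise matrix ever becomes too large, the distances from row 1 alone can be computed directly. The sketch below assumes Euclidean distance (the default of dist) and continuous data with no exact ties. The names d1 and idx are introduced here for illustration.

```r
set.seed(101)
res2 <- matrix(rexp(100 * 10, rate = .1), ncol = 10, nrow = 100)

# Euclidean distance from every row to row 1 only
d1 <- sqrt(rowSums(sweep(res2, 2, res2[1, ])^2))

# Row 1 has distance 0 to itself and sorts first, so skip it
# and keep the next ten rows, ordered from closest to farthest
idx <- order(d1)[2:11]
nearest <- res2[idx, ]
```

Unlike the rank-based subset, which returns the rows in their original order, this returns them sorted by closeness to row 1.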
