Matrix of pairwise distances - r

I have a set of point coordinates and I want to use it to generate a matrix of distances. More specifically, I have two sets of points, A of size n and B of size m, given as 2D coordinates, and I want a matrix of all Euclidean distances between points from A and points from B, and no other distances.
Edit: what if the situation is more complicated? Suppose I have my matrix, but now I want to divide each row by the sum of the Euclidean distances from that row's point in A to all the points in B; that is, normalise each row of distances. Is there an efficient way to do that?

set.seed(101)
n <- 10; m <- 20
A <- data.frame(x=runif(n),y=runif(n))
B <- data.frame(x=runif(m),y=runif(m))
We want
sqrt((x_{1,i}-x_{2,j})^2+(y_{1,i}-y_{2,j})^2)
for every i=1:n and j=1:m.
You can do this via
dists <- sqrt(outer(A$x,B$x,"-")^2 + outer(A$y,B$y,"-")^2)
which in this case is a 10x20 matrix. In words, we're finding the difference ("-" is a reference to the subtraction operator) between each pair of x values and each pair of y values, squaring, adding, and taking the square root.
If you want to normalize every row by its sum, I would suggest
norm.dists <- sweep(dists,MARGIN=1,STATS=rowSums(dists),FUN="/")
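As a quick sanity check on the normalisation (a minimal sketch, reusing the simulated A and B from above): after sweeping, every row should sum to 1. Dividing the matrix directly by rowSums(dists) relies on R recycling the length-n vector down each column, and should give the same result. The name norm.dists2 is only introduced here for the comparison.

```r
set.seed(101)
n <- 10; m <- 20
A <- data.frame(x = runif(n), y = runif(n))
B <- data.frame(x = runif(m), y = runif(m))

# n x m matrix of cross-distances between points of A (rows) and B (columns)
dists <- sqrt(outer(A$x, B$x, "-")^2 + outer(A$y, B$y, "-")^2)

# Row normalisation, written two equivalent ways
norm.dists  <- sweep(dists, MARGIN = 1, STATS = rowSums(dists), FUN = "/")
norm.dists2 <- dists / rowSums(dists)  # length-n vector recycles down each column

dim(norm.dists)       # n rows, m columns
rowSums(norm.dists)   # each entry should be 1 up to rounding
```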

The dist(...) function in base R will not be helpful, because it calculates the auto-distances (distance from every point to every other point in a given dataset). You want cross-distances. There is a dist(...) function in package proxy which is designed for this.
Using the dataset kindly provided by @BenBolker,
library(proxy) # note that this masks the dist(...) fn in base R...
result <- dist(A,B)
result[1:5,1:5]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0.5529902 0.7303561 0.1985409 0.6184414 0.7344280
# [2,] 0.7109408 0.9506428 0.1778637 0.7216595 0.9333687
# [3,] 0.2971463 0.3809688 0.4971621 0.4019629 0.3995298
# [4,] 0.4985324 0.5737397 0.4760870 0.5986826 0.5993541
# [5,] 0.4513063 0.7071025 0.3077415 0.4289675 0.6761988
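If installing proxy is not an option, base R's dist can still be pressed into service by computing all auto-distances on the stacked data and keeping only the A-by-B block. This is a sketch rather than a recommendation: it wastes work on the A-A and B-B distances, so it only makes sense for small inputs. The names full and cross are introduced here for illustration.

```r
set.seed(101)
n <- 10; m <- 20
A <- data.frame(x = runif(n), y = runif(n))
B <- data.frame(x = runif(m), y = runif(m))

# stats::dist is called explicitly in case proxy is attached and masks dist()
full  <- as.matrix(stats::dist(rbind(A, B)))   # (n+m) x (n+m) auto-distances
cross <- unname(full[1:n, n + 1:m])            # keep only the A-rows, B-columns block

# Compare with the outer()-based cross-distances
dists <- sqrt(outer(A$x, B$x, "-")^2 + outer(A$y, B$y, "-")^2)
all.equal(cross, dists)
```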


Find limiting distribution of transition matrix and plot in R

How can I find and plot the limiting distribution (the unique stationary distribution) of a transition matrix in R? My goal is to get a unique, constant result rather than a random one.
This is the P matrix used:
P=matrix(c(0.2,0.3,0.5,0.1,0.8,0.1,0.4,0.2,0.4),nrow=3,ncol=3,byrow=TRUE)
I misspoke in my earlier answer. For a transition matrix, either the rows or the columns must all sum to 1, depending on whether you use v'P or Pv to move to the next step.
I'll use Pv.
For the limiting distribution to be stable, we must have:
Pv = v, or (P - I)v = 0. So the limiting distribution is an eigenvector with eigenvalue 1, rescaled so that sum(v) == 1 to make it a distribution.
Since your matrix has rows that sum to 1, not columns, we need to use the transpose of the matrix to calculate the eigenvectors:
e <- eigen(t(P))$vectors[, 1]
e <- e / sum(e)
Gives:
e
[1] 0.1960784 0.5490196 0.2549020
To check this:
P=matrix(c(0.2,0.3,0.5,0.1,0.8,0.1,0.4,0.2,0.4),nrow=3,ncol=3, byrow = TRUE)
ans <- e
for (i in 1:1000) {
ans <- ans %*% P
}
ans
[,1] [,2] [,3]
[1,] 0.1960784 0.5490196 0.254902
Same, so it's stable.
I'm not clear as to what you wanted to plot.
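An alternative to picking out an eigenvector is to solve the stationarity equations directly. The following is a minimal sketch, assuming the row-stochastic P from the question: it stacks (t(P) - I) v = 0 with the constraint sum(v) = 1 and solves the resulting consistent, overdetermined system with qr.solve. The names k, A, b and v are introduced here for illustration.

```r
P <- matrix(c(0.2, 0.3, 0.5,
              0.1, 0.8, 0.1,
              0.4, 0.2, 0.4), nrow = 3, byrow = TRUE)
k <- nrow(P)

# Stationarity equations (t(P) - I) v = 0, plus the constraint sum(v) = 1
A <- rbind(t(P) - diag(k), rep(1, k))
b <- c(rep(0, k), 1)

# qr.solve accepts a rectangular system and returns the least-squares solution,
# which is exact here because the system is consistent
v <- qr.solve(A, b)
v
```

If a plot of the distribution itself is what is wanted, something like barplot(v, names.arg = paste("state", 1:3)) would be one possibility.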

Create a random non-singular matrix reliably

How can I create a matrix of pseudo-random values that is guaranteed to be non-singular? I tried the code below, but it failed. I suppose I could just loop until I got one by chance but I would prefer a more elegant "R-like" solution if anyone has an idea.
library(matrixcalc)
exampledf<- matrix(ceiling(runif(16,0,50)), ncol=4)
is.singular.matrix(exampledf) #this may or may not return false
Using a while loop:
library(matrixcalc)
exampledf <- matrix(ceiling(runif(16,0,50)), ncol=4)
# keep redrawing for as long as the current draw is singular
while(is.singular.matrix(exampledf)){
exampledf <- matrix(ceiling(runif(16,0,50)), ncol=4)
}
One method that actually guarantees (rather than makes it merely likely) that the matrix is non-singular is to start from a known non-singular matrix and apply the elementary row operations used in, for example, Gaussian elimination: 1. add or subtract a multiple of one row to or from another row, or 2. multiply a row by a nonzero constant.
Depending on how "random" and how dense you want your matrix to be, you can start from the identity matrix and multiply each row by a random nonzero constant. Afterwards, you can apply a randomly selected sequence of the operations above, which will result in a non-singular matrix. You can even apply a predefined sequence of operations, using a randomly selected constant at each step.
An alternative could be to start from an upper triangular matrix for which the product of the main diagonal entries is not zero. This works because the determinant of a triangular matrix is the product of the elements on the main diagonal. This effectively boils down to generating N random nonzero numbers, placing them on the main diagonal, and setting the rest of the entries (above the main diagonal) to whatever you like. If you want the matrix to be fully dense, add the first row to every other row of the matrix (assuming the first row has no zero entries).
Of course this approach (like any other probably would) assumes that the matrix is numerically well conditioned, so that rounding errors do not make it singular in practice (floating-point precision is limited in every programming language). You would do well to avoid very small or very large values, which can make the method numerically unstable.
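The triangular construction described above can be sketched as follows. This is only an illustration under stated assumptions: the size n, the seed, and the value ranges are arbitrary choices, and the entries above the diagonal are drawn strictly positive so that adding the first row really does make the result dense.

```r
set.seed(42)   # arbitrary seed, only for reproducibility
n <- 4

# Upper triangular matrix: nonzero diagonal, arbitrary entries above it
U <- matrix(0, n, n)
U[upper.tri(U)] <- runif(n * (n - 1) / 2, min = 1, max = 10)
diag(U) <- sample(1:9, n, replace = TRUE)   # integers 1..9, so never zero

# Adding row 1 to every other row leaves the determinant unchanged
M <- U
for (i in 2:n) M[i, ] <- M[i, ] + U[1, ]

prod(diag(U))   # determinant of U
det(M)          # should agree up to rounding, hence nonzero
```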
It should be fairly unlikely that this will produce a singular matrix:
Mat1 <- matrix(rnorm(100), ncol=4)
Mat2 <- matrix(rnorm(100), ncol=4)
crossprod(Mat1,Mat2)
[,1] [,2] [,3] [,4]
[1,] 0.8138 5.112 2.945 -5.003
[2,] 4.9755 -2.420 1.801 -4.188
[3,] -3.8579 8.791 -2.594 3.340
[4,] 7.2057 6.426 2.663 -1.235
solve( crossprod(Mat1,Mat2) )
[,1] [,2] [,3] [,4]
[1,] -0.11273 0.15811 0.05616 0.07241
[2,] 0.03387 0.01187 0.07626 0.02881
[3,] 0.19007 -0.60377 -0.40665 0.17771
[4,] -0.07174 -0.31751 -0.15228 0.14582
inv1000 <- replicate(1000, {
Mat1 <- matrix(rnorm(100), ncol=4)
Mat2 <- matrix(rnorm(100), ncol=4)
try(solve( crossprod(Mat1,Mat2)))} )
str(inv1000)
#num [1:4, 1:4, 1:1000] 0.1163 0.0328 0.3424 -0.227 0.0347 ...
max(inv1000)
#[1] 451.6
> inv100000 <- replicate(100000, {Mat1 <- matrix(rnorm(100), ncol=4)
+ Mat2 <- matrix(rnorm(100), ncol=4)
+ is.singular.matrix( crossprod(Mat1,Mat2))} )
> sum(inv100000)
[1] 0

K-means clustering with my own distance function

I have defined a distance function as follows
jaccard.rules.dist <- function(x,y) ({
# implements feature distance. Feature "Airline" gets a different treatment, the rest
# are booleans coded as 1/0. Airline column distance = 0 if same airline, 1 otherwise
# the rest of the attributes' distance is zero iff both are 1, 1 otherwise
airline.column <- which(colnames(x)=="Aerolinea")
xmod <- x
ymod <-y
xmod[airline.column] <-ifelse(x[airline.column]==y[airline.column],1,0)
ymod[airline.column] <-1 # if they are the same, they are both ones, else they are different
andval <- sum(xmod&ymod)
orval <- sum(xmod|ymod)
return (1-andval/orval)
})
which slightly modifies the Jaccard distance for data frames of the form
t <- data.frame(Aerolinea=c("A","B","C","A"),atr2=c(1,1,0,0),atr3=c(0,0,0,1))
Now I would like to perform some k-means clustering on my dataset, using the distance just defined. If I try to use the function kmeans, there is no way to specify my distance function. I tried to use hclust, which accepts a distance matrix, which I calculated as follows
distmat <- matrix(nrow=nrow(t),ncol=nrow(t))
for (i in 1:nrow(t))
for (j in i:nrow(t))
distmat[j,i] <- jaccard.rules.dist(t[j,],t[i,])
distmat <- as.dist(distmat)
and then invoked hclust
hclust(distmat)
Error in if (is.na(n) || n > 65536L) stop("size cannot be NA nor exceed 65536") :
missing value where TRUE/FALSE needed
What am I doing wrong? Is there another way to do clustering that just accepts an arbitrary distance function as its input?
Thanks in advance.
I think distmat (from your code) has to be a distance structure (which is different from a matrix). Try this instead:
require(proxy)
d <- dist(t, jaccard.rules.dist)
clust <- hclust(d=d)
clust#centers
[,1] [,2]
[1,] 0.044128322 -0.039518142
[2,] -0.986798495 0.975132418
[3,] -0.006441892 0.001099211
[4,] 1.487829642 1.000431146
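For readers who want to stay in base R, here is a minimal sketch of feeding a custom distance to hclust. The toy matrix X and the function mismatch are hypothetical stand-ins, not the asker's data or jaccard.rules.dist. The idea is to fill a complete numeric matrix with pairwise values, convert it with as.dist, and then cut the tree into k groups with cutree. Note that this is hierarchical clustering, not k-means.

```r
# Hypothetical toy data: 4 observations with 3 binary attributes
X <- matrix(c(1, 1, 0,
              1, 0, 0,
              0, 0, 1,
              0, 1, 1), ncol = 3, byrow = TRUE)

# Hypothetical custom distance: share of attributes on which two rows differ
mismatch <- function(a, b) mean(a != b)

# Fill the full symmetric matrix so that no entry is left as NA
n <- nrow(X)
D <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    D[i, j] <- mismatch(X[i, ], X[j, ])
  }
}

d  <- as.dist(D)                    # convert to the "dist" structure hclust expects
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)         # cluster label for each observation
groups
```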

How to compute the power of a matrix in R [duplicate]

This question already has answers here:
A^k for matrix multiplication in R?
(6 answers)
Closed 9 years ago.
I'm trying to compute the -0.5 power of the following matrix:
S <- matrix(c(0.088150041, 0.001017491 , 0.001017491, 0.084634294),nrow=2)
In Matlab, the result is (S^(-0.5)):
S^(-0.5)
ans =
3.3683 -0.0200
-0.0200 3.4376
> library(expm)
> solve(sqrtm(S))
[,1] [,2]
[1,] 3.36830328 -0.02004191
[2,] -0.02004191 3.43755429
After some time, the following solution came up:
"%^%" <- function(S, power)
with(eigen(S), vectors %*% (values^power * t(vectors)))
S%^%(-0.5)
The result gives the expected answer:
[,1] [,2]
[1,] 3.36830328 -0.02004191
[2,] -0.02004191 3.43755430
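The eigen-based power relies on S being symmetric, so that the eigenvector matrix is orthogonal and its transpose acts as its inverse. A slightly more defensive sketch of the same idea follows. The helper name mat_pow_sym is made up for this example. It checks symmetry up front, then verifies that squaring the result and multiplying by S gives approximately the identity.

```r
S <- matrix(c(0.088150041, 0.001017491, 0.001017491, 0.084634294), nrow = 2)

# Hypothetical helper: power of a symmetric matrix via its eigendecomposition
mat_pow_sym <- function(S, power) {
  stopifnot(isSymmetric(S))
  e <- eigen(S, symmetric = TRUE)
  # V diag(lambda^power) V', valid because V is orthogonal for symmetric S
  e$vectors %*% diag(e$values^power, nrow = length(e$values)) %*% t(e$vectors)
}

R <- mat_pow_sym(S, -0.5)
R
R %*% R %*% S   # should be close to the 2 x 2 identity matrix
```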
The square root of a matrix is not necessarily unique (every positive real number has two square roots, so this is not just a matrix issue). There are multiple algorithms for generating a square root of a matrix. Others have shown the approach using expm and eigenvalues, but the Cholesky decomposition is another possibility (see the chol function); note that it yields a triangular factor R with t(R) %*% R equal to the matrix, rather than the symmetric square root.
To extend this answer beyond square roots, the following function exp.mat() generalizes the Moore–Penrose pseudoinverse of a matrix and allows one to compute a power of a matrix via a singular value decomposition (SVD). It even works for non-square matrices, although I don't know when one would need that.
exp.mat() function:
#The exp.mat function can calculate the pseudoinverse of a matrix (EXP=-1)
#and other exponents of matrices, such as square roots (EXP=0.5) or square root of
#its inverse (EXP=-0.5).
#The function arguments are a matrix (MAT), an exponent (EXP), and a tolerance
#level for non-zero singular values.
exp.mat<-function(MAT, EXP, tol=NULL){
MAT <- as.matrix(MAT)
matdim <- dim(MAT)
if(is.null(tol)){
tol=min(1e-7, .Machine$double.eps*max(matdim)*max(MAT))
}
if(matdim[1]>=matdim[2]){
svd1 <- svd(MAT)
keep <- which(svd1$d > tol)
res <- t(svd1$u[,keep]%*%diag(svd1$d[keep]^EXP, nrow=length(keep))%*%t(svd1$v[,keep]))
}
if(matdim[1]<matdim[2]){
svd1 <- svd(t(MAT))
keep <- which(svd1$d > tol)
res <- svd1$u[,keep]%*%diag(svd1$d[keep]^EXP, nrow=length(keep))%*%t(svd1$v[,keep])
}
return(res)
}
Example
S <- matrix(c(0.088150041, 0.001017491 , 0.001017491, 0.084634294),nrow=2)
exp.mat(S, -0.5)
# [,1] [,2]
#[1,] 3.36830328 -0.02004191
#[2,] -0.02004191 3.43755429
Other examples can be found here.

Determining if a matrix is diagonalizable in the R Programming Language

I have a matrix and I would like to know if it is diagonalizable. How do I do this in the R programming language?
If you have a given matrix, m, then one way is to take the matrix of eigenvectors times the diagonal matrix of the eigenvalues times the inverse of the eigenvector matrix. That should give us back the original matrix. In R that looks like:
m <- matrix( c(1:16), nrow = 4)
p <- eigen(m)$vectors
d <- diag(eigen(m)$values)
p %*% d %*% solve(p)
m
so in that example p %*% d %*% solve(p) should be the same as m
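Rather than comparing the two printed matrices by eye, the comparison can be done with all.equal, which tolerates rounding error. The sketch below uses a small hypothetical matrix with distinct real eigenvalues (3 and 2), so that solve(p) is well behaved. The name reconstructed is introduced here for illustration.

```r
m <- matrix(c(2, 1, 0, 3), nrow = 2)   # hypothetical example; eigenvalues 3 and 2
e <- eigen(m)
p <- e$vectors                         # eigenvectors as columns
d <- diag(e$values)                    # eigenvalues on the diagonal

reconstructed <- p %*% d %*% solve(p)
isTRUE(all.equal(reconstructed, m))    # TRUE when the decomposition reproduces m
```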
You can implement the full algorithm to check whether the matrix reduces to a Jordan form or a diagonal one (see e.g., this document). Or you can take the quick and dirty way: for an n-dimensional square matrix, use eigen(M)$values and check that they are n distinct values. Distinct eigenvalues are sufficient (though not necessary) for diagonalizability, and for random matrices this check always suffices: degeneracy has probability 0.
P.S.: based on a simple observation by JD Long below, I recalled that a necessary and sufficient condition for diagonalizability is that the eigenvectors span the original space. To check this, just verify that the eigenvector matrix has full rank (i.e. it has no zero eigenvalue). So here is the code:
diagflag = function(m,tol=1e-10){
x = eigen(m)$vectors
y = min(abs(eigen(x)$values))
return(y>tol)
}
# nondiagonalizable matrix
m1 = matrix(c(1,1,0,1),nrow=2)
# diagonalizable matrix
m2 = matrix(c(-1,1,0,1),nrow=2)
> m1
[,1] [,2]
[1,] 1 0
[2,] 1 1
> diagflag(m1)
[1] FALSE
> m2
[,1] [,2]
[1,] -1 0
[2,] 1 1
> diagflag(m2)
[1] TRUE
You might want to check out this page for some basic discussion and code. You'll need to search for "diagonalized" which is where the relevant portion begins.
All real symmetric matrices (symmetric across the diagonal) are diagonalizable by orthogonal matrices. In fact, if you want diagonalizability only by orthogonal matrix conjugation, i.e. D = PAP' where P' stands for the transpose of P, then symmetry across the diagonal, i.e. A_{ij} = A_{ji}, is exactly equivalent to diagonalizability.
If the matrix is not symmetric, then diagonalizability means not D = PAP' but merely D = PAP^{-1}, and we do not necessarily have P' = P^{-1}, which is the condition of orthogonality.
In that case you need to do something more substantial. There is probably a better way, but you could just compute the eigenvectors and check that their rank equals the full dimension.
See this discussion for a more detailed explanation.
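The symmetric case can be illustrated in a few lines. This is a sketch using a made-up symmetric matrix A. Note that eigen returns the eigenvectors as columns of P, so the diagonalization reads t(P) %*% A %*% P here, which is the same statement as D = PAP' with the roles of P and its transpose swapped.

```r
# Hypothetical real symmetric matrix
A <- matrix(c(2, 1, 0,
              1, 3, 1,
              0, 1, 4), nrow = 3, byrow = TRUE)
stopifnot(isSymmetric(A))

e <- eigen(A, symmetric = TRUE)
P <- e$vectors               # eigenvectors as columns

crossprod(P)                 # t(P) %*% P, close to the identity: P is orthogonal
D <- t(P) %*% A %*% P        # close to diag(e$values): A is orthogonally diagonalized
round(D, 10)
```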
