Is there a package to convert a distance matrix to a set of coordinates?
I have gone through the question below, and I was hoping there would be a package for this.
Finding the coordinates of points from distance matrix
I have considered Sammon's projection for this, but from what I understand it is an optimizer and gets you an optimal solution. I think there should be an algorithm that gives a unique solution.
Multidimensional scaling (MDS) aims to project the distance matrix of your data to a lower dimension k, where desired k = 2 in your case, while trying to preserve the distances between data points:
# Multidimensional scaling
library(MASS)
set.seed(1)
labels <- as.factor(sample(LETTERS[1:5], 100, replace=TRUE))
dat <- mvrnorm(n=100, mu = c(1:4), Sigma=matrix(1:16, ncol=4)) + as.numeric(labels)^2
#> dim(dat)
#[1] 100 4
# Euclidean distance matrix (100x100)
d <- dist(dat)
# Classical MDS for distance matrix d
# http://en.wikipedia.org/wiki/Multidimensional_scaling
mds <- cmdscale(d, k = 2)
x <- mds[,1]
y <- mds[,2]
plot(x,y, col=rainbow(5)[as.numeric(labels)], pch=16, main="MDS for object 'dat'")
legend("topright", legend=unique(labels), col=rainbow(5)[unique(as.numeric(labels))], pch=16)
Further reading: https://stats.stackexchange.com/questions/14002/whats-the-difference-between-principal-components-analysis-and-multidimensional
Look up an algorithm called multidimensional scaling (MDS). An implementation in R is the cmdscale function from the stats package:
Multidimensional scaling takes a set of dissimilarities and returns a set of points such that the distances between the points are approximately equal to the dissimilarities.
The documentation also has an example where a distance matrix is turned into two vectors of x and y coordinates, then plotted.
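As a quick sanity check of the description above, here is a minimal sketch on made-up points (pts, rec and max_err are illustrative names, not from the documentation). It runs cmdscale on the distance matrix of known 2-D points and compares the pairwise distances of the recovered coordinates with the input. Note that a distance matrix only determines coordinates up to rotation, reflection and translation, so the recovered points generally differ from the originals even though all distances match.

```r
set.seed(42)
pts <- matrix(rnorm(20), ncol = 2)   # 10 made-up points in 2-D
d   <- dist(pts)                     # Euclidean distance matrix
rec <- cmdscale(d, k = 2)            # coordinates recovered by classical MDS

# Largest absolute difference between input and reconstructed distances
max_err <- max(abs(as.vector(dist(rec)) - as.vector(d)))
max_err
```

If the dissimilarities are not exactly Euclidean in k dimensions, the match is only approximate, which is what the quoted documentation means by "approximately equal".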
Related
I'm working on some bioacoustical analysis and got stuck on an issue that I believe can be worked out mathematically. I'll use a sound sample from the seewave package:
library(seewave)
library(tuneR)
data(tico)
By storing a spectrogram (i.e. graphic representation of the sound wave tico) in an R object, we can now deal with the wave file computationally.
s <- spectro(tico, plot=F)
class(s)
>[1] "list"
length(s)
>[1] 3
The object s consists of two numeric vectors, x = s$time and y = s$freq, representing the X and Y axes respectively, and a matrix z = s$amp of amplitude values whose dimensions correspond to x and y. z is effectively a 3D surface that can be plotted using persp3D (plot3D), plot_ly (plotly) or plot3d (rgl). Alternatively, the wave file can be plotted in 3D using seewave if one wishes to visualize it as an interactive rgl plot.
spectro3D(tico)
That being said, the analysis I'm conducting aims to calculate contours of relative amplitude:
con <- contourLines(x=s$time, y=s$freq, z=t(s$amp), levels=seq(-25, -25, 1))
Select the longest contour:
n.con <- numeric(length(con))
for(i in 1:length(con)) n.con[i] <- length(con[[i]]$x)
n.max <- which.max(n.con)
con.max <- con[[n.max]]
And then plot the selected contour against the spectrogram of tico:
spectro(tico, grid=F, osc=F, scale=F)
polygon(x=con.max$x, y=con.max$y, lwd=2)
Now comes the tricky part. I must find a way to "subset" the matrix of amplitude values s$amp using the coordinates of the longest contour con.max. What I aim to achieve is a new matrix containing only the amplitude values inside the polygon. The remaining parts of the spectrogram should then appear as blank spaces.
One approach I thought could work would be to create a loop that replaces every value outside the polygon with a given amplitude value (e.g. -25 dB). I once used a similar approach to remove the values below -30 dB and it worked out perfectly:
for(i in 1:length(s$amp)){if(s$amp[i] == -Inf |s$amp[i] <= -30)
{s$amp[i] <- -30}}
Another thought would be to create a new matrix with the same dimensions as s$amp, subset s$amp using the coordinates of the contour, then place the subset into the new matrix. Roughly:
mt <- matrix(-30, nrow=nrow(s$amp), ncol = ncol(s$amp))
sb <- s$amp[con.max$y, con.max$x]
new.mt <- c(mt, sb)
s$amp <- new.mt
I'll appreciate any help.
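One way to sketch the "keep only values inside the polygon" idea is a point-in-polygon test applied to every cell of the matrix. The block below is only an illustration under stated assumptions: point_in_polygon is a hypothetical base-R helper (ray casting), and time, freq, amp, poly_x and poly_y are made-up stand-ins for s$time, s$freq, s$amp, con.max$x and con.max$y, with rows of amp assumed to index frequency and columns to index time.

```r
# Hypothetical helper: ray-casting point-in-polygon test, vectorised over points.
# (vx, vy) are the polygon vertices; (px, py) are the points to test.
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- rep(FALSE, length(px))
  j <- n
  for (i in seq_len(n)) {
    # Does the horizontal ray from the point cross edge (j -> i)?
    crosses <- ((vy[i] > py) != (vy[j] > py)) &
      (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])
    inside <- xor(inside, crosses)
    j <- i
  }
  inside
}

# Made-up stand-ins for s$time, s$freq and s$amp (rows = freq, cols = time)
time <- seq(0, 1, length.out = 11)
freq <- seq(0, 10, length.out = 21)
amp  <- matrix(-20, nrow = length(freq), ncol = length(time))

# Made-up stand-in for con.max: a rectangle in (time, freq) coordinates
poly_x <- c(0.25, 0.75, 0.75, 0.25)
poly_y <- c(2.25, 2.25, 7.75, 7.75)

# One (time, freq) pair per matrix cell, in column-major order
cell_time <- rep(time, each = length(freq))
cell_freq <- rep(freq, times = length(time))
inside <- matrix(point_in_polygon(cell_time, cell_freq, poly_x, poly_y),
                 nrow = length(freq))

masked <- amp
masked[!inside] <- NA   # or a floor value such as -30 instead of NA
```

With the real data, the same logical matrix could be built from s$time, s$freq and the contour coordinates. The earlier thresholding loop can likewise be written without a loop as s$amp[s$amp <= -30] <- -30.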
I would like to match points in 3-dimensional space.
Therefore, I am using the Hungarian Method described in this question: Finding the best matching pairwise points from 2 vectors
Here is my example using R:
# packages
library(rgl)
library(clue)
library(plyr)
library(fields)
set.seed(1)
a <- c(rep(2,7), 3,4,5,6,3,4,5,6,7,7,7,7,7,7) # x values
b <- c(rep(3,7),3,3,3,3, 3,3,3,3,3,3,3,3,3,3) # y values
c <- c(seq(1,7),1,1,1,1,7,7,7,7,1,2,3,4,5,6) # z values
# transform the points
set.seed(2)
a1 <- a + seq(1,length(a))
b1 <- b + 8
c1 <- c + 9
# plot the data
plot3d(a,b,c, col="red", pch=16,size=10)
plot3d(a1,b1,c1, lwd=10, col="blue", pch=16,size=10, add=TRUE)
# run the Hungarian Method
A <- cbind(a,b,c)
B <- cbind(a1,b1,c1)
distances <- rdist(A,B) # calculate Euclidean Distance between points
min.dist <- solve_LSAP(distances) # minimizing the sum of distance
min.dist.num <- as.numeric(min.dist)
# plot the minimized lines between point sets
for (ii in 1:dim(B)[1]){
D <- c(A[ii,1], B[min.dist.num[ii],1])
R <- c(A[ii,2], B[min.dist.num[ii],2])
W <- c(A[ii,3], B[min.dist.num[ii],3])
segments3d(D,R,W,col=2,lwd=1)
}
# calculate the share of points that is matched correctly
sum(1:dim(B)[1]==min.dist.num)/dim(B)[1]* 100
The problem here is that only 5% of the points are matched correctly (see last line of the code). In my view, the main trouble is that the algorithm does not take the structure of the object (a square) into account.
Question: Is there any method that performs better for this sample data?
In my original data, the dimensional structure of the points is way more complicated. I have a cloud of data and within this cloud there are multiple subfigures.
I am primarily looking for a solution in R, but other implementations (e.g. MATLAB, Excel, Java) are also welcome.
I'm trying to find the Euclidean distance between two points, confined by an irregular polygon (i.e. the distance would have to be calculated as a route through the given window).
Here is a reproducible example:
library(spatstat)
#Simple example of a polygon and points.
ex.poly <- data.frame(x=c(0,5,5,2.5,0), y=c(0,0,5,2.5,5))
points <- data.frame(x=c(0.5, 2.5, 4.5), y=c(4,1,4))
bound <- owin(poly=data.frame(x=ex.poly$x, y=ex.poly$y))
test.ppp <- ppp(x=points$x, y=points$y, window=bound)
pairdist.ppp(test.ppp)#distance between every point
#The distance result from this function between point 1 and point 3, is given as 4.0
However we know just from plotting the points
plot(test.ppp)
that the distance when the route is confined to the polygon should be greater (in this case, 5.00).
Is there another function that I am not aware of in {spatstat} that would do this? Or does anybody have any other suggestions for another package that could do this?
I'm trying to find the distance between two points in a water body, so the irregular polygon in my actual data is more complex.
Any help is greatly appreciated!
Cheers
OK, here's the gdistance-based approach I mentioned in comments yesterday. It's not perfect, since the segments of the paths it computes are all constrained to occur in one of 16 directions on a chessboard (king's moves plus knight's moves). That said, it gets within 2% of the correct values (always slightly overestimating) for each of the three pairwise distances in your example.
library(maptools) ## To convert spatstat objects to sp objects
library(gdistance) ## Loads raster and provides cost-surface functions
## Convert *.ppp points to SpatialPoints object
Pts <- as(test.ppp, "SpatialPoints")
## Convert the lake's boundary to a raster, with values of 1 for
## cells within the lake and values of 0 for cells on land
Poly <- as(bound, "SpatialPolygons") ## 1st to SpatialPolygons-object
R <- raster(extent(Poly), nrow=100, ncol=100) ## 2nd to RasterLayer ...
RR <- rasterize(Poly, R) ## ...
RR[is.na(RR)]<-0 ## Set cells on land to "0"
## gdistance requires that you 1st prepare a sparse "transition matrix"
## whose values give the "conductance" of movement between pairs of
## adjacent and next-to-adjacent cells (when using directions=16)
tr1 <- transition(RR, transitionFunction=mean, directions=16)
tr1 <- geoCorrection(tr1,type="c")
## Compute a matrix of pairwise distances between points
## (These should be 5.00 and 3.605; all are within 2% of actual value).
costDistance(tr1, Pts)
## 1 2
## 2 3.650282
## 3 5.005259 3.650282
## View the selected paths
plot(RR)
plot(Pts, pch=16, col="gold", cex=1.5, add=TRUE)
SL12 <- shortestPath(tr1, Pts[1,], Pts[2,], output="SpatialLines")
SL13 <- shortestPath(tr1, Pts[1,], Pts[3,], output="SpatialLines")
SL23 <- shortestPath(tr1, Pts[2,], Pts[3,], output="SpatialLines")
lapply(list(SL12, SL13, SL23), function(X) plot(X, col="red", add=TRUE, lwd=2))
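For reference, the exact values quoted in the comments (5.00 and 3.605) can be reproduced by hand for this small example. The sketch below assumes, from inspecting the polygon, that points 1 and 2 (and 2 and 3) can see each other directly, while the shortest route between points 1 and 3 bends around the notch vertex at (2.5, 2.5). The names pts, notch and seg_len are illustrative only.

```r
pts   <- rbind(c(0.5, 4), c(2.5, 1), c(4.5, 4))   # the three example points
notch <- c(2.5, 2.5)                              # inward-pointing polygon vertex

seg_len <- function(p, q) sqrt(sum((p - q)^2))    # straight-line length

d12 <- seg_len(pts[1, ], pts[2, ])                # direct line stays inside
d23 <- seg_len(pts[2, ], pts[3, ])                # direct line stays inside
d13 <- seg_len(pts[1, ], notch) + seg_len(notch, pts[3, ])  # route via the notch

exact <- c(d12 = d12, d13 = d13, d23 = d23)
gdist <- c(d12 = 3.650282, d13 = 5.005259, d23 = 3.650282)  # values reported above
rel_err <- (gdist - exact) / exact
round(rel_err, 4)
```

This hand calculation only works because the example polygon has a single obstructing vertex; for a real lake outline a grid-based or visibility-graph approach is still needed.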
I have a bunch of points that lie around y=x (see the examples below), and I hope to calculate the orthogonal distance of each point to the line y=x. Suppose that a point has coordinates (a,b); then it's easy to see that the projected point on y=x has coordinates ((a+b)/2, (a+b)/2). I use the following naive code for the calculation, but I think I need a faster version without the for loop. Thank you very much!
set.seed(999)
n=50
typ.ord = seq(-2,3, length=n) # x-axis
#
good.ord = sort(c(rnorm(n/2, typ.ord[1:(n/2)]+1,0.1),rnorm(n/2,typ.ord[(n/2+1):n]-0.5,0.1)))
y.min = min(good.ord)
y.max = max(good.ord)
#
plot(typ.ord, good.ord, col="green", ylim=c(y.min, y.max))
abline(0,1, col="blue")
#
# a = typ.ord
# b = good.ord
cal.orth.dist = function(n, typ.ord, good.ord){
good.mid.pts = (typ.ord + good.ord)/2
orth.dist = numeric(n)
for (i in 1:n){
num.mat = rbind(rep(good.mid.pts[i],2), c(typ.ord[i], good.ord[i]))
orth.dist[i] = dist(num.mat)
}
return(orth.dist)
}
good.dist = cal.orth.dist(50, typ.ord, good.ord)
sum(good.dist)
As easy as
good.dist <- sqrt((good.ord - typ.ord)^2 / 2)
It all boils down to computing the distance between a point and a line. In the 2D case of y = x, this becomes particularly easy (try it yourself).
In the more general case (extending to other lines in possibly more than 2-D space), you can use the following. It works by constructing a projection matrix P from the subspace (here the vector A) onto which you want to project the points x. Subtracting the projected component from the points leaves the orthogonal component, for which it's easy to calculate the distances.
x <- cbind(typ.ord, good.ord) # Points to be projected
A <- c(1,1) # Subspace to project onto
P <- A %*% solve(t(A) %*% A) %*% t(A) # Projection matrix P_A = A (A^T A)^-1 A^T
dists <- sqrt(rowSums((x - x %*% P)^2)) # Lengths of orthogonal residuals
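To see where the closed form comes from: the residual of (a, b) after projecting onto y = x is ((a-b)/2, (b-a)/2), whose length is |a-b|/sqrt(2). The following sketch checks, on made-up coordinates, that this closed form agrees with the projection-matrix route; the residual is squared element-wise before the row sums are taken. Variable names here are illustrative.

```r
set.seed(1)
a <- rnorm(5)                          # made-up x coordinates
b <- rnorm(5)                          # made-up y coordinates

closed_form <- abs(a - b) / sqrt(2)    # distance from (a, b) to the line y = x

x <- cbind(a, b)                       # points as rows
A <- matrix(c(1, 1), ncol = 1)         # direction of the line, as a column
P <- A %*% solve(t(A) %*% A) %*% t(A)  # projection matrix onto span(A)
resid <- x - x %*% P                   # orthogonal residuals
via_projection <- sqrt(rowSums(resid^2))

cbind(closed_form, via_projection)
```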
Does R have a package for generating random numbers in multi-dimensional space? For example, suppose I want to generate 1000 points inside a cuboid or a sphere.
I have some functions for hypercube and n-sphere selection that generate data frames with Cartesian coordinates and guarantee a uniform distribution through the hypercube or n-sphere for an arbitrary number of dimensions:
GenerateCubiclePoints <- function(nrPoints,nrDim,center=rep(0,nrDim),l=1){
x <- matrix(runif(nrPoints*nrDim,-1,1),ncol=nrDim)
x <- as.data.frame(
t(apply(x*(l/2),1,'+',center))
)
names(x) <- make.names(seq_len(nrDim))
x
}
This generates points in a cube/hypercube of nrDim dimensions with the given center, where l is the length of one side.
For an n-sphere with nrDim dimensions, you can do something similar, where r is the radius:
GenerateSpherePoints <- function(nrPoints,nrDim,center=rep(0,nrDim),r=1){
#generate the polar coordinates!
x <- matrix(runif(nrPoints*nrDim,-pi,pi),ncol=nrDim)
x[,nrDim] <- x[,nrDim]/2
#recalculate them to cartesians
sin.x <- sin(x)
cos.x <- cos(x)
cos.x[,nrDim] <- 1 # see the formula for n.spheres
y <- sapply(1:nrDim, function(i){
if(i==1){
cos.x[,1]
} else {
cos.x[,i]*apply(sin.x[,1:(i-1),drop=F],1,prod)
}
})*sqrt(runif(nrPoints,0,r^2))
y <- as.data.frame(
t(apply(y,1,'+',center))
)
names(y) <- make.names(seq_len(nrDim))
y
}
In 2 dimensions, these give the plots produced by the following code:
T1 <- GenerateCubiclePoints(10000,2,c(4,3),5)
T2 <- GenerateSpherePoints(10000,2,c(-5,3),2)
op <- par(mfrow=c(1,2))
plot(T1)
plot(T2)
par(op)
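As a hedged alternative sketch that avoids polar angles altogether: draw standard normal vectors, normalise them onto the unit sphere, and scale by a radius r * U^(1/d). The exponent matters because the volume of a d-dimensional ball grows like rho^d, so a square-root radius is uniform only for d = 2. runif_ball is a hypothetical name, not a function from any package.

```r
# Hypothetical helper: n points uniform inside a d-dimensional ball
runif_ball <- function(n, d, center = rep(0, d), r = 1) {
  z   <- matrix(rnorm(n * d), ncol = d)   # isotropic Gaussian directions
  z   <- z / sqrt(rowSums(z^2))           # normalise each row onto the unit sphere
  rho <- r * runif(n)^(1 / d)             # radius with density proportional to rho^(d-1)
  sweep(z * rho, 2, center, "+")          # scale each row, then shift by the center
}

set.seed(1)
p <- runif_ball(10000, 3, center = c(-5, 3, 0), r = 2)

# Distances from the center; for a uniform ball the mean is r * d / (d + 1)
radii <- sqrt(rowSums(sweep(p, 2, c(-5, 3, 0))^2))
mean(radii)
```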
Also check out the copula package. This will generate data within a cube/hypercube with uniform margins, but with correlation structures that you set. The generated variables can then be transformed to represent other shapes, but still with relations other than independent.
If you want more complex shapes but are happy with points that are uniform and independent within the shape, then you can just do rejection sampling: generate data within a cube that contains your shape, test whether the points are within your shape, reject those that are not, and keep doing this until there are enough points.
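A minimal sketch of that rejection-sampling loop, assuming the shape is described by a user-supplied membership function. rejection_sample and in_unit_ball are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: draw n points uniformly inside a shape by rejection.
# lower/upper give the bounding box; inside(p) returns TRUE if point p is in the shape.
rejection_sample <- function(n, lower, upper, inside) {
  d   <- length(lower)
  out <- matrix(numeric(0), ncol = d)
  while (nrow(out) < n) {
    # Candidates uniform in the bounding box (column j uses lower[j], upper[j])
    cand <- matrix(runif(n * d, lower, upper), ncol = d, byrow = TRUE)
    keep <- apply(cand, 1, inside)
    out  <- rbind(out, cand[keep, , drop = FALSE])
  }
  out[seq_len(n), , drop = FALSE]   # trim any surplus accepted points
}

# Example shape: the unit ball in 3 dimensions
in_unit_ball <- function(p) sum(p^2) <= 1

set.seed(1)
pts <- rejection_sample(1000, lower = rep(-1, 3), upper = rep(1, 3), in_unit_ball)
```

The acceptance rate is the ratio of the shape's volume to the box's volume, so this becomes slow for thin shapes or in high dimensions.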
A couple of years ago, I made a package called geozoo. It is available on CRAN.
install.packages("geozoo")
library(geozoo)
It has many different functions to produce objects in N-dimensions.
p = 4
n = 1000
# Cube with points on its faces.
# A 3D version would be a box with solid walls and a hollow interior.
cube.face(p)
# Hollow sphere
sphere.hollow(p, n)
# Solid cube
cube.solid.random(p, n)
cube.solid.grid(p, 10) # evenly spaced points
# Solid Sphere
sphere.solid.random(p, n)
sphere.solid.grid(p, 10) # evenly spaced points
One of my favorite ones to watch animate is a cube with points along its edges, because it was one of the first objects that I made. It also gives you a sense of distance between vertices.
# Cube with points along its edges.
cube.dotline(4)
Also, check out the website: http://streaming.stat.iastate.edu/~dicook/geometric-data/. It contains pictures and downloadable data sets.
Hope it meets your needs!
Cuboid:
df <- data.frame(
x = runif(1000),
y = runif(1000),
z = runif(1000)
)
head(df)
x y z
1 0.7522104 0.579833314 0.7878651
2 0.2846864 0.520284731 0.8435828
3 0.2240340 0.001686003 0.2143208
4 0.4933712 0.250840233 0.4618258
5 0.6749785 0.298335804 0.4494820
6 0.7089414 0.141114804 0.3772317
Sphere:
df <- data.frame(
radius = runif(1000),
inclination = 2*pi*runif(1000),
azimuth = 2*pi*runif(1000)
)
head(df)
radius inclination azimuth
1 0.1233281 5.363530 1.747377
2 0.1872865 5.309806 4.933985
3 0.2371039 5.029894 6.160549
4 0.2438854 2.962975 2.862862
5 0.5300013 3.340892 1.647043
6 0.6972793 4.777056 2.381325
Note: edited to include code for sphere
Here is one way to do it.
Say we hope to generate a bunch of 3d points of the form y = (y_1, y_2, y_3)
Sample X from multivariate Gaussian with mean zero and covariance matrix R.
(x_1, x_2, x_3) ~ Multivariate_Gaussian(u = [0,0,0], R = [[r_11, r_12, r_13], [r_21, r_22, r_23], [r_31, r_32, r_33]])
You can find a function which generates Multivariate Gaussian samples in an R package.
Take the Gaussian cdf of each covariate: (phi(x_1), phi(x_2), phi(x_3)). Here phi is the Gaussian cdf of the corresponding variable, i.e. phi(x_1) = Pr[x <= x_1]. By the probability integral transform, (phi(x_1), phi(x_2), phi(x_3)) = (u_1, u_2, u_3) will each be uniformly distributed on [0,1].
Then, take the inverse cdf of each uniformly distributed marginal. In other words take the inverse cdf of u_1, u_2, u_3:
(F_1^{-1}(u_1), F_2^{-1}(u_2), F_3^{-1}(u_3)) = (y_1, y_2, y_3), where F_i is the marginal cdf of the i-th coordinate of the distribution you are trying to sample from.
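The steps above can be sketched in base R as follows. Everything specific here is an assumption for illustration: the correlation matrix R is made up (with a unit diagonal, so the standard normal cdf pnorm applies directly), and the three target marginals (exponential, uniform on [0, 10], beta) are arbitrary choices. The Gaussian draw uses a Cholesky factor; mvrnorm from the MASS package would serve the same purpose.

```r
set.seed(1)
n <- 5000

# Made-up correlation matrix (unit diagonal, positive definite)
R <- matrix(c(1.0, 0.7, 0.2,
              0.7, 1.0, 0.4,
              0.2, 0.4, 1.0), nrow = 3)

# Step 1: correlated Gaussian sample with mean zero and covariance R
x <- matrix(rnorm(n * 3), ncol = 3) %*% chol(R)

# Step 2: Gaussian cdf of each coordinate gives uniform marginals on [0, 1]
u <- pnorm(x)

# Step 3: inverse cdf of each target marginal (arbitrary illustrative choices)
y <- cbind(qexp(u[, 1], rate = 2),
           qunif(u[, 2], min = 0, max = 10),
           qbeta(u[, 3], shape1 = 2, shape2 = 5))

# Rank correlations carry over from the Gaussian layer to the transformed sample
cor(y, method = "spearman")
```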