Voronoi diagrams based on non metric distances in R


I want to plot Voronoi diagrams in R. I have a set of points in N dimensions (say 10). I don't want to use multidimensional scaling (MDS). I want the Voronoi diagrams to be plotted using non-metric measures. Is there any package that implements this? If not, can you suggest a suitable way to plot the tessellations using these N-dimensional coordinates?

It is not clear whether your problem is the dimension reduction
or plotting the tessellation: the problems are separate.
As suggested in the comments, you can use
library(sos)
???"non-metric"
???"Voronoi"
to find where the functions you need are.
# Sample data: a distance matrix
d <- dist( matrix( rnorm(200), ncol=10 ) )
# Dimension reduction, via non-metric multidimensional scaling
library(MASS)
r <- sammon( d )
# Plot the Voronoi tessellation
library(tripack)
x <- r$points
plot( voronoi.mosaic(x[,1], x[,2]) )
points(x, pch=13)
Besides principal component analysis (prcomp)
and multidimensional scaling (MASS::isoMDS, MASS::sammon),
you can also look at
isomap (vegan::isomap),
local linear embedding (lle::lle),
maximum variance unfolding
or T-distributed stochastic neighbor embedding (tsne::tsne):
since some of those (Isomap, LLE, MVU) are based on the "neighbourhood graph",
which is not unlike the 2-dimensional tessellation you seek,
they may be more meaningful for your problem.
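If the cells themselves should follow a dissimilarity other than Euclidean distance, one rough option (not taken from any package mentioned above) is to label a fine grid by its nearest site. A minimal sketch in base R, assuming the points have already been reduced to two dimensions; the sites and the manhattan helper are made up for illustration, and any function returning a dissimilarity could be swapped in:

```r
# Brute-force Voronoi labelling on a grid under an arbitrary dissimilarity
set.seed(1)
sites <- cbind(x = runif(8), y = runif(8))   # hypothetical 2-D sites

# Hypothetical dissimilarity: Manhattan distance (replace with your own)
manhattan <- function(px, py, sx, sy) abs(px - sx) + abs(py - sy)

gx <- seq(0, 1, length.out = 200)
gy <- seq(0, 1, length.out = 200)
grid <- expand.grid(x = gx, y = gy)          # x varies fastest

# One column of dissimilarities per site
dmat <- sapply(seq_len(nrow(sites)), function(i)
  manhattan(grid$x, grid$y, sites[i, "x"], sites[i, "y"]))

# Index of the closest site for every grid cell
owner <- apply(dmat, 1, which.min)

image(gx, gy, matrix(owner, nrow = length(gx)),
      col = rainbow(nrow(sites)), xlab = "x", ylab = "y")
points(sites, pch = 19)
```

The resolution of the grid controls how smooth the cell boundaries look; the cost grows with the number of grid cells times the number of sites.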

Related

R: Is it possible to plot a grid from x, y spatial coordinates?

I've been working with a spatial model which contains 21,000 grid cells of unequal size (i by j, where i is [1:175] and j is [1:120]). I have the latitude and longitude values in two separate arrays (lat_array, lon_array) of i and j dimensions.
Plotting the coordinates:
> plot(lon_array, lat_array, main='Grid Coordinates')
My question: Is it possible to plot these spatial coordinates as a grid rather than as points? Does anyone know of a package or function that might be able to do this? I haven't been able to find anything of this nature online.
Thanks.

First of all it is always a bit dangerous to plot inherently spherical coordinates (lat,long) directly in the plane. Usually you should project them in some way, but I will leave it for you to explore the sp package and the function spTransform or something like that.
I guess in principle you could simply use the deldir package to calculate the Dirichlet tessellation of your points, which would give you a nice grid. However, you need a bounding region for this to avoid large cells radiating out from the border of your region. I personally use spatstat to call deldir so I can't give you the direct commands in deldir, but in spatstat I would do something like:
library(spatstat)
plot(lon_array, lat_array, main='Grid Coordinates')
W <- clickpoly(add = TRUE) # Now click the region that contains your grid
i_na <- is.na(lon_array) | is.na(lat_array) # Index of NAs
X <- ppp(lon_array[!i_na], lat_array[!i_na], window = W)
grid <- dirichlet(X)
plot(grid)
I have not tested this yet and I will update this answer once I get the chance to test it with some artificial data. A major problem is the size of your dataset: computing the Dirichlet tessellation for it may take a long time. I have only tried calling dirichlet on datasets of up to 3000 points...
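If the coordinates are already stored as i-by-j matrices, a simpler alternative to a tessellation (a sketch only, not the approach described above) might be to join neighbouring nodes with lines using base graphics. The small arrays below are invented stand-ins for lon_array and lat_array:

```r
# Hypothetical 6 x 5 curvilinear grid standing in for lon_array / lat_array
i <- 1:6
j <- 1:5
lon_array <- outer(i, j, function(i, j) i + 0.1 * j^2)
lat_array <- outer(i, j, function(i, j) j + 0.05 * i^2)

# Each column of the matrices is drawn as one grid line ...
matplot(lon_array, lat_array, type = "l", lty = 1, col = "grey40",
        xlab = "Longitude", ylab = "Latitude", main = "Grid Coordinates")
# ... and the transposed matrices give the crossing family of lines
matlines(t(lon_array), t(lat_array), lty = 1, col = "grey40")
```

Lines break wherever a coordinate is NA, so missing nodes show up as gaps in the mesh. The caveat about projecting lat/long coordinates still applies.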

3-D Cartesian points to 2-D hemispherical and calculate the area of 2-D Voronoi cells

I've been working on some functions in R and MATLAB based on Qhull (the geometry package in R) to project local Cartesian X,Y,Z points within a circular plot to spherical (theta,phi,R), centered at 0,0,0. Since all of the Z values are positive in the original coordinates (X and Y are instead centered at 0), this gives me the hemispherical projection that I desire (the point colors are scaled by Z values), plotted with the radial.plot() function of R plotrix, using phi (azimuth angle) and theta (polar angle).
For the spherical transformation, after centering at 0,0,0, rather than using the calculations of Bourke (1996), I use the ISO specification listed on Wikipedia (not the physics convention).
r <- sqrt(x^2 + y^2 + z^2)
theta <- acos(z/r)
phi <- atan2(y,x)
I would like to know the area of Voronoi cells containing points of a given class in this hemispherical projection, preserving perspective distortion. While it is simple to calculate the 2-D Voronoi diagram for the 2-D Cartesian X,Y points, translating this Voronoi diagram to 2-D spherical may not produce the desired results, yes? Perhaps it would be best to compute the Voronoi diagram directly from the hemispherical projected points and then return the area of each cell.
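For reference, the conversion above can be wrapped in a small function and followed by a planar projection on which a 2-D Voronoi diagram could then be computed. This is only a sketch: to_spherical and to_plane are hypothetical helper names, and the projection shown (plot radius proportional to the polar angle theta) is just one possible choice, assumed here for illustration:

```r
# Cartesian -> spherical, ISO convention (theta = polar angle, phi = azimuth)
to_spherical <- function(x, y, z) {
  r     <- sqrt(x^2 + y^2 + z^2)
  theta <- acos(z / r)    # angle from the +Z axis
  phi   <- atan2(y, x)    # angle in the X-Y plane
  cbind(r = r, theta = theta, phi = phi)
}

# Hypothetical planar projection: radius on the plot is proportional to theta
to_plane <- function(sph) {
  cbind(u = sph[, "theta"] * cos(sph[, "phi"]),
        v = sph[, "theta"] * sin(sph[, "phi"]))
}

# Three made-up points: straight up, on the horizon, and halfway between
xyz <- rbind(c(0, 0, 1), c(1, 0, 0), c(0, 1, 1))
sph <- to_spherical(xyz[, 1], xyz[, 2], xyz[, 3])
uv  <- to_plane(sph)
print(uv)
```

The point directly overhead maps to the origin and points on the horizon map to a circle of radius pi/2, so cell areas computed on (u, v) are areas in the projected plane rather than on the ground.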
Update: I've solved it. My solution will be shared in a new R package, which I will post here.

OP, Adam Erickson, published the gapfraction package which implements Erickson's hemispherical-Voronoi gap fraction algorithm.
The gapfraction package for R was designed for modeling understory
light in forests with light-detection-and-ranging (LiDAR) data. In
addition to metrics of canopy gap fraction (Po), angular canopy
closure (ACC), and vertical canopy cover (VCC), the package implements
a new canopy height model (CHM) algorithm, popular individual tree
crown (ITC) detection algorithms, and a number of other algorithms
that produce useful features for statistical modeling, including the
distance of trees from plot center.
For further details please consult: gapfraction: R functions for LiDAR canopy light transmission
Please see some simple demonstration of the code below:
# devtools::install_github("adam-erickson/gapfraction", dependencies=TRUE)
library(raster)
library(gapfraction)
data(las)
# This function implements Erickson's hemispherical-Voronoi gap fraction algorithm
# with four common lens geometries: equi-distant, equi-angular, stereographic, and orthographic
P.hv(
  las = las,
  model = "equidist",
  thresh.val = 1.25,
  thresh.var = "height",
  reprojection = NA,
  pol.deg = 5,
  azi.deg = 45,
  col = "height",
  plots = TRUE,
  plots.each = FALSE,
  plots.save = FALSE
)

Visualizing distance between nodes according to weights - with R

I'm trying to draw a graph where the distance between vertices corresponds to the edge weights* and I've found that in graphviz there is a way to draw such a graph. Is there a way to do this in R with the igraph package (specifically with graph.adjacency)?
Thanks,
Noam
(*as has been asked before: draw a graph where the distance between vertices correspond to the edge weights)

This is not possible in general, as the weights would have to satisfy the triangle inequality for every triangle before such an object could be drawn. So you can only approximate it. For this you can use force-directed layout algorithms. There are a few in igraph. The one I often use is the Fruchterman-Reingold algorithm.
See for details:
library("igraph")
?layout.fruchterman.reingold
Edit:
Note that the distance between nodes will correspond somewhat with the inverse of the absolute edge weight.
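To check whether a given set of weights even satisfies the triangle inequality (a necessary condition for an exact drawing, though not a sufficient one), something like the following sketch can be used. The function name violates_triangle and the two small weight matrices are made up for illustration:

```r
# TRUE if some direct weight d[i, k] exceeds a detour d[i, j] + d[j, k]
violates_triangle <- function(D, tol = 1e-12) {
  for (j in seq_len(nrow(D))) {
    # outer(...)[i, k] is the length of the detour i -> j -> k
    detour <- outer(D[, j], D[j, ], "+")
    if (any(D > detour + tol)) return(TRUE)
  }
  FALSE
}

# Weights 1, 1 and 5 on a triangle: 5 > 1 + 1, so no exact drawing exists
W_bad  <- matrix(c(0, 1, 5,
                   1, 0, 1,
                   5, 1, 0), nrow = 3, byrow = TRUE)
# Weights 3, 4 and 5: a right triangle, which can be drawn exactly
W_good <- matrix(c(0, 3, 5,
                   3, 0, 4,
                   5, 4, 0), nrow = 3, byrow = TRUE)

violates_triangle(W_bad)
violates_triangle(W_good)
```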

Like Sacha Epskamp mentioned, unless your data is perfect, you cannot draw a graph that would not violate some triangular inequalities. However, there are techniques named Multidimensional scaling (MDS) targeted at minimizing such violations.
One implementation in R is cmdscale from the stats package. I'd recommend the example at the bottom of ?cmdscale:
> require(graphics)
>
> loc <- cmdscale(eurodist)
> x <- loc[,1]
> y <- -loc[,2]
> plot(x, y, type="n", xlab="", ylab="", main="cmdscale(eurodist)")
> text(x, y, rownames(loc), cex=0.8)
Of course, you can plot x and y using any graphics packages (you were inquiring about igraph specifically).
Finally, I'm sure you'll find plenty of other implementations if you search for "multidimensional scaling" or "MDS". Good luck.
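To get a feel for how much the two-dimensional layout distorts the original distances, the distances between the fitted points can be compared with the input. A small sketch using only base R and the same eurodist data; the variable names are arbitrary:

```r
loc    <- cmdscale(eurodist, k = 2)   # 2-D coordinates, one row per city
fitted <- dist(loc)                   # distances in the 2-D layout

# Pairwise differences between layout distances and the original ones
err <- as.vector(fitted) - as.vector(eurodist)
summary(abs(err))

# Points on the diagonal would mean the layout reproduces a distance exactly
plot(as.vector(eurodist), as.vector(fitted),
     xlab = "Original distance", ylab = "Distance in 2-D layout")
abline(0, 1, lty = 2)
```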

visualization for high-dimensional points in R

I have a centroid, e.g., A, and I have 100 other points. All of these points are high-dimensional, e.g., 1000 dimensions. Is there a way to visualize these points in a two-dimensional space in terms of their distance from A?

A common (though simple) way to visualize high-dimensional points in low dimensional space is to use some form of multi-dimensional scaling:
dat <- matrix(runif(1000*99),99,1000)
#Combine with "special" point
dat <- rbind(rep(0.1,1000),dat)
out <- cmdscale(dist(dat),k = 2)
#Plot everything, highlighting our "special" point
plot(out)
points(out[1,1],out[1,2],col = "red")
You can also check out isoMDS or sammon in the MASS package for other implementations in R.

The distance (by which I assume you mean the norm of the difference vector) is only 1 value, so you can calculate these norms and show them on a 1D plot, but for 2D you'll need a second parameter.
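As a rough sketch of that 1D view, assuming Euclidean distance and using made-up data in which the first row plays the role of A:

```r
set.seed(1)
# Row 1 is the "special" point A; the other 99 rows are random points
dat <- rbind(rep(0.1, 1000), matrix(runif(1000 * 99), 99, 1000))
A   <- dat[1, ]

# Norm of the difference vector between every point and A
d_to_A <- sqrt(rowSums(sweep(dat, 2, A)^2))

# One value per point, so a jittered 1-D strip chart is enough
stripchart(d_to_A, method = "jitter", pch = 20,
           xlab = "Euclidean distance to A")
```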

What techniques exist in R to visualize a "distance matrix"?

I wish to present a distance matrix in an article I am writing, and I am looking for good visualization for it.
So far I came across balloon plots (I used it here, but I don't think it will work in this case), heatmaps (here is a nice example, but they don't allow to present the numbers in the table, correct me if I am wrong. Maybe half the table in colors and half with numbers would be cool) and lastly correlation ellipse plots (here is some code and example - which is cool to use a shape, but I am not sure how to use it here).
There are also various clustering methods but they will aggregate the data (which is not what I want) while what I want is to present all of the data.
Example data:
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
dist(nba[1:20, -1])
I am open to ideas.

You could also use force-directed graph drawing algorithms to visualize a distance matrix, e.g.
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
dist_m <- as.matrix(dist(nba[1:20, -1]))
dist_mi <- 1/dist_m # one over, as qgraph takes similarity matrices as input
library(qgraph)
jpeg('example_forcedraw.jpg', width=1000, height=1000, units='px')
qgraph(dist_mi, layout='spring', vsize=3)
dev.off()

Tal, this is a quick way to overlay text on a heatmap. Note that this relies on image rather than heatmap, as the latter offsets the plot, making it more difficult to put text in the correct position.
To be honest, I think this graph shows too much information, making it a bit difficult to read... you may want to write only specific values.
Also, the other, quicker option is to save your graph as a PDF, import it into Inkscape (or similar software) and manually add the text where needed.
Hope this helps.
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
dst <- dist(nba[1:20, -1])
dst <- data.matrix(dst)
dim <- ncol(dst)
image(1:dim, 1:dim, dst, axes = FALSE, xlab="", ylab="")
axis(1, 1:dim, nba[1:20,1], cex.axis = 0.5, las=3)
axis(2, 1:dim, nba[1:20,1], cex.axis = 0.5, las=1)
text(expand.grid(1:dim, 1:dim), sprintf("%0.1f", dst), cex=0.6)

A Voronoi Diagram (a plot of a Voronoi Decomposition) is one way to visually represent a Distance Matrix (DM).
They are also simple to create and plot using R--you can do both in a single line of R code.
If you're not familiar with this aspect of computational geometry, the relationship between the two (VD & DM) is straightforward, though a brief summary might be helpful.
Distance Matrices--i.e., a 2D matrix showing the distance between a point and every other point, are an intermediate output during kNN computation (i.e., k-nearest neighbor, a machine learning algorithm which predicts the value of a given data point based on the weighted average value of its 'k' closest neighbors, distance-wise, where 'k' is some integer, usually between 3 and 5.)
kNN is conceptually very simple--each data point in your training set is in essence a 'position' in some n-dimensional space, so the next step is to calculate the distance between each point and every other point using some distance metric (e.g., Euclidean, Manhattan, etc.). While the training step--i.e., constructing the distance matrix--is straightforward, using it to predict the value of new data points is practically encumbered by the data retrieval--finding the closest 3 or 4 points from among several thousand or several million scattered in n-dimensional space.
Two data structures are commonly used to address that problem: kd-trees and Voronoi decompositions (aka "Dirichlet tessellation").
A Voronoi decomposition (VD) is uniquely determined by a distance matrix--i.e., there's a 1:1 map; so indeed it is a visual representation of the distance matrix, although again, that's not their purpose--their primary purpose is the efficient storage of the data used for kNN-based prediction.
Beyond that, whether it's a good idea to represent a distance matrix this way probably depends most of all on your audience. To most, the relationship between a VD and the antecedent distance matrix will not be intuitive. But that doesn't make it incorrect--if someone without any statistics training wanted to know if two populations had similar probability distributions and you showed them a Q-Q plot, they would probably think you haven't engaged their question. So for those who know what they are looking at, a VD is a compact, complete, and accurate representation of a DM.
So how do you make one?
A Voronoi decomp is constructed by selecting (usually at random) a subset of points from within the training set (this number varies by circumstances, but if we had 1,000,000 points, then 100 is a reasonable number for this subset). These 100 data points are the Voronoi centers ("VC").
The basic idea behind a Voronoi decomp is that rather than having to sift through the 1,000,000 data points to find the nearest neighbors, you only have to look at these 100, then once you find the closest VC, your search for the actual nearest neighbors is restricted to just the points within that Voronoi cell. Next, for each data point in the training set, calculate the VC it is closest to. Finally, for each VC and its associated points, calculate the convex hull--conceptually, just the outer boundary formed by that VC's assigned points that are farthest from the VC. This convex hull around the Voronoi center forms a "Voronoi cell." A complete VD is the result of applying those three steps to each VC in your training set. This will give you a perfect tessellation of the surface (see the diagram below).
To calculate a VD in R, use the tripack package. The key function is 'voronoi.mosaic' to which you just pass in the x and y coordinates separately--the raw data, not the DM--then you can just pass voronoi.mosaic to 'plot'.
library(tripack)
plot(voronoi.mosaic(runif(100), runif(100), duplicate="remove"))
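The assignment step described above (pick a subset of points as centers, then give every point to its closest center) can be sketched in base R as follows. The data are random and the sizes (1000 points, 10 centers) are arbitrary choices for illustration, assuming Euclidean distance:

```r
set.seed(42)
pts <- cbind(runif(1000), runif(1000))   # made-up 2-D training points

# Pick 10 of the points at random to act as Voronoi centers
idx     <- sample(nrow(pts), 10)
centers <- pts[idx, ]

# Squared Euclidean distance: points in rows, centers in columns
d2 <- outer(pts[, 1], centers[, 1], "-")^2 +
      outer(pts[, 2], centers[, 2], "-")^2

# For each point, the index of the center it is closest to
cell <- apply(d2, 1, which.min)

plot(pts, col = rainbow(10)[cell], pch = 20, cex = 0.6, xlab = "x", ylab = "y")
points(centers, pch = 4, cex = 2, lwd = 2)
table(cell)                              # number of points per cell
```

A nearest-neighbor query for a new point would then first find its closest center and only search the points sharing that value of cell.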

You may want to consider looking at a 2-d projection of your matrix (Multi Dimensional Scaling). Here is a link to how to do it in R.
Otherwise, I think you are on the right track with heatmaps. You can add in your numbers without too much difficulty. For example, building off of Learn R:
library(ggplot2)
library(plyr)
library(arm)
library(reshape2)
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
nba$Name <- with(nba, reorder(Name, PTS))
nba.m <- melt(nba)
nba.m <- ddply(nba.m, .(variable), transform,
rescale = rescale(value))
(p <- ggplot(nba.m, aes(variable, Name)) + geom_tile(aes(fill = rescale),
colour = "white") + scale_fill_gradient(low = "white",
high = "steelblue")+geom_text(aes(label=round(rescale,1))))

A dendrogram based on a hierarchical cluster analysis can be useful:
http://www.statmethods.net/advstats/cluster.html
A 2-D or 3-D multidimensional scaling analysis in R:
http://www.statmethods.net/advstats/mds.html
If you want to go into 3+ dimensions, you might want to explore ggobi / rggobi:
http://www.ggobi.org/rggobi/
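For the dendrogram option, a minimal sketch using only base R. The built-in USArrests data stand in for the nba data here so that the example runs without a download, and average linkage is an arbitrary choice:

```r
# Distance matrix on 20 standardized rows of a built-in dataset
d  <- dist(scale(USArrests[1:20, ]))

# Hierarchical clustering; the dendrogram keeps every observation as a leaf
hc <- hclust(d, method = "average")
plot(hc, main = "Average-linkage dendrogram", xlab = "", sub = "")
```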

In the book "Numerical Ecology with R" by Borcard et al. (2011), they used a function called coldiss.
You can find it here: http://ichthyology.usm.edu/courses/multivariate/coldiss.R
It color-codes the distances and even orders the records by dissimilarity.
Another good option would be the seriation package.
Reference:
Borcard, D., Gillet, F. & Legendre, P. (2011) Numerical Ecology with R. Springer.

A solution using Multidimensional Scaling
data = read.csv("http://datasets.flowingdata.com/ppg2008.csv", sep = ",")
dst = tcrossprod(as.matrix(data[,-1]))
dst = matrix(rep(diag(dst), 50L), ncol = 50L, byrow = TRUE) +
matrix(rep(diag(dst), 50L), ncol = 50L, byrow = FALSE) - 2*dst
library(MASS)
mds = isoMDS(dst)
#remove {type = "n"} to see dots
plot(mds$points, type = "n", pch = 20, cex = 3, col = adjustcolor("black", alpha = 0.3), xlab = "X", ylab = "Y")
text(mds$points, labels = rownames(data), cex = 0.75)
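The dst computation above relies on the identity ||a - b||^2 = a.a + b.b - 2 a.b, so it produces squared Euclidean distances. A small check on made-up data (the matrix X is arbitrary) that this agrees with the square of what dist returns:

```r
set.seed(1)
X <- matrix(rnorm(5 * 3), nrow = 5)   # 5 made-up points in 3 dimensions

G  <- tcrossprod(X)                   # Gram matrix of inner products a.b
# a.a + b.b - 2 a.b for every pair of rows
sq <- outer(diag(G), diag(G), "+") - 2 * G

# Compare with the squared distances from dist()
all.equal(sq, as.matrix(dist(X))^2, check.attributes = FALSE)
```

Taking sqrt of the result gives the ordinary Euclidean distances, if those are what is wanted as input.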
