Mapping distances to colors - r

Assuming a matrix of distances between a number of samples, I would like to somehow reasonably map these distances to a color space. So for example if you have three apparent clusters, they should have different colors, and within a cluster you would have a number of shades of a color. However, I would like to avoid explicit clustering, if possible.
Clearly, the mapping cannot be perfect and universal: rather, it is a heuristic.
Is there a known algorithm for that? Or, perhaps, a ready solution for R?

Here is one possibility. No matter how many dimensions your original data was, you can use multi-dimensional scaling with the distance matrix to project the data to three dimensions, in a way that coarsely preserves distances. If you treat the three dimensions as R, G and B this will give a color scheme in which points that are close should have "close" colors.
Here is a simple example. I generate some 5-dimensional data with 4 clusters (although no cluster analysis is performed). From that, we get the distance matrix. Then, as above we use multi-dimensional scaling to turn this into a color map. The points are plotted to show the result.
## Generate some sample data
set.seed(1234)
v = c(rnorm(80,0,1), rnorm(80,0,1), rnorm(80,4,1), rnorm(80,4,1))
w = c(rnorm(80,0,1), rnorm(80,4,1), rnorm(80,0,1), rnorm(80,4,1))
x = c(rnorm(80,0,1), rnorm(80,0,1), rnorm(80,4,1), rnorm(80,4,1))
y = c(rnorm(80,0,1), rnorm(80,4,1), rnorm(80,0,1), rnorm(80,4,1))
z = c(rnorm(80,0,1), rnorm(80,4,1), rnorm(80,-4,1), rnorm(80,8,1))
df = data.frame(v,w,x,y,z)
## Distance matrix
D = dist(df)
## Project to 3-dimensions
PROJ3 = cmdscale(D, 3)
## Scale the three dimensions to [0,1] interval
ScaledP3 = apply(PROJ3, 2, function(x) { (x - min(x))/(max(x)-min(x)) })
colnames(ScaledP3) = c("red", "green", "blue")
X = as.data.frame(ScaledP3)
## Convert to color map
ColorMap = do.call(rgb, X)
plot(x,y, pch=20, col=ColorMap)

Related

How to circle variable to observed (not latent) variables in dagitty plot

How would I put a circle around certaiin variables in the following plot?
library(dagitty)
g = dagitty('dag{
A [pos="-1,0.5"]
W [pos="0.893,-0.422"]
X [adjusted,pos="0,-0.5"]
Y [pos="1,0.5"]
A -> Y
X -> A
X -> W
X -> Y
}')
png("mp.png", width = 500, height = 500,res=300)
plot(g)
dev.off()
In the web based tool you can indicate eg latent or adjusted and it changes the color of the circle, but this is not quite what I am looking for, although if it were possible to get these in the plot from R that would be sufficient, although I don't really like the way the variable is next to the circle in the web based version. I really wanted to circle observed variables and not circle unobserved ones.
I wrote a function which takes the points you want to circle as input, extracts the position of said points and circles them.
library(dagitty)
g = dagitty('dag{
A [pos="-1,0.5"]
W [pos="0.893,-0.422"]
X [adjusted,pos="0,-0.5"]
Y [pos="1,0.5"]
A -> Y
X -> A
X -> W
X -> Y
}')
circle_points <- function(points_to_circle, g) {
#few regexs to extract the points and the positions from "g"
#can surely be optimized, made nicer and more robust but it works for now
fsplit <- strsplit(g[1], "\\]")[[1]]
fsplit <- fsplit[-length(fsplit)]
fsplit <- substr(fsplit, 1, nchar(fsplit)-1)
fsplit[1] <- substr(fsplit[1], 6, nchar(fsplit))
vars <- sapply(regmatches(fsplit,
regexec("\\\n(.*?)\\s*\\[", fsplit)), "[", 2)
pos <- sub(".*pos=\\\"", "", fsplit)
#build dataframe with extracted information
res_df <- data.frame(vars = vars,
posx = sapply(strsplit(pos, ","), "[",1),
posy = sapply(strsplit(pos, ","), "[",2))
df_to_circle <- res_df[res_df$vars %in% points_to_circle,]
#y-position seems to be inverted and has to be multiplied by -1
points(c(as.numeric(df_to_circle$posx)),
c(as.numeric(df_to_circle$posy) * -1),
cex = 4)
}
plot(g)
circle_points(c("A", "Y"), g)
This results in:
You can of course work with the cex parameter, adding colors etc. It seems that the positioning of the circles is a bit off-centered so maybe manipulate the x and y positions in circle_points by a slim margin.
I did not find any information in dagitty, but bnlearn package can add circle/or other shape easily. But I just noticed you only want to add circle to observed traits rather than latent variables (better mentioned this in your title). Then my code might not be what you are looking for. I still attached the code here for your reference. Alternatively, you can distinguish observed/latent traits in different color. This can be easily done using bnlearn (https://www.bnlearn.com/examples/graphviz-plot/)
library(bnlearn)
tree = model2network("[X][W|X][A|X][Y|A:X]")
graphviz.plot(tree, main = "DAG structure", shape = "circle",
layout = "circo")

Drawing a smooth implicit surface with misc3d

The misc3d package provides a great implementation of the marching cubes algorithm, allowing to plot implicit surfaces.
For example, let's plot a Dupin cyclide:
a = 0.94; mu = 0.56; c = 0.34 # cyclide parameters
f <- function(x, y, z, a, c, mu){ # implicit equation f(x,y,z)=0
b <- sqrt(a^2-c^2)
(x^2+y^2+z^2-mu^2+b^2)^2 - 4*(a*x-c*mu)^2 - 4*b^2*y^2
}
# define the "voxel"
nx <- 50; ny <- 50; nz <- 25
x <- seq(-c-mu-a, abs(mu-c)+a, length=nx)
y <- seq(-mu-a, mu+a, length=ny)
z <- seq(-mu-c, mu+c, length=nz)
g <- expand.grid(x=x, y=y, z=z)
voxel <- array(with(g, f(x,y,z,a,c,mu)), c(nx,ny,nz))
# plot the surface
library(misc3d)
surf <- computeContour3d(voxel, level=0, x=x, y=y, z=z)
drawScene.rgl(makeTriangles(surf))
Nice, except that the surface is not smooth.
The documentation of drawScene.rgl says: "Object-specific rendering features such as smoothing and material are controlled by setting in the objects." I don't know what does that mean. How to get a smooth surface?
I have a solution but not a straightforward one: this solution consists in building a mesh3d object from the output of computeContour3d, and to include the surface normals in this mesh3d.
The surface normals of an implicit surface defined by f(x,y,z)=0 are simply given by the gradient of f. It is not hard to derive the gradient for this example.
gradient <- function(xyz,a,c,mu){
x <- xyz[1]; y <- xyz[2]; z <- xyz[3]
b <- sqrt(a^2-c^2)
c(
2*(2*x)*(x^2+y^2+z^2-mu^2+b^2) - 8*a*(a*x-c*mu),
2*(2*y)*(x^2+y^2+z^2-mu^2+b^2) - 8*b^2*y,
2*(2*z)*(x^2+y^2+z^2-mu^2+b^2)
)
}
Then the normals are computed as follows:
normals <- apply(surf, 1, function(xyz){
gradient(xyz,a,c,mu)
})
Now we are ready to make the mesh3d object:
mesh <- list(vb = rbind(t(surf),1),
it = matrix(1:nrow(surf), nrow=3),
primitivetype = "triangle",
normals = rbind(-normals,1))
class(mesh) <- c("mesh3d", "shape3d")
And finally to plot it with rgl:
library(rgl)
shade3d(mesh, color="red")
Nice, the surface is smooth now.
But is there a more straightforward way to get a smooth surface, without building a mesh3d object? What do they mean in the documentation: "Object-specific rendering features such as smoothing and material are controlled by setting in the objects."?
I don't know what the documentation is suggesting. However, you can do it via a mesh object slightly more easily than you did (though the results aren't quite as nice), using the addNormals() function to calculate the normals automatically rather than by formula.
Here are the steps:
Compute the surface as you did.
Create the mesh without normals. This is basically what you did, but using tmesh3d():
mesh <- tmesh3d(t(surf), matrix(1:nrow(surf), nrow=3), homogeneous = FALSE)
Calculate which vertices are duplicates of which others:
verts <- apply(mesh$vb, 2, function(column) paste(column, collapse = " "))
firstcopy <- match(verts, verts)
Rewrite the indices to use the first copy. This is necessary, since the misc3d functions give a collection of disconnected triangles; we need to work out which are connected.
it <- as.numeric(mesh$it)
it <- firstcopy[it]
dim(it) <- dim(mesh$it)
mesh$it <- it
At this point, there are a lot of unused vertices in the mesh; if memory was a problem you might want to add a step to remove them. I'm going to skip that.
Add the normals
mesh <- addNormals(mesh)
Here are the before and after shots. Left is without normals, right is with them.
It's not quite as smooth as your solution using computed normals, but it's not always easy to find those.
There's an option smooth in the makeTriangles function:
drawScene.rgl(makeTriangles(surf, smooth=TRUE))
I think the result is equivalent to #user2554330's solution, but this is more straightforward.
EDIT
The result is highly better with the rmarchingcubes package:
library(rmarchingcubes)
contour_shape <- contour3d(
griddata = voxel, level = 0,
x = x, y = y, z = z
)
library(rgl)
tmesh <- tmesh3d(
vertices = t(contour_shape[["vertices"]]),
indices = t(contour_shape[["triangles"]]),
normals = contour_shape[["normals"]],
homogeneous = FALSE
)
open3d(windowRect = c(50, 50, 562, 562))
view3d(zoom=0.8)
shade3d(tmesh, color = "darkred")

Coordinates from distance matrix in R

Is there a package to convert a distance matrix to a set of coordinates?
I have gone throught the below question. I was hoping there would be a package for this.
Finding the coordinates of points from distance matrix
I have considered Sammons Projection for this but from what I understand, it is an optimizer and gets you an optimum solution. I think there should be an algorithm to get a unique solution for this.
Multidimensional scaling (MDS) aims to project the distance matrix of your data to a lower dimension k, where desired k = 2 in your case, while trying to preserve the distances between data points:
# Multidimensional scaling
library(MASS)
set.seed(1)
labels <- as.factor(sample(LETTERS[1:5], 100, replace=TRUE))
dat <- mvrnorm(n=100, mu = c(1:4), Sigma=matrix(1:16, ncol=4)) + as.numeric(labels)^2
#> dim(dat)
#[1] 100 4
# Euclidean distance matrix (100x100)
d <- dist(dat)
# Classical MDS for distance matrix d
# http://en.wikipedia.org/wiki/Multidimensional_scaling
mds <- cmdscale(d, k = 2)
x <- mds[,1]
y <- mds[,2]
plot(x,y, col=rainbow(5)[as.numeric(labels)], pch=16, main="MDS for object 'dat'")
legend("topright", legend=unique(labels), col=rainbow(5)[unique(as.numeric(labels))], pch=16)
Further reading: https://stats.stackexchange.com/questions/14002/whats-the-difference-between-principal-components-analysis-and-multidimensional
Look-up an algorithm called Multi-Dimensional Scaling (MDS). An implementation in R is the cmdscale function from the stats package:
Multidimensional scaling takes a set of dissimilarities and returns a set of points such that the distances between the points are approximately equal to the dissimilarities.
The documentation also has an example where a distance matrix is turned into two vectors of x and y coordinates, then plotted.

spatial distribution of points, R

What would be an easy way to generate a 3 different spatial distribution of points (N = 20 points) using R. For example, 1) random, 2) uniform, and 3) clustered on the same space (50 x 50 grid)?
1) Here's one way to get a very even spacing of 5 points in a 25 by 25 grid numbered from 1 each direction. Put points at (3,18), (8,3), (13,13), (18,23), (23,8); you should be able to generalize from there.
2) as you suggest, you could use runif ... but I'd have assumed from your question you actually wanted points on the lattice (i.e. integers), in which case you might use sample.
Are you sure you want continuous rather than discrete random variables?
3) This one is "underdetermined" - depending on how you want to define things there's a bunch of ways you might do it. e.g. if it's on a grid, you could sample points in such a way that points close to (but not exactly on) already sampled points had a much higher probability than ones further away; a similar setup works for continuous variables. Or you could generate more points than you need and eliminate the loneliest ones. Or you could start with random uniform points and them make them gravitate toward their neighbors. Or you could generate a few cluster-centers (4-10, say), and then scatter points about those centers. Or you could do any of a hundred other things.
A bit late, but the answers above do not really address the problem. Here is what you are looking for:
library(sp)
# make a grid of size 50*50
x1<-seq(1:50)-0.5
x2<-x1
grid<-expand.grid(x1,x2)
names(grid)<-c("x1","x2")
# make a grid a spatial object
coordinates(grid) <- ~x1+x2
gridded(grid) <- TRUE
First: random sampling
# random sampling
random.pt <- spsample(x = grid, n= 20, type = 'random')
Second: regular sampling
# regular sampling
regular.pt <- spsample(x = grid, n= 20, type = 'regular')
Third: clustered at a distance of 2 from a random location (can go outside the area)
# random sampling of one location
ori <- data.frame(spsample(x = grid, n= 1, type = 'random'))
# select randomly 20 distances between 0 and 2
n.point <- 20
h <- rnorm(n.point, 1:2)
# empty dataframe
dxy <- data.frame(matrix(nrow=n.point, ncol=2))
# take a random angle from the randomly selected location and make a dataframe of the new distances from the original sampling points, in a random direction
angle <- runif(n = n.point,min=0,max=2*pi)
dxy[,1]= h*sin(angle)
dxy[,2]= h*cos(angle)
cluster <- data.frame(x=rep(NA, 20), y=rep(NA, 20))
cluster$x <- ori$coords.x1 + dxy$X1
cluster$y <- ori$coords.x2 + dxy$X2
# make a spatial object and plot
coordinates(cluster)<- ~ x+y
plot(grid)
plot(cluster, add=T, col='green')
plot(random.pt, add=T, col= 'red')
plot(regular.pt, add=T, col= 'blue')

Generating multidimensional data

Does R have a package for generating random numbers in multi-dimensional space? For example, suppose I want to generate 1000 points inside a cuboid or a sphere.
I have some functions for hypercube and n-sphere selection that generate dataframes with cartesian coordinates and guarantee a uniform distribution through the hypercube or n-sphere for an arbitrary amount of dimensions :
GenerateCubiclePoints <- function(nrPoints,nrDim,center=rep(0,nrDim),l=1){
x <- matrix(runif(nrPoints*nrDim,-1,1),ncol=nrDim)
x <- as.data.frame(
t(apply(x*(l/2),1,'+',center))
)
names(x) <- make.names(seq_len(nrDim))
x
}
is in a cube/hypercube of nrDim dimensions with a center and l the length of one side.
For an n-sphere with nrDim dimensions, you can do something similar, where r is the radius :
GenerateSpherePoints <- function(nrPoints,nrDim,center=rep(0,nrDim),r=1){
#generate the polar coordinates!
x <- matrix(runif(nrPoints*nrDim,-pi,pi),ncol=nrDim)
x[,nrDim] <- x[,nrDim]/2
#recalculate them to cartesians
sin.x <- sin(x)
cos.x <- cos(x)
cos.x[,nrDim] <- 1 # see the formula for n.spheres
y <- sapply(1:nrDim, function(i){
if(i==1){
cos.x[,1]
} else {
cos.x[,i]*apply(sin.x[,1:(i-1),drop=F],1,prod)
}
})*sqrt(runif(nrPoints,0,r^2))
y <- as.data.frame(
t(apply(y,1,'+',center))
)
names(y) <- make.names(seq_len(nrDim))
y
}
in 2 dimensions, these give :
From code :
T1 <- GenerateCubiclePoints(10000,2,c(4,3),5)
T2 <- GenerateSpherePoints(10000,2,c(-5,3),2)
op <- par(mfrow=c(1,2))
plot(T1)
plot(T2)
par(op)
Also check out the copula package. This will generate data within a cube/hypercube with uniform margins, but with correlation structures that you set. The generated variables can then be transformed to represent other shapes, but still with relations other than independent.
If you want more complex shapes but are happy with uniform and idependent within the shape then you can just do rejection sampling: generate data within a cube that contains your shape, then test if the points are within your shape, reject them if not, then keep doing this until there are enough points.
A couple of years ago, I made a package called geozoo. It is available on CRAN.
install.packages("geozoo")
library(geozoo)
It has many different functions to produce objects in N-dimensions.
p = 4
n = 1000
# Cube with points on it's face.
# A 3D version would be a box with solid walls and a hollow interior.
cube.face(p)
# Hollow sphere
sphere.hollow(p, n)
# Solid cube
cube.solid.random(p, n)
cube.solid.grid(p, 10) # evenly spaced points
# Solid Sphere
sphere.solid.random(p, n)
sphere.solid.grid(p, 10) # evenly spaced points
One of my favorite ones to watch animate is a cube with points along its edges, because it was one of the first objects that I made. It also gives you a sense of distance between vertices.
# Cube with points along it's edges.
cube.dotline(4)
Also, check out the website: http://streaming.stat.iastate.edu/~dicook/geometric-data/. It contains pictures and downloadable data sets.
Hope it meets your needs!
Cuboid:
df <- data.frame(
x = runif(1000),
y = runif(1000),
z = runif(1000)
)
head(df)
x y z
1 0.7522104 0.579833314 0.7878651
2 0.2846864 0.520284731 0.8435828
3 0.2240340 0.001686003 0.2143208
4 0.4933712 0.250840233 0.4618258
5 0.6749785 0.298335804 0.4494820
6 0.7089414 0.141114804 0.3772317
Sphere:
df <- data.frame(
radius = runif(1000),
inclination = 2*pi*runif(1000),
azimuth = 2*pi*runif(1000)
)
head(df)
radius inclination azimuth
1 0.1233281 5.363530 1.747377
2 0.1872865 5.309806 4.933985
3 0.2371039 5.029894 6.160549
4 0.2438854 2.962975 2.862862
5 0.5300013 3.340892 1.647043
6 0.6972793 4.777056 2.381325
Note: edited to include code for sphere
Here is one way to do it.
Say we hope to generate a bunch of 3d points of the form y = (y_1, y_2, y_3)
Sample X from multivariate Gaussian with mean zero and covariance matrix R.
(x_1, x_2, x_3) ~ Multivariate_Gaussian(u = [0,0,0], R = [[r_11, r_12, r_13],r_21, r_22, r_23], [r_31, r_32, r_33]]
You can find a function which generates Multivariate Gaussian samples in an R package.
Take the Gaussian cdf of each covariate (phi(x_1) , phi(x_2), phi(x_3)). In this case, phi is the Gaussian cdf of our variables. Ie phi(x_1) = Pr[x <= x_1] By the probability integral transform, these (phi(x_1) , phi(x_2), phi(x_3)) = (u_1, u_2, u_3), will each be uniformly distrubted on [0,1].
Then, take the inverse cdf of each uniformly distributed marginal. In other words take the inverse cdf of u_1, u_2, u_3:
F^{-1}(u_1), F^{-2}(u_2), F^{-3}(u_3) = (y_1, y_2, y_3), where F is the marginal cdf of the distrubution you are trying to sample from.

Resources