Scanning and storing a simple image in a complex matrix - r

I have been playing with linear algebra transformations in R, moving around a bunch of points plotted in the complex plane. I have posted the results here - the code is linked on the first sentence.
I would like to do the same operations on a real image. Evidently I don't want to get into Fourier transforming the image, or dealing with color or grayscale. I would like to get any old jpeg, turn it into a summarized plot of black and white dots, locate each dot in terms of its position in the complex plane, and then apply the linear algebra operations as I did to my drawing of a house.
The questions are: 1. What is the name for the type of stripped-down, basic black and white image that I am describing? 2. How can I turn a regular jpeg (or other file) into that type of image? 3. How can I then store every dot of the thousands of dots the image will contain in a matrix of complex numbers?
Is there software to do this? Is there code in R or python to do it?

It's not clear what you're trying to do with those complex vectors that wouldn't be more easily done with standard x, y coordinates, but here is a possible starting point:
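library(jpeg)
# Read the image as a height x width x channel array with values in [0, 1]
im <- readJPEG(system.file("img", "Rlogo.jpg", package="jpeg"))
# Average the channels to get a grayscale matrix
gr <- apply(im, 1:2, mean)
# (row, col) indices of the dark pixels
bw <- which(gr < 0.5, arr.ind = TRUE)
# Rescale pixel indices into xlim/ylim and return them as a complex vector.
# Columns map to x; rows map to y, flipped so the top of the image stays on top.
conjure_matrix_of_darkness <- function(bw, xlim=c(-2, 2), ylim=c(-2,2)){
  x <- (bw[,2] - min(bw[,2]))/diff(range(bw[,2])) * diff(xlim) + min(xlim)
  y <- (max(bw[,1]) - bw[,1])/diff(range(bw[,1])) * diff(ylim) + min(ylim)
  x+1i*y
}
test <- conjure_matrix_of_darkness(bw)
par(mfrow=c(2,1), mar=c(0,0,0,0))
plot(test, pch=19, xaxt="n", yaxt="n")
# Multiplying by exp(1i*pi) rotates every point by 180 degrees
plot(test*exp(1i*pi), pch=19, xaxt="n", yaxt="n")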
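To try the same idea without a JPEG file, here is a minimal sketch on a synthetic grayscale matrix. The names gray, to_complex and apply_linear are made up for illustration and are not part of the answer above. The assumption is that a 2 x 2 real matrix acts on the point x + iy by multiplying the coordinate pair (x, y).

```r
# Synthetic 5 x 5 grayscale "image": 0 = black, 1 = white (stand-in for a real file).
gray <- matrix(1, nrow = 5, ncol = 5)
gray[2:4, 3] <- 0        # a short vertical black bar

# Dark pixels -> complex numbers; column index is x, row index counts down from the top.
to_complex <- function(gray, threshold = 0.5) {
  idx <- which(gray < threshold, arr.ind = TRUE)
  complex(real = idx[, "col"], imaginary = nrow(gray) - idx[, "row"] + 1)
}

# Apply a 2 x 2 real matrix A to each point, treating x + iy as the vector (x, y).
apply_linear <- function(z, A) {
  xy <- A %*% rbind(Re(z), Im(z))
  complex(real = xy[1, ], imaginary = xy[2, ])
}

z <- to_complex(gray)
rot90 <- matrix(c(0, 1, -1, 0), nrow = 2)  # counter-clockwise quarter turn
z_rot <- apply_linear(z, rot90)
```

For this particular matrix the result should agree with z * 1i, which is a quick sanity check that the matrix route and the complex-multiplication route describe the same rotation. plot(z_rot) draws the transformed points, since plot() on a complex vector uses the real part for x and the imaginary part for y.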

Related

Is it possible to add marks to a 3D point pattern using SpatStat in Rstudio?

I am using spatstat software to analyse the spatial relationship between cells. I have used this in the past for 2D and it was fine. I am now using it for 3D analysis from confocal images.
My problem is that I am unable to plot the resultant point pattern with different marks. I am able to assign the marks (cell type) to each point and plot the 3D point pattern. However the marks do not appear on the plot as different characters.
Series6 <-pp3(Series6[,5], Series6[,6],Series6[,7],box3(c(0,775),c(0,775),c(0,30)), marks = Series6[,8])
The resultant plot: 3D Plot
I have tried it with 2D point patterns and it works fine displaying a different character for each mark. This makes me think that the marks feature isn't compatible with 3D point patterns. I would be grateful if anyone had any tips.
A three dimensional point pattern (class pp3) can have marks. Your code has successfully assigned marks to the points. You could check this by typing head(as.data.frame(Series6)) which would show the coordinates and marks of the first few points.
The problem is that the plot method, plot.pp3, does not display the marks. This feature is not yet implemented.
I will take this question as a feature request for spatstat.
In the meantime, you can get the desired effect by splitting the point pattern into subsets with different marks (assuming the marks are categorical) and plotting each subset in turn. Here is a function to do that:
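plotbymark <- function(X, cols=NULL, chars=NULL, main="") {
  require(spatstat)
  stopifnot(is.pp3(X))
  stopifnot(is.multitype(X))
  # One sub-pattern per mark level, with the marks dropped
  Y <- split(X, un=TRUE)
  m <- length(Y)
  mm <- seq_len(m)
  # Default to colours 1..m and plotting characters 1..m
  if(is.null(cols)) cols <- mm
  if(is.null(chars)) chars <- mm
  # Draw the first subset, then add the others to the same plot
  for(i in mm) {
    plot(Y[[i]], col=cols[i], pch=chars[i], add=(i > 1), main=main)
  }
  # Return the legend information invisibly
  explain <- data.frame(type=names(Y), col=cols, pch=chars)
  return(invisible(explain))
}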
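One thing to watch: the function stops unless is.multitype(X) is true, which in practice means the marks should be a factor. The sketch below uses synthetic coordinates and hypothetical cell-type labels (not the Series6 data) to show the difference between numeric marks and factor marks. It assumes the spatstat package is installed.

```r
library(spatstat)  # assumed to be installed

set.seed(1)
n <- 30
xyz <- data.frame(x = runif(n, 0, 775), y = runif(n, 0, 775), z = runif(n, 0, 30))
cell_type <- sample(c("A", "B", "C"), n, replace = TRUE)   # hypothetical labels
box <- box3(c(0, 775), c(0, 775), c(0, 30))

# Numeric marks are stored, but the pattern is not treated as multitype.
X_num <- pp3(xyz$x, xyz$y, xyz$z, box, marks = as.numeric(factor(cell_type)))

# Factor marks make the pattern categorical (multitype).
X <- pp3(xyz$x, xyz$y, xyz$z, box, marks = factor(cell_type))

is.multitype(X_num)
is.multitype(X)
table(marks(X))    # number of points per cell type
```

If the marks column in your own data is numeric or character, wrapping it in factor() before calling pp3 should be enough for plotbymark to accept the pattern.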

How do I make planes in RGL thicker?

I will try 3D printing data to make some nice visual illustration for a binary classification example.
Here is my 3D plot:
require(rgl)
#Get example data from mtcars and normalize to range 0:1
fun_norm <- function(k){(k-min(k))/(max(k)-min(k))}
x_norm <- fun_norm(mtcars$drat)
y_norm <- fun_norm(mtcars$mpg)
z_norm <- fun_norm(mtcars$qsec)
#Plot nice big spheres with rgl that I hope will look good after 3D printing
plot3d(x_norm, y_norm, z_norm, type="s", radius = 0.02, aspect = T)
#The sticks are meant to suspend the spheres in the air
plot3d(x_norm, y_norm, z_norm, type="h", lwd = 5, aspect = T, add = T)
#Nice thick gridline that will also be printed
grid3d(c("x","y","z"), lwd = 5)
Next, I wanted to add a z=0 plane, inspired by this blog post here describing the r2stl package written by Ian Walker. It is supposed to be the foundation of the printed structure that holds everything together.
planes3d(a=0, b=0, c=1, d=0)
However, it has no volume, it is a thin slab with height=0. I want it to form a solid base for the printed structure, which is meant to keep everything together (check out the aforementioned blog for more details, his examples are great). How do I increase the thickness of my z=0 plane to achieve the same effect?
Here is the final step to exporting as STL:
writeSTL("test.stl")
One can view the final product really nicely using the open source Meshlab as recommended by Ian in the blog.
Additional remark: I noticed that the thin plane is also separate from the grids that I added on the -z face of the cube and is floating. This might also cause a problem when printing. How can I merge the grids with the z=0 plane? (I will be sending the STL file to a friend who will print for me, I want to make things as easy for him as possible)
You can't make a plane thicker. You can make a solid shape (extrude3d() is the function to use). It won't adapt itself to the bounding box the way a plane does, so you would need to draw it last.
For example,
example(plot3d)
bbox <- par3d("bbox")
slab <- translate3d(extrude3d(bbox[c(1,2,2,1)], bbox[c(3,3,4,4)], 0.5),
0,0, bbox[5])
shade3d(slab, col = "gray")
produces this output:
This still isn't printable (the points have no support), but it should get you started.
In the matlib package, there's a function regvec3d() that draws a vector space representation of a 2-predictor multiple regression model. The plot method for the result of the function has an argument show.base that draws the base x1-x2 plane if show.base > 0, and draws it thicker if show.base > 1.
It is a simple hack that just draws a second version of the plane at a small offset. Maybe this will be enough for your application.
if (show.base > 0) planes3d(0, 0, 1, 0, color=col.plane, alpha=0.2)
if (show.base > 1) planes3d(0, 0, 1, -.01, color=col.plane, alpha=0.1)

Visualizing big-data xy regression plots in R (maybe contour histograms?)

I have 1 million x-y data points. 100,000 of them are from foo; 900,000 of them are from bar. And perhaps a few unusual mass points. Let me help my audience visualize them, and not merely the regression or loess lines but the data. Let me draw bars in red, and foos in blue, and then my two loess lines on top of them. think something like
K <- 1000 ; M <- K*K ; HT <- 100*K
x <- rnorm(M); y <- x+rnorm(M); y[1:HT] <- y[1:HT]+1 ; x[HT:(HT*2)] <- y[HT:(HT*2)] <- 0
pdf(file="try.pdf")
plot( x, y, col="blue", pch=".")
points( x[1:HT], y[1:HT], col="red", pch="." )
## scatter.smooth( x[1:HT], y[1:HT] ), but this seems to take forever
dev.off()
this is not only not a great visual (for example, the high-elevation zero point is lost), but also creates a 7.5MB(!) pdf file. my previewer almost chokes on it, too. (hint: jpeg compression is pretty good for the problem. that is, instead of the pdf(), just use jpeg and a different file extension. drawback: the axes become fuzzily compressed, too.)
so, I need some better ideas. I am thinking two-dimensional filled.contourplot on the full data set (in a gray-scale reaching not too far towards black), with a plain contour overlay of the 1:HT points, and then two loess overlays. alas, even to do this, I need to start off smoothing the number of data points that appear at an x-y location, and presumably binning-first is not the best way to do this---it would throw away information, which the contour plot could use.
alternatively, I could stay with the standard xy plot, and simply cull random points until the file is small enough and the visuals good enough. this could be done perhaps better via binning, too.
better ideas?
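To prototype the binning idea mentioned above, here is a minimal base R sketch that counts points on a regular grid and draws the counts as a gray-scale image. The function name bin2d and the grid size are arbitrary choices for illustration, and the sample is smaller than the 1 million points in the question so that it runs quickly.

```r
set.seed(42)
n <- 1e5                          # smaller than the question's 1e6 to keep the sketch quick
x <- rnorm(n)
y <- x + rnorm(n)

# Count points per cell on a regular nx-by-ny grid covering the data range.
bin2d <- function(x, y, nx = 100, ny = 100) {
  xb <- seq(min(x), max(x), length.out = nx + 1)   # cell boundaries in x
  yb <- seq(min(y), max(y), length.out = ny + 1)   # cell boundaries in y
  ix <- findInterval(x, xb, all.inside = TRUE)     # cell index 1..nx
  iy <- findInterval(y, yb, all.inside = TRUE)     # cell index 1..ny
  tab <- table(factor(ix, levels = seq_len(nx)), factor(iy, levels = seq_len(ny)))
  list(x = (xb[-1] + xb[-(nx + 1)]) / 2,           # cell midpoints
       y = (yb[-1] + yb[-(ny + 1)]) / 2,
       counts = matrix(as.integer(tab), nx, ny))
}

b <- bin2d(x, y)
# log1p keeps sparse cells visible next to dense ones; darker means more points.
image(b$x, b$y, log1p(b$counts), col = rev(gray.colors(64)), xlab = "x", ylab = "y")
```

The plotted object is a 100 x 100 matrix rather than a million points, so the output file size no longer depends on the sample size. Loess lines or a second group's contours can be added on top with lines() or contour(..., add = TRUE).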

R, rgl, plotting points and ellipses

I am using R to visualize some data. I have found RGL to be a great library for plotting points.
points3d(x,y,z)
where x = c(x1,x2, ...), y = c(y1,y2,...), z = c(z1,z2, ...) and x,y,z have the same length, is a great function for plotting large sets of data.
Now, I would like to plot ellipses, mixed in with the data. I have a characterization of ellipses by a center point C, a vector describing the major axis U, and a vector describing the minor axis V. I obtain points P on the boundary of the ellipse by
P = C + U*cos(t) + V*sin(t) (t ranges between 0 and 2*pi)
obtaining vectors, xt, yt, and zt. Then I can plot the ellipse with
polygon3d(xt,yt,zt)
It works fine, but I'm guessing everyone reading is cringing, and will tell me that this is a bad way to do this. Indeed it takes a couple seconds to render each ellipse this way.
I don't think the ellipse3d function from the RGL package works here; at the very least, I am not working a matrix of covariances, nor do I understand how to get the ellipse I want from this function. Also, it returns an ellipsoid, not an ellipse.
****** EDIT ************
For a concrete example that takes a while:
library(rgl)
open3d()
td <- c(0:359)
t <- td*pi/180
plotEllipseFromVector <- function(c,u,v){
  xt <- c[1] + u[1]*cos(t) + v[1]*sin(t)
  yt <- c[2] + u[2]*cos(t) + v[2]*sin(t)
  zt <- c[3] + u[3]*cos(t) + v[3]*sin(t)
  polygon3d(xt,yt,zt)
}
Call it with the center point and the major and minor axis vectors you want. It takes just over 2 seconds for me.
On the other hand, if I change t to be 0,20,40,... 340, then it works quite fast.
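Since the rendering time seems to depend on the number of boundary points (360 versus 18 above), one option is to make that number a parameter. The sketch below only computes the points, in base R. The function name ellipse_points and its arguments are made up for illustration.

```r
# Points on the ellipse C + U*cos(t) + V*sin(t), returned as an n x 3 matrix.
ellipse_points <- function(center, u, v, n = 72) {
  t <- seq(0, 2 * pi, length.out = n + 1)[-(n + 1)]   # n angles, endpoint dropped
  outer(cos(t), u) + outer(sin(t), v) + matrix(center, n, 3, byrow = TRUE)
}

P <- ellipse_points(center = c(1, 2, 3), u = c(2, 0, 0), v = c(0, 1, 0), n = 18)

# With rgl loaded, the outline could then be drawn with something like
#   lines3d(rbind(P, P[1, ]))
# instead of filling the polygon. That is a suggestion, not something timed here.
```

Each row of P is one boundary point, so the columns can also be passed to polygon3d as xt, yt and zt if a filled ellipse is still wanted.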

What techniques exist in R to visualize a "distance matrix"?

I wish to present a distance matrix in an article I am writing, and I am looking for good visualization for it.
So far I have come across balloon plots (I used them here, but I don't think they will work in this case), heatmaps (here is a nice example, but they don't allow presenting the numbers in the table; correct me if I am wrong. Maybe half the table in colors and half with numbers would be cool) and lastly correlation ellipse plots (here is some code and an example; it is cool to use a shape, but I am not sure how to use it here).
There are also various clustering methods but they will aggregate the data (which is not what I want) while what I want is to present all of the data.
Example data:
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
dist(nba[1:20, -1], )
I am open for ideas.
You could also use force-directed graph drawing algorithms to visualize a distance matrix, e.g.
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
dist_m <- as.matrix(dist(nba[1:20, -1]))
dist_mi <- 1/dist_m # one over, as qgraph takes similarity matrices as input
library(qgraph)
jpeg('example_forcedraw.jpg', width=1000, height=1000, unit='px')
qgraph(dist_mi, layout='spring', vsize=3)
dev.off()
Tal, this is a quick way to overlay text on a heatmap. Note that this relies on image() rather than heatmap(), as the latter offsets the plot, making it more difficult to put text in the correct position.
To be honest, I think this graph shows too much information, making it a bit difficult to read... you may want to write only specific values.
Another quick option is to save your graph as a pdf, import it into Inkscape (or similar software) and manually add the text where needed.
Hope this helps
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
dst <- dist(nba[1:20, -1],)
dst <- data.matrix(dst)
dim <- ncol(dst)
image(1:dim, 1:dim, dst, axes = FALSE, xlab="", ylab="")
axis(1, 1:dim, nba[1:20,1], cex.axis = 0.5, las=3)
axis(2, 1:dim, nba[1:20,1], cex.axis = 0.5, las=1)
text(expand.grid(1:dim, 1:dim), sprintf("%0.1f", dst), cex=0.6)
A Voronoi Diagram (a plot of a Voronoi Decomposition) is one way to visually represent a Distance Matrix (DM).
They are also simple to create and plot using R--you can do both in a single line of R code.
If you're not familiar with this aspect of computational geometry, the relationship between the two (VD & DM) is straightforward, though a brief summary might be helpful.
Distance Matrices--i.e., a 2D matrix showing the distance between a point and every other point, are an intermediate output during kNN computation (i.e., k-nearest neighbor, a machine learning algorithm which predicts the value of a given data point based on the weighted average value of its 'k' closest neighbors, distance-wise, where 'k' is some integer, usually between 3 and 5.)
kNN is conceptually very simple--each data point in your training set is in essence a 'position' in some n-dimensional space, so the next step is to calculate the distance between each point and every other point using some distance metric (e.g., Euclidean, Manhattan, etc.). While the training step--i.e., constructing the distance matrix--is straightforward, using it to predict the value of new data points is practically encumbered by the data retrieval--finding the closest 3 or 4 points from among several thousand or several million scattered in n-dimensional space.
Two data structures are commonly used to address that problem: kd-trees and Voronoi decompositions (aka "Dirichlet tessellation").
A Voronoi decomposition (VD) is uniquely determined by a distance matrix--i.e., there's a 1:1 map; so indeed it is a visual representation of the distance matrix, although again, that's not their purpose--their primary purpose is the efficient storage of the data used for kNN-based prediction.
Beyond that, whether it's a good idea to represent a distance matrix this way probably depends most of all on your audience. To most, the relationship between a VD and the antecedent distance matrix will not be intuitive. But that doesn't make it incorrect--if someone without any statistics training wanted to know if two populations had similar probability distributions and you showed them a Q-Q plot, they would probably think you haven't engaged their question. So for those who know what they are looking at, a VD is a compact, complete, and accurate representation of a DM.
So how do you make one?
A Voronoi decomp is constructed by selecting (usually at random) a subset of points from within the training set (this number varies by circumstances, but if we had 1,000,000 points, then 100 is a reasonable number for this subset). These 100 data points are the Voronoi centers ("VC").
The basic idea behind a Voronoi decomp is that rather than having to sift through the 1,000,000 data points to find the nearest neighbors, you only have to look at these 100, then once you find the closest VC, your search for the actual nearest neighbors is restricted to just the points within that Voronoi cell. Next, for each data point in the training set, calculate the VC it is closest to. Finally, for each VC and its associated points, calculate the convex hull--conceptually, just the outer boundary formed by that VC's assigned points that are farthest from the VC. This convex hull around the Voronoi center forms a "Voronoi cell." A complete VD results from applying those three steps to each VC in your training set. This will give you a perfect tessellation of the surface (see the diagram below).
To calculate a VD in R, use the tripack package. The key function is 'voronoi.mosaic' to which you just pass in the x and y coordinates separately--the raw data, not the DM--then you can just pass voronoi.mosaic to 'plot'.
library(tripack)
plot(voronoi.mosaic(runif(100), runif(100), duplicate="remove"))
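The "assign each point to its closest Voronoi center" step described above can be sketched in base R as follows. The data are synthetic and the variable names (pts, centers, cell) are only illustrative. This shows the assignment step alone, not the tessellation that tripack computes.

```r
set.seed(7)
pts <- cbind(x = runif(1000), y = runif(1000))   # synthetic "training" points
idx <- sample(nrow(pts), 10)                     # random subset used as centers
centers <- pts[idx, ]

# Squared Euclidean distance from every point (rows) to every center (columns).
d2 <- outer(pts[, "x"], centers[, "x"], "-")^2 +
      outer(pts[, "y"], centers[, "y"], "-")^2

# Index (1..10) of the closest center for each point.
cell <- apply(d2, 1, which.min)
table(cell)                                      # number of points per cell
```

A nearest-neighbor query for a new point would first find its closest center in the same way and then search only among the points that share that cell value.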
You may want to consider looking at a 2-d projection of your matrix (Multidimensional Scaling). Here is a link to how to do it in R.
Otherwise, I think you are on the right track with heatmaps. You can add in your numbers without too much difficulty. For example, building off of Learn R:
library(ggplot2)
library(plyr)
library(arm)
library(reshape2)
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
nba$Name <- with(nba, reorder(Name, PTS))
nba.m <- melt(nba)
nba.m <- ddply(nba.m, .(variable), transform, rescale = rescale(value))
(p <- ggplot(nba.m, aes(variable, Name)) +
  geom_tile(aes(fill = rescale), colour = "white") +
  scale_fill_gradient(low = "white", high = "steelblue") +
  geom_text(aes(label = round(rescale, 1))))
A dendrogram based on a hierarchical cluster analysis can be useful:
http://www.statmethods.net/advstats/cluster.html
A 2-D or 3-D multidimensional scaling analysis in R:
http://www.statmethods.net/advstats/mds.html
If you want to go into 3+ dimensions, you might want to explore ggobi / rggobi:
http://www.ggobi.org/rggobi/
In the book "Numerical Ecology with R" by Borcard et al. (2011), the authors use a function called coldiss.R.
You can find it here: http://ichthyology.usm.edu/courses/multivariate/coldiss.R
It color-codes the distances and even orders the records by dissimilarity.
Another good option would be the seriation package.
Reference:
Borcard, D., Gillet, F. & Legendre, P. (2011) Numerical Ecology with R. Springer.
A solution using Multidimensional Scaling
data = read.csv("http://datasets.flowingdata.com/ppg2008.csv", sep = ",")
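# Gram matrix of the rows, then squared Euclidean distances via
# d_ij^2 = g_ii + g_jj - 2*g_ij (the code assumes the data has 50 rows)
dst = tcrossprod(as.matrix(data[,-1]))
dst = matrix(rep(diag(dst), 50L), ncol = 50L, byrow = TRUE) +
  matrix(rep(diag(dst), 50L), ncol = 50L, byrow = FALSE) - 2*dst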
library(MASS)
mds = isoMDS(dst)
#remove {type = "n"} to see dots
plot(mds$points, type = "n", pch = 20, cex = 3, col = adjustcolor("black", alpha = 0.3), xlab = "X", ylab = "Y")
text(mds$points, labels = rownames(data), cex = 0.75)
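Note that the Gram-matrix formula above yields squared Euclidean distances, which keeps the ordering of the distances but not their values. The sketch below checks that identity against dist() and then runs classical MDS with cmdscale() on the unsquared distances. It uses a few rows of the built-in mtcars data purely for illustration, so it does not need the CSV download.

```r
# Built-in data, chosen only so the example runs offline.
X <- as.matrix(mtcars[1:10, c("mpg", "disp", "hp", "wt")])

# Gram-matrix identity: d_ij^2 = g_ii + g_jj - 2 * g_ij
G <- tcrossprod(X)
sq <- outer(diag(G), diag(G), "+") - 2 * G

D <- as.matrix(dist(X))            # Euclidean distances from stats::dist
max(abs(sq - D^2))                 # should be close to zero, up to rounding

# Classical (metric) MDS on the distances themselves.
fit <- cmdscale(dist(X), k = 2)
plot(fit, type = "n", xlab = "Dim 1", ylab = "Dim 2")
text(fit, labels = rownames(X), cex = 0.75)
```

Using outer() and dist() also avoids hard-coding the number of rows, so the same lines work for any numeric matrix.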

Resources