How to reproduce this graphical explanation (a scatter plot) of how covariance works? - r

I found this graphical intuitive explanation of covariance:
32 binormal points drawn from distributions with the given covariances, ordered from most negative (bluest) to most positive (reddest)
The whole material can be found at:
https://stats.stackexchange.com/questions/18058/how-would-you-explain-covariance-to-someone-who-understands-only-the-mean
I would like to recreate this sort of graphical illustration in R, but I'm not sufficiently familiar with R's plotting tools. I don't even know where to start in order to get those colored rectangles between each pair of data points, let alone make them semi-transparent.
I think this could make a very efficient teaching tool.

The cor.rect.plot function in the TeachingDemos package makes plots similar to what is shown. You can modify the code for the function to make the plot even more similar if you desire.

Related

changing default colours when using the plot function of the R package mixtools

I have a plotting problem with curves when using mixtools
Using the following R code
require(mixtools)
x <- c(rnorm(10000,8,2),rnorm(10000,18,5))
xMix <- normalmixEM(x, lambda=NULL, mu=NULL, sigma=NULL)
plot(xMix, which = 2, nclass=25)
I get a nice histogram, with the 2 normal curves estimated from the model superimposed.
The problem is with the default colours (i.e. red and green), which I need to change for a publication to be black and grey.
One way I thought to doing this was first to produce the histogram
hist(xMix$x, freq=FALSE, nclass=25)
and then add the lines using the "curve" function.
....... but I lost my way, and couldn't solve it
I would be grateful for any pointers or the actual solution
thanks
PS. Note that there is an alternative work-around to this problem using ggplot:
Any suggestions for how I can plot mixEM type data using ggplot2
but for various reasons I need to keep using the base graphics
You can also edit the colours directly using the col2 argument in the mixtools plotting function
For example
plot(xMix, which = 2, nclass=25, col2=c("dimgrey","black"))
giving the problem a bit more thought, I managed to rephrase the problem and ask the question in a much more direct way
Using user-defined functions within "curve" function in R graphics
this delivered two nice solutions of how to use the "curve" function to draw normal distributions produced by the mixture modelling.
the overall answer therefore is to use the "hist" function to draw a histogram of the raw data, then the "curve" function (incorporating the sdnorm function) to draw each normal distribution. This gives total control of the colours (and potentially any other graphic parameter).
And not to forget - this is where I got the code for the sdnorm function - and other useful insights
Any suggestions for how I can plot mixEM type data using ggplot2
Thanks as always to StackOverflow and the contributors who provide such helpful advice.

Why is there no col key for R's rgl?

I would like to draw $3$ dimensional scatter plots, or more precisely I have a program that gives me the mass distribution in the unit cube with respect to a 3 dimensional equidistant grid. You can interpret this as a continuous relaxation of a $3$ dimensional assignment problem if you want.
Anyway this is just to give you a very brief background since my actual problem is not really concerned with the maths behind the procedure but with the visualization. I have:
$n$ points in the unit cube $[0,1]^3$
each of the $n$ points is assigned a "weight" between $0$ and $\frac1n$ (typically a lot of the weights coincide, if there are too many different values, i use the cut command to reduce the range to, say $60$ different values)
And I'd like to plot the $n$ points in a color which corresponds to their weight.
Now I found the rgl Package in R which allows me to do exactly that and also provides a very nice interactive plot window but it doesn't seem to allow a "col key" parameter, i.e. I cannot add a continuous color legend to my plot.
On the other hand the package plot3D provides a function to do a $3$ dimensional scatterplot and easily allows me to add the col key. However plot3D does not work with interactive plots but merely gives me the option to specify the angle at which I want to look at the cube. In a $3$D setting I strongly prefer the interactive alternative.
Now is there a way to automatically add a continuous color legend to an rgl plot? If not, do you know why this hasn't been implemented? Or would you solve my problem completely different altogether?
P.S. sorry for the formatting, I'm new to SO and the math environment "$" doesn't seem to work here.
The reason this hasn't been implemented is because until fairly recently it wasn't easy to have a static legend and a dynamic plot in the same window.
Now it's easy; there's a legend3d() function that might do what you want, but I think you probably want a different sort of legend than it will draw. If you know how to draw what you want in 2D, you can use the bgplot3d() function to put it in the background of your plot.
Both of those options give bitmapped legends. It would also be possible to do vector-based legends, but that would be quite a bit more work.

making an interrupted plot with ggplot2

I am trying to make the following plot (which I made in R) using ggplot2. How do I go about doing this? (The plot has a finer resolution from 0 to 10, but coarser resolution from 10 onwards.) know how to do this in R, but I am not sure how to proceed with this in the case of ggplot2. Here is the figure obtained using base R.
To make things useful, note that i don't have that much interest in hanging on to the right-hand y-axis.
Also, I would like to do a similar task on a pair of barplots (have the same scale, and only one y-axis. I am hoping that that approach is similar.
I would also be open to other suggestions that would make a similar plot conveying the information similarly or better.
Many thanks for suggestions!

Polygon/contour around subset of vertices on graph (more precise than mark.groups in igraph)

Problem definition
I need to produce a number of specific graphs, and on these graphs, highlight subsets of vertices (nodes) by drawing a contour/polygon/range around or over them (see image below).
A graph may have multiple of these contours/ranges, and they may overlap, iff one or more vertices belong to multiple subsets.
Given a graph of N vertices, any subset may be of size 1..N.
However, vertices not belonging to a subset must not be inside the contour (as that would be misleading, so that's priority no. 1). This is gist of my problem.
All these graphs happen to have the property that the ranges are continuous, as the data they represent covers only directly connected subsets of vertices.
All graphs will be undirected and connected (no unconnected vertices will ever be plotted).
Reproducible attempts
I am using R and the igraph package. I have already tried some solutions, but none of them work well enough.
First attempt, mark.groups in plot.igraph:
library(igraph)
g = make_graph("Frucht")
l = layout.reingold.tilford(g,1)
plot(g, layout=l, mark.groups = c(1,3,6,12,5), mark.shape=1)
# bad, vertex 11 should not be inside the contour
plot(g, layout=l, mark.groups = c(1,6,12,5,11), mark.shape=1)
# 3 should not be in; image below
# just choosing another layout here is not a generalizable solution
The plot.igraph calls igraph.polygon, which calls convex_hull (also igraph), which calls xspline. The results is, from what I understand, something called a convex hull (which otherwise looks very nice!), but for my purposes that is not precise enough, covering vertices that should not be covered.
Second attempt with contour. So I tried implementing my own version, based on the solution suggested here:
library(MASS)
xx <- runif(5, 0, 1);yy <- abs(xx)+rnorm(5,0,0.2)
plot(xx,yy, xlim=c( min(xx)-sd(xx),max(xx)+sd(xx)), ylim =c( min(yy)-sd(yy), max(yy)+sd(yy)))
dens2 <- kde2d(xx, yy, lims=c(min(xx)-sd(xx), max(xx)+sd(xx), min(yy)- sd(yy), max(yy)+sd(yy) ),h=c(bandwidth.nrd(xx)/1.5, bandwidth.nrd(xx)/ 1.5), n=50 )
contour(dens2, level=0.001, col="red", add=TRUE, drawlabels=F)
The contour plot looks in principle like something I could use, given enough tweaking of the bandwidth and level values (to make the contour snug enough so it doesn't cover any points outside the group). However, this solution has the drawback that when the level value is too small, the contour breaks (doesn't produce a continuous area) - so if I would go that way, controlling for continuity (and determining good bandwidth/level values on the fly) automatically should be implemented. Another problem is, I cannot quite see how could I plot the contour over the plots produced by igraph: the layout.* commands produce what looks like a coordinate matrix, but the coordinates do not match the axis coordinates on the plot:
# compare:
layout.reingold.tilford(g,1)
plot(g, layout=l, axes=T)
The question:
What would be a better way to achieve the plotting of such ranges on graphs (ideally igraphs) in R that would meet the criteria outlined above - ranges that include only the vertices that belong to their subset and exclude all else - while being continous ranges?
The solution I am looking for should be scalable to graphs of different sizes and layouts that I may need to create (so hand-tweaking each graph by hand using e.g. tkplot is not a good solution). I am aware that on some graphs with some vertex groups, meeting both the criteria will indeed be impossible in practise, but intuitively it should be possible to implement something that still works most of the time with smallish (10..20 vertices) and not-too-complex graphs (ideally it would be possible to detect and give a warning if a perfectly fitting range could not be plotted). Either an improvement of the mark.groups approach (not necessarily within the package, but using the hull-idea mentioned above), or something with contour or a similar suitable function, or suggesting something else entirely would be welcome, as long as it works (most of the time).
Update stemming from the discussion: a solution that only utilizes functions of core R or CRAN packages (not external software) is desirable, since I will eventually want to incorporate this functionality in a package.
Edit: specified the last paragraph as per the comments.
The comment area is not long enough to fit my answer there, so I'm putting this here, although I'd rather post it as a comment as it is not a full solution.
Quite a long throw, but the first thing that popped into my mind is support vector machines. The idea would be that you construct a support vector machine classifier that classifies your points into two groups (in or out) based on the coordinates of the vertices, using some non-linear kernel function (I would try the radial basis function). Then you plot the separating hyperplane of the trained support vector machine. One drawback is that the area that you obtain this way might be unbounded (i.e. go to infinity in some directions), so this idea definitely requires some further thinking, but at least that's one possible direction to go.

How to avoid overplotting (for points) using base-graph?

I am in my way of finishing the graphs for a paper and decided (after a discussion on stats.stackoverflow), in order to transmit as much information as possible, to create the following graph that present both in the foreground the means and in the background the raw data:
However, one problem remains and that is overplotting. For example, the marked point looks like it reflects one data point, but in fact 5 data points exists with the same value at that place.
Therefore, I would like to know if there is a way to deal with overplotting in base graph using points as the function.
It would be ideal if e.g., the respective points get darker, or thicker or,...
Manually doing it is not an option (too many graphs and points like this). Furthermore, ggplot2 is also not what I want to learn to deal with this single problem (one reason is that I tend to like dual-axes what is not supprted in ggplot2).
Update: I wrote a function which automatically creates the above graphs and avoids overplotting by adding vertical or horizontal jitter (or both): check it out!
This function is now available as raw.means.plot and raw.means.plot2 in the plotrix package (on CRAN).
Standard approach is to add some noise to the data before plotting. R has a function jitter() which does exactly that. You could use it to add the necessary noise to the coordinates in your plot. eg:
X <- rep(1:10,10)
Z <- as.factor(sample(letters[1:10],100,replace=T))
plot(jitter(as.numeric(Z),factor=0.2),X,xaxt="n")
axis(1,at=1:10,labels=levels(Z))
Besides jittering, another good approach is alpha blending which you can obtain (on the graphics devices supporing it) as the fourth color parameter. I provided an example for 'overplotting' of two histograms in this SO question.
One additional idea for the general problem of showing the number of points is using a rug plot (rug function), this places small tick marks along the margin that can show how many points contribute (still use jittering or alpha blending for ties). This allows the actual points to show their true rather than jittered values, but the rug can then indicate which parts of the plot have more values.
For the example plot direct jittering or alpha blending is probably best, but in some other cases the rug plot can be useful.
You may also use sunflowerplot, while it would be hard to implement it here. I would use alpha-blending, as Dirk suggested.

Resources