Varying stat_binhex() size in ggplot2

I'm trying to use stat_binhex() in ggplot2 to draw hex tiles on a plot. The automatic settings vary the color of the bins depending on count. That is, all the hexes are the same size, but have different colors.
I want to vary the size of the hex symbol itself, so that some are bigger than others, and I also want to vary color based on a third variable. I read through the documentation of ggplot2 and couldn't find any way to do this. The *hexbin* package has an option like this (lattice) but its plot() functions are maddening, so I was hoping to stay in ggplot2. Any other suggestions would be extremely helpful as well.
If you know Kirk Goldsberry's NBA shot charts on Grantland, that's very similar to what I'd like to accomplish with my dataset.
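One possible workaround, offered as a sketch rather than a built-in ggplot2 feature: bin the points with the hexbin package, then draw each cell yourself with geom_polygon(), shrinking each hexagon according to its count and filling it by the per-cell mean of a third variable. The data frame `d`, its column `z`, the choice of 15 bins, and the square-root scaling (so that area tracks count) are all illustrative assumptions, not anything taken from the question.

```r
library(hexbin)
library(ggplot2)

set.seed(1)
n <- 2000
d <- data.frame(x = rnorm(n), y = rnorm(n))
d$z <- d$x + rnorm(n, sd = 0.5)          # hypothetical third variable

# Bin the points; IDs = TRUE records which cell each point fell into.
hb <- hexbin(d$x, d$y, xbins = 15, IDs = TRUE)
centers <- hcell2xy(hb)

cells <- data.frame(
  id    = hb@cell,
  cx    = centers$x,
  cy    = centers$y,
  count = hb@count,
  # per-cell mean of z, matched to cells by cell id
  zmean = as.numeric(tapply(d$z, hb@cID, mean)[as.character(hb@cell)])
)

# Vertex offsets of a full-size hexagon, in data units.
sx <- hb@xbins / diff(hb@xbnds)
sy <- (hb@xbins * hb@shape) / diff(hb@ybnds)
dx <- 0.5 / sx
dy <- 1 / (2 * sqrt(3) * sy)
off_x <- dx * c(1, 1, 0, -1, -1, 0)
off_y <- dy * c(-1, 1, 2, 1, -1, -2)

# Scale factor per cell: the fullest cell gets a full-size hexagon.
cells$scale <- sqrt(cells$count / max(cells$count))

# Six vertices per cell, shrunk towards the cell centre.
hex_df <- do.call(rbind, lapply(seq_len(nrow(cells)), function(i) {
  data.frame(
    id    = cells$id[i],
    x     = cells$cx[i] + cells$scale[i] * off_x,
    y     = cells$cy[i] + cells$scale[i] * off_y,
    zmean = cells$zmean[i]
  )
}))

p <- ggplot(hex_df, aes(x, y, group = id, fill = zmean)) +
  geom_polygon()
```

Printing `p` draws the chart. Whether the hexagons look regular depends on the aspect ratio of the plotting area, so that may need adjusting for a real dataset.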

Related

Dealing with Large Legends in R Plots - Complicated Heatmap Example

I'm working on a really complicated heatmap figure in R. heatmap.2 from the gplots package is not enough for me, because I want multiple sidebar annotations, such as the heatmap.3 function permits: https://www.biostars.org/p/18211/
Here's my specific plot so far:
When I have multiple sidebar annotations on this heatmap, though, they need to be properly labeled with a legend. The legend quickly becomes unwieldy and starts bleeding off the plot or into other labels on the plot, depending on where I choose to place the legend.
I've tried using the ncol option when placing the legend at the bottom of the plot, but the legend contains information about several factors' worth of metadata, and I want a separate column in the legend for each sidebar's metadata. As far as I know, there is no option in the legend command that permits this, so I'm interested in hearing potential ways around it.
Alternatively, I am also open to the idea of simply generating a legend image with R separately if anyone knows how to do this.
Thanks!
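One way this could be approached, as a rough sketch: draw the legend on its own blank page and call legend() once per sidebar, so each annotation gets its own titled column. The annotation names, levels, colours, and page size below are made up for illustration.

```r
# Hypothetical sidebar annotations: names, levels and colours are made up.
annotations <- list(
  Tissue = c(liver = "#1B9E77", lung = "#D95F02", brain = "#7570B3"),
  Batch  = c(A = "grey20", B = "grey60"),
  Stage  = c(I = "#FEE0D2", II = "#FC9272", III = "#DE2D26")
)

out <- tempfile(fileext = ".pdf")        # legend-only output file
pdf(out, width = 6, height = 2.5)
par(mar = c(0, 0, 0, 0))
plot.new()                               # blank canvas, no axes or data

# One legend() call per annotation, placed side by side.
k <- length(annotations)
x_pos <- seq(0.02, by = 1 / k, length.out = k)
for (i in seq_along(annotations)) {
  a <- annotations[[i]]
  legend(x = x_pos[i], y = 0.95, legend = names(a), fill = a,
         title = names(annotations)[i], bty = "n")
}
invisible(dev.off())
```

The same loop could in principle be run on the heatmap's own device instead of a separate file, but positioning would then depend on the layout the heatmap function sets up.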

ggplot: Palette Greyscale On Print, Colourful on Screen [duplicate]

I've started to produce the charts for a paper. For some of them which are bar charts I've used the "Pastel1" palette (as recommended in the book on ggplot2, pastel colours are better than saturated ones for fill areas, such as bars).
The problem with Pastel1 at least is that when printed on a B&W laser printer, the colours are indistinguishable. I don't know if the readers will view the paper on screen or will print it on B&W, so I'm looking for either of the following:
how to add hash lines to a palette such as Pastel1 (hopefully the hash lines are also subtle)
a colour palette easy on the eyes that also produces distinct grey areas for B&W for, say, up to 3-4 different colours.
Granted, I could find the latter by experimenting and using toner, but perhaps this has already been solved, I suppose it's a common problem. And yes, I did google for this, but didn't find anything pertinent.
Thank you.
Use http://colorbrewer2.org/ and only show colour schemes that are printer friendly.
Also see scale_fill_grey.
Currently it's not possible to use hash lines due to a limitation in the underlying grid drawing package.
There is the col2grey function in the TeachingDemos package that will convert a set of colors to an approximation of the grey color that will result from printing. You can use this to try different palettes without wasting toner/paper.
Use this to select another color combination (gray scale option included)
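A rough way to check a palette without printing it is to convert each colour to an approximate grey level and see how far apart the values are. The sketch below uses base R only. The luma weights are a common approximation (not necessarily what col2grey or any particular printer does), and the four hex codes are assumed to be the first four Pastel1 colours.

```r
# Approximate grey level (0 = black, 1 = white) using Rec. 601 luma weights.
grey_level <- function(cols) {
  m <- col2rgb(cols) / 255               # 3 x n matrix of red, green, blue
  as.numeric(c(0.299, 0.587, 0.114) %*% m)
}

pastel <- c("#FBB4AE", "#B3CDE3", "#CCEBC5", "#DECBE4")  # assumed Pastel1 colours
greys  <- grey.colors(4, start = 0.2, end = 0.8)         # evenly spread greys

# Spread between the darkest and lightest colour in each palette.
pastel_spread <- diff(range(grey_level(pastel)))
grey_spread   <- diff(range(grey_level(greys)))

round(grey_level(pastel), 2)
round(grey_level(greys), 2)
```

A small spread, as with the pastel colours here, suggests the bars will be hard to tell apart in B&W. The grey palette spans a much wider range.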

How do I use ggplot2 to manually assign hexadecimal colors to each data point in a plot or bar graph?

I have a data set, and one of the variables is a factored array with hexadecimal characters (e.g. '#00FF00'). One of the things I wanted to try doing is creating a bar plot with all of the different colors combined.
I tried using
cg<-ggplot(my.data,aes(x=factor(1),fill=as.character(my.color)))
followed by
cg+geom_bar()
but the only colors plotted seem to be ones from the default scale. I've tried omitting the as.character() part of the code, but it doesn't make a difference. I also have the same issue when making 2d plots with geom_point().
If I try something like
plot(my.data$var1,my.data$var2,col=as.character(my.color))
the colors are plotted the way I wanted them, although the graph doesn't look as nice as the ones in ggplot2.
Is there something obvious I'm missing, or is this beyond the scope of ggplot2?
You should add scale_fill_identity() to use color names as actual colors.
ggplot(my.data,aes(x=factor(1),fill=my.color)) +
geom_bar()+
scale_fill_identity()
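The same idea should carry over to the geom_point() case mentioned in the question, using the colour aesthetic with scale_colour_identity(). A minimal sketch, with a made-up `my.data` standing in for the real data set:

```r
library(ggplot2)

# Hypothetical data: one literal hex colour per row, stored as character.
my.data <- data.frame(
  var1 = c(1, 2, 3, 4),
  var2 = c(2, 4, 3, 5),
  my.color = c("#00FF00", "#FF0000", "#0000FF", "#FFA500"),
  stringsAsFactors = FALSE
)

# The identity scale passes the values through as colours unchanged.
p <- ggplot(my.data, aes(x = var1, y = var2, colour = my.color)) +
  geom_point(size = 4) +
  scale_colour_identity()
```

Printing `p` draws each point in the colour stored in its own row. No legend is drawn by default with an identity scale.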

How to avoid overplotting (for points) using base-graph?

I am on my way to finishing the graphs for a paper and decided (after a discussion on stats.stackoverflow), in order to convey as much information as possible, to create the following graph, which presents the means in the foreground and the raw data in the background:
However, one problem remains, and that is overplotting. For example, the marked point looks like it reflects one data point, but in fact 5 data points exist with the same value at that place.
Therefore, I would like to know if there is a way to deal with overplotting in base graphics using points as the function.
It would be ideal if, e.g., the respective points got darker or thicker.
Doing it manually is not an option (too many graphs and points like this). Furthermore, I don't want to learn ggplot2 just to deal with this single problem (one reason is that I tend to like dual axes, which are not supported in ggplot2).
Update: I wrote a function which automatically creates the above graphs and avoids overplotting by adding vertical or horizontal jitter (or both): check it out!
This function is now available as raw.means.plot and raw.means.plot2 in the plotrix package (on CRAN).
The standard approach is to add some noise to the data before plotting. R has a function jitter() which does exactly that. You could use it to add the necessary noise to the coordinates in your plot, e.g.:
X <- rep(1:10,10)
Z <- as.factor(sample(letters[1:10],100,replace=TRUE))
plot(jitter(as.numeric(Z),factor=0.2),X,xaxt="n")
axis(1,at=1:10,labels=levels(Z))
Besides jittering, another good approach is alpha blending, which you can obtain (on the graphics devices supporting it) as the fourth color parameter. I provided an example for 'overplotting' of two histograms in this SO question.
One additional idea for the general problem of showing the number of points is using a rug plot (rug function), this places small tick marks along the margin that can show how many points contribute (still use jittering or alpha blending for ties). This allows the actual points to show their true rather than jittered values, but the rug can then indicate which parts of the plot have more values.
For the example plot direct jittering or alpha blending is probably best, but in some other cases the rug plot can be useful.
You may also use sunflowerplot, although it would be hard to implement here. I would use alpha blending, as Dirk suggested.
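To make the alpha blending and rug suggestions concrete, here is a small sketch in base graphics. The data are simulated in the spirit of the jitter example above, and the 20% opacity and the pdf() device are arbitrary choices for illustration.

```r
set.seed(42)
X <- rep(1:10, 10)
Z <- sample(1:10, 100, replace = TRUE)   # values on a grid, so exact ties are likely

# Semi-transparent black: coincident points stack up and look darker.
pt_col <- adjustcolor("black", alpha.f = 0.2)

out <- tempfile(fileext = ".pdf")        # pdf() supports semi-transparent colours
pdf(out)
plot(Z, X, pch = 16, cex = 1.5, col = pt_col)
rug(jitter(Z), side = 1)                 # jittered ticks hint at how many points share an x value
invisible(dev.off())
```

Points keep their true coordinates here. Only the rug ticks are jittered, so the darkness of a point and the density of the ticks both indicate ties.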
