R histogram - too many variables - r

I am trying to draw a histogram of 33 different variables. Because there are so many variables, I think that besides using different colors I need to label each bar clearly, perhaps even with an arrow, if that is doable.
I was wondering:
1) How can I define 33 distinct colors in R?
2) How can I label the bars, say vertically below the x axis and spaced apart from each other, to make my figure clearer?
I am using the multhist function from the plotrix package; for data, you can imagine just 33 random vectors of different lengths.
Thanks

As Chris mentioned, trying to distinguish 33 colours doesn't work for humans. You need to find a different plot type that doesn't rely on only colour.
Without a reproducible example, it is not possible to say what this plot should be, but here's some generic colour advice.
Use HCL colours rather than RGB or HSV. Read Escaping RGBland by Achim Zeileis for an explanation. There are some useful functions for generating palettes in the colorspace package.
If your variables are unordered categories (i.e., encoded as factors) then your colours should have different hues. (Use rainbow_hcl.)
If your variables are in some sort of order (ranges or ordered factors) then your colours should have different lightness or chroma. (Use sequential_hcl.) A variation on this is if they differ about some midpoint, in which case you need diverge_hcl.
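The three cases above can be sketched without installing anything. This is only an illustration: it uses base R's hcl.colors() (grDevices, assuming R >= 3.6) as a stand-in for the colorspace functions named in the answer, and the palette names and sizes are arbitrary choices, not recommendations from the answer.

```r
# Base-R stand-ins for the colorspace palette functions (assumption: R >= 3.6)
qual_cols <- hcl.colors(8, palette = "Dark 3")    # like rainbow_hcl: hue varies
seq_cols  <- hcl.colors(8, palette = "Blues 3")   # like sequential_hcl: lightness varies
div_cols  <- hcl.colors(9, palette = "Blue-Red")  # like diverge_hcl: two hues around a midpoint

# Show each palette as a row of colour swatches
op <- par(mfrow = c(3, 1), mar = c(1, 1, 2, 1))
barplot(rep(1, 8), col = qual_cols, border = NA, space = 0, axes = FALSE,
        main = "unordered categories")
barplot(rep(1, 8), col = seq_cols, border = NA, space = 0, axes = FALSE,
        main = "ordered values")
barplot(rep(1, 9), col = div_cols, border = NA, space = 0, axes = FALSE,
        main = "values around a midpoint")
par(op)
```

With colorspace installed, rainbow_hcl(8), sequential_hcl(8) and diverge_hcl(9) would take the place of the three hcl.colors() calls.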

You can define colors in R in any number of ways; try ?rainbow or ?gray.colors for some suggestions
You could also look at all the colors here and just create a vector of your desired colors that you call inside your plot function.
Your problem, though, is that the human eye and the printing process have trouble distinguishing and reproducing that many distinct colors. See the documentation at the colorbrewer site for more information (and advice on picking colors).
Not sure I understand what you're trying to do with the labels, but you can re-label an axis with a call to axis. See the documentation in ?axis.
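As a rough sketch of the axis() suggestion, here is one way to get vertical labels below the x axis for 33 variables. The data are simulated, the variable names are made up, and a boxplot is swapped in for multhist as one possible plot type that does not rely on colour; none of this comes from the question itself.

```r
set.seed(1)
# Hypothetical data: 33 random vectors of different lengths
vals <- lapply(1:33, function(i) rnorm(sample(20:50, 1), mean = i / 10))
names(vals) <- paste0("var", 1:33)

op <- par(mar = c(6, 4, 2, 1))             # extra bottom margin for vertical labels
boxplot(vals, xaxt = "n", col = "grey85")  # suppress the default x axis
# Redraw the x axis with one vertical label (las = 2) per variable
axis(1, at = seq_along(vals), labels = names(vals), las = 2, cex.axis = 0.7)
par(op)
```

The same axis() call works after any base plot drawn with xaxt = "n", as long as at matches the positions of the bars or boxes.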

Related

Why is there no col key for R's rgl?

I would like to draw $3$ dimensional scatter plots, or more precisely I have a program that gives me the mass distribution in the unit cube with respect to a 3 dimensional equidistant grid. You can interpret this as a continuous relaxation of a $3$ dimensional assignment problem if you want.
Anyway this is just to give you a very brief background since my actual problem is not really concerned with the maths behind the procedure but with the visualization. I have:
$n$ points in the unit cube $[0,1]^3$
each of the $n$ points is assigned a "weight" between $0$ and $\frac1n$ (typically a lot of the weights coincide; if there are too many different values, I use the cut function to reduce the range to, say, $60$ different values)
And I'd like to plot the $n$ points in a color which corresponds to their weight.
Now I found the rgl Package in R which allows me to do exactly that and also provides a very nice interactive plot window but it doesn't seem to allow a "col key" parameter, i.e. I cannot add a continuous color legend to my plot.
On the other hand the package plot3D provides a function to do a $3$ dimensional scatterplot and easily allows me to add the col key. However plot3D does not work with interactive plots but merely gives me the option to specify the angle at which I want to look at the cube. In a $3$D setting I strongly prefer the interactive alternative.
Now is there a way to automatically add a continuous color legend to an rgl plot? If not, do you know why this hasn't been implemented? Or would you solve my problem in a completely different way?
P.S. sorry for the formatting, I'm new to SO and the math environment "$" doesn't seem to work here.
The reason this hasn't been implemented is because until fairly recently it wasn't easy to have a static legend and a dynamic plot in the same window.
Now it's easy; there's a legend3d() function that might do what you want, but I think you probably want a different sort of legend than it will draw. If you know how to draw what you want in 2D, you can use the bgplot3d() function to put it in the background of your plot.
Both of those options give bitmapped legends. It would also be possible to do vector-based legends, but that would be quite a bit more work.
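A minimal sketch of the bgplot3d() route, under stated assumptions: draw_colkey() is a hypothetical helper written for this illustration (it is not part of rgl), the weights are simulated, and the rgl calls are left as comments because they need rgl and a graphics display. The helper only uses base graphics, so it can be previewed in an ordinary 2D device first.

```r
# Hypothetical helper: draw a vertical colour bar near the right edge of the device
draw_colkey <- function(pal, zlim, n_ticks = 5) {
  op <- par(mar = c(0, 0, 0, 0))
  on.exit(par(op))
  plot.new()
  plot.window(xlim = c(0, 1), ylim = c(0, 1), xaxs = "i", yaxs = "i")
  k <- length(pal)
  ybreaks <- seq(0.2, 0.8, length.out = k + 1)
  # One thin rectangle per colour, stacked from bottom to top
  rect(0.85, ybreaks[-(k + 1)], 0.88, ybreaks[-1], col = pal, border = NA)
  ticks <- seq(zlim[1], zlim[2], length.out = n_ticks)
  text(0.89, seq(0.2, 0.8, length.out = n_ticks),
       labels = signif(ticks, 2), adj = 0, cex = 0.8)
  invisible(ybreaks)
}

set.seed(1)
n    <- 500
w    <- runif(n, 0, 1 / n)                  # simulated weights in [0, 1/n]
pal  <- hcl.colors(60, "viridis")           # 60 colours, as in the question
bins <- cut(w, breaks = 60, labels = FALSE) # colour index for each point
key_breaks <- draw_colkey(pal, range(w))    # 2D preview of the legend

# With rgl installed (assumed, not run here):
# library(rgl)
# plot3d(matrix(runif(3 * n), ncol = 3), col = pal[bins])
# bgplot3d(draw_colkey(pal, range(w)))
```

As the answer notes, a legend drawn this way is a bitmap placed in the background, so it stays fixed while the 3D scene is rotated.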

R: How to automatically set the color of different groups in survival plot

I am plotting the survival probability for my dataframe with 8 different groups with this command:
fit2<-survfit(Surv(time=uptimeDay,event=solved,type='right')~cluster,data=t2)
plot(fit2,conf.int=F,xlim=c(0, 250),mark.time=c(1,50,100,200),mark=c(1,3,4,2,5,7,6,8,9,10),lwd=1,cex=0.7,lty = 1:11,xlab='Time(days)',ylab='Survival Probability')
the cluster here is a number between 1 and 10.
I would like to know how to automatically set the colors of the curves together with an automatic legend using key of the curves.
Can somebody help me out with this?
I have a function that I use for Kaplan-Meier curves that is based on ggplot2, which will take care of the colors and legends for you. Regrettably, I've not gotten around to packaging it up in any sensible way. But you can download the source code from
https://gist.github.com/nutterb/004ade595ec6932a0c29
And some examples on how to use it from
https://gist.github.com/nutterb/fb19644cc18c4e64d12a
It's not clear what you mean by making this "automatic" and the desire to "use the key of the curves", but perhaps you are asking that the colors of the curves match the legend.
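png()
mycols=c("red","blue")
# prio.fit is assumed to be a survfit object with two groups
plot(prio.fit, col=mycols)
# lty is needed so that the legend draws coloured line samples
legend(x="bottomleft", col=mycols, lty=1, legend=mycols)
dev.off()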
If you want this mated to a dataset and wanted to specify particular colors for your groups, then you will need to provide a dataset so there is something meaningful to use as labels, and be more specific about the coloring schema needed.
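For reference, here is a small self-contained sketch of matching curve colours to a legend automatically. It assumes the survival package is installed and uses its bundled lung data set in place of the asker's t2 data frame, with ph.ecog standing in for the cluster variable; the palette choice is arbitrary.

```r
library(survival)

# Kaplan-Meier fit with one curve per group (ph.ecog stands in for 'cluster')
fit <- survfit(Surv(time, status) ~ ph.ecog, data = lung)

# One colour per stratum, generated from the number of groups
cols <- hcl.colors(length(fit$strata), palette = "Dark 3")

plot(fit, col = cols, lty = 1, conf.int = FALSE,
     xlab = "Time (days)", ylab = "Survival Probability")
# Legend labels come from the fit, so they stay in step with the curves
legend("topright", legend = names(fit$strata), col = cols, lty = 1, bty = "n")
```

Because both the colours and the labels are derived from fit$strata, adding or removing a group changes the plot and the legend together.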

Are there good predefined color sequences for different data in one plot?

A while ago, I asked How to change Lattice graphics default groups colors?, and got a helpful response from BenBarnes. This allowed me to define more than 7 cycling colors for different data in the same plot in R's Lattice package, which I did. However, I found that it's difficult to define more than 9, maybe 10, colors that (a) are not hard to see on a white background and (b) do not include pairs of colors that look very similar. (That might be why seven colors is Lattice's default.) It occurs to me, though, that there are people out there who are much better at managing colors in information display than I am, and that maybe someone has already defined a good list of 10, 12, maybe even 15 colors for display of data in the same plot. Does anybody know of such a list? Any color specification that I can convert into a Lattice format would work. If it's already been done in Lattice, even better! (Is there a better place to ask this question?)
There's a large body of work on choosing colors. Check out the RColorBrewer and colorspace packages as a starting point. In the documentation for colorspace there is a link to an excellent paper (and the vignette summarizes much of the paper). And think about your color blind colleagues, with dichromat.
In general, I think it is very difficult to pick a large set of colors that don't end up being hard to distinguish from one another. When I am looking for a large number (>8) of colors that I want to be noticeably distinct and aesthetically pleasing, I usually use the rich.colors palette in the gplots package. I find it more useful than the similar rainbow palette, because the colors don't wrap around on each other.
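A short sketch of plugging a longer colour vector into Lattice. The data are made up, and base R's hcl.colors() (assuming R >= 3.6) is used only as a stand-in so the example runs without extra packages; a vector from RColorBrewer or from rich.colors in gplots could be dropped in the same way.

```r
library(lattice)

cols <- hcl.colors(12, palette = "Dark 3")   # 12 group colours (stand-in palette)

# Hypothetical data: 12 groups, each a straight line with a different offset
df <- data.frame(x = rep(1:10, times = 12),
                 g = factor(rep(1:12, each = 10)))
df$y <- df$x + as.integer(df$g)

p <- xyplot(y ~ x, data = df, groups = g, type = "l",
            par.settings = list(superpose.line = list(col = cols)),
            auto.key = list(columns = 4, lines = TRUE, points = FALSE))
print(p)
```

Passing the colours through par.settings keeps the key in sync with the lines; superpose.symbol would be the matching setting for point plots.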

Chernoff faces extended in R

The aplpack library makes it possible to plot beautiful Chernoff faces with faces. symbols and TeachingDemos also offer the possibility to plot variations of these faces. But none of them allows plotting more than 15 dimensions (symbols allows two more dimensions for colours, but they are defined in an inconvenient way so that some faces turn out to be completely black, hiding other parts of the face). Is there a way in R (perhaps with another library) to plot more dimensions, e.g. by adding a body with limbs of different lengths or by using colours to visualise some of the dimensions? Maybe I've overlooked something and the colours in aplpack can be mapped to variables as well?
The TeachingDemos package also has the ms.face function that works with the my.symbols function to create a scatterplot with the Chernoff Face as the symbol. This gives the original 15 values of the face, plus an x-coordinate and a y-coordinate; with my.symbols you can also specify a color (for the overall face, not individual features) and an overall size based on variables. That gives 19 dimensions, you could also vary the line width and style, but that will probably distort the plot more than help.
With that many dimensions I would probably go more for the star plots (symbols function) with the variables ordered based on a clustering procedure, or use some type of dimension reduction tool (principal components, grand tour, etc.)
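A rough sketch of the star-plot idea with variables ordered by clustering. It uses the built-in mtcars data as a placeholder and base R's stars() function rather than symbols(); hierarchical clustering on the standardised variables is just one possible ordering procedure, chosen here for illustration.

```r
x <- scale(mtcars)                        # standardise so no variable dominates

# Cluster the variables (columns) and take the dendrogram order,
# so that similar variables end up on neighbouring rays
var_order <- hclust(dist(t(x)))$order

stars(mtcars[, var_order], key.loc = c(14, 1.5), cex = 0.6,
      main = "Star plot, variables ordered by clustering")
```

Ordering the rays this way tends to give smoother star shapes, which makes observations easier to compare than with an arbitrary column order.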

How to avoid overplotting (for points) using base-graph?

I am in the process of finishing the graphs for a paper and decided (after a discussion on stats.stackoverflow), in order to convey as much information as possible, to create the following graph that presents the means in the foreground and the raw data in the background:
However, one problem remains, and that is overplotting. For example, the marked point looks like it reflects one data point, but in fact 5 data points exist with the same value at that place.
Therefore, I would like to know if there is a way to deal with overplotting in base graph using points as the function.
It would be ideal if e.g., the respective points get darker, or thicker or,...
Doing it manually is not an option (too many graphs and points like this). Furthermore, I do not want to learn ggplot2 just to deal with this single problem (one reason is that I tend to like dual axes, which are not supported in ggplot2).
Update: I wrote a function which automatically creates the above graphs and avoids overplotting by adding vertical or horizontal jitter (or both): check it out!
This function is now available as raw.means.plot and raw.means.plot2 in the plotrix package (on CRAN).
Standard approach is to add some noise to the data before plotting. R has a function jitter() which does exactly that. You could use it to add the necessary noise to the coordinates in your plot. eg:
X <- rep(1:10,10)
Z <- as.factor(sample(letters[1:10],100,replace=T))
plot(jitter(as.numeric(Z),factor=0.2),X,xaxt="n")
axis(1,at=1:10,labels=levels(Z))
Besides jittering, another good approach is alpha blending, which you can obtain (on the graphics devices supporting it) through the fourth color parameter. I provided an example for 'overplotting' of two histograms in this SO question.
One additional idea for the general problem of showing the number of points is using a rug plot (rug function). This places small tick marks along the margin that can show how many points contribute (still use jittering or alpha blending for ties). This allows the actual points to show their true rather than jittered values, but the rug can then indicate which parts of the plot have more values.
For the example plot direct jittering or alpha blending is probably best, but in some other cases the rug plot can be useful.
You may also use sunflowerplot, while it would be hard to implement it here. I would use alpha-blending, as Dirk suggested.
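The alpha-blending, rug and sunflowerplot suggestions can be sketched in base graphics as follows. The data are simulated integer values chosen to produce many ties, and the alpha level of 0.2 is an arbitrary choice.

```r
set.seed(42)
# Simulated data with many exact ties
x <- sample(1:10, 200, replace = TRUE)
y <- sample(1:10, 200, replace = TRUE)

# Alpha blending: semi-transparent points get darker where they overlap
col_alpha <- adjustcolor("black", alpha.f = 0.2)
plot(x, y, pch = 16, cex = 1.5, col = col_alpha)

# Rug plots: jittered ticks in the margins show how many points share a value,
# while the points themselves stay at their true positions
rug(jitter(x), side = 1)
rug(jitter(y), side = 2)

# Sunflower plot: the number of petals shows the number of coincident points
sunflowerplot(x, y)
```

Transparency only works on devices that support it, such as pdf() and png() with a suitable type, as the answer above points out.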

Resources