R - Scatter plots, how to plot points in differnt lines to overlapping? - r

I want to plot several lists of points, each list has distance (decimal) and error_no (1-8). So far I am using the following:
plot(b1$dist1, b1$e1, col="blue",type="p", pch=20, cex=.5)
points(b1$dist2, b1$e2, col="blue", pch=22)
to add them both to the same plot. (I will add legends, etc later on).
The problem I have is that points overlap, and even when changing the character using for plotting, it covers up previous points. Since I am planning on plotting a lot more than just 2 this will be a big problem.
I found some ways in:
http://www.rensenieuwenhuis.nl/r-sessions-13-overlapping-data-points/
But I would rather do something that would space the points along the y axis, one way would be to add .1, then .2, and so on, but I was wondering if there was any package to do that for me.
Cheers
M
ps: if I missed something, please let me know.

As noted in the very first point in the link you posted, jitter will slightly move all your points. If you just want to move the points on the y-axis:
plot(b1$dist1, b1$e1, col="blue",type="p", pch=20, cex=.5)
points(b1$dist2, jitter(b1$e2), col="blue", pch=22)

Depends a lot on what information you wish to impart to the reader of your chart. A common solution is to use the transparency quality of R's color specification. Instead of calling a color "blue" for example, set the color to #0000FF44 (Apologies if I just set it to red or green) The final two bytes define the transparency, from 00 to FF, so overlapping data points will appear darker than standalone points.

Look at the spread.labs function in the TeachingDemos package, particularly the example. It may be that you can use that function to create your plot (the examples deal with labels, but could just as easily be applied to the points themselves). The key is that you will need to find the new locations based on the combined data, then plot. If the function as is does not do what you want, you could still look at the code and use the ideas to spread out your points.
Another approach would be to restructure your data and use the ggplot2 package with "dodging". Other approaches rather than using points several times would be the matplot function, using the col argument to plot with a vector, or lattice or ggplot2 plots. You will probably need to restructure the data for any of these.

Related

Why is there no col key for R's rgl?

I would like to draw $3$ dimensional scatter plots, or more precisely I have a program that gives me the mass distribution in the unit cube with respect to a 3 dimensional equidistant grid. You can interpret this as a continuous relaxation of a $3$ dimensional assignment problem if you want.
Anyway this is just to give you a very brief background since my actual problem is not really concerned with the maths behind the procedure but with the visualization. I have:
$n$ points in the unit cube $[0,1]^3$
each of the $n$ points is assigned a "weight" between $0$ and $\frac1n$ (typically a lot of the weights coincide, if there are too many different values, i use the cut command to reduce the range to, say $60$ different values)
And I'd like to plot the $n$ points in a color which corresponds to their weight.
Now I found the rgl Package in R which allows me to do exactly that and also provides a very nice interactive plot window but it doesn't seem to allow a "col key" parameter, i.e. I cannot add a continuous color legend to my plot.
On the other hand the package plot3D provides a function to do a $3$ dimensional scatterplot and easily allows me to add the col key. However plot3D does not work with interactive plots but merely gives me the option to specify the angle at which I want to look at the cube. In a $3$D setting I strongly prefer the interactive alternative.
Now is there a way to automatically add a continuous color legend to an rgl plot? If not, do you know why this hasn't been implemented? Or would you solve my problem completely different altogether?
P.S. sorry for the formatting, I'm new to SO and the math environment "$" doesn't seem to work here.
The reason this hasn't been implemented is because until fairly recently it wasn't easy to have a static legend and a dynamic plot in the same window.
Now it's easy; there's a legend3d() function that might do what you want, but I think you probably want a different sort of legend than it will draw. If you know how to draw what you want in 2D, you can use the bgplot3d() function to put it in the background of your plot.
Both of those options give bitmapped legends. It would also be possible to do vector-based legends, but that would be quite a bit more work.

cluster: :clusplot axis in wrong direction

I'm trying to plot the cluster obtained from fuzzy c-means clustering.
The plot should look like this.
code for the plot
plot(data$Longitude, data$Latitude, main="Fuzzy C-Means",col=data$Revised, pch=16, cex=.6,
xlab="Longitude",ylab="Latitude")
library(maps)
map("state", add=T)
However, when I tried to use clusplot the plot is displaying in opposite direction(both top and bottom and left and right) as below.
I wanna know if there's a way to reverse the plot to show in the order as the above picture.
Also, for the very dense area, it's hard to find the ellipse label. I wanna know if there's a way to show the label inside the ellipse instead of outside.
code for 2nd pic
library(cluster)
clusplot(cbind(Geocode$Longitude, Geocode$Latitude), cluster, color=TRUE,shade=TRUE,
labels=4, lines=0,col.p=cluster,
xlab="Longitude",ylab="Latitude",cex=1)
clusplot is a function that performs a lot of magic for you. In particular it projects the data set - which happens in a way you don't like, unfortunately. (Also note the scales - it centered and scaled the data, too)
clusplot.default: Creates a bivariate plot visualizing a partition (clustering) of the data. All observation are represented by points in the plot, using principal components or multidimensional scaling.
As far as I can tell, clusplot doesn't have map support, but you will want such a map I guess...
While maybe you can use the s.x.2d parameter to specify the exact projection (and this way disable automatic scaling), it probably is still difficult to add the map. Maybe look at the source of clusplot instead, and take only the parts you want?

Intelligent Y Axis Scaling BarPlot R

I want to plot some data with barplot. Rather, I want to make a bar graph and barplot seemed the logical choice. I am plotting just fine but I was wondering if there is a way to intelligently scale the y axis to round up from the highest count.
For example I set the yaxis in this case to be 30, because I knew that Strand.22 had 27 counts in it: barplot(unlist(d), ylim=c(0,30), xlab="Forward Reverse", ylab="Counts")
In the future, I want this script to run on its own, so it would be optimal for the the Y-axis to choose it's own ylim. Short of pulling the information out of my 'd' variable I can't think of a good way to do this. Is there an easy way to do this with barplot? Would some other plotter work better? I have seen things about ggplots but it seemed super complex and I wasn't sure that it would do anything better.
EDIT: If I do not choose a ylim it picks automatically and this is what it decided was best.
I disagree with it's choice.
If you don't specify ylim, R will come up with something based on the data. (Sounds like you don't like it's choice, which is fair.)
If you specify something based on the data like:
barplot(unlist(d), ylim=c(0,1.1*max(unlist(d)))
R will draw you a plot that reflects the maximum value of data. That example just takes the maximum of your values and multiplies that by 1.1 (this could be any number) to give it a little extra height. R does something similar to this when you make a scatterplot but it handles barplots slightly differently.

equivalent to MatLab "bar" function in R?

Is there an function in R that does the same job as Matlab's "bar" function?
R does have a "barplot" function in the library graphics, however, it is not the same.
The Matlab bar(X,Y) (verbatim excerpt from MATLAB documentation) "draws a bar for each element in Y at locations specified in X, where X is a vector defining the x-axis intervals for the vertical bars." (emphasis mine)
However, the R barplot function does not allow one to specify locations.
Perhaps there is a method in ggplot2 that supports this? I am only able to find standard bar charts in ggplot2.
No, barplot is not the same as bar, but you should read the whole help. You can do many things to position the bars. The first is simply their order in Y. You could insert spaces if you wish (additional 0s). If you have X and Y then sort Y on X (Y[order(X)]) and plot it. If you need to change positions use the "space" and "width" arguments. It's not as straightforward as specifying X values I suppose but it's definitely more useful in most situations. Generally what you want to adjust is widths of bars and spaces between bars. Their position on the X-axis should be arbitrary. If the position on the X-axis is really meaningful then you should be using line plots, not bar graphs.
In R:
barplot(rbind(1:10, 2:11), beside=T, names.arg=1:10)
In MATLAB:
>> bar(1:10, [(1:10)' (2:11)'])
Read up on par . Then observe, for example:
x<-c(1,2,4,5,6)
y<-c(3,4,3,4,2)
plot(x,y,type='h',lwd=6)
Edit: yes, I know this doesn't (yet) plot multiple data sets, but I would hope you can see simple ways to make that happen, with spacings, colors, etc. specified to your exact liking :-)
Sounds vaguely like the R stepfun. On the other hand one would need to know what "draws a bar" means before saying it is not the same as barplot(..., horiz=TRUE) One would, of course, need to examine some more detailed evidence such as data and plots before arriving at a conclusion, however. #John Colby should be congratulated for adding some specificity to the discussion. The axis function is probably what Quant Guy needs education regarding.

How to avoid overplotting (for points) using base-graph?

I am in my way of finishing the graphs for a paper and decided (after a discussion on stats.stackoverflow), in order to transmit as much information as possible, to create the following graph that present both in the foreground the means and in the background the raw data:
However, one problem remains and that is overplotting. For example, the marked point looks like it reflects one data point, but in fact 5 data points exists with the same value at that place.
Therefore, I would like to know if there is a way to deal with overplotting in base graph using points as the function.
It would be ideal if e.g., the respective points get darker, or thicker or,...
Manually doing it is not an option (too many graphs and points like this). Furthermore, ggplot2 is also not what I want to learn to deal with this single problem (one reason is that I tend to like dual-axes what is not supprted in ggplot2).
Update: I wrote a function which automatically creates the above graphs and avoids overplotting by adding vertical or horizontal jitter (or both): check it out!
This function is now available as raw.means.plot and raw.means.plot2 in the plotrix package (on CRAN).
Standard approach is to add some noise to the data before plotting. R has a function jitter() which does exactly that. You could use it to add the necessary noise to the coordinates in your plot. eg:
X <- rep(1:10,10)
Z <- as.factor(sample(letters[1:10],100,replace=T))
plot(jitter(as.numeric(Z),factor=0.2),X,xaxt="n")
axis(1,at=1:10,labels=levels(Z))
Besides jittering, another good approach is alpha blending which you can obtain (on the graphics devices supporing it) as the fourth color parameter. I provided an example for 'overplotting' of two histograms in this SO question.
One additional idea for the general problem of showing the number of points is using a rug plot (rug function), this places small tick marks along the margin that can show how many points contribute (still use jittering or alpha blending for ties). This allows the actual points to show their true rather than jittered values, but the rug can then indicate which parts of the plot have more values.
For the example plot direct jittering or alpha blending is probably best, but in some other cases the rug plot can be useful.
You may also use sunflowerplot, while it would be hard to implement it here. I would use alpha-blending, as Dirk suggested.

Resources