I don't have much experience with the Gadfly package in Julia, and I would like to know if there is a solution for my problem. This is my first question here.
I need to make a plot whose x-axis values are [1.0 0.1 0.01 0.001 0.0001 0.00001 0.000001], and I only have results for these parameters for five test cases. I would like the plot to show the results clearly. The poor result comes from the fact that the values are very close together on the axis, and I want to adjust the spacing between them.
My poor plot
Is there a way to resolve this problem?
This is for viewing purposes. The actual scales are the same.
A short version of my question would be "Force grids to be square in plots".
As can be seen in the screenshot taken from the plot in Atom, while the axes have the same numerical increments, the grid cells are rectangular rather than square. I am not sure how to fix this. Because the plot is wider than it is high, it looks skewed when I view it.
If you do a quick estimate of the height of the Z axis and the length of the X axis just using your fingers, you can tell that the X axis is considerably longer.
I think you want ratio=1, at least in Plots.jl:
julia> using Plots
julia> plot(rand(20,3), randn(20,3); ratio=1,
xticks=[-1,0,1], yticks=[-1,0,1], size=(600,300))
Although it might sound easy at first, I do not have a scatterplot, and I think that is what makes this question challenging. I have this plot, which comes from this question.
Summing up, each axis represents a variable that is not connected to the other. It is not an XY scatterplot, as you see.
I would like to know whether it is possible to draw the 95% confidence interval for the mean of both variables, and to draw a square in the middle of the plot representing the area where both datasets overlap.
The result might be something similar to this, bearing in mind that the 95% confidence limits shown do not correspond to reality (they are only there to illustrate how it might look):
Here is another question that deals with this situation, but not using ggplot.
I would like to draw $3$ dimensional scatter plots, or more precisely I have a program that gives me the mass distribution in the unit cube with respect to a 3 dimensional equidistant grid. You can interpret this as a continuous relaxation of a $3$ dimensional assignment problem if you want.
Anyway this is just to give you a very brief background since my actual problem is not really concerned with the maths behind the procedure but with the visualization. I have:
$n$ points in the unit cube $[0,1]^3$
each of the $n$ points is assigned a "weight" between $0$ and $\frac1n$ (typically a lot of the weights coincide; if there are too many different values, I use the cut command to reduce the range to, say, $60$ different values)
And I'd like to plot the $n$ points in a color which corresponds to their weight.
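The binning and colouring step described above can be sketched in base R as follows. The data, the names pts, w and point_cols, and the blue-to-red palette are assumptions made for illustration only; they are not taken from the original program.

```r
# Hypothetical data: n random points in the unit cube, weights in [0, 1/n].
set.seed(1)
n <- 200
pts <- matrix(runif(3 * n), ncol = 3)
w <- runif(n, min = 0, max = 1 / n)

# Bin the weights into 60 equal-width classes with cut().
n_bins <- 60
bins <- cut(w, breaks = n_bins, include.lowest = TRUE)

# One colour per bin, from blue (light weights) to red (heavy weights).
bin_cols <- colorRampPalette(c("blue", "red"))(n_bins)

# Look up the colour of each point from the bin it falls into.
point_cols <- bin_cols[as.integer(bins)]
```

The resulting point_cols vector has one colour per point and could then be handed to the col argument of whichever 3D plotting function is used.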
Now I found the rgl package in R, which allows me to do exactly that and also provides a very nice interactive plot window, but it doesn't seem to allow a "col key" parameter, i.e. I cannot add a continuous color legend to my plot.
On the other hand the package plot3D provides a function to do a $3$ dimensional scatterplot and easily allows me to add the col key. However plot3D does not work with interactive plots but merely gives me the option to specify the angle at which I want to look at the cube. In a $3$D setting I strongly prefer the interactive alternative.
Now is there a way to automatically add a continuous color legend to an rgl plot? If not, do you know why this hasn't been implemented? Or would you solve my problem in a completely different way?
P.S. sorry for the formatting, I'm new to SO and the math environment "$" doesn't seem to work here.
The reason this hasn't been implemented is that until fairly recently it wasn't easy to have a static legend and a dynamic plot in the same window.
Now it's easy; there's a legend3d() function that might do what you want, but I think you probably want a different sort of legend than it will draw. If you know how to draw what you want in 2D, you can use the bgplot3d() function to put it in the background of your plot.
Both of those options give bitmapped legends. It would also be possible to do vector-based legends, but that would be quite a bit more work.
I am in the process of finishing the graphs for a paper and decided (after a discussion on stats.stackoverflow), in order to convey as much information as possible, to create the following graph, which presents the means in the foreground and the raw data in the background:
However, one problem remains, and that is overplotting. For example, the marked point looks like it reflects one data point, but in fact 5 data points exist with the same value at that place.
Therefore, I would like to know if there is a way to deal with overplotting in base graph using points as the function.
It would be ideal if e.g., the respective points get darker, or thicker or,...
Doing it manually is not an option (too many graphs and points like this). Furthermore, I do not want to learn ggplot2 just to deal with this single problem (one reason is that I tend to like dual axes, which are not supported in ggplot2).
Update: I wrote a function which automatically creates the above graphs and avoids overplotting by adding vertical or horizontal jitter (or both): check it out!
This function is now available as raw.means.plot and raw.means.plot2 in the plotrix package (on CRAN).
The standard approach is to add some noise to the data before plotting. R has a function jitter() which does exactly that. You could use it to add the necessary noise to the coordinates in your plot, e.g.:
X <- rep(1:10,10)
Z <- as.factor(sample(letters[1:10],100,replace=TRUE))
plot(jitter(as.numeric(Z),factor=0.2),X,xaxt="n")
axis(1,at=1:10,labels=levels(Z))
Besides jittering, another good approach is alpha blending, which you can obtain (on the graphics devices supporting it) via the fourth color parameter. I provided an example for 'overplotting' of two histograms in this SO question.
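As a rough sketch of alpha blending in base graphics, the jitter example can be redrawn with a semi-transparent colour so that stacked points appear darker. The alpha value of 0.2 and the point size are arbitrary choices for illustration, and the effect only shows on devices that support transparency.

```r
set.seed(42)
X <- rep(1:10, 10)
# Fix the factor levels so that all ten letters get an axis label.
Z <- factor(sample(letters[1:10], 100, replace = TRUE), levels = letters[1:10])

# The fourth argument of rgb() is the alpha (opacity) channel.
col_alpha <- rgb(0, 0, 0, alpha = 0.2)

# Coincident points accumulate opacity and therefore look darker.
plot(as.numeric(Z), X, pch = 16, cex = 2, col = col_alpha, xaxt = "n")
axis(1, at = 1:10, labels = levels(Z))
```

With this alpha value, five coincident points come out noticeably darker than a single one.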
One additional idea for the general problem of showing the number of points is using a rug plot (rug function), this places small tick marks along the margin that can show how many points contribute (still use jittering or alpha blending for ties). This allows the actual points to show their true rather than jittered values, but the rug can then indicate which parts of the plot have more values.
For the example plot direct jittering or alpha blending is probably best, but in some other cases the rug plot can be useful.
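A minimal rug sketch, assuming made-up data with many ties: the points keep their true positions, while jittered tick marks in the margins hint at how many observations sit at each value.

```r
set.seed(7)
# Rounding produces many tied values, which would overplot.
x <- round(rnorm(100, mean = 5, sd = 2))
y <- round(rnorm(100, mean = 5, sd = 2))

# Points are drawn at their true (unjittered) coordinates.
plot(x, y, pch = 16)

# Jitter only the rug ticks so that ties show up as dense clusters.
rug(jitter(x), side = 1)
rug(jitter(y), side = 2)
```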
You may also use sunflowerplot, although it would be hard to implement here. I would use alpha blending, as Dirk suggested.
Background:
I'm running a Monte Carlo simulation to show that a particular process (a cumulative mean) does not converge over time, and often diverges wildly in simulation (the expectation of the random variable = infinity). I want to plot about 10 of these simulations on a line chart, where the x axis has the iteration number, and the y axis has the cumulative mean up to that point.
Here's my problem:
I'll run the first simulation (each sim. having 10,000 iterations), and build the main plot based on its current range. But often one of the simulations will have a range a few orders of magnitude larger than the first one, so the plot flies outside of the original range. So, is there any way to dynamically update the ylim or xlim of a plot upon adding a new set of points or lines?
I can think of two workarounds for this:
1. store each simulation, then pick the one with the largest range, and build the base graph off of that (not elegant, and I'd have to store a lot of data in memory, but would probably be laptop-friendly [[EDIT: as Marek points out, this is not a memory-intense example, but if you know of a nice solution that'd support far more iterations such that it becomes an issue (think high dimensional walks that require much, much larger MC samples for convergence) then jump right in]])
2. find a seed that appears to build a nice looking version of it, and set the ylim manually, which would make the demonstration reproducible.
Naturally I'm holding out for something more elegant than my workarounds. Hoping this isn't too pedestrian a problem, since I imagine it's not uncommon with simulations in R. Any ideas?
I'm not sure if this is possible using base graphics; if someone has a solution, I'd love to see it. However, graphics systems based on grid (lattice and ggplot2) allow the graphics object to be saved and updated. It's insanely easy in ggplot2.
require(ggplot2)
make some data and get the range:
foo <- as.data.frame(cbind(data=rnorm(100), numb=seq_len(100)))
make an initial ggplot object and plot it:
p <- ggplot(foo, aes(numb, data)) + geom_line()
p
make some more data and add it to the plot
foo <- as.data.frame(cbind(data=rnorm(200), numb=seq_len(200)))
p <- p + geom_line(aes(numb, data), data=foo, colour="red")
plot the new object
p
I think (1) is the best option. I actually don't think it is inelegant. I think it would be more computationally intensive to redraw every time you hit a point outside xlim or ylim.
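Option (1) can be sketched in base R roughly as follows. The absolute Cauchy draws are only a stand-in for a process with infinite expectation, and the names cum_means and y_range are made up for this illustration.

```r
set.seed(123)
n_sims <- 10
n_iter <- 10000

# Store every simulation first: one column of cumulative means per run.
# Assumption: |Cauchy| draws stand in for the heavy-tailed process.
cum_means <- replicate(n_sims, {
  draws <- abs(rcauchy(n_iter))
  cumsum(draws) / seq_len(n_iter)
})

# Take the y range over all runs, then draw everything in one call.
y_range <- range(cum_means)
matplot(cum_means, type = "l", lty = 1, ylim = y_range,
        xlab = "Iteration", ylab = "Cumulative mean")
```

Ten runs of 10,000 iterations amount to 100,000 stored numbers, so memory is not a concern at this size.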
Also, I saw in Peter Hoff's book about Bayesian statistics a cool use of ts() instead of lines() for cumulative sums/means. It looks pretty spiffy: