making an interrupted plot with ggplot2 - r

I am trying to make the following plot (which I made in R) using ggplot2. How do I go about doing this? (The plot has a finer resolution from 0 to 10, but coarser resolution from 10 onwards.) know how to do this in R, but I am not sure how to proceed with this in the case of ggplot2. Here is the figure obtained using base R.
To make things useful, note that i don't have that much interest in hanging on to the right-hand y-axis.
Also, I would like to do a similar task on a pair of barplots (have the same scale, and only one y-axis. I am hoping that that approach is similar.
I would also be open to other suggestions that would make a similar plot conveying the information similarly or better.
Many thanks for suggestions!

Related

geom_bspline across multiple plots combined into a single figure

I would like to create a ggplot2 layer that includes multiple geom_bspline(), or something similar, to point to regions on different plots after combining them into a single figure. A feature in the data seen in one plot appears in another plot after a transformation. However, it may not be clear to a non-expert they are due to the same phenomenon. The plots are to be combined into a single figure using ggarrange(), cowplot(), patchwork() or something similar.
I can get by using ggforce::geom_ellipse() on each plot but it's not as clean. Any suggestions?
Of course, after asking the question and staring at the figure in question, it came to me that I simply need to add a geom_bspline() to the combined figure. Tried that earlier but didn't give enough thought to the coordinates on the new layer. The coordinates of the spline are given in the range of 0 to 1 for both the x and y values on this new layer. Simple and obvious.

How to reproduce this graphical explanation (a scatter plot) of how covariance works?

I found this graphical intuitive explanation of covariance:
32 binormal points drawn from distributions with the given covariances, ordered from most negative (bluest) to most positive (reddest)
The whole material can be found at:
https://stats.stackexchange.com/questions/18058/how-would-you-explain-covariance-to-someone-who-understands-only-the-mean
I would like to recreate this sort of graphical illustration in R, but I'm not sufficiently familiar with R's plotting tools. I don't even know where to start in order to get those colored rectangles between each pair of data points, let alone make them semi-transparent.
I think this could make a very efficient teaching tool.
The cor.rect.plot function in the TeachingDemos package makes plots similar to what is shown. You can modify the code for the function to make the plot even more similar if you desire.

Intelligent Y Axis Scaling BarPlot R

I want to plot some data with barplot. Rather, I want to make a bar graph and barplot seemed the logical choice. I am plotting just fine but I was wondering if there is a way to intelligently scale the y axis to round up from the highest count.
For example I set the yaxis in this case to be 30, because I knew that Strand.22 had 27 counts in it: barplot(unlist(d), ylim=c(0,30), xlab="Forward Reverse", ylab="Counts")
In the future, I want this script to run on its own, so it would be optimal for the the Y-axis to choose it's own ylim. Short of pulling the information out of my 'd' variable I can't think of a good way to do this. Is there an easy way to do this with barplot? Would some other plotter work better? I have seen things about ggplots but it seemed super complex and I wasn't sure that it would do anything better.
EDIT: If I do not choose a ylim it picks automatically and this is what it decided was best.
I disagree with it's choice.
If you don't specify ylim, R will come up with something based on the data. (Sounds like you don't like it's choice, which is fair.)
If you specify something based on the data like:
barplot(unlist(d), ylim=c(0,1.1*max(unlist(d)))
R will draw you a plot that reflects the maximum value of data. That example just takes the maximum of your values and multiplies that by 1.1 (this could be any number) to give it a little extra height. R does something similar to this when you make a scatterplot but it handles barplots slightly differently.

Plotting large numbers with R, but not all numbers are being shown

I am trying to render 739455 data point on a graph using R, but on the x-axis I can not view all those numbers, is there a way I can do that?
I am new to R.
Thank you
As others suggested, try hist, hexbin, plot(density(node)), as these are standard methods for dealing with more points than pixels. (I like to set hist with the parameter breaks = "FD" - it tends to have better breakpoints than the default setting.)
Where you may find some joy is in using the iplots package, an interactive plotting package. The corresponding commands include ihist, iplot, and more. As you have a Mac, the more recent Acinonyx package may be even more fun. You can zoom in and out quite easily. I recommend starting with the iplots package as it has more documentation and a nice site.
If you have a data frame with several variables, not just node, then being able to link the different plots such that brushing points in one plot highlights them in another will make the whole process more stimulating and efficient.
That's not to say that you should ignore hexbin and the other ideas - those are still very useful. Be sure to check out the options for hexbin, e.g. ?hexbin.

How to avoid overplotting (for points) using base-graph?

I am in my way of finishing the graphs for a paper and decided (after a discussion on stats.stackoverflow), in order to transmit as much information as possible, to create the following graph that present both in the foreground the means and in the background the raw data:
However, one problem remains and that is overplotting. For example, the marked point looks like it reflects one data point, but in fact 5 data points exists with the same value at that place.
Therefore, I would like to know if there is a way to deal with overplotting in base graph using points as the function.
It would be ideal if e.g., the respective points get darker, or thicker or,...
Manually doing it is not an option (too many graphs and points like this). Furthermore, ggplot2 is also not what I want to learn to deal with this single problem (one reason is that I tend to like dual-axes what is not supprted in ggplot2).
Update: I wrote a function which automatically creates the above graphs and avoids overplotting by adding vertical or horizontal jitter (or both): check it out!
This function is now available as raw.means.plot and raw.means.plot2 in the plotrix package (on CRAN).
Standard approach is to add some noise to the data before plotting. R has a function jitter() which does exactly that. You could use it to add the necessary noise to the coordinates in your plot. eg:
X <- rep(1:10,10)
Z <- as.factor(sample(letters[1:10],100,replace=T))
plot(jitter(as.numeric(Z),factor=0.2),X,xaxt="n")
axis(1,at=1:10,labels=levels(Z))
Besides jittering, another good approach is alpha blending which you can obtain (on the graphics devices supporing it) as the fourth color parameter. I provided an example for 'overplotting' of two histograms in this SO question.
One additional idea for the general problem of showing the number of points is using a rug plot (rug function), this places small tick marks along the margin that can show how many points contribute (still use jittering or alpha blending for ties). This allows the actual points to show their true rather than jittered values, but the rug can then indicate which parts of the plot have more values.
For the example plot direct jittering or alpha blending is probably best, but in some other cases the rug plot can be useful.
You may also use sunflowerplot, while it would be hard to implement it here. I would use alpha-blending, as Dirk suggested.

Resources