Split density plot in R ggplot2 - r

How can I do a density plot that is split across variables like the ones done in Figure 1 in this paper ? https://www.pnas.org/content/117/24/13386
Obivously, doing geom_density on filtered data then changes the value of the density towards the border, because it assumes zeroes right after 1.96 if coming from the left.
What I'd need is to compute the whole density, but only plot half of it

Related

Swap Axes of Violin plot

I am trying to create Violin Plots using the StatsPlots.jl library.
However, I would like to have the returned Violin plot to be horizontal instead of vertical as I want to show the distribution of a variable (e.g. Temperature) for different heights, eg. at 1000m, 2000m, 3000m ...
So it would be nice if the height was at the y-Axis while the temperature distribution was on the x-Axis.
Is there a way to swap the axes of a Plots.Plot struct, or is there an argument I could pass to violin() that does the trick?

How to count the number of points below a contour line in R

I've created a heatmap in R based on simulations and plotted it using image.plot() and I have added contour lines by using contour(). I also have a data frame that contains a column for observations in first year and trend size that I have plotted on top of the heatmap using base R plot. Is there an easy way to count the number of points below the 0.5 contour line and about the 0.95 contour line?
Assuming you are plotting a variable called z, you can use something like.
table(z>.95)

R + ggplot2, multiple histograms in the same plot with each histogram normalised to unit area?

Sorry for the newbie R question...
I have a data.frame that contains measurements of a single variable. These measurements will be distributed differently depending on whether the thing being measured is of type A or type B; that is, you can imagine that my column names are: measurement, type label (A or B). I want to plot the histograms of the measurements for A and B separately, and put the two histograms in the same plot, with each histogram normalised to unit area (this is because I expect the proportions of A and B to differ significantly). By unit area, I mean that A and B each have unit area, not that A+B have unit area. Basically, I want something like geom_density, but I don't want a smoothed distributions for each; I want the histogram bars. Not interleaved, but plotted one on top of the other. Not stacked, although it would be interesting to know how to do this also. (The purpose of this plot is to explore differences in the shapes of the distributions that would indicate that there are quantitative differences between A and B that could be used to distinguish between them.) That's all. Two or more histograms -- not smoothed density plots -- in the same plot with each normalised to unit area. Thanks!
Something like this?
# generate example
set.seed(1)
df <- data.frame(Type=c(rep("A",1000),rep("B",4000)),
Value=c(rnorm(1000,mean=25,sd=10),rchisq(4000,15)))
# you start here...
library(ggplot2)
ggplot(df, aes(x=Value))+
geom_histogram(aes(y=..density..,fill=Type),color="grey80")+
facet_grid(Type~.)
Note that there are 4 times as many samples of type B.
You can also set the y-axis scales to float using: scales="free_y" in the call to facet_grid(...).

Plotting means in R

I want to fit my vectors x ,y to some kind of curve, but they're both about 10k long with x-values very closely packed, so a scatter plot just ends up as a huge mess. What I'd like to do is to plot the AVERAGE of the y-values corresponding to one x-value.
For example:
y=rnorm(1000)
x=c(rep(1,500),rep(2,500))
plot(x,y)
I'd like this plot to only have two single points, one for x=1 and one for x=2. Any ideas?
plot(unique(x),tapply(y,x,mean))
or maybe even
plot(tapply(x,x,unique),tapply(y,x,mean))

Axis-labeling in R histogram and density plots; multiple overlays of density plots

I have two related problems.
Problem 1: I'm currently using the code below to generate a histogram overlayed with a density plot:
hist(x,prob=T,col="gray")
axis(side=1, at=seq(0,100, 20), labels=seq(0,100,20))
lines(density(x))
I've pasted the data (i.e. x above) here.
I have two issues with the code as it stands:
the last tick and label (100) of the x-axis does not appear on the histogram/plot. How can I put these on?
I'd like the y-axis to be of count or frequency rather than density, but I'd like to retain the density plot as an overlay on the histogram. How can I do this?
Problem 2: using a similar solution to problem 1, I now want to overlay three density plots (not histograms), again with frequency on the y-axis instead of density. The three data sets are at:
http://pastebin.com/z5X7yTLS
http://pastebin.com/Qg8mHg6D
http://pastebin.com/aqfC42fL
Here's your first 2 questions:
myhist <- hist(x,prob=FALSE,col="gray",xlim=c(0,100))
dens <- density(x)
axis(side=1, at=seq(0,100, 20), labels=seq(0,100,20))
lines(dens$x,dens$y*(1/sum(myhist$density))*length(x))
The histogram has a bin width of 5, which is also equal to 1/sum(myhist$density), whereas the density(x)$x are in small jumps, around .2 in your case (512 even steps). sum(density(x)$y) is some strange number definitely not 1, but that is because it goes in small steps, when divided by the x interval it is approximately 1: sum(density(x)$y)/(1/diff(density(x)$x)[1]) . You don't need to do this later because it's already matched up with its own odd x values. Scale 1) for the bin width of hist() and 2) for the frequency of x length(x), as DWin says. The last axis tick became visible after setting the xlim argument.
To do your problem 2, set up a plot with the correct dimensions (xlim and ylim), with type = "n", then draw 3 lines for the densities, scaled using something similar to the density line above. Think however about whether you want those semi continuous lines to reflect the heights of imaginary bars with bin width 5... You see how that might make the density lines exaggerate the counts at any particular point?
Although this is an aged thread, if anyone catches this. I would only think it is a 'good idea' to forego translating the y density to count scales based on what the user is attempting to do.
There are perfectly good reasons for using frequency as the y value. One idea in particular that comes to mind is that using counts for the y scale value can give an analyst a good idea about where to begin the 'data hunt' for stratifying heterogenous data, if a mixed distribution model cannot soundly or intuitively be applied.
In practice, overlaying a density estimate over the observed histogram can be very useful in data quality checks. For example, in the above, if I were looking at the above graphic as a single source of data with the assumption that it describes "1 thing" and I wish to model this as "1 thing", I have an issue. That is, I have heterogeneous data which may require some level of stratification. The density overlay then becomes a simple visual tool for detecting heterogeneity (apart from using log transformations to smooth between-interval variation), and a direction (locations of the mixed distributions) for stratifying the data.

Resources