Fill area under a curve that does not overlap any other curves - r

This question is somewhat related to one I posted previously. However, I have narrowed down exactly what I am trying to do, and I feel this question is different enough from the previous one to warrant a new post.
I am adding multiple (>50) curves to a plot in R. Each curve has a corresponding probability (0-1). I have sorted the curves by probability and would like to shade the area under each curve with a transparency alpha weighted by probability.
I am adding the plots in a descending sequence by probability. I would like to shade just the portion under each curve that is not covered by any curves currently on the graph.
I have read many posts on shading areas between curves, or under curves, but I cannot figure out how to shade just the area not covered by any other plot on the graph. I hope this is not considered a duplicate.
Shaded area under two curves using R
Shading a kernel density plot between two points
How to make gradient color filled timeseries plot in R
Shading between curves in R
Here is an example picture (marked up in MS paint) of what I would like a final plot to look like (except without the lines inside the polygons). I used four curves in this example, but I will be adding many more when I figure this out. I added the curve with the highest response first, then each subsequent curve, shading just the portion not already filled.
In the above example I used lines to add the curves to the graph and then shaded them in MS Paint. I understand that to fill in the area under each curve I will need to use polygon with border=NA. Here is an example of how I am planning to use polygon to shade based on the response value. My current approach is to adjust the color using alpha, but if there is a more practical approach using a grayscale palette or gradient, I am open to suggestions.
polygon(x, y1,col=rgb(0,0,0,alpha=(1-wei.param[1,3])), border=NA )
I have tried several different approaches (based on the above hyperlinks) to specify the dimensions of each polygon. I can get it to work for polygons 1-3, but after that they start stacking on top of each other.
Here are example data and code to reproduce the plots.
diameters<-c(rep(1.5,393),3,3,3,3,3.1,3.1,3.1,3.2,3.2,3.2,3.3,3.4,3.4,3.4,3.4,3.4,
3.4,3.4,3.4,3.5,3.5,3.6,3.6,3.7,3.7,3.7,3.7,3.8,3.8,3.8,3.8,3.8,3.8,
3.9,3.9,4,4,4,4.1,4.2,4.2,4.2,4.2,4.3,4.3,4.4,4.49,4.5,4.5,4.6,4.7,
4.7,4.7,4.8,4.9,4.9,4.9,5,5,5,5,5.1,5.1,5.2,5.3,5.4,5.4,5.6,5.7,5.7,
5.7,5.8,6,6,6,6.3,6.4,6.6,6.9,6.9,6.9,7,7.1,7.2,7.4,7.4,7.7,7.8,7.9,
7.9,8.2,8.5,8.5,8.9,9.2,10.2,10.47,10.5,10.7,11.7,13.2,13.5,14.4,14.5,
14.5,15.1,18.4)
wei.param<-matrix(data=NA,nrow=5,ncol=3,dimnames = list(c(),c("shape", "scale", "prob")))
wei.param[,1]<-c(1.834682,2.720390,3.073429,1.9,1.9)
wei.param[,2]<-c(2.78,2.78,2.78,1.6,2.8710692)
wei.param[,3]<-c(0.49, 0.46, 0.26, 0.26, 0.07)
x=seq(0,20,1)
y1<-dweibull(x,shape=wei.param[1,1],scale=wei.param[1,2])
y2<-dweibull(x,shape=wei.param[2,1],scale=wei.param[2,2])
y3<-dweibull(x,shape=wei.param[3,1],scale=wei.param[3,2])
y4<-dweibull(x,shape=wei.param[4,1],scale=wei.param[4,2])
#Plot
hist(diameters,freq=F,main='',ylim=c(0,.5))
polygon(x, y1,col=rgb(0,0,0,alpha=(1-wei.param[1,3])), border=NA )
lines(x, y1)
lines(x, y2)
lines(x, y3)
lines(x, y4)

I think this is what you want:
I don't know how to do this with base R graphics, but here's the code for ggplot2, which I know better. Note that ggplot2 requires data to be input as a data.frame. Also, I created a second probability column so that I could group the polygons with ggplot2.
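df <- data.frame(x = rep(x, 4), y = c(y1, y2, y3, y4),
                 Prob = c(
                   rep(wei.param[1,3], length(y1)),
                   rep(wei.param[2,3], length(y2)),
                   rep(wei.param[3,3], length(y3)),
                   rep(wei.param[4,3], length(y4))))
# Grouping factor for the polygons; note that curves 3 and 4 share the
# same probability (0.26), so they end up in the same group.
df$Prob2 = as.factor(df$Prob)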
library(scales) # needed for alpha function with ggplot2
library(ggplot2)
example <- ggplot() +
geom_histogram(aes(x = diameters, y = ..density..),
prob = TRUE, fill = alpha('white', 0), color = 'black') +
geom_polygon(data = df, aes( x = x, y = y), color = 'white',
fill = 'white') +
geom_polygon(data = df, aes( x = x, y = y, alpha = Prob,
group = Prob2)) +
geom_polygon() + theme_bw()
ggsave('example.jpg', example, width = 6, height = 4)
You should be able to do a similar trick with base R. All you need to do is plot the white polygons over your histogram, but under your shaded polygons. If you decide to use my ggplot2 code you'll probably want to tweak bin width (see ?geom_histogram for details about how to do this).
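For base R, here is a minimal sketch of one way to shade only the uncovered part of each curve. It is not the white-polygon trick described above but a swapped-in alternative: it keeps a running upper envelope of the curves drawn so far (env, a name introduced here) and fills only the band between that envelope and the new curve. The finer x grid, the plain plot() call standing in for the histogram, and the 1 - prob alpha (borrowed from the question's polygon call) are assumptions.

```r
# Parameters copied from the question (first four curves): shape, scale, prob
wei.param <- cbind(shape = c(1.834682, 2.720390, 3.073429, 1.9),
                   scale = c(2.78, 2.78, 2.78, 1.6),
                   prob  = c(0.49, 0.46, 0.26, 0.26))

x <- seq(0, 20, by = 0.1)  # assumed finer grid than in the question

# One column of densities per curve, in the order the curves are added
ys <- sapply(seq_len(nrow(wei.param)), function(i)
  dweibull(x, shape = wei.param[i, "shape"], scale = wei.param[i, "scale"]))

# Empty plot standing in for hist(diameters, ...)
plot(NA, xlim = range(x), ylim = c(0, 0.5), xlab = "diameter", ylab = "density")

env <- rep(0, length(x))  # upper envelope of everything shaded so far
for (i in seq_len(ncol(ys))) {
  top <- pmax(env, ys[, i])  # new envelope once curve i is included
  # Fill only the band between the old and new envelope
  polygon(c(x, rev(x)), c(top, rev(env)),
          col = rgb(0, 0, 0, alpha = 1 - wei.param[i, "prob"]), border = NA)
  env <- top
}
```

Because each polygon covers only the band between the old and new envelope, no region is painted twice, so the alpha values do not stack however many curves are added.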

Related

move ggplot2 contour from other facets to main

I have x,y,z data with categorical variables that facilitate a facet. I want to include contour lines from all but the first facet and discard the rest of the data. One way to visualize the process is to facet the data and mentally move the contours from the other facets to the first.
MWE:
library(ggplot2)
library(dplyr)
data(volcano)
nx <- 87; ny <- 61
vdat <- data_frame(w=0L, x=rep(seq_len(nx), ny), y=rep(seq_len(ny), each=nx), z=c(volcano))
vdat <- bind_rows(vdat,
mutate(vdat, w=1L, x=x+4, y=y+4, z=z-20),
mutate(vdat, w=2L, x=x+8, y=y+8, z=z-40))
ggplot(vdat, aes(x, y, fill=z)) +
geom_tile() +
facet_wrap(~ w, nrow=1) +
geom_contour(aes(z=z), color='white', breaks=c(-Inf,110,Inf))
In each facet, I have:
facet 0: X,Y,Z for w==0L, contour for w==0L
facet 1: X,Y,Z for w==1L, contour for w==1L
facet 2: X,Y,Z for w==2L, contour for w==2L
What I'd like to have is a single pane, effectively:
X,Y,Z for w==0L, contour for all values of the w categorical
(Forgive my hasty GIMP skills. In the real data, the contours will likely not overlap, but I don't think that that would be a problem.)
The real data has different values (and gradients) of z for the same X,Y system, so the contour is otherwise compatible with the first facet. However, it's still "different", so I cannot mock-up the contours with the single w==0L data.
I imagine there might be a few ways to do this:
form the data "right" the first time, informing ggplot how to pull the contours but lay them on the single plot (e.g., using different data= for certain layers);
form the faceted plot, extract the contours from the other facets, apply them to the first, and discard the other facets (perhaps using grid and/or gtable); or perhaps
(mathematically calculate the contours myself and add them as independent lines; I was hoping to re-use ggplot2's efforts to avoid this ...).
It doesn't fit so neatly with the grammar of graphics, but you can just add a geom_contour call for each subset of data. A quick way is to add a list of such calls to the graph, which you can generate quickly by lapplying across the split data:
ggplot(vdat[vdat$w == 0, ], aes(x, y, z = z, fill = z)) +
geom_tile() +
lapply(split(vdat, vdat$w), function(dat){
geom_contour(data = dat, color = 'white', breaks = c(-Inf, 110, Inf))
})
You can even make a legend, if you need:
ggplot(vdat[vdat$w == 0, ], aes(x, y, z = z, fill = z, color = factor(w))) +
geom_raster() +
lapply(split(vdat, vdat$w), function(dat){
geom_contour(data = dat, breaks = c(-Inf, 110, Inf))
})

R: Density plot with colors by group?

I have data from 2 populations.
I'd like to get the histogram and density plot of both on the same graphic.
With one color for one population and another color for the other one.
I've tried this (example):
library(ggplot2)
AA <- rnorm(100000, 70,20)
BB <- rnorm(100000,120,20)
valores <- c(AA,BB)
grupo <- c(rep("AA", 100000),c(rep("BB", 100000)))
todo <- data.frame(valores, grupo)
ggplot(todo, aes(x=valores, fill=grupo, color=grupo)) +
geom_histogram(aes(y=..density..), binwidth=3)+ geom_density(aes(color=grupo))
But I'm just getting a graphic with a single line and a single color.
I would like to have different colors for the two density lines, and if possible for the histograms as well.
I've done it with ggplot2 but base R would also be OK.
or I don't know what I've changed and now I get this:
ggplot(todo, aes(x=valores, fill=grupo, color=grupo)) +
geom_histogram( position="identity", binwidth=3, alpha=0.5)+
geom_density(aes(color=grupo))
but the density lines were not plotted.
or even strange things like
I suggest this ggplot2 solution:
ggplot(todo, aes(valores, color=grupo)) +
geom_histogram(position="identity", binwidth=3, aes(y=..density.., fill=grupo), alpha=0.5) +
geom_density()
@skan: Your attempt was close, but you plotted the frequencies instead of the density values in the histogram.
A base R solution could be:
hist(AA, probability = T, col = rgb(1,0,0,0.5), border = rgb(1,0,0,1),
xlim=range(AA,BB), breaks= 50, ylim=c(0,0.025), main="AA and BB", xlab = "")
hist(BB, probability = T, col = rgb(0,0,1,0.5), border = rgb(0,0,1,1), add=T)
lines(density(AA))
lines(density(BB), lty=2)
For alpha I used rgb, but there are other ways to set it; see alpha() in the scales package, for instance. I also added the breaks parameter to the plot of AA to get more (and therefore narrower) bins than the default used for the BB group.

Interpolating correctly between points in R using ggplot2 and axis scaling

I have some data I want to graph on a semi-log scale, however I get some artifacts when there is a large jump between points. On linear scale, a straight line is drawn between subsequent points, which is a fine approximation for visualization. However, the exact same thing is done when using the log scale (either by using scale_x_log10 or scale_x_continuous with a log transformation). A line between two points on the semi-log scale should show up curved. In other words, this:
df <- data.frame(x = c(0, 1), y = c(0, 1))
ggplot(data = df, aes(x, y)) + geom_line() + scale_x_log10(limits = c(10^-3, 10^0))
produces this:
when I would expect something more like this:
generated by this code:
df <- data.frame(x = seq(0, 1, 0.01), y = seq(0, 1, 0.01))
ggplot(data = df, aes(x, y)) + geom_line() + scale_x_log10(limits = c(10^-3, 10^0))
It's clear what's happening, but I'm not sure what the best way to fix the interpolation is. In the actual data I'm plotting there are a few jumps at various points, which makes the plots very misleading when trying to compare two lines. (They're ROC curves in this instance.)
One thought is I can search the data for jumps and fill in some interpolated points myself, but I'm hoping for a cleaner way that doesn't involve me adding in a bunch of fake data points.
What you describe is a transformation of the coordinate system, not a transformation of the scales. The distinction is that scale transformations take place before any statistical transformations, and coordinate transformations take place afterward. In this case, the "statistical transformation" is "draw a straight line between the points". With a transformed scale, the line is straight in the transformed (log) space; with a transformed coordinate, it is straight in the original (linear) space and therefore curved in log space.
# don't include 0 in the data because log 0 is -Inf
DF <- data.frame(x = c(0.1, 1), y = c(0.1, 1))
ggplot(data = DF, aes(x = x, y = y)) +
geom_line() +
coord_trans(x="log10")
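The difference between the two kinds of transformation can be checked with a little arithmetic. The sketch below uses the two points from the example above and asks what y value each approach draws at the visual midpoint of the log axis; the variable names are introduced here for illustration only.

```r
# End points of the segment, as in DF above
x0 <- 0.1; x1 <- 1
y0 <- 0.1; y1 <- 1

# Visual midpoint of the segment on a log10 x axis (the geometric mean)
x_mid <- 10^((log10(x0) + log10(x1)) / 2)

# Transformed scale: the line is straight in log space, so halfway
# along the log axis it is halfway between y0 and y1
y_scale <- (y0 + y1) / 2

# Transformed coordinate: the line is straight in the original space,
# so y is interpolated linearly in the untransformed x
y_coord <- y0 + (y1 - y0) * (x_mid - x0) / (x1 - x0)

print(c(x_mid = x_mid, y_scale = y_scale, y_coord = y_coord))
```

Since the data lie on y = x, the coordinate-transformed line passes through y = x_mid at that position, while the scale-transformed line passes through 0.55, which is the artifact described in the question.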

ggplot2 fails to draw curved line

I am trying to draw a curved line in ggplot2 which should look like this:
However, in ggplot2 I can only draw in the line in the following way:
Here is the code that I have used to create both pictures:
df1 <- data.frame(dollar = c(0,5,10,20,30), value = c(0,200,300, -100, -300))
# draw line graph with base plot
plot(y = df1$dollar, x = df1$value, type = "l")
# draw line graph with ggplot
ggplot() + geom_line(data = df1, aes(y = dollar, x = value), size =1)
Ggplot2 seems to order the data frame according to x value and then connect the points according to the x-value. However, I do not want my graph to be ordered.
Additionally, I do not want to flip the axis around, since dollar value must appear on the y-axis. Since I prefer to create these graphs in ggplot2, does anyone know how to accomplish this?
You just need to swap geom_line to geom_path. As noted in the documentation, geom_path connects "observations in original order", while geom_line connects "observations, ordered by x value".
So the last line would be
ggplot() + geom_path(data = df1, aes(y = dollar, x = value), size =1)
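To see why the two geoms differ for this data, it can help to compare the order in which the rows get connected. This is only an illustrative sketch in base R (path_order and line_order are names introduced here), relying on the documented behaviour that geom_line connects observations ordered by x while geom_path keeps the original order.

```r
df1 <- data.frame(dollar = c(0, 5, 10, 20, 30),
                  value  = c(0, 200, 300, -100, -300))

# geom_path: rows are connected as they appear in the data frame
path_order <- seq_len(nrow(df1))

# geom_line: rows are connected in order of the x aesthetic (value)
line_order <- order(df1$value)

print(path_order)  # 1 2 3 4 5
print(line_order)  # 5 4 1 2 3
```

The two orders differ because value is not monotonic, which is why geom_line appears to scramble the curve.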

Conditional graphing and fading colors

I am trying to create a graph where because there are so many points on the graph, at the edges of the green it starts to fade to black while the center stays green. The code I am currently using to create this graph is:
plot(snb$px,snb$pz,col=snb$event_type,xlim=c(-2,2),ylim=c(1,6))
I looked into contour plotting but that did not work for this. The coloring variable is a factor variable.
Thanks!
This is a great problem for ggplot2.
First, read the data in:
snb <- read.csv('MLB.csv')
With your data frame you could try plotting points that are partly transparent, and setting them to be colored according to the factor event_type:
require(ggplot2)
p1 <- ggplot(data = snb, aes(x = px, y = pz, color = event_type)) +
geom_point(alpha = 0.5)
print(p1)
and then you get this:
Or, you might want to think about plotting this as a heatmap using geom_bin2d(), and plotting facets (subplots) for each different event_type, like this:
p2 <- ggplot(data = snb, aes(x = px, y = pz)) +
geom_bin2d(binwidth = c(0.25, 0.25)) +
facet_wrap(~ event_type)
print(p2)
which makes a plot for each level of the factor, where the color shows the number of data points in each bin, and each bin is 0.25 units on a side. But if you have more than about 5 or 6 levels, this might look pretty bad. From the small data sample you supplied, I got this:
If the levels of the factors don't matter, there are some nice examples here of plots with too many points. You could also try looking at some of the examples on the ggplot website or the R cookbook.
Transparency could help, which is easily achieved, as @BenBolker points out, with adjustcolor:
# one semi-transparent color per level of event_type
colvec = adjustcolor(c("black", "green"), alpha.f = 0.2)
plot(snb$px, snb$pz,
     col = colvec[snb$event_type],
     xlim = c(-2,2),
     ylim = c(1,6))
It's built in to ggplot:
require(ggplot2)
p <- ggplot(data = snb, aes(x = px, y = pz, color = event_type)) +
geom_point(alpha = 0.2)
print(p)

Resources