I have data from two populations.
I'd like to plot the histogram and density of both on the same graphic,
with one color for one population and another color for the other.
I've tried this (example):
library(ggplot2)
AA <- rnorm(100000, 70,20)
BB <- rnorm(100000,120,20)
valores <- c(AA,BB)
grupo <- c(rep("AA", 100000),c(rep("BB", 100000)))
todo <- data.frame(valores, grupo)
ggplot(todo, aes(x=valores, fill=grupo, color=grupo)) +
geom_histogram(aes(y=..density..), binwidth=3)+ geom_density(aes(color=grupo))
But I'm just getting a graphic with a single line and a single color.
I would like to have different colors for the two density lines and, if possible, for the histograms as well.
I've done it with ggplot2, but base R would also be OK.
Or, after some change I can't pin down, I now get this:
ggplot(todo, aes(x=valores, fill=grupo, color=grupo)) +
geom_histogram( position="identity", binwidth=3, alpha=0.5)+
geom_density(aes(color=grupo))
but the density lines are not plotted.
I suggest this ggplot2 solution:
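library(ggplot2)
# after_stat(density) replaces the older ..density.. notation (ggplot2 >= 3.3.0)
ggplot(todo, aes(valores, color=grupo)) +
  geom_histogram(position="identity", binwidth=3, aes(y=after_stat(density), fill=grupo), alpha=0.5) +
  geom_density()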
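@skan: Your attempt was close, but you plotted frequencies (counts) instead of density values in the histogram. A density curve integrates to 1, so on a count scale (thousands per bin here) it gets squashed flat against the x-axis; mapping the histogram's y to density puts both layers on the same scale.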
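To see the size of that mismatch, here is a minimal base R sketch that bins one population with width-3 bins (mirroring binwidth=3) and compares the two y scales a histogram can use. The seed and the explicit break vector are assumptions added only to make the example reproducible.

```r
set.seed(1)                        # assumed seed, for reproducibility only
AA <- rnorm(100000, 70, 20)        # same population as in the question

# Width-3 bins covering the whole sample (mirrors binwidth = 3)
brks <- seq(floor(min(AA)), ceiling(max(AA)) + 3, by = 3)
h <- hist(AA, breaks = brks, plot = FALSE)

max(h$counts)    # counts per bin: thousands
max(h$density)   # densities: a few hundredths, like the peak of density(AA)

# Densities integrate to 1 over the bins; counts sum to the sample size
sum(h$density * diff(h$breaks))
sum(h$counts)
```

A density line drawn on the count scale therefore peaks at a height of about 0.02 while the bars reach into the thousands.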
A base R solution could be:
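# AA: semi-transparent red bars; xlim spans both groups so BB fits when added
hist(AA, probability = TRUE, col = rgb(1,0,0,0.5), border = rgb(1,0,0,1),
     xlim = range(AA,BB), breaks = 50, ylim = c(0,0.025), main = "AA and BB", xlab = "")
# BB: semi-transparent blue bars drawn on top of the existing plot
hist(BB, probability = TRUE, col = rgb(0,0,1,0.5), border = rgb(0,0,1,1), add = TRUE)
# Kernel density estimates: solid line for AA, dashed line for BB
lines(density(AA))
lines(density(BB), lty = 2)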
For transparency I used rgb(), but there are other ways to set it, for instance alpha() in the scales package. I also added the breaks parameter to the AA histogram; it raises the number of bins, so AA gets narrower bins than the BB group, which uses the default.
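A small sketch of both points: three equivalent ways to build the same semi-transparent red (adjustcolor() ships with base R's grDevices; the scales line is commented out in case that package isn't installed), and a check that breaks = 50 produces narrower bins than the default Sturges rule. The seed is an assumption for reproducibility.

```r
# Three ways to get the same semi-transparent red
c1 <- rgb(1, 0, 0, 0.5)                  # as in the answer above
c2 <- adjustcolor("red", alpha.f = 0.5)  # base R (grDevices)
# c3 <- scales::alpha("red", 0.5)        # needs the scales package
c1; c2   # both "#FF000080"

# breaks = 50 asks for about 50 bins, so each bin gets narrower
set.seed(1)
AA <- rnorm(100000, 70, 20)
w_default <- diff(hist(AA, plot = FALSE)$breaks)[1]              # Sturges rule
w_50      <- diff(hist(AA, breaks = 50, plot = FALSE)$breaks)[1] # finer bins
c(default = w_default, breaks50 = w_50)
```

Note that a single number for breaks is only a suggestion: hist() passes it to pretty(), so the final bin count is rounded to "nice" break points.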
This question somewhat relates to a question I previously posted. However, I have narrowed down exactly what I am trying to do, and I feel this question is different enough from my previous one to warrant a new post.
I am adding multiple (>50) curves to a plot in R. Each curve has a corresponding probability (0-1). I have sorted the curves by probability and would like to shade the area under each curve with a transparency alpha weighted by probability.
I am adding the plots in a descending sequence by probability. I would like to shade just the portion under each curve that is not covered by any curves currently on the graph.
I have read many posts on shading areas between curves, or under curves, but I cannot figure out how to shade just the area not covered by any other plot on the graph. I hope this is not considered a duplicate.
Shaded area under two curves using R
Shading a kernel density plot between two points.
How to make gradient color filled timeseries plot in R
Shading between curves in R
Here is an example picture (marked up in MS Paint) of what I would like the final plot to look like (except without the lines inside the polygons). I used four curves in this example, but I will be adding many more once I figure this out. I added the curve with the highest response first, then each subsequent curve, shading just the portion not already filled.
In the above example I used lines() to add the curves to the graph and then shaded them in MS Paint. I understand that to fill in the area under each curve I will need to use polygon() with border=NA. Here is an example of how I am planning to use polygon() to shade based on the response value. My current approach is to adjust the color using alpha, but if there is a more practical approach using a grayscale palette or gradient, I am open to suggestions.
polygon(x, y1,col=rgb(0,0,0,alpha=(1-wei.param[1,3])), border=NA )
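Since a grayscale palette was mentioned, here is a sketch of two ways to turn the probabilities into fill colors. It assumes higher probability should look darker (the line above uses 1 - probability, so flip it if that was intended); the vector probs is just the prob column of wei.param copied out.

```r
probs <- c(0.49, 0.46, 0.26, 0.26, 0.07)   # the "prob" column of wei.param

# Option 1: one color, opacity proportional to probability
cols_alpha <- rgb(0, 0, 0, alpha = probs)

# Option 2: opaque gray levels, darker for higher probability
# (gray(0) is black, gray(1) is white)
cols_gray <- gray(1 - probs)

# Either vector is indexed per curve, e.g.
# polygon(x, y1, col = cols_gray[1], border = NA)
cols_alpha
cols_gray
```

With opaque grays nothing shows through, so the drawing order alone decides which curve is visible where; with alpha, overlapping regions darken.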
I have tried several different approaches (based on the above hyperlinks) to specify the dimensions of each polygon. I can get it to work for polygons 1-3, but after that they start stacking on top of each other.
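One common cause of this: polygon(x, y) closes the shape by joining the last point straight back to the first, so it only fills between a curve and that chord. To fill the band between a lower and an upper curve, trace the upper curve left to right and the lower one right to left. A sketch with a hypothetical helper, fill_between (not a base R function), using the first two Weibull curves from below on an assumed finer grid:

```r
# Hypothetical helper: fill the band between `lower` and `upper` over x
fill_between <- function(x, lower, upper, ...) {
  lower <- rep_len(lower, length(x))   # allow a scalar baseline such as 0
  upper <- rep_len(upper, length(x))
  xs <- c(x, rev(x))                   # left to right, then right to left
  ys <- c(upper, rev(lower))           # along the upper edge, back along the lower
  polygon(xs, ys, ...)
  invisible(list(x = xs, y = ys))      # return the vertices for inspection
}

x  <- seq(0, 20, by = 0.1)
y1 <- dweibull(x, shape = 1.834682, scale = 2.78)
y2 <- dweibull(x, shape = 2.720390, scale = 2.78)

plot(x, y1, type = "n", ylim = c(0, 0.5), xlab = "x", ylab = "density")
fill_between(x, 0, y1, col = "grey70", border = NA)                  # under y1
v <- fill_between(x, y1, pmax(y1, y2), col = "grey30", border = NA)  # y2 above y1 only
```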
Here are example data and code to reproduce the plots.
diameters<-c(rep(1.5,393),3,3,3,3,3.1,3.1,3.1,3.2,3.2,3.2,3.3,3.4,3.4,3.4,3.4,3.4,
3.4,3.4,3.4,3.5,3.5,3.6,3.6,3.7,3.7,3.7,3.7,3.8,3.8,3.8,3.8,3.8,3.8,
3.9,3.9,4,4,4,4.1,4.2,4.2,4.2,4.2,4.3,4.3,4.4,4.49,4.5,4.5,4.6,4.7,
4.7,4.7,4.8,4.9,4.9,4.9,5,5,5,5,5.1,5.1,5.2,5.3,5.4,5.4,5.6,5.7,5.7,
5.7,5.8,6,6,6,6.3,6.4,6.6,6.9,6.9,6.9,7,7.1,7.2,7.4,7.4,7.7,7.8,7.9,
7.9,8.2,8.5,8.5,8.9,9.2,10.2,10.47,10.5,10.7,11.7,13.2,13.5,14.4,14.5,
14.5,15.1,18.4)
wei.param<-matrix(data=NA,nrow=5,ncol=3,dimnames = list(c(),c("shape", "scale", "prob")))
wei.param[,1]<-c(1.834682,2.720390,3.073429,1.9,1.9)
wei.param[,2]<-c(2.78,2.78,2.78,1.6,2.8710692)
wei.param[,3]<-c(0.49, 0.46, 0.26, 0.26, 0.07)
x=seq(0,20,1)
y1<-dweibull(x,shape=wei.param[1,1],scale=wei.param[1,2])
y2<-dweibull(x,shape=wei.param[2,1],scale=wei.param[2,2])
y3<-dweibull(x,shape=wei.param[3,1],scale=wei.param[3,2])
y4<-dweibull(x,shape=wei.param[4,1],scale=wei.param[4,2])
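Since more than 50 curves are planned, separate y1, y2, ... variables will get unwieldy. A sketch that assumes the wei.param layout above (one row per curve) and keeps every curve as a column of one matrix:

```r
wei.param <- matrix(data = NA, nrow = 5, ncol = 3,
                    dimnames = list(c(), c("shape", "scale", "prob")))
wei.param[, 1] <- c(1.834682, 2.720390, 3.073429, 1.9, 1.9)
wei.param[, 2] <- c(2.78, 2.78, 2.78, 1.6, 2.8710692)
wei.param[, 3] <- c(0.49, 0.46, 0.26, 0.26, 0.07)
x <- seq(0, 20, 1)

# Column k holds the Weibull density for row k of wei.param
ys <- sapply(seq_len(nrow(wei.param)), function(k)
  dweibull(x, shape = wei.param[k, "shape"], scale = wei.param[k, "scale"]))
dim(ys)   # length(x) rows, one column per curve

# Draw every curve in one call instead of one lines() call per curve
matplot(x, ys, type = "l", lty = 1, col = "black")
```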
#Plot
hist(diameters,freq=F,main='',ylim=c(0,.5))
polygon(x, y1,col=rgb(0,0,0,alpha=(1-wei.param[1,3])), border=NA )
lines(x, y1)
lines(x, y2)
lines(x, y3)
lines(x, y4)
I think this is what you want:
I don't know how to do this with base R graphics, but here's the code for ggplot2, which I know better. Note that ggplot2 requires the data as a data.frame. Also, I created a grouping column (Prob2) so that ggplot2 draws each curve as its own polygon.
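# ggplot2 needs a data.frame: stack the four curves in long format
df <- data.frame(x = rep(x, 4), y = c(y1, y2, y3, y4),
                 Prob = rep(wei.param[1:4, 3], each = length(x)))
# Group by curve, not by Prob: curves 3 and 4 share the same
# probability (0.26), so as.factor(df$Prob) would merge them into one polygon
df$Prob2 <- factor(rep(1:4, each = length(x)))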
library(ggplot2)
example <- ggplot() +
  # Outline-only density histogram of the observed diameters
  geom_histogram(data = data.frame(diameters),
                 aes(x = diameters, y = after_stat(density)),
                 fill = NA, color = 'black') +
  # Opaque white polygons hide the histogram lines under the curves
  geom_polygon(data = df, aes(x = x, y = y, group = Prob2),
               color = 'white', fill = 'white') +
  # Shaded polygons, transparency mapped from Prob
  geom_polygon(data = df, aes(x = x, y = y, alpha = Prob, group = Prob2)) +
  theme_bw()
ggsave('example.jpg', example, width = 6, height = 4)
You should be able to do a similar trick with base R: plot the white polygons over your histogram, but under your shaded polygons. If you decide to use my ggplot2 code, you'll probably want to tweak the bin width (see the binwidth and bins arguments in ?geom_histogram).
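For base R, here is a minimal sketch of a different technique from the white-polygon trick: keep a running upper envelope (pmax()) of the curves drawn so far and shade only the strip where the next curve rises above it, so strips never overlap. It assumes the curves are already sorted by descending probability, uses the first four parameter sets from the question on a finer x grid (by = 0.1), and sets alpha = probability (use 1 - probability if that was the intent).

```r
# First four Weibull curves from the question, sorted by descending probability
x     <- seq(0, 20, by = 0.1)           # finer grid (assumed) for smoother edges
shape <- c(1.834682, 2.720390, 3.073429, 1.9)
scale <- c(2.78, 2.78, 2.78, 1.6)
prob  <- c(0.49, 0.46, 0.26, 0.26)
ys    <- mapply(function(s, l) dweibull(x, shape = s, scale = l), shape, scale)

plot(x, ys[, 1], type = "n", ylim = range(0, ys),
     xlab = "diameter", ylab = "density")

covered <- rep(0, length(x))            # upper edge of everything shaded so far
bands   <- vector("list", ncol(ys))
for (k in seq_len(ncol(ys))) {
  top <- pmax(covered, ys[, k])         # curve k where it rises above the cover
  polygon(c(x, rev(x)), c(top, rev(covered)),
          col = rgb(0, 0, 0, alpha = prob[k]), border = NA)
  bands[[k]] <- top - covered           # height of the newly shaded strip
  covered <- top
}
```

Because each strip starts where the previous cover ends, the strips tile the area under the highest curve exactly once, and each strip keeps its own transparency.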