I am trying to plot both geom_histogram and geom_density in one figure. When I plot the two separately, each gives the output I want (a histogram and a density plot), but when I combine them only the histogram is shown, regardless of the order of the histogram and density layers in the code.
My code looks like this:
ggplot(data=Stack_time, aes(x=values))+geom_density(alpha=0.2, fill="#FF6666")+
geom_histogram(binwidth = 50, colour="black", fill="#009454")
I do not receive any error message, but the geom_density is never shown in combination with the geom_histogram.
Since you did not provide any data, here is a solution based on mtcars:
Your code is nearly correct. You need to add an alpha value to your histogram so you can see the density behind it. You also need to scale your data: the histogram's y-axis shows counts, while the density curve has unit area, so its height shrinks as the range of the data grows. If your values span a range much larger than 1, the density curve becomes tiny and you can't see it next to the bars. With the function scale_data defined below, I rescale my data to the range 0-1.
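library(ggplot2)
df <- mtcars
# Rescale a numeric vector linearly to the range 0-1
scale_data <- function(x){(x-min(x))/(max(x)-min(x))}
df$mpg2 <- scale_data(df$mpg)
# binwidth has to fit the rescaled 0-1 range (a binwidth of 50 would give one single bar)
ggplot(data=df, aes(x=mpg2))+geom_density(alpha=0.2, fill="#FF6666")+
geom_histogram(binwidth = 0.1, colour="black", fill="#009454", alpha = 0.1)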
This gives the expected output:
You can adjust this solution to your needs: either scale the data, or scale the density plot to the data.
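The second option, scaling the density curve to the data instead of rescaling the data, could be sketched as follows. This is only a minimal example, assuming ggplot2 3.3.0 or later (for after_stat()); the bin width of 2 and the use of mtcars$mpg are illustrative choices, not part of the original answer. stat_density provides a computed variable count (density times the number of observations), and multiplying it by the bin width puts the curve on the same scale as the histogram counts.

```r
library(ggplot2)

bw <- 2  # illustrative bin width, in the units of mpg

p <- ggplot(mtcars, aes(x = mpg)) +
  # histogram on its default count scale
  geom_histogram(binwidth = bw, colour = "black", fill = "#009454", alpha = 0.4) +
  # density curve rescaled to counts: density * n * binwidth
  geom_density(aes(y = after_stat(count * bw)), fill = "#FF6666", alpha = 0.2)

print(p)
```

With this approach the x-axis keeps the original units, which may be easier to read than values rescaled to 0-1.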
This should do the job, approximately:
library(ggplot2)
library(dplyr)  # provides the %>% pipe
data.frame(x=rnorm(1000)) %>% ggplot(aes(x, ..density..)) + geom_histogram(binwidth = 0.2, alpha=0.5) + geom_density(fill="red", alpha=0.2)
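Mapping y to ..density.. works because both layers then share one scale: the bar heights are rescaled so that the bar areas sum to 1, matching the (approximately) unit area under the density curve. A small base R check of that property, using hist() and density() rather than ggplot2, with an arbitrary seed and break grid chosen only for illustration:

```r
set.seed(1)  # arbitrary seed, for reproducibility only
x <- rnorm(1000)

# bins of width 0.2 that cover the whole sample
breaks <- seq(floor(min(x)), ceiling(max(x)), by = 0.2)
h <- hist(x, breaks = breaks, plot = FALSE)

# on the density scale, height * width summed over all bins is 1
bar_area <- sum(h$density * diff(h$breaks))

# the kernel density estimate also has approximately unit area
d <- density(x)
curve_area <- sum(d$y) * (d$x[2] - d$x[1])

print(c(bar_area = bar_area, curve_area = curve_area))
```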
I have data from 2 populations.
I'd like to get the histogram and density plot of both on the same graphic.
With one color for one population and another color for the other one.
I've tried this (example):
library(ggplot2)
AA <- rnorm(100000, 70,20)
BB <- rnorm(100000,120,20)
valores <- c(AA,BB)
grupo <- c(rep("AA", 100000),c(rep("BB", 100000)))
todo <- data.frame(valores, grupo)
ggplot(todo, aes(x=valores, fill=grupo, color=grupo)) +
geom_histogram(aes(y=..density..), binwidth=3)+ geom_density(aes(color=grupo))
But I'm just getting a graphic with a single line and a single color.
I would like to have different colors for the two density lines, and if possible for the histograms as well.
I've done it with ggplot2 but base R would also be OK.
Then I changed something (I am not sure what) and now I get this:
ggplot(todo, aes(x=valores, fill=grupo, color=grupo)) +
geom_histogram( position="identity", binwidth=3, alpha=0.5)+
geom_density(aes(color=grupo))
but the density lines were not plotted, and some attempts gave even stranger results.
I suggest this ggplot2 solution:
ggplot(todo, aes(valores, color=grupo)) +
geom_histogram(position="identity", binwidth=3, aes(y=..density.., fill=grupo), alpha=0.5) +
geom_density()
@skan: Your attempt was close, but you plotted the frequencies instead of the density values in the histogram.
A base R solution could be:
hist(AA, probability = TRUE, col = rgb(1,0,0,0.5), border = rgb(1,0,0,1),
xlim=range(AA,BB), breaks= 50, ylim=c(0,0.025), main="AA and BB", xlab = "")
hist(BB, probability = TRUE, col = rgb(0,0,1,0.5), border = rgb(0,0,1,1), add=TRUE)
lines(density(AA))
lines(density(BB), lty=2)
For the transparency I used the alpha argument of rgb(), but there are other ways to set it; see alpha() in the scales package, for instance. I also added the breaks parameter to the plot of the AAs, which asks for more (and therefore narrower) bins than the default used for the BB group.
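As a brief illustration of those alternatives, the sketch below builds the same half-transparent red in two ways. adjustcolor() is from grDevices, which ships with base R; it is not mentioned in the answer and is used here only as an assumption-free stand-in when scales is not installed. The scales::alpha() call is guarded so the snippet still runs without that package.

```r
# rgb() with a fourth argument: red, green, blue and alpha, each in 0-1
red_rgb <- rgb(1, 0, 0, 0.5)

# adjustcolor() from grDevices adds transparency to a named colour
red_adj <- adjustcolor("red", alpha.f = 0.5)

# both calls should describe the same half-transparent red
print(c(red_rgb, red_adj))

# scales::alpha() is the function named in the answer; only run if installed
if (requireNamespace("scales", quietly = TRUE)) {
  print(scales::alpha("red", 0.5))
}
```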
I have some code that plots a histogram of some values, along with a few horizontal lines that represent reference points to compare against. However, ggplot is not generating a legend for the lines.
library(ggplot2)
library(dplyr)
## Simulate an equal mix of uniform and non-uniform observations on [0,1]
x <- data.frame(PValue=c(runif(500), rbeta(500, 0.25, 1)))
y <- c(Uniform=1, NullFraction=0.5) %>% data.frame(Line=names(.) %>% factor(levels=unique(.)), Intercept=.)
ggplot(x) +
aes(x=PValue, y=..density..) + geom_histogram(binwidth=0.02) +
geom_hline(aes(yintercept=Intercept, group=Line, color=Line, linetype=Line),
data=y, alpha=0.5)
I even tried reducing the problem to just plotting the lines:
ggplot(y) +
geom_hline(aes(yintercept=Intercept, color=Line)) + xlim(0,1)
and I still don't get a legend. Can anyone explain why my code isn't producing plots with legends?
In the ggplot2 version this was written for, show_guide = FALSE was the default for geom_hline; if you turn it on, the legend will appear. From ggplot2 2.0.0 onwards the argument is named show.legend (show_guide is deprecated), which is what the code below uses. Also, alpha needs to be inside aes(), otherwise the colours of the lines will not be drawn properly in the legend. The code looks like this:
ggplot(x) +
aes(x=PValue, y=..density..) + geom_histogram(binwidth=0.02) +
geom_hline(aes(yintercept=Intercept, colour=Line, linetype=Line, alpha=0.5),
data=y, show.legend=TRUE)
And output:
I would like to use ggplot2 to draw a lattice plot of densities produced by different methods, in which the same y-axis scale is used throughout.
I would like to set the upper limit of the y axis to a value below the highest density value for any one method. However ggplot by default removes sections of the geom that are outside of the plotted region.
For example:
# Toy example of problem
xval <- rnorm(10000)
#Base1
plot(density(xval))
#Base2
plot(density(xval), ylim=c(0, 0.3)) # densities > 0.3 not removed from plot
xval <- as.data.frame(xval)
ggplot(xval, aes(x=xval)) + geom_density() #gg1 - looks like Base1
ggplot(xval, aes(x=xval)) + geom_density() + ylim(0, 0.3)
#gg2: does not look like Base2 due to removal of density values > 0.3
These produce the images below:
How can I make the ggplot image not have the missing section?
Using xlim() or ylim() directly will drop all data points that are not within the specified range. This yields the discontinuity of the density plot. Use coord_cartesian() to zoom in without losing the data points.
ggplot(xval, aes(x=xval)) +
geom_density() +
coord_cartesian(ylim = c(0, 0.3))
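The difference between the two approaches can also be inspected without looking at the plots. The following sketch assumes a ggplot2 version that provides layer_data() and uses an arbitrary seed; it compares the y values that reach the geom when the limit is set on the scale with ylim() and when it is set on the coordinate system.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, for reproducibility only
df <- data.frame(xval = rnorm(10000))

p_ylim <- ggplot(df, aes(x = xval)) + geom_density() + ylim(0, 0.3)
p_zoom <- ggplot(df, aes(x = xval)) + geom_density() +
  coord_cartesian(ylim = c(0, 0.3))

# layer_data() returns the data frame that ggplot2 hands to the geom
y_ylim <- layer_data(p_ylim, 1)$y
y_zoom <- layer_data(p_zoom, 1)$y

sum(is.na(y_ylim))  # density values above 0.3 were replaced by NA
sum(is.na(y_zoom))  # nothing is dropped, only the visible region changes
```

The NA values in the first case are what produce the gap in the curve; with coord_cartesian() the full curve is kept and simply cropped by the panel.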
I am trying to use the bar geom of the excellent ggplot2 to plot the probability mass rather than the count. However, with aes(y=..density..) the distribution does not sum to one (although it is close). I think the problem might be due to the default binwidth for factors. Here is an example of the problem:
age <- c(rep(0,4), rep(1,4))
mppf <- c(1,1,1,0,1,1,0,0)
data.test <- as.data.frame(cbind(age,mppf))
data.test$age <- as.factor(data.test$age)
data.test$mppf <- as.factor(data.test$mppf)
p.test.density <- ggplot(data.test, aes(mppf, group=age, fill=age)) +
geom_bar(aes(y=..density..), position='dodge') +
scale_y_continuous(limits=c(0,1))
dev.new()
print(p.test.density)
I can get around this problem by keeping the x-variable as continuous and setting binwidth=1, but it doesn't seem very elegant.
data.test$mppf.numeric <- as.numeric(data.test$mppf)
p.test.density.numeric <- ggplot(data.test, aes(mppf.numeric, group=age, fill=age)) +
geom_histogram(aes(y=..density..), position='dodge', binwidth=1)+
scale_y_continuous(limits=c(0,1))
dev.new()
print(p.test.density.numeric)
I think you almost have it figured out, and would have once you realized you needed a bar plot and not a histogram.
The default width for bars with categorical data is .9 (See ?stat_bin. The help page for geom_bar doesn't give the default bar width but does send you to stat_bin for further reading.). Given that, your plots show the correct density for a bar width of .9. Simply change to a width of 1 and you will see the density values you expected to see.
ggplot(data.test, aes(x = mppf, group = age, fill = age)) +
geom_bar(aes(y=..density..), position = "dodge", width = 1) +
scale_y_continuous(limits=c(0,1))
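The arithmetic behind this can be checked in base R. The sketch below assumes that a density-style bar height is computed as count / (n * width), which is how stat_bin documents its density variable; it is a worked check of the numbers, not a reproduction of ggplot2's internals.

```r
age  <- factor(c(rep(0, 4), rep(1, 4)))
mppf <- factor(c(1, 1, 1, 0, 1, 1, 0, 0))

# proportion of each mppf level within each age group (rows sum to 1)
prop <- prop.table(table(age, mppf), margin = 1)

# assumed definition: density-style height = proportion / bar width
dens_w09 <- prop / 0.9  # default bar width of 0.9
dens_w1  <- prop / 1    # width = 1

rowSums(dens_w09)  # summed heights per age group with the default width
rowSums(dens_w1)   # summed heights per age group with width = 1
```

With the default width the heights in each group sum to 1/0.9 rather than 1, which matches the "close to one" behaviour described in the question. In current ggplot2 releases geom_bar() is backed by stat_count, which provides a computed variable prop rather than density, so aes(y = after_stat(prop)) may be the closer equivalent there.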
The most commonly cited example of how to visualize a logistic fit using ggplot2 seems to be something very much like this:
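library(ggplot2)
data("kyphosis", package="rpart")
ggplot(data=kyphosis, aes(x=Age, y = as.numeric(Kyphosis) - 1)) +
geom_point() +
# ggplot2 2.0.0 and later: model arguments such as family go in method.args
stat_smooth(method="glm", method.args=list(family="binomial"))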
This visualisation works great if you don't have too much overlapping data, and the first suggestion for crowded data seems to be to use injected jitter in the x and y coordinates of the points then adjust the alpha value of the points. When you get to the point where individual points aren't useful but distributions of points are, is it possible to use geom_density(), geom_histogram(), or something else to visualise the data but continue to split the categorical variable along the y-axis as it is done with geom_point()?
From what I have found, geom_density() and geom_histogram() can easily be split/grouped by the categorical variable and both levels can easily be reversed using scale_y_reverse() but I can't figure out if it is even possible to move only one of the categorical variable distributions to the top of the plot. Any help/suggestions would be appreciated.
The annotate() function in ggplot2 allows you to add geoms to a plot with properties that "are not mapped from the variables of a data frame, but are instead passed in as vectors," meaning that you can add layers that are unrelated to your data frame. In this case your two density curves are related to the data frame (since the variables are in it), but because you're trying to position them differently, using annotate() is useful.
Here's one way to go about it:
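library(ggplot2)
data("kyphosis", package="rpart")
# family is passed through method.args (ggplot2 2.0.0 and later)
model.only <- ggplot(data=kyphosis, aes(x=Age, y = as.numeric(Kyphosis) - 1)) +
stat_smooth(method="glm", method.args=list(family="binomial"))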
absents <- subset(kyphosis, Kyphosis=="absent")
presents <- subset(kyphosis, Kyphosis=="present")
dens.absents <- density(absents$Age)
dens.presents <- density(presents$Age)
scaling.factor <- 10 # Make the density plots taller
model.only + annotate("line", x=dens.absents$x, y=dens.absents$y*scaling.factor) +
annotate("line", x=dens.presents$x, y=dens.presents$y*scaling.factor + 1)
This adds two annotated layers with scaled density plots for each of the kyphosis groups. For the presents variable, y is scaled and increased by 1 to shift it up.
You can also fill the density plots instead of just using a line. Instead of annotate("line"...) you need to use annotate("polygon"...), like so:
model.only + annotate("polygon", x=dens.absents$x, y=dens.absents$y*scaling.factor, fill="red", colour="black", alpha=0.4) +
annotate("polygon", x=dens.presents$x, y=dens.presents$y*scaling.factor + 1, fill="green", colour="black", alpha=0.4)
Technically you could use annotate("density"...), but that won't work when you shift the present plot up by one. Instead of shifting, it fills the whole plot:
model.only + annotate("density", x=dens.absents$x, y=dens.absents$y*scaling.factor, fill="red") +
annotate("density", x=dens.presents$x, y=dens.presents$y*scaling.factor + 1, fill="green")
The only way around that problem is to use a polygon instead of a density geom.
One final variant: flipping the top density plot along y-axis = 1:
model.only + annotate("polygon", x=dens.absents$x, y=dens.absents$y*scaling.factor, fill="red", colour="black", alpha=0.4) +
annotate("polygon", x=dens.presents$x, y=(1 - dens.presents$y*scaling.factor), fill="green", colour="black", alpha=0.4)
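The shifting and flipping used in the annotate() layers above are plain vector arithmetic on the output of density(), so the resulting positions can be checked numerically. A minimal sketch, assuming the rpart package (which provides kyphosis) is installed and reusing the scaling factor of 10 from the answer; the names scaled, shifted and flipped are introduced here only for illustration:

```r
data("kyphosis", package = "rpart")

dens.presents <- density(subset(kyphosis, Kyphosis == "present")$Age)
scaling.factor <- 10  # same illustrative value as in the answer

scaled  <- dens.presents$y * scaling.factor  # baseline at y = 0
shifted <- scaled + 1                        # baseline at y = 1, curve points up
flipped <- 1 - scaled                        # baseline at y = 1, curve points down

# ranges of the three variants on the plot's y-axis
rbind(scaled = range(scaled), shifted = range(shifted), flipped = range(flipped))
```

The shifted and flipped curves are mirror images of each other around y = 1, which is why the last variant keeps the density inside the 0-1 band of the fitted probabilities.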
I am not sure I get your point, but here is an attempt:
dat <- rbind(kyphosis,kyphosis)
dat$grp <- factor(rep(c('smooth','dens'),each = nrow(kyphosis)),
levels = c('smooth','dens'))
ggplot(dat,aes(x=Age)) +
facet_grid(grp~.,scales = "free_y") +
#geom_point(data=subset(dat,grp=='smooth'),aes(y = as.numeric(Kyphosis) - 1)) +
stat_smooth(data=subset(dat,grp=='smooth'),aes(y = as.numeric(Kyphosis) - 1),
method="glm", method.args=list(family="binomial")) +
geom_density(data=subset(dat,grp=='dens'))