I have data that I made a histogram for (ur_memr_t$up...). I then used fitdistr to fit an exponential distribution to the data, captured the fitted parameters, and generated some random variates from them. I then made a density curve for the exponential random variates. I want to place the density over the histogram, but the following code throws this error:
exp_data <- data.frame( x = rexp(3000, rate = 0.0144896182))
ggplot(data = ur_memr_t, aes(ur_memr_t$updated_days_to_next_ur)) +
geom_histogram() + ggplot(exp_data, aes(x)) + geom_density()
Error in p + o : non-numeric argument to binary operator
In addition: Warning message:
Incompatible methods ("+.gg", "Ops.data.frame") for "+"
If I run
ggplot(data = ur_memr_t, aes(ur_memr_t$updated_days_to_next_ur)) +
geom_histogram()
and
ggplot(exp_data, aes(x)) + geom_density()
separately, they produce correct plots. Why will they not work together and plot one on top of the other?
The idea should work, but a plot can only have one ggplot() call. A second data set is passed to an individual layer through that layer's data argument. Try something like this:
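g <- ggplot(data = ur_memr_t, aes(x = updated_days_to_next_ur))
# Put the histogram on the density scale so it is comparable to the curve
# (after_stat() needs ggplot2 >= 3.3.0; older versions use ..density..)
g <- g + geom_histogram(aes(y = after_stat(density)))
# The second data set goes into the layer, not into another ggplot() call
g <- g + geom_density(data = exp_data, aes(x = x))
g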
Hope it helps
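As a variation on the answer above, the fitted exponential density can be drawn directly with stat_function() instead of simulating variates and smoothing them. This is only a minimal sketch: the data frame days and its column x are made-up stand-ins for ur_memr_t$updated_days_to_next_ur, and the rate used to simulate them is arbitrary.

```r
library(ggplot2)
library(MASS)  # fitdistr()

set.seed(123)
# Hypothetical stand-in for the real data
days <- data.frame(x = rexp(3000, rate = 0.0145))

# Fit an exponential distribution and pull out the estimated rate
fit  <- fitdistr(days$x, "exponential")
rate <- fit$estimate[["rate"]]

# Histogram on the density scale, with the fitted density drawn on top
p <- ggplot(days, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) +
  stat_function(fun = dexp, args = list(rate = rate), colour = "red")
print(p)
```

Because the curve is computed from dexp() itself, it does not suffer from the smoothing that geom_density() applies near zero.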
Related
I have a vector of sample means and I've been trying to plot a probability histogram using hist(x) and ggplot, but the bins exceed 1 (which is very unusual for a probability distribution). I then used a PlotRelativeFrequency(hist(x)) function to force R to plot a histogram of probabilities, and it worked! My problem is that I cannot plot a density function over the histogram. When I use lines(density(x)), it plots a density function that goes way off the graph.
Since your question is tagged with ggplot, I'll give a ggplot answer.
To make a histogram relative you have to set aes(y = stat(density)), so that it integrates to 1. You can then pass stat_function() the density function of any theoretical distribution. The downside is that you'll have to pre-compute the parameters.
df <- data.frame(x = rnorm(500, 10, 2))
pars <- list(mean = mean(df$x), sd = sd(df$x))
library(ggplot2)
ggplot(df, aes(x)) +
geom_histogram(binwidth = 1, aes(y = stat(density))) +
stat_function(fun = function(x) {dnorm(x, mean = pars$mean, sd = pars$sd)})
Next up, we can plot the empirical density using kernel density estimates, which does everything pretty much automatically:
ggplot(df, aes(x)) +
geom_histogram(binwidth = 1, aes(y = stat(density))) +
geom_density()
Lastly, you can have a look at this stats function, that essentially automates the first version. Full disclaimer: I'm the author of that github repo.
library(ggnomics)
ggplot(df, aes(x)) +
geom_histogram(binwidth = 1, aes(y = stat(density))) +
stat_theodensity()
When I plot densities with ggplot, it seems to be very wrong around the limits. I see that geom_density and other functions allow specifying various density kernels, but none of them seem to fix the issue.
How do you correctly plot densities around the limits with ggplot?
As an example, let's plot the Chi-square distribution with 2 degrees of freedom. Using the builtin probability densities:
library(ggplot2)
u = seq(0, 2, by=0.01)
v = dchisq(u, df=2)
df = data.frame(x=u, p=v)
p = ggplot(df) +
geom_line(aes(x=x, y=p), size=1) +
theme_classic() +
coord_cartesian(xlim=c(0, 2), ylim=c(0, 0.5))
show(p)
We get the expected plot:
Now let's try simulating it and plotting the empirical distribution:
library(ggplot2)
u = rchisq(10000, df=2)
df = data.frame(x=u)
p = ggplot(df) +
geom_density(aes(x=x)) +
theme_classic() +
coord_cartesian(xlim=c(0, 2))
show(p)
We get an incorrect plot:
We can try to visualize the actual distribution:
library(ggplot2)
u = rchisq(10000, df=2)
df = data.frame(x=u)
p = ggplot(df) +
geom_point(aes(x=x, y=0.5), position=position_jitter(height=0.2), shape='.', alpha=1) +
theme_classic() +
coord_cartesian(xlim=c(0, 2), ylim=c(0, 1))
show(p)
And it seems to look correct, contrary to the density plot:
It seems like the problem has to do with kernels, and geom_density does allow using different kernels. But they don't really correct the limit problem. For example, the code above with triangular looks about the same:
Here's an idea of what I'm expecting to see (of course, I want a density, not a histogram):
library(ggplot2)
u = rchisq(10000, df=2)
df = data.frame(x=u)
p = ggplot(df) +
geom_histogram(aes(x=x), center=0.1, binwidth=0.2, fill='white', color='black') +
theme_classic() +
coord_cartesian(xlim=c(0, 2))
show(p)
The usual kernel density methods have trouble when the support is constrained, as in this case, where the density is only supported above zero. The usual recommendation for handling this has been to use the logspline package:
install.packages("logspline")
library(logspline)
png(); fit <- logspline(rchisq(10000, 3))
plot(fit) ; dev.off()
If this needed to be done in the ggplot2 environment there is a dlogspline function:
densdf <- data.frame( y=dlogspline(seq(0,12,length=1000), fit),
x=seq(0,12,length=1000))
ggplot(densdf, aes(y=y,x=x))+geom_line()
Perhaps you were insisting on one with 2 degrees of freedom?
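A different technique from the logspline approach above, shown here only as a sketch, is boundary reflection: mirror the sample around zero, run the ordinary kernel estimate on the mirrored sample, keep the non-negative half, and double it so it still integrates to about 1. The names refl and d and the grid limits are illustrative choices, not part of the original answer.

```r
library(ggplot2)

set.seed(1)
x <- rchisq(10000, df = 2)

# Kernel estimate on the sample mirrored around 0, evaluated on [0, 12] only
d <- density(c(x, -x), from = 0, to = 12, n = 512)

# Doubling restores the mass that the mirror image took away
refl <- data.frame(x = d$x, y = 2 * d$y)

p <- ggplot(refl, aes(x = x, y = y)) +
  geom_line() +
  # True chi-square density with 2 degrees of freedom, for comparison
  stat_function(fun = dchisq, args = list(df = 2), linetype = "dashed") +
  theme_classic() +
  coord_cartesian(xlim = c(0, 2))
print(p)
```

The reflected estimate no longer drops towards zero at the boundary, although it still tends to sit slightly below the true value of 0.5 at x = 0 because of the smoothing.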
Say I have function like:
quad <- function(x)
{
return (x^2)
}
That I plot using ggplot:
plot <- ggplot(data.frame(x=c(0,4)), aes(x = x)) +
stat_function(fun = quad)
So far, so good, but the line is really thin. I thus add some specific geometry to the line:
plot + geom_line(size=2)
But it returns this error:
Error: geom_line requires the following missing aesthetics: y
How can I manipulate line geometry in this type of graphs?
After playing around for a while I found out that an argument named size can be passed to stat_function. It has the same effect as setting size in geom_line:
plot <- ggplot(data.frame(x=c(0,4)), aes(x = x)) +
stat_function(fun = quad, size=1.5)
Alternate title: How can I scale my y-axis for the histogram only to range 0-1?
Horrible question title, so example to demonstrate. The data here are set so that the ranges are nearly equal to my data ranges on the y-axis... about 0 to 3.5.
library(ggplot2)
x<-runif(100)*200
y<-runif(100)*3
xy<-data.frame(x,y)
p <- ggplot(xy) + theme_bw()
p + geom_point(aes(x, y)) +
geom_histogram(aes(x), alpha=1/10)
I want the histogram 'y-range' to be scaled to a max of 1. The first part of this answer shows an example, saying:
You were close, but need to use (..density..)*binwidth rather than
..count../sum(..count..)
# Your data:
all <- data.frame(fill=rep(LETTERS[1:4],c(26,24,23,29)),
Events=c(1,1,3,1,1,6,2,1,1,2,1,1,1,1,5,1,2,2,1,1,1,1,2,1,2,1,2,3,1,3,2,5,1,1,1,2,1,1,1,1,1,1,1,1,1,4,3,3,5,3,1,2,2,3,3,9,8,1,1,2,2,1,2,39,43,194,129,186,1,2,7,4,1,12,3,2,3,8,20,5,1,4,9,51,12,7,6,7,7,9,17,18,8,7,6,10,27,11,21,89,47,1))
bw <- 20 # set the binwidth
# plot
p1<-ggplot(all,aes(x=Events, fill=fill)) +
geom_histogram(aes(y=(..density..)*bw), position='dodge', binwidth=bw)
p1
but it doesn't work for me, failing with an error about there being no variable 'bw':
bw <- 30
p <- ggplot(xy) + theme_bw()
p + geom_point(aes(x, y)) +
geom_histogram(aes(x=x, y=..density.. * bw), alpha=1/10)
Error in eval(expr, envir, enclos) : object 'bw' not found
Goodness me, I found the notation I needed...
y=..ncount..
From: Normalizing y-axis in histograms in R ggplot to proportion
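For reference, a minimal sketch of how ncount fits into the example above. It uses after_stat(ncount), which assumes ggplot2 3.3.0 or later and is equivalent to the ..ncount.. notation; bins = 30 is an arbitrary choice made only to silence the default-bins message.

```r
library(ggplot2)

set.seed(42)
xy <- data.frame(x = runif(100) * 200, y = runif(100) * 3)

p <- ggplot(xy) + theme_bw() +
  geom_point(aes(x, y)) +
  # ncount is the bin count divided by the largest bin count
  geom_histogram(aes(x, y = after_stat(ncount)), bins = 30, alpha = 1/10)
print(p)

# Inspect the computed values of the histogram layer (layer 2)
hist_data <- layer_data(p, 2)
max(hist_data$ncount)
```

The points keep their own y range while the tallest histogram bar is scaled to exactly 1.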
I have question probably similar to Fitting a density curve to a histogram in R. Using qplot I have created 7 histograms with this command:
qplot(V1, data=data, binwidth=10, facets=V2~.)
For each slice, I would like to add a fitting gaussian curve. When I try to use lines() method, I get error:
Error in plot.xy(xy.coords(x, y), type = type, ...) :
plot.new has not been called yet
What is the command to do it correctly?
Have you tried stat_function?
+ stat_function(fun = dnorm)
You'll probably want to plot the histograms using aes(y = ..density..) in order to plot the density values rather than the counts.
A lot of useful information can be found in this question, including some advice on plotting different normal curves on different facets.
Here are some examples:
dat <- data.frame(x = c(rnorm(100),rnorm(100,2,0.5)),
a = rep(letters[1:2],each = 100))
Overlay a single normal density on each facet:
ggplot(data = dat,aes(x = x)) +
facet_wrap(~a) +
geom_histogram(aes(y = ..density..)) +
stat_function(fun = dnorm, colour = "red")
From the question I linked to, create a separate data frame with the different normal curves:
library(plyr)  # provides ddply()
grid <- with(dat, seq(min(x), max(x), length = 100))
normaldens <- ddply(dat, "a", function(df) {
data.frame(
predicted = grid,
density = dnorm(grid, mean(df$x), sd(df$x))
)
})
And plot them separately using geom_line:
ggplot(data = dat,aes(x = x)) +
facet_wrap(~a) +
geom_histogram(aes(y = ..density..)) +
geom_line(data = normaldens, aes(x = predicted, y = density), colour = "red")
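If plyr is not available, the same per-facet data frame can be built with base R. This is just a sketch of an equivalent to the ddply() call above, using split() and lapply(); the column names follow the example, and the seed is an arbitrary addition for reproducibility.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(x = c(rnorm(100), rnorm(100, 2, 0.5)),
                  a = rep(letters[1:2], each = 100))

grid <- seq(min(dat$x), max(dat$x), length.out = 100)

# One normal curve per level of a, using that group's mean and sd
normaldens <- do.call(rbind, lapply(split(dat, dat$a), function(df) {
  data.frame(a = df$a[1],
             predicted = grid,
             density = dnorm(grid, mean(df$x), sd(df$x)))
}))

p <- ggplot(dat, aes(x = x)) +
  facet_wrap(~a) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) +
  geom_line(data = normaldens, aes(x = predicted, y = density), colour = "red")
print(p)
```

Keeping the column a in normaldens is what lets facet_wrap() send each curve to its own panel.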
ggplot2 uses a different graphics paradigm than base graphics. Although you can use grid graphics with it, the best way is to add a new stat_function layer to the plot. The ggplot2 code is the following.
Note that I couldn't get this to work using qplot, but the transition to ggplot is reasonably straightforward. The most important difference is that your data must be in data.frame format.
Also note the explicit mapping of the y aesthetic, aes(y=..density..). This is slightly unusual, but it takes the density computed by stat_bin and maps it to the y axis, so the histogram is on the same scale as the curve:
library(ggplot2)
data <- data.frame(V1 = rnorm(700), V2 = sample(LETTERS[1:7], 700, replace=TRUE))
ggplot(data, aes(x=V1)) +
stat_bin(aes(y=..density..)) +
stat_function(fun=dnorm) +
facet_grid(V2~.)