How can I overlay a bar plot of my real data with the estimated negative binomial density function, using the same mean and variance?
library(data.table)
library(ggplot2)
temp <- data.table(cbind(V1=c(1,2,3,4,5,9), N=c(50,40,30,20,10,2)))
ggplot(temp, aes(x=V1, y= N)) +
geom_histogram(stat="identity", binwidth = 2.5) +
scale_y_continuous(breaks=c(0, 100, 200, max(temp$N))) +
scale_x_continuous(breaks=c(0, 100, 200, max(temp$V1))) +
theme(panel.grid.minor.x=element_blank(),
panel.grid.major.x=element_blank()
)
I tried to add stat_function(fun = dnbinom, args = list(size=1, mu = mean(temp$V1)), color="red") but all I see is a red line on the abscissa. Same for dpois (with lambda=mean(temp$V1)) and dnorm (with mean = mean(temp$V1), sd = sd(temp$V1)).
Maybe my parametrization is wrong?
@mmk is correct: normalization is the key. Here's how you can achieve what you want:
#simplest normalization
temp$Nmod <- temp$N / sum(temp$N)
#alternative normalization
#temp$Nmod <- temp$N / sqrt(sum(temp$N * temp$N))
temp$pois <- dpois(temp$V1, lambda = mean(temp$V1))
temp$nbinom <- dnbinom(temp$V1, mu = mean(temp$V1), size = 1)
ggplot(temp, aes(x=V1, y= Nmod)) +
geom_col() +
theme(panel.grid.minor.x=element_blank(),
panel.grid.major.x=element_blank()) +
geom_line(aes(y = pois), col = "red") +
geom_line(aes(y = nbinom), col = "blue")
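The question asks for a negative binomial with the same mean and variance as the data, whereas size = 1 above is an arbitrary choice. Here is a minimal method-of-moments sketch, assuming N holds the frequency of each V1 value (my reading of the data, not stated in the question); nb_size_from_moments is a hypothetical helper name. In dnbinom's mu/size parametrization the variance is mu + mu^2/size, so size = mu^2 / (variance - mu), which only exists when the variance exceeds the mean.

```r
# Data from the question; N is assumed to be the frequency of each V1 value
V1 <- c(1, 2, 3, 4, 5, 9)
N  <- c(50, 40, 30, 20, 10, 2)

# Frequency-weighted mean and (sample) variance of the observed values
mu <- sum(V1 * N) / sum(N)
v  <- sum(N * (V1 - mu)^2) / (sum(N) - 1)

# Hypothetical helper: moment estimate of size, NA when not overdispersed
nb_size_from_moments <- function(mu, v) {
  if (v <= mu) return(NA_real_)
  mu^2 / (v - mu)
}

size_hat <- nb_size_from_moments(mu, v)

# Fall back to a Poisson when no negative binomial matches both moments
fitted <- if (is.na(size_hat)) {
  dpois(V1, lambda = mu)
} else {
  dnbinom(V1, mu = mu, size = size_hat)
}
```

For these counts the weighted variance comes out smaller than the weighted mean, so the helper returns NA: no negative binomial can reproduce both moments, and the sketch falls back to a Poisson with the weighted mean.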
Related
I am trying to make a histogram of density values and overlay that with the curve of a density function (not the density estimate).
Using a simple standard normal example, here is some data:
x <- rnorm(1000)
I can do:
q <- qplot( x, geom="histogram")
q + stat_function( fun = dnorm )
but this puts the histogram on a frequency (count) scale rather than a density scale. With ..density.. I can get the proper scale on the histogram:
q <- qplot( x,..density.., geom="histogram")
q
But now this gives an error:
q + stat_function( fun = dnorm )
Is there something I am not seeing?
Another question: is there a way to plot the curve of a function, like curve(), but not as a layer?
Here you go!
# create some data to work with
x = rnorm(1000);
# overlay histogram, empirical density and normal density
p0 = qplot(x, geom = 'blank') +
geom_line(aes(y = after_stat(density), colour = 'Empirical'), stat = 'density') +
stat_function(fun = dnorm, aes(colour = 'Normal')) +
geom_histogram(aes(y = after_stat(density)), alpha = 0.4) +
scale_colour_manual(name = 'Density', values = c('red', 'blue')) +
theme(legend.position = c(0.85, 0.85))
print(p0)
A more bare-bones alternative to Ramnath's answer, passing the observed mean and standard deviation, and using ggplot instead of qplot:
df <- data.frame(x = rnorm(1000, 2, 2))
# overlay histogram and normal density
ggplot(df, aes(x)) +
geom_histogram(aes(y = after_stat(density))) +
stat_function(
fun = dnorm,
args = list(mean = mean(df$x), sd = sd(df$x)),
lwd = 2,
col = 'red'
)
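If you would rather keep the histogram in counts, the alternative is to scale the density instead. A density integrates to 1, while a count histogram has total bar area n * binwidth, so multiplying the density by that factor puts both on the same scale. A sketch, with binwidth = 0.25 and the name scaled_dnorm chosen here purely for illustration; the plotting part is guarded so the rest runs even without ggplot2 installed:

```r
set.seed(42)
df <- data.frame(x = rnorm(1000, 2, 2))
binwidth <- 0.25  # illustrative choice; must match geom_histogram()

# Normal density rescaled to the count scale of the histogram
scaled_dnorm <- function(v) {
  dnorm(v, mean = mean(df$x), sd = sd(df$x)) * nrow(df) * binwidth
}

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x)) +
    geom_histogram(binwidth = binwidth) +  # y axis stays in counts
    stat_function(fun = scaled_dnorm, colour = "red")
  print(p)
}
```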
What about using geom_density() from ggplot2? Like so:
df <- data.frame(x = rnorm(1000, 2, 2))
ggplot(df, aes(x)) +
geom_histogram(aes(y = after_stat(density))) + # scale histogram y
geom_density(col = "red")
This also works for multimodal distributions, for example:
df <- data.frame(x = c(rnorm(1000, 2, 2), rnorm(1000, 12, 2), rnorm(500, -8, 2)))
ggplot(df, aes(x)) +
geom_histogram(aes(y = after_stat(density))) + # scale histogram y
geom_density(col = "red")
Here is an example with the iris data set. This short snippet should produce the graph you need:
ker_graph <- ggplot(iris, aes(x = Sepal.Length)) +
geom_histogram(aes(y = after_stat(density)),
colour = 1, fill = "white") +
geom_density(lwd = 1.2,
linetype = 2,
colour = 2)
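On the second part of the question in this thread (plotting a function the way curve() does): as far as I know ggplot2 always draws through layers, but the function layer can be the only one, with a two-row data frame supplying nothing more than the x range. A small sketch, where the range of -4 to 4 and the name x_range are my own choices:

```r
# Only the x range is supplied; stat_function() evaluates dnorm over it
x_range <- data.frame(x = c(-4, 4))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(x_range, aes(x)) +
    stat_function(fun = dnorm, n = 201)  # n = number of evaluation points
  print(p)
}

# Base R equivalent, for comparison
curve(dnorm, from = -4, to = 4)
```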
I have the below code to plot a probit model comparing the chance of success based on a maximum temperature value. Seems to work well, I'm happy with the plot. But I'm hoping to highlight the point along the curve where the probability is 50%, and then draw a line down to the x-axis to determine (and show) this value as well. Also hoping to include confidence intervals for this estimate. Any help would be greatly appreciated!
data <- data.frame(MaxTemp = c(53.2402, 59.01004,51.42602,41.53883,44.70763,53.90285,51.130318,54.5929,43.697559,49.772446,54.902222,52.720528,58.782608,47.680374,48.30313,56.10921,57.660324,46.387924,60.503147,53.803177,52.27771,58.58555,55.74136,49.04505,46.816269,52.58295,52.751373,56.209747,51.733894,51.424305,50.74564,47.046513,53.030407,56.68752,56.639351,53.526585,51.562313),
Success=c(1,1,1,0,0,1,1,1,0,0,1,1,1,0,0,1,1,0,1,1,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,1,1))
TempProbitModel <- glm(Success ~ MaxTemp, data=data, family=binomial(link="logit"))
temp.data <- data.frame(MaxTemp = seq(40, 62, 0.5))
predicted.data <- as.data.frame(predict(TempProbitModel, newdata = temp.data, type="link", se=TRUE))
new.data <- cbind(temp.data, predicted.data)
std <- qnorm(0.95 / 2 + 0.5)
new.data$ymin <- TempProbitModel$family$linkinv(new.data$fit - std * new.data$se)
new.data$ymax <- TempProbitModel$family$linkinv(new.data$fit + std * new.data$se)
new.data$fit <- TempProbitModel$family$linkinv(new.data$fit)
(TempProb <- ggplot(data, aes(x=MaxTemp, y=Success)) +
geom_point() +
geom_ribbon(data=new.data, aes(y=fit, ymin=ymin, ymax=ymax), alpha=0.5) +
geom_line(data=new.data, aes(y=fit)) +
labs(x="Peak Temperature", y="Probability of Success") )
Find the closest value to y = 0.5:
closest_value <- which(abs(new.data$fit - 0.5) == min(abs(new.data$fit - 0.5)))
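Linearly interpolate between this grid point and the previous one to find the x value where the fit crosses 0.5 (the "slope" below is the inverse slope, dx/dy):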
slope_at_closest_value <- (new.data[closest_value, "MaxTemp"] - new.data[closest_value - 1, "MaxTemp"]) /( new.data[closest_value, "fit"] - new.data[closest_value - 1, "fit"])
x_value <- new.data[closest_value - 1, "MaxTemp"] + slope_at_closest_value * (0.5 - new.data[closest_value - 1, "fit"])
Use this x_value to draw a vertical line:
ggplot(data, aes(x=MaxTemp, y=Success)) +
geom_point() +
geom_ribbon(data=new.data, aes(y=fit, ymin=ymin, ymax=ymax), alpha=0.5) +
geom_line(data=new.data, aes(y=fit)) +
labs(x="Peak Temperature", y="Probability of Success") +
geom_vline(xintercept = x_value, color="red")
This draws the following plot:
The confidence interval can be drawn accordingly.
Another way of getting this point is to use the approxfun function.
f <- approxfun(new.data$fit,new.data$MaxTemp, rule = 2)
f(0.5)
[1] 49.39391
So now, if you are plotting it:
library(ggplot2)
ggplot(data, aes(x = MaxTemp, y = Success))+
geom_point()+
geom_ribbon(data=new.data, aes(y=fit, ymin=ymin, ymax=ymax), alpha=0.5) +
geom_line(data=new.data, aes(y=fit)) +
labs(x="Peak Temperature", y="Probability of Success") +
geom_point(x = f(0.5), y = 0.5, size = 3, color = "red")+
geom_vline(xintercept = f(0.5), linetype = "dashed", color = "red")+
geom_hline(yintercept = 0.5, linetype = "dashed", color = "red")
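For a confidence interval on the 50% point itself, one option (not used in the answers above) is the delta method. Both the logit and the probit inverse link map 0 to 0.5, so the crossing point is x50 = -b0/b1, where b0 and b1 are the intercept and slope. Note that the model in the question is fitted with link = "logit" despite its name. This is a sketch under the usual large-sample normal approximation; the names fit, x50, grad, se_x50 and ci are introduced here for illustration:

```r
data <- data.frame(
  MaxTemp = c(53.2402, 59.01004, 51.42602, 41.53883, 44.70763, 53.90285,
              51.130318, 54.5929, 43.697559, 49.772446, 54.902222, 52.720528,
              58.782608, 47.680374, 48.30313, 56.10921, 57.660324, 46.387924,
              60.503147, 53.803177, 52.27771, 58.58555, 55.74136, 49.04505,
              46.816269, 52.58295, 52.751373, 56.209747, 51.733894, 51.424305,
              50.74564, 47.046513, 53.030407, 56.68752, 56.639351, 53.526585,
              51.562313),
  Success = c(1,1,1,0,0,1,1,1,0,0,1,1,1,0,0,1,1,0,1,1,1,1,1,1,0,1,1,1,1,1,
              1,0,1,1,1,1,1)
)

# Same model as in the question (logit link)
fit <- glm(Success ~ MaxTemp, data = data, family = binomial(link = "logit"))
b <- coef(fit)   # b[1] = intercept, b[2] = slope
V <- vcov(fit)   # covariance matrix of the coefficients

# Temperature at which the linear predictor is 0, i.e. probability 0.5
x50 <- unname(-b[1] / b[2])

# Gradient of -b0/b1 with respect to (b0, b1)
grad <- c(-1 / b[2], b[1] / b[2]^2)

# Delta-method standard error and approximate 95% interval
se_x50 <- sqrt(as.numeric(t(grad) %*% V %*% grad))
ci <- x50 + c(-1, 1) * qnorm(0.975) * se_x50
```

x50 can then stand in for f(0.5) in geom_vline(), and ci[1] and ci[2] can be drawn as two further dashed vertical lines.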
I'm trying to add a normal distribution line to my chart. But it simply becomes flat at the bottom for some reason.
My code
MyChart <- function(x) {
ggplot(x, aes( x = max.DrawD, y = cum.Return, label = Symbol)) +
scale_y_continuous(breaks = c(seq(0, 10, 1)), limits = c(0,10)) + # outliers excluded
scale_x_continuous(limit =c(0, 0.5)) +
geom_histogram(aes(y = ..density..), binwidth = 0.02) +
geom_text(size = 3) +
stat_function(fun = dnorm, colour = 'firebrick') +
theme_classic()
}
As you can see, the red line (my stat_function() code) is right at the bottom of the graph. How can I resolve this?
UPDATE: I solved it by specifying the mean and standard deviation manually, but I don't know why that works.
Updated code
MyChart <- function(x) {
ggplot(x, aes( x = max.DrawD, y = cum.Return, label = Symbol)) +
scale_y_continuous(breaks = c(seq(0, 10, 1)), limits = c(0,10)) + # outliers excluded
scale_x_continuous(limit =c(0, 0.5)) +
geom_histogram(aes(y = ..density..), binwidth = 0.02) +
geom_text(size = 3) +
stat_function(fun = dnorm, args = list(mean = mean(x$max.DrawD), sd = sd(x$max.DrawD)), colour = 'firebrick') +
theme_classic()
}
From @user20650: it works because dnorm needs the parameters mean and sd. If you don't specify them, they default to 0 and 1, respectively.
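To see the size of the effect, compare the two curves over the plotted x range of 0 to 0.5. The mean and standard deviation below are made-up stand-ins for mean(x$max.DrawD) and sd(x$max.DrawD), since the data is not shown:

```r
x_grid <- seq(0, 0.5, by = 0.01)

# Default dnorm(): mean = 0, sd = 1, so it never exceeds about 0.4
default_curve <- dnorm(x_grid)

# Hypothetical data-like parameters: a narrow distribution peaks much higher
fitted_curve <- dnorm(x_grid, mean = 0.2, sd = 0.05)

range(default_curve)
max(fitted_curve)
```

On a y axis running from 0 to 10, a curve that stays below 0.4 looks like a flat line near the bottom, which matches the symptom in the question.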