Log-scale transformation of histogram and fitting gamma curve - r

I have a bunch of data that I've fitted with a gamma distribution. I've got the histogram and the fitted curve working, but now I want to draw the histogram and the curve with x on a log scale. Using scale_x_log10 works just fine for the histogram, but I can't make it work for stat_function/geom_line.
I understand that's because stat_function now takes the log values, but I'm not sure how to transform the gamma function beforehand for it to work properly. Here are the relevant pictures and code snippets:
This is the original graph:
fit.gamma2 <- fitdist(myvalues[,1],distr="gamma",method="mme")
ggplot(myvalues, aes(x = V1)) +
geom_histogram(aes(y =..density..),
boundary = 0,
binwidth = sirina,
col="black",
fill="blue",
alpha=.2) +
stat_function(fun=dgamma,
args=list(shape = fit.gamma2$estimate["shape"],
rate = fit.gamma2$estimate["rate"])) +
labs(title="Histogram žarkov + Gama porazdelitev (MM)",
x = "Medprihodni časi (s)",
y = "Gostota")
This is that same graph after using scale_x_log10. The red curve is supposed to be the fitting curve, but it's obviously way off.
ggplot(myvalues, aes(x = V1)) +
geom_histogram(aes(y =..density..),
boundary = 0,
binwidth = sirina_log,
col="black",
fill="blue",
alpha=.2) +
stat_function(fun=dgamma,
args=list(shape = fit.gamma2$estimate["shape"],
rate = fit.gamma2$estimate["rate"])) +
# geom_line(aes(x=V1,y=dgamma(V1,fit.gamma2$estimate["shape"], fit.gamma2$estimate["rate"])), color="red", size = 1) +
scale_x_log10()
I have tried applying the values in 10**x form, but as my original data ranges between 0.1 and 800, some values then escape to Inf.

You need to transform your PDF using the change of variables for log10: the density of log10(x) is x*log(10) times the density of x. First create a function for the transformed PDF:
dgammalog10 <- function(x, shape, rate) {
return(x*log(10)*dgamma(x, shape, rate))
}
Then you can use fun=dgammalog10 where you had fun=dgamma.
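To see where the factor comes from: if y = log10(x), then x = 10^y and dx/dy = x*log(10), so the density of y is dgamma(x) * x * log(10). As far as I can tell, stat_function evaluates fun on the back-transformed (original-scale) x values under scale_x_log10, which is why the function can be written in terms of x. Here is a minimal base R check that the transformed density integrates to 1 over the log10 axis. The shape and rate values are made up for illustration, not the fitted ones from the question:

```r
# Transformed PDF from the answer above
dgammalog10 <- function(x, shape, rate) {
  x * log(10) * dgamma(x, shape, rate)
}

# Illustrative parameters only (assumed, not taken from the question's fit)
shape <- 2
rate <- 0.5

# Integrate over y = log10(x); the function itself is evaluated at x = 10^y
total <- integrate(
  function(y) dgammalog10(10^y, shape, rate),
  lower = -6, upper = 3
)$value

print(total)  # should be very close to 1
```

The integration limits cover x from 1e-6 to 1000, which holds essentially all of the probability mass for these parameters.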

Related

General rule of overlaying density plot using ggplot2

I'm a novice with the R programming language. What is the standard/general method for overlaying a density curve on a histogram using ggplot2?
It depends whether you want an empirical density estimate or to fit a theoretical density. In both cases, you'd need to scale the density by the width of the histogram bins.
For the empirical kernel density estimates:
library(ggplot2)
# dummy data
df <- data.frame(
x = rnorm(1000)
)
binwidth <- 0.1
ggplot(df, aes(x)) +
geom_histogram(binwidth = binwidth) +
geom_density(aes(y = after_stat(count * binwidth)),
color = "red")
Theoretical density estimates don't live in ggplot2 but in extension packages. Disclaimer: I'm the author of the following package, so I'm biased:
library(ggh4x)
ggplot(df, aes(x)) +
geom_histogram(binwidth = binwidth) +
stat_theodensity(aes(y = after_stat(count * binwidth)),
color = "red")
Alternatively, if you don't want to bother with setting binwidths you can also scale the histogram to density instead:
ggplot(df, aes(x)) +
geom_histogram(aes(y = after_stat(density))) +
geom_density(color = "red")
Note: after_stat() requires ggplot2 v3.3.0 or later; earlier versions use stat().
You need to make sure to multiply the value of ..count.. in the density plot call by the binwidth used in the histogram call.
You can do it as follows:
set.seed(100)
a = data.frame(z = rnorm(10000))
binwidthVal=0.1
ggplot(a, aes(x=z)) +
geom_histogram(binwidth = binwidthVal) +
geom_density(colour='red', aes(y=binwidthVal * ..count..))
Credit to Brian Diggs for the idea.
EDIT: Seems like there is already a perfectly good answer here
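Why multiplying by the bin width works: a density curve has area 1, while a count histogram has area n * binwidth, and the count computed by geom_density is the density times n. Here is a rough base R sketch (no ggplot2 needed) that compares the two areas. The padding of the breaks is just my choice for illustration:

```r
set.seed(100)
z <- rnorm(10000)
binwidth <- 0.1

# Breaks padded by 1 on each side so they always cover the data
breaks <- seq(floor(min(z)) - 1, ceiling(max(z)) + 1, by = binwidth)
h <- hist(z, breaks = breaks, plot = FALSE)
hist_area <- sum(h$counts) * binwidth

# Kernel density rescaled the same way as count * binwidth in the plot
d <- density(z)
scaled_y <- d$y * length(z) * binwidth

# Trapezoidal rule for the area under the rescaled curve
curve_area <- sum(diff(d$x) * (head(scaled_y, -1) + tail(scaled_y, -1)) / 2)

print(c(hist_area = hist_area, curve_area = curve_area))
```

The two areas should agree closely, which is why the red curve lines up with the bars.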

Linear model diagnostics: smoothing line obtained in ggplot2 is different from the one obtained with base plot

I am trying to reproduce the diagnostics plots for a linear regression model using ggplot2. The smoothing line that I get is different from the one obtained using base plots or ggplot2::autoplot.
library(survival)
library(ggplot2)
model <- lm(wt.loss ~ meal.cal, data=lung)
## Fitted vs. residuals using base plot:
plot(model, which=1)
## Fitted vs. residuals using ggplot
model.frame <- fortify(model)
ggplot(model.frame, aes(.fitted, .resid)) + geom_point() + geom_smooth(method="loess", se=FALSE)
The smoothing line is different: the influence of the first few points is much larger using the loess method provided by ggplot. My question is: how can I reproduce the smoothing line obtained with plot() using ggplot2?
You can calculate the lowess curve, which is what the red line in the original diagnostic plot shows, using the base function of the same name.
smoothed <- as.data.frame(with(model.frame, lowess(x = .fitted, y = .resid)))
ggplot(model.frame, aes(.fitted, .resid)) +
theme_bw() +
geom_point(shape = 1, size = 2) +
geom_hline(yintercept = 0, linetype = "dotted", col = "grey") +
geom_path(data = smoothed, aes(x = x, y = y), col = "red")
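The likely reason the two lines differ is that lowess() and loess() are different smoothers with different defaults. lowess() fits local lines with f = 2/3 and 3 robustness iterations, while loess(), which geom_smooth(method = "loess") calls, defaults to local quadratics with span = 0.75 and no robustness iterations. A small sketch on the built-in cars data (my substitute for the lung example, so that no extra packages are needed):

```r
# Built-in data used as a stand-in for the question's model
model <- lm(dist ~ speed, data = cars)
fitted_vals <- fitted(model)
resid_vals <- resid(model)

# Smoother behind the red line in plot(model, which = 1)
low <- lowess(fitted_vals, resid_vals)  # defaults: f = 2/3, iter = 3

# Smoother behind geom_smooth(method = "loess")
lo <- loess(resid_vals ~ fitted_vals)   # defaults: span = 0.75, degree = 2

# lowess() returns values sorted by x, so sort the loess fit the same way
ord <- order(fitted_vals)
loess_sorted <- predict(lo)[ord]

# Largest vertical gap between the two smoothed curves
max_gap <- max(abs(low$y - loess_sorted))
print(max_gap)
```

A non-zero gap here just confirms that the two functions do not produce the same curve on the same points.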

how to make the value on Y axis start from zero in R, ggplot2

Currently, I'm using ggplot2 to make a density plot.
ggplot(data=resultFile,aes(x=V19, colour=V1) ) +
geom_line(stat="density") +
xlab("score") +
ylab("density") +
ggtitle(paste(data_name,protocol,level,sep=" ")) +
theme(legend.title=element_blank(), legend.position=c(0.92,0.9)) +
scale_color_manual(values=c("blue","red"),
labels=c("A", "B"))
Using this code, I get the plot below.
However, I get a different plot if I use the plot(density(...)) function in R.
There, the Y value starts from 0.
How can I make the ggplot plot look like the one from plot(density(...)) in R?
ggplot(data=resultFile,aes(x=V19, colour=V1) ) +
ylim(0, range) + # you can use this (range stands for your chosen upper limit)
geom_line(stat="density") +
xlab("score") +
ylab("density") +
ggtitle(paste(data_name,protocol,level,sep=" ")) +
theme(legend.title=element_blank(), legend.position=c(0.92,0.9)) +
scale_color_manual(values=c("blue","red"),
labels=c("A", "B"))
ggplot apparently cuts off the x-axis at the min and max of the empirical distribution. You can extend the x-axis by adding xlim to the plot, but please make sure that the plot does not exceed the theoretical limits of the distribution (in the example below, the theoretical limits are [0, 1], so there is not much reason to show anything outside that range).
set.seed(1)
temp <- data.frame(x =runif(100)^3)
library(ggplot2)
ggplot(temp, aes(x = x)) + geom_line(stat = "density") + xlim(-.2, 1.2)
plot(density(temp$x))
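The difference comes from where the density is evaluated. Base density() extends its grid by cut = 3 bandwidths beyond the data on each side, whereas ggplot2's density stat, as far as I know, only evaluates over the range of the x scale (the data range unless you widen it with xlim). A quick base R check of the first half of that claim:

```r
set.seed(1)
x <- runif(100)^3

d <- density(x)  # default cut = 3

# Grid limits that density() actually used
grid_range <- range(d$x)

# Expected limits: data range widened by 3 bandwidths on each side
expected_range <- range(x) + c(-3, 3) * d$bw

print(rbind(data = range(x), grid = grid_range, expected = expected_range))
```

That extra margin is where the base plot's curve tapers down towards zero at both ends.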

Visualizing Residuals and Squared Residuals in ggplot2

I've created a plot based on the diamonds dataset with a regression line and 10 cases. I want to add a residual (a vertical line going from the observed price to its predicted value on the regression line, in red) specifically for one case (case 3, with the second highest value for price). In a second plot I want to add a red square with side equal to that residual (with the residual on the right side so the square doesn't overlap with the regression line), just to visualize the residuals and the squared residuals (I always do them in red). The code I have so far is shown below. Does someone know how to create these?
library(ggplot2)
library(dplyr)
data(diamonds)
set.seed(123)
small_n = sample_n(diamonds, 10, replace = TRUE)
small_n = select(small_n, price, carat)
p = ggplot(small_n, aes(x = carat, y = price))
p = p +
geom_point() +
ggtitle("Price by Carat") +
geom_hline(yintercept = mean(small_n$price),
linetype = "dashed") +
geom_vline(xintercept= mean(small_n$carat),
linetype = "dashed") +
geom_smooth(method=lm, # Add linear regression lines
se=FALSE, color = "black", size = .2) +
coord_cartesian(xlim = c(0, 3), ylim = c(0, 13000))
p

ggplot Poisson density curve: why zigzag lines?

I would like to plot the density function of a Poisson distribution. I am not sure why I get a jagged line (in blue). On the same plot, the normal density curve (in red) looks smooth. Is it because the Poisson density function doesn't accept decimal values? How can I eliminate the zigzag in the Poisson density plot? Thanks very much for any help.
library(ggplot2)
ggplot(data.frame(X = seq(5, 30)), aes(x = X)) +
stat_function(fun=dpois, geom="line", size=2, color="blue3", args = list(lambda = 15)) +
stat_function(fun=dnorm, geom="line", size=2, color="red4", args = list(mean=20, sd=2))
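A likely explanation, in line with the guess in the question: dpois() is only defined for integer counts and returns 0 (with a warning) for non-integer x, and stat_function evaluates the function at n = 101 evenly spaced points by default, most of which are not integers on the range 5 to 30. A base R sketch of what such a grid produces:

```r
lambda <- 15

# The same kind of grid stat_function would use by default (101 points)
x_grid <- seq(5, 30, length.out = 101)
is_int <- x_grid == round(x_grid)

# dpois() warns about non-integer x and returns 0 there
y <- suppressWarnings(dpois(x_grid, lambda))

print(sum(is_int))        # number of integer grid points
print(range(y[!is_int]))  # values at the non-integer points
```

If that is the cause, evaluating only at integers (for example stat_function(..., n = 26) on this range) or drawing points or bars instead of a line should remove the zigzag.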
