Visualizing Residuals and Squared Residuals in ggplot2 - r

I've created a plot based on the diamonds dataset with a regression line and 10 cases. I want to add a residual (a red vertical line going from the observed price to its predicted value on the regression line) specifically for one case (case 3, which has the second-highest price). In a second plot I want to add a red square whose side equals that residual (with the residual as the right side, so the square doesn't overlap the regression line), to visualize both the residuals and the squared residuals (I always draw them in red). The code I have so far is shown below. Does anyone know how to create these?
library(ggplot2)
library(dplyr)
data(diamonds)
set.seed(123)
small_n = sample_n(diamonds, 10, replace = TRUE)
small_n = select(small_n, price, carat)
p = ggplot(small_n, aes(x = carat, y = price))
p = p +
geom_point() +
ggtitle("Price by Carat") +
geom_hline(yintercept = mean(small_n$price),
linetype = "dashed") +
geom_vline(xintercept= mean(small_n$carat),
linetype = "dashed") +
geom_smooth(method=lm, # Add linear regression lines
se=FALSE, color = "black", size = .2) +
coord_cartesian(xlim = c(0, 3), ylim = c(0, 13000))
p
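One possible way to do this, as a rough sketch rather than a definitive answer: fit the same line with lm(), store the predicted values, draw the residual with geom_segment(), and draw the square with annotate("rect"). Everything below that is not in the question is an assumption: the base-R sampling (so the sampled rows may differ from the original sample_n() call), the names predicted, residual, side_x and x_other, and the choice of theme(aspect.ratio = 1). The last one matters because price and carat are in different units, so the rectangle only looks square if its width is rescaled by the ratio of the axis ranges and the panel itself is square.

```r
library(ggplot2)

set.seed(123)
# Base-R stand-in for the question's sample_n()/select() calls;
# the sampled rows may differ from the ones in the question.
small_n <- as.data.frame(
  diamonds[sample(nrow(diamonds), 10, replace = TRUE), c("price", "carat")]
)

# Fit the same straight line that geom_smooth(method = lm) draws
fit <- lm(price ~ carat, data = small_n)
small_n$predicted <- unname(fitted(fit))
small_n$residual  <- unname(resid(fit))

# Pick the case with the second-highest price
case <- small_n[order(small_n$price, decreasing = TRUE)[2], ]

# Axis ranges fixed by coord_cartesian(); with a square panel this
# converts a length on the price axis into the same length on the carat axis
x_range <- 3
y_range <- 13000
side_x <- abs(case$residual) * x_range / y_range

# Put the square on the side where the regression line does not cross it
direction <- -sign(case$residual * coef(fit)[["carat"]])
x_other <- case$carat + direction * side_x

base <- ggplot(small_n, aes(x = carat, y = price)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  coord_cartesian(xlim = c(0, x_range), ylim = c(0, y_range)) +
  theme(aspect.ratio = 1)

# Plot 1: the residual as a red vertical segment
p_resid <- base +
  geom_segment(data = case, aes(xend = carat, yend = predicted), color = "red")

# Plot 2: a red square whose vertical side is the residual
p_square <- base +
  annotate("rect",
           xmin = min(case$carat, x_other), xmax = max(case$carat, x_other),
           ymin = min(case$price, case$predicted),
           ymax = max(case$price, case$predicted),
           fill = NA, color = "red")
```

Printing p_resid and p_square shows the two plots. If the plot is saved with a different panel shape, the width conversion would need to change accordingly.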

Related

Log-scale transformation of histogram and fitting gamma curve

I have a bunch of data, that I've fitted with a gamma distribution. I've got the histogram and the fitting curve just fine, but now I wish to draw a histogram and the curve with x in log scale. Using scale_x_log10 works just fine for the histogram, but I can't make it work for the stat_function/geom_line.
I understand that's because stat_function now takes the log values, but I'm not sure how to transform the gamma function beforehand for it to work properly. Here are the relevant pictures and code snippets:
This is the original graph:
fit.gamma2 <- fitdist(myvalues[,1],distr="gamma",method="mme")
ggplot(myvalues, aes(x = V1)) +
geom_histogram(aes(y =..density..),
boundary = 0,
binwidth = sirina,
col="black",
fill="blue",
alpha=.2) +
stat_function(fun=dgamma,
args=list(shape = fit.gamma2$estimate["shape"],
rate = fit.gamma2$estimate["rate"])) +
labs(title="Histogram of rays + gamma distribution (MM)",
x = "Interarrival times (s)",
y = "Density")
This is that same graph after using scale_x_log10. The red curve is supposed to be the fitting curve, but it's obviously way off.
ggplot(myvalues, aes(x = V1)) +
geom_histogram(aes(y =..density..),
boundary = 0,
binwidth = sirina_log,
col="black",
fill="blue",
alpha=.2) +
stat_function(fun=dgamma,
args=list(shape = fit.gamma2$estimate["shape"],
rate = fit.gamma2$estimate["rate"])) +
# geom_line(aes(x=V1,y=dgamma(V1,fit.gamma2$estimate["shape"], fit.gamma2$estimate["rate"])), color="red", size = 1) +
scale_x_log10()
I have tried applying the values in 10**x form, but as my original data ranges between 0.1 and 800, some values then escape to Inf.
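On a log10 x-axis the curve has to be the density of log10(x), so you need to multiply your PDF by the derivative of the back-transformation 10^y, which is x*log(10). First create a function for the transformed PDF: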
dgammalog10 <- function(x, shape, rate) {
return(x*log(10)*dgamma(x, shape, rate))
}
Then you can use fun=dgammalog10 where you had fun=dgamma.
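As a quick sanity check of that transformation (a sketch only, with made-up shape and rate values rather than the ones fitted in the question), the transformed density should integrate to about 1 when the integration variable is y = log10(x):

```r
# Density of log10(X), written as a function of x on the original scale
dgammalog10 <- function(x, shape, rate) {
  x * log(10) * dgamma(x, shape, rate)
}

# Hypothetical parameter values, chosen only for illustration
shape <- 2
rate <- 0.5

# Integrate over y = log10(x), i.e. evaluate the function at x = 10^y.
# The limits cover x from 1e-4 to 1000, which holds essentially all the mass.
area <- integrate(function(y) dgammalog10(10^y, shape, rate),
                  lower = -4, upper = 3)$value
area
```

If area is close to 1, the curve is a proper density on the log10 scale and is comparable to a density histogram drawn on that scale.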

How to extend the prediction interval to cover the area without data points

I am trying to plot a prediction interval for a linear model in ggplot.
Following the instructions on other websites, I have written the following
code:
model1 <- lm(rate ~ conc, data=origin)
temp_var <- predict(model1, interval="prediction")
new_df <- cbind(origin, temp_var)
ggplot(data = new_df, aes(x = conc, y = rate))+
geom_line(data=plot, aes(x=x, y=y, group=group)) +
geom_point(color="#2980B9", size = 0.5) +
geom_line(aes(y=lwr), color = "#2980B9", linetype = "dashed")+
geom_line(aes(y=upr), color = "#2980B9", linetype = "dashed")
Since I want to predict values outside my data range (blue points), I
would like to extend the prediction band. The problem is that the "lwr" and
"upr" columns generated by the predict function only cover my data
range, as shown in the following graph.
Is there any way to extend the prediction band?
Thank you so much for your help.
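One approach that might work here, sketched under assumptions: predict() accepts a newdata argument, so the band can be computed on a grid of conc values that goes beyond the observed range and then drawn from that grid instead of from the original rows. The simulated origin data, the grid limits 0 to 20, and the names grid and band are all hypothetical stand-ins, since the real data were not posted.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the question's `origin` data
origin <- data.frame(conc = seq(1, 10, length.out = 30))
origin$rate <- 2 + 0.5 * origin$conc + rnorm(30, sd = 0.4)

model1 <- lm(rate ~ conc, data = origin)

# Grid extending beyond the observed conc range (limits are arbitrary)
grid <- data.frame(conc = seq(0, 20, length.out = 100))

# predict() returns a matrix with columns fit, lwr and upr
band <- cbind(grid, predict(model1, newdata = grid, interval = "prediction"))

p <- ggplot(origin, aes(x = conc, y = rate)) +
  geom_point(color = "#2980B9", size = 0.5) +
  geom_line(data = band, aes(y = fit)) +
  geom_line(data = band, aes(y = lwr), color = "#2980B9", linetype = "dashed") +
  geom_line(data = band, aes(y = upr), color = "#2980B9", linetype = "dashed")
p
```

The band widens as the grid moves away from the observed data, which is the expected behaviour of a prediction interval under extrapolation.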

Linear model diagnostics: smoothing line obtained in ggplot2 is different from the one obtained with base plot

I am trying to reproduce the diagnostics plots for a linear regression model using ggplot2. The smoothing line that I get is different from the one obtained using base plots or ggplot2::autoplot.
library(survival)
library(ggplot2)
model <- lm(wt.loss ~ meal.cal, data=lung)
## Fitted vs. residuals using base plot:
plot(model, which=1)
## Fitted vs. residuals using ggplot
model.frame <- fortify(model)
ggplot(model.frame, aes(.fitted, .resid)) + geom_point() + geom_smooth(method="loess", se=FALSE)
The smoothing line is different: the influence of the first few points is much larger with the loess method used by ggplot. My question is: how can I reproduce the smoothing line obtained with plot() using ggplot2?
The red line in the original diagnostic plot is a lowess curve, which you can calculate yourself with the base function of the same name, lowess().
smoothed <- as.data.frame(with(model.frame, lowess(x = .fitted, y = .resid)))
ggplot(model.frame, aes(.fitted, .resid)) +
theme_bw() +
geom_point(shape = 1, size = 2) +
geom_hline(yintercept = 0, linetype = "dotted", col = "grey") +
geom_path(data = smoothed, aes(x = x, y = y), col = "red")
And the original:
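To illustrate why the two smoothers disagree, here is a small sketch on the built-in cars data (not the lung data from the question). lowess() fits local lines with robustness iterations (defaults f = 2/3, iter = 3), while loess() with its defaults fits local quadratics without robustness weights (span = 0.75, degree = 2, family = "gaussian"). The names lw, lo and max_gap are just illustrative.

```r
# lowess(): locally linear, robust smoother used for the base diagnostic plot
lw <- lowess(cars$speed, cars$dist)

# loess() with its default settings
lo <- loess(dist ~ speed, data = cars)

# Evaluate the loess fit at the x positions returned by lowess()
lo_at_lw <- predict(lo, newdata = data.frame(speed = lw$x))

# Largest vertical gap between the two curves
max_gap <- max(abs(lw$y - lo_at_lw))
max_gap
```

A non-zero gap shows that the two functions are different smoothers, which is why computing lowess() directly is the simplest way to match the base plot.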

How can I add shading at both ends of a density distribution plot

How can I add shading at both ends, like in the picture below?
I want to shade one tail from -Inf to -1.995 and the other from 1.995 to Inf.
I tried the solution here https://stackoverflow.com/a/4371473/3133957 but it doesn't seem to work.
Here is my code:
tmpdata <- data.frame(vals = t.stats)
qplot(x = vals, data=tmpdata, geom="density",
adjust = 1.5,
xlab="sampling distribution of t-statistic",
ylab="frequency") +
geom_vline(xintercept = t.statistic(precip, population.precipitation),
linetype = "dashed") +
geom_ribbon(data=subset(tmpdata,vals>-1.995 & vals<1.995),aes(ymax=max(vals),ymin=0,fill="red",alpha=0.5))
You didn't provide a dataset for your question, so I simulated one to use for this answer. First, make your density plot:
tmpdata <- data.frame(vals = rnorm(10000, mean = 0, sd = 1))
plot <- qplot(x = vals, data=tmpdata, geom="density",
adjust = 1.5,
xlab="sampling distribution of t-statistic",
ylab="frequency")
Then, extract the x and y coordinates used by ggplot to plot your density curve:
area.data <- ggplot_build(plot)$data[[1]]
You can then add two geom_area layers to shade in the left and right tails of your curve via:
plot +
geom_area(data=area.data[which(area.data$x < -1.995),], aes(x=x, y=y), fill="skyblue") +
geom_area(data=area.data[which(area.data$x > 1.995),], aes(x=x, y=y), fill="skyblue")
This will give you the following plot:
Note that you can add your geom_vline layer after this (I left it out because it required data you did not supply in your question).
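Since qplot() is deprecated in recent ggplot2 releases, a variant that avoids it may be useful. This is only a sketch under assumptions: it computes the curve with base density() (using the same adjust = 1.5) instead of extracting it with ggplot_build(), and the simulated vals, the cutoff variable, and the names curve_df, left and right are illustrative.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the t-statistics in the question
vals <- rnorm(10000, mean = 0, sd = 1)

# Kernel density estimate with the same bandwidth adjustment as the question
dens <- density(vals, adjust = 1.5)
curve_df <- data.frame(x = dens$x, y = dens$y)

# Keep only the parts of the curve that lie in each tail
cutoff <- 1.995
left  <- curve_df[curve_df$x < -cutoff, ]
right <- curve_df[curve_df$x >  cutoff, ]

p <- ggplot(curve_df, aes(x = x, y = y)) +
  geom_line() +
  geom_area(data = left,  fill = "skyblue") +
  geom_area(data = right, fill = "skyblue") +
  labs(x = "sampling distribution of t-statistic", y = "frequency")
p
```

Because the curve is an ordinary data frame here, the cutoff can be changed without rebuilding the plot first.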

ggplot2 add offset to jitter positions

I have data that looks like this
df = data.frame(x=sample(1:5,100,replace=TRUE),y=rnorm(100),assay=sample(c('a','b'),100,replace=TRUE),project=rep(c('primary','secondary'),50))
and am producing a plot using this code
ggplot(df,aes(project,x)) + geom_violin(aes(fill=assay)) + geom_jitter(aes(shape=assay,colour=y),height=.5) + coord_flip()
which gives me this
This is 90% of the way to what I want. But I would like each point to be plotted only on top of the violin for the matching assay type. That is, the jittered positions of the points should be set so that, for each project type, the triangles only ever fall on the upper teal violin and the circles on the lower red violin.
Any ideas how to do this?
In order to get the desired result, it is probably best to use position_jitterdodge as this gives you the best control over the way the points are 'jittered':
ggplot(df, aes(x = project, y = x, fill = assay, shape = assay, color = y)) +
geom_violin() +
geom_jitter(position = position_jitterdodge(dodge.width = 0.9,
jitter.width = 0.5,
jitter.height = 0.2),
size = 2) +
coord_flip()
which gives:
You can use interaction between assay & project:
p <- ggplot(df,aes(x = interaction(assay, project), y=x)) +
geom_violin(aes(fill=assay)) +
geom_jitter(aes(shape=assay, colour=y), height=.5, cex=4)
p + coord_flip()
The labels can be adjusted by switching to a numeric x axis:
# cbind the interaction as a numeric
df$group <- as.numeric(interaction(df$assay, df$project))
# plot
p <- ggplot(df,aes(x=group, y=x, group=cut_interval(group, n = 4))) +
geom_violin(aes(fill=assay)) +
geom_jitter(aes(shape=assay, colour=y), height=.5, cex=4)
# factor() is needed because project is a character column in R >= 4.0
p + coord_flip() + scale_x_continuous(breaks = c(1.5, 3.5), labels = levels(factor(df$project)))
