Confidence interval over a normal distribution plot - r

I want to plot vertical lines on the x position for the confidence interval. I did the statistics, but I cannot find a way to add it to the plot. Please follow this MWE:
xseq<-seq(-4,4,.01)
densities<-dnorm(xseq, 0,1)
par(mfrow=c(1,3), mar=c(3,4,4,2))
plot(xseq, densities, col="darkgreen",xlab="", ylab="Densidade", type="l",lwd=2, cex=2, main="Normal", cex.axis=.8)
Generates:
The CI is:
x<-t.test(xseq, conf.level = 0.95)$conf.int
But when I try to plot the line with:
line(x[1], x[2])
It gives me the error:
Error in structure(.Call(C_tukeyline, as.double(xy$x[ok]), as.double(xy$y[ok]), :
insufficient observations
After comments pointed me to abline(), the lines are drawn.
However, it seems I am wrong to think that t.test will give CIs for a normal distribution.
What am I doing wrong?

Using ggplot2 (df, x, and x2 are defined under "Sample data" below):
library(ggplot2)
ggplot(data = df, aes(x = xseq, y = densities)) +
geom_point() +
geom_vline(xintercept = c(x[1], x[2]))
With proper confidence intervals:
ggplot(data = df, aes(x = xseq, y = densities)) +
geom_point() +
geom_vline(xintercept = c(x2[1], x2[2]))
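Sample data:
xseq <- seq(-4, 4, .01)
df <- data.frame(xseq = xseq,
densities = dnorm(xseq, 0, 1))
# t.test() gives a CI for the mean of the grid values, not for the distribution
x <- t.test(xseq, conf.level = 0.95)$conf.int
# central 95% interval of the plotted N(0, 1) density
x2 <- qnorm(c(0.025, 0.975), mean = 0, sd = 1)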
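For the base-graphics plot in the question, the same idea can be sketched with abline(v = ...). This assumes the goal is the central 95% interval of the plotted N(0, 1) density, which is my reading of the question rather than something it states; the names bounds and ci_mean are illustrative only. Note that t.test(xseq) estimates a CI for the mean of the grid values, which is why it returns a narrow interval around 0.

```r
# Grid and density, as in the question
xseq <- seq(-4, 4, 0.01)
densities <- dnorm(xseq, mean = 0, sd = 1)

# Central 95% interval of N(0, 1): the 2.5% and 97.5% quantiles
bounds <- qnorm(c(0.025, 0.975), mean = 0, sd = 1)

# For comparison: t.test() treats xseq as a sample and returns a CI for its mean
ci_mean <- t.test(xseq, conf.level = 0.95)$conf.int

plot(xseq, densities, type = "l", lwd = 2, col = "darkgreen",
     xlab = "", ylab = "Density", main = "Normal")
abline(v = bounds, lty = 2)           # vertical lines at the interval bounds
abline(v = ci_mean, lty = 3, col = "grey40")  # the much narrower CI of the mean
```

line() fits a Tukey resistant line to data, which is why it complained about insufficient observations; abline(v = ...) is the base-graphics call that draws vertical reference lines.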

Related

How can I add confidence intervals to a scatterplot for a regression on two variables?

I need to create an insightful graphic with a regression line, data points, and confidence intervals. I am not looking for smoothed lines. I have tried several approaches, but I just can't get it right.
I am looking for something like this:
Some code I have tried:
p <- scatterplot(df.regsoft$w ~ df.regsoft$b,
data = df.regsoft,
boxplots = FALSE,
regLine = list(method=lm, col="red"),
pch = 16,
cex = 0.7,
xlab = "Fitted Values",
ylab = "Residuals",
legend = TRUE,
smooth = FALSE)
abline(coef = confint.lm(result.rs))
This doesn't produce what I want, although it is the closest I have come. Notice that I turned off "smooth", since that is not what I am looking for.
How can I make this plot interactive?
If you don't mind switching to ggplot2 and the tidyverse, then this is simply a geom_smooth(method = "lm"):
library(tidyverse)
d <- tibble( #random stuff
x = rnorm(100, 0, 1),
y = 0.25 * x + rnorm(100, 0, 0.25)
)
m <- lm(y ~ x, data = d) #linear model
d %>%
ggplot() +
aes(x, y) + #what to plot
geom_point() +
geom_smooth(method = "lm") +
theme_bw()
Without method = "lm", it draws a smoothed line.
As for the Conf. interval (Obs 95%) lines, it seems to me that's simply a quantile regression. In that case, you can use the quantreg package.
If you want to make it interactive, you can use the plotly package:
library(plotly)
p <- d %>%
ggplot() +
aes(x, y) +
geom_point() +
geom_smooth(method = "lm") +
theme_bw()
ggplotly(p)
================================================
P.S.
I am not completely sure this is what the figure you posted is showing (I guess so), but to add the quantile lines, I would just perform two quantile regressions (upper and lower) and then calculate the values of the quantile lines for your data:
library(tidyverse)
library(quantreg)
d <- tibble( #random stuff
x = rnorm(100, 0, 1),
y = 0.25 * x + rnorm(100, 0, 0.25)
)
m <- lm(y ~ x, data = d) #linear model
# 95% quantile, two tailed
rq_low <- rq(y ~ x, data = d, tau = 0.025) #lower quantile
rq_high <- rq(y ~ x, data = d, tau = 0.975) #upper quantile
d %>%
mutate(low = rq_low$coefficients[1] + x * rq_low$coefficients[2],
high = rq_high$coefficients[1] + x * rq_high$coefficients[2]) %>%
ggplot() +
geom_point(aes(x, y)) +
geom_smooth(aes(x, y), method = "lm") +
geom_line(aes(x, low), linetype = "dashed") +
geom_line(aes(x, high), linetype = "dashed") +
theme_bw()
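Another possible reading of the "Conf. interval (Obs 95%)" lines, and this is my assumption rather than something the figure confirms, is a 95% prediction interval for individual observations. A minimal sketch using predict() on the lm fit, with the confidence band of the mean drawn as a ribbon and the prediction limits as dashed lines (the data and the names conf_band, pred_band, and bands are made up for illustration):

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the example is reproducible
d <- data.frame(x = rnorm(100, 0, 1))
d$y <- 0.25 * d$x + rnorm(100, 0, 0.25)

m <- lm(y ~ x, data = d)

# Interval for the fitted mean vs. interval for new observations
conf_band <- predict(m, newdata = d, interval = "confidence", level = 0.95)
pred_band <- predict(m, newdata = d, interval = "prediction", level = 0.95)

bands <- data.frame(
  x = d$x, y = d$y,
  fit = conf_band[, "fit"],
  conf_lwr = conf_band[, "lwr"], conf_upr = conf_band[, "upr"],
  pred_lwr = pred_band[, "lwr"], pred_upr = pred_band[, "upr"]
)

p <- ggplot(bands, aes(x, y)) +
  geom_ribbon(aes(ymin = conf_lwr, ymax = conf_upr), alpha = 0.2) +
  geom_point() +
  geom_line(aes(y = fit), colour = "red") +
  geom_line(aes(y = pred_lwr), linetype = "dashed") +
  geom_line(aes(y = pred_upr), linetype = "dashed") +
  theme_bw()
print(p)
```

The prediction limits are always wider than the confidence band because they include the residual variance of individual observations. Passing p to plotly::ggplotly() should make this version interactive in the same way as above.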

How to Add a Legend to a ggplot without plotting the raw data?

I have made a plot of a polynomial function, y = x^2 - 6*x + 9, from a sequence of points with a small amount of random noise added to y. I used these points to fit a spline model to the raw data, calculated the derivative from the spline model with R's predict() function, and then added both curves to the plot.
By the way, the expected derivative function is this: dy / dx = 2*x - 6
The original function I colored blue and the 1st derivative function I colored red. I wish to add legends to these plots, but I'm finding that difficult since I did not assign any points to the plots, as I declared the data-frames within the geom_smooth() functions.
The code I'm using is this:
library(ggplot2)
# Plot the function: f(x) = x^2 - 6x + 9
# with a smooth spline:
# And then the derivative of that function from predicted values of the
# smoothed spline: f ' (x) = 2*x - 6
# Get a large sequence of x-values:
x <- seq(from = -10, to = 10, by = 0.01)
# The y-values are a function of each x value.
y <- x^2 - 6*x + 9 + rnorm(length(x), 0, 0.5)
# Fit the curve to a model which is a smoothed spline.
model <- smooth.spline(x = x, y = y)
# Predict the 1st derivative of this smoothed spline.
f_x <- predict(model, x = seq(from = min(x), to = max(x), by = 1), deriv = 1)
# Plot the smoothed spline of the original function and the derivative with respect to x.
p <- ggplot() + theme_bw() + geom_smooth(data = data.frame(x,y), aes(x = x, y = y), method = "loess", col = "blue", se = TRUE) + geom_smooth(data = data.frame(f_x$x, f_x$y), aes(x = f_x$x, y = f_x$y), method = "loess", col = "red", se = TRUE)
# Set the bounds of the plot.
p <- p + scale_x_continuous(breaks = scales::pretty_breaks(n = 20), limits = c(-5, 10)) + scale_y_continuous(breaks = scales::pretty_breaks(n = 20), limits = c(-10, 10))
# Add some axis labels
p <- p + labs(x = "x-axis", y = "y-axis", title = "Original Function and predicted derivative function")
p <- p + scale_fill_manual(values = c("blue", "red"), labels = c("Original Function", "Derivative Function with respect to x"))
print(p)
I was hoping that I could add the legend with scale_fill_manual(), but my attempt does not add a legend to the plot. Essentially, the plot I get looks like this, minus the messy legend that I added in Paint. I would like to have that legend.
I did this because I want to show my chemistry instructor that I can accurately measure heat capacity from differential scanning calorimetry data alone. I believe the heat capacity is just the first derivative of the heat flow vs. temperature curve with respect to temperature.
So I tried to make a plot showing the original function overlaid with its first derivative with respect to x, to show that the first derivative computed only from a spline curve fitted to raw data points reliably reproduces the expected line dy / dx = 2 * x - 6, which it does.
I just want to add that legend.
Creating a data frame with your data and mapping color inside the aesthetics is the most common way of doing this.
df <- rbind(
data.frame(data='f(x)', x=x, y=y),
data.frame(data='f`(x)', x=f_x$x, y=f_x$y))
p <- ggplot(df, aes(x,y, color=data)) + geom_smooth(method = 'loess')
p <- p + scale_x_continuous(breaks = scales::pretty_breaks(n = 20), limits = c(-5, 10)) + scale_y_continuous(breaks = scales::pretty_breaks(n = 20), limits = c(-10, 10))
p <- p + labs(x = "x-axis", y = "y-axis", title = "Original Function and predicted derivative function")
p <- p + scale_color_manual(name = "Functions", values = c("blue", "red"), labels = c("Original Function", "Derivative Function with respect to x"))
print(p)
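If you would rather keep two separate data frames, a legend also appears when the colour is set to a constant label inside aes(). The sketch below is a variation, not the code above: it draws the spline predictions directly with geom_line() instead of running geom_smooth() over them, and the names grid, curve_df, and deriv_df are made up for illustration.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the example is reproducible
x <- seq(from = -10, to = 10, by = 0.01)
y <- x^2 - 6 * x + 9 + rnorm(length(x), 0, 0.5)

# Smoothing spline and its first derivative, evaluated on a plotting grid
model <- smooth.spline(x = x, y = y)
grid <- seq(-5, 10, by = 0.1)
curve_df <- data.frame(x = grid, y = predict(model, x = grid)$y)
deriv_df <- data.frame(x = grid, y = predict(model, x = grid, deriv = 1)$y)

# A constant string inside aes() becomes a legend entry;
# the named vector ties each label to its colour explicitly
p <- ggplot() +
  geom_line(data = curve_df, aes(x, y, colour = "Original function")) +
  geom_line(data = deriv_df, aes(x, y, colour = "First derivative")) +
  scale_colour_manual(
    name = "Functions",
    values = c("Original function" = "blue", "First derivative" = "red")
  ) +
  coord_cartesian(ylim = c(-10, 10)) +
  labs(x = "x-axis", y = "y-axis") +
  theme_bw()
print(p)
```

Naming the entries of values avoids relying on the alphabetical order of the labels, which is what decides the colour assignment when values is unnamed.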

Having several fits in one plot (in R)

I was wondering how I can modify the following code to produce a single plot that shows several fits:
data(airquality)
library(quantreg)
library(ggplot2)
library(data.table)
library(devtools)
# source Quantile LOESS
source("https://www.r-statistics.com/wp-content/uploads/2010/04/Quantile.loess_.r.txt")
airquality2 <- na.omit(airquality[ , c(1, 4)])
#'' quantreg::rq
rq_fit <- rq(Ozone ~ Temp, 0.95, airquality2)
rq_fit_df <- data.table(t(coef(rq_fit)))
names(rq_fit_df) <- c("intercept", "slope")
#'' quantreg::lprq
lprq_fit <- lapply(1:3, function(bw){
fit <- lprq(airquality2$Temp, airquality2$Ozone, h = bw, tau = 0.95)
return(data.table(x = fit$xx, y = fit$fv, bw = paste0("bw=", bw), fit = "quantreg::lprq"))
})
#'' Quantile LOESS
ql_fit <- Quantile.loess(airquality2$Ozone, jitter(airquality2$Temp), window.size = 10,
the.quant = .95, window.alignment = c("center"))
ql_fit_df <- data.table(x = ql_fit$x, y = ql_fit$y.loess, bw = "bw=1", fit = "Quantile LOESS")
I want to have all these fits in a plot.
geom_quantile can calculate quantiles using the rq method internally, so we don't need to create the rq_fit_df separately. However, the lprq and Quantile LOESS methods aren't available within geom_quantile, so I've used the data frames you provided and plotted them using geom_line.
In addition, to include the rq line in the color and linetype mappings and in the legend we add aes(colour="rq", linetype="rq") as a sort of "artificial" mapping inside geom_quantile.
library(dplyr) # For bind_rows()
ggplot(airquality2, aes(Temp, Ozone)) +
geom_point() +
geom_quantile(quantiles=0.95, formula=y ~ x, aes(colour="rq", linetype="rq")) +
geom_line(data=bind_rows(lprq_fit, ql_fit_df),
aes(x, y, colour=paste0(gsub("q.*:","",fit),": ", bw),
linetype=paste0(gsub("q.*:","",fit),": ", bw))) +
theme_bw() +
scale_linetype_manual(values=c(2,4,5,1,1)) +
labs(colour="Method", linetype="Method",
title="Different methods of estimating the 95th percentile by quantile regression")
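For comparison, here is a rough sketch of what the geom_quantile layer amounts to when done by hand: fit rq() yourself and draw the line from its coefficients. It assumes the quantreg package is installed; rq_coef and below are illustrative names.

```r
library(quantreg)
library(ggplot2)

airquality2 <- na.omit(airquality[, c("Ozone", "Temp")])

# Linear quantile regression for the 95th percentile
rq_fit <- rq(Ozone ~ Temp, tau = 0.95, data = airquality2)
rq_coef <- coef(rq_fit)  # named vector: "(Intercept)" and "Temp"

# Share of observations on or below the fitted line (should be roughly 0.95)
below <- mean(airquality2$Ozone <= fitted(rq_fit) + 1e-8)

p <- ggplot(airquality2, aes(Temp, Ozone)) +
  geom_point() +
  geom_abline(intercept = rq_coef[["(Intercept)"]],
              slope = rq_coef[["Temp"]],
              colour = "red") +
  theme_bw()
print(p)
```

The manual version is only useful if you need the coefficients themselves; for plotting alone, geom_quantile is shorter and takes part in the legend as shown above.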

How do I add confidence intervals to glm model in ggplot?

Here is an example of what my data looks like:
DATA <- data.frame(
TotalAbund = sample(1:10),
TotalHab = sample(0:1),
TotalInv = sample(c("yes", "no"), 20, replace = TRUE)
)
DATA$TotalHab<-as.factor(DATA$TotalHab)
DATA
Here is my model:
MOD.1<-glm(TotalAbund~TotalInv+TotalHab, family=quasipoisson, data=DATA)
Here is my plot:
NEWDATA <- with(DATA,
expand.grid(TotalInv=unique(TotalInv),
TotalHab=unique(TotalHab)))
pred <- predict(MOD.1,newdata= NEWDATA,se.fit=TRUE)
gg1 <- ggplot(NEWDATA, aes(x=factor(TotalHab), y=TotalAbund,colour=TotalInv))
I get the following error...
Error in eval(expr, envir, enclos) : object 'TotalAbund' not found
...when trying to run the last line of code:
gg1 + geom_point(data=pframe,size=8,shape=17,alpha=0.7,
position=position_dodge(width=0.75))
Can anyone help? Also how do I add 95% confidence intervals to my points? Thanks.
You will need to calculate the 95% confidence intervals yourself. You were on the right track using predict and asking for se.fit. We will first ask for the predictions on the link scale, calculate 95% confidence intervals there, and then transform them back to the response scale for plotting. Here is a convenience function to calculate your CIs for the log link (which is the default link for the quasipoisson family you used in the model).
# get your prediction
pred <- predict(MOD.1,newdata= NEWDATA,se.fit=TRUE,
type = "link")
# CI function
make_ci <- function(pred, data){
# fit, lower, and upper CI
fit <- pred$fit
lower <- fit - 1.96*pred$se.fit
upper <- fit + 1.96*pred$se.fit
return(data.frame(exp(fit), exp(lower), exp(upper), data))
}
my_pred <- make_ci(pred, NEWDATA)
# to be used in geom_errorbar
limits <- aes(x = factor(TotalHab), ymax = my_pred$exp.upper., ymin = my_pred$exp.lower.,
group = TotalInv)
Then we plot it. I will leave the final tweaking to you to make the figure look how you want.
ggplot(my_pred, aes(x = factor(TotalHab), y = exp.fit., color = TotalInv))+
geom_errorbar(limits, position = position_dodge(width = 0.75),
color = "black")+
geom_point(size = 8, position = position_dodge(width = 0.75), shape = 16)+
ylim(c(0,15))+
geom_point(data = DATA, aes(x = factor(TotalHab), y = TotalAbund, colour = TotalInv),
size = 8, shape = 17, alpha = 0.7,
position = position_dodge(width = 0.75))
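A slightly more general sketch of the same calculation takes the inverse link from the fitted model instead of hard-coding exp(), and uses qnorm(0.975) in place of 1.96. The toy data here is made up and balanced so the example is reproducible; it is not the data from the question. Using the normal quantile is a simplification, since some would prefer a t quantile for a quasi family.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
DATA <- data.frame(
  TotalAbund = rpois(20, 5),
  TotalHab = factor(rep(0:1, times = 10)),
  TotalInv = factor(rep(c("yes", "no"), each = 10))
)

mod <- glm(TotalAbund ~ TotalInv + TotalHab, family = quasipoisson, data = DATA)

# One row per combination of the two factors
newdata <- expand.grid(TotalInv = levels(DATA$TotalInv),
                       TotalHab = levels(DATA$TotalHab))

# Predict on the link scale, build the interval there, then back-transform
pred <- predict(mod, newdata = newdata, type = "link", se.fit = TRUE)
z <- qnorm(0.975)
inv <- family(mod)$linkinv  # exp() for the log link

ci <- data.frame(
  newdata,
  fit = inv(pred$fit),
  lower = inv(pred$fit - z * pred$se.fit),
  upper = inv(pred$fit + z * pred$se.fit)
)
print(ci)
```

Because the interval is built on the link scale, the back-transformed limits stay positive and are not symmetric around the fitted value.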

geom_abline for logistic regression (ggplot2)

I am sorry if this question is very simple, but I could not find a solution to my problem. I want to plot logistic regression lines with ggplot2. The problem is that I cannot use geom_abline, because I don't have the original model, just the slope and intercept for each regression line. I have used this approach for linear regressions, and it works fine with geom_abline, because you can just give multiple slopes and intercepts to the function:
geom_abline(data = estimates, aes(intercept = inter, slope = slo))
where inter and slo are vectors with more than one value.
If I try the same approach with coefficients from a logistic regression, I get the wrong regression lines (straight lines). I am trying to use geom_line, but I cannot use predict() to generate the predicted values because I don't have the original model object.
Any suggestion?
Thanks in advance,
Gustavo
If the model had a logit link then you could plot the prediction using only the intercept (coefs[1]) and slope (coefs[2]) as:
library(ggplot2)
n <- 100L
x <- rnorm(n, 2.0, 0.5)
y <- factor(rbinom(n, 1L, plogis(-0.6 + 1.0 * x)))
mod <- glm(y ~ x, binomial("logit"))
coefs <- coef(mod)
x_plot <- seq(-5.0, 5.0, by = 0.1)
y_plot <- plogis(coefs[1] + coefs[2] * x_plot)
plot_data <- data.frame(x_plot, y_plot)
ggplot(plot_data) + geom_line(aes(x_plot, y_plot), col = "red") +
xlab("x") + ylab("p(y | x)") +
scale_y_continuous(limits = c(0, 1)) + theme_bw()
Edit
Here is one way of plotting k predicted probability lines on the same graph, following on from the previous code:
library(reshape2)
k <- 5L
intercepts <- rnorm(k, coefs[1], 0.5)
slopes <- rnorm(k, coefs[2], 0.5)
x_plot <- seq(-5.0, 5.0, by = 0.1)
model_predictions <- sapply(1:k, function(idx) {
plogis(intercepts[idx] + slopes[idx] * x_plot)
})
colnames(model_predictions) <- 1:k
plot_data <- as.data.frame(cbind(x_plot, model_predictions))
plot_data_melted <- melt(plot_data, id.vars = "x_plot", variable.name = "model",
value.name = "y_plot")
ggplot(plot_data_melted) + geom_line(aes(x_plot, y_plot, col = model)) +
xlab("x") + ylab("p(y | x)") +
scale_y_continuous(limits = c(0, 1)) + theme_bw()
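Closer to the situation in the question, where only a data frame of intercepts and slopes is available, the same thing can be sketched without reshape2. The estimates values below are invented for illustration; inter and slo follow the question's column names, and a logit link is assumed.

```r
library(ggplot2)

# Hypothetical coefficients, one row per fitted logistic regression
estimates <- data.frame(
  model = factor(1:3),
  inter = c(-0.6, 0.2, -1.5),
  slo = c(1.0, 0.5, 2.0)
)

x_plot <- seq(-5, 5, by = 0.1)

# merge() with no shared columns gives every (model, x) combination
curves <- merge(estimates, data.frame(x = x_plot))

# Inverse logit of the linear predictor gives the predicted probability
curves$p <- plogis(curves$inter + curves$slo * curves$x)

p <- ggplot(curves, aes(x, p, colour = model)) +
  geom_line() +
  xlab("x") + ylab("p(y | x)") +
  scale_y_continuous(limits = c(0, 1)) +
  theme_bw()
print(p)
```

If the models used a different link, plogis() would need to be replaced by the matching inverse link, for example pnorm() for probit.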
