I have produced a glm interaction plot using ggplot2. I have attached the code I used and the plot.
I know that the grey shaded areas represent the 95% confidence interval, but is there a way to get the exact values of those shaded areas, and therefore of the 95% confidence interval?
#bind data together
Modern_EarlyHolocene<-rbind(FladenF30, FladenB30, Early_Holocene)
#Build modern vs Holocene model
Modern_EarlyHolocene<-glm(Max_Height~Age+Time_period, data=Modern_EarlyHolocene,family = gaussian)
#Produce gg interaction plot
Modern_EarlyHolocene_plot<-ggplot(data=Modern_EarlyHolocene) +
aes(x = Age, y = Max_Height, group = Time_period, color = Time_period) +
geom_point( alpha = .7) +
stat_smooth(method = "glm", level=0.95) +
expand_limits(y=c(0,90), x=c(0,250))
#add axis labels
Modern_EarlyHolocene_plot + labs(x = "Age (years)", y = 'Maximum height (mm)') +
theme(legend.text = element_text(size = 14, colour = "Black"),
legend.title=element_blank()) +
theme(axis.text=element_text(size=14),
axis.title=element_text(size=16,face="bold"))
You can access the plot data with layer_data(Modern_EarlyHolocene_plot, i), where i is the index of the layer to return, counted in the order the layers were added to the plot.
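As a minimal sketch of that call, here is the same kind of plot built on iris (a stand-in, since the original data is not available). The names p, smooth_data and ci_values are only illustrative.

```r
library(ggplot2)

# Same kind of plot as in the question, built on iris so it runs anywhere.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(alpha = 0.7) +
  stat_smooth(method = "glm", formula = y ~ x, level = 0.95)

# Layer 1 is geom_point(), layer 2 is stat_smooth().
smooth_data <- layer_data(p, 2)

# y is the fitted line; ymin and ymax are the edges of the grey band.
ci_values <- smooth_data[, c("group", "x", "y", "ymin", "ymax")]
head(ci_values)
```

The values are returned on the grid of x positions that stat_smooth evaluates, not at the observed data points, so they describe the band as drawn.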
You are effectively fitting a different regression line for each Time_period, so your glm has to include an interaction term. It should be:
Modern_EarlyHolocene<-glm(Max_Height~Age*Time_period, data=Modern_EarlyHolocene)
I do not have your data, so see below for an example with iris:
fit = glm(Sepal.Width ~ Sepal.Length * Species,data=iris)
g1 = ggplot(iris,aes(x=Sepal.Length,y=Sepal.Width,color=Species)) +
geom_point( alpha = .7) + stat_smooth(method = "glm", level=0.95)
To get the se of the predictions, you do:
pred = predict(fit,iris,se.fit = TRUE)
df_pred = data.frame(iris,pred=pred$fit,se=pred$se.fit)
We can plot this. The lower and upper bounds of the 95% confidence band are the prediction minus and plus 1.96 times the standard error:
g2 = ggplot(df_pred,aes(x=Sepal.Length,y=Sepal.Width,color=Species)) +
geom_point( alpha = .7) +
geom_ribbon(aes(ymin=pred-1.96*se,ymax=pred+1.96*se,fill=Species),alpha=0.1)
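If you only need the numbers rather than the ribbon, the same bounds can be collected in a table. This is a sketch on iris; the names z and ci are illustrative, and it assumes the normal quantile (about 1.96) is the multiplier you want.

```r
fit <- glm(Sepal.Width ~ Sepal.Length * Species, data = iris)
pred <- predict(fit, newdata = iris, se.fit = TRUE)

# Normal quantile for a two-sided 95% interval (about 1.96).
z <- qnorm(0.975)

# One row per observation: fitted value with lower and upper bound.
ci <- data.frame(
  Species      = iris$Species,
  Sepal.Length = iris$Sepal.Length,
  fit          = pred$fit,
  lower        = pred$fit - z * pred$se.fit,
  upper        = pred$fit + z * pred$se.fit
)
head(ci)
```

Note that stat_smooth fits each group separately, while this model pools the residual variance across groups, so the band may differ slightly from the one ggplot2 draws.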
How do I plot a log linear model in R?
Currently, I am doing this but am not sure if it's the right/efficient way:
data(food)
model1 <- lm(food_exp~log(income), data = food)
temp_var <- predict(model1, interval="confidence")
new_df <- cbind(food, temp_var)
head(new_df)
ggplot(new_df, aes(x = income, y = food_exp))+
geom_point() +
geom_smooth(aes(y=lwr), color = "red", linetype = "dashed")+
geom_smooth(aes(y=upr), color = "red", linetype = "dashed")+
geom_smooth(aes(y = fit), color = "blue")+
theme_economist()
You can use geom_smooth and put your formula directly into it. It should yield the same line as your fit (which you can check by plotting your fit as well):
ggplot(new_df, aes(x = Sepal.Width, y = Sepal.Length))+
geom_point() +
geom_point(aes(y=fit), color="red") + #your original fit
geom_smooth(method=lm, formula=y~log(x)) #ggplot fit
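A self-contained version of that check, assuming iris as stand-in data (the columns above suggest that is what was used). The names model, new_df and p are illustrative.

```r
library(ggplot2)

# lm() fit with a logged predictor, using iris as stand-in data.
model  <- lm(Sepal.Length ~ log(Sepal.Width), data = iris)
new_df <- cbind(iris, predict(model, interval = "confidence"))

p <- ggplot(new_df, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point() +
  geom_line(aes(y = fit), color = "red") +    # fit from lm()
  geom_smooth(method = "lm", formula = y ~ log(x),
              linetype = "dashed")            # fit computed by ggplot2

# Compare the curve ggplot2 computed with predictions from the lm() fit.
smooth_curve <- layer_data(p, 3)
lm_curve <- predict(model, newdata = data.frame(Sepal.Width = smooth_curve$x))
all.equal(smooth_curve$y, unname(lm_curve))
```

The two lines should coincide because geom_smooth fits the same model once the formula is written in terms of x and y.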
If you don't care about extracting the parameters and just want the plot, you can plot directly in ggplot2.
Some fake data for plotting:
library(tidyverse)
set.seed(454)
income <- VGAM::rpareto(n = 100, scale = 20, shape = 2)*1000
food_exp <- rnorm(100, income*.3+.1, 3)
food <- data.frame(income, food_exp)
Now within ggplot2, use the geom_smooth function and specify that you want a linear model. Additionally, you can directly transform the income in the aes argument:
ggplot(food, aes(x = log(income), y = food_exp))+
geom_point()+
geom_smooth(method = "lm")+
theme_bw()+
labs(
title = "Log Linear Model Food Expense as a Function of Log(income)",
x = "Log(Income)",
y = "Food Expenses"
)
This works for confidence intervals. To add prediction intervals, you'll need to do what you did earlier: fit the model yourself and generate the prediction intervals from it.
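A sketch of the prediction-interval case. The data here is simulated with base R only (a hypothetical stand-in, not the VGAM data above), and the dashed-line styling is just one choice.

```r
library(ggplot2)

# Simulated stand-in data (hypothetical; base R only).
set.seed(454)
income   <- exp(rnorm(100, mean = 10, sd = 0.5))
food_exp <- 50 + 30 * log(income) + rnorm(100, sd = 10)
food     <- data.frame(income, food_exp)

model1 <- lm(food_exp ~ log(income), data = food)

# interval = "prediction" gives bounds for new observations,
# which are wider than the confidence band for the mean.
pred_int <- predict(model1, newdata = food, interval = "prediction")
new_df   <- cbind(food, pred_int)

p <- ggplot(new_df, aes(x = log(income), y = food_exp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +               # confidence band
  geom_line(aes(y = lwr), color = "red", linetype = "dashed") + # prediction bounds
  geom_line(aes(y = upr), color = "red", linetype = "dashed") +
  theme_bw()
print(p)
```

geom_line is used for the bounds because lwr and upr are already computed, so there is nothing left to smooth.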
I have a data.frame with observed success/failure outcomes per two groups along with expected probabilities:
library(dplyr)
observed.probability.df <- data.frame(group = c("A","B"), p = c(0.4,0.6))
expected.probability.df <- data.frame(group = c("A","B"), p = qlogis(c(0.45,0.55)))
observed.data.df <- do.call(rbind,lapply(c("A","B"), function(g)
data.frame(group = g, value = c(rep(0,1000*dplyr::filter(observed.probability.df, group != g)$p),rep(1,1000*dplyr::filter(observed.probability.df, group == g)$p)))
)) %>% dplyr::left_join(expected.probability.df)
observed.probability.df$group <- factor(observed.probability.df$group, levels = c("A","B"))
observed.data.df$group <- factor(observed.data.df$group, levels = c("A","B"))
I'm fitting a logistic regression (binomial glm with a logit link function) to these data with the offset term:
fit <- glm(value ~ group + offset(p), data = observed.data.df, family = binomial(link = 'logit'))
Now, I'd like to plot these data as a bar graph using ggplot2's geom_bar, color-coded by group, and to add to that the trend line and shaded standard error area estimated in fit.
I'd use stat_smooth for that, but I don't think it can handle the offset term in its formula, so it looks like I need to assemble this figure in some other way.
To get the bars and the trend line I used:
slope.est <- function(x, ests) plogis(ests[1] + ests[2] * x)
library(ggplot2)
ggplot(observed.probability.df, aes(x = group, y = p, fill = group)) +
geom_bar(stat = 'identity') +
stat_function(fun = slope.est,args=list(ests=coef(fit)),size=2,color="black") +
scale_x_discrete(name = NULL,labels = levels(observed.probability.df$group), breaks = sort(unique(observed.probability.df$group))) +
theme_minimal() + theme(legend.title = element_blank()) + ylab("Fraction of cells")
So the question is how to add to that the shaded standard error around the trend line?
Using stat_function I am able to shade the entire area from the upper bound of the standard error all the way down to the X-axis:
ggplot(observed.probability.df, aes(x = group, y = p, fill = group)) +
geom_bar(stat = 'identity') +
stat_function(fun = slope.est,args=list(ests=coef(fit)),size=2,color="black") +
stat_function(fun = slope.est,args=list(ests=summary(fit)$coefficients[,1]+summary(fit)$coefficients[,2]),geom='area',fill="gray",alpha=0.25) +
scale_x_discrete(name = NULL,labels = levels(observed.probability.df$group), breaks = sort(unique(observed.probability.df$group))) +
theme_minimal() + theme(legend.title = element_blank()) + ylab("Fraction of cells")
Which is close but not quite there.
Any idea how to remove from the shaded area the part that lies below the lower bound of the standard error? Perhaps geom_ribbon is the way to go here, but I don't know how to combine it with the slope.est function.
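One possible way to build the geom_ribbon version is to skip slope.est and take the band from predict() instead. This is only a sketch under assumptions: the data is rebuilt in a compact form, the names dat, newdat, pr and band are illustrative, and the band is the fit plus or minus one standard error, back-transformed from the logit scale.

```r
library(ggplot2)

# Rebuild the data from the question in a compact form.
obs <- data.frame(group = factor(c("A", "B")), p = c(0.4, 0.6))
dat <- data.frame(
  group = factor(rep(c("A", "B"), each = 1000)),
  value = c(rep(c(0, 1), times = c(600, 400)),
            rep(c(0, 1), times = c(400, 600))),
  p     = rep(qlogis(c(0.45, 0.55)), each = 1000)  # offset on the logit scale
)
fit <- glm(value ~ group + offset(p), data = dat,
           family = binomial(link = "logit"))

# Predict on the link scale; newdata must carry the offset column p.
newdat <- data.frame(group = factor(c("A", "B")), p = qlogis(c(0.45, 0.55)))
pr <- predict(fit, newdata = newdat, type = "link", se.fit = TRUE)

# Back-transform fit +/- 1 standard error to the probability scale.
band <- data.frame(
  x     = as.numeric(newdat$group),
  fit   = plogis(pr$fit),
  lower = plogis(pr$fit - pr$se.fit),
  upper = plogis(pr$fit + pr$se.fit)
)

p_offset <- ggplot(obs, aes(x = group, y = p, fill = group)) +
  geom_bar(stat = "identity") +
  geom_ribbon(data = band, aes(x = x, ymin = lower, ymax = upper),
              inherit.aes = FALSE, fill = "gray", alpha = 0.5) +
  geom_line(data = band, aes(x = x, y = fit), inherit.aes = FALSE) +
  theme_minimal() + theme(legend.title = element_blank()) +
  ylab("Fraction of cells")
print(p_offset)
```

Because the ribbon has both ymin and ymax, only the area between the two bounds is shaded. With two groups the band simply connects the two group positions, so error bars per group may be the more natural display.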
I want to overlay parameter estimates of group intercept and slope from a Bayesian analysis onto a grouped ggplot scatter-plot of actual data. I can overlay the individual lines just fine but I would really like to get a single mean line for each of the groups as well.
Here is some toy data. Three groups with differing intercepts and slopes
# data
x <- rnorm(120, 0, 1)
y <- c(20 + 3*x[1:40] + rnorm(40,0.01), rnorm(40,0.01), 10 + -3*x[81:120] + rnorm(40,0.01))
group = factor(rep(letters[1:3], each = 40))
df <- data.frame(group,x,y)
# fake parameter estimates of intercept and slope
parsDF <- data.frame(int = c(rnorm(10,20,.5), rnorm(10,0,.5), rnorm(10,10,.5)),
slope = c(rnorm(10,3,.3), rnorm(10,0,.3), rnorm(10,-3,.3)),
group = rep(letters[1:3], each = 10))
Now for the plot
ggplot(df, aes(x,y, colour = group)) +
geom_abline(data = parsDF, aes(intercept = int, slope = slope), colour = "gray75") +
geom_point() +
facet_wrap(~group)
I thought maybe I could add a single mean intercept and slope line for each group via stat.summary-type methods, like so
ggplot(df, aes(x,y, colour = group)) +
geom_abline(data = parsDF, aes(intercept = int, slope = slope), colour = "gray75") +
geom_abline(data = parsDF, aes(intercept = int, slope = slope), stat = "summary", fun.y = "mean", colour = "black", linetype = "dotted") +
geom_point() +
facet_wrap(~group)
But it just ignores those arguments and re-plots the individual lines over the existing ones.
I realise I could just calculate the mean of the intercepts and slopes for each group and brute-force that into the graph somehow but I can't see how to do that without mucking up the faceting by group, other than by creating another dataframe for mean slopes and intercepts and passing that into the plot as well. And I don't want to simply use geom_smooth() because that will use the actual data not my parameter estimates.
Any help much appreciated
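For reference, the extra-data-frame route mentioned above can be sketched as follows. The name meanDF and the seed are illustrative; faceting stays intact because the summary keeps the group column.

```r
library(ggplot2)

set.seed(1)
# Toy data and fake parameter draws, as in the question.
x <- rnorm(120, 0, 1)
y <- c(20 + 3 * x[1:40] + rnorm(40, 0.01), rnorm(40, 0.01),
       10 + -3 * x[81:120] + rnorm(40, 0.01))
df <- data.frame(group = factor(rep(letters[1:3], each = 40)), x, y)
parsDF <- data.frame(
  int   = c(rnorm(10, 20, .5), rnorm(10, 0, .5), rnorm(10, 10, .5)),
  slope = c(rnorm(10, 3, .3), rnorm(10, 0, .3), rnorm(10, -3, .3)),
  group = rep(letters[1:3], each = 10)
)

# One row per group: mean intercept and mean slope of the draws.
meanDF <- aggregate(cbind(int, slope) ~ group, data = parsDF, FUN = mean)

p_mean <- ggplot(df, aes(x, y, colour = group)) +
  geom_abline(data = parsDF, aes(intercept = int, slope = slope),
              colour = "gray75") +
  geom_abline(data = meanDF, aes(intercept = int, slope = slope),
              colour = "black", linetype = "dotted") +
  geom_point() +
  facet_wrap(~group)
print(p_mean)
```

Each facet then shows its own individual lines plus a single dotted mean line.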
Hi, I have created a linear model and a regression plot. However, I would like to show the model results on the plot itself, something like the image below:
How do I show the key results on the plot? Below is my code for the plot:
library(ggplot2)
ggplot(HP_crime15, aes(x = as.numeric(Theft15), y = as.numeric(X2015))) +
geom_point(shape = 1) +
geom_smooth(method = lm) +
xlab("Recorded number of Thefts") +
ylab("House prices (£)") +
ggtitle("Title")
Ideally, a good question poses the problem with a reproducible example. Anyway, I have approached this problem in two steps.
Step 1: Determine the linear regression model:
fit1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
Step 2: Plot the model:
library (ggplot2)
ggplot(fit1$model, aes_string(x = names(fit1$model)[2], y = names(fit1$model)[1])) +
geom_point() +
stat_smooth(method = "lm", col = "red") +
labs(title = paste("Adj R2 = ",signif(summary(fit1)$adj.r.squared, 5),
"Intercept =",signif(fit1$coef[[1]],5 ),
" Slope =",signif(fit1$coef[[2]], 5),
" P =",signif(summary(fit1)$coef[2,4], 5)))
Here is another option: instead of adding the statistics to the title, you could add a label to the plot:
library (ggplot2)
fit1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
ggplot(fit1$model, aes_string(x = names(fit1$model)[2], y = names(fit1$model)[1])) +
geom_point() +
stat_smooth(method = "lm", col = "red") +
geom_label(aes(x = 0, y = 7.5), hjust = 0,
label = paste("Adj R2 = ",signif(summary(fit1)$adj.r.squared, 5),
"\nIntercept =",signif(fit1$coef[[1]],5 ),
" \nSlope =",signif(fit1$coef[[2]], 5),
" \nP =",signif(summary(fit1)$coef[2,4], 5)))
ggplot(data = wheatX,
aes(x = No.of.species,
y = Weight.of.weed,
color = Treatment)) +
geom_point(shape = 1) +
scale_colour_hue(l = 50) +
geom_smooth(method = glm,
se = FALSE)
This draws a straight line.
But the number of species will decrease at some point, so I want the line to curve. How can I do that? Thanks.
This is going to depend on what you mean by "smooth".
One thing you can do is apply a loess curve. Note that the formula is written in terms of the x and y aesthetics, not the column names:
ggplot() + ... + stat_smooth(method = "loess", formula = y ~ x, size = 1)
Or you can manually build a polynomial model using the regular lm method:
ggplot() + ... + stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1)
You'll need to decide on the exact model to use in the second case, which is what I meant by asking how you define "smooth".
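A runnable sketch of the polynomial option, using simulated data since wheatX is not available. The data frame wheat_sim and its columns are hypothetical stand-ins, and the quadratic is only one candidate model.

```r
library(ggplot2)

# Simulated stand-in data (hypothetical): weed weight rises, then falls.
set.seed(42)
wheat_sim <- data.frame(n_species = rep(1:10, times = 6))
wheat_sim$weed_weight <- 5 + 4 * wheat_sim$n_species -
  0.4 * wheat_sim$n_species^2 + rnorm(60, sd = 1)

p_quad <- ggplot(wheat_sim, aes(x = n_species, y = weed_weight)) +
  geom_point(shape = 1) +
  # The formula refers to the aesthetics x and y, not to column names.
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), se = FALSE)
print(p_quad)
```

Swapping the formula for y ~ poly(x, 2) should give the same curve, and higher degrees can be tried the same way.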