I have a dataset of bacterial carriage by age group. I used the bs() spline function to model bacterial carriage by age.
library(splines)
fit2 <- lm(percentage ~ bs(Age_midpoint, knots = c(0.177, 1, 1.460, 2.585, 5, 8.75, 37.45)), data = Complete_data)
fit2
age <- range(Complete_data$Age_midpoint)
Age_grid <- seq(from = min(age), to = max(age))
pred2 <- predict(fit2, newdata = list(Age_midpoint = Age_grid), se.fit = TRUE)
se_bands <- with(pred2, cbind("upper" = fit + 2*se.fit, "lower" = fit - 2*se.fit))
I now want to plot the line that describes fit2 in the following plot:
Fig1 <- ggplot(Complete_data, aes(x= Age_midpoint, y = percentage)) + geom_point(shape=1, aes( size = denominator)) + xlab("Average age") + ylab("Pneumococcal Carriage") + theme_light()
I am able to plot the line by itself using this:
ggplot() + geom_line(aes(x = Age_grid, y = pred2$fit), color = "red") +
geom_ribbon(aes(x = Age_grid, ymin = se_bands[, "lower"], ymax = se_bands[, "upper"]), alpha = 0.3) + xlim(age)
But when I try to add the line to the scatterplot, I get an error:
Fig1 + geom_line(aes(x = Age_grid, y = pred2$fit), color = "red") +
geom_ribbon(aes(x = Age_grid, ymin = se_bands[, "lower"], ymax = se_bands[, "upper"]), alpha = 0.3) + xlim(age)
Error in geom_line(aes(x = Age_midpoint.grid, y = pred2$fit), color = "red") :
ℹ Error occurred in the 2nd layer.
Caused by error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (403)
✖ Fix the following mappings: `x` and `y`
How can I add the line to my scatterplot?
I looked online for a solution, but I only found information about histograms.
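A sketch of one fix: the new layers inherit Complete_data (403 rows) from Fig1, while the prediction vectors have the grid's length, which is what the "length 1 or the same as the data (403)" error complains about. Putting the grid and its predictions into one data frame and passing it to each fitted-line layer with data = avoids the mismatch. Complete_data below is simulated, hypothetical stand-in data (the real data are not shown), pred_df and Fig2 are illustrative names, and the ±2 SE band simply mirrors the question's se_bands.

```r
library(splines)

# Hypothetical stand-in for Complete_data: 403 age groups, ages spread on a
# log scale so that every knot interval contains data
set.seed(1)
Complete_data <- data.frame(Age_midpoint = exp(seq(log(0.05), log(80), length.out = 403)))
Complete_data$denominator <- sample(20:200, 403, replace = TRUE)
Complete_data$percentage <- 5 + 50 * exp(-Complete_data$Age_midpoint / 5) +
  rnorm(403, sd = 3)

fit2 <- lm(percentage ~ bs(Age_midpoint, knots = c(0.177, 1, 1.460, 2.585, 5, 8.75, 37.45)),
           data = Complete_data)

# Grid and predictions live in ONE data frame, so the line and ribbon layers
# can be given their own data instead of inheriting Complete_data
age <- range(Complete_data$Age_midpoint)
pred_df <- data.frame(Age_midpoint = seq(age[1], age[2], length.out = 200))
pred2 <- predict(fit2, newdata = pred_df, se.fit = TRUE)
pred_df$fit <- pred2$fit
pred_df$lower <- pred2$fit - 2 * pred2$se.fit
pred_df$upper <- pred2$fit + 2 * pred2$se.fit

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  Fig1 <- ggplot(Complete_data, aes(x = Age_midpoint, y = percentage)) +
    geom_point(shape = 1, aes(size = denominator)) +
    xlab("Average age") + ylab("Pneumococcal Carriage") + theme_light()
  # data = pred_df replaces the plot's data for these two layers only;
  # inherit.aes = FALSE stops the ribbon looking for `percentage` in pred_df
  Fig2 <- Fig1 +
    geom_ribbon(data = pred_df, aes(x = Age_midpoint, ymin = lower, ymax = upper),
                inherit.aes = FALSE, alpha = 0.3) +
    geom_line(data = pred_df, aes(x = Age_midpoint, y = fit), color = "red")
}
```

Printing Fig2 should draw the points, the red spline and its band together. The 200-point grid replaces seq()'s default step of 1 so the curve stays smooth at young ages, where most of the knots sit.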
I have a data.frame with observed success/failure outcomes for two groups, along with expected probabilities:
library(dplyr)
observed.probability.df <- data.frame(group = c("A","B"), p = c(0.4,0.6))
expected.probability.df <- data.frame(group = c("A","B"), p = qlogis(c(0.45,0.55)))
observed.data.df <- do.call(rbind,lapply(c("A","B"), function(g)
data.frame(group = g, value = c(rep(0,1000*dplyr::filter(observed.probability.df, group != g)$p),rep(1,1000*dplyr::filter(observed.probability.df, group == g)$p)))
)) %>% dplyr::left_join(expected.probability.df)
observed.probability.df$group <- factor(observed.probability.df$group, levels = c("A","B"))
observed.data.df$group <- factor(observed.data.df$group, levels = c("A","B"))
I'm fitting a logistic regression (binomial glm with a logit link function) to these data with the offset term:
fit <- glm(value ~ group + offset(p), data = observed.data.df, family = binomial(link = 'logit'))
Now I'd like to plot these data as a bar graph using ggplot2's geom_bar, color-coded by group, and add the trend line and shaded standard-error area estimated by fit.
I'd use stat_smooth for that, but I don't think it can handle the offset term in its formula, so it looks like I need to assemble this figure another way.
To get the bars and the trend line I used:
slope.est <- function(x, ests) plogis(ests[1] + ests[2] * x)
library(ggplot2)
ggplot(observed.probability.df, aes(x = group, y = p, fill = group)) +
geom_bar(stat = 'identity') +
stat_function(fun = slope.est,args=list(ests=coef(fit)),size=2,color="black") +
scale_x_discrete(name = NULL,labels = levels(observed.probability.df$group), breaks = sort(unique(observed.probability.df$group))) +
theme_minimal() + theme(legend.title = element_blank()) + ylab("Fraction of cells")
So the question is: how do I add the shaded standard error around the trend line?
Using stat_function I am able to shade the entire area from the upper bound of the standard error all the way down to the X-axis:
ggplot(observed.probability.df, aes(x = group, y = p, fill = group)) +
geom_bar(stat = 'identity') +
stat_function(fun = slope.est,args=list(ests=coef(fit)),size=2,color="black") +
stat_function(fun = slope.est,args=list(ests=summary(fit)$coefficients[,1]+summary(fit)$coefficients[,2]),geom='area',fill="gray",alpha=0.25) +
scale_x_discrete(name = NULL,labels = levels(observed.probability.df$group), breaks = sort(unique(observed.probability.df$group))) +
theme_minimal() + theme(legend.title = element_blank()) + ylab("Fraction of cells")
Which is close but not quite there.
Any idea how to remove the part of the shaded area that lies below the lower bound of the standard error? Perhaps geom_ribbon is the way to go here, but I don't know how to combine it with the slope.est function.
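A sketch of the geom_ribbon route, under stated assumptions: compute the standard error of the linear predictor b0 + b1 * x on the logit scale with the delta method (using vcov(fit)), put the lower and upper bounds for a grid of x values into a data frame, back-transform them with plogis, and pass that data frame to geom_ribbon. It reproduces the curve exactly as slope.est defines it, which treats the groupB coefficient as a slope across the discrete positions 1 and 2 and ignores the offset; that reading of the model, the 1.96 multiplier, and the names band and se_eta are assumptions of this sketch. The data are rebuilt in base R with the same counts as the question's dplyr code.

```r
# Data and model rebuilt in base R (same counts as the question's code)
observed.probability.df <- data.frame(group = factor(c("A", "B")), p = c(0.4, 0.6))
expected.probability.df <- data.frame(group = c("A", "B"), p = qlogis(c(0.45, 0.55)))
observed.data.df <- data.frame(
  group = factor(rep(c("A", "B"), each = 1000)),
  value = c(rep(0:1, times = c(600, 400)),   # group A: 40% successes
            rep(0:1, times = c(400, 600)))   # group B: 60% successes
)
observed.data.df$p <- expected.probability.df$p[
  match(observed.data.df$group, expected.probability.df$group)]
fit <- glm(value ~ group + offset(p), data = observed.data.df,
           family = binomial(link = "logit"))

# Assumption: the curve is slope.est's plogis(b0 + b1 * x) on x in [1, 2]
b <- coef(fit)
V <- vcov(fit)
band <- data.frame(x = seq(1, 2, length.out = 100))
X <- cbind(1, band$x)                      # design rows (1, x)
eta <- drop(X %*% b)                       # linear predictor (logit scale)
se_eta <- sqrt(rowSums((X %*% V) * X))     # sqrt(diag(X V X')), delta method
band$fit <- plogis(eta)
band$lower <- plogis(eta - 1.96 * se_eta)  # transform the bounds, not the SE
band$upper <- plogis(eta + 1.96 * se_eta)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(observed.probability.df, aes(x = group, y = p, fill = group)) +
    geom_bar(stat = "identity") +
    # inherit.aes = FALSE: band has no `group` or `p` columns to inherit
    geom_ribbon(data = band, aes(x = x, ymin = lower, ymax = upper),
                inherit.aes = FALSE, fill = "gray", alpha = 0.25) +
    # linewidth needs ggplot2 >= 3.4; older versions use size = 2
    geom_line(data = band, aes(x = x, y = fit), inherit.aes = FALSE, linewidth = 2) +
    theme_minimal() + theme(legend.title = element_blank()) + ylab("Fraction of cells")
}
```

Computing the interval on the logit scale and only then applying plogis keeps the band inside (0, 1) and, unlike the geom = 'area' attempt, bounds it from below as well as above.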
I have produced a glm interaction plot using ggplot2. I have attached the code I used and the plot.
I know that the grey shaded areas represent the 95% confidence interval, but is there a way to get the exact values of the grey shaded areas, and therefore of the 95% confidence interval?
# bind data together
Modern_EarlyHolocene<-rbind(FladenF30, FladenB30, Early_Holocene)
# build modern vs Holocene model (a separate name keeps the data frame intact for ggplot)
Modern_EarlyHolocene_model <- glm(Max_Height ~ Age + Time_period, data = Modern_EarlyHolocene, family = gaussian)
#Produce gg interaction plot
Modern_EarlyHolocene_plot<-ggplot(data=Modern_EarlyHolocene) +
aes(x = Age, y = Max_Height, group = Time_period, color = Time_period) +
geom_point( alpha = .7) +
stat_smooth(method = "glm", level=0.95) +
expand_limits(y=c(0,90), x=c(0,250))
#add axis labels
Modern_EarlyHolocene_plot + labs(x = "Age (years)", y = 'Maximum height (mm)') +
theme(legend.text = element_text(size = 14, colour = "Black"),
legend.title=element_blank()) +
theme(axis.text=element_text(size=14),
axis.title=element_text(size=16,face="bold"))
You can access the plot data with layer_data(Modern_EarlyHolocene_plot, i), where i is the index of the layer to return, in the order the layers were added to the plot.
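A sketch of that on the built-in iris data, since the asker's data are not available (ci_band is an illustrative name). The smooth is the second layer, so layer_data(g, 2) returns one row per evaluation point and colour group, with the edges of the grey band in ymin and ymax. For method = "glm", these should be the fit plus and minus roughly 1.96 standard errors, computed separately for each group.

```r
ci_band <- NULL
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  g <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
    geom_point(alpha = 0.7) +                                   # layer 1
    stat_smooth(method = "glm", formula = y ~ x, level = 0.95)  # layer 2
  # Build the plot and keep the smooth layer's computed columns:
  # y is the fitted line, ymin/ymax are the edges of the grey band
  ci_band <- layer_data(g, 2)[, c("group", "x", "y", "ymin", "ymax")]
  print(head(ci_band))
}
```

With three species and stat_smooth's default of 80 evaluation points, ci_band should have 240 rows.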
You are effectively fitting a different regression line for each Time_period, so your glm has to include an interaction term. It should be:
Modern_EarlyHolocene_model <- glm(Max_Height ~ Age * Time_period, data = Modern_EarlyHolocene)
I do not have your data, so see below for an example with iris:
library(ggplot2)
fit = glm(Sepal.Width ~ Sepal.Length * Species, data = iris)
g1 = ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
geom_point(alpha = .7) + stat_smooth(method = "glm", level = 0.95)
To get the standard errors of the predictions:
pred = predict(fit, iris, se.fit = TRUE)
df_pred = data.frame(iris, pred = pred$fit, se = pred$se.fit)
We can plot this; the lower and upper bounds of the 95% confidence band are the prediction minus and plus 1.96 times its standard error:
g2 = ggplot(df_pred,aes(x=Sepal.Length,y=Sepal.Width,color=Species)) +
geom_point( alpha = .7) +
geom_ribbon(aes(ymin=pred-1.96*se,ymax=pred+1.96*se,fill=Species),alpha=0.1)
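A quick base-R check of the point about the interaction term, on iris (fit_int, fit_sep and the other names are illustrative). The interaction model's fitted values should match separate straight-line fits per species, which is what a colour-grouped stat_smooth draws, while the additive model from the question should not. The standard errors are a different matter: the interaction model pools one residual variance across species, whereas each per-group fit uses its own, so the ribbon from predict() can be slightly wider or narrower than stat_smooth's grey band.

```r
# Interaction model from the answer
fit_int <- glm(Sepal.Width ~ Sepal.Length * Species, data = iris)

# One straight line per species, fitted separately (as grouped stat_smooth does)
per_species <- lapply(split(iris, iris$Species),
                      function(d) glm(Sepal.Width ~ Sepal.Length, data = d))
fit_sep <- unsplit(lapply(per_species, fitted), iris$Species)

# Additive model from the question: one slope shared by all species
fit_add <- glm(Sepal.Width ~ Sepal.Length + Species, data = iris)

max_diff_int <- max(abs(fitted(fit_int) - fit_sep))  # essentially zero
max_diff_add <- max(abs(fitted(fit_add) - fit_sep))  # clearly non-zero
print(c(interaction = max_diff_int, additive = max_diff_add))

# Standard errors: pooled residual variance vs. per-species residual variance
se_int <- predict(fit_int, iris, se.fit = TRUE)$se.fit
se_sep <- unsplit(lapply(per_species, function(m) predict(m, se.fit = TRUE)$se.fit),
                  iris$Species)
print(summary(se_int / se_sep))
```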
I have a question about ggplot2.
I want to connect each data point to the OLS fit with a vertical line, as in the code below.
Can I pass ..y.., the value computed by stat_smooth, to geom_linerange directly?
I tried stat_smooth(..., geom = "linerange", mapping = aes(ymin = pmin(myy, ..y..), ymax = pmax(myy, ..y..))), but it does not give the result I want.
library(ggplot2)
df <- data.frame(myx = 1:10,
myy = c(1:10) * 5 + 2 * rnorm(10, 0, 1))
lm.fit <- lm(myy ~ myx, data = df)
pred <- predict(lm.fit)
ggplot(df, aes(myx, myy)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
geom_linerange(mapping = aes(ymin = pmin(myy, pred),
ymax = pmax(myy, pred)))
stat_smooth evaluates the values at n evenly spaced points, with n = 80 by default. These points may not coincide with the original x values in your data frame.
Since you are calculating predicted values anyway, it is probably more straightforward to add them back to your data frame and use that as the data source for all geom layers, for example:
df$pred <- pred
ggplot(df, aes(myx, myy)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
geom_linerange(aes(ymin = myy, ymax = pred))
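To see the evaluation grid that stat_smooth uses, a sketch that pulls the smooth layer's computed data with ggplot2's layer_data() (the seed and the smooth_pts name are additions for illustration). The x values come from an evenly spaced grid over the data range rather than the original 1, 2, ..., 10, which is why ..y.. cannot be paired row by row with myy.

```r
set.seed(42)  # added so the simulated points are reproducible
df <- data.frame(myx = 1:10, myy = (1:10) * 5 + 2 * rnorm(10, 0, 1))

smooth_pts <- NULL
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(myx, myy)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
  smooth_pts <- layer_data(p, 2)  # data computed by the smooth layer
  print(nrow(smooth_pts))         # number of evaluation points (n)
  print(head(smooth_pts$x))       # evenly spaced, not the original myx values
}
```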
I've tried searching for a solution to this seemingly easy problem, but to no avail. All I'm trying to do is plot a line in ggplot and its standard deviation around the line. However, I keep getting this error:
Error: Discrete value supplied to continuous scale
My data frame plotdata is as follows:
sites Spoly Spolylower Spolyupper
526.790 0.03018671 0.1196077 0.1196077
1538.512 0.04106053 0.1429613 0.1429613
2540.500 0.02896953 0.1127456 0.1127456
3541.000 0.03560484 0.1200609 0.1200609
4560.143 0.06038193 0.1564464 0.1564464
5569.831 0.03608714 0.1296704 0.1296704
I can plot just the line perfectly fine:
ggplot(data = plotdata, aes(x = "Sites", y = "Mean Values")) +
geom_line(aes(x = sites, y = Spoly), color = "steelblue")
But when I try to add the ribbon, I get the error:
ggplot(data = plotdata, aes(x = "Sites", y = "Mean Values")) +
geom_line(aes(x = sites, y = Spoly), color = "steelblue") +
geom_ribbon(aes(x = sites, ymin = Spolylower, ymax = Spolyupper), alpha = 0.3)
Error: Discrete value supplied to continuous scale
What is going on? What am I doing wrong here?
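The error comes from the global mapping aes(x = "Sites", y = "Mean Values"). Quoted strings inside aes() are mapped as constant, discrete values; they are not used as axis titles. geom_line() overrides both x and y, so the line on its own works, but geom_ribbon() only overrides x and still inherits y = "Mean Values", a discrete value on the continuous y scale. Map the real columns in ggplot() and set axis titles with labs() or xlab()/ylab() instead. One option is: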
library(ggplot2)
library(cowplot)
data <- "
sites Spoly Spolylower Spolyupper
526.790 0.03018671 0.1196077 0.1196077
1538.512 0.04106053 0.1429613 0.1429613
2540.500 0.02896953 0.1127456 0.1127456
3541.000 0.03560484 0.1200609 0.1200609
4560.143 0.06038193 0.1564464 0.1564464
5569.831 0.03608714 0.1296704 0.1296704
"
dat <- read.table(text = data, header = TRUE)
# shift Spolylower down (in the posted data Spolylower equals Spolyupper)
dat$Spolylower <- dat$Spolylower - .2
ggplot(data = dat, aes(x = sites, y = Spoly)) +
geom_line(color = "steelblue") +
geom_ribbon(aes(ymin = Spolylower, ymax = Spolyupper), alpha = 0.3) +
theme_cowplot()
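A sketch that reproduces the problem and the fix side by side on the posted data (bad, good and build_error are illustrative names). The error is only raised when the plot is built or printed, so ggplot_build() is wrapped in tryCatch() to capture it.

```r
plotdata <- read.table(header = TRUE, text = "
sites    Spoly      Spolylower Spolyupper
526.790  0.03018671 0.1196077  0.1196077
1538.512 0.04106053 0.1429613  0.1429613
2540.500 0.02896953 0.1127456  0.1127456
3541.000 0.03560484 0.1200609  0.1200609
4560.143 0.06038193 0.1564464  0.1564464
5569.831 0.03608714 0.1296704  0.1296704
")

build_error <- NA_character_
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Question's version: constant strings mapped globally, inherited by the ribbon
  bad <- ggplot(plotdata, aes(x = "Sites", y = "Mean Values")) +
    geom_line(aes(x = sites, y = Spoly), color = "steelblue") +
    geom_ribbon(aes(x = sites, ymin = Spolylower, ymax = Spolyupper), alpha = 0.3)
  build_error <- tryCatch({ ggplot_build(bad); "no error" },
                          error = function(e) conditionMessage(e))
  print(build_error)
  # Columns mapped in ggplot(), axis titles set with labs()
  good <- ggplot(plotdata, aes(x = sites, y = Spoly)) +
    geom_line(color = "steelblue") +
    geom_ribbon(aes(ymin = Spolylower, ymax = Spolyupper), alpha = 0.3) +
    labs(x = "Sites", y = "Mean Values")
  invisible(ggplot_build(good))
}
```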
I think you should try this:
ggplot(data = plotdata, aes(x = sites, y = Spoly)) +
geom_line(color = "steelblue") +
geom_ribbon(aes(ymin = Spolylower, ymax = Spolyupper), fill = "dimgray", alpha = 0.1) +
labs(x = "Sites", y = "Mean Values")
Let me know if it works.
I have created a linear model and a regression plot. However, I would like to show the model results on the plot itself, something like the image below:
How do I show the key results on the plot? Below is my code for the plot:
library(ggplot2)
ggplot(HP_crime15, aes(x = as.numeric(Theft15), y =
as.numeric(X2015))) + geom_point(shape = 1) +
geom_smooth(method = "lm") + xlab("Recorded number of Thefts") +
ylab("House prices (£)") + ggtitle("Title")
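Since the HP_crime15 data are not shown, here is a sketch on the built-in iris data that formats the key statistics once, so the same string can go into a title, a caption or a single annotation. lm_label is a hypothetical helper, not part of any package; it reads the adjusted R², the intercept, the slope and the slope's p-value from summary().

```r
# Hypothetical helper: format the key lm() statistics as one multi-line string
lm_label <- function(fit, digits = 3) {
  s <- summary(fit)
  cf <- coef(s)  # matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
  sprintf("Adj R2 = %s\nIntercept = %s\nSlope = %s\nP = %s",
          signif(s$adj.r.squared, digits),
          signif(cf[1, "Estimate"], digits),
          signif(cf[2, "Estimate"], digits),
          signif(cf[2, "Pr(>|t|)"], digits))
}

fit1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
lbl <- lm_label(fit1)
cat(lbl, "\n")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(iris, aes(x = Petal.Width, y = Sepal.Length)) +
    geom_point(shape = 1) +
    geom_smooth(method = "lm", formula = y ~ x) +
    # annotate() adds one label, anchored at the top-left of the data range
    annotate("label", x = min(iris$Petal.Width), y = max(iris$Sepal.Length),
             label = lbl, hjust = 0, vjust = 1)
}
```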
Ideally, a good question includes a reproducible example. Anyway, I approached this problem in two steps.
Step 1: Determine the linear regression model:
fit1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
Step 2: Plot the model:
library(ggplot2)
ggplot(fit1$model, aes(x = .data[[names(fit1$model)[2]]], y = .data[[names(fit1$model)[1]]])) +
geom_point() +
stat_smooth(method = "lm", col = "red") +
labs(title = paste("Adj R2 =", signif(summary(fit1)$adj.r.squared, 5),
"Intercept =", signif(coef(fit1)[[1]], 5),
" Slope =", signif(coef(fit1)[[2]], 5),
" P =", signif(summary(fit1)$coefficients[2, 4], 5)))
Here is another option: instead of adding the statistics to the title, you could add a label to the plot:
library(ggplot2)
fit1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
ggplot(fit1$model, aes(x = .data[[names(fit1$model)[2]]], y = .data[[names(fit1$model)[1]]])) +
geom_point() +
stat_smooth(method = "lm", col = "red") +
# annotate() draws one label; geom_label() with constant aes() would draw one per data row
annotate("label", x = 0, y = 7.5, hjust = 0,
label = paste("Adj R2 =", signif(summary(fit1)$adj.r.squared, 5),
"\nIntercept =", signif(coef(fit1)[[1]], 5),
"\nSlope =", signif(coef(fit1)[[2]], 5),
"\nP =", signif(summary(fit1)$coefficients[2, 4], 5)))