How to adjust the position of regression equation on ggplot? - r

I would like to add the regression line and R^2 to my ggplot. I am fitting a regression line to each category, so each category gets its own equation. I'd like to set the position of each category's equation manually, i.e. find the maximum value of y for each group and print the equation at ymax + 1.
Here is my code:
library(dplyr)   # %>%, group_by(), mutate()
library(broom)   # tidy()
library(ggpmisc) # stat_poly_eq()
df <- data.frame(x = c(1:100))
df$y <- 20 * c(0, 1) + 3 * df$x + rnorm(100, sd = 40)
df$group <- factor(rep(c("A", "B"), 50))
df <- df %>% group_by(group) %>% mutate(ymax = max(y))
my.formula <- y ~ x
df %>%
group_by(group) %>%
do(tidy(lm(y ~ x, data = .)))
p <- ggplot(data = df, aes(x = x, y = y, colour = group)) +
geom_smooth(method = "lm", se=FALSE, formula = my.formula) +
stat_poly_eq(formula = my.formula,
aes(x = x , y = ymax + 1, label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
parse = TRUE) +
geom_point()
p
Any suggestion how to do this?
Also is there any way I can only print the slope of the equation. (remove the intercept from plot)?
Thanks,

I'm pretty sure that adjusting stat_poly_eq() with the geom argument will get you what you want. Doing so centers the equations, which leaves the left half of each one clipped, so we use hjust = 0 to left-align them. Finally, depending on your specific data, the equations may overlap each other, so we use the position argument to have ggplot attempt to separate them.
This adjusted call should get you started, I hope:
p <- ggplot(data = df, aes(x = x, y = y, colour = group)) +
geom_smooth(method = "lm", se=FALSE, formula = my.formula) +
stat_poly_eq(
formula = my.formula,
geom = "text", # or 'label'
hjust = 0, # left-adjust equations
position = position_dodge(), # in case equations now overlap
aes(x = x , y = ymax + 1, label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
parse = TRUE) +
geom_point()
p
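If positioning through stat_poly_eq() proves awkward, one fallback is to fit the models yourself and place the text with geom_text(). The following is only a sketch of that alternative, not what ggpmisc does internally. The names labels_df and the set.seed() call are my own additions. It prints the slope and R^2 without the intercept, at each group's maximum y plus 1.

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(x = 1:100)
df$y <- 20 * c(0, 1) + 3 * df$x + rnorm(100, sd = 40)
df$group <- factor(rep(c("A", "B"), 50))

# One row per group: label text and the position where it should be drawn
labels_df <- do.call(rbind, lapply(split(df, df$group), function(d) {
  fit <- lm(y ~ x, data = d)
  data.frame(
    group = d$group[1],
    x     = min(d$x),      # left edge of the data
    y     = max(d$y) + 1,  # group maximum plus 1, as asked in the question
    label = sprintf("slope = %.2f, R^2 = %.2f",
                    coef(fit)[["x"]], summary(fit)$r.squared)
  )
}))

p <- ggplot(df, aes(x = x, y = y, colour = group)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +
  geom_text(data = labels_df, aes(label = label),
            hjust = 0, vjust = 0, show.legend = FALSE)
p
```

Because the labels live in an ordinary data frame, their x and y columns can be edited by hand for any group before plotting.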

Related

How to plot a single regression line but colour points by a different factor in ggplot2 R?

The scatterplot is colour-coded by factor z. By default, ggplot2 also plots the regression lines by factor. I want to plot a single regression line passing through all the data. How do I achieve this?
x <- c(1:50)
y <- rnorm(50,4,1)
z <- rep(c("P1", "P2"), each = 25)
df <- data.frame(x,y,z)
my.formula = y ~ x
ggplot(aes(x = x, y = y, color = z), data = df) +
geom_point() + scale_fill_manual(values=c("purple", "blue")) +
geom_smooth(method="lm", formula = y ~ x ) +
stat_poly_eq(formula = my.formula, aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), parse = TRUE, size = 2.5, col = "black")+
theme_classic()
If I understand you correctly, you can assign group = 1 in the aes to plot just one regression line. You can use the following code:
library(tidyverse)
library(ggpmisc)
my.formula = y ~ x
ggplot(aes(x = x, y = y, color = z, group = 1), data = df) +
geom_point() + scale_colour_manual(values=c("purple", "blue")) + # colour scale, since z is mapped to colour, not fill
geom_smooth(method="lm", formula = y ~ x ) +
stat_poly_eq(formula = my.formula, aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), parse = TRUE, size = 2.5, col = "black")+
theme_classic()
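A variation on the same idea, offered as a sketch rather than as the answerer's method: map colour only inside geom_point(), so that geom_smooth() never sees the grouping variable and fits one line through all points. The set.seed() call and the black line colour are my own choices.

```r
library(ggplot2)

set.seed(42)  # assumption: seed added only for reproducibility
df <- data.frame(
  x = 1:50,
  y = rnorm(50, 4, 1),
  z = rep(c("P1", "P2"), each = 25)
)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point(aes(colour = z)) +                        # colour applies to points only
  scale_colour_manual(values = c("purple", "blue")) +
  geom_smooth(method = "lm", formula = y ~ x,          # no grouping here: one fit
              colour = "black") +
  theme_classic()
p
```

Keeping the colour mapping local to the point layer avoids having to override the grouping in every other layer.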

How to display different y labels in the equations using stat_poly_eq of ggpmisc

I'm trying to display the equations on the plot using the stat_poly_eq function of ggpmisc.
My problem is how to replace the y = ... in each equation with y1 = ... and y2 = ..., according to the key variable.
I tried to add the eq.with.lhs argument inside the mapping, but it is not recognized there.
I tried to pass a vector to the eq.with.lhs argument, but then both elements were printed on top of each other in every equation...
Do you have a better idea?
As a last resort, I could use geom_text after calculating the equation coefficients myself, but that seems a less efficient way to solve the problem.
Here is a reprex of my problem.
data <- data.frame(x = rnorm(20)) %>%
mutate(y1 = 1.2*x + rnorm(20, sd=0.2),
y2 = 0.9*x + rnorm(20, sd=0.3)) %>%
gather(value = value, key = key, -x)
ggplot(data, aes(x = x, y = value)) +
geom_point(aes(shape = key, colour = key)) +
stat_poly_eq(aes(label = ..eq.label.., colour = key),
formula = y ~ poly(x, 1, raw = TRUE),
eq.x.rhs = "x",
# eq.with.lhs = c(paste0(expression(y[1]), "~`=`~"),
# paste0(expression(y[2]), "~`=`~")),
eq.with.lhs = paste0(expression(y[ind]), "~`=`~"),
parse = TRUE) +
ylab(NULL)
I'm not really sure if it's possible to do it through ggpmisc, but you can change the data once the plot is built, like so:
library(tidyverse)
library(ggpmisc)
data <- data.frame(x = rnorm(20)) %>%
mutate(y1 = 1.2*x + rnorm(20, sd=0.2),
y2 = 0.9*x + rnorm(20, sd=0.3)) %>%
gather(value = value, key = key, -x)
p <- ggplot(data, aes(x = x, y = value)) +
geom_point(aes(shape = key, colour = key)) +
stat_poly_eq(aes(label = ..eq.label.., colour = key),
formula = y ~ poly(x, 1, raw = TRUE),
eq.x.rhs = "x",
eq.with.lhs = paste0(expression(y), "~`=`~"),
parse = TRUE) +
ylab(NULL)
temp <- ggplot_build(p)
temp$data[[2]]$label <- temp$data[[2]]$label %>%
fct_relabel(~ str_replace(.x, "y", paste0("y[", 1:2, "]")))
grid::grid.newpage()
grid::grid.draw(ggplot_gtable(temp))
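The geom_text fallback mentioned in the question can be sketched as follows. This is an illustration under my own assumptions, not a ggpmisc feature: eq_df, the set.seed() call, and the vertical offset of 0.4 are made up for the example, and the labels are written in plotmath syntax so that y[1] and y[2] render with subscripts.

```r
library(ggplot2)

set.seed(123)  # assumption: seed added only for reproducibility
x <- rnorm(20)
data <- rbind(
  data.frame(x = x, key = "y1", value = 1.2 * x + rnorm(20, sd = 0.2)),
  data.frame(x = x, key = "y2", value = 0.9 * x + rnorm(20, sd = 0.3))
)

# One plotmath label per key, e.g. "y[1] == 1.20 * x + 0.03"
eq_df <- do.call(rbind, lapply(split(data, data$key), function(d) {
  cf  <- coef(lm(value ~ x, data = d))
  lhs <- sub("^y([0-9]+)$", "y[\\1]", d$key[1])  # "y1" becomes "y[1]"
  data.frame(
    key   = d$key[1],
    label = sprintf("%s == %.2f * x %s %.2f", lhs, cf[["x"]],
                    if (cf[["(Intercept)"]] < 0) "-" else "+",
                    abs(cf[["(Intercept)"]]))
  )
}))
eq_df$x     <- min(data$x)
eq_df$value <- max(data$value) - c(0, 0.4)  # stagger the two labels vertically

p <- ggplot(data, aes(x = x, y = value, colour = key)) +
  geom_point(aes(shape = key)) +
  geom_text(data = eq_df, aes(label = label),
            parse = TRUE, hjust = 0, show.legend = FALSE) +
  ylab(NULL)
p
```

This costs a few more lines than stat_poly_eq(), but it does not depend on the internal layout of the built plot data.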

Can I mimic facet_wrap() with 5 completely separate ggplots?

I like the neatness of using facet_wrap() or facet_grid() with ggplot since the plots are all made to be the same size and are fitted row and column wise automatically.
I have a data frame and I am experimenting with various transformations and their impact on fit, as measured by R^2:
dm1 <- lm(price ~ x, data = diamonds)
dm1R2 <- summary(dm1)$r.squared #0.78
dm2 <- lm(log(price) ~ x, data = diamonds)
dm2R2 <- summary(dm2)$r.squared # 0.9177831
dm3 <- lm(log(price) ~ x^2, data = diamonds)
dm3R2 <- summary(dm3)$r.squared # also 0.9177831. Aside, why?
ggplot(diamonds, aes(x = x, y = price)) +
geom_point() +
geom_smooth(method = "lm", se = F) +
geom_text(x = 3.5, y = 10000, label = paste0('R-Squared: ', round(dm1R2, 3)))
ggplot(diamonds, aes(x = x, y = log(price))) +
geom_point() +
geom_smooth(method = "lm", se = F) +
geom_text(x = 3, y = 9, label = paste0('R-Squared: ', round(dm2R2, 3)))
ggplot(diamonds, aes(x = x^2, y = log(price))) +
geom_point() +
geom_smooth(method = "lm", se = F) +
geom_text(x = 3, y = 20, label = paste0('R-Squared: ', round(dm3R2, 3)))
This produces 3 completely separate plots. Within an Rmd file they will appear one after the other.
Is there a way to add them to a grid like when using facet_wrap?
You can use ggplot2's built-in faceting if you generate a "long" data frame from the regression model objects. The model object returned by lm includes the data used to fit the model, so we can extract the data and the r-squared for each model, stack them into a single data frame, and generate a faceted plot.
The disadvantage of this approach is that you lose the ability to easily set separate x-axis and y-axis titles for each panel, which is important, because the x and y values have different transformations in different panels. In an effort to mitigate that problem, I've used the model formulas as the facet labels.
Also, the reason you got the same r-squared for the models specified by log(price) ~ x and log(price) ~ x^2 is that R treats them as the same model. To tell R that you literally mean x^2 in a model formula, you need to wrap it in the I() function, making the formula log(price) ~ I(x^2). You could also do log(price) ~ poly(x, 2, raw=TRUE).
library(tidyverse)
theme_set(theme_bw(base_size=14))
# Generate a small subset of the diamonds data frame
set.seed(2)
dsub = diamonds[sample(1:nrow(diamonds), 2000), ]
dm1 <- lm(price ~ x, data = dsub)
dm2 <- lm(log(price) ~ x, data = dsub)
dm3 <- lm(log(price) ~ I(x^2), data = dsub)
# Create long data frame from the three model objects
dat = list(dm1, dm2, dm3) %>%
map_df(function(m) {
tibble(r2=summary(m)$r.squared,
form=as_label(formula(m))) %>%
cbind(m[["model"]] %>% set_names(c("price","x")))
}, .id="Model") %>%
mutate(form=factor(form, levels=unique(form)))
# Create data subset for geom_text
text.dat = dat %>% group_by(form) %>%
summarise(x = quantile(x, 1),
price = quantile(price, 0.05),
r2=r2[1])
dat %>%
ggplot(aes(x, price)) +
geom_point(alpha=0.3, colour="red") +
geom_smooth(method="lm") +
geom_text(data=text.dat, parse=TRUE,
aes(label=paste0("r^2 ==", round(r2, 2))),
hjust=1, size=3.5, colour="grey30") +
facet_wrap(~ form, scales="free")
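The point about x^2 versus I(x^2) can be checked directly. The short sketch below uses the full diamonds data rather than the subset above, and the object names fit_x, fit_pow, and fit_sq are my own.

```r
library(ggplot2)  # loaded only for the diamonds data set

fit_x   <- lm(log(price) ~ x,      data = diamonds)
fit_pow <- lm(log(price) ~ x^2,    data = diamonds)  # ^ is a formula operator here
fit_sq  <- lm(log(price) ~ I(x^2), data = diamonds)  # I() protects the arithmetic

all.equal(coef(fit_x), coef(fit_pow))  # TRUE: both formulas describe the same model
names(coef(fit_pow))                   # "(Intercept)" "x"
names(coef(fit_sq))                    # "(Intercept)" "I(x^2)"
```

Inside a formula, ^ means crossing terms to a given order, and crossing a single variable with itself just gives that variable back, which is why the first two fits are identical.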
ggarrange from the ggpubr package can do this:
p1 = ggplot(diamonds, aes(x = x, y = price)) +
geom_point() +
geom_smooth(method = "lm", se = F) +
geom_text(x = 3.5, y = 10000, label = paste0('R-Squared: ', round(dm1R2, 3)))
p2 = ggplot(diamonds, aes(x = x, y = log(price))) +
geom_point() +
geom_smooth(method = "lm", se = F) +
geom_text(x = 3, y = 9, label = paste0('R-Squared: ', round(dm2R2, 3)))
p3 = ggplot(diamonds, aes(x = x^2, y = log(price))) +
geom_point() +
geom_smooth(method = "lm", se = F) +
geom_text(x = 3, y = 20, label = paste0('R-Squared: ', round(dm3R2, 3)))
ggpubr::ggarrange(p1, p2, p3, ncol = 2, nrow = 2, align = "hv")
Other packages that have been suggested in the comments like cowplot and patchwork also offer good options for this.
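As a rough illustration of the patchwork route, here is a minimal sketch. The helper fit_plot() is hypothetical and only there to avoid repeating the layers, the 2000-row subset is my own choice to keep drawing fast, and the R^2 annotations are left out for brevity.

```r
library(ggplot2)
library(patchwork)

set.seed(2)  # assumption: seed and subset size chosen only for the example
dsub <- diamonds[sample(nrow(diamonds), 2000), ]

# Hypothetical helper: scatter plot with a linear fit for a given mapping
fit_plot <- function(mapping) {
  ggplot(dsub, mapping) +
    geom_point(alpha = 0.3) +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
}

p1 <- fit_plot(aes(x = x, y = price))
p2 <- fit_plot(aes(x = x, y = log(price)))
p3 <- fit_plot(aes(x = x^2, y = log(price)))

# Arrange the three independent plots in a two-column grid
combined <- wrap_plots(p1, p2, p3, ncol = 2)
combined
```

Unlike faceting, each panel keeps its own axis titles, which matters here because the panels use different transformations.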

Adding to ggplot mean and SD plot

I would like to ask for help.
I am trying to plot data together with their mean and SD values on one graph, but I am getting this error:
Error in eval(substitute(list(...)), _data, parent.frame()) :
object 'x' not found
First I divide the data into intervals and calculate the mean and SD of each interval using summarise. Then I try to plot the data points (that part works) and add the mean and SD values to the same graph (here I fail).
Please help me to resolve this issue.
UPD: OK, I think I should have used stat_summary on the ss data set; I just do not know how to do that at the moment. Any suggestions would be appreciated.
Here is my code:
#Data
s <- data.frame(L5=rnorm(1686, mean=0.3, sd=1.5),
GLDAS=rnorm(1686, mean=0.25, sd=0.8))
# 1) Divide data into 0.02 intervals
breaks = seq(from = 0, to = max(s$GLDAS)+0.02, by = 0.02) #intervals
s$group <- cut(s$GLDAS,
breaks = breaks,
labels = seq(from = 1, to = length(breaks)-1, by = 1),
#create label
right = FALSE)
#Assign labels to a value equal to the middle of the interval
pos <- seq(from = breaks[1]+0.02/2, to = max(breaks)-0.02/2, by = 0.02)
group <- seq(from = 1, to = length(breaks)-1, by = 1)
poss <- cbind.data.frame(pos,group)
ss <- merge(s, poss, by = "group")
#Calculate summary
Summary <- ss %>% #
group_by(pos) %>% # the grouping variable
summarise(mean = mean(L5), # calculates the mean of each group
sd = sd(L5), # calculates the standard deviation of each group
n = n(), # calculates the sample size per group
SE = sd(L5)/sqrt(n())) # calculates the standard error of each group
# 2) Plot data points
p2 <- ggplot()+
geom_point(data = s, aes(x = GLDAS, y = L5)) +
#geom_smooth(method = "lm", se=FALSE, color="black",
# formula = my.formula) +
stat_poly_eq(formula = my.formula, size = 4,
aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
parse = TRUE) + geom_point()+
geom_abline(intercept=0, slope=1)+
xlim (0,0.6) + ylim(0,0.6) + labs(x="GLDAS [mm/hr]", y="L5 [mm/hr]" ) +
theme(text = element_text(size=16))
# 3) Plot mean and SD values
p2 + geom_line(data = Summary, aes(x=pos, y=mean), color='blue') +
geom_point(data = Summary, aes(x=pos, y=mean), color='blue')+
geom_errorbar(data = Summary, aes(ymin=mean-sd, ymax=mean+sd), width=.01,
position=position_dodge(0.005), color='blue')
I think I've got it: I did not need to use Summary, because there is a built-in function for this, stat_summary().
my.formula <- y ~ x # assumed definition; the question never shows it
# Data and aesthetics are set globally so that stat_poly_eq() can find x and y
p2 <- ggplot(data = ss, aes(x = GLDAS, y = L5)) +
geom_point() +
stat_poly_eq(formula = my.formula, size = 4,
aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
parse = TRUE) +
geom_abline(intercept=0, slope=1)+
xlim (0,0.5) + ylim(0,0.5) + labs(x="GLDAS [mm/hr]", y="L5 [mm/hr]" ) +
theme(text = element_text(size=16))
p <- p2 + stat_summary(data = ss, aes(x = pos, y = L5),
fun.y = 'mean', fun.ymin = function(x) 0, geom = 'point',
position = 'dodge') +
stat_summary(data = ss, aes(x = pos, y = L5),
fun.y = mean,
fun.ymin = function(y) mean(y) - sd(y),
fun.ymax = function(y) mean(y) + sd(y),
color = "red",
geom ="pointrange",show.legend = FALSE)
p
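For readers on ggplot2 3.3.0 or later, where fun.y, fun.ymin, and fun.ymax are deprecated in favour of fun, fun.min, and fun.max, the same idea can be sketched as below. The simulated data, the set.seed() call, and the bin-midpoint arithmetic are my own stand-ins for the cut() and merge() steps in the question.

```r
library(ggplot2)

set.seed(1)  # assumption: simulated data, used only to make the sketch runnable
s <- data.frame(GLDAS = runif(500, 0, 0.6))
s$L5 <- s$GLDAS + rnorm(500, sd = 0.1)

# Assign every point to the midpoint of its 0.02-wide interval
width <- 0.02
s$pos <- floor(s$GLDAS / width) * width + width / 2

p <- ggplot(s, aes(x = GLDAS, y = L5)) +
  geom_point(alpha = 0.3) +
  geom_abline(intercept = 0, slope = 1) +
  stat_summary(aes(x = pos),                      # summarise per interval midpoint
               fun     = mean,
               fun.min = function(y) mean(y) - sd(y),
               fun.max = function(y) mean(y) + sd(y),
               geom = "pointrange", colour = "red",
               show.legend = FALSE) +
  labs(x = "GLDAS [mm/hr]", y = "L5 [mm/hr]")
p
```

Each red point is the mean of L5 within one interval, and the bar spans one standard deviation on either side.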

R package ggpmisc: Putting hat on y in Regression Equation

I'm using the R package ggpmisc. I wonder how to put a hat on y in the regression equation, or more generally how to use custom response and explanatory variable names in the equation shown on the graph.
library(ggplot2)
library(ggpmisc)
df <- data.frame(x1 = c(1:100))
set.seed(12345)
df$y1 <- 2 + 3 * df$x1 + rnorm(100, sd = 40)
p <- ggplot(data = df, aes(x = x1, y = y1)) +
geom_smooth(method = "lm", se=FALSE, color="black", formula = y ~ x) +
stat_poly_eq(formula = y ~ x,
aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
parse = TRUE) +
geom_point()
p
I would turn off the default value for y that is pasted in and build your own formula. For example
ggplot(data = df, aes(x = x1, y = y1)) +
geom_smooth(method = "lm", se=FALSE, color="black", formula = y ~ x) +
stat_poly_eq(formula = y ~ x, eq.with.lhs=FALSE,
aes(label = paste("hat(italic(y))","~`=`~",..eq.label..,"~~~", ..rr.label.., sep = "")),
parse = TRUE) +
geom_point()
We use eq.with.lhs=FALSE to turn off the automatic inclusion of y= and then we paste() the hat(y) on to the front (with the equals sign). Note that the formatting comes from the ?plotmath help page.
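stat_poly_eq() also accepts character strings for eq.with.lhs and eq.x.rhs, which covers the second half of the question (custom variable names). The sketch below assumes a reasonably recent ggpmisc and ggplot2 3.3.0 or later for after_stat(). The subscripted x[1] on the right-hand side is just an example name.

```r
library(ggplot2)
library(ggpmisc)

set.seed(12345)
df <- data.frame(x1 = 1:100)
df$y1 <- 2 + 3 * df$x1 + rnorm(100, sd = 40)

p <- ggplot(df, aes(x = x1, y = y1)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, colour = "black", formula = y ~ x) +
  stat_poly_eq(
    formula = y ~ x,
    eq.with.lhs = "italic(hat(y))~`=`~",  # custom left-hand side (plotmath syntax)
    eq.x.rhs = "~italic(x)[1]",           # custom name for the explanatory variable
    aes(label = paste(after_stat(eq.label), after_stat(rr.label), sep = "~~~")),
    parse = TRUE
  )
p
```

With this form, the hat and the variable name are part of eq.label itself, so no extra paste() of the left-hand side is needed.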
