R package ggpmisc: Putting a hat on y in the regression equation

I'm using the R package ggpmisc. How can I put a hat on y in the regression equation, or use custom names for the response and explanatory variables in the equation shown on the graph?
library(ggplot2)
library(ggpmisc)
df <- data.frame(x1 = c(1:100))
set.seed(12345)
df$y1 <- 2 + 3 * df$x1 + rnorm(100, sd = 40)
p <- ggplot(data = df, aes(x = x1, y = y1)) +
geom_smooth(method = "lm", se=FALSE, color="black", formula = y ~ x) +
stat_poly_eq(formula = y ~ x,
aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
parse = TRUE) +
geom_point()
p

I would turn off the default left-hand side (y =) that is pasted in and build your own label. For example:
ggplot(data = df, aes(x = x1, y = y1)) +
geom_smooth(method = "lm", se=FALSE, color="black", formula = y ~ x) +
stat_poly_eq(formula = y ~ x, eq.with.lhs=FALSE,
aes(label = paste("hat(italic(y))","~`=`~",..eq.label..,"~~~", ..rr.label.., sep = "")),
parse = TRUE) +
geom_point()
We use eq.with.lhs=FALSE to turn off the automatic inclusion of y = and then paste() hat(italic(y)) onto the front, together with the equals sign. The expression syntax is documented on the ?plotmath help page.
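As a variation on the approach above, stat_poly_eq() also accepts a character string for eq.with.lhs, and eq.x.rhs replaces the x on the right-hand side. The sketch below assumes a ggpmisc version that supports both arguments and ggplot2 3.3.0 or later for after_stat(), the newer spelling of ..eq.label..; the subscripted x[1] and the object name p_hat are only illustrative choices.

```r
library(ggplot2)
library(ggpmisc)

set.seed(12345)
df <- data.frame(x1 = 1:100)
df$y1 <- 2 + 3 * df$x1 + rnorm(100, sd = 40)

p_hat <- ggplot(df, aes(x = x1, y = y1)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, colour = "black", formula = y ~ x) +
  stat_poly_eq(
    formula = y ~ x,
    eq.with.lhs = "italic(hat(y))~`=`~",  # custom left-hand side (plotmath)
    eq.x.rhs = "~italic(x)[1]",           # custom name for the explanatory variable
    aes(label = paste(after_stat(eq.label), after_stat(rr.label), sep = "~~~")),
    parse = TRUE
  )
p_hat
```

With this form the label mapping stays the same as in the question, and only the two eq.* arguments change.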

Related

How to plot a single regression line but colour points by a different factor in ggplot2 R?

The scatterplot is colour-coded by factor z. By default, ggplot2 also plots the regression lines by factor. I want to plot a single regression line passing through all the data. How do I achieve this?
x <- c(1:50)
y <- rnorm(50,4,1)
z <- rep(c("P1", "P2"), each = 25)
df <- data.frame(x,y,z)
my.formula = y ~ x
ggplot(aes(x = x, y = y, color = z), data = df) +
geom_point() + scale_fill_manual(values=c("purple", "blue")) +
geom_smooth(method="lm", formula = y ~ x ) +
stat_poly_eq(formula = my.formula, aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), parse = TRUE, size = 2.5, col = "black")+
theme_classic()
If I understand you correctly, you can set group = 1 in aes() to plot just one regression line. You can use the following code:
library(tidyverse)
library(ggpmisc)
my.formula = y ~ x
ggplot(aes(x = x, y = y, color = z, group = 1), data = df) +
geom_point() + scale_colour_manual(values=c("purple", "blue")) + # points use the colour aesthetic, not fill
geom_smooth(method="lm", formula = y ~ x ) +
stat_poly_eq(formula = my.formula, aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), parse = TRUE, size = 2.5, col = "black")+
theme_classic()
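An alternative to group = 1, sketched here under the assumption that only the points need to be coloured: map colour inside geom_point() instead of the global aes(), so geom_smooth() sees no grouping variable at all. The seed and the black line colour are illustrative choices, not part of the original answer.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = 1:50,
                 y = rnorm(50, 4, 1),
                 z = rep(c("P1", "P2"), each = 25))

p_single <- ggplot(df, aes(x = x, y = y)) +
  geom_point(aes(colour = z)) +                        # colour only the points
  scale_colour_manual(values = c("purple", "blue")) +
  geom_smooth(method = "lm", formula = y ~ x, colour = "black") +  # one line for all data
  theme_classic()
p_single
```

Because z is never mapped in the smooth layer, a single model is fitted to all 50 points.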

How to adjust the position of regression equation on ggplot?

I would like to add the regression line and R^2 to my ggplot. I am fitting a regression line to each category, so each category gets its own equation. I'd like to set the position of each category's equation manually, that is, find the maximum of y for each group and print the equation at ymax + 1.
Here is my code:
library(ggpmisc)
library(dplyr)  # for %>%, group_by(), mutate(), do()
library(broom)  # for tidy()
df <- data.frame(x = c(1:100))
df$y <- 20 * c(0, 1) + 3 * df$x + rnorm(100, sd = 40)
df$group <- factor(rep(c("A", "B"), 50))
df <- df %>% group_by(group) %>% mutate(ymax = max(y))
my.formula <- y ~ x
df %>%
group_by(group) %>%
do(tidy(lm(y ~ x, data = .)))
p <- ggplot(data = df, aes(x = x, y = y, colour = group)) +
geom_smooth(method = "lm", se=FALSE, formula = my.formula) +
stat_poly_eq(formula = my.formula,
aes(x = x , y = ymax + 1, label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
parse = TRUE) +
geom_point()
p
Any suggestion how to do this?
Also is there any way I can only print the slope of the equation. (remove the intercept from plot)?
Thanks,
I'm pretty sure that adjusting stat_poly_eq() with the geom argument will get you what you want. Doing so centers the equations, leaving the left half of each clipped, so we use hjust = 0 to left-align them. Finally, depending on your specific data, the equations may overlap each other, so we use the position argument to have ggplot attempt to separate them.
This adjusted call should get you started, I hope:
p <- ggplot(data = df, aes(x = x, y = y, colour = group)) +
geom_smooth(method = "lm", se=FALSE, formula = my.formula) +
stat_poly_eq(
formula = my.formula,
geom = "text", # or 'label'
hjust = 0, # left-adjust equations
position = position_dodge(), # in case equations now overlap
aes(x = x , y = ymax + 1, label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
parse = TRUE) +
geom_point()
p
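For the second part of the question (showing only the slope), one option is to skip stat_poly_eq() for the label and compute it directly. The sketch below is not the ggpmisc approach from the answer: it fits lm() per group in base R and places the text with geom_text() at ymax + 1. The seed, the helper data frame slope_labels, and the label format are all assumptions made for illustration.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(x = 1:100)
df$y <- 20 * c(0, 1) + 3 * df$x + rnorm(100, sd = 40)
df$group <- factor(rep(c("A", "B"), 50))

# One row per group: label position (left edge, ymax + 1) and slope text
slope_labels <- do.call(rbind, lapply(split(df, df$group), function(d) {
  fit <- lm(y ~ x, data = d)
  data.frame(group = d$group[1],
             x = min(d$x),
             y = max(d$y) + 1,
             label = sprintf("slope = %.2f", coef(fit)[["x"]]))
}))

p_slope <- ggplot(df, aes(x = x, y = y, colour = group)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +
  geom_text(data = slope_labels, aes(label = label),
            hjust = 0, vjust = 0, show.legend = FALSE)
p_slope
```

Since the label positions live in an ordinary data frame, they can be edited by hand if the two labels end up too close together.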

How to plot trend line with regression equation in R?

My data is like this
Date    Speed
1/2019  4500
2/2019  3400
3/2019  5300
4/2019  2000
Date is my independent variable and Speed is my dependent variable.
I'm trying to plot the trend line with a regression equation to see whether the trend is increasing or decreasing.
I tried this code, but it did not show the equation on the graph.
ggscatter(a, x = "Date", y = "Speed", add = "reg.line") +
stat_cor(label.x = 03/2019, label.y = 3700) +
stat_regline_equation(label.x = 03/2019, label.y = 3600)
#> `geom_smooth()` using formula 'y ~ x'
Example output that I want (the Correlation Equation and the Regression Equation)
Here's the method I usually use:
library(tidyverse)
library(lubridate)
library(ggpmisc)
df <- tibble::tribble(
~Date, ~Speed,
"1/2019", 4500L,
"2/2019", 3400L,
"3/2019", 5300L,
"4/2019", 2000L
)
df$Date <- lubridate::my(df$Date)
ggplot(df, aes(x = Date, y = Speed)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
stat_poly_eq(formula = y ~ x, # same model as the line drawn by geom_smooth()
aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
parse = TRUE)
EDIT
With the p-value:
ggplot(df, aes(x = Date, y = Speed)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
stat_poly_eq(formula = y ~ x, # same model as the line drawn by geom_smooth()
aes(label = paste(..eq.label.., ..rr.label.., ..p.value.label.., sep = "~~~")),
parse = TRUE)
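To answer the "increasing or decreasing" part without reading it off the plot, the same model can be fitted directly with lm(). This is a minimal base R sketch using the four rows from the question; it assumes each month is represented by its first day, and because a Date is stored as a number of days, the slope is in Speed units per day.

```r
df <- data.frame(
  Date  = as.Date(c("2019-01-01", "2019-02-01", "2019-03-01", "2019-04-01")),
  Speed = c(4500, 3400, 5300, 2000)
)

# Regress Speed on the date expressed as days since 1970-01-01
fit <- lm(Speed ~ as.numeric(Date), data = df)
slope_per_day <- unname(coef(fit)[2])

# The sign of the slope gives the direction of the linear trend
trend <- if (slope_per_day > 0) "increasing" else "decreasing"
trend
```

With only four observations the fit is very uncertain, so the p-value shown on the plot matters more than the sign alone.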

Coefficients per facet with output.type="numeric" in ggpmisc::stat_poly_eq

ggpmisc::stat_poly_eq has an option output.type = "numeric" that returns the estimates of the parameters of the fitted model. Below is my attempt to use it with facet_wrap. I get a different R² per facet, but the coefficients are the same in the two facets. Am I doing something wrong, or is it a bug?
library(ggpmisc)
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x = x,
y = y,
group = c("A", "B"))
my.data[my.data$group=="A",]$y <- my.data[my.data$group=="A",]$y + 200000
formula <- y ~ poly(x, 1, raw = TRUE)
myformat <- "Intercept: %s\nSlope: %s\nR²: %s"
ggplot(my.data, aes(x, y)) +
facet_wrap(~ group) +
geom_point() +
geom_smooth(method = "lm", formula = formula) +
stat_poly_eq(formula = formula, output.type = "numeric",
mapping = aes(label =
sprintf(myformat,
formatC(stat(coef.ls)[[1]][[1, "Estimate"]]),
formatC(stat(coef.ls)[[1]][[2, "Estimate"]]),
formatC(stat(r.squared)))))
Edit
We have to catch the panel number. Strangely, formatC(stat(as.integer(PANEL))) returns the panel number per facet.
However, formatC(stat(coef.ls)[[stat(as.integer(PANEL))]][[1, "Estimate"]]) does not work, because here PANEL = c(1,2).
Ok, I figured it out.
ggplot(my.data, aes(x, y)) +
facet_wrap(~ group) +
geom_point() +
geom_smooth(method = "lm", formula = formula) +
stat_poly_eq(
formula = formula, output.type = "numeric",
mapping = aes(label =
sprintf(myformat,
c(formatC(stat(coef.ls)[[1]][[1, "Estimate"]]),
formatC(stat(coef.ls)[[2]][[1, "Estimate"]])),
c(formatC(stat(coef.ls)[[1]][[2, "Estimate"]]),
formatC(stat(coef.ls)[[2]][[2, "Estimate"]])),
formatC(stat(r.squared)))))
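To check that the labels in each facet are the right ones, the per-group coefficients can be computed independently of ggpmisc. The sketch below refits the same formula with lm() on each group using base R only; coef_by_group is a name chosen for illustration.

```r
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x = x, y = y, group = c("A", "B"))
my.data$y[my.data$group == "A"] <- my.data$y[my.data$group == "A"] + 200000

# Fit the same first-degree polynomial separately in each group
coef_by_group <- t(sapply(split(my.data, my.data$group), function(d) {
  fit <- lm(y ~ poly(x, 1, raw = TRUE), data = d)
  setNames(coef(fit), c("intercept", "slope"))
}))
print(coef_by_group)  # one row per group, columns intercept and slope
```

If the facets show identical coefficients while this table shows different ones, the label mapping is picking the wrong element of coef.ls.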
Version 0.3.2 of 'ggpmisc' is now in CRAN. Submitted earlier this week. In the documentation I now give some examples of the use of geom_debug() from my package 'gginnards' to have a look at the data frame returned by stats (usable with any ggplot stat or by itself). For your example, it would work like this:
library(ggpmisc)
library(gginnards)
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x = x,
y = y,
group = c("A", "B"))
my.data[my.data$group=="A",]$y <- my.data[my.data$group=="A",]$y + 200000
formula <- y ~ poly(x, 1, raw = TRUE)
myformat <- "Intercept: %s\nSlope: %s\nR²: %s"
ggplot(my.data, aes(x, y)) +
facet_wrap(~ group) +
geom_point() +
geom_smooth(method = "lm", formula = formula) +
stat_poly_eq(formula = formula, output.type = "numeric",
aes(label = ""),
geom = "debug")
This prints two tibbles to the console, one for each panel:
Example below added to address comment:
ggplot(my.data, aes(x, y)) +
facet_wrap(~ group) +
geom_point() +
geom_smooth(method = "lm", formula = formula) +
stat_poly_eq(formula = formula, output.type = "numeric",
aes(label = ""),
geom = "debug", # summary.fun is an argument of geom_debug()
summary.fun = function(x) {x[["coef.ls"]][[1]]})
prints just coef.ls.
I added the "numeric" option recently in response to a suggestion and with this example I noticed a bug: aes(label = "") should not have been needed, but is needed because the default mapping for the label aesthetic is wrong. I will fix this for the next release.

How to add legend to geom_smooth in ggplot in R

I have a problem adding a legend for different smooths in ggplot.
library(splines)
library(ggplot2)
temp <- data.frame(x = rnorm(200, 20, 15), y = rnorm(200, 30, 8))
ggplot(data = temp, aes(x, y)) + geom_point() +
geom_smooth(method = 'lm', formula = y ~ bs(x, df=5, intercept = T), col='blue') +
geom_smooth(method = 'lm', formula = y ~ ns(x, df=2, intercept = T), col='red')
I have two splines: red and blue. How can I add a legend for them?
Put the colour in aes() and add scale_colour_manual():
ggplot(data = temp, aes(x, y)) + geom_point() +
geom_smooth(method = 'lm', formula = y ~ bs(x, df=5, intercept = T), aes(colour="A")) +
geom_smooth(method = 'lm', formula = y ~ ns(x, df=2, intercept = T), aes(colour="B")) +
scale_colour_manual(name="legend", values=c("blue", "red"))
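A small variation, sketched under the assumption that descriptive legend entries are wanted: use the label text itself as the constant inside aes() and give scale_colour_manual() a named vector, so each colour is tied to its label by name rather than by order. The label strings, the seed, and the legend title are illustrative, and intercept = T is dropped here to keep the sketch free of rank-deficiency warnings.

```r
library(splines)
library(ggplot2)

set.seed(10)
temp <- data.frame(x = rnorm(200, 20, 15), y = rnorm(200, 30, 8))

p_legend <- ggplot(temp, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ bs(x, df = 5),
              aes(colour = "bs, df = 5")) +
  geom_smooth(method = "lm", formula = y ~ ns(x, df = 2),
              aes(colour = "ns, df = 2")) +
  # Named values: each legend label is matched to its colour explicitly
  scale_colour_manual(name = "Spline",
                      values = c("bs, df = 5" = "blue", "ns, df = 2" = "red"))
p_legend
```

Naming the values avoids a silent colour swap if the labels are later renamed or sort differently.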

Resources