Retrieve the equation of the stat_smooth in ggplot2 R mtcars example - r

Hi, I would like to know how I can retrieve the equation fitted by stat_smooth, either displayed on the ggplot2 plot or stored in a vector or somewhere else. The code I am using is:
p <- ggplot(data = mtcars, aes(x = disp, y = drat))
p <- p + geom_point() + stat_smooth(method="loess")
p
Thanks

The ggpmisc package can be very useful here. However, it will not work with loess, because loess fits local regressions and does not produce a single equation. See here: Loess Fit and Resulting Equation
library(ggplot2)
library(ggpmisc)
p <- ggplot(data = mtcars, aes(x = disp, y = drat)) +
geom_point() +
geom_smooth(method="lm", formula=y~x) +
stat_poly_eq(parse=TRUE, aes(label = after_stat(eq.label)), formula=y~x)
p
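For the loess case, where no equation exists, one option is to pull the fitted curve itself out of the plot as a data frame. This is a minimal sketch, assuming the default loess settings that stat_smooth uses; the object names curve, fit and preds are only illustrative.

```r
library(ggplot2)

p <- ggplot(data = mtcars, aes(x = disp, y = drat)) +
  geom_point() +
  stat_smooth(method = "loess", formula = y ~ x)

# Layer 2 is the smoother; layer_data() returns the values ggplot2 computed for it
curve <- layer_data(p, 2)
head(curve[, c("x", "y", "ymin", "ymax")])

# A comparable fit can be made outside ggplot2 and queried at any x value
fit <- loess(drat ~ disp, data = mtcars)
preds <- predict(fit, newdata = data.frame(disp = c(100, 200, 300)))
preds
```

The data frame in curve holds the plotted line as x/y pairs, which is as close to "the equation" as a loess smoother gets.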

Related

Shorthand functions for multiple geoms in ggplot2

I would like to create shorthand notations or functions that combine multiple geoms for ggplot.
For example, instead of
mtcars %>%
ggplot(aes(x = cyl, y = mpg)) +
geom_point() +
geom_smooth(method = "lm") +
ggpubr::stat_cor()
I would like to be able to create a function to combine the geoms like so
lm_and_cor <- function() {
geom_smooth(method = "lm", se = FALSE) +
stat_cor()
}
mtcars %>%
ggplot(aes(x = cyl, y = mpg)) +
geom_point() +
lm_and_cor()
I am aware that I can create functions that do all of the plotting, basically
plot_data <- function(x) {
x %>%
ggplot(aes(x = cyl, y = mpg)) +
geom_point() +
geom_smooth(method = "lm") +
ggpubr::stat_cor()
}
which to be fair does what I want, to some degree. However, I would instead like to combine multiple geoms in a single function, as the underlying geom (e.g. point, lines, etc.) will not always be the same. Is this doable, and is it feasible?
With ggplot2 you can add a list of elements:
lm_and_cor <- function()
list(geom_smooth(method = "lm", se = FALSE),
ggpubr::stat_cor()
)
mtcars %>%
ggplot(aes(x = cyl, y = mpg)) +
geom_point() +
lm_and_cor()
Do you mean something like this?
You can store multiple geoms in a list object.
Edit: I misunderstood the question. This should meet the expectation.
data(iris)
library(ggplot2)
x <- list(geom_point(), geom_line())
ggplot(iris, aes(Sepal.Length, Sepal.Width)) + x
Or, if you want to make a function that plots by column, use the {{variable}} syntax.
library(dplyr)
plotting <- function(data, x, y){
data %>%
ggplot(aes({{x}}, {{y}})) +
geom_point() +
geom_smooth(method = "lm")}
plotting(iris, Sepal.Length, Sepal.Width)
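One property of the list approach that may be useful here: ggplot2 skips NULL entries in an added list, so a helper can switch layers on and off. A small sketch follows; points_and_fit and its show_fit argument are hypothetical names, and only ggplot2 layers are used to keep it self-contained.

```r
library(ggplot2)

# Hypothetical helper: returns a list of layers; a NULL entry is simply skipped
points_and_fit <- function(show_fit = TRUE) {
  list(
    geom_point(),
    if (show_fit) geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
  )
}

p_with    <- ggplot(mtcars, aes(x = cyl, y = mpg)) + points_and_fit()
p_without <- ggplot(mtcars, aes(x = cyl, y = mpg)) + points_and_fit(show_fit = FALSE)

# Number of layers that ended up on each plot
length(p_with$layers)
length(p_without$layers)
```

The same pattern should work with ggpubr::stat_cor() as one of the list entries.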

Is there a ggplot2 function that can plot the straight line inferred from a linear model that came from a call to lm()?

With base R's plotting, I can generate a plot of a new linear model that I just generated with lm very quickly:
with(mtcars,{
plot(wt,mpg)
abline(lm(mpg~wt))})
Is there a ggplot2 equivalent? I tried geom_abline, but nothing that I've seen suggests that it works on formula objects.
To be clear, I do not want to have to extract the coefficients from the linear model and pass them to ggplot. I also do not want to generate the linear model with a ggplot2 function. I want exactly what base R has - the ability to pass linear model objects to a plotting function in a way that displays its corresponding straight line.
If we're allowed to do stuff but just not the stuff you listed, then I'd go with stat_function:
m <- lm( mpg~wt, data=mtcars )
ggplot( mtcars, aes(x=wt, y=mpg) ) +
geom_point() +
stat_function( fun=function(x)predict(m, newdata=data.frame(wt=x)), color="red" )
You could do:
ggplot() +
geom_point(data = mtcars, aes(wt, mpg)) +
geom_abline(data = lm(mpg ~ wt, data = mtcars) %>% broom::tidy(),
aes(intercept = estimate[1], slope = estimate[2]))
or:
ggplot() +
geom_point(data = mtcars, aes(wt, mpg)) +
geom_abline(data = coef(lm(mpg ~ wt, data = mtcars)) %>% t() %>% data.frame(),
aes(intercept = X.Intercept., slope = wt))
The function geom_smooth can do something similar to what you're saying, although I don't think you can pass the object directly inside there:
ggplot(mtcars) +
geom_point(aes(wt, mpg)) +
geom_smooth(method = lm, aes(wt, mpg))
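If extracting the coefficients is acceptable as long as it is hidden away, a thin wrapper gets close to the abline(lm(...)) feel. This is only a sketch: geom_lm_abline is a hypothetical helper, not a ggplot2 function, and it assumes a simple regression with an intercept and exactly one slope.

```r
library(ggplot2)

# Hypothetical helper: takes a fitted lm object and returns a geom_abline layer
geom_lm_abline <- function(model, ...) {
  cf <- coef(model)
  stopifnot(length(cf) == 2)  # intercept + one slope only
  geom_abline(intercept = cf[[1]], slope = cf[[2]], ...)
}

m <- lm(mpg ~ wt, data = mtcars)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_lm_abline(m, colour = "red")
p
```

Unlike geom_smooth, the line extends across the whole panel, as abline does in base R.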

R: plotting geom_line() of lm() prediction values and geometric smooth do not coincide

I have the following data
df <- data.frame(x= c(0,1,10,100,1000,0,1, 10,100,1000,0,1,10,100,1000),
y=c(7,15,135,1132,6459,-3,11,127,1120,6249,-5,13,126,1208,6208))
After fitting a linear model to the data, I used the model to predict y values from known x values and stored the predicted y values in a data frame "pred.fits".
fit <- lm(data = df, y ~ x)
pred.fits <- expand.grid(x=seq(1, 2000, length=2001))
pm <- predict(fit, newdata=pred.fits, interval="confidence")
pred.fits$py <- pm[,1]
When I plot the data with both geom_smooth() and geom_line(), the two lines coincide.
ggplot(df, aes(x=x, y=y)) +
geom_point() +
geom_smooth(method = lm, formula = y ~ x, se = FALSE, size=1.5) +
geom_line(data=pred.fits, aes(x=x, y=py), size=.2)
However, when I plot the same data with the axes set to log scale, the two regressions differ drastically.
ggplot(df, aes(x=x, y=y)) +
geom_point() +
geom_smooth(method = lm, formula = y ~ x, se = FALSE, size=1.5) +
geom_line(data=pred.fits, aes(x=x, y=py), size=.2) +
scale_x_log10() +
scale_y_log10()
Am I missing something here?
UPDATE
After @Duck pointed me in the right direction, I was able to get it right. The issue was that I wanted the data to be untransformed, but the axes transformed to log10 scale. This is how I was able to do it.
df2 <- df[df$x>=1,] # remove annoying warning msgs.
fit2 <- lm(data = df2, log10(y) ~ log10(x))
pred.fits2 <- expand.grid(x=seq(10^0, 10^3 , length=200))
pm2 <- predict(fit2, newdata=pred.fits2, interval="confidence")
pred.fits2$py <- 10^pm2[,1] # convert the predicted y values to linear scale
ggplot(df2, aes(x=x, y=y)) +
geom_point() +
geom_smooth(method = lm, formula = y ~ x, se = FALSE, size=1.5) +
geom_line(data=pred.fits2, aes(x=x, y=py), size=1.5, linetype = "longdash") +
scale_x_log10() +
scale_y_log10()
Thanks everyone for your help.
This code may help your understanding (thanks to @BWilliams for the valuable comment). You want x and y on a log scale, and mixing a model fitted on the linear scale with log-scaled axes distorts the comparison. If you want the two lines to agree, it is better to fit a separate model on the log-transformed variables and plot it with matching values. Here is an approach where we build a log-log model and then plot it (rows with x equal to 0 or 1, which include the negative y values, are left out of the new data frame df2). Here is the code:
First linear model:
library(ggplot2)
#Data
df <- data.frame(x= c(0,1,10,100,1000,0,1, 10,100,1000,0,1,10,100,1000),
y=c(7,15,135,1132,6459,-3,11,127,1120,6249,-5,13,126,1208,6208))
#Model 1 all obs
fit <- lm(data = df, y ~ x)
pred.fits <- expand.grid(x=seq(1, 2000, length=2001))
pm <- predict(fit, newdata=pred.fits, interval="confidence")
pred.fits$py <- pm[,1]
#Plot 1
ggplot(df, aes(x=x, y=y)) +
geom_point() +
geom_smooth(method = lm, formula = y ~ x, se = FALSE, size=1.5) +
geom_line(data=pred.fits, aes(x=x, y=py), size=.2)
Now the sketch for the log variables; notice how we apply log() to both variables and how the model is built:
#First remove issue values
df2 <- df[df$x>1,]
#Train a new model
pred.fits2 <- expand.grid(x=seq(1, 2000, length=2001))
fit2 <- lm(data = df2, log(y) ~ log(x))
pm2 <- predict(fit2, newdata=pred.fits2, interval="confidence")
pred.fits2$py <- pm2[,1]
#Plot 2
ggplot(df2, aes(x=log(x), y=log(y))) +
geom_point() +
geom_smooth(method = lm, formula = y ~ x, se = FALSE, size=1.5) +
geom_line(data=pred.fits2, aes(x=log(x), y=py), size=.2)
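The reason the two lines diverge can be checked numerically: scale_x_log10() and scale_y_log10() transform the data before the smoother is fitted, so geom_smooth(method = lm) ends up fitting the log-log model. A minimal sketch, using only the rows of the question's data with x >= 1; the names smoothed, fit_log and expected are illustrative.

```r
library(ggplot2)

# Rows of the question's data with x >= 1 (log10 is defined for all of them)
df <- data.frame(
  x = c(1, 10, 100, 1000, 1, 10, 100, 1000, 1, 10, 100, 1000),
  y = c(15, 135, 1132, 6459, 11, 127, 1120, 6249, 13, 126, 1208, 6208)
)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  scale_x_log10() +
  scale_y_log10()

# Values computed by the smoother; x and y here are on the log10 scale
smoothed <- layer_data(p, 1)

# The same model fitted by hand on log10-transformed variables
fit_log  <- lm(log10(y) ~ log10(x), data = df)
expected <- predict(fit_log, newdata = data.frame(x = 10^smoothed$x))

# Compare the hand-fitted predictions with what geom_smooth drew
all.equal(unname(expected), smoothed$y)
```

A line predicted from lm(y ~ x) on the untransformed data is a different model, which is why it bends away from the smoother once the axes are log-scaled.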

Only display correlation coefficient in ggplot stat_cor

How do you only display the correlation coefficient in ggpubr::stat_cor, and not the p-value? There doesn't seem to be an argument within stat_cor to specify only one statistic or the other. Is there some other creative work-around?
Following Ben's solution, I am including a reproducible example.
First let's use a simple example:
library('ggplot2')
library('ggpubr')
data(iris)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width )) +
geom_point() + theme_bw() +
stat_cor(method = "pearson")
Now, you wanted to display only the correlation coefficient, meaning R.
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width )) +
geom_point() + theme_bw() +
stat_cor(method = "pearson", aes(label = ..r.label..))
You can also calculate the R value independently and add a text box to the plot using annotate() (example with the iris dataset):
round(cor(iris$Sepal.Length, iris$Sepal.Width),2)
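A possible way to place that value on the plot with annotate() is sketched below. The label position in the top-left corner and the hjust/vjust offsets are arbitrary choices, not something the answer prescribes.

```r
library(ggplot2)

# Pearson correlation, rounded for display
r <- round(cor(iris$Sepal.Length, iris$Sepal.Width), 2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  theme_bw() +
  # -Inf / Inf pin the label to the top-left corner of the panel
  annotate("text", x = -Inf, y = Inf, hjust = -0.2, vjust = 1.5,
           label = paste("R =", r))
p
```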

ggplot using purrr map() to plot same x with multiple y's

I want to create multiple plots that have the same x but different y's using purrr package methodology. That is, I would like to use the map() or walk() functions to perform this.
Using mtcars dataset for simplicity.
ggplot(data = mtcars, aes(x = hp, y = mpg)) + geom_point()
ggplot(data = mtcars, aes(x = hp, y = cyl)) + geom_point()
ggplot(data = mtcars, aes(x = hp, y = disp)) + geom_point()
edit
So far I have tried
y <- list("mpg", "cyl", "disp")
mtcars %>% map(y, ggplot(., aes(hp, y)) + geom_point()
This is one possibility
ys <- c("mpg","cyl","disp")
ys %>% map(function(y)
ggplot(mtcars, aes(hp)) + geom_point(aes_string(y=y)))
It's just like any other map function, you just need to configure your aesthetics properly in the function.
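Since aes_string() is deprecated in recent ggplot2 releases, the same idea can be written with the .data pronoun. A sketch under that assumption; naming the list elements and adding ylab() are optional extras, not part of the answer above.

```r
library(ggplot2)
library(purrr)

ys <- c("mpg", "cyl", "disp")

# One plot per y column; .data[[y]] looks the column up by its name
plots <- map(ys, function(y) {
  ggplot(mtcars, aes(x = hp, y = .data[[y]])) +
    geom_point() +
    ylab(y)
})
names(plots) <- ys

plots$mpg
```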
I've made a somewhat more general function for this, because it is part of an EDA protocol (Zuur et al., 2010). This article from Ariel Muldoon helped me.
plotlist <- function(data, resp, efflist) {
require(ggplot2)
require(purrr)
y <- enquo(resp)
map(efflist, function(x)
ggplot(data, aes(!!sym(x), !!y)) +
geom_point(alpha = 0.25, color = "darkgreen") +
ylab(NULL)
)
}
where:
data is your data frame
resp is the response variable
efflist is a character vector of effects (independent variables)
Of course, you may change the geom and/or aesthetics as needed. The function returns a list of plots, which you can pass to e.g. cowplot or gridExtra, as in this example:
library(gridExtra)
library(dplyr) # just for pipes
plotlist(mtcars, hp, c("mpg","cyl","disp")) %>%
grid.arrange(grobs = ., left = "HP")
