How to extract stat_smooth exponential fit parameters ggplot2 - r

I've already tried many of the suggestions found here, but I simply can't figure it out.
Is it possible to extract the equation (y = a*exp(-b*x)) from a line fitted with stat_smooth?
This is a data example:
library(ggplot2)
# data_frame() is deprecated; a plain data.frame() is enough here
df <- data.frame(Time = c(0.5, 1, 2, 4, 8, 16, 24), Concentration = c(1, 0.5, 0.2, 0.05, 0.02, 0.01, 0.001))
Plot <- ggplot(df, aes(x = Time, y = Concentration)) +
  geom_point(size = 2) +
  stat_smooth(method = nls, formula = y ~ a * exp(-b * x),
              se = FALSE,
              method.args = list(start = c(a = 10, b = 0.01))) +
  theme_classic(base_size = 15) +
  labs(x = expression(Time (h)),
       y = expression(C[t]/C[0]))
I tried to use stat_regline_equation(), but it does not work with the exponential formula.

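To extract the computed data from a ggplot object, you can use ggplot_build().
The values drawn by stat_smooth() are in ggplot_build(Plot)$data[[2]] (the second element, because the smooth is the second layer of the plot). These are the x/y coordinates of the fitted curve, not the parameters a and b.
You can assign them to an object: build <- ggplot_build(Plot)$data[[2]]
The two code snippets below draw the same curve: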
Plot <- ggplot(df, aes(x=Time, y=Concentration)) + geom_point(size=2) +
stat_smooth(method = nls, formula = y ~ a*exp(-b *x), se = FALSE,
method.args = list(start = c(a=10, b=0.01)))
and
Plot <- ggplot(df, aes(x=Time, y=Concentration)) + geom_point(size=2) +
geom_line(data=build,aes(x=x,y=y),color="blue")
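To see what the extracted layer data actually contains, here is a minimal sketch. It assumes the smooth is the second layer (as in the plot above) and that ggplot2 evaluates the curve on its default grid of 80 points; the column selection is just for display.

```r
library(ggplot2)

df <- data.frame(Time = c(0.5, 1, 2, 4, 8, 16, 24),
                 Concentration = c(1, 0.5, 0.2, 0.05, 0.02, 0.01, 0.001))

Plot <- ggplot(df, aes(x = Time, y = Concentration)) +
  geom_point(size = 2) +
  stat_smooth(method = "nls", formula = y ~ a * exp(-b * x), se = FALSE,
              method.args = list(start = c(a = 10, b = 0.01)))

# Layer 1 is geom_point(), layer 2 is stat_smooth()
build <- ggplot_build(Plot)$data[[2]]

# x/y pairs of the fitted curve on an evenly spaced grid
head(build[, c("x", "y")])
nrow(build)
```

So this gives you the points needed to redraw the curve, but the coefficients themselves are not stored in it.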

I don't think it's possible. (1) I poked around in the guts of the object generated by ggplot_build(Plot) and didn't find anything likely (that doesn't prove it isn't there, but ...). (2) If you look at the source code of ggpubr::stat_regline_equation(), you can see that rather than digging into the stored information from the smooth, it calls a package function that re-fits the linear model so that it can extract the coefficients and construct the equation.
You probably just have to re-fit the model yourself:
nls_fit <- nls(formula = Concentration ~ a*exp(-b *Time),
start = c(a=10, b=0.01), data = df)
coef(nls_fit)
(You might find the format returned by broom::tidy(nls_fit) convenient.)
For this particular model you can also get the coefficients from a Gaussian GLM with a log link, since log(E[y]) = log(a) - b*x:
cc <- coef(glm(Concentration ~ Time, data = df, family = gaussian(link= "log")))
c(exp(cc[1]), -cc[2])
You could in principle write your own stat_ function mirroring stat_regline_equation that would encapsulate this functionality, but it would be a lot more work/wouldn't be worth it unless you were doing this operation very routinely or wanted to make it easy for others to do ...
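A lighter-weight alternative to a custom stat_ is to re-fit once and paste the coefficients into an annotation. This is only a sketch: the label format, the rounding to three significant digits, and the corner placement are my own choices, not anything ggplot2 or ggpubr prescribes.

```r
library(ggplot2)

df <- data.frame(Time = c(0.5, 1, 2, 4, 8, 16, 24),
                 Concentration = c(1, 0.5, 0.2, 0.05, 0.02, 0.01, 0.001))

# Re-fit the same model that stat_smooth() fits internally
nls_fit <- nls(Concentration ~ a * exp(-b * Time), data = df,
               start = c(a = 10, b = 0.01))
cc <- coef(nls_fit)

# Label of the form "y = a * exp(-b * x)" with the estimated values filled in
eq_label <- sprintf("y = %.3g * exp(-%.3g * x)", cc[["a"]], cc[["b"]])

ggplot(df, aes(x = Time, y = Concentration)) +
  geom_point(size = 2) +
  stat_smooth(method = "nls", formula = y ~ a * exp(-b * x), se = FALSE,
              method.args = list(start = c(a = 10, b = 0.01))) +
  # x = Inf / y = Inf pins the text to the top-right corner of the panel
  annotate("text", x = Inf, y = Inf, hjust = 1.1, vjust = 1.5,
           label = eq_label) +
  theme_classic(base_size = 15)
```

Because the model is fitted twice with the same formula and start values, the label and the drawn curve should agree.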

Related

Compute and plot 'grand' regression over multiple smaller regressions

I have grouped Area values, for each of which I can compute and plot regressions:
set.seed(123)
df <- data.frame(
Group = c(rep("A",8), rep("B",10), rep("C",7)),
Area = c(1,3,2,4,3,5,7,9, rnorm(10), sample(7)),
x = c(1:8,1:10,1:7)
)
library(ggplot2)
ggplot(df,
aes(x = x, y = Area, group = factor(Group))) +
geom_smooth(method = "lm", se = FALSE)
But what I'm looking for is how to compute and plot what could be called a 'grand' regression for all Area groups. Is this possible and how would it be possible?
EDIT:
My guess is that it's not enough to simply disregard the group variable by running a model over all Area and all x values and excluding the group variable. This would treat the different groups as irrelevant. In actual fact each group represents a distribution in its own right. Consider each group as collecting the values of an independent event. What I need is a model that incorporates the distinction between the groups/events while at the same time summarizing over them.
I would be wary of the answers using stat_smooth/geom_smooth to plot a fitted line for the disaggregated values. This simply draws a best fit line through all of the data, ignoring how they are clustered.
As you say in your edit, what you need is a model that accounts for the fact that each group has its own Area ~ x relationship, while at the same time summarizing over the groups.
Without knowing more about your design, my first recommendation would be a mixed-effects model (e.g., using lme4).
You can fit the model, accounting for the fact that you have unique relationships in each group:
library(lme4)
example_mod <- lmer(Area ~
                      # Fixed effects
                      1 + x +
                      # Random effects: intercept and slope vary by group
                      (1 + x | Group),
                    data = df,
                    REML = TRUE,
                    control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 5e5)))
You can then extract the predicted values from this model to plot those, or calculate your own predicted values from the fixed-effects.
fitted(example_mod)
fixef(example_mod)
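To actually draw the 'grand' line from such a model, one option is to take the fixed effects and pass them to geom_abline(). This is a rough sketch under the assumption that a random intercept and slope per Group is a reasonable structure; with only three groups lme4 may report a singular fit, which is a message rather than an error, so treat the result as illustrative.

```r
library(lme4)
library(ggplot2)

set.seed(123)
df <- data.frame(
  Group = c(rep("A", 8), rep("B", 10), rep("C", 7)),
  Area  = c(1, 3, 2, 4, 3, 5, 7, 9, rnorm(10), sample(7)),
  x     = c(1:8, 1:10, 1:7)
)

# Random intercept and slope per group
mod <- lmer(Area ~ 1 + x + (1 + x | Group), data = df)

# Population-level ("grand") intercept and slope
fe <- fixef(mod)

ggplot(df, aes(x = x, y = Area)) +
  geom_point(aes(colour = Group)) +
  # One ordinary regression line per group
  geom_smooth(aes(group = Group, colour = Group), method = "lm",
              formula = y ~ x, se = FALSE) +
  # The fixed-effects line that summarizes over the groups
  geom_abline(intercept = fe[["(Intercept)"]], slope = fe[["x"]],
              linetype = "dashed")
```

The dashed line is the model's average relationship across groups, which is generally not the same as the pooled lm() line through all points.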
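Use two geom_smooth() layers and put the grouping aesthetic only in the layer that should be fitted per group. The layer without it is fitted to all of the data (with the default method, which is loess here, as the messages below show):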
set.seed(123)
df <- data.frame(
Group = c(rep("A",8), rep("B",10), rep("C",7)),
Area = c(1,3,2,4,3,5,7,9, rnorm(10), sample(7)),
x = c(1:8,1:10,1:7)
)
library(ggplot2)
ggplot(df, aes(x = x, y = Area)) +
geom_smooth(aes(group = factor(Group)), method = "lm", se = FALSE) +
geom_smooth()
#> `geom_smooth()` using formula 'y ~ x'
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Created on 2022-06-29 by the reprex package (v2.0.1)
ggplot(df,
aes(x = x, y = Area)) +
geom_smooth(method = "lm", se = FALSE, colour="red") +
geom_smooth(method="lm", se = FALSE, aes(group=factor(Group)))
Edit: Since I was asked to provide more details, here is what goes on behind the scenes when you run geom_smooth(method = "lm", aes(group = factor(Group))). A separate linear regression is fitted for each group, which is equivalent to:
library(nlme)
fit1 <- lmList(Area ~ x|Group, data=df)
df$fit1 <- fitted(fit1)
ggplot(df, aes(x, fit1, colour=Group)) + geom_line()
When you add a second geom_smooth(method = "lm") without the group aesthetic, you are running a single linear regression on the whole data set, i.e.
fit2 <- lm(Area ~ x, data=df)
df$fit2 <- fitted(fit2)
ggplot(df, aes(x, fit2)) + geom_line()

ggplot to see model fit and scatterplot data at the same time [duplicate]

I'm trying hard to add a regression line on a ggplot. I first tried with abline but I didn't manage to make it work. Then I tried this...
data = data.frame(x.plot=rep(seq(1,5),10),y.plot=rnorm(50))
ggplot(data,aes(x.plot,y.plot))+stat_summary(fun.data=mean_cl_normal) +
geom_smooth(method='lm',formula=data$y.plot~data$x.plot)
But it is not working either.
In general, to provide your own formula you should use arguments x and y that will correspond to values you provided in ggplot() - in this case x will be interpreted as x.plot and y as y.plot. You can find more information about smoothing methods and formula via the help page of function stat_smooth() as it is the default stat used by geom_smooth().
ggplot(data,aes(x.plot, y.plot)) +
stat_summary(fun.data=mean_cl_normal) +
geom_smooth(method='lm', formula= y~x)
If you are using the same x and y values that you supplied in the ggplot() call and need to plot the linear regression line then you don't need to use the formula inside geom_smooth(), just supply the method="lm".
ggplot(data,aes(x.plot, y.plot)) +
stat_summary(fun.data= mean_cl_normal) +
geom_smooth(method='lm')
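The same x/y placeholder rule applies to any other model formula. As a small sketch (the quadratic term is just an assumed example, it is not something the question asked for), a second-degree polynomial fit would look like this:

```r
library(ggplot2)

set.seed(1)
data <- data.frame(x.plot = rep(seq(1, 5), 10), y.plot = rnorm(50))

# x and y in the formula refer to the plot aesthetics, not to column names
p <- ggplot(data, aes(x.plot, y.plot)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2))
p
```

Writing formula = y.plot ~ poly(x.plot, 2) instead would fail, because the smoother only sees columns named x and y.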
As I just found out, if your model is a multiple linear regression (more than one predictor), the solution above won't work.
You have to create the line manually, as a data frame that contains the model's predicted values for your original data frame (in your case, data).
It would look like this:
# read dataset
df = mtcars
# create multiple linear model
lm_fit <- lm(mpg ~ cyl + hp, data=df)
summary(lm_fit)
# save predictions of the model in the new data frame
# together with variable you want to plot against
predicted_df <- data.frame(mpg_pred = predict(lm_fit, df), hp=df$hp)
# this is the predicted line of multiple linear regression
ggplot(data = df, aes(x = mpg, y = hp)) +
geom_point(color='blue') +
geom_line(color='red',data = predicted_df, aes(x=mpg_pred, y=hp))
# this is predicted line comparing only chosen variables
ggplot(data = df, aes(x = mpg, y = hp)) +
geom_point(color='blue') +
geom_smooth(method = "lm", se = FALSE)
A simple and versatile solution is to draw the line with geom_abline(), passing it the slope and intercept. Example usage with a scatterplot and an lm object:
library(tidyverse)
petal.lm <- lm(Petal.Length ~ Petal.Width, iris)
ggplot(iris, aes(x = Petal.Width, y = Petal.Length)) +
geom_point() +
geom_abline(slope = coef(petal.lm)[["Petal.Width"]],
intercept = coef(petal.lm)[["(Intercept)"]])
coef is used to extract the coefficients of the formula provided to lm. If you have some other linear model object or line to plot, just plug in the slope and intercept values similarly.
I found this function on a blog:
ggplotRegression <- function(fit) {
  require(ggplot2)
  # fit$model is the model frame: response in column 1, first predictor in column 2
  xvar <- names(fit$model)[2]
  yvar <- names(fit$model)[1]
  ggplot(fit$model, aes(x = .data[[xvar]], y = .data[[yvar]])) +
    geom_point() +
    stat_smooth(method = "lm", col = "red") +
    labs(x = xvar, y = yvar,
         title = paste("Adj R2 = ", signif(summary(fit)$adj.r.squared, 5),
                       "Intercept =", signif(coef(fit)[[1]], 5),
                       " Slope =", signif(coef(fit)[[2]], 5),
                       " P =", signif(coef(summary(fit))[2, 4], 5)))
}
Once you have loaded the function, you can simply call
ggplotRegression(fit)
where fit is a fitted lm object. You can also fit the model inline, e.g. ggplotRegression(lm(y ~ x + z + Q, data)); note that only the first predictor is plotted on the x axis.
Hope this helps.
If you want to fit other types of models, such as a dose-response curve using a logistic model, you also need to create more data points with predict() to get a smooth regression line:
# fit: your fitted logistic dose-response model
# Create a range of doses (starting at the smallest positive dose,
# because the plot uses log10(DOSE) and log10(0) is -Inf):
mm <- data.frame(DOSE = seq(min(data$DOSE[data$DOSE > 0]), max(data$DOSE), length.out = 100))
# Create a new data frame for ggplot using predict() and the new doses:
fit.ggplot <- data.frame(y = predict(fit, newdata = mm), x = mm$DOSE)
ggplot(data = data, aes(x = log10(DOSE), y = log(viability))) + geom_point() +
  geom_line(data = fit.ggplot, aes(x = log10(x), y = log(y)))
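Since fit and data are not defined above, here is a self-contained sketch of the same idea. Everything in it is assumed for illustration: the data are simulated, the model is a binomial glm() on log10 dose (your own dose-response model may well be a different function), and the column names alive and viability are hypothetical.

```r
library(ggplot2)

# Simulated data, for illustration only: 7 doses, 3 replicate wells each,
# 50 cells per well
set.seed(42)
dat <- data.frame(DOSE = rep(c(0.1, 0.3, 1, 3, 10, 30, 100), each = 3))
dat$alive <- rbinom(nrow(dat), size = 50,
                    prob = plogis(1 - 2 * log10(dat$DOSE)))
dat$viability <- dat$alive / 50

# Logistic dose-response model on the log10 dose scale
fit <- glm(cbind(alive, 50 - alive) ~ log10(DOSE),
           family = binomial, data = dat)

# Fine grid of 100 doses, evenly spaced on the log scale
grid <- data.frame(DOSE = 10^seq(log10(min(dat$DOSE)), log10(max(dat$DOSE)),
                                 length.out = 100))
grid$viability <- predict(fit, newdata = grid, type = "response")

ggplot(dat, aes(x = DOSE, y = viability)) +
  geom_point() +
  geom_line(data = grid) +  # smooth curve from the 100 predicted points
  scale_x_log10()
```

With only the seven observed doses the curve would be drawn as straight segments between them; the dense grid is what makes it look smooth.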
Another way to add a regression line with geom_line() is to get the fitted values from the broom package and plot those, as shown here:
https://cmdlinetips.com/2022/06/add-regression-line-to-scatterplot-ggplot2/
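In case the link goes away, a minimal sketch of the broom approach follows. It uses mtcars and a simple mpg ~ wt model as stand-ins, which are my assumptions rather than what the linked post necessarily uses.

```r
library(ggplot2)
library(broom)

fit <- lm(mpg ~ wt, data = mtcars)

# augment() returns the model data plus columns such as .fitted and .resid
fitted_df <- augment(fit)

ggplot(fitted_df, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_line(aes(y = .fitted), colour = "red")
```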

ggplot2 geom_smooth, extended model for method=lm

I would like to use geom_smooth to get a fitted line from a certain linear regression model.
It seems to me that the formula can only take x and y and not any additional parameter.
To show more clearly what I want:
library(dplyr)
library(ggplot2)
set.seed(35413)
df <- data.frame(pred = runif(100,10,100),
factor = sample(c("A","B"), 100, replace = TRUE)) %>%
mutate(
outcome = 100 + 10*pred +
ifelse(factor=="B", 200, 0) +
ifelse(factor=="B", 4, 0)*pred +
rnorm(100,0,60))
With
ggplot(df, aes(x=pred, y=outcome, color=factor)) +
geom_point(aes(color=factor)) +
geom_smooth(method = "lm") +
theme_bw()
I produce fitted lines that, due to the color=factor option, are basically the output of the linear model lm(outcome ~ pred*factor, df)
In some cases, however, I prefer the lines to be the output of a different model fit, like lm(outcome ~ pred + factor, df), for which I can use something like:
fit <- lm(outcome ~ pred+factor, df)
predval <- expand.grid(
pred = seq(
min(df$pred), max(df$pred), length.out = 1000),
factor = unique(df$factor)) %>%
mutate(outcome = predict(fit, newdata = .))
ggplot(df, aes(x=pred, y=outcome, color=factor)) +
geom_point() +
geom_line(data = predval) +
theme_bw()
which gives the plot I want, with parallel fitted lines for the two factor levels.
My question: is there a way to produce the latter graph using geom_smooth instead? I know there is a formula = option in geom_smooth, but I can't make something like formula = y ~ x + factor or formula = y ~ x + color (as I defined color = factor) work.
This is a very interesting question. Probably the main reason why geom_smooth is so "resistant" to allowing custom models of multiple variables is that it is limited to producing 2-D curves; consequently, its arguments are designed for handling two-dimensional data (i.e. formula = response variable ~ independent variable).
The trick to getting what you requested is using the mapping argument within geom_smooth, instead of formula. As you've probably seen from looking at the documentation, formula only allows you to specify the mathematical structure of the model (e.g. linear, quadratic, etc.). Conversely, the mapping argument allows you to directly specify new y-values - such as the output of a custom linear model that you can call using predict().
Note that, by default, inherit.aes is set to TRUE, so your plotted regressions will be coloured appropriately by your categorical variable. Here's the code:
# original plot
plot1 <- ggplot(df, aes(x=pred, y=outcome, color=factor)) +
geom_point(aes(color=factor)) +
geom_smooth(method = "lm") +
ggtitle("outcome ~ pred") +
theme_bw()
# declare new model here
plm <- lm(formula = outcome ~ pred + factor, data=df)
# plot with lm for outcome ~ pred + factor
plot2 <-ggplot(df, aes(x=pred, y=outcome, color=factor)) +
geom_point(aes(color=factor)) +
geom_smooth(method = "lm", mapping=aes(y=predict(plm,df))) +
ggtitle("outcome ~ pred + factor") +
theme_bw()
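One way to convince yourself that this trick really draws the additive model is to store the predictions in a column and compare slopes. A sketch, with two assumptions of mine: the column name fit is hypothetical, and se = FALSE is set because a confidence band computed on already-predicted values (which lie exactly on a line within each group) carries no information.

```r
library(ggplot2)

set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- 100 + 10 * df$pred +
  ifelse(df$factor == "B", 200, 0) +
  ifelse(df$factor == "B", 4, 0) * df$pred +
  rnorm(100, 0, 60)

# Additive model: one common slope, different intercepts
plm <- lm(outcome ~ pred + factor, data = df)
df$fit <- predict(plm, newdata = df)

# Slope of the smoothed predictions within each group
slopes <- sapply(split(df, df$factor),
                 function(d) coef(lm(fit ~ pred, data = d))[["pred"]])
slopes
coef(plm)[["pred"]]

ggplot(df, aes(x = pred, y = outcome, colour = factor)) +
  geom_point() +
  geom_smooth(aes(y = fit), method = "lm", formula = y ~ x, se = FALSE)
```

Both per-group slopes match the single pred coefficient of the additive model, so the two lines are parallel.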

Using modelr::add_predictions for glm

I am trying to calculate the logistic regression prediction for a set of data using the tidyverse and modelr packages. Clearly I am doing something wrong in the add_predictions as I am not receiving the "response" of the logistic function as I would if I were using the 'predict' function in stats. This should be simple, but I can't figure it out and multiple searches yielded little.
library(tidyverse)
library(modelr)
options(na.action = na.warn)
library(ISLR)
d <- as_tibble(ISLR::Default)
model <- glm(default ~ balance, data = d, family = binomial)
grid <- d %>% data_grid(balance) %>% add_predictions(model)
ggplot(d, aes(x=balance)) +
geom_point(aes(y = default)) +
geom_line(data = grid, aes(y = pred))
predict.glm's type parameter defaults to "link", which add_predictions does not change by default, nor provide you with any way to change to the almost-certainly desired "response". (A GitHub issue exists; add your nice reprex on it if you like.) That said, it's not hard to just use predict directly within the tidyverse via dplyr::mutate.
Also note that ggplot is coercing default (a factor) to numeric in order to plot the line, which is fine, except that "No" and "Yes" are replaced by 1 and 2, while the probabilities returned by predict will be between 0 and 1. Explicitly coercing to numeric and subtracting one fixes the plot, though an extra scale_y_continuous call is required to fix the labels.
library(tidyverse)
library(modelr)
d <- as_tibble(ISLR::Default)
model <- glm(default ~ balance, data = d, family = binomial)
grid <- d %>% data_grid(balance) %>%
mutate(pred = predict(model, newdata = ., type = 'response'))
ggplot(d, aes(x = balance)) +
geom_point(aes(y = as.numeric(default) - 1)) +
geom_line(data = grid, aes(y = pred)) +
scale_y_continuous('default', breaks = 0:1, labels = levels(d$default))
Also note that if all you want is a plot, geom_smooth can calculate predictions directly for you:
ggplot(d, aes(balance, as.numeric(default) - 1)) +
geom_point() +
geom_smooth(method = 'glm', method.args = list(family = 'binomial')) +
scale_y_continuous('default', breaks = 0:1, labels = levels(d$default))
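As far as I know, more recent modelr releases give add_predictions() a type argument that is passed on to predict(), so if your installed version has it you can stay within modelr. A sketch under that assumption, using mtcars (am ~ wt) instead of ISLR::Default so that it runs without extra data packages:

```r
library(modelr)

model <- glm(am ~ wt, data = mtcars, family = binomial)

# One row per unique value of wt
grid <- data_grid(mtcars, wt)

# type = "response" returns probabilities instead of log-odds
# (assumes a modelr version whose add_predictions() accepts type)
grid <- add_predictions(grid, model, type = "response")

range(grid$pred)
```

If your version rejects the type argument, the mutate() plus predict() approach above does the same thing.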
