Exponential fit with ggplot, showing regression line and R^2

I am trying to fit an exponential model to my data using ggplot2 and the plotly package. I also want to display the regression line and obtain an R^2 to check the model assumption.
This is my data
SR.irrig<-c(67.39368816,28.7369497,60.18499455,49.32404863,166.393182,222.2902192 ,271.8357323,241.7224707,368.4630364,220.2701789,169.9234274,56.49579274,38.183813,49.337,130.9175233,161.6353594,294.1473982,363.910286,358.3290509,239.8411217,129.6507822 ,32.76462234,30.13952285,52.8365588,67.35426966,132.2303449,366.8785687,247.4012487
,273.1931613,278.2790213,123.2425639,45.98362999,83.50199402,240.9945866
,308.6981358,228.3425602,220.5131914,83.97942185,58.32171185,57.93814837,94.64370151 ,264.7800652,274.258633,245.7294036,155.4177734,77.4523639,70.44223322,104.2283817 ,312.4232116,122.8083088,41.65770103,242.2266084,300.0714687,291.5990173,230.5447786,89.42497778,55.60525466,111.6426307,305.7643166,264.2719213,233.2821407,192.7560296,75.60802862,63.75376269)
temp.pred<-c(2.8,8.1,12.6,7.4,16.1,20.5,20.4,18.4,25.8,14.8,13,5.3,9.4,6.8,15.2,14.3,22.4,23.7,20.8,16.5,7.4,4.61,4.79,8.3,12.1,18.4,22,14.6,15.4,15.5,8.2,10.2,14.8,23.4,20.9,14.5,13,9,2,11.6,13,21,24.7,22.3,10.8,13.2,9.7,15.6,21,10.6,8.3,20.7,24.3,17.9,14.7,5.5,7.,11.7,22.3,17.8,15.5,14.8,2.1,7.3)
temp2 <- data.frame(SR.irrig,temp.pred)
This is my code:
gg1 <- ggplot(temp2, aes(x=temp.pred, y=SR.irrig)) +
geom_point() + #show points
stat_smooth(method = 'lm', aes(colour = 'linear'), se = FALSE) +
stat_smooth(method = 'lm', formula = y ~ poly(x,2), aes(colour = 'polynomial'), se= FALSE)+
stat_smooth(method = 'nls', formula = y ~ a*exp(b*x), aes(colour = 'Exponential'), se = FALSE, start = list(a=1,b=1))+
stat_smooth(method = 'nls', formula = y ~ a * log(x) +b, aes(colour = 'logarithmic'), se = FALSE, start = list(a=1,b=1))
I tried several different starting values, but nothing works for the exponential model.
As output I get the following graph, which includes all the models except the exponential one.
What am I missing that no exponential curve is displayed? And how can I check how good the exponential fit is?

You can try better initial values for nls, also taking into account what @RichardTelford suggested:
library(tidyverse)
#Data
SR.irrig<-c(67.39368816,28.7369497,60.18499455,49.32404863,166.393182,222.2902192 ,271.8357323,241.7224707,368.4630364,220.2701789,169.9234274,56.49579274,38.183813,49.337,130.9175233,161.6353594,294.1473982,363.910286,358.3290509,239.8411217,129.6507822 ,32.76462234,30.13952285,52.8365588,67.35426966,132.2303449,366.8785687,247.4012487
,273.1931613,278.2790213,123.2425639,45.98362999,83.50199402,240.9945866
,308.6981358,228.3425602,220.5131914,83.97942185,58.32171185,57.93814837,94.64370151 ,264.7800652,274.258633,245.7294036,155.4177734,77.4523639,70.44223322,104.2283817 ,312.4232116,122.8083088,41.65770103,242.2266084,300.0714687,291.5990173,230.5447786,89.42497778,55.60525466,111.6426307,305.7643166,264.2719213,233.2821407,192.7560296,75.60802862,63.75376269)
temp.pred<-c(2.8,8.1,12.6,7.4,16.1,20.5,20.4,18.4,25.8,14.8,13,5.3,9.4,6.8,15.2,14.3,22.4,23.7,20.8,16.5,7.4,4.61,4.79,8.3,12.1,18.4,22,14.6,15.4,15.5,8.2,10.2,14.8,23.4,20.9,14.5,13,9,2,11.6,13,21,24.7,22.3,10.8,13.2,9.7,15.6,21,10.6,8.3,20.7,24.3,17.9,14.7,5.5,7.,11.7,22.3,17.8,15.5,14.8,2.1,7.3)
temp2 <- data.frame(SR.irrig,temp.pred)
#Try with better initial vals
fm0 <- nls(log(SR.irrig) ~ log(a*exp(b*temp.pred)), temp2, start = c(a = 1, b = 1))
#Plot
gg1 <- ggplot(temp2, aes(x=temp.pred, y=SR.irrig)) +
geom_point() + #show points
stat_smooth(method = 'lm', aes(colour = 'linear'), se = FALSE) +
stat_smooth(method = 'lm', formula = y ~ poly(x,2), aes(colour = 'polynomial'), se= FALSE)+
stat_smooth(method = 'nls', formula = y ~ a*exp(b*x), aes(colour = 'Exponential'), se = FALSE,
method.args = list(start=coef(fm0)))+
stat_smooth(method = 'nls', formula = y ~ a * log(x) +b, aes(colour = 'logarithmic'), se = FALSE,
method.args = list(start = list(a=1,b=1)))
#Display
gg1

You can do this within ggplot without needing to get the nls model first (though the end result is the same). You need to decrease the minFactor and increase the maximum iterations of the nls control to get the model to converge, but the results seem reasonable. Note how the arguments are passed from stat_smooth to nls.
ggplot(temp2, aes(x=temp.pred, y=SR.irrig)) +
geom_point() +
stat_smooth(method = 'lm',
formula = y ~ x,
mapping = aes(colour = 'linear'),
se = FALSE) +
stat_smooth(method = 'lm',
formula = y ~ poly(x,2),
mapping = aes(colour = 'polynomial'),
se= FALSE)+
stat_smooth(method = 'nls',
formula = y ~ a*exp(b*x),
mapping = aes(colour = 'Exponential'),
se = FALSE,
method.args = list(start = list(a = 1, b = 1),
control = list(minFactor = 1/ 8192,
maxiter = 100))) +
stat_smooth(method = 'nls',
formula = y ~ a * log(x) +b,
mapping = aes(colour = 'logarithmic'),
se = FALSE,
method.args = list(start = list(a=1,b=1)))
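The question also asks how to check how good the exponential fit is, which the plots alone do not show. One possible approach, sketched here under assumptions, is to fit the same model with nls() outside ggplot and compute a pseudo-R^2 as 1 - RSS/TSS. The data below are simulated stand-ins (not the measurements from the question), and the names lin, start_vals, fit and pseudo_r2 are illustrative. Treat the pseudo-R^2 of a nonlinear model as a descriptive number only; comparing AIC values or looking at residual plots is usually more informative.

```r
# Simulated stand-in data (hypothetical, NOT the values from the question)
set.seed(42)
temp.pred <- seq(2, 26, length.out = 64)
SR.irrig  <- 30 * exp(0.1 * temp.pred) * exp(rnorm(64, sd = 0.15))
temp2 <- data.frame(SR.irrig, temp.pred)

# Starting values from a log-linear fit: log(y) = log(a) + b * x
lin <- lm(log(SR.irrig) ~ temp.pred, data = temp2)
start_vals <- list(a = unname(exp(coef(lin)[1])),
                   b = unname(coef(lin)[2]))

# Exponential model fitted on the original scale
fit <- nls(SR.irrig ~ a * exp(b * temp.pred), data = temp2, start = start_vals)

# Pseudo-R^2: 1 - residual sum of squares / total sum of squares
rss <- sum(residuals(fit)^2)
tss <- sum((temp2$SR.irrig - mean(temp2$SR.irrig))^2)
pseudo_r2 <- 1 - rss / tss

# AIC lets you compare the exponential fit with, e.g., the linear one
aic_exp <- AIC(fit)
aic_lin <- AIC(lm(SR.irrig ~ temp.pred, data = temp2))

pseudo_r2
c(exponential = aic_exp, linear = aic_lin)
```

The same start_vals list can be passed to stat_smooth through method.args = list(start = start_vals), which avoids having to loosen the nls control settings.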

Related

How to pass multiple formulae in geom_smooth (facet_wrap) in R ggplot2?

I want to fit three different functions for each of these factors (var.test). I tried the following method, but I get the warning "Computation failed in stat_smooth(): invalid formula". Is there another way to pass multiple formulae at once?
set.seed(14)
df <- data.frame(
var.test = c("T","T","T","T","M","M","M","M","A","A","A","A"),
val.test = rnorm(12,4,5),
x = c(1:12)
)
my.formula <- c(y~x + I(x^2), y~x, y~x + I(x^2))
ggplot(df, aes(x = x, y = val.test)) + geom_point() +
geom_smooth(method="glm", formula = my.formula,
method.args = list(family = "poisson"), color = "black" ) + facet_grid(.~var.test)
You can only have one formula per geom_smooth(). You'll need to add three different geom_smooth layers. You can do that manually
ggplot(df, aes(x = x, y = val.test)) +
geom_point() +
geom_smooth(method="glm", formula = my.formula[[1]], method.args = list(family = "poisson"), color = "black" ) +
geom_smooth(method="glm", formula = my.formula[[2]], method.args = list(family = "poisson"), color = "black" ) +
geom_smooth(method="glm", formula = my.formula[[3]], method.args = list(family = "poisson"), color = "black" ) +
facet_grid(.~var.test)
Or you can use lapply to help
ggplot(df, aes(x = x, y = val.test)) +
geom_point() +
lapply(my.formula, function(x) geom_smooth(method="glm", formula = x,
method.args = list(family = "poisson"), color = "black" )) +
facet_grid(.~var.test)
If you want different lines per panel, then you can filter your data for each panel. Here we use a mapply helper and subset the data for each line.
ggplot(df, aes(x = x, y = val.test)) +
geom_point() +
mapply(function(x, z) geom_smooth(method="glm", data=function(d) subset(d, var.test==z), formula = x,
method.args = list(family = "poisson"), color = "black" ),
my.formula, c("A","M","T")) +
facet_grid(.~var.test)

Does the formula argument of geom_smooth mirror what's in aes()?

I have a ggplot for a logarithmic relationship between variable growth_rate and tenure:
pdata %>%
ggplot(aes(x = log(TENURE), y = GROWTH_RATE)) +
geom_point(color = 'gray', alpha = 0.3) +
geom_smooth(method = 'lm', formula = 'y ~ x')
But the geom_smooth appears to fit better with:
pdata %>%
ggplot(aes(x = log(TENURE), y = GROWTH_RATE)) +
geom_point(color = 'gray', alpha = 0.3) +
geom_smooth(method = 'lm', formula = 'y ~ log(x)')
Which plot is correct? Which plot shows a smooth fit line based on a linear model with formula y ~ log(TENURE)?
It looks like your underlying growth rate varies with the log of the log of tenure. Here's some sample data with that "log of log" relationship:
tibble(TENURE = runif(1E4, min = 7, max = 1000),
GROWTH_RATE = rnorm(1E4, mean = 1, sd = 0.1) * log(log(TENURE))) %>%
ggplot(aes(log(TENURE), GROWTH_RATE)) +
geom_point(alpha = 0.3, color = "gray50") +
geom_smooth(method = 'lm', formula = 'y ~ x')
Plotting growth against the log results in a loose fit like your first one. Note that the lm is using the transformed values from your x and y mapping, so we can see that it is using log(TENURE) for x. (See bottom for a confirmation of that.)
But modeling against the log of the log of tenure is a better fit. Here, when we use y ~ log(x), it means y ~ log( [log(TENURE)] ) since x is globally mapped in ggplot(aes(...)) to relate to the log of TENURE.
... + geom_smooth(method = 'lm', formula = 'y ~ log(x)')
If instead the original relationship had been a good fit for y ~ log(x), like the different generated data here, your first lm would have matched better:
tibble(TENURE = runif(1E4, min = 7, max = 1000),
GROWTH_RATE = rnorm(1E4, mean = 1, sd = 0.1) * log(TENURE)) %>%
ggplot(aes(log(TENURE), GROWTH_RATE)) +
geom_point(alpha = 0.3, color = "gray50") +
geom_smooth(method = 'lm', formula = 'y ~ x')
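As a rough confirmation that x in the formula means the mapped value log(TENURE), you can compare the smoother's output with a model fitted by hand. This is a minimal sketch on simulated data; the names d, log_tenure, smooth and manual are illustrative, and layer_data() is used to pull the values that geom_smooth computed.

```r
library(ggplot2)

# Simulated data in the same spirit as above (hypothetical values)
set.seed(1)
d <- data.frame(TENURE = runif(1000, min = 7, max = 1000))
d$GROWTH_RATE <- rnorm(1000, mean = 1, sd = 0.1) * log(d$TENURE)
d$log_tenure  <- log(d$TENURE)

p <- ggplot(d, aes(log(TENURE), GROWTH_RATE)) +
  geom_smooth(method = "lm", formula = y ~ x)

# Values computed by the smooth layer (x is on the log(TENURE) scale)
smooth <- layer_data(p, 1)

# The same model fitted by hand on the transformed predictor
manual <- lm(GROWTH_RATE ~ log_tenure, data = d)
manual_pred <- predict(manual, newdata = data.frame(log_tenure = smooth$x))

# TRUE if the layer used log(TENURE) as x
all.equal(smooth$y, as.numeric(manual_pred))
```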

How to use models in a list when plotting stat_smooth function

I'm trying to plot multiple models within a list. However while plotting I'm unable to change the formula to standard y ~ x notation so I get an error. This would be well explained through an example. How do I use the models for plotting?
xvar=1:100
yvar=(1:100+(1:100)^2)
df=data.frame(xvar,yvar)
## this works fine
ggplot(df, aes(x=xvar, y=yvar)) + geom_point(size = 1) + geom_smooth(data = df, method = "lm", aes(x=xvar,y=yvar), formula = as.formula(y ~ x), size = 1, se = FALSE, colour = "yellow")
models=list(
lm(yvar~xvar, data = df),
lm(yvar~I(xvar^2), data = df)
)
ggplot(df, aes(x = xvar, y = yvar)) + geom_point(size = 1) + geom_smooth(data = df, method = "lm", aes(x=xvar,y=yvar), formula = as.formula(models[[1]]), size = 1, se = FALSE, colour = "yellow")
Warning messages:
1: 'newdata' had 80 rows but variables found have 100 rows
2: Computation failed in `stat_smooth()`:
arguments imply differing number of rows: 80, 100
Does this give you what you need?
library(ggplot2)
xvar=1:100
yvar=(1:100+(1:100)^2)
df=data.frame(xvar,yvar)
## this works fine
ggplot(df, aes(x=xvar, y=yvar)) + geom_point(size = 1) + geom_smooth(data = df, method = "lm", aes(x=xvar,y=yvar), formula = as.formula(y ~ x), size = 1, se = FALSE, colour = "yellow")
#name the models
models=list(
m1 = lm(yvar~xvar, data = df),
m2 = lm(yvar~I(xvar^2), data = df)
)
#use the name of the first model and then a formula
ggplot(df, aes(x = xvar, y = yvar)) + geom_point(size = 1) + geom_smooth(data = models[[c("m1","model")]], method = "lm", aes(x=xvar,y=yvar), formula = as.formula(y ~ x), size = 1, se = FALSE, colour = "yellow")
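The warning in the question arises because the formula taken from the model refers to yvar and xvar, while geom_smooth evaluates its formula on columns named x and y. An alternative that uses the stored models directly, sketched here as one possible approach rather than the only one, is to compute predictions from each model and draw them with geom_line. The names pred, fit and model are illustrative.

```r
library(ggplot2)

xvar <- 1:100
yvar <- xvar + xvar^2
df <- data.frame(xvar, yvar)

models <- list(
  m1 = lm(yvar ~ xvar, data = df),
  m2 = lm(yvar ~ I(xvar^2), data = df)
)

# One row per observation and model, holding that model's fitted value
pred <- do.call(rbind, lapply(names(models), function(nm) {
  data.frame(xvar  = df$xvar,
             fit   = predict(models[[nm]], newdata = df),
             model = nm)
}))

# Points from the data, one coloured line per stored model
p <- ggplot(df, aes(x = xvar, y = yvar)) +
  geom_point(size = 1) +
  geom_line(data = pred, aes(y = fit, colour = model))
p
```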

How to apply geom_smooth() for every group?

How can I apply geom_smooth() to every group?
The code below uses facet_wrap(), so it plots every group in a separate panel.
I would like to combine the panels into a single graph.
ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length)) +
geom_point(aes(color = Species)) +
geom_smooth(method = "nls", formula = y ~ a * x + b, se = F,
method.args = list(start = list(a = 0.1, b = 0.1))) +
facet_wrap(~ Species)
You have to put all your variables in ggplot aes():
ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
geom_point() +
geom_smooth(method = "nls", formula = y ~ a * x + b, se = F,
method.args = list(start = list(a = 0.1, b = 0.1)))
Adding a mapping aes(group=Species) to the geom_smooth() call will do what you want.
Basic plot:
library(ggplot2); theme_set(theme_bw())
g0 <- ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length)) +
geom_point(aes(color = Species))
geom_smooth:
g0 + geom_smooth(aes(group=Species),
method = "nls", formula = y ~ a * x + b, se = FALSE,
method.args = list(start = list(a = 0.1, b = 0.1)))
formula is always expressed in terms of x and y, no matter what variables are called in the original data set:
the x variable in the formula refers to the variable that is mapped to the x-axis (Sepal.Length)
the y variable to the y-axis variable (Petal.Length)
The model is fitted separately to groups in the data (Species).
If you add a colour mapping (for a factor variable) that will have the same effect (groups are implicitly defined according to the intersection of all the mappings used to distinguish geoms), plus the lines will be appropriately coloured.
g0 + geom_smooth(aes(colour=Species),
method = "nls", formula = y ~ a * x + b, se = FALSE,
method.args = list(start = list(a = 0.1, b = 0.1)))
As @HubertL points out, if you want to apply the same aesthetics to all of your geoms, you can just put them in the original ggplot call ...
By the way, I assume that in reality you want to use a more complex nls model - otherwise you could just use geom_smooth(...,method="lm") and save yourself trouble ...
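For the straight-line case, a minimal sketch of that simpler version could look like this (the object names p and fits are illustrative; layer_data() is only used to check that one line per species was fitted):

```r
library(ggplot2)

# Colour in the global aes() defines the groups for both layers
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# One fitted line per group
fits <- layer_data(p, 2)
length(unique(fits$group))
```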

How to add legend to geom_smooth in ggplot in R

I have a problem adding a legend for different smooths in ggplot.
library(splines)
library(ggplot2)
temp <- data.frame(x = rnorm(200, 20, 15), y = rnorm(200, 30, 8))
ggplot(data = temp, aes(x, y)) + geom_point() +
geom_smooth(method = 'lm', formula = y ~ bs(x, df=5, intercept = T), col='blue') +
geom_smooth(method = 'lm', formula = y ~ ns(x, df=2, intercept = T), col='red')
I have two splines: red and blue. How can I add a legend for them?
Put the colour in aes() and add scale_colour_manual():
ggplot(data = temp, aes(x, y)) + geom_point() +
geom_smooth(method = 'lm', formula = y ~ bs(x, df=5, intercept = T), aes(colour="A")) +
geom_smooth(method = 'lm', formula = y ~ ns(x, df=2, intercept = T), aes(colour="B")) +
scale_colour_manual(name="legend", values=c("blue", "red"))
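A small variation, offered as a sketch: if values is an unnamed vector, colours are assigned in the order of the legend entries, so naming the vector makes the label-to-colour pairing explicit. The labels "B-spline" and "Natural spline" are illustrative, and intercept is left at its default here to keep the lm fit full rank.

```r
library(splines)
library(ggplot2)

set.seed(1)
temp <- data.frame(x = rnorm(200, 20, 15), y = rnorm(200, 30, 8))

p <- ggplot(data = temp, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ bs(x, df = 5),
              aes(colour = "B-spline")) +
  geom_smooth(method = "lm", formula = y ~ ns(x, df = 2),
              aes(colour = "Natural spline")) +
  # Named values tie each legend label to a specific colour
  scale_colour_manual(name = "legend",
                      values = c("B-spline" = "blue",
                                 "Natural spline" = "red"))
p
```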
