Geom_smooth - linear regression through x-axis intercept - r

I would like to force a linear regression through a specific x-axis crossing point using "geom_smooth" in ggplot2:
geom_smooth(aes(x = x, y = y), method = "lm", formula = y ~ x)
Intuitively, choosing an x-axis intercept, one would use the formula y = a * (x - b) + c.
However, implementing this in the formula argument, e.g.:
geom_smooth(aes(x = x, y = y), method = "lm", formula = y ~ x - 5)
does not work.

I am not sure it is possible to do this just using geom_smooth. However, you could predict the regression outside of your ggplot2 call, using an offset to set the intercept required and plot it subsequently.
For example:
set.seed(1)
# Generate some data
x <- 1:10
y <- 3 + 2*x + rnorm(length(x), 0, 2)
# Simple regression
z_1 <- lm(y ~ x)
# Regression with no intercept
z_2 <- lm(y ~ x + 0)
# Regression with intercept at (0,3) - the 'true' intercept
z_3 <- lm(y ~ x + 0, offset=rep(3, length(x)))
# See the coefficients
coef(z_1)
#(Intercept) x
# 2.662353 2.109464
coef(z_2)
# x
#2.4898
coef(z_3)
# x
#1.775515
# Combine into one dataframe
df <- cbind.data.frame(x,y,predict(z_1),predict(z_2), predict(z_3))
# Plot the three regression lines
library(ggplot2)
ggplot(df) + geom_point(aes(x,y)) +
geom_line(aes(x,predict(z_1)), color = "red") +
geom_line(aes(x,predict(z_2)), color = "blue") +
geom_line(aes(x,predict(z_3)), color = "green") +
scale_x_continuous(limits = c(0,10)) +
scale_y_continuous(limits = c(0,30))
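As a quick check of what the offset does here (a minimal sketch reusing the simulated data from above; the names with_offset and shifted are only illustrative): fitting with a constant offset of 3 should give the same slope as subtracting 3 from y and fitting without an intercept.

```r
set.seed(1)
x <- 1:10
y <- 3 + 2 * x + rnorm(length(x), 0, 2)

# No-intercept fit with a constant offset of 3 (intercept fixed at (0, 3))
with_offset <- lm(y ~ x + 0, offset = rep(3, length(x)))

# Same model written by hand: shift the response, then fit through the origin
shifted <- lm(I(y - 3) ~ x + 0)

# The two slopes should agree
all.equal(unname(coef(with_offset)), unname(coef(shifted)))

# Fitted values of the offset model already include the offset
all.equal(unname(fitted(with_offset)), unname(fitted(shifted)) + 3)
```

Seen this way, the offset is just a fixed term added to the linear predictor, which is why it pins the intercept.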

You'll need to supply an offset to fix the intercept at a known value. It is passed via the method.args argument of geom_smooth, since not all smoothing methods accept an offset.
You'll also need to specify the orientation argument to indicate that the fixed intercept is on the x-axis rather than the y-axis.
I also specified the number of smoothing points to plot (n) and the offset repeats to match -- not sure if that's strictly necessary.
Some gymnastics to be sure, but hopefully it helps.
library("tidyverse")
mtcars %>%
ggplot(aes(disp, hp)) +
geom_point() +
geom_smooth(method = "lm",
orientation = "y",
formula = y ~ x + 0,
color= "blue",
se = FALSE,
n = nrow(mtcars),
method.args=list(offset=rep(100, nrow(mtcars))),
fullrange = TRUE) +
scale_x_continuous(limits =c(0, 600))
#> Warning: Removed 5 rows containing missing values (geom_smooth).
Created on 2020-07-08 by the reprex package (v0.3.0)
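A different way to pin the x-axis crossing point, sketched here under the assumption that ordinary least squares in the y direction is acceptable: write the line as y = a * (x - b) and wrap the shift in I(), so that the minus sign is treated as arithmetic rather than as formula syntax. The crossing point x0 = 100 is just the value used in the example above.

```r
x0 <- 100  # assumed x-axis crossing point (disp value at which hp should be 0)

# hp = a * (disp - x0), with no intercept term
fit <- lm(hp ~ I(disp - x0) + 0, data = mtcars)
coef(fit)

# The fitted line is zero at disp = x0 by construction
predict(fit, newdata = data.frame(disp = x0))
```

Inside ggplot2 the same idea would be geom_smooth(method = "lm", formula = y ~ I(x - 100) + 0). Note that this minimises vertical residuals (in hp), whereas the orientation = "y" fit above minimises residuals in disp, so the two lines will generally differ.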

Related

Plot binomial GAM in ggplot

I'm trying to visualize a dataset that uses a binomial response variable (proportions). I'm using a gam to examine the trend, but I'm having difficulty getting it to plot with ggplot. How do I get the smooth added to the plot?
Example:
set.seed(42)
df <- data.frame(y1 = sample.int(100),
y2 = sample.int(100),
x = runif(100, 0, 100))
ggplot(data = df,
aes(y = y1/(y1+y2), x = x)) +
geom_point(shape = 1) +
geom_smooth(method = "gam",
method.args = list(family = binomial),
formula = cbind(y1, y2) ~ s(x))
Warning message:
Computation failed in `stat_smooth()`
Caused by error in `cbind()`:
! object 'y1' not found
The formula in geom_smooth has to be in terms of x and y, representing the variables on your x and y axes, so you can't pass in y1 and y2.
The way round this is that rather than attempting to use the cbind type left-hand side of your gam, you can expand the counts into 1s and 0s so that there is only a single y variable. Although this makes for a little extra pre-processing, it allows you to draw your points just as easily using stat = 'summary' inside geom_point and makes your geom_smooth very straightforward:
library(tidyverse)
set.seed(42)
df <- data.frame(y1 = sample.int(100),
y2 = sample.int(100),
x = runif(100, 0, 100))
df %>%
rowwise() %>%
summarize(y = rep(c(1, 0), times = c(y1, y2)), x = x) %>%
ggplot(aes(x, y)) +
geom_point(stat = 'summary', fun = mean, shape = 1) +
geom_smooth(method = "gam",
method.args = list(family = binomial),
formula = y ~ s(x)) +
theme_classic()
Created on 2023-01-20 with reprex v2.0.2
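To see why expanding the counts is harmless, here is a small check using a plain binomial glm() instead of a gam (an assumption made to keep the sketch in base R): the aggregated cbind() fit and the expanded 0/1 fit should give the same coefficients. The names agg, long and bern are illustrative.

```r
set.seed(42)
df <- data.frame(y1 = sample.int(100),
                 y2 = sample.int(100),
                 x  = runif(100, 0, 100))

# Aggregated form: successes and failures per row
agg <- glm(cbind(y1, y2) ~ x, family = binomial, data = df)

# Expanded form: one row per trial, y1 ones followed by y2 zeros
long <- data.frame(
  y = unlist(Map(function(s, f) rep(c(1, 0), times = c(s, f)), df$y1, df$y2)),
  x = rep(df$x, times = df$y1 + df$y2)
)
bern <- glm(y ~ x, family = binomial, data = long)

# Largest absolute difference between the two sets of coefficients
max(abs(coef(agg) - coef(bern)))
```

The deviances of the two fits differ, but the estimated curve is the same, which is all that geom_smooth draws.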

How to fit non-linear function to data in ggplot2 using maximum likelihood model in R?

The data set (x.test, y.test) is an exponential fit. I'm trying to fit a custom non-linear function and attached is the code. The regular points plot just fine but I'm unable to get the fit line to work. Any suggestions?
x.test <- runif(50,2,8)
y.test <- 0.5^(x.test)
df <- data.frame(x.test, y.test)
library(ggpmisc)
my.formula <- y ~ lambda/ (1 + aii*x)
ggplot(data = df, aes(x=x.test,y=y.test)) +
geom_point(shape=21, fill="white", color="red", size=3) +
stat_smooth(method="nls",formula = y.test ~ lambda/ (1 + aii*x.test), method.args=list(start=c(lambda=1000,aii=-816.39)),se=F,color="red") +
geom_smooth(method="lm", formula = my.formula , col = "red") + stat_poly_eq(formula = my.formula, aes(label = stringr::str_wrap(paste(..eq.label.., ..rr.label.., sep = "~~~"))), parse = TRUE, size = 2.5, col = "red") + stat_function(fun=function (x.test){
y.test ~ lambda/ (1 + aii*x.test)}, color = "blue")
A few things:
you need to use y and x as the variable names in the formula argument to geom_smooth, regardless of what the names are in your data set
you need better starting values (see below)
there's a GLM trick you can use to fit this model; doesn't always work (can be numerically unstable), but it doesn't need starting values and will work more often than nls()
I don't think lm() and stat_poly_eq() are going to work as expected (or maybe at all) with a nonlinear formula ...
simulate data
(same as your code but using set.seed() - probably not important here but good practice)
set.seed(101)
x.test <- runif(50,2,8)
y.test <- 0.5^(x.test)
df <- data.frame(x.test, y.test)
attempt nls fit with your starting values
It's usually a good idea to troubleshoot by fitting any smoothing terms outside of ggplot2, so you have fewer layers to dig through to find the problems:
nls(y.test ~ lambda/(1+ aii*x.test),
start = list(lambda=1000,aii=-816.39),
data = df)
Error in nls(y.test ~ lambda/(1 + aii * x.test), start = list(lambda = 1000, :
singular gradient
OK, still doesn't work. Let's use glm() to get better starting values: we use an inverse-link GLM:
1/y = b0 + b1*x
y = 1/(b0 + b1*x)
= (1/b0)/(1 + (b1/b0)*x)
So:
g1 <- glm(y.test ~ x.test, family = gaussian(link = "inverse"))
s0 <- with(as.list(coef(g1)), list(lambda = 1/`(Intercept)`, aii = x.test/`(Intercept)`))
This gives lambda = -0.09, aii = -0.638 (with a little bit more work we could probably also figure out how to eyeball these by looking at the starting point and scale of the curve).
ggplot(data = df, aes(x=x.test,y=y.test)) +
geom_point(shape=21, fill="white", color="red", size=3) +
stat_smooth(method="nls",
formula = y ~ lambda/ (1 + aii*x),
method.args=list(start=s0),
se=FALSE,color="red") +
stat_smooth(method = "glm",
formula = y ~ x,
method.args = list(family = gaussian(link = "inverse")),
color = "blue", linetype = 2)
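The conversion from GLM coefficients to (lambda, aii) can be checked numerically. This is a minimal sketch using the same simulated data; mu_glm and mu_nls_form are illustrative names.

```r
set.seed(101)
x.test <- runif(50, 2, 8)
y.test <- 0.5^x.test

# Inverse-link Gaussian GLM: 1/mu = b0 + b1 * x
g1 <- glm(y.test ~ x.test, family = gaussian(link = "inverse"))
b <- unname(coef(g1))   # b[1] is b0 (intercept), b[2] is b1 (slope)

# Reparameterise: mu = lambda / (1 + aii * x)
lambda <- 1 / b[1]
aii <- b[2] / b[1]

mu_glm <- unname(fitted(g1))                 # fitted means from the GLM
mu_nls_form <- lambda / (1 + aii * x.test)   # same curve in the nls parameterisation

all.equal(mu_glm, mu_nls_form)
```

Because both parameterisations describe the same curve, the GLM estimates are natural starting values for nls().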

How can I add confidence intervals to a scatterplot for a regression on two variables?

I need to create an insightful graphic with a regression line, data points, and confidence intervals. I am not looking for smoothed lines. I have tried multiple codes, but I just can't get it right.
I am looking for something like this:
Some code I have tried:
p <- scatterplot(df.regsoft$w ~ df.regsoft$b,
data = df.regsoft,
boxplots = FALSE,
regLine = list(method=lm, col="red"),
pch = 16,
cex = 0.7,
xlab = "Fitted Values",
ylab = "Residuals",
legend = TRUE,
smooth = FALSE)
abline(coef = confint.lm(result.rs))
This doesn't produce what I want, although it is the closest I have come to what I intended. Notice that I took out "smooth", since this is not really what I am looking for.
How can I make this plot interactive?
If you don't mind switching to ggplot and the tidyverse, then this is simply a geom_smooth(method = "lm"):
library(tidyverse)
d <- tibble( #random stuff
x = rnorm(100, 0, 1),
y = 0.25 * x + rnorm(100, 0, 0.25)
)
m <- lm(y ~ x, data = d) #linear model
d %>%
ggplot() +
aes(x, y) + #what to plot
geom_point() +
geom_smooth(method = "lm") +
theme_bw()
Without method = "lm", it draws a smoothed line.
As for the Conf. interval (Obs 95%) lines, it seems to me that's simply a quantile regression. In that case, you can use the quantreg package.
If you want to make it interactive, you can use the plotly package:
library(plotly)
p <- d %>%
ggplot() +
aes(x, y) +
geom_point() +
geom_smooth(method = "lm") +
theme_bw()
ggplotly(p)
================================================
P.S.
I am not completely sure this is what the figure you posted is showing (I guess so), but to add the quantile lines, I would just perform two quantile regressions (upper and lower) and then calculate the values of the quantile lines for your data:
library(tidyverse)
library(quantreg)
d <- tibble( #random stuff
x = rnorm(100, 0, 1),
y = 0.25 * x + rnorm(100, 0, 0.25)
)
m <- lm(y ~ x, data = d) #linear model
# 95% quantile, two tailed
rq_low <- rq(y ~ x, data = d, tau = 0.025) #lower quantile
rq_high <- rq(y ~ x, data = d, tau = 0.975) #upper quantile
d %>%
mutate(low = rq_low$coefficients[1] + x * rq_low$coefficients[2],
high = rq_high$coefficients[1] + x * rq_high$coefficients[2]) %>%
ggplot() +
geom_point(aes(x, y)) +
geom_smooth(aes(x, y), method = "lm") +
geom_line(aes(x, low), linetype = "dashed") +
geom_line(aes(x, high), linetype = "dashed") +
theme_bw()
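As an alternative to quantile regression (a different technique, sketched here in case the outer lines in the target figure are prediction intervals rather than quantiles), predict() on an lm object can return both the confidence band for the mean and the wider prediction band for new observations. The names grid, conf_band and pred_band are illustrative.

```r
set.seed(1)
d <- data.frame(x = rnorm(100))
d$y <- 0.25 * d$x + rnorm(100, 0, 0.25)

m <- lm(y ~ x, data = d)

# Evaluate both bands on a regular grid of x values
grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 50))
conf_band <- predict(m, newdata = grid, interval = "confidence", level = 0.95)
pred_band <- predict(m, newdata = grid, interval = "prediction", level = 0.95)

# Each result is a matrix with columns fit, lwr and upr
head(conf_band)
head(pred_band)
```

The shaded band drawn by geom_smooth(method = "lm") corresponds to the confidence band; the prediction band can be added with geom_line() or geom_ribbon() after binding these columns to grid.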

Smooth interpolation of my data

I'm trying to create smooth lines on a plot which include the maxima of the data points. I've searched around a lot and have tinkered with loess() and ksmooth() but I'm yet to make it work.
My best attempt so far has been with ksmooth(), but the line doesn't pass through the maximum data point.
I'm a chemist, not a statistician, so the methods/descriptions of various smoothing techniques often go over my head. Any suggestions would be really appreciated.
Edit: Just wanted to make a few things clearer. Basically what I'm after is a smoothed version of the following plot with the line passing through the maximum y value.
To generate the plot in the first picture I used the following code:
plot(ChiM~Temp, xlim=c(2,6), ylim=c(0,0.225), lwd=2, pch=16, col='red',subset=(v=='20'), main='Out-of-Phase AC Susceptibility Plot', xlab='Temperature (K)', ylab=expression(chi[M]*'" (cm'^3*~'mol'^-1*')'))
setone <- subset(DSM32ac, v=='20') #v=20 is the subset of the data I have provided
attach(setone)
lines(ksmooth(Temp, ChiM, 'normal', bandwidth=0.5), col='red',lwd=2)
I hope this makes things a little clearer. If you need any more information to answer this question just let me know.
Edit 2: I've removed the data since I can't make a neat table. If it's really important I'll try and put it back in.
Try this:
y <- c(.07, .12, .17, .11, .04, .02, .01)
x <- seq_along(y)
s <- spline(x, y)
plot(y ~ x)
lines(s)
giving:
You can try this:
n <- 10 # generate 10 data points
d <- data.frame(x = 1:n, y = rnorm(n))
# with loess smoothing (span parameter controls the degree of smoothing)
library(ggplot2)
ggplot() + geom_point(data=d,aes(x,y), size=5) +
geom_smooth(data=d,aes(x,y, colour='span=0.5'), span=0.5, se=FALSE) +
geom_smooth(data=d,aes(x,y, colour='span=0.6'), span=0.6, se=FALSE) +
geom_smooth(data=d,aes(x,y, colour='span=0.7'), span=0.7, se=FALSE) +
geom_smooth(data=d,aes(x,y, colour='span=0.8'), span=0.8, se=FALSE)
# with B-spline curves using lm (the second argument of bs() is df, the degrees of freedom, which controls the smoothness)
ggplot() +
geom_point(data=d, aes(x, y), size=5) +
geom_smooth(data=d, aes(x, y,col='df=3'), method = "lm", formula = y ~ splines::bs(x, 3), se = FALSE) +
geom_smooth(data=d, aes(x, y,col='df=4'), method = "lm", formula = y ~ splines::bs(x, 4), se = FALSE) +
geom_smooth(data=d, aes(x, y,col='df=5'), method = "lm", formula = y ~ splines::bs(x, 5), se = FALSE) +
geom_smooth(data=d, aes(x, y,col='df=6'), method = "lm", formula = y ~ splines::bs(x, 6), se = FALSE) +
geom_smooth(data=d, aes(x, y,col='df=7'), method = "lm", formula = y ~ splines::bs(x, 7), se = FALSE)
# with smooth.spline (spar parameter controls smoothness of the fitted curve)
colors <- rainbow(100)
plot(d$x, d$y, pch=19, xlab='x', ylab='y')
i <- 1
for (spar in seq(0.001,1,length=100)) {
lines(smooth.spline(d$x, d$y, spar=spar, all.knots=TRUE)$y, col=colors[i])
i <- i + 1
}
points(d$x, d$y, pch=19)
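The difference between interpolating and smoothing is what decides whether the curve reaches the maximum. A minimal sketch with the small data set from the spline answer above (the bandwidth of 2 is an arbitrary choice for illustration): an interpolating spline reproduces every data point, while a kernel smoother averages neighbouring points and so falls short of the peak.

```r
y <- c(.07, .12, .17, .11, .04, .02, .01)
x <- seq_along(y)

# Interpolating cubic spline: passes through every (x, y) pair
f <- splinefun(x, y, method = "fmm")
all.equal(f(x), y)

# Kernel smoother evaluated at the same x values: a weighted average of
# neighbouring points, so its peak is lower than the observed maximum
k <- ksmooth(x, y, kernel = "normal", bandwidth = 2, x.points = x)
max(k$y) < max(y)

# Dense evaluation of the interpolant for plotting
xx <- seq(min(x), max(x), length.out = 200)
plot(x, y, pch = 16)
lines(xx, f(xx))
```

If the line must pass through the maximum data point, an interpolating method such as spline() or splinefun() is therefore the safer choice than loess() or ksmooth().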

Different behaviour lm in stat_smooth

In this question someone asked if it is possible to change the colour in a ggplot2 plot depending on a linear regression line.
The proposed solution worked: the points have a different colour above and below the line.
library(ggplot2)
set.seed(2015)
df <- data.frame(x = rnorm(100),
y = rnorm(100))
# Fit linear regression
l = lm(y ~ x, data = df)
# Make new group variable based on residuals
df$group = NA
df$group[which(l$residuals >= 0)] = "above"
df$group[which(l$residuals < 0)] = "below"
# Make the plot
ggplot(df, aes(x,y)) +
geom_point(aes(colour = group)) +
geom_smooth(method = "lm", formula = y ~ x)
But I would like to do the regression for y - 1, as asked in this question.
# Fit linear regression
l = lm(y - 1 ~ x, data = df)
# Make new group variable based on residuals
df$group = NA
df$group[which(l$residuals >= 0)] = "above"
df$group[which(l$residuals < 0)] = "below"
# Make the plot
ggplot(df, aes(x,y)) +
geom_point(aes(colour = group)) +
geom_smooth(method = "lm", formula = y - 1 ~ x)
This is not what I expected. It looks to me as though stat_smooth did what was expected. lm(), however, gives the same result for y ~ x and y - 1 ~ x.
What am I missing here?
If you want to color points based on where they lie relative to the line, you can try comparing the actual value to the predicted value rather than using the residual:
df$group = NA
df$group[df$y>predict(l)] = "above"
df$group[df$y<predict(l)] = "below"
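The reason the residual-based grouping does not change is that subtracting a constant from the response only shifts the intercept. A small sketch with the same simulated data (l0 and l1 are illustrative names) shows this and then groups the points against the shifted line:

```r
set.seed(2015)
df <- data.frame(x = rnorm(100),
                 y = rnorm(100))

l0 <- lm(y ~ x, data = df)
l1 <- lm(y - 1 ~ x, data = df)

# Residuals are identical, so residual-based groups cannot differ
all.equal(residuals(l0), residuals(l1))

# Only the intercept moves (by exactly the subtracted constant)
coef(l0) - coef(l1)

# predict(l1) is the line that geom_smooth(formula = y - 1 ~ x) draws,
# so compare the raw y values against it
df$group <- ifelse(df$y > predict(l1), "above", "below")
table(df$group)
```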
