How to create prediction line for Quadratic Model - r

I am trying to create a quadratic prediction line for a quadratic model. I am using the Auto dataset that comes with R. I had no trouble creating the prediction line for a linear model. However, the quadratic model yields crazy looking lines. Here is my code.
# Linear Model
plot(Auto$horsepower, Auto$mpg,
main = "MPG versus Horsepower",
pch = 20)
lin_mod = lm(mpg ~ horsepower,
data = Auto)
lin_pred = predict(lin_mod)
lines(
Auto$horsepower, lin_pred,
col = "blue", lwd = 2
)
# The Quadratic model
Auto$horsepower2 = Auto$horsepower^2
quad_model = lm(mpg ~ horsepower2,
data = Auto)
quad_pred = predict(quad_model)
lines(
Auto$horsepower,
quad_pred,
col = "red", lwd = 2
)
I am 99% sure that the issue is the prediction function. Why can't I produce a neat-looking quadratic prediction curve? The following code I tried does not work either. Could it be related?
quad_pred = predict(quad_model, data.frame(horsepower = Auto$horsepower))
Thanks!

The issue is that the x-axis values aren't sorted. lines() connects the points in the order they are given, so this goes unnoticed for a linear model (every point lies on the same straight line) but is obvious for a polynomial. I created a new sorted data set and it works fine:
library(ISLR) # To load data Auto
# Linear Model
plot(Auto$horsepower, Auto$mpg,
main = "MPG versus Horsepower",
pch = 20)
lin_mod = lm(mpg ~ horsepower,
data = Auto)
lin_pred = predict(lin_mod)
lines(
Auto$horsepower, lin_pred,
col = "blue", lwd = 2
)
# The Quadratic model
Auto$horsepower2 = Auto$horsepower^2
# Sorting Auto by horsepower2
Auto2 <- Auto[order(Auto$horsepower2), ]
quad_model = lm(mpg ~ horsepower2,
data = Auto2)
quad_pred = predict(quad_model)
lines(
Auto2$horsepower,
quad_pred,
col = "red", lwd = 2
)
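As a minimal sketch of the same idea, the fitted values can be reordered with order() instead of refitting on a sorted copy. The built-in mtcars data is swapped in here so the snippet runs without ISLR, and the names dat, hp2 and ord are illustrative. It also suggests why the predict() call in the question likely fails: newdata has to contain a column named exactly like the predictor in the formula (hp2 here, horsepower2 in the question). Note that this model, like the one in the question, contains only the squared term; a full quadratic would also include the linear term.

```r
# Sketch on mtcars (assumption: stands in for the Auto data)
dat <- mtcars
dat$hp2 <- dat$hp^2
quad_model <- lm(mpg ~ hp2, data = dat)  # squared term only, as in the question

# lines() joins points in the order given, so reorder by x first
ord <- order(dat$hp)
plot(dat$hp, dat$mpg, pch = 20)
lines(dat$hp[ord], fitted(quad_model)[ord], col = "red", lwd = 2)

# newdata needs the column the model was fitted on (hp2, not hp)
quad_pred <- predict(quad_model, newdata = data.frame(hp2 = dat$hp^2))
```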

One option is to create the sequence of x-values for which you would like to plot the fitted lines. This can be useful if your data has a "gap" or if you wish to plot the fitted lines outside of the range of the x-variables.
# load dataset; if necessary run install.packages("ISLR")
data(Auto, package = "ISLR")
# since only 2 variables at issue, use short names
mpg <- Auto$mpg
hp <- Auto$horsepower
# fit linear and quadratic models
lmod <- lm(mpg ~ hp)
qmod <- lm(mpg ~ hp + I(hp^2))
# plot the data
plot(x=hp, y=mpg, pch=20)
# use predict() to find coordinates of points to plot
x_coords <- seq(from=floor(min(hp)), to=ceiling(max(hp)), by=1)
y_coords_lmod <- predict(lmod, newdata=data.frame(hp=x_coords))
y_coords_qmod <- predict(qmod, newdata=data.frame(hp=x_coords))
# alternatively, calculate this manually using the fitted coefficients
y_coords_lmod <- coef(lmod)[1] + coef(lmod)[2]*x_coords
y_coords_qmod <- coef(qmod)[1] + coef(qmod)[2]*x_coords + coef(qmod)[3]*x_coords^2
# add the fitted lines to the plot
points(x=x_coords, y=y_coords_lmod, type="l", col="blue")
points(x=x_coords, y=y_coords_qmod, type="l", col="red")

Alternatively, using ggplot2:
library(ggplot2)
library(ISLR) # provides the Auto data
ggplot(Auto, aes(x = horsepower, y = mpg)) + geom_point() +
stat_smooth(method = "lm", formula = y ~ x, colour = "red") +
stat_smooth(method = "lm", formula = y ~ poly(x, 2), colour = "blue")
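The ggplot2 version uses poly(x, 2) where the base R answer uses hp + I(hp^2). These are two parameterisations of the same quadratic, so the fitted curve should be identical; only the coefficients differ. A quick check, again on the built-in mtcars data as a stand-in (fit_raw, fit_orth and same_fit are illustrative names), might look like this:

```r
# Raw and orthogonal polynomial terms span the same model space
fit_raw  <- lm(mpg ~ hp + I(hp^2), data = mtcars)
fit_orth <- lm(mpg ~ poly(hp, 2), data = mtcars)

# Compare fitted values up to numerical tolerance
same_fit <- isTRUE(all.equal(unname(fitted(fit_raw)), unname(fitted(fit_orth))))
same_fit
```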

Related

Plot output of non-linear model output in ggplot2

I have some data where the best fitting non-linear regression is the S curve model. I want to plot the S curve in ggplot2 but do not know how to specify this model. I assume I should use the following code but do not know how to specify the method or formula. Can anyone help?
geom_smooth(method = XXX,
method.args = list(formula = XXX))
You can wrap a prediction in geom_function(). Example with a built-in dataset below:
library(ggplot2)
# From the ?nls examples
df <- subset(DNase, Run == 1)
fit <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), df)
ggplot(df, aes(conc, density)) +
geom_point() +
geom_function(
fun = function(x) {
predict(fit, newdata = data.frame(conc = x))
},
colour = "red"
) +
scale_x_continuous(trans = "log10")

How to create a 2nd order trendline in R [duplicate]

I have a simple polynomial regression which I do as follows
attach(mtcars)
fit <- lm(mpg ~ hp + I(hp^2))
Now, I plot as follows
> plot(mpg~hp)
> points(hp, fitted(fit), col='red', pch=20)
This gives me the following
I want to connect these points into a smooth curve, but using lines() gives me the following
> lines(hp, fitted(fit), col='red', type='b')
What am I missing here? I want the output to be a smooth curve that connects the points.
I like to use ggplot2 for this because it's usually very intuitive to add layers of data.
library(ggplot2)
fit <- lm(mpg ~ hp + I(hp^2), data = mtcars)
prd <- data.frame(hp = seq(from = range(mtcars$hp)[1], to = range(mtcars$hp)[2], length.out = 100))
err <- predict(fit, newdata = prd, se.fit = TRUE)
prd$lci <- err$fit - 1.96 * err$se.fit
prd$fit <- err$fit
prd$uci <- err$fit + 1.96 * err$se.fit
ggplot(prd, aes(x = hp, y = fit)) +
theme_bw() +
geom_line() +
geom_smooth(aes(ymin = lci, ymax = uci), stat = "identity") +
geom_point(data = mtcars, aes(x = hp, y = mpg))
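The 1.96 multiplier above is the normal approximation. For lm fits, predict() can also return the band directly through interval = "confidence", which uses the t distribution instead. A hedged sketch, with prd2 and ci as illustrative names:

```r
fit <- lm(mpg ~ hp + I(hp^2), data = mtcars)
prd2 <- data.frame(hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 100))

# Returns a matrix with columns fit, lwr and upr
ci <- predict(fit, newdata = prd2, interval = "confidence", level = 0.95)
prd2 <- cbind(prd2, ci)
head(prd2)
```

The lwr and upr columns can then be mapped to ymin and ymax in the same way as lci and uci.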
Try:
lines(sort(hp), fitted(fit)[order(hp)], col='red', type='b')
The rows in your dataset are not ordered by hp, and lines() joins points in the order they are given, so the result is a mess.
Generally a good way to go is to use the predict() function. Pick some x values, use predict() to generate corresponding y values, and plot them. It can look something like this:
newdat = data.frame(hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 100))
newdat$pred = predict(fit, newdata = newdat)
plot(mpg ~ hp, data = mtcars)
with(newdat, lines(x = hp, y = pred))
See Roman's answer for a fancier version of this method, where confidence intervals are calculated too. In both cases the actual plotting of the solution is incidental: you can use base graphics or ggplot2 or anything else you'd like. The key is to use the predict function to generate the proper y values. It's a good method because it extends to all sorts of fits, not just polynomial linear models. You can use it with non-linear models, GLMs, smoothing splines, etc.; anything with a predict method.
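To illustrate that the same grid-and-predict pattern carries over to other model classes, here is a sketch with a logistic GLM on mtcars. The model (am ~ wt) and the name gfit are chosen only for illustration, not because this is a sensible analysis.

```r
# Logistic regression: probability of a manual gearbox as a function of weight
gfit <- glm(am ~ wt, family = binomial, data = mtcars)

# Same pattern as before: evenly spaced x values, then predict()
newdat <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 100))
newdat$pred <- predict(gfit, newdata = newdat, type = "response")  # probability scale

plot(am ~ wt, data = mtcars)
with(newdat, lines(x = wt, y = pred))
```

Without type = "response", predict() on a glm returns values on the link (log-odds) scale, which would not line up with the 0/1 observations on the plot.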

Having several fits in one plot (in R)

I was wondering how I can modify the following code to have a plot something like
data(airquality)
library(quantreg)
library(ggplot2)
library(data.table)
library(devtools)
# source Quantile LOESS
source("https://www.r-statistics.com/wp-content/uploads/2010/04/Quantile.loess_.r.txt")
airquality2 <- na.omit(airquality[ , c(1, 4)])
#'' quantreg::rq
rq_fit <- rq(Ozone ~ Temp, 0.95, airquality2)
rq_fit_df <- data.table(t(coef(rq_fit)))
names(rq_fit_df) <- c("intercept", "slope")
#'' quantreg::lprq
lprq_fit <- lapply(1:3, function(bw){
fit <- lprq(airquality2$Temp, airquality2$Ozone, h = bw, tau = 0.95)
return(data.table(x = fit$xx, y = fit$fv, bw = paste0("bw=", bw), fit = "quantreg::lprq"))
})
#'' Quantile LOESS
ql_fit <- Quantile.loess(airquality2$Ozone, jitter(airquality2$Temp), window.size = 10,
the.quant = .95, window.alignment = c("center"))
ql_fit_df <- data.table(x = ql_fit$x, y = ql_fit$y.loess, bw = "bw=1", fit = "Quantile LOESS")
I want to have all these fits in a plot.
geom_quantile can calculate quantiles using the rq method internally, so we don't need to create the rq_fit_df separately. However, the lprq and Quantile LOESS methods aren't available within geom_quantile, so I've used the data frames you provided and plotted them using geom_line.
In addition, to include the rq line in the color and linetype mappings and in the legend we add aes(colour="rq", linetype="rq") as a sort of "artificial" mapping inside geom_quantile.
library(dplyr) # For bind_rows()
ggplot(airquality2, aes(Temp, Ozone)) +
geom_point() +
geom_quantile(quantiles=0.95, formula=y ~ x, aes(colour="rq", linetype="rq")) +
geom_line(data=bind_rows(lprq_fit, ql_fit_df),
aes(x, y, colour=paste0(gsub("q.*:","",fit),": ", bw),
linetype=paste0(gsub("q.*:","",fit),": ", bw))) +
theme_bw() +
scale_linetype_manual(values=c(2,4,5,1,1)) +
labs(colour="Method", linetype="Method",
title="Different methods of estimating the 95th percentile by quantile regression")

Plot the observed and fitted values from a linear regression using xyplot() from the lattice package

I can create simple graphs. I would like to have observed and predicted values (from a linear regression) on the same graph. I am plotting say Yvariable vs Xvariable. There is only 1 predictor and only 1 response. How could I also add linear regression curve to the same graph?
So to conclude need help with:
plotting actuals and predicted both
plotting regression line
Here is one option for showing the observed and predicted values in a single plot as points. It is easier to get the regression line on the observed points, which I illustrate second.
First some dummy data
set.seed(1)
x <- runif(50)
y <- 2.5 + (3 * x) + rnorm(50, mean = 2.5, sd = 2)
dat <- data.frame(x = x, y = y)
Fit our model
mod <- lm(y ~ x, data = dat)
Combine the model output and observed x into a single object for plotting
res <- stack(data.frame(Observed = dat$y, Predicted = fitted(mod)))
res <- cbind(res, x = rep(dat$x, 2))
head(res)
Load lattice and plot
require("lattice")
xyplot(values ~ x, data = res, group = ind, auto.key = TRUE)
The resulting plot should look similar to this
If you just want the regression line on the observed data, and the regression model is a simple straight-line model like the one I show, then you can circumvent most of this and just plot using
xyplot(y ~ x, data = dat, type = c("p","r"), col.line = "red")
(i.e. you don't even need to fit the model or make new data for plotting)
The resulting plot should look like this
An alternative to the first example which can be used with anything that will give coefficients for the regression line is to write your own panel functions - not as scary as it seems
xyplot(y ~ x, data = dat, col.line = "red",
panel = function(x, y, ...) {
panel.xyplot(x, y, ...)
panel.abline(coef = coef(mod), ...) ## using mod from earlier
}
)
That reproduces the second plot above, but by hand.
Assuming you've fitted the model with caret, then
library(caret)
mod <- train(y ~ x, data = dat, method = "lm",
trControl = trainControl(method = "cv"))
xyplot(y ~ x, data = dat, col.line = "red",
panel = function(x, y, ...) {
panel.xyplot(x, y, ...)
panel.abline(coef = coef(mod$finalModel), ...) ## using mod from caret
}
)
This produces the same plot as the second one above.
Another option is to use panel.lmlineq from latticeExtra.
library(latticeExtra)
set.seed(0)
xsim <- rnorm(50, mean = 3)
ysim <- (0 + 2 * xsim) * (1 + rnorm(50, sd = 0.3))
## basic use as a panel function
xyplot(ysim ~ xsim, panel = function(x, y, ...) {
panel.xyplot(x, y, ...)
panel.lmlineq(x, y, adj = c(1,0), lty = 1, col.text = 'red',
col.line = "blue", digits = 1,r.squared =TRUE)
})
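The custom panel function approach is not limited to straight lines. As a rough sketch, a quadratic fit can be drawn by predicting on a sorted grid and passing the result to panel.lines(). The simulated data and the names qmod, grid and p are assumptions made for this example only.

```r
library(lattice)

# Simulated data with a curved trend (illustrative only)
set.seed(1)
dat <- data.frame(x = runif(50))
dat$y <- 1 + 2 * dat$x - 3 * dat$x^2 + rnorm(50, sd = 0.2)

qmod <- lm(y ~ x + I(x^2), data = dat)

# Evenly spaced, already sorted x values for a smooth curve
grid <- data.frame(x = seq(min(dat$x), max(dat$x), length.out = 100))
grid$y <- predict(qmod, newdata = grid)

p <- xyplot(y ~ x, data = dat,
  panel = function(x, y, ...) {
    panel.xyplot(x, y, ...)                   # observed points
    panel.lines(grid$x, grid$y, col = "red")  # fitted quadratic
  })
print(p)  # lattice objects need an explicit print() inside scripts
```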
