How to use smoothing splines in gam in the R package mgcv

Is this the correct way to specify the knots of a smoothing spline in gam() from mgcv?
The confusing part is that the vignette says k is the dimension of the basis used to represent the smooth term.
(Previously I thought that with bs = "cr" the dimension of the basis was 3. After reading pp. 149-150 of Generalized Additive Models: An Introduction with R, it seems that gam uses a set of k basis functions to represent the cubic regression spline.)
However, the post referenced below shows that k is actually the number of knots. This is verified by the code below:
# reference
# https://stackoverflow.com/questions/40056566/mgcv-how-to-set-number-and-or-locations-of-knots-for-splines
library(mgcv)
## toy data
set.seed(0); x <- sort(rnorm(400, 0, pi)) ## note, my x are not uniformly sampled
set.seed(1); e <- rnorm(400, 0, 0.4)
y0 <- sin(x) + 0.2 * x + cos(abs(x))
y <- y0 + e
## fitting natural cubic spline
cr_fit <- gam(y ~ s(x, bs = 'cr', k = 20))
cr_knots <- cr_fit$smooth[[1]]$xp ## extract knots locations
par(mfrow = c(1,2))
plot(x, y, col= "blue", main = "natural cubic spline");
lines(x, cr_fit$linear.predictors, col = 2, lwd = 2)
abline(v = cr_knots, lty = 2)
Then, to use the smoothing spline, should I assign the knots manually in the argument of gam? The attempted code is below:
## fitting natural cubic spline, smoothing spline
cr_fit <- gam(y ~ s(x, bs = 'cr', k = length(x)), knots = list(x = x)) ## knots must be a list named by covariate
cr_knots <- cr_fit$smooth[[1]]$xp ## extract knots locations
## summary plot
par(mfrow = c(1,2))
plot(x, y, col= "blue", main = "natural cubic spline");
lines(x, cr_fit$linear.predictors, col = 2, lwd = 2)
abline(v = cr_knots, lty = 2)
plot(x,cr_knots)
cr_fit$sp
Is this understanding correct?
If yes, then how can I implement the smoothing splines method with the gam in the mgcv?
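One way to check how k relates to knots and coefficients is to inspect the fitted smooth object directly. The following is a minimal sketch, assuming the toy data from the question (regenerated here so the block runs on its own). It reads the `xp` and `bs.dim` components of the smooth; the comment about the centring constraint reflects mgcv's default identifiability constraint on smooth terms.

```r
library(mgcv)

## toy data, same recipe as in the question
set.seed(0); x <- sort(rnorm(400, 0, pi))
set.seed(1); e <- rnorm(400, 0, 0.4)
y <- sin(x) + 0.2 * x + cos(abs(x)) + e

fit <- gam(y ~ s(x, bs = "cr", k = 20))
sm  <- fit$smooth[[1]]

n_knots <- length(sm$xp)      # number of knots used by the "cr" basis
n_basis <- sm$bs.dim          # basis dimension requested through k
n_coef  <- length(coef(fit))  # intercept + (k - 1) spline coefficients;
                              # one basis column is absorbed by the centring constraint
c(knots = n_knots, basis = n_basis, coefficients = n_coef)
```

For the "cr" basis the two readings of k coincide: the basis has one function per knot, so k is both the basis dimension and the number of knots. Other bases (for example the default "tp") do not have knots in this sense, so the equivalence should not be assumed there.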

Related

Creating a Smooth Line in 3D R

I have a set of 3-dimensional points, like the sample data below. I would like to create a smooth line from it. There's information out there about how to smooth a 2D surface in 3D space, but how would I smooth a 1D line in 3D space?
Z = seq(0, 1, 0.01)
X = rnorm(length(Z), mean = 0, sd = 0.1)
Y = 2 * Z ^ 2 + rnorm(length(Z), mean = 0, sd = 0.1)
data = data.frame(X = X, Y = Y, Z= Z)
This is an example of multivariate regression. If you happen to know that the relationship with Z should be quadratic, you can do
fit <- lm(cbind(X, Y) ~ poly(Z, 2))
But I'm assuming you don't know that, and want some kind of general smoother. I don't think loess, lowess, or gam handle multivariate regression, but you can use natural splines in lm:
library(splines)
fit <- lm(cbind(X, Y) ~ ns(Z, df = 4))
The fitted values will be returned in a two-column matrix by predict(fit).
To plot the result, you can use rgl:
library(rgl)
plot3d(X, Y, Z, col = "red")
lines3d(cbind(predict(fit), Z))
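To draw a smoother curve than the one through the observed Z values, the fitted model can be evaluated on a finer grid. This is a minimal sketch, assuming the sample data from the question (with a seed added for reproducibility); the names `dat`, `grid` and `curve_xyz` are introduced here for illustration only.

```r
library(splines)

## sample data as in the question, with a seed added
set.seed(1)
Z <- seq(0, 1, 0.01)
X <- rnorm(length(Z), mean = 0, sd = 0.1)
Y <- 2 * Z^2 + rnorm(length(Z), mean = 0, sd = 0.1)
dat <- data.frame(X = X, Y = Y, Z = Z)

## one natural-spline regression per response column
fit <- lm(cbind(X, Y) ~ ns(Z, df = 4), data = dat)

## evaluate both fitted curves on a dense grid of Z values
grid <- data.frame(Z = seq(0, 1, length.out = 500))
curve_xyz <- cbind(predict(fit, newdata = grid), Z = grid$Z)
head(curve_xyz)
```

The resulting three-column matrix can be passed to rgl's lines3d() in place of cbind(predict(fit), Z).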

How can I plot the regression lines analysis in R

I want to plot regression lines in R for technical analysis.
First, I regress the price on the date and I get the main regression line. However, also, I need lines that correspond to (Main regression line +- 2*standard deviation).
Do you know how I can implement this? I already checked the TTR package, but I couldn't find a built-in indicator for this purpose.
Thank you.
To obtain points on the regression line, you can use the function predict on the fitted model. For confidence intervals, use the options interval and level, e.g.:
lsq <- lm(y ~ x, data)
predict(lsq, data.frame(x = c(12, 45)), interval = "confidence", level = 0.95)
To expand on @cdalitz's answer, this is how you plot the regression line with the confidence interval:
# Generate data
set.seed(123)
n = 100
x = runif(n)
y = 2 * x + rnorm(n, sd = 0.5)
m = lm(y ~ x)
newx = seq(min(x), max(x), length.out = 100)
pred = predict(m, newdata = data.frame(x = newx), interval="confidence", level=0.95)
# Plot data
plot(x, y)
# Plot model
abline(m)
# Plot 95% confidence interval
lines(newx, pred[, 2], col = "red", lty = 2)
lines(newx, pred[, 3], col = "red", lty = 2)
This question also shows many ways to do the same thing.
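The answers above draw a confidence interval for the mean, which is not quite the band the question describes. If "standard deviation" is taken to mean the residual standard deviation of the regression (an assumption, since the question does not say), lines at the fit plus or minus two such deviations can be sketched as follows, reusing the simulated data from the answer above.

```r
## simulated data, same recipe as in the answer above
set.seed(123)
n <- 100
x <- runif(n)
y <- 2 * x + rnorm(n, sd = 0.5)

m <- lm(y ~ x)
s <- sigma(m)              # residual standard deviation of the fit

upper <- fitted(m) + 2 * s # main line + 2 * SD
lower <- fitted(m) - 2 * s # main line - 2 * SD

plot(x, y)
abline(m)                                                         # main regression line
abline(a = coef(m)[1] + 2 * s, b = coef(m)[2], col = "blue", lty = 2)
abline(a = coef(m)[1] - 2 * s, b = coef(m)[2], col = "blue", lty = 2)
```

Because the offset is constant, the two outer lines are parallel to the regression line, unlike the confidence band, which widens away from the centre of the data.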

Generate smoothing splines for one feature variable in R

After reading materials about smoothing splines, I want to use the following R code to generate the smoothing spline for the feature variable x.
Here is what I did to obtain the bases for the smoothing spline for the feature variable x:
x = sort(rnorm(30)) # x is the feature variable
px = stats::poly(x, degree = 3) # orthogonal polynomial basis
smooth_spline_basis1 = smooth.spline(x, px[,1],df=3, all.knots = TRUE)$y
smooth_spline_basis2 = smooth.spline(x, px[,2],df=3, all.knots = TRUE)$y
smooth_spline_basis3 = smooth.spline(x, px[,3],df=3, all.knots = TRUE)$y
par(mfrow=c(2,2))
plot(px[,1],smooth_spline_basis1, main = "smoothing_spline_basis1 VS polynomial_spline_basis1")
plot(px[,2],smooth_spline_basis2, main = "smoothing_spline_basis2 VS polynomial_spline_basis2")
plot(px[,3],smooth_spline_basis3, main = "smoothing_spline_basis3 VS polynomial_spline_basis3")
par(mfrow=c(1,1))
Is this thought process correct? Or am I missing something?
The package mgcv gives you a nice spline smoother with the function gam() for generalized additive models. Here is an example where a spline is fitted to a sin-curve:
library(mgcv)
x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x)
mod <- gam(y ~ s(x))
summary(mod)
plot(x, y)
lines(x, fitted(mod), col = "green", lwd = 2)

How are the "plot.gam" confidence intervals calculated?

If a model is fitted using mgcv and then the smooth terms are plotted,
m <- gam(y ~ s(x))
plot(m, shade = TRUE)
then you get a plot of the curve with a confidence interval. These are, I presume, pointwise-confidence intervals. How are they computed?
I tried to write
object <- plot(m, shade = TRUE)
lower <- object[[1]]$fit - 2 * object[[1]]$se
upper <- object[[1]]$fit + 2 * object[[1]]$se
in order to extract the lower and upper bounds using the standard errors and a multiplier of 2, but when I plot them, they look a bit different from the confidence intervals drawn by plot.gam.
So, how are those calculated?
I do not use seWithMean = TRUE or anything like that.
It is 1 standard deviation.
oo <- plot.gam(m)
oo <- oo[[1]]
points(oo$x, oo$fit, pch = 20)
points(oo$x, oo$fit - oo$se, pch = 20)
Reproducible example:
x <- seq(0, 2 * pi, length = 100)
y <- x * sin(x) + rnorm(100, 0, 0.5)
m <- gam(y ~ s(x))
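An independent way to check the band is to compute it from predict() rather than from the object returned by plot(). This is a sketch under the assumption that the default band for a 1-d smooth is the centred smooth plus or minus two standard errors, as described in ?plot.gam; a seed is added to the reproducible example, and `upper` and `lower` are names introduced here.

```r
library(mgcv)

set.seed(1)
x <- seq(0, 2 * pi, length = 100)
y <- x * sin(x) + rnorm(100, 0, 0.5)
m <- gam(y ~ s(x))

## per-term fitted values and standard errors for the centred smooth
p <- predict(m, type = "terms", se.fit = TRUE)
upper <- p$fit[, 1] + 2 * p$se.fit[, 1]
lower <- p$fit[, 1] - 2 * p$se.fit[, 1]

plot(m, shade = TRUE)
lines(x, upper, col = "red", lty = 2)  # should closely follow the shaded band
lines(x, lower, col = "red", lty = 2)
```

If the red lines sit on the edges of the shaded region, the standard errors returned by plot() have already been multiplied, which would explain why applying a further factor of 2 gives a band that looks too wide.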

How to perform a non linear regression for my data

I have set of Temperature and Discomfort index value for each temperature data. When I plot a graph between temperature(x axis) and Calculated Discomfort index value( y axis) I get a reversed U-shape curve. I want to do non linear regression out of it and convert it into PMML model. My aim is to get the predicted discomfort value if I give certain temperature.
Please find the below dataset :
Temp <- c(0,5,10,6 ,9,13,15,16,20,21,24,26,29,30,32,34,36,38,40,43,44,45, 50,60)
Disc<-c(0.00,0.10,0.25,0.15,0.24,0.26,0.30,0.31,0.40,0.41,0.49,0.50,0.56, 0.80,0.90,1.00,1.00,1.00,0.80,0.50,0.40,0.20,0.15,0.00)
How to do non linear regression (possibly with nls??) for this dataset?
I did take a look at this, then I think it is not as simple as using nls as most of us first thought.
nls fits a parametric model, but from your data (the scatter plot), it is hard to propose a reasonable model assumption. I would suggest using non-parametric smoothing for this.
There are many scatter plot smoothing methods, such as kernel smoothing (ksmooth), smoothing splines (smooth.spline) and LOESS (loess). I prefer smooth.spline, and here is what we can do with it:
fit <- smooth.spline(Temp, Disc)
Please read ?smooth.spline for what it takes and what it returns. We can check the fitted spline curve by
plot(Temp, Disc)
lines(fit, col = 2)
Should you want to make predictions elsewhere, use the predict function (predict.smooth.spline). For example, to predict at Temp = 20 and Temp = 44, we can use
predict(fit, c(20,44))$y
# [1] 0.3940963 0.3752191
Prediction outside range(Temp) is not recommended, because the spline may extrapolate badly.
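A small wrapper can make that restriction explicit by returning NA for any temperature outside the observed range. This is only a sketch; `predict_in_range` is a hypothetical helper introduced here, not part of R or of any package.

```r
Temp <- c(0, 5, 10, 6, 9, 13, 15, 16, 20, 21, 24, 26, 29, 30, 32, 34, 36, 38,
          40, 43, 44, 45, 50, 60)
Disc <- c(0.00, 0.10, 0.25, 0.15, 0.24, 0.26, 0.30, 0.31, 0.40, 0.41, 0.49,
          0.50, 0.56, 0.80, 0.90, 1.00, 1.00, 1.00, 0.80, 0.50, 0.40, 0.20,
          0.15, 0.00)

fit <- smooth.spline(Temp, Disc)

## hypothetical helper: predict only inside the observed range, NA elsewhere
predict_in_range <- function(fit, newx, xrange) {
  out <- rep(NA_real_, length(newx))
  ok  <- newx >= xrange[1] & newx <= xrange[2]
  if (any(ok)) out[ok] <- predict(fit, newx[ok])$y
  out
}

res <- predict_in_range(fit, c(20, 44, 75), range(Temp))
res  # the third value is NA because 75 lies outside range(Temp)
```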
Before resorting to a non-parametric method, I also tried non-linear regression with regression splines and an orthogonal polynomial basis, but they did not give satisfying results. The major reason is that there is no penalty on the smoothness. As an example, here are some attempts with poly:
try1 <- lm(Disc ~ poly(Temp, degree = 3))
try2 <- lm(Disc ~ poly(Temp, degree = 4))
try3 <- lm(Disc ~ poly(Temp, degree = 5))
plot(Temp, Disc, ylim = c(-0.3,1.0))
x<- seq(min(Temp), max(Temp), length = 50)
newdat <- list(Temp = x)
lines(x, predict(try1, newdat), col = 2)
lines(x, predict(try2, newdat), col = 3)
lines(x, predict(try3, newdat), col = 4)
We can see that the fitted curve is artificial.
We can fit polynomials as follows, but they will overfit the data as the degree increases:
m <- nls(Disc ~ a + b*Temp + c*Temp^2 + d*Temp^3 + e*Temp^4, start=list(a=0, b=1, c=1, d=1, e=1))
plot(Temp,Disc,pch=19)
lines(Temp,predict(m),lty=2,col="red",lwd=3)
m <- nls(Disc ~ a + b*Temp + c*Temp^2 + d*Temp^3 + e*Temp^4 + f*Temp^5, start=list(a=0, b=1, c=1, d=1, e=1, f=1))
lines(Temp,predict(m),lty=2,col="blue",lwd=3)
m <- nls(Disc ~ a + b*Temp + c*Temp^2 + d*Temp^3 + e*Temp^4 + f*Temp^5 + g*Temp^6, start=list(a=0, b=1, c=1, d=1, e=1, f=1, g=1))
lines(Temp,predict(m),lty=2,col="green",lwd=3)
m.poly <- lm(Disc ~ poly(Temp, degree = 15))
lines(Temp, predict(m.poly), lty = 2, col = "yellow", lwd = 3) ## plot the degree-15 fit, not m again
legend(x = "topleft", legend = c("Deg 4", "Deg 5", "Deg 6", "Deg 15"),
col = c("red", "blue", "green", "yellow"), ## colours in the order the curves were drawn
lty = 2)
