I have a set of 3-dimensional points, like the sample data below. I would like to create a smooth line from it. There's information out there about how to smooth a 2D surface in 3D space, but how would I smooth a 1D line in 3D space?
Z = seq(0, 1, 0.01)
X = rnorm(length(Z), mean = 0, sd = 0.1)
Y = 2 * Z ^ 2 + rnorm(length(Z), mean = 0, sd = 0.1)
data = data.frame(X = X, Y = Y, Z= Z)
This is an example of multivariate regression. If you happen to know that the relationship with Z should be quadratic, you can do
fit <- lm(cbind(X, Y) ~ poly(Z, 2))
But I'm assuming you don't know that, and want some kind of general smoother. I don't think loess, lowess, or gam handle multivariate regression, but you can use natural splines in lm:
library(splines)
fit <- lm(cbind(X, Y) ~ ns(Z, df = 4))
The fitted values will be returned in a two-column matrix by predict(fit).
To plot the result, you can use rgl:
library(rgl)
plot3d(X, Y, Z, col = "red")
lines3d(cbind(predict(fit), Z))
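If a denser curve is wanted than one point per observation, the same fit can be evaluated on a finer grid of Z values. A minimal sketch, assuming the simulated data from the question; the seed and the names Znew, curve_xy and smooth_curve are introduced here for illustration:

```r
library(splines)

# Simulated data, as in the question (seed added for reproducibility)
set.seed(1)
Z <- seq(0, 1, 0.01)
X <- rnorm(length(Z), mean = 0, sd = 0.1)
Y <- 2 * Z^2 + rnorm(length(Z), mean = 0, sd = 0.1)

# Multivariate regression of (X, Y) on a natural spline basis in Z
fit <- lm(cbind(X, Y) ~ ns(Z, df = 4))

# Evaluate the fitted curve on a finer grid of Z values
Znew <- seq(0, 1, length.out = 500)
curve_xy <- predict(fit, newdata = data.frame(Z = Znew))

# One row per grid point: smoothed X, smoothed Y, and the Z coordinate
smooth_curve <- cbind(curve_xy, Z = Znew)
head(smooth_curve)
```

The resulting three-column matrix can be passed to lines3d() in place of cbind(predict(fit), Z).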
I want to plot a discontinuous surface using the persp function.
Here is the function:
f <- function(x, y)
{
r <- sqrt(x^2 + y^2)
out <- numeric(length(r))
ok <- r >= 1
out[ok] <- exp(-(r[ok] - 1))
return(out)
}
To get a perspective plot of the function on a regular grid, I use
x <- y <- seq(-4, 4, length.out = 50)
z <- outer(x, y, f)
persp(x, y, z, theta = 30, phi = 30, expand = 0.5, col = "lightblue")
The resulting plot does not properly show the circular nature of discontinuity points of the surface. Any suggestion about how to obtain a better perspective plot, instead of contour plot or image?
If something interactive works for you, I would go for something like this:
library(plotly)
plot_ly(z = ~ z) %>% add_surface()
Because the circular nature is best seen from above, a phi of 90 would be best to highlight this feature, but then you lose the rest of the shape and it is pretty useless. Hence, I would go for something interactive.
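If a static persp plot is still preferred, one thing that may help is a finer grid drawn without mesh lines and with shading, so the circular edge is not hidden by a coarse wireframe. A sketch under those assumptions; the grid size of 200 and the shade value are arbitrary choices, not recommendations:

```r
# Function from the question: 0 inside the unit circle, decaying outside it
f <- function(x, y) {
  r <- sqrt(x^2 + y^2)
  out <- numeric(length(r))
  ok <- r >= 1
  out[ok] <- exp(-(r[ok] - 1))
  out
}

# A finer grid resolves the circular edge better than 50 x 50
x <- y <- seq(-4, 4, length.out = 200)
z <- outer(x, y, f)

# border = NA hides the mesh lines; shade adds lighting so the jump stands out
persp(x, y, z, theta = 30, phi = 30, expand = 0.5,
      col = "lightblue", border = NA, shade = 0.5)
```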
After reading materials about smoothing splines, I want to use the following R code to generate the smoothing spline for the feature variable x.
Here is what I did to obtain the bases for the smoothing spline for the feature variable x:
x = sort(rnorm(30)) # x is the feature variable
px = stats::poly(x, degree = 3) # orthogonal polynomial basis
smooth_spline_basis1 = smooth.spline(x, px[,1],df=3, all.knots = TRUE)$y
smooth_spline_basis2 = smooth.spline(x, px[,2],df=3, all.knots = TRUE)$y
smooth_spline_basis3 = smooth.spline(x, px[,3],df=3, all.knots = TRUE)$y
par(mfrow=c(2,2))
plot(px[,1],smooth_spline_basis1, main = "smoothing_spline_basis1 VS polynomial_spline_basis1")
plot(px[,2],smooth_spline_basis2, main = "smoothing_spline_basis2 VS polynomial_spline_basis2")
plot(px[,3],smooth_spline_basis3, main = "smoothing_spline_basis3 VS polynomial_spline_basis3")
par(mfrow=c(1,1))
Is this thought process correct? Or am I missing something?
The package mgcv gives you a nice spline smoother with the function gam() for generalized additive models. Here is an example where a spline is fitted to a sin-curve:
library(mgcv)
x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x)
mod <- gam(y ~ s(x))
summary(mod)
plot(x, y)
lines(x, fitted(mod), col = "green", lwd = 2)
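How wiggly the fitted curve can become is bounded by the basis dimension k passed to s(), while the penalty picks the effective degrees of freedom within that bound. A small sketch comparing two values of k; the added noise, the seed and the names mod_small and mod_large are assumptions made here for illustration:

```r
library(mgcv)

set.seed(1)
x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.2)  # noisy sine curve (noise is an assumption)

# Same smoother, two different basis dimensions
mod_small <- gam(y ~ s(x, k = 5))
mod_large <- gam(y ~ s(x, k = 30))

# Effective degrees of freedom used by each fit (intercept included)
c(k5 = sum(mod_small$edf), k30 = sum(mod_large$edf))

plot(x, y)
lines(x, fitted(mod_small), col = "green", lwd = 2)
lines(x, fitted(mod_large), col = "blue", lwd = 2)
```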
My question: is this the correct way to specify the knots for a smoothing spline with gam in mgcv?
The confusing part is that the vignette says k is the dimension of the basis used to represent the smooth term.
(Previously I thought that in the "cr" setting the dimension of the basis is 3. After reading pp. 149-150 of GAM, an introduction to R, it seems that gam uses a set of k basis functions to represent the cubic regression spline.)
However, the post referenced below shows that k is actually the number of knots, which the following code verifies:
# reference
# https://stackoverflow.com/questions/40056566/mgcv-how-to-set-number-and-or-locations-of-knots-for-splines
library(mgcv)
## toy data
set.seed(0); x <- sort(rnorm(400, 0, pi)) ## note, my x are not uniformly sampled
set.seed(1); e <- rnorm(400, 0, 0.4)
y0 <- sin(x) + 0.2 * x + cos(abs(x))
y <- y0 + e
## fitting natural cubic spline
cr_fit <- gam(y ~ s(x, bs = 'cr', k = 20))
cr_knots <- cr_fit$smooth[[1]]$xp ## extract knots locations
par(mfrow = c(1,2))
plot(x, y, col= "blue", main = "natural cubic spline");
lines(x, cr_fit$linear.predictors, col = 2, lwd = 2)
abline(v = cr_knots, lty = 2)
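For the "cr" basis the two readings of k need not conflict: that basis is parameterised by the function values at its knots, so k knots would go together with a k-dimensional basis. A quick check, assuming the toy data above; sm is a name introduced here:

```r
library(mgcv)

## toy data, as above
set.seed(0); x <- sort(rnorm(400, 0, pi))
set.seed(1); e <- rnorm(400, 0, 0.4)
y <- sin(x) + 0.2 * x + cos(abs(x)) + e

cr_fit <- gam(y ~ s(x, bs = "cr", k = 20))
sm <- cr_fit$smooth[[1]]

length(sm$xp)         # number of knots stored for the "cr" smooth
sm$bs.dim             # basis dimension requested through k
length(coef(cr_fit))  # coefficients in the fitted model, intercept included
```

Note that the smooth itself loses one coefficient to the identifiability constraint, so the count from coef() includes the intercept.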
Then, to use the smoothing spline, should I assign the knots manually in the argument of gam? The attempted code is below:
## fitting natural cubic spline, smoothing spline
cr_fit <- gam(y ~ s(x, bs = 'cr', k = length(x)), knots=list(x))
cr_knots <- cr_fit$smooth[[1]]$xp ## extract knots locations
## summary plot
par(mfrow = c(1,2))
plot(x, y, col= "blue", main = "natural cubic spline");
lines(x, cr_fit$linear.predictors, col = 2, lwd = 2)
abline(v = cr_knots, lty = 2)
plot(x,cr_knots)
cr_fit$sp
Is this understanding correct?
If so, how can I implement the smoothing spline method with gam in mgcv?
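One detail worth checking in the attempt above: as far as I understand the mgcv documentation, the knots argument is expected to be a list whose names match the covariates, e.g. list(x = ...). A minimal sketch with a modest number of user-supplied knots, assuming knots placed at quantiles of x (knot_locs and user_fit are hypothetical names); it is not a full smoothing spline with a knot at every observation:

```r
library(mgcv)

## toy data, as above
set.seed(0); x <- sort(rnorm(400, 0, pi))
set.seed(1); y <- sin(x) + 0.2 * x + cos(abs(x)) + rnorm(400, 0, 0.4)

# 20 knots at evenly spaced quantiles of x; the first and last span the data range
knot_locs <- as.numeric(quantile(x, probs = seq(0, 1, length.out = 20)))

# The list is named after the covariate so the knots are matched to s(x)
user_fit <- gam(y ~ s(x, bs = "cr", k = 20), knots = list(x = knot_locs))

# Knots actually used by the smooth
user_fit$smooth[[1]]$xp
```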
set.seed(1); x <- round(rnorm(30), 1); y <- sin(pi * x) + rnorm(30)/10
plot(x, y, main = "spline(x,y) when x has ties")
lines(spline(x, y, n = 201), col = 2)
Is there a way to adjust the smoothness of the spline? Especially from -0.5 and onwards, there are wiggly parts that could be smoother. I have looked at the documentation but there doesn't seem to be a straightforward parameter that does this (something like spar in smooth.spline).
loess is one method, but if you want to use splines, use smooth.spline, not the interpolating spline:
set.seed(1)
x <- round(rnorm(30), 1)
y <- sin(pi * x) + rnorm(30)/10
plot(x, y, main = "spline(x,y) when x has ties")
sm <- smooth.spline(x, y, spar = 0.5) # play with spar
pred <- predict(sm, seq(-2, 2, by = 0.1))
lines(pred, col = "red")
There is a problem with this solution: note that in the negative region where points are less dense, the fit is not so good. loess is more local (that's what the l stands for), so it might be better.
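To get a feel for what "play with spar" does, the same data can be fitted with several values and the equivalent degrees of freedom compared. A sketch, assuming the three spar values below are just illustrative choices; spars, fits and grid are names introduced here:

```r
set.seed(1)
x <- round(rnorm(30), 1)
y <- sin(pi * x) + rnorm(30) / 10

# Fit the same data with increasing amounts of smoothing
spars <- c(0.2, 0.5, 0.8)
fits <- lapply(spars, function(s) smooth.spline(x, y, spar = s))

# Equivalent degrees of freedom: lower means a smoother curve
data.frame(spar = spars, df = sapply(fits, function(f) f$df))

# Overlay the three fits on the data
grid <- seq(min(x), max(x), length.out = 201)
plot(x, y, main = "smooth.spline for several values of spar")
for (i in seq_along(fits)) lines(predict(fits[[i]], grid), col = i + 1)
legend("topleft", legend = paste("spar =", spars), col = 2:4, lty = 1)
```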
I would use LOESS for smoothing:
spl <- spline(x, y, n = 201)  # assumed: the interpolating spline from the question
lines(loess.smooth(y = spl[["y"]], x = spl[["x"]], span = 0.05), col = 2)
Adjust span as needed.
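LOESS can also be fitted directly to the raw points rather than to the interpolated curve. A minimal sketch, assuming the data from the question and a span of 0.5 chosen only for illustration; lo, grid and pred are names introduced here:

```r
set.seed(1)
x <- round(rnorm(30), 1)
y <- sin(pi * x) + rnorm(30) / 10

# Local regression fitted directly to the data; span controls the smoothness
lo <- loess(y ~ x, span = 0.5)

# Predict on a grid inside the observed range (loess does not extrapolate by default)
grid <- seq(min(x), max(x), length.out = 201)
pred <- predict(lo, newdata = data.frame(x = grid))

plot(x, y, main = "loess fit, span = 0.5")
lines(grid, pred, col = 2)
```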
If a model is fitted using mgcv and then the smooth terms are plotted,
m <- gam(y ~ s(x))
plot(m, shade = TRUE)
then you get a plot of the curve with a confidence interval. These are, I presume, pointwise confidence intervals. How are they computed?
I tried to write
object <- plot(m, shade = TRUE)
lower <- object[[1]]$fit - 2 * object[[1]]$se
upper <- object[[1]]$fit + 2 * object[[1]]$se
in order to extract the lower and upper bounds using the standard errors and a multiplier of 2, but when I plot them, they look a bit different from the confidence intervals drawn by plot.gam.
So, how are those calculated?
I do not use seWithMean = TRUE or anything like that.
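Use a multiplier of 1, not 2. As far as I can tell, the se component returned by plot.gam has already been multiplied by the factor used for the plotted band, so fit ± se reproduces it: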
oo <- plot.gam(m)
oo <- oo[[1]]
points(oo$x, oo$fit, pch = 20)
points(oo$x, oo$fit - oo$se, pch = 20)
Reproducible example:
library(mgcv)
x <- seq(0, 2 * pi, length = 100)
y <- x * sin(x) + rnorm(100, 0, 0.5)
m <- gam(y ~ s(x))
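An alternative that does not rely on the object returned by plot.gam is to build the band from predict() with type = "terms" and se.fit = TRUE. A sketch, assuming the reproducible example above and a multiplier of 2; the seed and the names p, fit_s, se_s, lower and upper are introduced here:

```r
library(mgcv)

set.seed(1)
x <- seq(0, 2 * pi, length = 100)
y <- x * sin(x) + rnorm(100, 0, 0.5)
m <- gam(y ~ s(x))

# Smooth term and its standard error at the observed x values
p <- predict(m, type = "terms", se.fit = TRUE)
fit_s <- p$fit[, 1]
se_s  <- p$se.fit[, 1]

# Band of +/- 2 standard errors around the centred smooth
lower <- fit_s - 2 * se_s
upper <- fit_s + 2 * se_s

# Overlay on the default plot to compare with the band drawn by plot.gam
plot(m, shade = TRUE)
lines(x, lower, lty = 2, col = "red")
lines(x, upper, lty = 2, col = "red")
```

If the dashed lines sit on the edge of the shaded region, the plotted band is fit ± 2 standard errors of the smooth term.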