I want to plot regression lines in R for technical analysis.
First, I regress the price on the date and get the main regression line. I also need lines that correspond to the main regression line ± 2 standard deviations.
Do you know how I can implement this? I already checked the TTR package, but I couldn't find a built-in indicator for this purpose.
Thank you.
To obtain points on the regression line, you can use the function predict on the fitted model. For confidence intervals, use the options interval and level, e.g.:
lsq <- lm(y ~ x, data)
predict(lsq, data.frame(x = c(12, 45)), interval = "confidence", level = 0.95)
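For reference, here is a self-contained sketch of that call. It uses simulated x and y (hypothetical data, not from the question). Note that interval and level are arguments of predict(), not of data.frame(). With interval = "confidence", predict() returns a matrix whose columns hold the fitted value and the lower and upper bounds:

```r
# Simulated data standing in for the real price/date series (hypothetical)
set.seed(42)
dat <- data.frame(x = 1:50)
dat$y <- 3 + 0.5 * dat$x + rnorm(50, sd = 2)

lsq <- lm(y ~ x, data = dat)

# interval and level are passed to predict(), after the newdata argument
ci <- predict(lsq, newdata = data.frame(x = c(12, 45)),
              interval = "confidence", level = 0.95)
ci  # one row per new x; columns fit, lwr, upr
```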
To expand on @cdalitz's answer, this is how you plot the regression line with the confidence interval:
# Generate data
set.seed(123)
n = 100
x = runif(n)
y = 2 * x + rnorm(n, sd = 0.5)
m = lm(y ~ x)
newx = seq(min(x), max(x), length.out = 100)
pred = predict(m, newdata = data.frame(x = newx), interval="confidence", level=0.95)
# Plot data
plot(x, y)
# Plot model
abline(m)
# Plot 95% confidence interval
lines(newx, pred[, 2], col = "red", lty = 2)
lines(newx, pred[, 3], col = "red", lty = 2)
This question also shows many ways to do the same thing.
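The confidence interval above describes the uncertainty of the mean line, so it is narrow and curved. If "main regression line ± 2 standard deviations" in the question means ± 2 times the residual standard deviation, that gives two parallel lines instead. A minimal sketch under that assumption, using a simulated daily price series (date and price are hypothetical stand-ins for real data):

```r
# Hypothetical daily price series standing in for real market data
set.seed(1)
date  <- seq(as.Date("2020-01-01"), by = "day", length.out = 200)
price <- 100 + 0.05 * seq_along(date) + rnorm(200, sd = 2)

day <- as.numeric(date)          # regress on the numeric form of the date
fit <- lm(price ~ day)
s   <- sigma(fit)                # residual standard deviation of the regression
mid <- fitted(fit)               # points on the main regression line

plot(date, price, type = "l")
lines(date, mid, col = "blue", lwd = 2)          # main regression line
lines(date, mid + 2 * s, col = "red", lty = 2)   # main line + 2 sd
lines(date, mid - 2 * s, col = "red", lty = 2)   # main line - 2 sd

# For comparison: a 95% prediction interval, which combines the residual
# scatter with the uncertainty of the fitted line itself
pi95 <- predict(fit, newdata = data.frame(day = day),
                interval = "prediction", level = 0.95)
```

With a few hundred observations the half-width of the 95% prediction interval is close to 2 * sigma, so the two approaches give similar channels. The ± 2 * sigma band simply ignores the uncertainty in the fitted line.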
After reading materials about smoothing splines, I want to use the following R code to generate the smoothing spline for the feature variable x.
Here is what I did to obtain the bases for the smoothing spline for the feature variable x:
x = sort(rnorm(30)) # x is the feature variable
px = stats::poly(x, degree = 3) # orthogonal polynomial basis
smooth_spline_basis1 = smooth.spline(x, px[,1],df=3, all.knots = TRUE)$y
smooth_spline_basis2 = smooth.spline(x, px[,2],df=3, all.knots = TRUE)$y
smooth_spline_basis3 = smooth.spline(x, px[,3],df=3, all.knots = TRUE)$y
par(mfrow=c(2,2))
plot(px[,1],smooth_spline_basis1, main = "smoothing_spline_basis1 VS polynomial_spline_basis1")
plot(px[,2],smooth_spline_basis2, main = "smoothing_spline_basis2 VS polynomial_spline_basis2")
plot(px[,3],smooth_spline_basis3, main = "smoothing_spline_basis3 VS polynomial_spline_basis3")
par(mfrow=c(1,1))
Is this thought process correct, or am I missing something?
The package mgcv gives you a nice spline smoother with the function gam() for generalized additive models. Here is an example where a spline is fitted to a sine curve:
library(mgcv)
x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x)
mod <- gam(y ~ s(x))
summary(mod)
plot(x, y)
lines(x, fitted(mod), col = "green", lwd = 2)
My question is whether this is the correct way to specify the knots of a smoothing spline with gam() in mgcv.
What confuses me is that the vignette says k is the dimension of the basis used to represent the smooth term.
(Previously I thought that with the "cr" basis the dimension of the basis is 3. After reading pp. 149-150 of Generalized Additive Models: An Introduction with R, it seems that gam uses a set of k basis functions to represent the cubic regression spline.)
However, the post below shows that k is actually the number of knots, which is verified by the code below:
# reference
# https://stackoverflow.com/questions/40056566/mgcv-how-to-set-number-and-or-locations-of-knots-for-splines
library(mgcv)
## toy data
set.seed(0); x <- sort(rnorm(400, 0, pi)) ## note, my x are not uniformly sampled
set.seed(1); e <- rnorm(400, 0, 0.4)
y0 <- sin(x) + 0.2 * x + cos(abs(x))
y <- y0 + e
## fitting natural cubic spline
cr_fit <- gam(y ~ s(x, bs = 'cr', k = 20))
cr_knots <- cr_fit$smooth[[1]]$xp ## extract knots locations
par(mfrow = c(1,2))
plot(x, y, col= "blue", main = "natural cubic spline");
lines(x, cr_fit$linear.predictors, col = 2, lwd = 2)
abline(v = cr_knots, lty = 2)
Then, to get a smoothing spline, should I set the knots manually in the gam() call? My attempt is below:
## fitting natural cubic spline, smoothing spline
cr_fit <- gam(y ~ s(x, bs = 'cr', k = length(x)), knots = list(x = x))  ## knots list must be named after the covariate
cr_knots <- cr_fit$smooth[[1]]$xp ## extract knots locations
## summary plot
par(mfrow = c(1,2))
plot(x, y, col= "blue", main = "natural cubic spline");
lines(x, cr_fit$linear.predictors, col = 2, lwd = 2)
abline(v = cr_knots, lty = 2)
plot(x,cr_knots)
cr_fit$sp
Is this understanding correct?
If so, how can I implement the smoothing spline method with gam() in mgcv?
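One way to check the relationship between k, the knots and the basis dimension is to inspect the fitted smooth object. A sketch reusing the toy data from the question; bs.dim, first.para and last.para are fields that mgcv stores on each smooth. For bs = "cr" the basis dimension and the number of knots both equal k, while the term itself uses k - 1 coefficients because of the sum-to-zero identifiability constraint. Other bases, such as the default thin plate ("tp") basis, are not built from knots in this way, so the equivalence is specific to the cubic regression spline basis.

```r
library(mgcv)

# Same style of toy data as in the question
set.seed(0)
x <- sort(rnorm(400, 0, pi))
y <- sin(x) + 0.2 * x + cos(abs(x)) + rnorm(400, 0, 0.4)

k   <- 20
fit <- gam(y ~ s(x, bs = "cr", k = k))
sm  <- fit$smooth[[1]]

length(sm$xp)                      # number of knots of the "cr" basis: k
sm$bs.dim                          # basis dimension: also k
sm$last.para - sm$first.para + 1   # coefficients after the constraint: k - 1
fit$sp                             # smoothing parameter chosen during fitting
```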
If a model is fitted using mgcv and then the smooth terms are plotted,
m <- gam(y ~ s(x))
plot(m, shade = TRUE)
then you get a plot of the curve with a confidence band. These are, I presume, pointwise confidence intervals. How are they computed?
I tried
object <- plot(m, shade = TRUE)
lower <- object[[1]]$fit - 2 * object[[1]]$se
upper <- object[[1]]$fit + 2 * object[[1]]$se
to extract the lower and upper bounds using the standard errors and a multiplier of 2, but when I plot them, they look somewhat different from the confidence band drawn by plot.gam.
So how is that band calculated?
I am not using seWithMean = TRUE or anything like that.
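The band is drawn at 1 times the returned se, not 2 times. The se element returned by plot.gam() has already been multiplied by the se argument (2 by default for 1-D smooths), so fit ± 2 * se is twice as wide as the plotted band. Reproducible example:
library(mgcv)
set.seed(1)
x <- seq(0, 2 * pi, length = 100)
y <- x * sin(x) + rnorm(100, 0, 0.5)
m <- gam(y ~ s(x))
oo <- plot.gam(m)   # draw the plot and keep the data behind it
oo <- oo[[1]]       # first (and only) smooth term
points(oo$x, oo$fit, pch = 20)           # estimated smooth
points(oo$x, oo$fit - oo$se, pch = 20)   # falls on the lower dashed line
points(oo$x, oo$fit + oo$se, pch = 20)   # falls on the upper dashed line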
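To see how the band is computed, it can be rebuilt from predict.gam() with type = "terms" and se.fit = TRUE. A sketch, assuming plot.gam()'s defaults se = TRUE (a multiplier of 2 for 1-D smooths) and seWithMean = FALSE. The dotted red lines should coincide with the edges of the shaded band:

```r
library(mgcv)

set.seed(1)
x <- seq(0, 2 * pi, length = 100)
y <- x * sin(x) + rnorm(100, 0, 0.5)
m <- gam(y ~ s(x))

oo <- plot(m, shade = TRUE)[[1]]   # data behind the plot of the first smooth

# Same grid as the plot: centred smooth term and its pointwise standard errors
pt <- predict(m, newdata = data.frame(x = oo$x), type = "terms", se.fit = TRUE)
fit   <- pt$fit[, 1]
upper <- fit + 2 * pt$se.fit[, 1]   # 2 standard errors above the smooth
lower <- fit - 2 * pt$se.fit[, 1]   # 2 standard errors below the smooth

lines(oo$x, upper, col = "red", lty = 3)
lines(oo$x, lower, col = "red", lty = 3)
```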
I have the following data:
I plotted the points of that data and then smoothed them on the plot using the following code:
scatter.smooth(x=1:length(Ticker$ROIC[!is.na(Ticker$ROIC)]),
y=Ticker$ROIC[!is.na(Ticker$ROIC)],col = "#AAAAAA",
ylab = "ROIC Values", xlab = "Quarters since Feb 29th 2012 till Dec 31st 2016")
Now I want to find the pointwise slope of this smoothed curve, and also fit a trend line to the smoothed graph. How can I do that?
There are some interesting R packages that implement nonparametric derivative estimation. The short review of Newell and Einbeck can be helpful: http://maths.dur.ac.uk/~dma0je/Papers/newell_einbeck_iwsm07.pdf
Here we consider an example based on the pspline package (smoothing splines with penalties on order-m derivatives).
The data-generating process is a negative logistic model with additive noise (hence the y values are all negative, like the ROIC variable of @ForeverLearner):
set.seed(1234)
x <- sort(runif(200, min=-5, max=5))
y = -1/(1+exp(-x))-1+0.1*rnorm(200)
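We start by plotting the nonparametric estimate of the curve (the black line is the true curve and the red one the estimated curve):
library(pspline)
pspl <- smooth.Pspline(x, y, df=5, method=3)
f0 <- predict(pspl, x, nderiv=0)
curve(-1/(1+exp(-x))-1, -5, 5, lwd=2, ylab="y")   # true curve
lines(x, f0, lwd=3, lty=2, col="red")            # estimated curve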
Then, we estimate the first derivative of the curve:
f1 <- predict(pspl, x, nderiv=1)
curve(-exp(-x)/(1+exp(-x))^2,-5,5, lwd=2, ylim=c(-.3,0))
lines(x, f1, lwd=3, lty=2, col="red")
And here the second derivative:
f2 <- predict(pspl, x, nderiv=2)
curve((exp(-x))/(1+exp(-x))^2-2*exp(-2*x)/(1+exp(-x))^3, -5, 5,
      lwd=2, ylim=c(-.15,.15))
lines(x, f2, lwd=3, lty=2, col="red")
#DATA
set.seed(42)
x = rnorm(20)
y = rnorm(20)
#Plot the points
plot(x, y, type = "p")
#Obtain points for the smooth curve
temp = loess.smooth(x, y, evaluation = 50) #Use higher evaluation for more points
#Plot smooth curve
lines(temp$x, temp$y, lwd = 2)
#Obtain slope of the smooth curve
slopes = diff(temp$y)/diff(temp$x)
#Add a trend line
abline(lm(y~x))
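Base R's smooth.spline() can also return derivatives directly, because predict() on a smooth.spline fit accepts deriv = 1 (up to 3). This swaps smooth.spline in for the loess smoother used above, since loess.smooth() returns only x and y values. A sketch on simulated data: a noisy sine, so the true slope cos(x) is known. The trend line at the end is fitted to the smoothed values, which is one reading of "a trend line fitted to the smoothed graph":

```r
# Simulated data (hypothetical): a noisy sine curve, so the true slope cos(x) is known
set.seed(7)
x <- sort(runif(200, 0, 2 * pi))
y <- sin(x) + rnorm(200, sd = 0.05)

# df fixes the amount of smoothing; omit it to let smooth.spline choose by GCV
fit <- smooth.spline(x, y, df = 10)

smoothed <- predict(fit, x)$y              # values of the smoothed curve
slope    <- predict(fit, x, deriv = 1)$y   # pointwise slope of the smoothed curve

plot(x, slope, type = "l", lwd = 2, ylab = "slope")
curve(cos(x), add = TRUE, col = "red", lty = 2)   # true derivative, for comparison

# A straight trend line fitted to the smoothed values instead of the raw points
trend <- lm(smoothed ~ x)
coef(trend)
```

Unlike the diff() approach, the slopes here are evaluated at the data points themselves, so there is one slope per x value.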
I have a set of temperatures and a discomfort index value for each temperature. When I plot temperature (x-axis) against the calculated discomfort index (y-axis), I get an inverted U-shaped curve. I want to fit a nonlinear regression to it and convert it into a PMML model. My aim is to get the predicted discomfort value for a given temperature.
Here is the dataset:
Temp <- c(0,5,10,6 ,9,13,15,16,20,21,24,26,29,30,32,34,36,38,40,43,44,45, 50,60)
Disc<-c(0.00,0.10,0.25,0.15,0.24,0.26,0.30,0.31,0.40,0.41,0.49,0.50,0.56, 0.80,0.90,1.00,1.00,1.00,0.80,0.50,0.40,0.20,0.15,0.00)
How can I do nonlinear regression (possibly with nls?) on this dataset?
I took a look at this, and I think it is not as simple as using nls, as most of us first thought.
nls fits a parametric model, but from your data (the scatter plot), it is hard to propose a reasonable model assumption. I would suggest using non-parametric smoothing for this.
There are many scatter plot smoothing methods, such as kernel smoothing (ksmooth), smoothing splines (smooth.spline) and LOESS (loess). I prefer smooth.spline, and here is what we can do with it:
fit <- smooth.spline(Temp, Disc)
Please read ?smooth.spline for what it takes and what it returns. We can check the fitted spline curve by
plot(Temp, Disc)
lines(fit, col = 2)
To make predictions at other points, use the predict function (which dispatches to predict.smooth.spline). For example, to predict at Temp = 20 and Temp = 44, we can use
predict(fit, c(20,44))$y
# [1] 0.3940963 0.3752191
Prediction outside range(Temp) is not recommended, because it can suffer from poor extrapolation.
Before resorting to a nonparametric method, I also tried nonlinear regression with regression splines and an orthogonal polynomial basis, but neither gave satisfying results. The main reason is that there is no penalty on smoothness. As an example, here are some attempts with poly:
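For comparison, the other two smoothers mentioned above can be fitted to the same data. A sketch; the span and bandwidth values are arbitrary illustrative choices, not tuned, and each controls the amount of smoothing much as df or spar does for smooth.spline:

```r
Temp <- c(0,5,10,6,9,13,15,16,20,21,24,26,29,30,32,34,36,38,40,43,44,45,50,60)
Disc <- c(0.00,0.10,0.25,0.15,0.24,0.26,0.30,0.31,0.40,0.41,0.49,0.50,0.56,
          0.80,0.90,1.00,1.00,1.00,0.80,0.50,0.40,0.20,0.15,0.00)

# LOESS: span is the fraction of points used in each local fit (0.5 is arbitrary)
lo   <- loess(Disc ~ Temp, span = 0.5)
p_lo <- predict(lo, data.frame(Temp = c(20, 44)))

# Gaussian kernel smoother: bandwidth is on the Temp scale (10 is arbitrary)
ks   <- ksmooth(Temp, Disc, kernel = "normal", bandwidth = 10, x.points = c(20, 44))
p_ks <- ks$y

# Draw both smoothers over the observed range of Temp
plot(Temp, Disc)
grid_t <- seq(min(Temp), max(Temp), length.out = 100)
lines(grid_t, predict(lo, data.frame(Temp = grid_t)), col = "blue")
lines(ksmooth(Temp, Disc, kernel = "normal", bandwidth = 10, x.points = grid_t),
      col = "darkgreen")
```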
try1 <- lm(Disc ~ poly(Temp, degree = 3))
try2 <- lm(Disc ~ poly(Temp, degree = 4))
try3 <- lm(Disc ~ poly(Temp, degree = 5))
plot(Temp, Disc, ylim = c(-0.3,1.0))
x<- seq(min(Temp), max(Temp), length = 50)
newdat <- list(Temp = x)
lines(x, predict(try1, newdat), col = 2)
lines(x, predict(try2, newdat), col = 3)
lines(x, predict(try3, newdat), col = 4)
We can see that the fitted curves look artificial.
We can fit polynomials as follows, but the fits will increasingly overfit the data as the degree grows:
m <- nls(Disc ~ a + b*Temp + c*Temp^2 + d*Temp^3 + e*Temp^4, start=list(a=0, b=1, c=1, d=1, e=1))
ord <- order(Temp)   # Temp is not sorted, so draw each curve in increasing Temp order
plot(Temp,Disc,pch=19)
lines(Temp[ord],predict(m)[ord],lty=2,col="red",lwd=3)
m <- nls(Disc ~ a + b*Temp + c*Temp^2 + d*Temp^3 + e*Temp^4 + f*Temp^5, start=list(a=0, b=1, c=1, d=1, e=1, f=1))
lines(Temp[ord],predict(m)[ord],lty=2,col="blue",lwd=3)
m <- nls(Disc ~ a + b*Temp + c*Temp^2 + d*Temp^3 + e*Temp^4 + f*Temp^5 + g*Temp^6, start=list(a=0, b=1, c=1, d=1, e=1, f=1, g=1))
lines(Temp[ord],predict(m)[ord],lty=2,col="green",lwd=3)
m.poly <- lm(Disc ~ poly(Temp, degree = 15))
lines(Temp[ord],predict(m.poly)[ord],lty=2,col="yellow",lwd=3)
legend(x = "topleft", legend = c("Deg 4", "Deg 5", "Deg 6", "Deg 15"),
       col = c("red", "blue", "green", "yellow"),
       lty = 2)
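Because every model above is a polynomial, and therefore linear in its coefficients, it can also be fitted with lm() without starting values and compared on out-of-sample error. A sketch using the standard leave-one-out shortcut for linear models, mean((e_i / (1 - h_ii))^2), where h_ii are the hat values. The in-sample residual sum of squares can only go down as the degree grows, whereas the leave-one-out error penalises overfitting:

```r
Temp <- c(0,5,10,6,9,13,15,16,20,21,24,26,29,30,32,34,36,38,40,43,44,45,50,60)
Disc <- c(0.00,0.10,0.25,0.15,0.24,0.26,0.30,0.31,0.40,0.41,0.49,0.50,0.56,
          0.80,0.90,1.00,1.00,1.00,0.80,0.50,0.40,0.20,0.15,0.00)

# Same degree-4 polynomial as the first nls() model, fitted by least squares:
# raw = TRUE uses plain powers of Temp, the default uses an orthogonal basis
m4_raw  <- lm(Disc ~ poly(Temp, degree = 4, raw = TRUE))
m4_orth <- lm(Disc ~ poly(Temp, degree = 4))   # same fitted curve, better conditioned

# Leave-one-out CV error of a linear model from its hat values (PRESS / n)
loocv <- function(fit) mean((residuals(fit) / (1 - hatvalues(fit)))^2)

degrees <- 2:8
fits <- lapply(degrees, function(d) lm(Disc ~ poly(Temp, degree = d)))
rss  <- sapply(fits, function(f) sum(residuals(f)^2))   # never increases with degree
cv   <- sapply(fits, loocv)                             # penalises overfitting
data.frame(degree = degrees, rss = rss, loocv = cv)
```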