The following workflow for nonlinear quantile regression seems to work. However, I don't know how to plot the resulting curve.
Btw.: I'd prefer to use the function graphics::curve() instead of graphics::lines().
require(quantreg)
# load sample data
dat <- DNase
# introduce variable
x <- DNase$conc
y <- DNase$density
# introduce function
f <- function(a, b, x) {(a*x/(b+x))}
# fit on the log scale to obtain starting values
fm0 <- nls(log(y) ~ log(f(a,b,x)), dat, start = c(a = 1, b = 1))
# fit the nonlinear least-squares regression
fit <- nls(y ~ f(a,b,x), dat, start = coef(fm0))
# extract the coefficients
co <- coef(fit)
a <- co[1]
b <- co[2]
# plot
plot(y~x)
# add curve
curve((a*x/(b+x)), add=TRUE)
# then fit the median using nlrq
dat.nlrq <- nlrq(y ~ SSlogis(x, Asym, mid, scal), data=dat, tau=0.5)
# add curve
???
EDIT: What I'm looking for is a way to plot various quantile regressions of a formula such as a*x/(b+x).
Inserting the formula leads me to the question of what to pass as the 'start' argument:
dat.nlrq.075 <- nlrq(formula=fit, data = dat, start=???, tau = 0.75)
curve uses lines so there is really no reason to use curve when it's easier to use lines.
First ensure that data is sorted so plots come out right. Then fit with nls or nlrq and use fitted for the fitted line.
library(quantreg)
dat <- DNase[order(DNase$conc), ]
fit.nlrq <- nlrq(density ~ SSlogis(conc, Asym, mid, scal), data = dat, tau = 0.5)
plot(density ~ conc, dat)
lines(fitted(fit.nlrq) ~ conc, dat)
If you want to plot the fit at a different number of equally spaced points such as 250 then do the same except use predict instead of fitted:
x <- seq(min(dat$conc), max(dat$conc), length = 250)
lines(predict(fit.nlrq, list(conc = x)) ~ x, lty = 2, col = "red")
The same style works with nls.
Note that if you use require its value should be checked. If you don't want to do that use library instead.
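For the EDIT in the question (several quantiles of a*x/(b+x) and what to pass as start), here is a minimal sketch that is not part of the original answer. It assumes the nls estimates are adequate starting values for every tau; nlrq may still warn or fail to converge on other data. The names taus, cols and fits are illustrative only. It also shows that curve() can be used if preferred, by plugging the estimated coefficients into the expression.

```r
library(quantreg)

# Sort by conc so that any line drawn through the data is well behaved
dat <- DNase[order(DNase$conc), ]

# Least-squares fit of a*x/(b+x); the log-scale fit only supplies start values,
# as in the question's workflow
fm0 <- nls(log(density) ~ log(a * conc / (b + conc)), data = dat,
           start = c(a = 1, b = 1))
fit <- nls(density ~ a * conc / (b + conc), data = dat, start = coef(fm0))

# Assumption: the nls estimates are reasonable starting values for each tau
taus <- c(0.25, 0.5, 0.75)
fits <- lapply(taus, function(tau) {
  nlrq(density ~ a * conc / (b + conc), data = dat,
       start = as.list(coef(fit)), tau = tau)
})

plot(density ~ conc, data = dat)
cols <- c("blue", "black", "red")
for (i in seq_along(fits)) {
  co <- coef(fits[[i]])
  a_hat <- co[1]  # coefficients come back in the same order as start: a, b
  b_hat <- co[2]
  # curve() evaluates the expression in x between from and to
  curve(a_hat * x / (b_hat + x), from = 0, to = max(dat$conc),
        add = TRUE, col = cols[i])
}
legend("bottomright", legend = paste("tau =", taus), col = cols, lty = 1)
```

The same loop works with lines(predict(fits[[i]], list(conc = x)) ~ x) if you would rather not retype the formula inside curve().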
Related
hope all is well.
I have been exploring a few options for constraining a spline function so that it not only stays positive but also stays above the lowest value of y in the data frame. I assume there is a penalized spline function whose shape can readily be adapted, but I have not found or tried one yet. I have also tried nls with an exponential decay function, which works; however, the last estimated point is much higher than desired (I would like the curve to pass through, or be closer to, the final value of y). See the code below for the options I have tried. The ultimate goal is to fit a spline that passes through all points and never drops below the lowest value of y, while acknowledging that there are only 5 data points. Thanks in advance for the help.
library(tidyverse)
library(broom)
library(gnm)
library(cobs)
library(zoo)
DF <- data.frame(x = seq(1,5,1),y=c(26419753,9511111,3566667,57993,52194))
t=1:5
# option 1a and 1b: preferred method which is fitting a spline function
mod1a <- splinefun(DF$x,DF$y)
curve(mod1a, 1,5)
pred_interval_mod1a <- seq(1,5,length = 40)
mod1a(pred_interval_mod1a) # has that dip to negative near the end which should remain larger than y = 52,194
mod1b <- cobs(x= DF$x,y = DF$y,pointwise=rbind(c(0,52194,-1),c(0,26419753,1)))
pred_interval_mod1b <- seq(1,5,length = 40)
predict(mod1b, pred_interval_mod1b)
# option 2: NLS for exponential decay with starting values
mod2 <- nls(y ~ SSasymp(t, yf, y0, log_alpha), data = DF)
qplot(t, y, data = augment(mod2)) + geom_line(aes(y = .fitted))
# option 3: similar NLS premise but with lower defined
mod3 <- nls(y ~ yf + (y0 - yf) * exp(-alpha * t), data = DF,
            start = list(y0 = 26419753, yf = 52194, alpha = 1),
            lower = c(-Inf, 52194, -Inf), algorithm = "port")
# option 4: similar to 2 and 3
a=log(52194)
mod4 <- gnm(y ~ Exp(1 + t) -1, verbose = FALSE, constrain="Exp(.+x).Intercept",
            constrainTo=a, start=c(a,-0.05), data=DF)
mod4_df <- data.frame(t = seq(1,5,by=1))
mod4_pred <- predict(mod4,newdata=mod4_df)
mod4_pred
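One possible direction for the stated goal (pass through all points, never drop below the lowest y) is a monotone interpolating spline. The sketch below is a suggestion rather than something from the thread: it uses the "monoH.FC" method of base R's splinefun(), and it assumes that a monotone decreasing curve is acceptable, which fits these five points because y decreases with x. The names mono, grid and fit_mono are illustrative.

```r
# Data from the question
DF <- data.frame(x = 1:5, y = c(26419753, 9511111, 3566667, 57993, 52194))

# Monotone cubic Hermite interpolation (Fritsch-Carlson); requires monotone y
mono <- splinefun(DF$x, DF$y, method = "monoH.FC")

grid <- seq(1, 5, length.out = 200)
fit_mono <- mono(grid)

# Compare with the default ("fmm") spline, which can dip below the last point
plot(DF$x, DF$y, xlab = "x", ylab = "y")
lines(grid, splinefun(DF$x, DF$y)(grid), lty = 2)
lines(grid, fit_mono, col = "red")

# Check that the monotone fit does not undershoot the smallest observed y
min(fit_mono) >= min(DF$y)
```

Because the interpolant is monotone between the knots, its minimum on [1, 5] is the last data value, so it cannot fall below min(y). If the data were not monotone, this method would not apply and a constrained fit (for example with cobs) would be needed instead.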
I am struggling with "out-of-sample" prediction using loess. I get NA values for new x that are outside the original sample. Can I get these predictions?
x <- c(24,36,48,60,84,120,180)
y <- c(3.94,4.03,4.29,4.30,4.63,4.86,5.02)
lo <- loess(y~x)
x.all <- seq(3, 200, 3)
predict(object = lo, newdata = x.all)
I need to model full yield curve, i.e. interest rates for different maturities.
From the manual page of predict.loess:
When the fit was made using surface = "interpolate" (the default), predict.loess will not extrapolate – so points outside an axis-aligned hypercube enclosing the original data will have missing (NA) predictions and standard errors
If you change the surface parameter to "direct" you can extrapolate values.
For instance, this will work (on a side note: after plotting the prediction, my feeling is that you should increase the span parameter in the loess call a little bit):
lo <- loess(y~x, control=loess.control(surface="direct"))
predict(lo, newdata=x.all)
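To see the difference between the two surface settings side by side, here is a small sketch using the data from the question. The names lo_interp, lo_direct and outside are illustrative. With so few points loess may emit warnings, and the extrapolated values should be treated with caution.

```r
x <- c(24, 36, 48, 60, 84, 120, 180)
y <- c(3.94, 4.03, 4.29, 4.30, 4.63, 4.86, 5.02)
x.all <- seq(3, 200, 3)

lo_interp <- loess(y ~ x)  # default: surface = "interpolate"
lo_direct <- loess(y ~ x, control = loess.control(surface = "direct"))

p_interp <- predict(lo_interp, newdata = x.all)
p_direct <- predict(lo_direct, newdata = x.all)

# Flag the points that lie outside the range of the original x values
outside <- x.all < min(x) | x.all > max(x)

# NA appears in the "interpolate" column exactly where outside is TRUE
data.frame(x = x.all, outside, interpolate = p_interp, direct = p_direct)
```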
In addition to nico's answer: I would suggest fitting a GAM (which uses penalized regression splines) instead. However, extrapolation is not advisable unless your model is based on science.
x <- c(24,36,48,60,84,120,180)
y <- c(3.94,4.03,4.29,4.30,4.63,4.86,5.02)
lo <- loess(y~x, control=loess.control(surface = "direct"))
plot(x.all <- seq(3,200,3),
     predict(object = lo,newdata = x.all),
     type="l", col="blue")
points(x, y)
library(mgcv)
fit <- gam(y ~ s(x, bs="cr", k=7, fx =FALSE), data = data.frame(x, y))
summary(fit)
lines(x.all, predict(fit, newdata = data.frame(x = x.all)), col="green")
I need to draw a confidence interval for a linear regression in R. I followed a few tutorials, yet my result is quite different. As far as I understand, I should get two lines, one above and one below the main line, as shown here.
Unfortunately what I got is a few stacked lines, as shown here.
Could anyone help me understand what I am doing wrong?
Here's a sample of my code:
speed <- c(61,225,110,51,114,68,24,24,133,83,83,92,93,37,111,172,142,105,143,77,154,108,98,164,124,97,90,87,137,71,73,74,62,88,100,101,126,113,49)
length <- c(58,149,90,55,91,69,31,35,109,77,78,82,86,44,89,121,106,98,116,65,111,88,86,122,104,85,72,80,105,74,71,66,73,72,72,90,91,98,59);
cars <- data.frame(speed, length)
modelReg <- lm(length ~ speed, data = cars)
x <- cars$speed
conf_interval <- predict(modelReg, newdata = data.frame(seq(from=min(x),to=max(x),by = 1)),interval = 'confidence')
lines(x,conf_interval[,2],lty=2)
lines(x,conf_interval[,3],lty=2)
After the first four lines of your code above, use Gosink's plot.add.ci function:
# John Gosink's Intervals Plotter (from http://gosink.org/?page_id=120)
plot.add.ci <- function(x, y, interval='prediction', level=0.9,
                        regressionColor='red', ...) {
  xOrder <- order(x)
  x <- x[xOrder]
  y <- y[xOrder]
  fit <- lm(y ~ x, data=data.frame(x=x, y=y))
  newX <- data.frame(x=jitter(x))
  fitPred <- predict.lm(fit,newdata=newX,interval=interval,level=level, ...)
  abline(lm(y ~ x), col=regressionColor)
  lines(newX$x, fitPred[,2], lty=2, ...)
  lines(newX$x, fitPred[,3], lty=2, ...)
}
plot(cars$speed,cars$length)
abline(modelReg,col="red")
plot.add.ci(speed, length, level=0.95, interval="confidence", lwd=3)
Which gives this plot (change level if you want a different confidence level, or drop interval= for a prediction interval):
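If you would rather not use a helper function, the following sketch repairs the question's code directly. The stacked lines most likely arise because the column in newdata is not named speed, so predict() falls back to the original, unsorted speed values, which are then drawn in that order. The name grid is illustrative.

```r
cars <- data.frame(
  speed = c(61,225,110,51,114,68,24,24,133,83,83,92,93,37,111,172,142,105,143,
            77,154,108,98,164,124,97,90,87,137,71,73,74,62,88,100,101,126,113,49),
  length = c(58,149,90,55,91,69,31,35,109,77,78,82,86,44,89,121,106,98,116,
             65,111,88,86,122,104,85,72,80,105,74,71,66,73,72,72,90,91,98,59)
)
modelReg <- lm(length ~ speed, data = cars)

# The newdata column must carry the predictor's name ("speed"), and the grid
# is already sorted, so the interval lines are drawn left to right
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed), by = 1))
conf_interval <- predict(modelReg, newdata = grid, interval = "confidence")

plot(length ~ speed, data = cars)
abline(modelReg, col = "red")
lines(grid$speed, conf_interval[, "lwr"], lty = 2)  # lower confidence limit
lines(grid$speed, conf_interval[, "upr"], lty = 2)  # upper confidence limit
```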
I fit an exponential formula to a set of data (x, y). Then I want to calculate y values from the formula for x values beyond the actual data set. It doesn't work: it always prints the y values for the actual x values. Here is the code. What have I done wrong, and how can I do this in R?
data <- data.frame(x=seq(1,69), y=othertable[1:69, 2])
nlsxypw <- nls(data$y ~ a*data$x^b, col2_60, start=list(a=2200000, b=0))
predict(nlsxypw)
#here I want to calculate the y values for x = 70-80
xnew <- seq(70, 80, 1)
predict(nlsxypw, xnew)
#it doesn't print these values, still the actual values for x=1~69.
This is kind of a strange feature with predict.nls (possibly other predict methods as well?), but you have to supply the new data with the same name that your model was defined in terms of:
set.seed(123)
Data <- data.frame(
  x = 1:69,
  y = ((1:69)^2) + rnorm(69, 0, 5))
nlsxypw <- nls(y ~ a*(x^b),
               data = Data,
               start = list(a = 2.5, b = 1))
##
xnew <- 70:80
## note how newdata is specified
y.pred <- predict(nlsxypw, newdata=list(x=xnew))
y.pred
## [1] 4900.355 5041.359 5184.364 5329.368 5476.373 5625.377 5776.381 5929.386 6084.390 6241.393 6400.397
##
with(
  Data,
  plot(x, y, pch = 20,
       xlim = c(0, 90),
       ylim = c(0, 6700)))
lines(fitted(nlsxypw), col = "red")
points(
  x = xnew,
  y = y.pred,
  pch = 20,
  col = "blue")
##
I have this code
factors<-read.csv("India_Factors.csv",header=TRUE)
marketfactor<-factors[,4]
sizefactor<-factors[,5]
valuefactor<-factors[,6]
dati<-get.hist.quote("SI", quote = "AdjClose", compression = "m")
returns<-diff(dati)
regression<-lm(returns ~ marketfactor + sizefactor + valuefactor,na.action=na.omit)
that does multiple linear regression.
I want to plot the returns against one factor on a 2D plane (this part is trivial, of course), with the projection of the regression hyperplane for that factor superimposed. To be clearer, the result should look like this: wolfram demonstrations (see the snapshots).
Any help will be greatly appreciated.
Thank you for your time and have a nice weekend.
Giorgio.
The points in my comment notwithstanding, here is the canonical way to generate output from a fitted model in R for combinations of predictors. It really isn't clear what the plots you want are showing, but the ones that make sense to me are partial plots, where one variable is varied over its range whilst the others are held at some common value. Here I use the sample mean when holding a variable constant.
First some dummy data, with only two covariates, but this extends to any number
set.seed(1)
dat <- data.frame(y = rnorm(100))
dat <- transform(dat,
x1 = 0.2 + (0.4 * y) + rnorm(100),
x2 = 2.4 + (2.3 * y) + rnorm(100))
Fit the regression model
mod <- lm(y ~ x1 + x2, data = dat)
Next some data values to predict at using the model. You could do all variables in a single prediction and then subset the resulting object to plot only the relevant rows. Alternatively, more clearly (though more verbose), you can deal with each variable separately. Below I create two data frames, one per covariate in the model. In a data frame I generate 100 values over the range of the covariate being varied, and repeat the mean value of the other covariate(s).
pdatx1 <- with(dat, data.frame(x1 = seq(min(x1), max(x1), length = 100),
                               x2 = rep(mean(x2), 100)))
pdatx2 <- with(dat, data.frame(x1 = rep(mean(x1), 100),
                               x2 = seq(min(x2), max(x2), length = 100)))
In the linear regression with straight lines, you really don't need 100 values --- the two end points of the range of the covariate will do. However for models where the fitted function is not linear you need to predict at more locations.
Next, use the model to predict at these data points
pdatx1 <- transform(pdatx1, yhat = predict(mod, pdatx1))
pdatx2 <- transform(pdatx2, yhat = predict(mod, pdatx2))
Now we are ready to draw the partial plots. First compute a range for the y axis - again it is mostly redundant here but if you are adding confidence intervals you will need to include their values below,
ylim <- range(pdatx1$yhat, pdatx2$yhat, dat$y)
To plot (here putting two figures on the same plot device) we can use the following code
layout(matrix(1:2, ncol = 2))
plot(y ~ x1, data = dat, ylim = ylim)
lines(yhat ~ x1, data = pdatx1, col = "red", lwd = 2)
plot(y ~ x2, data = dat, ylim = ylim)
lines(yhat ~ x2, data = pdatx2, col = "red", lwd = 2)
layout(1)
Which produces
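The remark above about confidence intervals can be sketched as follows for the x1 panel. This is an illustrative extension rather than part of the original answer: it rebuilds the same dummy data, asks predict() for interval = "confidence", and includes the interval limits when computing ylim. The name ci is illustrative.

```r
set.seed(1)
dat <- data.frame(y = rnorm(100))
dat <- transform(dat,
                 x1 = 0.2 + (0.4 * y) + rnorm(100),
                 x2 = 2.4 + (2.3 * y) + rnorm(100))
mod <- lm(y ~ x1 + x2, data = dat)

# Vary x1 over its range while holding x2 at its sample mean
pdatx1 <- with(dat, data.frame(x1 = seq(min(x1), max(x1), length.out = 100),
                               x2 = mean(x2)))

# Matrix with columns fit, lwr and upr
ci <- predict(mod, pdatx1, interval = "confidence")

# Include the interval limits in the y-axis range
ylim <- range(ci, dat$y)
plot(y ~ x1, data = dat, ylim = ylim)
# Solid line for the fit, dashed lines for the confidence limits
matlines(pdatx1$x1, ci, lty = c(1, 2, 2), col = "red", lwd = c(2, 1, 1))
```

The x2 panel follows the same pattern with pdatx2.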