Confidence interval of polynomial regression - r

I have a little issue with R and statistics.
I fitted a model with the maximum likelihood method, which gave me the following coefficients with their respective standard errors (among other parameter estimates):
  Param   Estimate         SE
1    a0  0.2135187 0.02990105
2    a1  1.1343072 0.26123775
3    a2 -1.0000000 0.25552696
From these I can draw my curve:
y= 0.2135187 + 1.1343072 * x - 1 * I(x^2)
But from that, I now have to calculate the confidence interval around this curve, and I don't have a clear idea of how to do that.
Apparently, I should use propagation of error/uncertainty, but the methods I found require the raw data, or more than just the polynomial formula.
Is there any method in R to calculate the CI of my curve when only the SEs of the estimates are known?
Thank you for your help.
Edit:
So, right now, I have the covariance matrix (v), obtained with the function vcov:
a0 a1 a2
a0 0.000894073 -0.003622614 0.002874075
a1 -0.003622614 0.068245163 -0.065114661
a2 0.002874075 -0.065114661 0.065294027
and n = 279.

You don't have enough information yet. To compute a confidence interval for your fitted curve, the complete variance-covariance matrix of your three coefficients is required, but right now you only have its diagonal entries (the squared standard errors).
If you had fitted an orthogonal polynomial, the variance-covariance matrix would be diagonal, with identical diagonal elements. This is certainly not your case, as:
the standard errors you show differ from each other;
you have explicitly used raw polynomial notation: x + I(x ^ 2)
but the methods I found require the raw data
It's not "raw data" used for fitting the model. It is "new data" where you want to produce the confidence band. However, you do need to know the number of data used for fitting the model, say n, as that is necessary to derive residual degree of freedom. In your case with 3 coefficients, this degree of freedom is n - 3.
Once you have:
the full variance-covariance matrix, let's say V;
n, the number of data points used for model fitting;
a vector of points x at which to produce the confidence band,
you can first get the prediction standard error from:
X <- cbind(1, x, x ^ 2) ## prediction matrix
e <- sqrt( rowSums(X * (X %*% V)) ) ## prediction standard error
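For clarity, rowSums(X * (X %*% V)) is just an efficient way of taking the diagonal of X %*% V %*% t(X); a direct but slower equivalent (it forms the full matrix before extracting the diagonal) would be:
e <- sqrt( diag(X %*% V %*% t(X)) )  ## same result as above, less efficient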
You know how to get the predicted mean from your fitted polynomial formula, right? Suppose the mean is mu; then for the 95% CI, use
## residual degree of freedom: n - 3
mu + e * qt(0.025, n - 3) ## lower bound
mu - e * qt(0.025, n - 3) ## upper bound
The complete theory behind this is covered in: How does predict.lm() compute confidence interval and prediction interval?
Update
Based on the covariance matrix you provided, it is now possible to produce some results and figures.
V <- structure(c(0.000894073, -0.003622614, 0.002874075, -0.003622614,
0.068245163, -0.065114661, 0.002874075, -0.065114661, 0.065294027
), .Dim = c(3L, 3L), .Dimnames = list(c("a0", "a1", "a2"), c("a0",
"a1", "a2")))
Suppose we want to produce CI at x = seq(-5, 5, by = 0.2):
beta <- c(0.2135187, 1.1343072, -1.0000000)
x <- seq(-5, 5, by = 0.2)
X <- cbind(1, x, x ^ 2)
mu <- X %*% beta                     ## predicted mean
e <- sqrt( rowSums(X * (X %*% V)) )  ## prediction standard error
n <- 279
lo <- mu + e * qt(0.025, n - 3)      ## lower bound
up <- mu - e * qt(0.025, n - 3)      ## upper bound
matplot(x, cbind(mu, lo, up), type = "l", col = 1, lty = c(1,2,2))
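If you prefer a shaded band rather than dashed bounds, base R's polygon() can reuse the same quantities. A minimal sketch, reusing x, mu, lo and up from above:
plot(x, mu, type = "l", ylim = range(lo, up))  ## fitted curve
polygon(c(x, rev(x)), c(lo, rev(up)), col = "grey80", border = NA)  ## shaded 95% band
lines(x, mu)  ## redraw the curve on top of the shading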

Related

Syntax for three-piece segmented regression using NLS in R when concave

My goal is to fit a three-piece (i.e., two break-point) regression model to make predictions using propagate's predictNLS function, making sure to define knots as parameters, but my model formula seems off.
I've used the segmented package to estimate the breakpoint locations (used as starting values in NLS), but would like to keep my models in the NLS format, specifically nlsLM {minpack.lm}, because I am fitting other types of curves to my data using NLS, want to allow NLS to optimize the knot values, sometimes use variable weights, and need to be able to easily calculate the Monte Carlo confidence intervals from propagate. Though I'm very close to having the right syntax for the formula, I'm not getting the expected/required behaviour near the breakpoint(s). The segments SHOULD meet directly at the breakpoints (without any jumps), but at least on this data, I'm getting a weird local minimum at the breakpoint (see plots below).
Below is an example of my data and general process. I believe my issue to be in the NLS formula.
library(minpack.lm)
library(segmented)
y <- c(-3.99448113, -3.82447011, -3.65447803, -3.48447030, -3.31447855, -3.14448753, -2.97447972, -2.80448401, -2.63448380, -2.46448069, -2.29448796, -2.12448912, -1.95448783, -1.78448797, -1.61448563, -1.44448719, -1.27448469, -1.10448651, -0.93448525, -0.76448637, -0.59448626, -0.42448586, -0.25448588, -0.08448548, 0.08551417, 0.25551393, 0.42551411, 0.59551395, 0.76551389, 0.93551398)
x <- c(61586.1711, 60330.5550, 54219.9925, 50927.5381, 48402.8700, 45661.9175, 37375.6023, 33249.1248, 30808.6131, 28378.6508, 22533.3782, 13901.0882, 11716.5669, 11004.7305, 10340.3429, 9587.7994, 8736.3200, 8372.1482, 8074.3709, 7788.1847, 7499.6721, 7204.3168, 6870.8192, 6413.0828, 5523.8097, 3961.6114, 3460.0913, 2907.8614, 2016.1158, 452.8841)
df<- data.frame(x,y)
#Use Segmented to get estimates for parameters with 2 breakpoints
my.seg2 <- segmented(lm(y ~ x, data = df), seg.Z = ~ x, npsi = 2)
#extract knot, intercept, and coefficient values to use as NLS start points
my.knot1 <- my.seg2$psi[1,2]
my.knot2 <- my.seg2$psi[2,2]
my.m_2 <- slope(my.seg2)$x[1,1]
my.b1 <- my.seg2$coefficients[[1]]
my.b2 <- my.seg2$coefficients[[2]]
my.b3 <- my.seg2$coefficients[[3]]
#Fit a NLS model to ~replicate segmented model. Presumably my model formula is where the problem lies
my.model <- nlsLM(y ~ m*x + b + (b2*(ifelse(x >= knot1 & x <= knot2, 1, 0)*(x - knot1)) +
                    (b3*ifelse(x > knot2, 1, 0)*(x - knot2 - knot1))),
                  data = df,
                  start = c(m = my.m_2, b = my.b1, b2 = my.b2, b3 = my.b3,
                            knot1 = my.knot1, knot2 = my.knot2))
How it should look
plot(my.seg2)
How it does look
plot(x, y)
lines(x=x, y=predict(my.model), col='black', lty = 1, lwd = 1)
I was pretty sure I had it "right", but when the 95% confidence intervals are plotted with the line and prediction resolution (e.g., the density of x points) is increased, things seem dramatically incorrect.
Thank you all for your help.
Define g to be a grouping vector having the same length as x which takes on values 1, 2, 3 for the 3 sections of the X axis and create an nls model from these. The resulting plot looks ok.
my.knots <- c(my.knot1, my.knot2)
g <- cut(x, c(-Inf, my.knots, Inf), label = FALSE)
fm <- nls(y ~ a[g] + b[g] * x, df, start = list(a = c(1, 1, 1), b = c(1, 1, 1)))
plot(y ~ x, df)
lines(fitted(fm) ~ x, df, col = "red")
(continued after graph)
Constraints
Although the above looks OK and may be sufficient, it does not guarantee that the segments intersect at the knots. To do that, we must impose the constraint that both sides are equal at each knot:
a[2] + b[2] * my.knots[1] = a[1] + b[1] * my.knots[1]
a[3] + b[3] * my.knots[2] = a[2] + b[2] * my.knots[2]
so
a[2] = a[1] + (b[1] - b[2]) * my.knots[1]
a[3] = a[2] + (b[2] - b[3]) * my.knots[2]
= a[1] + (b[1] - b[2]) * my.knots[1] + (b[2] - b[3]) * my.knots[2]
giving:
# returns a vector of the three a values
avals <- function(a1, b) unname(cumsum(c(a1, -diff(b) * my.knots)))
fm2 <- nls(y ~ avals(a1, b)[g] + b[g] * x, df, start = list(a1 = 1, b = c(1, 1, 1)))
To get the three a values we can use:
co <- coef(fm2)
avals(co[1], co[-1])
To get the residual sum of squares:
deviance(fm2)
## [1] 0.193077
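To double-check that the constrained fit really is continuous, you can predict on a fine grid and compare both sides of each knot directly. A sketch using the objects defined above (xg, gg and yg are just illustrative names):
xg <- seq(min(x), max(x), length.out = 500)  ## fine prediction grid
gg <- cut(xg, c(-Inf, my.knots, Inf), labels = FALSE)
yg <- predict(fm2, newdata = data.frame(x = xg, g = gg))
lines(xg, yg, col = "blue", lty = 2)  ## overlay on the previous plot
## both sides of each knot should give (essentially) the same fitted value
co <- coef(fm2)
a <- avals(co[1], co[-1]); b <- co[-1]
c(a[1] + b[1] * my.knots[1], a[2] + b[2] * my.knots[1])
c(a[2] + b[2] * my.knots[2], a[3] + b[3] * my.knots[2])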
Polynomial
Although it involves a large number of parameters, a polynomial fit could be used in place of the segmented linear regression. A 12th-degree polynomial involves 13 parameters but has a lower residual sum of squares than the segmented linear regression. A lower degree could be used, with a corresponding increase in the residual sum of squares; for example, a 7th-degree polynomial involves 8 parameters and visually looks not too bad, although it has a higher residual sum of squares.
fm12 <- nls(y ~ cbind(1, poly(x, 12)) %*% b, df, start = list(b = rep(1, 13)))
deviance(fm12)
## [1] 0.1899218
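For reference, the 7th-degree fit mentioned above can be obtained in exactly the same way. A sketch (fm7 is just an illustrative name):
fm7 <- nls(y ~ cbind(1, poly(x, 7)) %*% b, df, start = list(b = rep(1, 8)))
deviance(fm7)  ## higher than deviance(fm12), as noted above
lines(fitted(fm7) ~ x, df, col = "blue")  ## overlay on an existing plot of the data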
The problem may, in part, reflect a limitation in segmented: segmented returns a single change-point value without quantifying the associated uncertainty. Redoing the analysis using mcp, which returns Bayesian posteriors, we see that the second change point is bimodally distributed:
library(mcp)
model = list(
  y ~ 1 + x,  # Intercept + slope in first segment
  ~ 0 + x,    # Only slope changes in the next segments
  ~ 0 + x
)
# Fit it with a large number of samples and plot the change point posteriors
fit = mcp(model, data = data.frame(x, y), iter = 50000, adapt = 10000)
plot_pars(fit, regex_pars = "^cp*", type = "dens_overlay")
FYI, mcp can plot credible intervals as well (the red dashed lines):
plot(fit, q_fit = TRUE)

How to see Bayesian computations as clean matrices in R

To gain the clarity needed to understand how R can help with Bayesian computations, in what follows I will be asking R coding questions in this regard.
Some necessary details:
Suppose, I have an object called mu, defined as:
mu <- rnorm( 1e4 , 178 , 20 ) ## A vector of hypothesized values
The object mu is going to serve as the mean argument of the next object called y.given.mu:
y.given.mu <- rnorm( 1e4 , mu , 1 ) ## A vector of normal densities conditional on `mu`
Question
I was wondering how I could:
A) cleanly see the matrix structure of y.given.mu?
B) multiply object mu by y.given.mu and cleanly see the matrix structure of the product of these two objects (i.e., joint distribution)
C) integrate out mu from B) so that I get p(y)?
As we discussed, we are moving all follow-up questions from your previous question, What does it mean to put an `rnorm` as an argument of another `rnorm` in R?, into another thread.
A reasonably sufficient grid
delta.mu <- 0.5 # this affects numerical integration precision
mu <- seq(178 - 3 * 20, 178 + 3 * 20, by = delta.mu)
delta.y <- 1 # this does not affect precision, only plotting
y <- seq(min(mu) - 3, max(mu) + 3, by = delta.y)
# the range above is chosen using 3-sigma rule of normal distribution.
# normal distribution has near 0 density outside (3 * sd) range of its mean
Conditional density p(y | mu)
cond <- outer(y, mu, dnorm)
dimnames(cond) <- list(y = y, mu = mu)
# each column is a conditional density, conditioned on some `mu`
# you can view one of them by, for example, `plot(y, cond[, 1], type = "l")`
# you can view all of them by `matplot(y, cond, type = "l", lty = 2)`
Joint density p(y, mu)
# marginal of `mu`
p.mu <- dnorm(mu, 178, 20)
# multiply `cond` by `p.mu` column by column (i.e., column scaling)
joint <- cond * rep(p.mu, each = length(y))
Marginal density p(y)
# numerically integrate out `mu` (a simple Riemann sum over the grid)
p.y <- rowSums(joint * delta.mu)
Now let's plot and check
plot(y, p.y, type = "l")
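As a sanity check, the numerical marginal can be compared with the closed form: since y | mu ~ N(mu, 1) and mu ~ N(178, 20), the marginal is exactly N(178, sqrt(20^2 + 1^2)). A short sketch, reusing y, p.y and delta.y from above:
lines(y, dnorm(y, 178, sqrt(20 ^ 2 + 1 ^ 2)), col = 2, lty = 2)  ## exact marginal density
sum(p.y * delta.y)  ## the numerical marginal should integrate to approximately 1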

How to find interval probability for a given distribution?

Suppose I have some data and I fit it to a gamma distribution. How do I find the interval probability Pr(1 < x <= 1.5), where x is an out-of-sample data point?
require(fitdistrplus)
a <- c(2.44121289,1.70292449,0.30550832,0.04332383,1.0553436,0.26912546,0.43590885,0.84514809,
0.36762336,0.94935435,1.30887437,1.08761895,0.66581035,0.83108270,1.7567334,1.00241339,
0.96263021,1.67488277,0.87400413,0.34639636,1.16804671,1.4182144,1.7378907,1.7462686,
1.7427784,0.8377457,0.1428738,0.71473956,0.8458882,0.2140742,0.9663167,0.7933085,
0.0475603,1.8657773,0.18307362,1.13519144)
fit <- fitdist(a, "gamma",lower = c(0, 0))
Someone does not like the plug-in approach shown further below, which is conditional on the MLE; so let's see something unconditional. If we integrate directly, we need a triple integral: one over the shape, one over the rate, and finally one over x. That is not appealing, so I will just produce a Monte Carlo estimate instead.
Under the Central Limit Theorem, the MLEs are asymptotically normally distributed. fitdistrplus::fitdist does not give standard errors, but we can use MASS::fitdistr, which performs the required inference here.
library(MASS)
fit <- fitdistr(a, "gamma", lower = c(0, 0))
b <- fit$estimate
#    shape     rate
# 1.739737 1.816134
V <- fit$vcov  ## covariance matrix of the estimates
#           shape      rate
# shape 0.1423679 0.1486193
# rate  0.1486193 0.2078086
Now we would like to sample from the parameter distribution and get samples of the target probability.
set.seed(0)
## sample from bivariate normal with mean `b` and covariance `V`
## Cholesky method is used here
X <- matrix(rnorm(1000 * 2), 1000) ## 1000 `N(0, 1)` normal samples
R <- chol(V) ## upper triangular Cholesky factor of `V`
X <- X %*% R ## transform X under desired covariance
X <- X + b ## shift to desired mean
## you can use `cov(X)` to check it is very close to `V`
## now samples for `Pr(1 < x < 1.5)`
p <- pgamma(1.5, X[,1], X[,2]) - pgamma(1, X[,1], X[,2])
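Equivalently, the manual Cholesky step can be delegated to MASS::mvrnorm. A sketch, assuming MASS is already loaded as above:
X <- mvrnorm(1000, mu = b, Sigma = V)  ## 1000 draws from N(b, V)
p <- pgamma(1.5, X[, 1], X[, 2]) - pgamma(1, X[, 1], X[, 2])  ## induced samples of the target probability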
We can make a histogram of p (and maybe do a density estimation if you want):
hist(p, prob = TRUE)
Now, we often want the sample mean as a point estimate:
mean(p)
# [1] 0.1906975
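An equal-tailed 95% interval for this probability can be read off the same samples. A minimal sketch:
quantile(p, c(0.025, 0.975))  ## 95% interval for Pr(1 < x <= 1.5) under parameter uncertainty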
Here is an example that uses MCMC techniques and a Bayesian mode of inference to estimate the posterior probability that a new observation falls in the interval (1, 1.5]. This is an unconditional estimate, as opposed to the conditional estimate obtained by integrating the gamma distribution with the maximum-likelihood parameter estimates.
This code requires that JAGS be installed on your computer (free and easy to install).
library(rjags)
a <- c(2.44121289,1.70292449,0.30550832,0.04332383,1.0553436,0.26912546,0.43590885,0.84514809,
0.36762336,0.94935435,1.30887437,1.08761895,0.66581035,0.83108270,1.7567334,1.00241339,
0.96263021,1.67488277,0.87400413,0.34639636,1.16804671,1.4182144,1.7378907,1.7462686,
1.7427784,0.8377457,0.1428738,0.71473956,0.8458882,0.2140742,0.9663167,0.7933085,
0.0475603,1.8657773,0.18307362,1.13519144)
# Specify the model in JAGS language using diffuse priors for shape and scale
sink("GammaModel.txt")
cat("model{
# Priors
shape ~ dgamma(.001,.001)
rate ~ dgamma(.001,.001)
# Model structure
for(i in 1:n){
a[i] ~ dgamma(shape, rate)
}
}
", fill=TRUE)
sink()
jags.data <- list(a=a, n=length(a))
# Give overdispersed initial values (not important for this simple model, but very
# important if running complicated models where you need to check convergence by
# monitoring multiple chains)
inits <- function(){list(shape=runif(1,0,10), rate=runif(1,0,10))}
# Specify which parameters to monitor
params <- c("shape", "rate")
# Set-up for MCMC run
nc <- 1 # number of chains
n.adapt <- 1000 # number of adaptation steps
n.burn <- 1000 # number of burn-in steps
n.iter <- 500000 # number of posterior samples
thin <- 10 # thinning of posterior samples
# Running the model
gamma_mod <- jags.model('GammaModel.txt', data = jags.data, inits = inits,
                        n.chains = nc, n.adapt = n.adapt)
update(gamma_mod, n.burn)
gamma_samples <- coda.samples(gamma_mod, params, n.iter = n.iter, thin = thin)
# Summarize the result
summary(gamma_samples)
# Compute improper (non-normalized) probability distribution for x
x <- rep(NA, 50000)
for(i in 1:50000){
  x[i] <- rgamma(1, gamma_samples[[1]][i, 1], rate = gamma_samples[[1]][i, 2])
}
# Find the proportion of values of x that fall in the desired range.
length(which(x>1 & x < 1.5))/length(x)
Answer:
Pr(1 < x <= 1.5) = 0.194
So pretty close to the conditional estimate, but this is not guaranteed to generally be the case.
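As a side note, the sampling loop above can be replaced by a single vectorized call, since rgamma recycles vector-valued shape and rate arguments. A sketch using the same posterior draws:
x <- rgamma(50000, gamma_samples[[1]][, 1], rate = gamma_samples[[1]][, 2])  ## one draw per posterior sample
mean(x > 1 & x <= 1.5)  ## same posterior predictive probability, without the loop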
You can just use pgamma with the estimated parameters in fit.
b <- fit$estimate
# shape rate
#1.739679 1.815995
pgamma(1.5, b[1], b[2]) - pgamma(1, b[1], b[2])
# [1] 0.1896032
Thanks. But how about P(x > 2)?
Check out the lower.tail argument:
pgamma(q, shape, rate = 1, scale = 1/rate, lower.tail = TRUE, log.p = FALSE)
By default, pgamma(q) evaluates Pr(x <= q). Setting lower.tail = FALSE gives Pr(x > q). So you can do:
pgamma(2, b[1], b[2], lower.tail = FALSE)
# [1] 0.08935687
Or you can also use
1 - pgamma(2, b[1], b[2])
# [1] 0.08935687

Plotting classification decision boundary line based on perceptron coefficients

This is practically a repeat of this question. However, I want to ask a very specific question regarding plotting of the decision boundary line based on the perceptron coefficients I got with a rudimentary "manual" coding experiment. As you can see the coefficients extracted from a logistic regression result in a nice decision boundary line:
based on the glm() results:
(Intercept) test1 test2
1.718449 4.012903 3.743903
The coefficients on the perceptron experiment are radically different:
bias test1 test2
9.131054 19.095881 20.736352
To facilitate an answer, here is the data, and here is the code:
# DATA PRE-PROCESSING:
dat = read.csv("perceptron.txt", header=F)
dat[,1:2] = apply(dat[,1:2], MARGIN = 2, FUN = function(x) scale(x)) # scaling the data
data = data.frame(rep(1,nrow(dat)), dat) # introducing the "bias" column
colnames(data) = c("bias","test1","test2","y")
data$y[data$y==0] = -1 # Turning 0/1 dependent variable into -1/1.
data = as.matrix(data) # Turning data.frame into matrix to avoid mmult problems.
# PERCEPTRON:
set.seed(62416)
no.iter = 1000 # Number of loops
theta = rnorm(ncol(data) - 1) # Starting a random vector of coefficients.
theta = theta/sqrt(sum(theta^2)) # Normalizing the vector.
h = theta %*% t(data[,1:3]) # Performing the first f(theta^T X)
for (i in 1:no.iter){ # We will recalculate 1,000 times
for (j in 1:nrow(data)){ # Each time we go through each example.
if(h[j] * data[j, 4] < 0){ # If the hypothesis disagrees with the sign of y,
theta = theta + (sign(data[j,4]) * data[j, 1:3]) # We + or - the example from theta.
}
else
theta = theta # Else we let it be.
}
h = theta %*% t(data[,1:3]) # Calculating h() after iteration.
}
theta # Final coefficients
mean(sign(h) == data[,4]) # Accuracy
QUESTION: How to plot the boundary line (as I did above using the logistic regression coefficients) if we only have the perceptron coefficients?
Well... It turns out that it is exactly the same as in the case of logistic regression, despite the widely different coefficients: pick the minimum and maximum of the abscissa (test1), add a slight margin, calculate the corresponding test2 values at the decision boundary (where 0 = theta_0 + theta_1 * test1 + theta_2 * test2), and draw the line between the points:
palette(c("tan3","purple4"))
plot(test2 ~ test1, col = as.factor(y), pch = 20, data=data,
main="College admissions")
(x = c(min(data[,2])-.2, max(data[,2])+ .2))
(y = c((-1/theta[3]) * (theta[2] * x + theta[1])))
lines(x, y, lwd=3, col=rgb(.7,0,.2,.5))
Perceptron weights are calculated so that when theta^T X > 0 the point is classified as positive, and when theta^T X < 0 it is classified as negative. This means the equation theta^T X = 0 is the decision boundary for the perceptron.
The same logic applies to logistic regression, except that the condition is now sigmoid(theta^T X) > 0.5, which is equivalent to theta^T X > 0.
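This also explains why radically different coefficient magnitudes can give the same line: only the direction of theta matters, because the boundary is theta^T X = 0. A sketch that draws the boundary directly from the implied intercept and slope, reusing theta and data from above:
## theta[1] + theta[2]*test1 + theta[3]*test2 = 0  =>  test2 = -(theta[1] + theta[2]*test1)/theta[3]
plot(data[, "test2"] ~ data[, "test1"], col = as.factor(data[, "y"]), pch = 20,
     xlab = "test1", ylab = "test2")
abline(a = -theta[1] / theta[3], b = -theta[2] / theta[3], lwd = 3, col = "purple4")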

calculate vector valued Hessian in R

I want to calculate a variance-covariance matrix of parameters. The parameters are obtained by a non-linear least squares fit.
library(minpack.lm)
library(numDeriv)
variables
t <- seq(0.1,20,0.3)
a <- 20
b <- 14
c <- 0.4
jitter <- rnorm(length(t),0,0.5)
Hobs <- a+b*exp(-c*t)+jitter
function def
Hhat <- function(parList, t) {parList$a + parList$b*exp(-parList$c*t)}
Hhatde <- function(par, t) {par[1] + par[2]*exp(-par[3]*t)}
residFun <- function(par, t, observed) observed - Hhat(par,t)
initial conditions
parStart = list(a = 20, b = 10 ,c = 0.5)
nls.lm
library(minpack.lm)
out1 <- nls.lm(par = parStart, fn = residFun, observed = Hobs,
t = t, control = nls.lm.control(nprint=0))
I wish to calculate manually what is given back via vcov(out1).
I tried it with the following, but sigma and vcov(out1) don't seem to be the same:
J <- jacobian(Hhatde, c(19.9508523, 14.6586555, 0.4066367), method = "Richardson",
              method.args = list(), t = t)
sigma <- solve((t(J)%*%J))
vcov(out1)
Now, trying to do it with the Hessian, I can't get it to work; I get the error message below.
hessian
H <- hessian(Hhatde, x = c(19.9508523,14.6586555,0.4066367 ), method="complex", method.args=list(),t=t)
Error in hessian.default(Hhatde, x = c(19.9508523, 14.6586555, 0.4066367), :
Richardson method for hessian assumes a scalar valued function.
How do I get my hessian() call to work?
I am not very strong on the math here, hence the trial and error approach.
vcov(out1) returns an estimate of the scaled variance-covariance matrix for the parameters in your model. The inverse of the cross-product of the Jacobian, solve(crossprod(J)), returns an estimate of the unscaled variance-covariance matrix. The scaling factor is the estimated variance of the errors. So, to calculate the scaled variance-covariance matrix (with some rounding error) using the Jacobian and the residuals from your model:
df = length(Hobs) - length(out1$par) # degrees freedom
se_var = sum(out1$fvec^2) / df # estimated error variance
var_cov = se_var * solve(crossprod(J)) # scaled variance-covariance
print(var_cov)
print(vcov(out1))
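The parameter standard errors are just the square roots of the diagonal, which gives a quick way to compare the manual computation with what vcov() implies. A short sketch:
sqrt(diag(var_cov))     ## standard errors from the manual computation
sqrt(diag(vcov(out1)))  ## should agree up to rounding error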
To brush up on non-linear regression and non-linear least squares, you might wish to check out Seber & Wild's Nonlinear regression, or Bates & Watts' Nonlinear regression analysis and its applications. John Fox also has a short online appendix that you may find helpful.
