How to interpret coefficients of logistic regression - r

I'm trying to figure out how the coefficients of logistic regression with a polynomial term relate to predictions. Specifically, I'm interested in the location on the x-axis where the prediction is highest. Example below:
set.seed(42)
# Setup some dummy data
x <- 1:200
y <- rep(0, length(x))
y[51:150] <- rbinom(100, 1, 0.5)
# Fit a model
family <- binomial()
model <- glm(y ~ poly(x, 2), family = family)
# Illustrate model
plot(x, y)
lines(x, family$linkinv(predict(model)), col = 2)
The model above gives me these coefficients:
coef(model)
#> (Intercept) poly(x, 2)1 poly(x, 2)2
#> -1.990317 -3.867855 -33.299893
Created on 2021-08-03 by the reprex package (v1.0.0)
The manual page for poly() states the following:
The orthogonal polynomial is summarized by the coefficients, which can be used to evaluate it via the three-term recursion given in Kennedy & Gentle (1980, pp. 343–4), and used in the predict part of the code.
However, I don't have access to the book, nor am I able to discern from the predict.glm S3 method how these coefficients are handled. Is there a way to reconstruct the location of the summit (around 100 in the example) from the coefficients alone (i.e. without using predict() to find the maximum)?

Derivation of the location of the predicted maximum from the theoretical expressions of the orthogonal polynomials
I got a copy of the "Statistical Computing" book by Kennedy and Gentle (1982) referenced in the documentation of poly and now share my findings about the calculation of the orthogonal polynomials, and how we can use them to find the location x of the maximum predicted value.
The orthogonal polynomials presented in the book (pp. 343-4) are monic (i.e. the highest order coefficient is always 1) and are obtained by the following recurrence procedure:
where q is the number of orthogonal polynomials considered.
Note the following relationship of the above terminology with the documentation of poly:
The "three-term recursion" appearing in the excerpt included in your question is the RHS of the third expression which has precisely three terms.
The rho(j+1) coefficients in the third expression are called "centering constants".
The gamma(j) coefficients in the third expression do not have a name in the documentation but they are directly related to the "normalization constants", as seen below.
For reference, here I paste the relevant excerpt of the "Value" section of the poly documentation:
A matrix with rows corresponding to points in x and columns corresponding to the degree, with attributes "degree" specifying the degrees of the columns and (unless raw = TRUE) "coefs" which contains the centering and normalization constants used in constructing the orthogonal polynomials
Going back to the recurrence, we can derive the values of parameters rho(j+1) and gamma(j) from the third expression by imposing the orthogonality condition on p(j+1) w.r.t. p(j) and p(j-1).
(It's important to note that the orthogonality condition is not an integral, but a summation on the n observed x points, so the polynomial coefficients depend on the data! --which is not the case for instance for the Tchebyshev orthogonal polynomials).
The expressions for the parameters become:
For the polynomials of orders 1 and 2 used in your regression, we get the following expressions, already written in R code:
# First we define the number of observations in the data
n = length(x)
# For p1(x):
# p1(x) = (x - rho1) p0(x) (since p_{-1}(x) = 0)
rho1 = mean(x)
# For p2(x)
# p2(x) = (x - rho2) p1(x) - gamma1
gamma1 = var(x) * (n-1)/n
rho2 = sum( x * (x - mean(x))^2 ) / (n*gamma1)
for which we get:
> c(rho1, rho2, gamma1)
[1] 100.50 100.50 3333.25
Note that coefs attribute of poly(x,2) is:
> attr(poly(x,2), "coefs")
$alpha
[1] 100.5 100.5
$norm2
[1] 1 200 666650 1777555560
where $alpha contains the centering constants, i.e. the rho values (which coincide with ours --incidentally all centering constants are equal to the average of x when the distribution of x is symmetric for any q! (observed and proved)), and $norm2 contains the normalization constants (in this case for p(-1,x), p(0,x), p(1,x), and p(2,x)), that is the constants c(j) that normalize the polynomials in the recurrence formula --by dividing them by sqrt(c(j))--, making the resulting polynomials r(j,x) satisfy sum_over_i{ r(j,x_i)^2 } = 1; note that r(j,x) are the polynomials stored in the object returned by poly().
From the expression already given above, we observe that gamma(j) is precisely the ratio of two consecutive normalization constants, namely: gamma(j) = c(j) / c(j-1).
We can check that our gamma1 value coincides with this ratio by computing:
gamma1 == attr(poly(x,2), "coefs")$norm2[3] / attr(poly(x,2), "coefs")$norm2[2]
which returns TRUE.
Going back to your problem of finding the maximum of the values predicted by your model, we can:
Express the predicted value as a function of r(1,x) and r(2,x) and the coefficients from the logistic regression, namely:
pred(x) = beta0 + beta1 * r(1,x) + beta2 * r(2,x)
Derive the expression w.r.t. x, set it to 0 and solve for x.
In R code:
# Get the normalization constants alpha(j) to obtain r(j,x) from p(j,x) as
# r(j,x) = p(j,x) / sqrt( norm(j) ) = p(j,x) / alpha(j)
alpha1 = sqrt( attr(poly(x,2), "coefs")$norm2[3] )
alpha2 = sqrt( attr(poly(x,2), "coefs")$norm2[4] )
# Get the logistic regression coefficients (beta1 and beta2)
coef1 = as.numeric( model$coeff["poly(x, 2)1"] )
coef2 = as.numeric( model$coeff["poly(x, 2)2"] )
# Compute the x at which the maximum occurs from the expression that is obtained
# by deriving the predicted expression pred(x) = beta0 + beta1*r(1,x) + beta2*r(2,x)
# w.r.t. x and setting the derivative to 0.
xmax = ( alpha2^-1 * coef2 * (rho1 + rho2) - alpha1^-1 * coef1 ) / (2 * alpha2^-1 * coef2)
which gives:
> xmax
[1] 97.501114
i.e. the same value obtained with the other "empirical" method described in my previous answer.
The full code to obtain the location x of the maximum of the predicted values, starting off from the code you provided, is:
# First we define the number of observations in the data
n = length(x)
# Parameters for p1(x):
# p1(x) = (x - rho1) p0(x) (since p_{-1}(x) = 0)
rho1 = mean(x)
# Parameters for p2(x)
# p2(x) = (x - rho2) p1(x) - gamma1
gamma1 = var(x) * (n-1)/n
rho2 = mean( x * (x - mean(x))^2 ) / gamma1
# Get the normalization constants alpha(j) to obtain r(j,x) from p(j,x) as
# r(j,x) = p(j,x) / sqrt( norm(j) ) = p(j,x) / alpha(j)
alpha1 = sqrt( attr(poly(x,2), "coefs")$norm2[3] )
alpha2 = sqrt( attr(poly(x,2), "coefs")$norm2[4] )
# Get the logistic regression coefficients (beta1 and beta2)
coef1 = as.numeric( model$coeff["poly(x, 2)1"] )
coef2 = as.numeric( model$coeff["poly(x, 2)2"] )
# Compute the x at which the maximum occurs from the expression that is obtained
# by deriving the predicted expression pred(x) = beta0 + beta1*r(1,x) + beta2*r(2,x)
# w.r.t. x and setting the derivative to 0.
( xmax = ( alpha2^-1 * coef2 * (rho1 + rho2) - alpha1^-1 * coef1 ) / (2 * alpha2^-1 * coef2) )

Assuming you want to find the maximum of the prediction analytically for this particular case where the orthogonal polynomials are of order 1 and 2, I propose the following approach:
SUMMARY
1) Infer the polynomial coefficients
This can easily be done by fitting a linear model to the respective polynomial values contained in the model matrix.
2) Derive the prediction expression w.r.t. x and set the derivative to 0
Solve for x in the prediction expression inferred from the polynomial fit in (1) and obtain the value of x at which the prediction's maximum occurs.
DETAILS
1) Polynomial coefficients
Following from the line where you fit the GLM model, we estimate the coefficients for the polynomial of order 1, p1(x) = a0 + a1*x, and the coefficients for the polynomial of order 2, p2(x) = b0 + b1*x + b2*x^2:
X = model.matrix(model)
p1 = X[, "poly(x, 2)1"]
p2 = X[, "poly(x, 2)2"]
p1.lm = lm(p1 ~ x)
a0 = p1.lm$coeff["(Intercept)"]
a1 = p1.lm$coeff["x"]
p2.lm = lm(p2 ~ x + I(x^2))
b0 = p2.lm$coeff["(Intercept)"]
b1 = p2.lm$coeff["x"]
b2 = p2.lm$coeff["I(x^2)"]
This gives:
> c(a0, a1, b0, b1, b2)
(Intercept) x (Intercept) x I(x^2)
-1.2308840e-01 1.2247602e-03 1.6050353e-01 -4.7674315e-03 2.3718565e-05
2) Derivative of the prediction to find the maximum
The expression for the prediction, z, (before the inverse link function) is:
z = Intercept + coef1 * p1(x) + coef2 * p2(x)
We derive this expression and set it to 0 to obtain:
coef1 * a1 + coef2 * (b1 + 2 * b2 * xmax) = 0
Solving for xmax we get:
xmax = - (coef1 * a1 + coef2 * b1) / (2 * coef2 * b2)
In R code, this is computed as:
coef1 = as.numeric( model$coeff["poly(x, 2)1"] )
coef2 = as.numeric( model$coeff["poly(x, 2)2"] )
(xmax = - ( coef1 * a1 + coef2 * b1 ) / (2 * coef2 * b2))
which gives:
x
97.501114
CHECK
We can verify the maximum by adding it to the prediction's curve as a green cross:
# Prediction curve computed analytically
Intercept = model$coeff["(Intercept)"]
pred.analytical = family$linkinv( Intercept + coef1 * p1 + coef2 * p2 )
# Find the prediction's maximum analytically
pred.max = family$linkinv( Intercept + coef1 * (a0 + a1 * xmax) +
coef2 * (b0 + b1 * xmax + b2 * xmax^2) )
# Plot
plot(x, y)
# The following two lines should coincide!
lines(x, pred.analytical, col = 3)
lines(x, family$linkinv(predict(model)), col = 2)
# Location of the maximum!
points(xmax, pred.max, pch="x", col="green")
which gives:

Related

Different R-squared for equivalent models

Why do I get different values of R-squared for these two models, which should be equivalent (in the second model, the intercept term is replaced by a level of z)? Is this a bug or am I missing something?
set.seed(42)
N=100
# intercepts
iA = 3
iB = 3.5
# slopes
sA = 1.5
sB = 0.5
# xs
xA = runif(0,1, n=N)
xB = runif(0,1, n=N)
# ys
yA = sA*xA + iA + rnorm(n=N)/10
yB = sB*xB + iB + rnorm(n=N)/10
data = data.frame(x=c(xA, xB), y=c(yA, yB), z=c(rep("A", times=N), rep("B", times=N)))
lm1 = lm(data=data, formula = y ~ x + z)
lm2 = lm(data=data, formula = y ~ x + z -1)
coef(lm1)
coef(lm2)
summary(lm1)$r.squared
summary(lm2)$r.squared
Output:
> coef(lm1)
(Intercept) x zB
3.23590275 1.03353472 -0.01435266
> coef(lm2)
x zA zB
1.033535 3.235903 3.221550
>
> summary(lm1)$r.squared
[1] 0.7552991
> summary(lm2)$r.squared
[1] 0.9979477
For models with an intercept summary.lm calculates an R^2 based on comparing the model to the intercept only model. For a model without an intercept that does not make sense so it compares it to the zero model. Of course in your example the intercept is actually a linear combination of columns in the model matrix, i.e. all(model.matrix(lm2) %*% c(0, 1, 1) == 1) is TRUE so it would be possible to write software to check for whether the intercept is a submodel of the full model but as it is it only looks to see if the model formula specifies an intercept or not.
In terms of calculations for models with an intercept summary.lm uses the equivalent of
1 - sum(resid(lm1)^2) / sum((data$y - mean(data$y))^2)
## [1] 0.7552991
1 - sum(resid(lm2)^2) / sum((data$y - mean(data$y))^2)
## [1] 0.7552991
but for models without an intercept summary.lm drops the mean term
1 - sum(resid(lm2)^2) / sum(data$y^2)
## [1] 0.9979477
You can compare these to
summary(lm1)$r.squared
## [1] 0.7552991
summary(lm2)$r.squared
## [1] 0.9979477
See ?summary.lm where this is mentioned.
From help("summary.lm") (emphasis added):
R², the ‘fraction of variance explained by the model’,
R^2 = 1 - \frac{\sum_i{R_i^2}}{\sum_i(y_i- y^*)^2},
where y^* is the mean of y_i
if there is an intercept and zero otherwise.
If you remove the intercept, R² is defined differently (which is sensible from the perspective of a statistician).

Better optimizer for constrained multinomial likelihood

Using R, I wish to estimate a vector of parameters a_i (of arbitrary length, i.e. i = 1,...,s) with a multinomial likelihood using a corresponding vector of observations n_i totaling a sample size of N=sum_i (n_i). The probabilities p_i of the multinomial are determined by said a parameters and measurements of variable x such that p_i = (a_i * x_i)/sum_i (a_i * x_i). I wish further to impose the constraint that sum_i a_i = 1.
I've managed to get optim() to do the job as follows --- implementing the two tricks I've seen of estimating the first a_1 as 1 - sum_{i=2} a_i and additionally renormalizing all estimates to 1 --- but the accuracy and dependability of achieving convergence remains rather variable (in addition to being sensitive to the vector of starting estimates I provide), even when N is very large.
I would appreciate guidance on more robust alternatives and/or improvements.
s <- 10 # vector length
N <- 1000 # total sample size
# variable
x_i <- round(rlnorm(n_p, 2.5, 1.5))
# true parameter values
a_i <- rbeta(s, 2, 2)
a_i <- a_i / sum(a_i)
# generate observations
n_i <- rmultinom(1, N, (a_i * x_i) / sum(a_i * x_i))
# negative log-likelihood for parameters `par'
nll = function(par) {
if (any(0 > par | par > 1)) {
return(NA)
}else{
par <- c(1 - sum(par), par) # estimate first as remainder
par <- par / sum(par) # normalize
p_i <- (par * x_i) / sum(par * x_i) # model for probabilities
- sum(dmultinom(
x = n_i,
size = N,
prob = p_i,
log = TRUE
)) }
}
# starting values (dropping first)
start = rep(1/s, s-1)
fit <- optim(par = start,
fn = nll,
control = list(maxit = 10000)
)
ests = c(1 - sum(fit$par), fit$par)
cbind(a_i, ests)
par(pty = 's')
plot(a_i, ests)
abline(0, 1)

Adding self starting values to an nls regression in R

I have existing code for fitting a sigmoid curve to data in R. How can I used selfstart (or another method) to automatically find start values for the regression?
sigmoid = function(params, x) {
params[1] / (1 + exp(-params[2] * (x - params[3])))
}
dataset = data.frame("x" = 1:53, "y" =c(0,0,0,0,0,0,0,0,0,0,0,0,0,0.1,0.18,0.18,0.18,0.33,0.33,0.33,0.33,0.41,0.41,0.41,0.41,0.41,0.41,0.5,0.5,0.5,0.5,0.68,0.58,0.58,0.68,0.83,0.83,0.83,0.74,0.74,0.74,0.83,0.83,0.9,0.9,0.9,1,1,1,1,1,1,1) )
x = dataset$x
y = dataset$y
# fitting code
fitmodel <- nls(y~a/(1 + exp(-b * (x-c))), start=list(a=1,b=.5,c=25))
# visualization code
# get the coefficients using the coef function
params=coef(fitmodel)
y2 <- sigmoid(params,x)
plot(y2,type="l")
points(y)
This is a common (and interesting) problem in non-linear curve fitting.
Background
We can find sensible starting values if we take a closer look at the function sigmoid
We first note that
So for large values of x, the function approaches a. In other words, as a starting value for a we may choose the value of y for the largest value of x.
In R language, this translates to y[which.max(x)].
Now that we have a starting value for a, we need to decide on starting values for b and c. To do that, we can make use of the geometric series
and expand f(x) = y by keeping only the first two terms
We now set a = 1 (our starting value for a), re-arrange the equation and take the logarithm on both sides
We can now fit a linear model of the form log(1 - y) ~ x to obtain estimates for the slope and offset, which in turn provide the starting values for b and c.
R implementation
Let's define a function that takes as an argument the values x and y and returns a list of parameter starting values
start_val_sigmoid <- function(x, y) {
fit <- lm(log(y[which.max(x)] - y + 1e-6) ~ x)
list(
a = y[which.max(x)],
b = unname(-coef(fit)[2]),
c = unname(-coef(fit)[1] / coef(fit)[2]))
}
Based on the data for x and y you give, we obtain the following starting values
start_val_sigmoid(x, y)
#$a
#[1] 1
#
#$b
#[1] 0.2027444
#
#$c
#[1] 15.01613
Since start_val_sigmoid returns a list we can use its output directly as the start argument in nls
nls(y ~ a / ( 1 + exp(-b * (x - c))), start = start_val_sigmoid(x, y))
#Nonlinear regression model
# model: y ~ a/(1 + exp(-b * (x - c)))
# data: parent.frame()
# a b c
# 1.0395 0.1254 29.1725
# residual sum-of-squares: 0.2119
#
#Number of iterations to convergence: 9
#Achieved convergence tolerance: 9.373e-06
Sample data
dataset = data.frame("x" = 1:53, "y" =c(0,0,0,0,0,0,0,0,0,0,0,0,0,0.1,0.18,0.18,0.18,0.33,0.33,0.33,0.33,0.41,0.41,0.41,0.41,0.41,0.41,0.5,0.5,0.5,0.5,0.68,0.58,0.58,0.68,0.83,0.83,0.83,0.74,0.74,0.74,0.83,0.83,0.9,0.9,0.9,1,1,1,1,1,1,1) )
x = dataset$x
y = dataset$y

Predicting data from a power curve manually

I have a series of data I have fit a power curve to, and I use the predict function in R to allow me predict y values based on additional x values.
set.seed(1485)
len <- 24
x <- runif(len)
y <- x^3 + rnorm(len, 0, 0.06)
ds <- data.frame(x = x, y = y)
mydata=data.frame(x,y)
z <- nls(y ~ a * x^b, data = mydata, start = list(a=1, b=1))
#z is same as M!
power <- round(summary(z)$coefficients[1], 3)
power.se <- round(summary(z)$coefficients[2], 3)
plot(y ~ x, main = "Fitted power model", sub = "Blue: fit; green: known")
s <- seq(0, 1, length = 100)
lines(s, s^3, lty = 2, col = "green")
lines(s, predict(z, list(x = s)), lty = 1, col = "blue")
text(0, 0.5, paste("y =x^ (", power, " +/- ", power.se,")", sep = ""), pos = 4)
Instead of using the predict function here, how could I manually calculate estimated y values based on additional x values based on this power function. If this were just a simple linear regression, I would calculate the slope and y intercept and calculate my y values by
y= mx + b
Is there a similar equation I can use from the output of z that will allow me to estimate y values from additional x values?
> z
Nonlinear regression model
model: y ~ a * x^b
data: mydata
a b
1.026 3.201
residual sum-of-squares: 0.07525
Number of iterations to convergence: 5
Achieved convergence tolerance: 5.162e-06
You would do it the same way except you use the power equation you modeled. You can access the parameters the model calculated using z$m$getPars()
Here is a simple example to illustrate:
predict(z, list(x = 1))
Results in: 1.026125
Which equals the results of
z$m$getPars()["a"] * 1 ^ z$m$getPars()["b"]
Which is equivalet to y = a * x^b
Here are some ways.
1) with This evaluates the formula with respect to the coefficients:
x <- 1:2 # input
with(as.list(coef(z)), a * x^b)
## [1] 1.026125 9.437504
2) attach We could also use attach although it is generally frowned upon:
attach(as.list(coef(z)))
a * x^b
## [1] 1.026125 9.437504
3) explicit Explicit definition:
a <- coef(z)[["a"]]; b <- coef(z)[["b"]]
a * x^b
## [1] 1.026125 9.437504
4) eval This one extracts the formula from z so that we don't have to specify it again. formula(z)[[3]] is the right hand side of the formula used to produce z. Use of eval is sometimes frowned upon but this does avoid
the redundant specification of the formula.
eval(formula(z)[[3]], as.list(coef(z)))
## [1] 1.026125 9.437504

How `poly()` generates orthogonal polynomials? How to understand the "coefs" returned?

My understanding of orthogonal polynomials is that they take the form
y(x) = a1 + a2(x - c1) + a3(x - c2)(x - c3) + a4(x - c4)(x - c5)(x - c6)... up to the number of terms desired
where a1, a2 etc are coefficients to each orthogonal term (vary between fits), and c1, c2 etc are coefficients within the orthogonal terms, determined such that the terms maintain orthogonality (consistent between fits using the same x values)
I understand poly() is used to fit orthogonal polynomials. An example
x = c(1.160, 1.143, 1.126, 1.109, 1.079, 1.053, 1.040, 1.027, 1.015, 1.004, 0.994, 0.985, 0.977) # abscissae not equally spaced
y = c(1.217395, 1.604360, 2.834947, 4.585687, 8.770932, 9.996260, 9.264800, 9.155079, 7.949278, 7.317690, 6.377519, 6.409620, 6.643426)
# construct the orthogonal polynomial
orth_poly <- poly(x, degree = 5)
# fit y to orthogonal polynomial
model <- lm(y ~ orth_poly)
I would like to extract both the coefficients a1, a2 etc, as well as the orthogonal coefficients c1, c2 etc. I'm not sure how to do this. My guess is that
model$coefficients
returns the first set of coefficients, but I'm struggling with how to extract the others. Perhaps within
attributes(orth_poly)$coefs
?
Many thanks.
I have just realized that there was a closely related question Extracting orthogonal polynomial coefficients from R's poly() function? 2 years ago. The answer there is merely explaining what predict.poly does, but my answer gives a complete picture.
Section 1: How does poly represent orthogonal polynomials
My understanding of orthogonal polynomials is that they take the form
y(x) = a1 + a2(x - c1) + a3(x - c2)(x - c3) + a4(x - c4)(x - c5)(x - c6)... up to the number of terms desired
No no, there is no such clean form. poly() generates monic orthogonal polynomials which can be represented by the following recursion algorithm. This is how predict.poly generates linear predictor matrix. Surprisingly, poly itself does not use such recursion but use a brutal force: QR factorization of model matrix of ordinary polynomials for orthogonal span. However, this is equivalent to the recursion.
Section 2: Explanation of the output of poly()
Let's consider an example. Take the x in your post,
X <- poly(x, degree = 5)
# 1 2 3 4 5
# [1,] 0.484259711 0.48436462 0.48074040 0.351250507 0.25411350
# [2,] 0.406027697 0.20038942 -0.06236564 -0.303377083 -0.46801416
# [3,] 0.327795682 -0.02660187 -0.34049024 -0.338222850 -0.11788140
# ... ... ... ... ... ...
#[12,] -0.321069852 0.28705108 -0.15397819 -0.006975615 0.16978124
#[13,] -0.357884918 0.42236400 -0.40180712 0.398738364 -0.34115435
#attr(,"coefs")
#attr(,"coefs")$alpha
#[1] 1.054769 1.078794 1.063917 1.075700 1.063079
#
#attr(,"coefs")$norm2
#[1] 1.000000e+00 1.300000e+01 4.722031e-02 1.028848e-04 2.550358e-07
#[6] 5.567156e-10 1.156628e-12
Here is what those attributes are:
alpha[1] gives the x_bar = mean(x), i.e., the centre;
alpha - alpha[1] gives alpha0, alpha1, ..., alpha4 (alpha5 is computed but dropped before poly returns X, as it won't be used in predict.poly);
The first value of norm2 is always 1. The second to the last are l0, l1, ..., l5, giving the squared column norm of X; l0 is the column squared norm of the dropped P0(x - x_bar), which is always n (i.e., length(x)); while the first 1 is just padded in order for the recursion to proceed inside predict.poly.
beta0, beta1, beta2, ..., beta_5 are not returned, but can be computed by norm2[-1] / norm2[-length(norm2)].
Section 3: Implementing poly using both QR factorization and recursion algorithm
As mentioned earlier, poly does not use recursion, while predict.poly does. Personally I don't understand the logic / reason behind such inconsistent design. Here I would offer a function my_poly written myself that uses recursion to generate the matrix, if QR = FALSE. When QR = TRUE, it is a similar but not identical implementation poly. The code is very well commented, helpful for you to understand both methods.
## return a model matrix for data `x`
my_poly <- function (x, degree = 1, QR = TRUE) {
## check feasibility
if (length(unique(x)) < degree)
stop("insufficient unique data points for specified degree!")
## centring covariates (so that `x` is orthogonal to intercept)
centre <- mean(x)
x <- x - centre
if (QR) {
## QR factorization of design matrix of ordinary polynomial
QR <- qr(outer(x, 0:degree, "^"))
## X <- qr.Q(QR) * rep(diag(QR$qr), each = length(x))
## i.e., column rescaling of Q factor by `diag(R)`
## also drop the intercept
X <- qr.qy(QR, diag(diag(QR$qr), length(x), degree + 1))[, -1, drop = FALSE]
## now columns of `X` are orthorgonal to each other
## i.e., `crossprod(X)` is diagonal
X2 <- X * X
norm2 <- colSums(X * X) ## squared L2 norm
alpha <- drop(crossprod(X2, x)) / norm2
beta <- norm2 / (c(length(x), norm2[-degree]))
colnames(X) <- 1:degree
}
else {
beta <- alpha <- norm2 <- numeric(degree)
## repeat first polynomial `x` on all columns to initialize design matrix X
X <- matrix(x, nrow = length(x), ncol = degree, dimnames = list(NULL, 1:degree))
## compute alpha[1] and beta[1]
norm2[1] <- new_norm <- drop(crossprod(x))
alpha[1] <- sum(x ^ 3) / new_norm
beta[1] <- new_norm / length(x)
if (degree > 1L) {
old_norm <- new_norm
## second polynomial
X[, 2] <- Xi <- (x - alpha[1]) * X[, 1] - beta[1]
norm2[2] <- new_norm <- drop(crossprod(Xi))
alpha[2] <- drop(crossprod(Xi * Xi, x)) / new_norm
beta[2] <- new_norm / old_norm
old_norm <- new_norm
## further polynomials obtained from recursion
i <- 3
while (i <= degree) {
X[, i] <- Xi <- (x - alpha[i - 1]) * X[, i - 1] - beta[i - 1] * X[, i - 2]
norm2[i] <- new_norm <- drop(crossprod(Xi))
alpha[i] <- drop(crossprod(Xi * Xi, x)) / new_norm
beta[i] <- new_norm / old_norm
old_norm <- new_norm
i <- i + 1
}
}
}
## column rescaling so that `crossprod(X)` is an identity matrix
scale <- sqrt(norm2)
X <- X * rep(1 / scale, each = length(x))
## add attributes and return
attr(X, "coefs") <- list(centre = centre, scale = scale, alpha = alpha[-degree], beta = beta[-degree])
X
}
Section 4: Explanation of the output of my_poly
X <- my_poly(x, 5, FALSE)
The resulting matrix is as same as what is generated by poly hence left out. The attributes are not the same.
#attr(,"coefs")
#attr(,"coefs")$centre
#[1] 1.054769
#attr(,"coefs")$scale
#[1] 2.173023e-01 1.014321e-02 5.050106e-04 2.359482e-05 1.075466e-06
#attr(,"coefs")$alpha
#[1] 0.024025005 0.009147498 0.020930616 0.008309835
#attr(,"coefs")$beta
#[1] 0.003632331 0.002178825 0.002478848 0.002182892
my_poly returns construction information more apparently:
centre gives x_bar = mean(x);
scale gives column norms (the square root of norm2 returned by poly);
alpha gives alpha1, alpha2, alpha3, alpha4;
beta gives beta1, beta2, beta3, beta4.
Section 5: Prediction routine for my_poly
Since my_poly returns different attributes, stats:::predict.poly is not compatible with my_poly. Here is the appropriate routine my_predict_poly:
## return a linear predictor matrix, given a model matrix `X` and new data `x`
my_predict_poly <- function (X, x) {
## extract construction info
coefs <- attr(X, "coefs")
centre <- coefs$centre
alpha <- coefs$alpha
beta <- coefs$beta
degree <- ncol(X)
## centring `x`
x <- x - coefs$centre
## repeat first polynomial `x` on all columns to initialize design matrix X
X <- matrix(x, length(x), degree, dimnames = list(NULL, 1:degree))
if (degree > 1L) {
## second polynomial
X[, 2] <- (x - alpha[1]) * X[, 1] - beta[1]
## further polynomials obtained from recursion
i <- 3
while (i <= degree) {
X[, i] <- (x - alpha[i - 1]) * X[, i - 1] - beta[i - 1] * X[, i - 2]
i <- i + 1
}
}
## column rescaling so that `crossprod(X)` is an identity matrix
X * rep(1 / coefs$scale, each = length(x))
}
Consider an example:
set.seed(0); x1 <- runif(5, min(x), max(x))
and
stats:::predict.poly(poly(x, 5), x1)
my_predict_poly(my_poly(x, 5, FALSE), x1)
give exactly the same result predictor matrix:
# 1 2 3 4 5
#[1,] 0.39726381 0.1721267 -0.10562568 -0.3312680 -0.4587345
#[2,] -0.13428822 -0.2050351 0.28374304 -0.0858400 -0.2202396
#[3,] -0.04450277 -0.3259792 0.16493099 0.2393501 -0.2634766
#[4,] 0.12454047 -0.3499992 -0.24270235 0.3411163 0.3891214
#[5,] 0.40695739 0.2034296 -0.05758283 -0.2999763 -0.4682834
Be aware that prediction routine simply takes the existing construction information rather than reconstructing polynomials.
Section 6: Just treat poly and predict.poly as a black box
There is rarely the need to understand everything inside. For statistical modelling it is sufficient to know that poly constructs polynomial basis for model fitting, whose coefficients can be found in lmObject$coefficients. When making prediction, predict.poly never needs be called by user since predict.lm will do it for you. In this way, it is absolutely OK to just treat poly and predict.poly as a black box.

Resources