The input I'm giving to the GLM function is:
Where the family is Gamma and the link function is identity.
I'm trying to manually reproduce the coefficients from my model where one of them is for example:
Estimate Std. Error t value Pr(>|t|)
coefficient A 480.6062 195.2952 2.461 0.013902 *
I know the equation I need for coefficient A is:
$$\hat{\beta} = (X^T X)^{-1} X^T Y$$
Where y is my dependent variable and x is my independent variable.
In R I write this to produce $\hat{\beta}$:
# x transposed multiplied by x when both are matrices
xtx <- t(x) %*% x
# x transposed multiplied by y when both are matrices
xty <- t(x) %*% y
# we need to inverse xtx
xtxinv <- solve(xtx, tol=0)
# finally we multiply the inverse of xtx by xty to get betaHat
betaHat <- xtxinv %*% xty
# Result: betaHat = 148
When I complete this calculation manually, I get the coefficient that glm produces with the default Gaussian family (that is, without specifying a family), which looks like this:
So the question is: how do I tailor my manual calculation to the Gamma family with the identity link, instead of the Gaussian identity default used by glm in R?
The only two differences between my two runs of the glm function are:
providing the family (Gamma identity)
giving the model starting values (100 for each column in the dataframe)
I tried to recreate the function manually to get the coefficient (beta). When I didn't provide a family or starting values I got the correct answer, but when I gave Gamma as the family and identity as the link, with starting values, I got a very different coefficient.

For linear regression, which is fit by least squares, $\hat{\beta}$ is indeed $(X^T X)^{-1} X^T Y$. A generalized linear model, however, is fit by iteratively reweighted least squares (IRLS), which is an iterative algorithm, so there is no closed-form formula for $\hat{\beta}$. We can, however, compute the analogue of the hat matrix $H$ from linear regression. In linear regression, the hat matrix is $H = X (X^T X)^{-1} X^T$. In a generalized linear model, the analogue of the hat matrix is $H = W X (X^T W X)^{-1} X^T$, where $W = \mathrm{diag}(\mu'(X\beta))$. In both cases, $Hy$ gives the fitted values, $\hat{y}$. Here is code to demonstrate.
#' Test that the two parameterizations of Gamma are the same
curve(dgamma(x, 3, scale=3), xlim=c(0, 10))
grid <- seq(0, 10, length=1000)
d <- 1/grid/gamma(3)*(grid/(1/3)/9)^3*exp(-grid/3)
plot(grid, d, type='l')
#' Generate random variates according to GLM with
#' Y_i ~ Gamma(mean=mu,
#' squared coefficient of variation (variance over squared mean) = phi)
#' Y_i ~ Gamma(shape=alpha, scale=beta)
#' mu = alpha*beta
#' phi= 1/alpha
#' Let Beta = (3, 4)
X <- data.frame(x1=runif(1000, 0, 10))
mu = (3+4*X$x1)^(-1)
y <- numeric(1000) # preallocate the response vector
for (i in 1:1000) {
  alpha = 1/3
  beta = mu[i] * 3
  y[i]=rgamma(1, alpha, scale=beta)
}
#' Fit the model and compute the hat matrix, then the fitted values manually
mod <- glm(y ~ ., family=Gamma(), data=X)
x <- as.matrix(cbind(1, X))
W=diag(c(-(x%*%c(3, 4))^(-2)))
#Manual fitted values, H %*% y with H = W X (X'WX)^{-1} X'
head(W%*%x%*%solve(t(x)%*%W%*%x)%*%t(x)%*%y)
#Fitted values from model
head(fitted(mod))
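For the Gamma family with an identity link specifically, the iterative fit can be reproduced by hand. The following is a minimal sketch on simulated data, not the asker's data: the true coefficients (5 and 2), the shape parameter and the starting values are all assumptions made for illustration. With an identity link the IRLS working response reduces to y itself and the weights are 1/mu^2 (the reciprocal of the Gamma variance function), so each iteration is a weighted least-squares solve.

```r
# Minimal IRLS sketch for a Gamma GLM with identity link (simulated data).
set.seed(1)
n  <- 500
x1 <- runif(n, 1, 10)
mu_true <- 5 + 2 * x1                    # assumed true mean, always positive
y  <- rgamma(n, shape = 4, scale = mu_true / 4)
X  <- cbind(1, x1)                       # design matrix with an intercept column

beta <- c(mean(y), 0)                    # starting values (illustrative choice)
for (iter in 1:100) {
  mu <- drop(X %*% beta)                 # identity link: mu equals the linear predictor
  w  <- 1 / mu^2                         # IRLS weights, 1 / V(mu) with V(mu) = mu^2
  # With an identity link the working response is y itself,
  # so the update is (X'WX)^{-1} X'Wy.
  beta_new  <- drop(solve(t(X) %*% (w * X), t(X) %*% (w * y)))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

# Compare with glm() using the same family, link and starting values
fit <- glm(y ~ x1, family = Gamma(link = "identity"), start = c(mean(y), 0))
print(cbind(manual = beta, glm = coef(fit)))
```

The first pass uses constant weights, so it is exactly the ordinary least-squares solution; later passes reweight by the current fitted means, which is why the Gaussian and Gamma coefficients differ.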


Manually get the responses from a GLM with gamma distribution and a GLM with inverse Gaussian distribution

I've been trying to manually get the response values given by the predict.glm function from the stats package in R. However, I'm unable to do so. I only know how to manually get the value with a binomial distribution. I would really appreciate some help. I created two small models (one with Gamma family and one with inverse Gaussian family).
#Gamma distribution model
model_gam <- glm(Rape~Murder + Assault + UrbanPop, data=USArrests, family=Gamma)
responses_gam <- model_gam %>% predict(USArrests[1,], type="response")
#Trying to manually get responses for gamma model
paste(coef(model_gam), names(coef(model_gam)), sep="*", collapse="+")
# "0.108221470842499*(Intercept)+-0.00122165587689519*Murder+-9.47425665022909e-05*Assault+-0.000467789606041651*UrbanPop"
#Murder: 13.2, Assault: 236, UrbanPop: 58
x = 0.108221470842499 - 0.00122165587689519 * 13.2 - 9.47425665022909e-05 * 236 - 0.000467789606041651 * 58
# This is wrong. Do I have to include the dispersion? (which is 0.10609)
print (exp(x)/(1+exp(x)))
# result should be (from predict function): 26.02872
# exp(x)/(1+exp(x)) gives: 0.510649
# Gaussian distribution model
model_gaus <- glm(Rape~Murder + Assault + UrbanPop, data=USArrests, family=inverse.gaussian(link="log"))
responses_gaus <- model_gaus %>% predict(USArrests[1,], type="response")
#Trying to manually get responses for gaussian model
paste(coef(model_gaus), names(coef(model_gaus)), sep="*", collapse="+")
# "0.108221470842499*(Intercept)+-0.00122165587689519*Murder+-9.47425665022909e-05*Assault+-0.000467789606041651*UrbanPop"
x = 1.70049202188329-0.0326196928618521* 13.2 -0.00234379099421488*236-0.00991369000675323*58
# Dispersion in this case is 0.004390825
# result should be (from predict function): 26.02872
# exp(x)/(1+exp(x)) it is: 0.5353866
built-in predict()
predict(model_gaus)["Alabama"] ## 3.259201
by hand
cat(paste(round(coef(model_gaus),5), names(coef(model_gaus)), sep="*", collapse="+"),"\n")
## 1.70049*(Intercept)+0.03262*Murder+0.00234*Assault+0.00991*UrbanPop
## Murder Assault UrbanPop Rape
## Alabama 13.2 236 58 21.2
The value of the (Intercept) term is always 1, so we have
1.70049*1 + 0.03262*13.2 + 0.00234*236 + 0.00991*58
## [1] 3.258094
(close enough, since I rounded some things)
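You don't need to do anything with the dispersion. Note that predict() returns the linear predictor by default (type = "link"), so no inverse-link function is involved in the comparison above. Because model_gaus was fitted with link = "log", exponentiating gives the response-scale prediction: exp(3.259201) is about 26.0287, which matches the value from predict(..., type = "response").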
using the model matrix
Mathematically, the regression equation is defined as X %*% beta where beta is the vector of coefficients and X is the model matrix (for your example, it's a column of ones for the intercept plus your predictors; for models with categorical predictors or more complex terms like splines, it's a little more complicated). You can extract the model matrix from the fitted model with model.matrix():
Xg <- model.matrix(model_gaus)
drop(Xg["Alabama",] %*% coef(model_gaus))
For the Gamma model, you would use exactly the same procedure, but at the end you would transform the linear expression you computed (the linear predictor) by 1/x (the inverse link function for the Gamma). (Note that you need predict(..., type = "response") to get the inverse-transformed prediction; otherwise, with the default type = "link", R will give you just the plain linear expression.) If you used a log link instead you would exponentiate. More generally,
invlinkfun <- family(fitted_model)$linkinv
X <- model.matrix(fitted_model)
beta <- coef(fitted_model)
invlinkfun(X %*% beta)
The inverse Gaussian model uses a 1/mu^2 link by default; inverse.gaussian()$linkinv is function(eta) { 1/sqrt(eta) }
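As a check on the procedure above, here is a short sketch that applies it to both models from the question and compares the result with predict(..., type = "response"). The helper name manual_response and the object name model_ig are made up for this illustration; everything else is base R and the built-in USArrests data.

```r
# Gamma model with the default inverse link
model_gam <- glm(Rape ~ Murder + Assault + UrbanPop, data = USArrests,
                 family = Gamma)
# Inverse Gaussian model with a log link, as in the question
model_ig <- glm(Rape ~ Murder + Assault + UrbanPop, data = USArrests,
                family = inverse.gaussian(link = "log"))

# Hypothetical helper: linear predictor for one row, then the inverse link
manual_response <- function(fitted_model, row) {
  eta <- drop(model.matrix(fitted_model)[row, ] %*% coef(fitted_model))
  family(fitted_model)$linkinv(eta)
}

manual_gam  <- manual_response(model_gam, "Alabama")   # same as 1 / eta
manual_ig   <- manual_response(model_ig, "Alabama")    # same as exp(eta)
builtin_gam <- predict(model_gam, USArrests["Alabama", ], type = "response")
builtin_ig  <- predict(model_ig,  USArrests["Alabama", ], type = "response")

print(c(manual_gam = manual_gam, builtin_gam = unname(builtin_gam)))
print(c(manual_ig = manual_ig, builtin_ig = unname(builtin_ig)))
```

Neither calculation uses the dispersion; it affects standard errors, not the predicted mean.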

95% CI for survival linear combination (interaction) using vcov

I have this model:
where TD is a binary variable and Strata is a numeric variable taking values in {1,2,3}. I need to get 95% CIs for these two linear combinations:
I have this function to construct confidence intervals
pwp_gt_int <- coxph(Surv(tstart2,tstop2,status==1) ~ TD+ TD:strata(event))
mod_summ <- summary(pwp_gt_int)
coefs <- mod_summ$coefficients
X <- model.matrix(pwp_gt_int)
dof <- nrow(X) - ncol(X)
coefs_var <- vcov(pwp_gt_int)
halfCI <- qt(0.975, dof) * sqrt(diag(coefs_var))
matrix(c(coefs - halfCI, coefs + halfCI), nrow=3)
But I need something like this:
coefs[2] = coefs[1] + 2*coefs[2]
coefs[3] = coefs[1] + 3*coefs[3]
matrix(c(coefs - halfCI, coefs + halfCI), nrow=3)
But the CIs I got are not plausible; I think I'm not getting the variance-covariance matrix right for the linear combinations.
Please help.
It looks like you're asking for two different things: one is the variance of a linear combination, and the other is a confidence interval (and as such, a variance) for a non-linear combination. The linear combination is relatively easy. We know that the variance of a linear combination is:
$$V(Ab) = A \, V(b) \, A^T$$
where A is a matrix of constants and V(b) is the variance-covariance matrix of the random variables (in this case, the coefficients). If your coefficient vector has three values in it, and you want to do as you suggest in your last block of code, then you would define:
$$A = \begin{pmatrix} 1 & 2 & 0 \\ 1 & 0 & 3 \end{pmatrix}$$
or in R as:
A = matrix(c(1,1,2,0,0,3), ncol=3)
Then, you could make the linear combinations and their variances with:
b <- matrix(coef(pwp_gt_int)[1:3], ncol=1)
V <- vcov(pwp_gt_int)[1:3,1:3]
lincom <- A %*% b
v_lincom <- A %*% V %*% t(A)
sds <- sqrt(diag(v_lincom))
crit <- qt(.975, dof)
cis <- cbind(lincom - crit*sds, lincom + crit*sds)
That would be the confidence interval for the linear combination. The problem is that there isn't such an easy formula for the variance of a non-linear combination. Further, the confidence intervals may be asymmetric. One thing you could do is an end-point transformation, where you take lincom and cis and then exponentiate all of them. Another option would be a parametric bootstrap. Here's what that would look like.
B <- MASS::mvrnorm(2500, b, V)
nlcom <- exp(A %*% b)
nlsim <- exp(A %*% t(B))
nlcis <- apply(nlsim, 1, quantile, c(.025,.975))
Now, nlcis would have the confidence bounds for the non-linear combination. This should work given your data, but without the data to try it out, I'm not sure.
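Since the original data are not available, here is a self-contained sketch of the same linear-combination recipe on a model that can be fitted with built-in data. The helper lincom_ci and the Poisson model on mtcars are illustrative assumptions, not part of the question. The helper uses a normal critical value rather than qt(), which is another assumption; it is the usual large-sample choice for likelihood-based fits.

```r
# Hypothetical helper: estimate and CI for each row of A applied to b
lincom_ci <- function(b, V, A, level = 0.95) {
  est  <- drop(A %*% b)                    # the linear combinations A b
  se   <- sqrt(diag(A %*% V %*% t(A)))     # sqrt of the diagonal of A V(b) A'
  crit <- qnorm(1 - (1 - level) / 2)       # normal critical value (assumption)
  cbind(estimate = est, lower = est - crit * se, upper = est + crit * se)
}

# Stand-in model with three coefficients, fitted on built-in data
fit <- glm(carb ~ wt + hp, family = poisson, data = mtcars)
A <- rbind(c(0, 1, 0),      # the wt coefficient alone
           c(0, 1, 2))      # wt + 2 * hp
ci <- lincom_ci(coef(fit), vcov(fit), A)
print(ci)

# End-point transformation: exponentiate estimate and bounds together
print(exp(ci))
```

The first row of A picks out a single coefficient, so its interval can be checked against confint.default(fit), which uses the same normal approximation.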

mgcv: obtain predictive distribution of response given new data (negative binomial example)

In GAM (and GLM, for that matter), we're fitting a conditional likelihood model. So after fitting the model, for a new input x and response y, I should be able to compute the predictive probability or density of a specific value of y given x. I might want to do this to compare the fit of various models on validation data, for example. Is there a convenient way to do this with a fitted GAM in mgcv? Otherwise, how do I figure out the exact form of the density that is used so I can plug in the parameters appropriately?
As a specific example, consider a negative binomial GAM :
## From ?negbin
library(mgcv)
set.seed(3); n <- 400
dat <- gamSim(1,n=n)
g <- exp(dat$f/5)
## negative binomial data...
dat$y <- rnbinom(g,size=3,mu=g)
## fit with theta estimation...
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=nb(),data=dat)
And now I want to compute the predictive probability of, say, y=7, given x=(.1,.2,.3,.4).
Yes. mgcv is doing (empirical) Bayesian estimation, so you can obtain predictive distribution. For your example, here is how.
# prediction on the link (with standard error)
l <- predict(b, newdata = data.frame(x0 = 0.1, x1 = 0.2, x2 = 0.3, x3 = 0.4), se.fit = TRUE)
# Under central limit theory in GLM theory, link value is normally distributed
# for negative binomial with `log` link, the response is log-normal
p.mu <- function (mu) dlnorm(mu, l[[1]], l[[2]])
# joint density of `y` and `mu`
p.y.mu <- function (y, mu) dnbinom(y, size = 3, mu = mu) * p.mu(mu)
# marginal probability (not density as negative binomial is discrete) of `y` (integrating out `mu`)
# I have carefully written this function so it can take vector input
p.y <- function (y) {
  scalar.p.y <- function (scalar.y) integrate(p.y.mu, lower = 0, upper = Inf, y = scalar.y)[[1]]
  sapply(y, scalar.p.y)
}
Now since you want probability of y = 7, conditional on specified new data, use
p.y(7)
# 0.07810065
In general, this approach by numerical integration is not easy. For example, if another link function such as sqrt() is used for the negative binomial, the distribution of the response is not that straightforward (though also not difficult to derive).
Now I offer a sampling-based, or Monte Carlo, approach. This is most similar to a Bayesian procedure.
N <- 1000 # sample size
## draw N samples from posterior of `mu`
sample.mu <- b$family$linkinv(rnorm(N, l[[1]], l[[2]]))
## draw N samples from likelihood `Pr(y|mu)`
sample.y <- rnbinom(N, size = 3, mu = sample.mu)
## Monte Carlo estimation for `Pr(y = 7)`
mean(sample.y == 7)
# 0.076
Remark 1
Note that as empirical Bayes, all above methods are conditional on estimated smoothing parameters. If you want something like a "full Bayes", set unconditional = TRUE in predict().
Remark 2
Perhaps some people are assuming the solution as simple as this:
mu <- predict(b, newdata = data.frame(x0 = 0.1, x1 = 0.2, x2 = 0.3, x3 = 0.4), type = "response")
dnbinom(7, size = 3, mu = mu)
Such a result is conditional on the regression coefficients (assumed fixed, without uncertainty), so mu becomes fixed and not random. This is not the predictive distribution. The predictive distribution integrates out the uncertainty of model estimation.
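To make the difference between the plug-in probability and the predictive probability concrete without depending on mgcv, here is a small sketch using a Poisson glm() from base R on simulated data. The model, the sample size and the new point x = 0.9 are all assumptions chosen for illustration; the integration step mirrors the one above, but integrates over the link-scale value eta instead of mu.

```r
set.seed(42)
x <- runif(30)
y <- rpois(30, exp(0.5 + 1.5 * x))           # simulated counts (assumed model)
fit <- glm(y ~ x, family = poisson)

# prediction on the link scale, with its standard error
l <- predict(fit, newdata = data.frame(x = 0.9), se.fit = TRUE)

# plug-in probability: treats the fitted mean as known
plug_in <- function(k) dpois(k, exp(l$fit))

# predictive probability: averages dpois over eta ~ N(fit, se.fit^2)
predictive <- function(k) {
  sapply(k, function(kk) {
    integrate(function(eta) dpois(kk, exp(eta)) * dnorm(eta, l$fit, l$se.fit),
              lower = l$fit - 8 * l$se.fit,
              upper = l$fit + 8 * l$se.fit)$value
  })
}

print(cbind(k = 0:10, plug_in = plug_in(0:10), predictive = predictive(0:10)))
```

The predictive distribution is more spread out than the plug-in one, because it adds the uncertainty in the estimated mean on top of the Poisson variation.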

Manual Perceptron example in R - are the results acceptable?

I am trying to get a perceptron algorithm for classification working but I think something is missing. This is the decision boundary achieved with logistic regression:
The red dots got into college, after performing better on tests 1 and 2.
This is the data, and this is the code for the logistic regression in R:
dat = read.csv("perceptron.txt", header=F)
colnames(dat) = c("test1","test2","y")
plot(test2 ~ test1, col = as.factor(y), pch = 20, data=dat)
fit = glm(y ~ test1 + test2, family = "binomial", data = dat)
coefs = coef(fit)
(x = c(min(dat[,1])-2, max(dat[,1])+2))
(y = c((-1/coefs[3]) * (coefs[2] * x + coefs[1])))
lines(x, y)
The code for the "manual" implementation of the perceptron is as follows:
dat = read.csv("perceptron.txt", header=F)
dat[,1:2] = apply(dat[,1:2], MARGIN = 2, FUN = function(x) scale(x)) # scaling the data
data = data.frame(rep(1,nrow(dat)), dat) # introducing the "bias" column
colnames(data) = c("bias","test1","test2","y")
data$y[data$y==0] = -1 # Turning 0/1 dependent variable into -1/1.
data = as.matrix(data) # Turning data.frame into matrix to avoid mmult problems.
no.iter = 1000 # Number of loops
theta = rnorm(ncol(data) - 1) # Starting a random vector of coefficients.
theta = theta/sqrt(sum(theta^2)) # Normalizing the vector.
h = theta %*% t(data[,1:3]) # Performing the first f(theta^T X)
for (i in 1:no.iter){ # We will recalculate 1,000 times
  for (j in 1:nrow(data)){ # Each time we go through each example.
    if(h[j] * data[j, 4] < 0){ # If the hypothesis disagrees with the sign of y,
      theta = theta + (sign(data[j,4]) * data[j, 1:3]) # We + or - the example from theta.
    }
    else
      theta = theta # Else we let it be.
  }
  h = theta %*% t(data[,1:3]) # Calculating h() after iteration.
}
theta # Final coefficients
mean(sign(h) == data[,4]) # Accuracy
With this, I get the following coefficients:
bias test1 test2
9.131054 19.095881 20.736352
and an accuracy of 88%, consistent with the 89% obtained from the glm() logistic regression function via mean(sign(predict(fit))==data[,4]). Logically, there is no way of linearly classifying all of the points, as is obvious from the plot above. In fact, when iterating only 10 times and plotting the accuracy, ~90% is reached after just 1 iteration:
Being in line with the training classification performance of logistic regression, it is likely that the code is not conceptually wrong.
QUESTIONS: Is it OK to get coefficients so different from the logistic regression:
(Intercept) test1 test2
1.718449 4.012903 3.743903
This is really more of a CrossValidated question than a StackOverflow question, but I'll go ahead and answer.
Yes, it's normal and expected to get very different coefficients because you can't directly compare the magnitude of the coefficients between these 2 techniques.
With the logit (logistic) model you're using a binomial distribution and logit-link based on a sigmoid cost function. The coefficients are only meaningful in this context. You've also got an intercept term in the logit.
None of this is true for the perceptron model. The interpretation of the coefficients is thus totally different.
Now, that's not saying anything about which model is better. There aren't comparable performance metrics in your question that would allow us to determine that. To determine that you should do cross-validation or at least use a holdout sample.
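One thing that can be compared is the direction of the two coefficient vectors, because multiplying a perceptron's theta by any positive constant leaves every predicted sign unchanged. The sketch below uses the two coefficient vectors quoted in the question and assumes, which the question does not state, that both refer to the same standardized predictors. The random matrix X_demo is made up purely to show the scale invariance.

```r
# Coefficients as reported in the question
theta_perceptron <- c(bias = 9.131054, test1 = 19.095881, test2 = 20.736352)
theta_logistic   <- c(bias = 1.718449, test1 = 4.012903,  test2 = 3.743903)

# Compare directions only: rescale each vector to unit length
unit <- function(v) v / sqrt(sum(v^2))
cosine <- sum(unit(theta_perceptron) * unit(theta_logistic))
print(cosine)   # close to 1 means nearly the same decision boundary

# Scale invariance of the perceptron: signs do not change when theta is rescaled
set.seed(1)
X_demo <- cbind(1, matrix(rnorm(200), ncol = 2))   # made-up inputs with a bias column
same_labels <- all(sign(X_demo %*% theta_perceptron) ==
                   sign(X_demo %*% (10 * theta_perceptron)))
print(same_labels)
```

A cosine near 1 would suggest that the two methods found nearly the same separating line, even though the raw magnitudes look very different.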

calculate vector valued Hessian in R

I want to calculate a variance-covariance matrix of parameters. The parameters are obtained by a non-linear least squares fit.
t <- seq(0.1,20,0.3)
a <- 20
b <- 14
c <- 0.4
jitter <- rnorm(length(t),0,0.5)
Hobs <- a+b*exp(-c*t)+jitter
function def
Hhat <- function(parList, t) {parList$a + parList$b*exp(-parList$c*t)}
Hhatde <- function(par, t) {par[1] + par[2]*exp(-par[3]*t)}
residFun <- function(par, t, observed) observed - Hhat(par,t)
initial conditions
parStart = list(a = 20, b = 10 ,c = 0.5)
library(minpack.lm) # provides nls.lm() and nls.lm.control()
out1 <- nls.lm(par = parStart, fn = residFun, observed = Hobs,
t = t, control = nls.lm.control(nprint=0))
I wish to calculate manually what is given back via vcov(out1)
I tried the following, but sigma and vcov(out1) don't seem to be the same:
library(numDeriv) # provides jacobian() and hessian()
J <- jacobian(Hhatde, c(19.9508523,14.6586555,0.4066367 ), method="Richardson",
method.args=list(), t=t)
sigma <- solve((t(J)%*%J))
Now I am trying to do it with the Hessian, but I can't get it working; the error message is below:
H <- hessian(Hhatde, x = c(19.9508523,14.6586555,0.4066367 ), method="complex", method.args=list(),t=t)
Error in hessian.default(Hhatde, x = c(19.9508523, 14.6586555, 0.4066367), :
Richardson method for hessian assumes a scalar valued function.
How do I get my hessian() to work?
I am not very strong on the math here, hence the trial and error approach.
vcov(out1) returns an estimate of the scaled variance-covariance matrix for the parameters in your model. The inverse of the cross product of the gradient, solve(crossprod(J)) returns an estimate of the unscaled variance-covariance matrix. The scaling factor is the estimated variance of the errors. So to calculate the scaled variance-covariance matrix (with some rounding error) using the gradient and the residuals from your model:
df = length(Hobs) - length(out1$par) # degrees freedom
se_var = sum(out1$fvec^2) / df # estimated error variance
var_cov = se_var * solve(crossprod(J)) # scaled variance-covariance
To brush up on non-linear regression and non-linear least squares, you might wish to check out Seber & Wild's Nonlinear regression, or Bates & Watts' Nonlinear regression analysis and its applications. John Fox also has a short online appendix that you may find helpful.
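The scaling step can be checked end to end in base R. This sketch swaps nls() in for nls.lm() so that it runs without extra packages; the seed and the variable names tt and k (used instead of t and c to avoid masking base functions) are choices made for this illustration. The Jacobian is taken from the fitted object rather than computed numerically.

```r
set.seed(123)
tt   <- seq(0.1, 20, 0.3)
Hobs <- 20 + 14 * exp(-0.4 * tt) + rnorm(length(tt), 0, 0.5)

# Base-R nonlinear least squares fit of the same exponential decay model
fit <- nls(Hobs ~ a + b * exp(-k * tt), start = list(a = 20, b = 10, k = 0.5))

J      <- fit$m$gradient()                    # Jacobian of the fitted curve at the estimates
df     <- length(Hobs) - length(coef(fit))    # residual degrees of freedom
se_var <- sum(residuals(fit)^2) / df          # estimated error variance
manual <- se_var * solve(crossprod(J))        # scaled variance-covariance matrix

print(manual)
print(vcov(fit))
```

On the hessian() error in the question: as the error message says, numDeriv::hessian() expects a scalar-valued function, so it would have to be applied to something like the residual sum of squares rather than to the vector of fitted values.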
