Understanding where the link function is computed in Poisson regression - r

From Poisson Regression by hand this 'manual' Poisson coefficient function is provided:
LogLike <- function(y,x, par) {
beta <- par
# the deterministic part of the model:
lambda <- exp(beta%*%t(x))
# and here comes the negative log-likelihood of the whole dataset, given the
# model:
LL <- -sum(dpois(y, lambda, log = TRUE))
return(LL)
}
PoisMod<-function(formula, data){
# # definiere Regressionsformel
form <- formula(formula)
#
# # dataFrame wird erzeugt
model <- model.frame(formula, data = data)
#
# # Designmatrix erzeugt
x <- model.matrix(formula,data = data)
#
# # Response Variable erzeugt
y <- model.response(model)
par <- rep(0,ncol(x))
erg <- list(optim(par=par,fn=LogLike,x=x,y=y)$par)
return(erg)
}
PoisMod(breaks~wool+tension, as.data.frame(daten))
glm(breaks~wool+tension, family = "poisson", data = as.data.frame(daten))
Can any one tell me exactly where the link function is computed here? What would this code look like with an identity link function? I have basic understanding from YouTube videos etc, but no one explains the actual computation.
How would this code look like with offset and weights?

The GLM is specified as:
or equivalently as
In the case of the Poisson model, the canonical link is , so
You can see that in your code where lambda = exp(beta %*% t(x)) If you wanted to estimate a model with an identity link, you would use lambda = beta %*% t(x) because in that case
Addressing the follow-up questions in the comments, with weights, often these are used to weight the likelihood contribution for each observation. So, imagine you had a variable wt that you passed into your function LogLike(), you could modify it by multiplying the likelihood contribution by the weight:
LL <- -sum(dpois(y, lambda, log = TRUE)*wt)
An offset is just a variable whose coefficient is forced to be 1 (see this post for a discussion). Lets say you had a variable called offset that you wanted to include in your model. Again, you could pass that argument to LogLike() and you would need to modify the function by:
lambda <- exp(beta%*%t(x) + offset)
Following the example in the CrossValidated post I linked, offset = log(time). Which offset is correct is more of a theoretical or substantive matter.

Related

Can glmmLasso be used with the Tweedie distribution?

I have a linear mixed effects model and I am trying to do variable selection. The model is testing the level of forest degradation in 1000 sampled points. Most points have no degradation, and so the dependent variable is highly skewed with many zeros. Therefore, I am using the Tweedie distribution to fit the model. My main question is: can the Tweedie distribution actually be used in the glmmLasso function? My second question is: do I even need to use this distribution in glmmLasso()? Any help is much appreciated!
When I run the function with family = tweedie(var.power=1.2,link.power=0) I get the following error:
Error in logLik.glmmLasso(y = y, yhelp = yhelp, mu = mu, family = family, :
object 'loglik' not found
If I change the link.power from 0 to 1 (which I think is not correct for my model, but just for the sake of figuring out the problem), I get a different error:
Error in grad.lasso[b.is.0] <- score.beta[b.is.0] - lambda.b * sign(score.beta[b.is.0]) :
NAs are not allowed in subscripted assignments
Here tweedie comes from the statmod package. A simple example:
library(tweedie)
library(tidyverse)
library(glmmLasso)
library(statmod)
power <- 2
mu <- 1
phi <- seq(2, 8, by=0.1)
set.seed(10000)
y <- rtweedie( 100, mu=mu, power=power, phi=3)
x <- rnorm(100)
z <- c(rep(1, 50), rep(2,50))
df = as.data.frame(cbind(y,x,z))
df$z = as.factor(df$z)
f = y ~ x
varSelect = glmmLasso(fix = f, rnd = list(z=~1), data = df,
lambda = 5, family = tweedie(var.power=1.2,link.power=0))
I created a hacked version of glmmLasso that incorporates the Tweedie distribution as an option and put it on Github. I had to change two aspects of the code:
add a clause to compute the log-likelihood if family$family == "Tweedie"
in a number of places where the code was essentially if (family$family in list_of_families) ..., add "Tweedie" as an option.
remotes::install_github("bbolker/glmmLasso-bmb")
packageVersion("glmmLasso")
## [1] ‘1.6.2.9000’
Your example runs for me now, but I haven't checked at all to see if the results are sensible.

Quick way to calculate a confidence interval after changing dispersion parameter

I'm teaching a modeling class in R. The students are all SAS users, and I have to create course materials that exactly match (when possible) SAS output. I'm working on the Poisson regression section and trying to match PROC GENMOD, with a "dscale" option that modifies the dispersion index so that the deviance/df==1.
Easy enough to do, but I need confidence intervals. I'd like to show the students how to do it without hand calculating them. Something akin to confint_default() or confint()
Data
skin_cancer <- data.frame(CASES=c(1,16,30,71,102,130,133,40,4,38,
119,221,259,310,226,65),
CITY=c(rep(0,8),rep(1,8)),
N=c(172875, 123065,96216,92051,72159,54722,
32185,8328,181343,146207,121374,111353,
83004,55932,29007,7583),
agegp=c(1:8,1:8))
skin_cancer$ln_n = log(skin_cancer$N)
The model
fit <- glm(CASES ~ CITY, family="poisson", offset=ln_n, data=skin_cancer)
Changing the dispersion index
summary(fit, dispersion= deviance(fit) / df.residual(fit)))
That gets me the "correct" standard errors (correct according to SAS). But obviously I can't run confint() on a summary() object.
Any ideas? Bonus points if you can tell me how to change the dispersion index within the model so I don't have to do it within the summary() call.
Thanks.
This is an interesting question, and slightly deeper than it seems.
The simplest potential answer is to use family="quasipoisson" instead of poisson:
fitQ <- update(fit, family="quasipoisson")
confint(fitQ)
However, this won't let you adjust the dispersion to be whatever you want; it specifically changes the dispersion to the estimate R calculates in summary.glm, which is based on the Pearson chi-squared (sum of squared Pearson residuals) rather than the deviance, i.e.
sum((object$weights * object$residuals^2)[object$weights > 0])/df.r
You should be aware that stats:::confint.glm() (which actually uses MASS:::confint.glm) computes profile confidence intervals rather than Wald confidence intervals (i.e., this is not just a matter of adjusting the standard deviations).
If you're satisfied with Wald confidence intervals (which are generally less accurate) you could hack stats::confint.default() as follows (note that the dispersion title is a little bit misleading, as this function basically assumes that the original dispersion of the model is fixed to 1: this won't work as expected if you use a model that estimates dispersion).
confint_wald_glm <- function(object, parm, level=0.95, dispersion=NULL) {
cf <- coef(object)
pnames <- names(cf)
if (missing(parm))
parm <- pnames
else if (is.numeric(parm))
parm <- pnames[parm]
a <- (1 - level)/2
a <- c(a, 1 - a)
pct <- stats:::format.perc(a, 3)
fac <- qnorm(a)
ci <- array(NA, dim = c(length(parm), 2L), dimnames = list(parm,
pct))
ses <- sqrt(diag(vcov(object)))[parm]
if (!is.null(dispersion)) ses <- sqrt(dispersion)*ses
ci[] <- cf[parm] + ses %o% fac
ci
}
confint_wald_glm(fit)
confint_wald_glm(fit,dispersion=2)

Compute coefficients of a saturated model by hand

I want compute a null model,saturated model and a proposed model for a poisson regression by hand. For that i designed a loglikelihood function and optimize it with the optim function. It worked well for the null and the proposed model. For the computation of the coefficients of the saturatetd model i get an error : "Error in beta %*% t(x) : non-conformable arguments". I know what the error means (dimensions of the matrices doesn't fit) but i dont know how to fix it, maybe you can help.
data <- as.data.frame(warpbreaks)
# Function for loglikelihood
LogLike <- function(y,x, par) {
beta <- par
# the deterministic part of the model:
lambda <- exp(beta%*%t(x))
# and here comes the negative log-likelihood of the whole dataset, given the
# model:
LL <- -sum(dpois(y, lambda, log = TRUE))
return(LL)
}
formula <- breaks~wool+tension
form <- formula(formula)
# dataFrame
model <- model.frame(formula, data = data)
# Designmatrix for proposed modell
x <- model.matrix(formula,data = data)
# Response Variable
y <- model.response(model)
# modelMatrix for null Modell
x1 <- as.matrix(x[,1])
# Computation Koef nullmodell
par1 <- rep(0,1)
koef <- round(optim(par=par1,fn=LogLike,x=x1,y=y)$par,4)
koef
# Computation koef proposed Modell
par2 <- rep(0,ncol(x))
koef2 <- round(optim(par=par2,fn=LogLike,x=x,y=y)$par,4)
koef2
# Computation koef saturated Modell
par3<- rep(0,length(y))
koef3 <- round(optim(par=par3,fn=LogLike,x=x,y=y)$par,4)
koef3
A saturated model is a model that have a many parameters as data points. You need to build such a x matrix and everything should work. So far, x is a 54x4 matrix while it should be a 54x54.
--- Advice (FWIW)
Avoid using functions' names as variables (beta and par in your case).
Hope this helps.

Manual Perceptron example in R - are the results acceptable?

I am trying to get a perceptron algorithm for classification working but I think something is missing. This is the decision boundary achieved with logistic regression:
The red dots got into college, after performing better on tests 1 and 2.
This is the data, and this is the code for the logistic regression in R:
dat = read.csv("perceptron.txt", header=F)
colnames(dat) = c("test1","test2","y")
plot(test2 ~ test1, col = as.factor(y), pch = 20, data=dat)
fit = glm(y ~ test1 + test2, family = "binomial", data = dat)
coefs = coef(fit)
(x = c(min(dat[,1])-2, max(dat[,1])+2))
(y = c((-1/coefs[3]) * (coefs[2] * x + coefs[1])))
lines(x, y)
The code for the "manual" implementation of the perceptron is as follows:
# DATA PRE-PROCESSING:
dat = read.csv("perceptron.txt", header=F)
dat[,1:2] = apply(dat[,1:2], MARGIN = 2, FUN = function(x) scale(x)) # scaling the data
data = data.frame(rep(1,nrow(dat)), dat) # introducing the "bias" column
colnames(data) = c("bias","test1","test2","y")
data$y[data$y==0] = -1 # Turning 0/1 dependent variable into -1/1.
data = as.matrix(data) # Turning data.frame into matrix to avoid mmult problems.
# PERCEPTRON:
set.seed(62416)
no.iter = 1000 # Number of loops
theta = rnorm(ncol(data) - 1) # Starting a random vector of coefficients.
theta = theta/sqrt(sum(theta^2)) # Normalizing the vector.
h = theta %*% t(data[,1:3]) # Performing the first f(theta^T X)
for (i in 1:no.iter){ # We will recalculate 1,000 times
for (j in 1:nrow(data)){ # Each time we go through each example.
if(h[j] * data[j, 4] < 0){ # If the hypothesis disagrees with the sign of y,
theta = theta + (sign(data[j,4]) * data[j, 1:3]) # We + or - the example from theta.
}
else
theta = theta # Else we let it be.
}
h = theta %*% t(data[,1:3]) # Calculating h() after iteration.
}
theta # Final coefficients
mean(sign(h) == data[,4]) # Accuracy
With this, I get the following coefficients:
bias test1 test2
9.131054 19.095881 20.736352
and an accuracy of 88%, consistent with that calculated with the glm() logistic regression function: mean(sign(predict(fit))==data[,4]) of 89% - logically, there is no way of linearly classifying all of the points, as it is obvious from the plot above. In fact, iterating only 10 times and plotting the accuracy, a ~90% is reach after just 1 iteration:
Being in line with the training classification performance of logistic regression, it is likely that the code is not conceptually wrong.
QUESTIONS: Is it OK to get coefficients so different from the logistic regression:
(Intercept) test1 test2
1.718449 4.012903 3.743903
This is really more of a CrossValidated question than a StackOverflow question, but I'll go ahead and answer.
Yes, it's normal and expected to get very different coefficients because you can't directly compare the magnitude of the coefficients between these 2 techniques.
With the logit (logistic) model you're using a binomial distribution and logit-link based on a sigmoid cost function. The coefficients are only meaningful in this context. You've also got an intercept term in the logit.
None of this is true for the perceptron model. The interpretation of the coefficients are thus totally different.
Now, that's not saying anything about which model is better. There aren't comparable performance metrics in your question that would allow us to determine that. To determine that you should do cross-validation or at least use a holdout sample.

Modifying a curve to prevent singular gradient matrix at initial parameter estimates

I want to use y=a^(b^x) to fit the data below,
y <- c(1.0385, 1.0195, 1.0176, 1.0100, 1.0090, 1.0079, 1.0068, 1.0099, 1.0038)
x <- c(3,4,5,6,7,8,9,10,11)
data <- data.frame(x,y)
When I use the non-linear least squares procedure,
f <- function(x,a,b) {a^(b^x)}
(m <- nls(y ~ f(x,a,b), data = data, start = c(a=1, b=0.5)))
it produces an error: singular gradient matrix at initial parameter estimates. The result is roughly a = 1.1466, b = 0.6415, so there shouldn't be a problem with intial parameter estimates as I have defined them as a=1, b=0.5.
I have read in other topics that it is convenient to modify the curve. I was thinking about something like log y=log a *(b^x), but I don't know how to deal with function specification. Any idea?
I will expand my comment into an answer.
If I use the following:
y <- c(1.0385, 1.0195, 1.0176, 1.0100, 1.0090, 1.0079, 1.0068, 1.0099, 1.0038)
x <- c(3,4,5,6,7,8,9,10,11)
data <- data.frame(x,y)
f <- function(x,a,b) {a^b^x}
(m <- nls(y ~ f(x,a,b), data = data, start = c(a=0.9, b=0.6)))
or
(m <- nls(y ~ f(x,a,b), data = data, start = c(a=1.2, b=0.4)))
I obtain:
Nonlinear regression model
model: y ~ f(x, a, b)
data: data
a b
1.0934 0.7242
residual sum-of-squares: 0.0001006
Number of iterations to convergence: 10
Achieved convergence tolerance: 3.301e-06
I always obtain an error if I use 1 as a starting value for a, perhaps because 1 raised to anything is 1.
As for automatically generating starting values, I am not familiar with a procedure to do that. One method I have read about is to simulate curves and use starting values that generate a curve that appears to approximate your data.
Here is the plot generated using the above parameter estimates using the following code. I admit that maybe the lower right portion of the line could fit a little better:
setwd('c:/users/mmiller21/simple R programs/')
jpeg(filename = "nlr.plot.jpeg")
plot(x,y)
curve(1.0934^(0.7242^x), from=0, to=11, add=TRUE)
dev.off()

Resources