Lack of convergence of glmnet when lambda=0 for family="poisson" - r

While getting a handle on glmnet versus glm, I ran into convergence problems for lambda=0 and family="poisson". My understanding is that with lambda=0 (and alpha=1, the default), the answers should be essentially the same.
Below is code changed slightly from the poisson example on the glmnet help page (?glmnet). The only change is that nzc = p so that all variables are in the true model
library(glmnet)
N=1000; p=50
nzc=p
x=matrix(rnorm(N*p),N,p)
beta=rnorm(nzc)
f = x[,seq(nzc)]%*%beta
mu=exp(f)
y=rpois(N,mu)
#With lambda=0 glmnet throws the convergence error shown below
fit=glmnet(x,y,family="poisson",lambda=0)
#It works with default lambda passed in
# but estimates are quite different from glm.
fit=glmnet(x,y,family="poisson") #use default lambdas
fit2=glm(y~x,family="poisson")
plot(coef(fit2)[2:(p+1)],
coef(fit,s=min(fit$lambda))[2:(p+1)],
xlab="glm",ylab="glmnet")
abline(0,1)
#works fine with gaussian response and lambda=0 or default lambda
#glm and glmnet identical
mu = f
y=rnorm(N,mu)
fit=glmnet(x,y,family="gaussian",lambda=0)
fit2=glm(y~x)
plot(coef(fit2)[2:(p+1)], coef(fit)[2:(p+1)])
abline(0,1)
Here's the error message
Warning messages:
1: from glmnet Fortran code (error code -1); Convergence for 1th lambda value not reached after maxit=100000 iterations; solutions for larger lambdas returned
2: In getcoef(fit, nvars, nx, vnames) :an empty model has been returned; probably a convergence issue
Updated:
The problem seems to be with the intercept being estimated by glmnet when family="poisson" and not related to the setting of lambda per se.
fit=glmnet(x,y,family="poisson")
#intercept should be close to 0
coef(fit)[1,]
#but it is huge
#passing in intercept=FALSE however generates the convergence error again
fit=glmnet(x,y,family="poisson", intercept=FALSE)

I think you are confused about lambda and alpha. alpha is the elastic-net mixing parameter: setting it to 0 gives ridge regression, and setting it to 1 (the default) gives the lasso. Typically it is set to something between 0.1 and 1. lambda is typically not set, and there is a warning on the help page NOT to set it to a single value:
WARNING: use with care. Do not supply a single value for lambda
I don't know why you think a lasso penalty should be the same as an unpenalized Poisson model. The whole point of a penalized model is to be less subject to the biases and constraints of an ordinary regression model.
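For reference, the glmnet documentation writes the elastic-net penalty as lambda * ((1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1). A minimal base-R sketch of that expression (enet_penalty is a hypothetical helper written for illustration, not a glmnet function) shows that alpha only mixes the ridge and lasso terms, while lambda = 0 removes the penalty whatever alpha is:

```r
# Elastic-net penalty as written in the glmnet documentation.
# enet_penalty is a made-up helper for illustration only.
enet_penalty <- function(beta, lambda, alpha) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(1, -2, 0.5)

lasso_pen <- enet_penalty(beta, lambda = 1, alpha = 1)  # pure L1 term: 3.5
ridge_pen <- enet_penalty(beta, lambda = 1, alpha = 0)  # half the squared L2 norm: 2.625
zero_pen  <- enet_penalty(beta, lambda = 0, alpha = 1)  # no penalty at all: 0

print(c(lasso = lasso_pen, ridge = ridge_pen, lambda0 = zero_pen))
```

So the objective at lambda = 0 is the unpenalized likelihood; whether glmnet's coordinate descent actually reaches that optimum for a given data set is a separate, numerical question.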

You get the error because you try to pass lambda = 0 to glmnet.
If you want to select the coefficients from glmnet for lambda = 0, you could use:
coef(fit, s=0)
This automatically selects the last (smallest) value of lambda in the fitted path. I guess you've basically done that already though, with s = min(fit$lambda). If you want to go even smaller than that, you might have to supply a lambda sequence manually, but this is a little bit tricky (glmnet seems a little bit stubborn about its lambdas).
Also keep in mind that there might be some bias in glmnet, so it could be slightly different from the results of glm.

Related

How is glmnet dealing with high number of unpenalized parameters?

I'm trying to fit a Poisson model with glmnet where I know that a lot of the parameters have to be included in the model and should therefore be left unpenalized. I'm setting their penalty.factor to 0. Unfortunately, glmnet has convergence issues when the number of unpenalized parameters is high. Here is an example adapted from the glmnet manual where the algorithm doesn't converge if the number of unpenalized parameters is high.
library(glmnet)
N=500; p=300
nzc=5
x=matrix(rnorm(N*p),N,p)
beta=rnorm(nzc)
f = x[,seq(nzc)]%*%beta
mu=exp(f)
y=rpois(N,mu)
pena=rep(0,ncol(x))
pena[1:10]=1 # penalized only the first ten parameters
fit=glmnet(x,y,family="poisson",penalty.factor=pena,alpha=0.9)
# fit is not converging
pena=rep(0,ncol(x))
pena[1:250]=1 # penalized the first 250 parameters
fit=glmnet(x,y,family="poisson",penalty.factor=pena,alpha=0.9)
# fit is converging
I know that the model doesn't make sense for this example but at least I can reproduce the same error as in my data. It looks like lambda_max is set to infinity. Does it mean that the algorithm can't find a minimum lambda_max where all the penalized parameters are zero? How can I fit a model where a lot of the parameters are unpenalized?
Thanks a lot

GLMNet convergence issue for penalized regression

I am working on network models for political networks. One of the things I am doing is penalized inference. I am using an adaptive lasso approach by setting a penalty factor for glmnet. I have various parameters in my model: alphas and phis. The alphas are fixed effects so I want to keep them in the model while the phis are being penalized.
I have starting coefficients from the MLE estimation process of glm() to compute the adaptive weights that are set through the penalty factor of glmnet().
This is the code:
# Generate Generalized Linear Model
GenLinMod = glm(y ~ X, family = "poisson")
# Set coefficients
coefficients = coef(GenLinMod)
# Set penalty
penalty = 1/(coefficients[-1])^2
# Protect alphas
penalty[1:(n-1)] = 0
# Generate Generalized Linear Model with adaptive lasso procedure
GenLinModNet = glmnet(XS, y, family = "poisson", penalty.factor = penalty, standardize = FALSE)
For some networks this code executes just fine, however I have certain networks for which I get these errors:
Error: Matrices must have same number of columns in rbind2(.Call(dense_to_Csparse, x), y)
In addition: Warning messages:
1: from glmnet Fortran code (error code -1); Convergence for 1th lambda value not reached after maxit=100000 iterations; solutions for larger lambdas returned
2: In getcoef(fit, nvars, nx, vnames) :
an empty model has been returned; probably a convergence issue
The odd thing is that they all use the same code, so I am wondering if it is a data problem.
Additional information:
+In one case I have over 500 alphas and 21 phis and these errors appear, in another case that does not work I have 200 alphas and 28 phis. But on the other hand I have a case with over 600 alphas and 28 phis and it converges nicely.
+I have tried settings for lambda.min.ratio and nlambda to no avail.
Additional question: Is the first entry of penalty the one associated with the intercept? Or is it added automatically by glmnet()? I did not find clarity about this in the glmnet vignette. My thoughts are that I shouldn't include a term for the intercept, since it's said that the penalty is internally rescaled to sum to nvars and I assume the intercept isn't one of my variables.
I'm not 100% sure about this, but I think I have found the root of the problem.
I've tried to use all kinds of manual lambda sequences, even trying very large starting lambdas (in the 1000s). This all seemed to do no good at all. However, when I tried penalizing the alphas as well, everything would converge nicely. So it probably has something to do with the number of unpenalized variables. Maybe keeping all alphas unpenalized forces glmnet into some divergent state. Maybe there is some sort of collinearity going on. My "solution", which is basically just doing something else, is to penalize the alphas with the same weight that is used for one of the phis. This works on the assumption that some phis are significant and the alphas can be just as significant, instead of being fixed (which makes them infinitely significant). I'm not completely satisfied, because this is just a different approach, but it might be interesting to note that it probably has something to do with the number of unpenalized variables.
Also, to answer my additional question: In the glmnet vignette it says that the penalty term is internally rescaled to sum to nvars. Since the intercept is not one of the variables, my guess is that it is not needed in the penalty term. Though, I've tried with both including and excluding the term, results seem to be the same. So maybe glmnet automatically removes it if it detects that the length is +1 of what it should be.
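To make the bookkeeping of the penalty vector concrete, here is a small base-R sketch with made-up coefficient values (the names a1, a2, phi1, phi2 and all numbers are hypothetical). It assumes, as I read the glmnet help page, that penalty.factor has one entry per column of x, so no entry for the intercept, and that the factors are rescaled internally to sum to nvars. The last step only imitates that rescaling; it is not glmnet's own code.

```r
# Hypothetical glm() coefficients: intercept, two alphas, two phis
coefficients <- c("(Intercept)" = 0.3, a1 = 1.2, a2 = -0.8, phi1 = 0.05, phi2 = 0.6)

# Adaptive lasso weights; dropping the intercept leaves one weight per column of X
penalty <- 1 / coefficients[-1]^2

# Protect the alphas by giving them zero penalty
n_alpha <- 2
penalty[seq_len(n_alpha)] <- 0

# Imitate the documented rescaling: the weights are scaled to sum to nvars
nvars <- length(penalty)
rescaled <- penalty * nvars / sum(penalty)

print(round(rescaled, 4))
```

Note how a very small initial estimate (phi1 = 0.05 gives a raw weight of 400) dominates the rescaled vector. With many zero weights and a few huge ones, the effective penalties become very uneven, which may be worth checking when convergence fails.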

lambda sequence in glmnet in ridge vs lasso

I'm having a problem with ridge cv in glmnet calculating an unreasonable lambda sequence.
I'm running both ridge and lasso regression with glmnet using the exact same data. Lasso is fine, but ridge isn't.
ridge.cv <- cv.glmnet(preds[train.i,], resp[train.i], alpha=0, family="binomial", type.measure="class")
lasso.cv <- cv.glmnet(preds[train.i,], resp[train.i], alpha=1, family="binomial", type.measure="class")
> range(lasso.cv$glmnet.fit$dev.ratio)
[1] 1.117039e-14 9.334558e-01
> range(ridge.cv$glmnet.fit$dev.ratio)
[1] 1.117039e-14 1.852909e-01
> range(lasso.cv$lambda)
[1] 0.002812585 0.268474838
> range(ridge.cv$lambda)
[1] 2.812585 268.474838
So, lasso calculates a reasonable lambda sequence, yielding a reasonable range of deviance explained. However, ridge calculates a lambda sequence exactly 1000 times higher than that of lasso, yielding a ridiculous range of deviance explained. The predictor matrix is 891 x 1028.
Any idea why might that happen and how to fix it? I can of course input my own sequence, but I'd like to know why it happens in case it is just a symptom of a bigger problem.
From the glmnet help file:
lambda: The actual sequence of ‘lambda’ values used. When ‘alpha=0’, the largest lambda reported does not quite give the zero coefficients reported (‘lambda=inf’ would in principle). Instead, the largest ‘lambda’ for ‘alpha=0.001’ is used, and the sequence of ‘lambda’ values is derived from this.
Basically, in the case of ridge regression it's deriving lambda.max (i.e., the value of lambda that causes all coefficients to vanish) from alpha = 0.001, which would be exactly 1000 times larger than lambda.max derived from alpha = 1 (the LASSO case).
I'm not exactly sure what you mean by "fix it", since the range of meaningful lambda values changes depending on your value of alpha.
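The factor of 1000 can be checked by hand. For a Gaussian response with standardized predictors and a centered response, the glmnet paper gives the smallest lambda that zeroes all coefficients as max_j |<x_j, y>| / (N * alpha). The sketch below applies that formula in base R; lambda_max is a hypothetical helper and the data are simulated. The binomial case in the question uses a different residual, but as far as I know alpha enters in the same way.

```r
set.seed(1)
N <- 100; p <- 20
x <- scale(matrix(rnorm(N * p), N, p))  # standardized predictors (simulated)
y <- rnorm(N)
y <- y - mean(y)                        # centered response

# Hypothetical helper: entry point of the path for a given alpha
lambda_max <- function(x, y, alpha) {
  max(abs(crossprod(x, y))) / (nrow(x) * alpha)
}

lasso_max <- lambda_max(x, y, alpha = 1)
ridge_max <- lambda_max(x, y, alpha = 0.001)  # the value substituted when alpha = 0

print(ridge_max / lasso_max)
```

The ratio depends only on alpha, not on the data, which matches the ranges reported in the question (0.268474838 versus 268.474838).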

glmnet not converging for lambda.min from cv.glmnet

I ran a 20-fold cv.glmnet lasso model to obtain the "optimal" value for lambda. However, when I attempt to reproduce the results from glmnet(), I get a warning that reads:
Warning messages:
1: from glmnet Fortran code (error code -1); Convergence for 1th lambda
value not reached after maxit=100000 iterations; solutions for larger
lambdas returned
2: In getcoef(fit, nvars, nx, vnames) :
an empty model has been returned; probably a convergence issue
My code reads as such:
set.seed(5)
cv.out <- cv.glmnet(x[train,],y[train],family="binomial",nfolds=20,alpha=1,parallel=TRUE)
coef(cv.out)
bestlam <- cv.out$lambda.min
lasso.mod.best <- glmnet(x[train,],y[train],alpha=1,family="binomial",lambda=bestlam)
Now, the value of bestlam above is 2.976023e-05 so perhaps this is causing the problem? Is it a rounding issue on the value of lambda? Is there a reason why I can't reproduce the results directly from the glmnet() function? If I use a vector of lambda values in the similar range to this value of bestlam, I do not have any issues.
You're passing a single lambda to your glmnet (lambda=bestlam) which is a big no-no (you're attempting to train a model just using one lambda value).
From the glmnet documentation (?glmnet):
lambda: A user supplied lambda sequence. Typical usage is to have the
program compute its own lambda sequence based on nlambda and
lambda.min.ratio. Supplying a value of lambda overrides this. WARNING: use
with care. Do not supply a single value for lambda (for predictions after CV
use predict() instead). Supply instead a decreasing sequence of lambda
values. glmnet relies on its warms starts for speed, and its often faster to
fit a whole path than compute a single fit.
glmnet is a little tricky in that respect - you'll want to run your best model with a series of lambdas (e.g., set nlambda=101), and then when you predict set s=bestlam and exact=FALSE.
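A minimal sketch of that workflow on simulated data (the data and object names here are made up for illustration): let cv.glmnet() and glmnet() compute the whole path, then ask for coefficients or predictions at the chosen lambda through the s argument instead of refitting with a single lambda.

```r
library(glmnet)

set.seed(5)
# Simulated binomial data, for illustration only
x <- matrix(rnorm(200 * 10), 200, 10)
y <- rbinom(200, 1, plogis(x[, 1] - x[, 2]))

# Cross-validate over the automatically chosen lambda path
cv.out <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 10)
bestlam <- cv.out$lambda.min

# Option 1: extract directly from the cv object
coef_cv <- coef(cv.out, s = "lambda.min")

# Option 2: fit the full path, then pick out the value at bestlam
fit <- glmnet(x, y, family = "binomial", alpha = 1)
coef_path <- coef(fit, s = bestlam)
pred <- predict(fit, newx = x[1:5, ], s = bestlam, type = "response")

print(coef_path)
```

Both options reuse the warm-started path, so neither needs a fit at one isolated, very small lambda.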

Which function/package for robust linear regression works with glmulti (i.e., behaves like glm)?

Background: Multi-model inference with glmulti
glmulti is an R package for automated model selection for generalized linear models. It constructs all possible models given a dependent variable and a set of predictors, fits them via the classic glm function, and then allows for multi-model inference (e.g., using model weights derived from AICc, BIC). In theory, glmulti also works with any other function that returns coefficients, the log-likelihood of the model and the number of free parameters (and maybe other information?) in the same format that glm does.
My goal: Multi-model inference with robust errors
I would like to use glmulti with robust modeling of the errors of a quantitative dependent variable to guard against the effect of outliers.
For example, I could assume that the errors in the linear model follow a t distribution instead of a normal distribution. Through its degrees-of-freedom parameter, which controls the kurtosis, the t distribution can have heavy tails and is thus more robust to outliers (as compared to the normal distribution).
However, I'm not committed to using the t distribution approach. I'm happy with any approach that gives back a log-likelihood and thus works with the multimodel approach in glmulti. But that means, that unfortunately I cannot use the well-known robust linear models in R (e.g., lmRob from robust or lmrob from robustbase) because they do not operate under the log-likelihood framework and thus cannot work with glmulti.
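To make the t-error idea concrete, here is a minimal base-R sketch that fits a linear model with t-distributed errors by direct maximum likelihood, so that a log-likelihood comes out at the end. It is not how any particular package does it; the fixed degrees of freedom nu = 4 and the helper negloglik are assumptions chosen for illustration.

```r
# Design matrix and response from the built-in stackloss data
X <- model.matrix(stack.loss ~ ., data = stackloss)
y <- stackloss$stack.loss
nu <- 4  # degrees of freedom of the t errors, held fixed (assumption)

# Negative log-likelihood of y = X beta + sigma * t_nu errors
negloglik <- function(par) {
  beta <- par[seq_len(ncol(X))]
  sigma <- exp(par[ncol(X) + 1])  # optimise log(sigma) so that sigma stays positive
  r <- (y - X %*% beta) / sigma
  -sum(dt(r, df = nu, log = TRUE) - log(sigma))
}

# Start from the ordinary least-squares solution
ols <- lm(stack.loss ~ ., data = stackloss)
start <- c(coef(ols), log(summary(ols)$sigma))

fit_t <- optim(start, negloglik, method = "BFGS")
loglik_t <- -fit_t$value

print(loglik_t)
```

A fit of this kind has a well-defined log-likelihood and a parameter count (the coefficients plus the scale), which is what an information-criterion based tool needs.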
The problem: I can't find a robust regression function that works with glmulti
The only robust linear regression function for R I found that operates under the log-likelihood framework is heavyLm (from the heavy package); it models the errors with a t distribution. Unfortunately, heavyLm does not work with glmulti (at least not out of the box) because it has no S3 method for logLik (and possibly other things).
To illustrate:
library(glmulti)
library(heavy)
Using the dataset stackloss
head(stackloss)
Regular Gaussian linear model:
summary(glm(stack.loss ~ ., data = stackloss))
Multi-model inference with glmulti using glm's default Gaussian family (identity link)
stackloss.glmulti <- glmulti(stack.loss ~ ., data = stackloss, level=1, crit=bic)
print(stackloss.glmulti)
plot(stackloss.glmulti)
Linear model with t distributed error (default is df=4)
summary(heavyLm(stack.loss ~ ., data = stackloss))
Multi-model inference with glmulti calling heavyLm as the fitting function
stackloss.heavyLm.glmulti <- glmulti(stack.loss ~ .,
data = stackloss, level=1, crit=bic, fitfunction=heavyLm)
gives the following error:
Initialization...
Error in UseMethod("logLik") :
no applicable method for 'logLik' applied to an object of class "heavyLm".
If I define the following function,
logLik.heavyLm <- function(x){x$logLik}
glmulti can get the log-likelihood, but then the next error occurs:
Initialization...
Error in .jcall(molly, "V", "supplyErrorDF",
as.integer(attr(logLik(fitfunc(as.formula(paste(y, :
method supplyErrorDF with signature ([I)V not found
The question: Which function/package for robust linear regression works with glmulti (i.e., behaves like glm)?
There is probably a way to define further functions to get heavyLm working with glmulti, but before embarking on this journey I wanted to ask whether anybody
+knows of a robust linear regression function that (a) operates under the log-likelihood framework and (b) behaves like glm (and will thus work with glmulti out-of-the-box), or
+has already got heavyLm working with glmulti.
Any help is very much appreciated!
Here is an answer using heavyLm. Even though this is a relatively old question, the same problem that you mentioned still occurs when using heavyLm (i.e., the error message Error in .jcall(molly, "V", "supplyErrorDF"…).
The problem is that glmulti requires the degrees of freedom of the model, which you need to provide as an attribute of the value returned by the function logLik.heavyLm; see the documentation for the function logLik for details. Moreover, it turns out that you also need to provide a function that returns the number of data points used for fitting the model, since the information criteria (AIC, BIC, …) depend on this value too. This is done by the function nobs.heavyLm in the code below.
Here is the code:
nobs.heavyLm <- function(mdl) mdl$dims[1] # the sample size (number of data points)
logLik.heavyLm <- function(mdl) {
  res <- mdl$logLik
  attr(res, "nobs") <- nobs.heavyLm(mdl) # this is not really needed for 'glmulti', but is included to adhere to the format of 'logLik'
  attr(res, "df") <- length(mdl$coefficients) + 1 + 1 # I am also considering the scale parameter that is estimated; see mdl$family
  class(res) <- "logLik"
  res
}
which, when put together with the code that you provided, produces the following result:
Initialization...
TASK: Exhaustive screening of candidate set.
Fitting...
Completed.
> print(stackloss.glmulti)
glmulti.analysis
Method: h / Fitting: glm / IC used: bic
Level: 1 / Marginality: FALSE
From 8 models:
Best IC: 117.892471265874
Best model:
[1] "stack.loss ~ 1 + Air.Flow + Water.Temp"
Evidence weight: 0.709174196998897
Worst IC: 162.083142797858
2 models within 2 IC units.
1 models to reach 95% of evidence weight.
so 2 models fall within the threshold of 2 BIC units.
An important remark though: I am not sure that the expression above for the degrees of freedom is strictly correct. For a standard linear model, the degrees of freedom would be equal to p + 1, where p is the number of parameters in the model, and the extra parameter (the + 1) is the "error" variance (which is used to calculate the likelihood). In function logLik.heavyLm above, it is not clear to me whether one should also count the "scale parameter" that is estimated by heavyLm as an extra degree of freedom, and hence the p + 1 + 1, which would be the case if the likelihood is also a function of this parameter. Unfortunately, I cannot confirm this, since I don’t have access to the reference that heavyLm cites (the paper by Dempster et al., 1980). Because of this, I am counting the scale parameter, thereby providing a (slightly more) conservative estimate of model complexity, penalizing "complex" models. This difference should be negligible, except in the small sample case.
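The role of the "df" and "nobs" attributes can be checked on a case where the answer is known. The sketch below wraps an ordinary lm() fit in a made-up class toyfit (hypothetical, for illustration only), gives it a logLik method in the same style as above, and confirms that the BIC computed from those attributes matches BIC() on the original fit. For lm the count is p + 1: the coefficients plus the error variance.

```r
fit <- lm(stack.loss ~ ., data = stackloss)

# A bare-bones stand-in object that stores only what the method below needs
toy <- structure(
  list(logLik = as.numeric(logLik(fit)),
       coefficients = coef(fit),
       dims = c(nrow(stackloss), length(coef(fit)))),
  class = "toyfit"
)

logLik.toyfit <- function(object, ...) {
  res <- object$logLik
  attr(res, "nobs") <- object$dims[1]                 # sample size
  attr(res, "df") <- length(object$coefficients) + 1  # coefficients + error variance
  class(res) <- "logLik"
  res
}

# BIC = -2 * logLik + df * log(nobs), built from the attributes alone
ll <- logLik(toy)
bic_toy <- -2 * as.numeric(ll) + attr(ll, "df") * log(attr(ll, "nobs"))

print(c(toy = bic_toy, lm = BIC(fit)))
```

Changing the "df" line to + 1 + 1 shifts every candidate model's BIC by the same log(nobs), so if the extra parameter is counted for all models alike, the ranking among them should not change; only the absolute IC values do.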
