glmnet not converging for lambda.min from cv.glmnet - r

I ran a 20-fold cv.glmnet lasso model to obtain the "optimal" value for lambda. However, when I attempt to reproduce the results with glmnet(), I get a warning that reads:
Warning messages:
1: from glmnet Fortran code (error code -1); Convergence for 1th lambda
value not reached after maxit=100000 iterations; solutions for larger
lambdas returned
2: In getcoef(fit, nvars, nx, vnames) :
an empty model has been returned; probably a convergence issue
My code reads as such:
set.seed(5)
cv.out <- cv.glmnet(x[train,],y[train],family="binomial",nfolds=20,alpha=1,parallel=TRUE)
coef(cv.out)
bestlam <- cv.out$lambda.min
lasso.mod.best <- glmnet(x[train,],y[train],alpha=1,family="binomial",lambda=bestlam)
Now, the value of bestlam above is 2.976023e-05, so perhaps this is causing the problem? Is it a rounding issue on the value of lambda? Is there a reason why I can't reproduce the results directly from the glmnet() function? If I use a vector of lambda values in a similar range to this value of bestlam, I do not have any issues.

You're passing a single lambda to glmnet (lambda=bestlam), which is a big no-no: you're attempting to train a model using just one lambda value.
From the glmnet documentation (?glmnet):
lambda: A user supplied lambda sequence. Typical usage is to have the
program compute its own lambda sequence based on nlambda and
lambda.min.ratio. Supplying a value of lambda overrides this. WARNING: use
with care. Do not supply a single value for lambda (for predictions after CV
use predict() instead). Supply instead a decreasing sequence of lambda
values. glmnet relies on its warms starts for speed, and its often faster to
fit a whole path than compute a single fit.

glmnet is a little tricky in that respect - you'll want to run your best model with a series of lambdas (e.g., set nlambda=101), and then when you predict set s=bestlam and exact=FALSE.
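A minimal sketch of that workflow follows. It uses simulated data as a stand-in for x[train,] and y[train]; the data, the dimensions, and the object names other than bestlam are assumptions for illustration, not the asker's setup.

```r
library(glmnet)

set.seed(5)
# Simulated stand-in for x[train, ] and y[train] (assumption, not the asker's data)
n <- 400; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x[, 1] - 0.5 * x[, 2]))

# Cross-validate to choose lambda
cv.out <- cv.glmnet(x, y, family = "binomial", nfolds = 20, alpha = 1)
bestlam <- cv.out$lambda.min

# Refit on a whole lambda path rather than on the single value bestlam
lasso.mod <- glmnet(x, y, family = "binomial", alpha = 1, nlambda = 101)

# Read coefficients and fitted probabilities off the path at bestlam
coef.best <- coef(lasso.mod, s = bestlam, exact = FALSE)
prob.best <- predict(lasso.mod, newx = x, s = bestlam,
                     type = "response", exact = FALSE)
```

If a separate refit is not needed, coef(cv.out, s = "lambda.min") and predict(cv.out, newx = x, s = "lambda.min") read the same quantities from the path that cv.glmnet already fitted on the full data.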

Related

Can dismo::evaluate() be used for a model fit with glmnet() or cv.glmnet()?

I'm using the glmnet package to create a species distribution model (SDM) based on a lasso regression. I've successfully fit models using glmnet::cv.glmnet(), and I can use the predict() function to generate predicted probabilities for a given lambda value by setting s = lambda.min and type = "response".
I'm creating several different kinds of SDMs and had been using dismo::evaluate() to generate fit statistics (based on a testing dataset) and thresholds to convert probabilities to binary values. However, when I run dismo::evaluate() with a cv.glmnet (or glmnet) model, I get the following error:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'as.matrix': not-yet-implemented method for <data.frame> %*%
This is confusing to me as I think the x argument in evaluate() isn't needed when I'm providing a matrix with predictor values at presence locations (p) and another matrix with values at absence locations (a). I'm wondering whether evaluate() doesn't work with these types of models? Thanks, and apologies if I've missed something obvious!
After spending more time on this, I don't think dismo::evaluate() works with glmnet objects when supplying "p" and "a" as matrices of predictor values. dismo::evaluate() converts them to data.frames before calling the predict() function. To solve my problem, I was able to create a new function based on dismo::evaluate() that supplies p or a as a matrix to the predict() function.
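One possible route that avoids the data.frame conversion, sketched here under assumptions rather than taken from the function I actually wrote: call predict() on the matrices yourself and pass the resulting numeric vectors to dismo::evaluate(), which accepts predicted values at presence and absence sites when no model is supplied. The simulated data and the names X, pa, p_mat and a_mat are hypothetical.

```r
library(glmnet)

set.seed(1)
# Simulated predictors and presence/absence response (assumption)
n <- 300; k <- 5
X <- matrix(rnorm(n * k), n, k, dimnames = list(NULL, paste0("v", 1:k)))
pa <- rbinom(n, 1, plogis(X[, 1] - X[, 2]))

cvfit <- cv.glmnet(X, pa, family = "binomial", alpha = 1)

# Hypothetical evaluation data: predictor matrices at presence and absence sites
p_mat <- X[pa == 1, , drop = FALSE]
a_mat <- X[pa == 0, , drop = FALSE]

# Predict on matrices (not data.frames), then flatten to plain vectors
p_pred <- as.vector(predict(cvfit, newx = p_mat, s = "lambda.min", type = "response"))
a_pred <- as.vector(predict(cvfit, newx = a_mat, s = "lambda.min", type = "response"))

# evaluate() treats numeric vectors as already-predicted values
e <- dismo::evaluate(p = p_pred, a = a_pred)
e@auc
dismo::threshold(e)
```

In practice p_mat and a_mat would come from a held-out testing dataset rather than from the training rows used here.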

GLMNet convergence issue for penalized regression

I am working on network models for political networks. One of the things I am doing is penalized inference. I am using an adaptive lasso approach by setting a penalty factor for glmnet. I have various parameters in my model: alphas and phis. The alphas are fixed effects so I want to keep them in the model while the phis are being penalized.
I have starting coefficients from the MLE estimation process of glm() to compute the adaptive weights that are set through the penalty factor of glmnet().
This is the code:
# Generate Generalized Linear Model
GenLinMod = glm(y ~ X, family = "poisson")
# Set coefficients
coefficients = coef(GenLinMod)
# Set penalty
penalty = 1/(coefficients[-1])^2
# Protect alphas
penalty[1:(n-1)] = 0
# Generate Generalized Linear Model with adaptive lasso procedure
GenLinModNet = glmnet(XS, y, family = "poisson", penalty.factor = penalty, standardize = FALSE)
For some networks this code executes just fine, however I have certain networks for which I get these errors:
Error: Matrices must have same number of columns in rbind2(.Call(dense_to_Csparse, x), y)
In addition: Warning messages:
1: from glmnet Fortran code (error code -1); Convergence for 1th lambda value not reached after maxit=100000 iterations; solutions for larger lambdas returned
2: In getcoef(fit, nvars, nx, vnames) :
an empty model has been returned; probably a convergence issue
The odd thing is that they all use the same code, so I am wondering if it is a data problem.
Additional information:
+ In one case I have over 500 alphas and 21 phis and these errors appear; in another case that does not work I have 200 alphas and 28 phis. On the other hand, I have a case with over 600 alphas and 28 phis that converges nicely.
+ I have tried settings for lambda.min.ratio and nlambda to no avail.
Additional question: Is the first entry of penalty the one associated with the intercept? Or is it added automatically by glmnet()? I did not find clarity about this in the glmnet vignette. My thoughts are that I shouldn't include a term for the intercept, since it's said that the penalty is internally rescaled to sum nvars and I assume the intercept isn't one of my variables.
I'm not 100% sure about this, but I think I have found the root of the problem.
I've tried to use all kinds of manual lambda sequences, even trying very large starting lambdas (in the thousands). This all seemed to do no good at all. However, when I tried without leaving the alphas unpenalized, everything would converge nicely. So it probably has something to do with the number of unpenalized variables. Maybe keeping all alphas unpenalized forces glmnet into some divergent state. Maybe there is some sort of collinearity going on. My "solution", which is basically just doing something else, is to penalize the alphas with the same weight that is used for one of the phis. This works on the assumption that some phis are significant and the alphas can be just as significant, instead of being fixed (which makes them infinitely significant). I'm not completely satisfied, because this is just a different approach, but it might be interesting to note that it probably has something to do with the number of unpenalized variables.
Also, to answer my additional question: In the glmnet vignette it says that the penalty term is internally rescaled to sum to nvars. Since the intercept is not one of the variables, my guess is that it is not needed in the penalty term. Though, I've tried with both including and excluding the term, results seem to be the same. So maybe glmnet automatically removes it if it detects that the length is +1 of what it should be.
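As a hedged illustration of the penalty vector layout: the glmnet help page describes penalty.factor as one factor per coefficient of the nvars predictors, and the intercept is not penalized, so a vector of length ncol(x) built from the glm coefficients without the intercept should be the safe form. The simulated data and the column names below are assumptions, not my network data.

```r
library(glmnet)

set.seed(1)
# Simulated design: 2 protected "alpha" columns and 4 penalized "phi" columns (assumption)
n <- 300
X <- matrix(rnorm(n * 6), n, 6)
colnames(X) <- c(paste0("alpha", 1:2), paste0("phi", 1:4))
y <- rpois(n, exp(0.3 * X[, 1] - 0.2 * X[, 2] + 0.5 * X[, 3]))

# MLE coefficients; the first entry is the intercept
mle <- coef(glm(y ~ X, family = "poisson"))

# Adaptive weights for the predictors only: drop the intercept
penalty <- 1 / mle[-1]^2
penalty[1:2] <- 0                      # leave the alphas unpenalized

# One penalty factor per column of X, none for the intercept
stopifnot(length(penalty) == ncol(X))

fit <- glmnet(X, y, family = "poisson",
              penalty.factor = penalty, standardize = FALSE)
```

This only shows the bookkeeping; it does not reproduce or resolve the convergence failure seen with hundreds of unpenalized columns.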

nls error during the parameter estimation of power law with exponential cutoff distribution in R

I want to fit mydata with several known distributions, power law with exponential cutoff distribution is one of the candidates.
The fitdist function in the fitdistrplus package is a good way to estimate parameters using MLE, MME, or QME.
But the power law with exponential cutoff is not among the distributions listed in the CRAN Task View: Probability Distributions, so I tried the nls function instead.
The pdf of power law with exponential cutoff is f(x;α,λ)=C*x^(−α)*exp(−λ*x)
First, I generate some random values to replace my real data:
data <- rlnorm(1000,0.6,1.23)
h <- hist(data,breaks=1000,plot=FALSE)
x <- h$mids
y <- h$density
Then, I use nls function to conduct parameter estimation:
nls(y~c*x^(-a)*exp(-b*x),start=list(a=1,b=1,c=1))
But it does not work and always throws one of these two errors:
Error in numericDeriv(form[[3L]], names(ind), env) : Missing value or an infinity produced when evaluating the model
Or: singular gradient matrix at initial parameter estimates
Before posting, I read almost all the previous posts and searched Google; the commonly cited reasons for these errors are:
bad start values for the nls. I tried a lot, but it does not work.
some negative values or values less than 1 or values equal to Inf may be generated. I tried to do the data cleaning, also, it does not work.
What should I do now? Or are there some other better methods to do the parameter estimation of power law with exponential cutoff? I need your help, thank you!
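One thing that may be worth trying, offered as a sketch rather than a confirmed fix: the model is linear on the log scale, log(y) = log(c) - a*log(x) - b*x, so an ordinary lm() on the non-empty histogram bins can supply data-driven starting values for nls(). The names d, lin and start are introduced here for illustration, and nls() may still fail to converge from these values.

```r
set.seed(1)
data <- rlnorm(1000, 0.6, 1.23)
h <- hist(data, breaks = 1000, plot = FALSE)

# Keep only non-empty bins: log(0) is -Inf, so empty bins cannot enter the log-scale fit
d <- data.frame(x = h$mids, y = h$density)
d <- d[d$y > 0, ]

# Linearized model: log(y) = log(c) - a * log(x) - b * x
lin <- lm(log(y) ~ log(x) + x, data = d)
start <- list(a = -unname(coef(lin)["log(x)"]),
              b = -unname(coef(lin)["x"]),
              c = exp(unname(coef(lin)["(Intercept)"])))

# nls() can still fail from these starts, so the call is wrapped in try()
fit <- try(nls(y ~ c * x^(-a) * exp(-b * x), data = d, start = start),
           silent = TRUE)
```

If fit inherits from class "try-error", the log-scale estimates in start are still usable as a rough fit in their own right.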

Lack of convergence of glmnet when lambda=0 for family="poisson"

While getting a handle on glmnet versus glm, I ran into convergence problems for lambda=0 and family="poisson". My understanding is that with lambda=0 (and alpha=1, the default), the answers should be essentially the same.
Below is code changed slightly from the poisson example on the glmnet help page (?glmnet). The only change is that nzc = p so that all variables are in the true model
N=1000; p=50
nzc=p
x=matrix(rnorm(N*p),N,p)
beta=rnorm(nzc)
f = x[,seq(nzc)]%*%beta
mu=exp(f)
y=rpois(N,mu)
#With lambda=0 glmnet throws the convergence error shown below
fit=glmnet(x,y,family="poisson",lambda=0)
#It works with default lambda passed in
# but estimates are quite different from glm.
fit=glmnet(x,y,family="poisson") #use default lambdas
fit2=glm(y~x,family="poisson")
plot(coef(fit2)[2:(p+1)],
coef(fit,s=min(fit$lambda))[2:(p+1)],
xlab="glm",ylab="glmnet")
abline(0,1)
#works fine with gaussian response and lambda=0 or default lambda
#glm and glmnet identical
mu = f
y=rnorm(N,mu)
fit=glmnet(x,y,family="gaussian",lambda=0)
fit2=glm(y~x)
plot(coef(fit2)[2:(p+1)], coef(fit)[2:(p+1)])
abline(0,1)
Here's the error message
Warning messages:
1: from glmnet Fortran code (error code -1); Convergence for 1th lambda value not reached after maxit=100000 iterations; solutions for larger lambdas returned
2: In getcoef(fit, nvars, nx, vnames) :an empty model has been returned; probably a convergence issue
Updated:
The problem seems to be with the intercept being estimated by glmnet when family="poisson" and not related to the setting of lambda per se.
fit=glmnet(x,y,family="poisson")
#intercept should be close to 0
coef(fit)[1,]
#but it is huge
#passing in intercept=FALSE however generates the convergence error again
fit=glmnet(x,y,family="poisson", intercept=FALSE)
I think you are confused about lambda and alpha. alpha is the elastic-net mixing parameter: alpha = 0 gives ridge regression, alpha = 1 (the default) gives the lasso, and values in between give a mixture of the two. lambda is the penalty strength; it is typically not set by hand, and there is a warning on the help page NOT to set it to a single value:
WARNING: use with care. Do not supply a single value for lambda
I don't know why you think a lasso penalty should be the same as an unpenalized Poisson model. The whole point of a penalized model is to be less subject to the biases and constraints of an ordinary regression model.
You get the error because you try to pass lambda = 0 to glmnet.
If you want to select the coefficients from glmnet for lambda = 0, you could use:
coef(fit, s=0)
This automatically selects the last (smallest) value of lambda on the fitted path. I guess you've basically done that already though, with s = min(fit$lambda). If you want to go even smaller than that you might have to manually put in a lambda sequence, but this is a little bit tricky (glmnet seems a little bit stubborn about its lambdas).
Also keep in mind that there might be some bias in glmnet, so it could be slightly different from the results of glm.
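A small sketch of the manual-sequence idea, on milder simulated data than in the question (the reduced p and the small coefficient scale are assumptions chosen so that exp(x %*% beta) stays moderate). It supplies a decreasing lambda sequence that ends at 0 instead of lambda = 0 alone, then compares the result with glm:

```r
library(glmnet)

set.seed(1)
N <- 1000; p <- 10
x <- matrix(rnorm(N * p), N, p)
beta <- rnorm(p, sd = 0.2)            # small effects (assumption)
y <- rpois(N, exp(x %*% beta))

# Decreasing lambda sequence ending at 0, so warm starts are available
fit  <- glmnet(x, y, family = "poisson", lambda = c(1, 0.1, 0.01, 0.001, 0))
fit2 <- glm(y ~ x, family = "poisson")

# Coefficients at the end of the path versus the unpenalized MLE
b.glmnet <- as.matrix(coef(fit, s = 0))[, 1]
b.glm    <- coef(fit2)
max(abs(b.glmnet - b.glm))
```

Whether the same approach converges on the data from the question, where the linear predictor has a much larger variance, is a separate matter that this sketch does not settle.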

Which function/package for robust linear regression works with glmulti (i.e., behaves like glm)?

Background: Multi-model inference with glmulti
glmulti is an R function/package for automated model selection for generalized linear models. It constructs all possible models given a dependent variable and a set of predictors, fits them via the classic glm function, and then allows for multi-model inference (e.g., using model weights derived from AICc or BIC). In theory, glmulti also works with any other function that returns coefficients, the log-likelihood of the model and the number of free parameters (and maybe other information?) in the same format that glm does.
My goal: Multi-model inference with robust errors
I would like to use glmulti with robust modeling of the errors of a quantitative dependent variable to guard against the effect of outliers.
For example, I could assume that the errors in the linear model are distributed as a t distribution instead of as a normal distribution. With its kurtosis parameter the t distribution can have heavy tails and is thus more robust to outliers (as compared to the normal distribution).
However, I'm not committed to using the t distribution approach. I'm happy with any approach that gives back a log-likelihood and thus works with the multimodel approach in glmulti. But that means, that unfortunately I cannot use the well-known robust linear models in R (e.g., lmRob from robust or lmrob from robustbase) because they do not operate under the log-likelihood framework and thus cannot work with glmulti.
The problem: I can't find a robust regression function that works with glmulti
The only robust linear regression function for R I found that operates under the log-likelihood framework is heavyLm (from the heavy package); it models the errors with a t distribution. Unfortunately, heavyLm does not work with glmulti (at least not out of the box) because it has no S3 method for loglik (and possibly other things).
To illustrate:
library(glmulti)
library(heavy)
Using the dataset stackloss
head(stackloss)
Regular Gaussian linear model:
summary(glm(stack.loss ~ ., data = stackloss))
Multi-model inference with glmulti using glm's default Gaussian link function
stackloss.glmulti <- glmulti(stack.loss ~ ., data = stackloss, level=1, crit=bic)
print(stackloss.glmulti)
plot(stackloss.glmulti)
Linear model with t distributed error (default is df=4)
summary(heavyLm(stack.loss ~ ., data = stackloss))
Multi-model inference with glmulti calling heavyLm as the fitting function
stackloss.heavyLm.glmulti <- glmulti(stack.loss ~ .,
data = stackloss, level=1, crit=bic, fitfunction=heavyLm)
gives the following error:
Initialization...
Error in UseMethod("logLik") :
no applicable method for 'logLik' applied to an object of class "heavyLm".
If I define the following function,
logLik.heavyLm <- function(x){x$logLik}
glmulti can get the log-likelihood, but then the next error occurs:
Initialization...
Error in .jcall(molly, "V", "supplyErrorDF",
as.integer(attr(logLik(fitfunc(as.formula(paste(y, :
method supplyErrorDF with signature ([I)V not found
The question: Which function/package for robust linear regression works with glmulti (i.e., behaves like glm)?
There is probably a way to define further functions to get heavyLm working with glmulti, but before embarking on this journey I wanted to ask whether anybody
knows of a robust linear regression function that (a) operates under the log-likelihood framework and (b) behaves like glm (and will thus work with glmulti out-of-the-box), or
has already got heavyLm working with glmulti.
Any help is very much appreciated!
Here is an answer using heavyLm. Even though this is a relatively old question, the same problem that you mentioned still occurs when using heavyLm (i.e., the error message Error in .jcall(molly, "V", "supplyErrorDF"…).
The problem is that glmulti requires the degrees of freedom of the model, which you need to provide as an attribute of the value returned by the function logLik.heavyLm; see the documentation for the function logLik for details. Moreover, it turns out that you also need to provide a function that returns the number of data points used for fitting the model, since the information criteria (AIC, BIC, …) depend on this value too. This is done by the function nobs.heavyLm in the code below.
Here is the code:
nobs.heavyLm <- function(mdl) mdl$dims[1] # the sample size (number of data points)
logLik.heavyLm <- function(mdl) {
res <- mdl$logLik
attr(res, "nobs") <- nobs.heavyLm(mdl) # this is not really needed for 'glmulti', but is included to adhere to the format of 'logLik'
attr(res, "df") <- length(mdl$coefficients) + 1 + 1 # I am also considering the scale parameter that is estimated; see mdl$family
class(res) <- "logLik"
res
}
which, when put together with the code that you provided, produces the following result:
Initialization...
TASK: Exhaustive screening of candidate set.
Fitting...
Completed.
> print(stackloss.glmulti)
glmulti.analysis
Method: h / Fitting: glm / IC used: bic
Level: 1 / Marginality: FALSE
From 8 models:
Best IC: 117.892471265874
Best model:
[1] "stack.loss ~ 1 + Air.Flow + Water.Temp"
Evidence weight: 0.709174196998897
Worst IC: 162.083142797858
2 models within 2 IC units.
1 models to reach 95% of evidence weight.
producing therefore 2 models within the 2 BIC units threshold.
An important remark though: I am not sure that the expression above for the degrees of freedom is strictly correct. For a standard linear model, the degrees of freedom would be equal to p + 1, where p is the number of parameters in the model, and the extra parameter (the + 1) is the "error" variance (which is used to calculate the likelihood). In function logLik.heavyLm above, it is not clear to me whether one should also count the "scale parameter" that is estimated by heavyLm as an extra degree of freedom, and hence the p + 1 + 1, which would be the case if the likelihood is also a function of this parameter. Unfortunately, I cannot confirm this, since I don’t have access to the reference that heavyLm cites (the paper by Dempster et al., 1980). Because of this, I am counting the scale parameter, thereby providing a (slightly more) conservative estimate of model complexity, penalizing "complex" models. This difference should be negligible, except in the small sample case.
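To see what the "df" and "nobs" attributes do without involving heavy or glmulti, here is a toy sketch in base R. The class name toyfit and its fields are hypothetical; the object simply carries the log-likelihood of an ordinary lm fit, so AIC() and BIC() on it can be compared against the lm results.

```r
fit <- lm(stack.loss ~ ., data = stackloss)

# Hypothetical fitted-model class that stores only what logLik() needs
toy <- structure(list(ll = as.numeric(logLik(fit)),
                      n  = nrow(stackloss),
                      k  = length(coef(fit)) + 1),  # coefficients + error variance
                 class = "toyfit")

nobs.toyfit <- function(object, ...) object$n

logLik.toyfit <- function(object, ...) {
  res <- object$ll
  attr(res, "nobs") <- nobs(object)   # sample size, used by BIC
  attr(res, "df")   <- object$k       # number of estimated parameters
  class(res) <- "logLik"
  res
}

# Both criteria now work on the toy object and agree with the lm fit
c(AIC(toy), AIC(fit))
c(BIC(toy), BIC(fit))
```

Changing k by one, as in the p + 1 versus p + 1 + 1 question above, shifts AIC by 2 and BIC by log(n) for every candidate model.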

Resources