GLMNet convergence issue for penalized regression - r

I am working on network models for political networks. One of the things I am doing is penalized inference. I am using an adaptive lasso approach by setting a penalty factor for glmnet. I have various parameters in my model: alphas and phis. The alphas are fixed effects so I want to keep them in the model while the phis are being penalized.
I have starting coefficients from the MLE estimation process of glm() to compute the adaptive weights that are set through the penalty factor of glmnet().
This is the code:
# Generate Generalized Linear Model
GenLinMod = glm(y ~ X, family = "poisson")
# Set coefficients
coefficients = coef(GenLinMod)
# Set penalty
penalty = 1/(coefficients[-1])^2
# Protect alphas
penalty[1:(n-1)] = 0
# Generate Generalized Linear Model with adaptive lasso procedure
GenLinModNet = glmnet(XS, y, family = "poisson", penalty.factor = penalty, standardize = FALSE)
For some networks this code executes just fine, however I have certain networks for which I get these errors:
Error: Matrices must have same number of columns in rbind2(.Call(dense_to_Csparse, x), y)
In addition: Warning messages:
1: from glmnet Fortran code (error code -1); Convergence for 1th lambda value not reached after maxit=100000 iterations; solutions for larger lambdas returned
2: In getcoef(fit, nvars, nx, vnames) :
an empty model has been returned; probably a convergence issue
The odd thing is that they all use the same code, so I am wondering if it is a data problem.
Additional information:
+In one case I have over 500 alphas and 21 phis and these errors appear, in another case that does not work I have 200 alphas and 28 phis. But on the other hand I have a case with over 600 alphas and 28 phis and it converges nicely.
+I have tried settings for lambda.min.ratio and nlambda to no avail.
Additional question: Is the first entry of penalty the one associated with the intercept? Or is it added automatically by glmnet()? I did not find clarity about this in the glmnet vignette. My thoughts are that I shouldn't include a term for the intercept, since it's said that the penalty is internally rescaled to sum nvars and I assume the intercept isn't one of my variables.

I'm not 100% sure about this, but I think I have found the root of the problem.
I've tried to use all kinds of manual lambda sequences, even trying very large starting lambda's (1000's). This all seemed to do no good at all. However, when I tried without penalizing the alpha's, everything would converge nicely. So it probably has something to do with the amount of unpenalized variables. Maybe keeping all alpha's unpenalized forces glmnet in some divergent state. Maybe there is some sort of collinearity going on. My "solution", which is basically just doing something else, is to penalize the alpha's with the same weigth that is used for one of the phi's. This works on the assumption that some phi's are significant and the alpha's can be just as significant, instead of being fixed (which makes them infinitely significant). I'm not completely satisfied, because this is just a different approach, but it might be interesting to note that it probably has something to do with the amount of unpenalized variables.
Also, to answer my additional question: In the glmnet vignette it says that the penalty term is internally rescaled to sum to nvars. Since the intercept is not one of the variables, my guess is that it is not needed in the penalty term. Though, I've tried with both including and excluding the term, results seem to be the same. So maybe glmnet automatically removes it if it detects that the length is +1 of what it should be.

Related

How does fixest handle negative values of the demeaned dependent variable in poisson estimations?

I need to perform glm (poisson) estimations with fixed-effects (say merely unit FE) and several regressors (RHS variables). I have an unbalanced panel dataset where most (~90%) observations have missing values (NA) for some but not all regressors.
fixest::feglm() can handle this and returns my fitted model.
However, to do so, it (and fixest::demean too) removes observations that have at least one regressor missing, before constructing the fixed-effect means.
In my case, I am afraid this implies not using a significant share of available information in the data.
Therefore, I would like to demean my variables by hand, to be able to include as much information as possible in each fixed-effect dimension's mean, and then run feglm on the demeaned data. However, this implies getting negative dependent variable values, which is not compatible with Poisson. If I run feglm with "poisson" family and my manually demeaned data, I (coherently) get: "Negative values of the dependent variable are not allowed for the "poisson" family.". The same error is returned with data demeaned with the fixest::demean function.
Question:
How does feglm handle negative values of the demeaned dependent variable? Is there a way (like some data transformation) to reproduce fepois on a fixed-effect in the formula with fepois on demeaned data and a no fixed-effect formula?
To use the example from fixest::demean documentation (with two-way fixed-effects):
data(trade)
base = trade
base$ln_dist = log(base$dist_km)
base$ln_euros = log(base$Euros)
# We center the two variables ln_dist and ln_euros
# on the factors Origin and Destination
X_demean = demean(X = base[, c("ln_dist", "ln_euros")],
fe = base[, c("Origin", "Destination")])
base[, c("ln_dist_dm", "ln_euros_dm")] = X_demean
and I would like to reproduce
est_fe = fepois(ln_euros ~ ln_dist | Origin + Destination, base)
with
est = fepois(ln_euros_dm ~ ln_dist_dm, base)
I think there are two main problems.
Modelling strategy
In general, it is important to be able to formally describe the estimated model.
In this case it wouldn't be possible to write down the model with a single equation, where the fixed-effects are estimated using all the data and other variables only on the non-missing observations. And if the model is not clear, then... maybe it's not a good model.
On the other hand, if your model is well defined, then removing random observations should not change the expectation of the coefficients, only their variance. So again, if your model is well specified, you shouldn't worry too much.
By suggesting that observations with missing values are relevant to estimate the fixed-effects coefficients (or stated differently, that they are used to demean some variables) you are implying that these observations are not randomly distributed. And now you should worry.
Just using these observations to demean the variables wouldn't remove the bias on the estimated coefficients due to the selection to non-missingness. That's a deeper problem that cannot be removed by technical tricks but rather by a profound understanding of the data.
GLM
There is a misunderstanding with GLM. GLM is a super smart trick to estimate maximum likelihood models with OLS (there's a nice description here). It was developed and used at a time when regular optimization techniques were very expensive in terms of computational time, and it was a way to instead employ well developed and fast OLS techniques to perform equivalent estimations.
GLM is an iterative process where typical OLS estimations are performed at each step, the only changes at each iteration concern the weights associated to each observation. Therefore, since it's a regular OLS process, techniques to perform fast OLS estimations with multiple fixed-effects can be leveraged (as is in the fixest package).
So actually, you could do what you want... but only within the OLS step of the GLM algorithm. By no means you should demean the data before running GLM because, well, it makes no sense (the FWL theorem has absolutely no hold here).

How to deal with spatially autocorrelated residuals in GLMM

I am conducting an analysis of where on the landscape a predator encounters potential prey. My response data is binary with an Encounter location = 1 and a Random location = 0 and my independent variables are continuous but have been rescaled.
I originally used a GLM structure
glm_global <- glm(Encounter ~ Dist_water_cs+coverMN_cs+I(coverMN_cs^2)+
Prey_bio_stand_cs+Prey_freq_stand_cs+Dist_centre_cs,
data=Data_scaled, family=binomial)
but realized that this failed to account for potential spatial-autocorrelation in the data (a spline correlogram showed high residual correlation up to ~1000m).
Correlog_glm_global <- spline.correlog (x = Data_scaled[, "Y"],
y = Data_scaled[, "X"],
z = residuals(glm_global,
type = "pearson"), xmax = 1000)
I attempted to account for this by implementing a GLMM (in lme4) with the predator group as the random effect.
glmm_global <- glmer(Encounter ~ Dist_water_cs+coverMN_cs+I(coverMN_cs^2)+
Prey_bio_stand_cs+Prey_freq_stand_cs+Dist_centre_cs+(1|Group),
data=Data_scaled, family=binomial)
When comparing AIC of the global GLMM (1144.7) to the global GLM (1149.2) I get a Delta AIC value >2 which suggests that the GLMM fits the data better. However I am still getting essentially the same correlation in the residuals, as shown on the spline correlogram for the GLMM model).
Correlog_glmm_global <- spline.correlog (x = Data_scaled[, "Y"],
y = Data_scaled[, "X"],
z = residuals(glmm_global,
type = "pearson"), xmax = 10000)
I also tried explicitly including the Lat*Long of all the locations as an independent variable but results are the same.
After reading up on options, I tried running Generalized Estimating Equations (GEEs) in “geepack” thinking this would allow me more flexibility with regards to explicitly defining the correlation structure (as in GLS models for normally distributed response data) instead of being limited to compound symmetry (which is what we get with GLMM). However I realized that my data still demanded the use of compound symmetry (or “exchangeable” in geepack) since I didn’t have temporal sequence in the data. When I ran the global model
gee_global <- geeglm(Encounter ~ Dist_water_cs+coverMN_cs+I(coverMN_cs^2)+
Prey_bio_stand_cs+Prey_freq_stand_cs+Dist_centre_cs,
id=Pride, corstr="exchangeable", data=Data_scaled, family=binomial)
(using scaled or unscaled data made no difference so this is with scaled data for consistency)
suddenly none of my covariates were significant. However, being a novice with GEE modelling I don’t know a) if this is a valid approach for this data or b) whether this has even accounted for the residual autocorrelation that has been evident throughout.
I would be most appreciative for some constructive feedback as to 1) which direction to go once I realized that the GLMM model (with predator group as a random effect) still showed spatially autocorrelated Pearson residuals (up to ~1000m), 2) if indeed GEE models make sense at this point and 3) if I have missed something in my GEE modelling. Many thanks.
Taking the spatial autocorrelation into account in your model can be done is many ways. I will restrain my response to R main packages that deal with random effects.
First, you could go with the package nlme, and specify a correlation structure in your residuals (many are available : corGaus, corLin, CorSpher ...). You should try many of them and keep the best model. In this case the spatial autocorrelation in considered as continous and could be approximated by a global function.
Second, you could go with the package mgcv, and add a bivariate spline (spatial coordinates) to your model. This way, you could capture a spatial pattern and even map it. In a strict sens, this method doesn't take into account the spatial autocorrelation, but it may solve the problem. If the space is discret in your case, you could go with a random markov field smooth. This website is very helpfull to find some examples : https://www.fromthebottomoftheheap.net
Third, you could go with the package brms. This allows you to specify very complex models with other correlation structure in your residuals (CAR and SAR). The package use a bayesian approach.
I hope this help. Good luck

maximum number of covariates in R glm package is 128?

I have a dataset with 146 covariates, and am training a logistic regression.
logit = glm(Y ~ .,
data = pred.dataset[1:1000,],
family = binomial)
The model trains very quickly, but when I then try to view the Beta's with
logit
After the 128th variable the Beta's are all "NA"
I noticed this when trying to export it as pmml and noticed it stopped listing Beta's after 128 predictors.
I've gone through the documentation and can't find a reference to a maximum number of covariates, and also trained on 60k rows - I still see NAs after the 128th predictor.
Is this a limitation of glm, or a limitation of my system? I am running R 3.1.2 64 bit. How can I increase the number of predictors?
This is a question I actually just asked on Stack Exchange, which is where this question should be. See this link:
https://stats.stackexchange.com/questions/159316/logistic-regression-in-r-with-many-predictors?noredirect=1#comment303422_159316 and the subsequent links included in the thread. To answer your question though, basically that is too many predictors for logistic regression, and OLS can be used in this case, and even though it does not yield the best results for a binary outcome, the results are still valid and can be used.
You didn't provide reproducible data, so it's hard to tell exactly what is going on--is there an issue with how some of the variables are coded? Are variables that seem uniform not uniform at all? These would be a couple of situations that could be ruled out with a reproducible code example.
However, I'm answering because I think you may have a legitimate concern. What can you say about these other variables? What type are they? I have been trying to run some logits that seem to be dropping factor levels over 48.
What worked for me (at least to get the model to run in full) was going into the glm() function and changing
mf$drop.unused.levels <- TRUE
to
mf$drop.unused.levels <- FALSE
then saving the function under a different name and using that to run my analyses. (I was inspired by this answer.)
Be warned, though! It gave me some warning messages:
Warning messages:
1: In predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading
2: In predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading
3: In predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading
I know that there are frequency issues in certain groups in the data; I have to analyze these separately and I will do so. But for the time being, I have achieved the prediction of all levels that I wanted.
The first step would be to check your data, though. Part of why this happens with my data is almost certainly due to issues in the data itself, but this approach let me override it. This may or may not be an appropriate solution for you.

Lack of convergence of glmnet when lambda=0 for family="poisson"

While getting a handle on glmnet versus glm, I ran into convergence problems for lambda=0 and family="poisson". My understanding is that with lambda=0 (and alpha=1, the default), the answers should be essentially the same.
Below is code changed slightly from the poisson example on the glmnet help page (?glmnet). The only change is that nzc = p so that all variables are in the true model
N=1000; p=50
nzc=p
x=matrix(rnorm(N*p),N,p)
beta=rnorm(nzc)
f = x[,seq(nzc)]%*%beta
mu=exp(f)
y=rpois(N,mu)
#With lambda=0 glmnet throws the convergence error shown below
fit=glmnet(x,y,family="poisson",lambda=0)
#It works with default lambda passed in
# but estimates are quite different from glm.
fit=glmnet(x,y,family="poisson") #use default lambdas
fit2=glm(y~x,family="poisson")
plot(coef(fit2)[2:(p+1)],
coef(fit,s=min(fit$lambda))[2:(p+1)],
xlab="glm",ylab="glmnet")
abline(0,1)
#works fine with gaussian response and lambda=0 or default lambda
#glm and glmnet identical
mu = f
y=rnorm(N,mu)
fit=glmnet(x,y,family="gaussian",lambda=0)
fit2=glm(y~x)
plot(coef(fit2)[2:(p+1)], coef(fit)[2:(p+1)])
abline(0,1)
Here's the error message
Warning messages:
1: from glmnet Fortran code (error code -1); Convergence for 1th lambda value not reached after maxit=100000 iterations; solutions for larger lambdas returned
2: In getcoef(fit, nvars, nx, vnames) :an empty model has been returned; probably a convergence issue
Updated:
The problem seems to be with the intercept being estimated by glmnet when family="poisson" and not related to the setting of lambda per se.
fit=glmnet(x,y,family="poisson")
#intercept should be close to 0
coef(fit)[1,]
#but it is huge
#passing in intercept=FALSE however generates the convergence error again
fit=glmnet(x,y,family="poisson", intercept=FALSE)
I think you are confused about lambda and alpha. alpha is the penalization factor which is set to 0 will give you ridge regression. Typically it is set to something between 0.1 and 1. lambda is typically not set, and there is a warning on the help page NOT to set it to a single value:
WARNING: use with care. Do not supply a single value for lambda
I don't know why you think a lasso penalty should be the same as an unpenalized Poisson model. The whole point of a penalized model is to be less subject to the biases and constraints of an ordinary regression model.
You get the error because you try to pass lambda = 0 to glmnet.
If you want to select the coefficients from glmnet for lambda = 0, you could use:
coef(fit, s=0)
This automatically selects the last (smallest) value of lambda. I guess you've basically done that already though, with s = min(fit$lambda). If you want to go even smaller than that you might have to manually put in a lambda sequence, but this is a little bit tricky (glmnet seems a little bit stubborn about its lambda's).
Also keep in mind that there might be some bias in glmnet, so it could be slightly different from the results of glm.

Which function/package for robust linear regression works with glmulti (i.e., behaves like glm)?

Background: Multi-model inference with glmulti
glmulti is a R function/package for automated model selection for general linear models that constructs all possible general linear models given a dependent variable and a set of predictors, fits them via the classic glm function and allows then for multi-model inference (e.g., using model weights derived from AICc, BIC). glmulti works in theory also with any other function that returns coefficients, the log-likelihood of the model and the number of free parameters (and maybe other information?) in the same format that glm does.
My goal: Multi-model inference with robust errors
I would like to use glmulti with robust modeling of the errors of a quantitative dependent variable to guard against the effect out outliers.
For example, I could assume that the errors in the linear model are distributed as a t distribution instead of as a normal distribution. With its kurtosis parameter the t distribution can have heavy tails and is thus more robust to outliers (as compared to the normal distribution).
However, I'm not committed to using the t distribution approach. I'm happy with any approach that gives back a log-likelihood and thus works with the multimodel approach in glmulti. But that means, that unfortunately I cannot use the well-known robust linear models in R (e.g., lmRob from robust or lmrob from robustbase) because they do not operate under the log-likelihood framework and thus cannot work with glmulti.
The problem: I can't find a robust regression function that works with glmulti
The only robust linear regression function for R I found that operates under the log-likelihood framework is heavyLm (from the heavy package); it models the errors with a t distribution. Unfortunately, heavyLm does not work with glmulti (at least not out of the box) because it has no S3 method for loglik (and possibly other things).
To illustrate:
library(glmulti)
library(heavy)
Using the dataset stackloss
head(stackloss)
Regular Gaussian linear model:
summary(glm(stack.loss ~ ., data = stackloss))
Multi-model inference with glmulti using glm's default Gaussian link function
stackloss.glmulti <- glmulti(stack.loss ~ ., data = stackloss, level=1, crit=bic)
print(stackloss.glmulti)
plot(stackloss.glmulti)
Linear model with t distributed error (default is df=4)
summary(heavyLm(stack.loss ~ ., data = stackloss))
Multi-model inference with glmulti calling heavyLm as the fitting function
stackloss.heavyLm.glmulti <- glmulti(stack.loss ~ .,
data = stackloss, level=1, crit=bic, fitfunction=heavyLm)
gives the following error:
Initialization...
Error in UseMethod("logLik") :
no applicable method for 'logLik' applied to an object of class "heavyLm".
If I define the following function,
logLik.heavyLm <- function(x){x$logLik}
glmulti can get the log-likelihood, but then the next error occurs:
Initialization...
Error in .jcall(molly, "V", "supplyErrorDF",
as.integer(attr(logLik(fitfunc(as.formula(paste(y, :
method supplyErrorDF with signature ([I)V not found
The question: Which function/package for robust linear regression works with glmulti (i.e., behaves like glm)?
There is probably a way to define further functions to get heavyLm working with glmulti, but before embarking on this journey I wanted to ask whether anybody
knows of a robust linear regression function that (a) operates under the log-likelihood framework and (b) behaves like glm (and will thus work with glmulti out-of-the-box).
got heavyLm already working with glmulti.
Any help is very much appreciated!
Here is an answer using heavyLm. Even though this is a relatively old question, the same problem that you mentioned still occurs when using heavyLm (i.e., the error message Error in .jcall(molly, "V", "supplyErrorDF"…).
The problem is that glmulti requires the degrees of freedom of the model, to be passed as an attribute of you need to provide as an attribute of the value returned by function logLik.heavyLm; see the documentation for the function logLik for details. Moreover, it turns out that you also need to provide a function to return the number of data points that were used for fitting the model, since the information criteria (AIC, BIC, …) depend on this value too. This is done by function nobs.heavyLm in the code below.
Here is the code:
nobs.heavyLm <- function(mdl) mdl$dims[1] # the sample size (number of data points)
logLik.heavyLm <- function(mdl) {
res <- mdl$logLik
attr(res, "nobs") <- nobs.heavyLm(mdl) # this is not really needed for 'glmulti', but is included to adhere to the format of 'logLik'
attr(res, "df") <- length(mdl$coefficients) + 1 + 1 # I am also considering the scale parameter that is estimated; see mdl$family
class(res) <- "logLik"
res
}
which, when put together with the code that you provided, produces the following result:
Initialization...
TASK: Exhaustive screening of candidate set.
Fitting...
Completed.
> print(stackloss.glmulti)
glmulti.analysis
Method: h / Fitting: glm / IC used: bic
Level: 1 / Marginality: FALSE
From 8 models:
Best IC: 117.892471265874
Best model:
[1] "stack.loss ~ 1 + Air.Flow + Water.Temp"
Evidence weight: 0.709174196998897
Worst IC: 162.083142797858
2 models within 2 IC units.
1 models to reach 95% of evidence weight.
producing therefore 2 models within the 2 BIC units threshold.
An important remark though: I am not sure that the expression above for the degrees of freedom is strictly correct. For a standard linear model, the degrees of freedom would be equal to p + 1, where p is the number of parameters in the model, and the extra parameter (the + 1) is the "error" variance (which is used to calculate the likelihood). In function logLik.heavyLm above, it is not clear to me whether one should also count the "scale parameter" that is estimated by heavyLm as an extra degree of freedom, and hence the p + 1 + 1, which would be the case if the likelihood is also a function of this parameter. Unfortunately, I cannot confirm this, since I don’t have access to the reference that heavyLm cites (the paper by Dempster et al., 1980). Because of this, I am counting the scale parameter, thereby providing a (slightly more) conservative estimate of model complexity, penalizing "complex" models. This difference should be negligible, except in the small sample case.

Resources