Ordinary least squares with glmnet and lm - r

This question was asked in stackoverflow.com/q/38378118 but there was no satisfactory answer.
LASSO with λ = 0 is equivalent to ordinary least squares, but this does not seem to be the case for glmnet() and lm() in R. Why?
library(glmnet)
options(scipen = 999)
X = model.matrix(mpg ~ 0 + ., data = mtcars)
y = as.matrix(mtcars["mpg"])
coef(glmnet(X, y, lambda = 0))
lm(y ~ X)
Their regression coefficients agree by at most 2 significant figures, perhaps due to slightly different termination conditions of their optimization algorithms:
glmnet lm
(Intercept) 12.19850081 12.30337
cyl -0.09882217 -0.11144
disp 0.01307841 0.01334
hp -0.02142912 -0.02148
drat 0.79812453 0.78711
wt -3.68926778 -3.71530
qsec 0.81769993 0.82104
vs 0.32109677 0.31776
am 2.51824708 2.52023
gear 0.66755681 0.65541
carb -0.21040602 -0.19942
The difference is much worse when we add interaction terms.
X = model.matrix(mpg ~ 0 + . + . * disp, data = mtcars)
y = as.matrix(mtcars["mpg"])
coef(glmnet(X, y, lambda = 0))
lm(y ~ X)
Regression coefficients:
glmnet lm
(Intercept) 36.2518682237 139.9814651
cyl -11.9551206007 -26.0246050
disp -0.2871942149 -0.9463428
hp -0.1974440651 -0.2620506
drat -4.0209186383 -10.2504428
wt 1.3612184380 5.4853015
qsec 2.3549189212 1.7690334
vs -25.7384282290 -47.5193122
am -31.2845893123 -47.4801206
gear 21.1818220135 27.3869365
carb 4.3160891408 7.3669904
cyl:disp 0.0980253873 0.1907523
disp:hp 0.0006066105 0.0006556
disp:drat 0.0040336452 0.0321768
disp:wt -0.0074546428 -0.0228644
disp:qsec -0.0077317305 -0.0023756
disp:vs 0.2033046078 0.3636240
disp:am 0.2474491353 0.3762699
disp:gear -0.1361486900 -0.1963693
disp:carb -0.0156863933 -0.0188304

If you check out these two posts, you will get a sense as to why you are not getting the same results.
In essence, glmnet penalized maximum likelihood using a regularization path to estimate the model. lm solves the least squares problem using QR decomposition. So the estimates will never be exactly the same.
However, note in the manual for ?glmnet under "lambda":
WARNING: use with care. Do not supply a single value for lambda (for
predictions after CV use predict() instead). Supply instead a
decreasing sequence of lambda values. glmnet relies on its warms
starts for speed, and its often faster to fit a whole path than
compute a single fit.
You can do (at least) three things to get the coefficients closer so the difference is trivial--(1) have a range of values for lambda, (2) decrease the threshold value thres, and (3) increase the max number of iterations.
library(glmnet)
options(scipen = 999)
X = model.matrix(mpg ~ 0 + ., data = mtcars)
y = as.matrix(mtcars["mpg"])
lfit <- glmnet(X, y, lambda = rev(0:99), thres = 1E-10)
lmfit <- lm(y ~ X)
coef(lfit, s = 0) - coef(lmfit)
11 x 1 Matrix of class "dgeMatrix"
1
(Intercept) 0.004293053125
cyl -0.000361655351
disp -0.000002631747
hp 0.000006447138
drat -0.000065394578
wt 0.000180943607
qsec -0.000079480187
vs -0.000462099248
am -0.000248796353
gear -0.000222035415
carb -0.000071164178
X = model.matrix(mpg ~ 0 + . + . * disp, data = mtcars)
y = as.matrix(mtcars["mpg"])
lfit <- glmnet(X, y, lambda = rev(0:99), thres = 1E-12, maxit = 10^7)
lmfit <- glm(y ~ X)
coef(lfit, s = 0) - coef(lmfit)
20 x 1 Matrix of class "dgeMatrix"
1
(Intercept) -0.3174019115228
cyl 0.0414909318817
disp 0.0020032493403
hp 0.0001834076765
drat 0.0188376047769
wt -0.0120601219002
qsec 0.0019991131315
vs 0.0636756040430
am 0.0439343002375
gear -0.0161102501755
carb -0.0088921918062
cyl:disp -0.0002714213271
disp:hp -0.0000001211365
disp:drat -0.0000859742667
disp:wt 0.0000462418947
disp:qsec -0.0000175276420
disp:vs -0.0004657059892
disp:am -0.0003517289096
disp:gear 0.0001629963377
disp:carb 0.0000085312911
Some of the differences for the interacted model are probably non-trivial, but closer.

Related

feols with step-wise fixed effects not working with custom formula

I need to feed a custom formula to feols and have it estimate multiple models (various fixed effects), but this seems to break the sw() function. Compare:
library(fixest)
feols(mpg ~ disp | sw(gear, gear + carb), data = mtcars)
(That works.) This does not work:
feols(as.formula("mpg ~ disp") | sw(gear, gear + carb), data = mtcars)
Why? And how can I get that structure of formula input to work?
We may need paste to paste the strings togeher
library(fixest)
feols(as.formula(paste("mpg ~ disp", "sw(gear, gear + carb)",
sep = "|")), data = mtcars)
-output
Standard-errors: Clustered (gear)
Fixed-effects: gear
Estimate Std. Error t value Pr(>|t|)
disp -0.040774 0.015421 -2.64404 0.11821
---
Fixed-effects: gear + carb
Estimate Std. Error t value Pr(>|t|)
disp -0.018388 0.016009 -1.14861 0.36955
which gives same as
> feols(mpg ~ disp | sw(gear, gear + carb), data = mtcars)
Standard-errors: Clustered (gear)
Fixed-effects: gear
Estimate Std. Error t value Pr(>|t|)
disp -0.040774 0.015421 -2.64404 0.11821
---
Fixed-effects: gear + carb
Estimate Std. Error t value Pr(>|t|)
disp -0.018388 0.016009 -1.14861 0.36955
NOTE: paste is more efficient compared to reformulate. If it is a single expression, formula can be used an expression i.e.
fmla <- mpg ~ disp | sw(gear, gear + carb)
> feols(fmla, data = mtcars)
Standard-errors: Clustered (gear)
Fixed-effects: gear
Estimate Std. Error t value Pr(>|t|)
disp -0.040774 0.015421 -2.64404 0.11821
---
Fixed-effects: gear + carb
Estimate Std. Error t value Pr(>|t|)
disp -0.018388 0.016009 -1.14861 0.36955
In general, I marginally prefer reformulate() to as.formula(paste()) syntax.
fml <- reformulate(
"disp | sw(gear, gear + carb)",
response = "mpg"
)
feols(fml, data = mtcars)
From the reformulate docs:
reformulate creates a formula from a character vector.
Although in this particular case the issue is simply that you cannot mix as.formula() into the same function call as a formula object. You could do this without paste() simply by including the whole term in as.formula():
feols(as.formula("mpg ~ disp | sw(gear, gear + carb)"), data = mtcars)
Output for both is:
Standard-errors: Clustered (gear)
Fixed-effects: gear
Estimate Std. Error t value Pr(>|t|)
disp -0.040774 0.015421 -2.64404 0.11821
---
Fixed-effects: gear + carb
Estimate Std. Error t value Pr(>|t|)
disp -0.018388 0.016009 -1.14861 0.36955

AIC/AICc/BIC Formula in R for GLM

I'm trying to check that I understand how R calculates the statistic AIC, AICc (corrected AIC) and BIC for a glm() model object (so that I can perform the same calculations on revoScaleR::rxGlm() objects - particularly the AICc, which isn't available by default)
I had understood that these were defined as follows:
let p = number of model parameters
let n = number of data points
AIC = deviance + 2p
AICc = AIC + (2p^2 + 2p)/(n-p-1)
BIC = deviance + 2p.log(n)
So I tried to replicate these numbers and compare them to the corresponding R function calls. It didn't work:
library(AICcmodavg) # for the AICc() function
data(mtcars)
glm_a1 <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
data = mtcars,
family = gaussian(link = "identity"),
trace = TRUE)
summary(glm_a1)
n <- nrow(glm_a1$data) # 32
p <- glm_a1$rank # 11
dev <- glm_a1$deviance# 147.49
my_AIC <- dev + 2 * p
my_AICc <- my_AIC + (2 * p^2 + 2 * p)/(n - p - 1)
my_BIC <- dev + 2 * p * log(n)
AIC(glm_a1) # 163.71
my_AIC # 169.49
AICc(glm_a1) # 180.13 (from AICcmodavg package)
my_AICc # 182.69
BIC(glm_a1) # 181.30
my_BIC # 223.74
By using debug(AIC) I can see that the calculation is different. It's based on 12 parameters (one extra for the estimated dispersion/scale parameter?). Also the log likelihood is obtained using logLik() which brings back a number -69.85, which suggests to me that the model deviance would be -2*-69.85 = 139.71 (which it isn't).
Does anyone know what I've done wrong please?
Thank you.
in the extractAIC manual page
Where :
L is the likelihood and edf the equivalent degrees of freedom (i.e., the number of parameters for usual parametric models) of fit.
For generalized linear models (i.e., for lm, aov, and glm), -2log L is the deviance, as computed by deviance(fit).
k = 2 corresponds to the traditional AIC, using k = log(n) provides the BIC (Bayes IC) instead.
Thus
Edits following discussion in the comments and input of #user20650
glm_a1$ranks returns the number of fitted parameter without accounting for the fitted variance used in gaussian families.
?glm states
deviance: up to a constant, minus twice the maximized log-likelihood. Where sensible, the constant is chosen so that a saturated model has deviance zero.
that's why -2*logLik(glm_a1) - deviance(glm_a1) = 7.78 > 0
summary(glm_a1) returns the following line Dispersion parameter for gaussian family taken to be 7.023544 approximately the difference between -2 log likelihood and the deviance.
library(AICcmodavg)
#> Warning: package 'AICcmodavg' was built under R version 3.6.2
#> Warning: no function found corresponding to methods exports from 'raster' for:
#> 'wkt'
data(mtcars)
glm_a1 <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
data = mtcars,
family = gaussian(link = "identity"),
trace = TRUE)
#> Deviance = 147.4944 Iterations - 1
#> Deviance = 147.4944 Iterations - 2
(loglik <- logLik(glm_a1))
#> 'log Lik.' -69.85491 (df=12)
# thus the degrees of freedom r uses are 12 instead of 11
n <- attributes(loglik)$nobs # following user20650 recommendation
p <- attributes(loglik)$df # following user20650 recommendation
dev <- -2*as.numeric(loglik)
my_AIC <- dev + 2 * p
my_AICc <- my_AIC + (2 * p^2 + 2 * p)/(n - p - 1)
my_BIC <- dev + p * log(n)
BIC(glm_a1)
#> [1] 181.2986
my_BIC
#> [1] 181.2986
AIC(glm_a1)
#> [1] 163.7098
my_AIC
#> [1] 163.7098
AICc(glm_a1)
#> [1] 180.1309
my_AICc
#> [1] 180.1309
Function to calculate these quantities for an rxGlm() object consistent with treatment of glm() (adjusting for the "up to a constant" difference in deviance):
wrc_information_criteria <- function(rx_glm) # an object created by rxGlm()
{
# add 1 to parameter count for cases where the GLM scale parameter needs to be estimated (notably Gamma/gaussian)
extra_parameter_flag <- case_when(
rx_glm$family$family == "gaussian" ~ 1,
rx_glm$family$family == "Gamma" ~ 1,
rx_glm$family$family == "poisson" ~ 0,
rx_glm$family$family == "binomial" ~ 0,
TRUE ~ 999999999
)
n <- rx_glm$nValidObs
p <- rx_glm$rank + extra_parameter_flag
dev <- rx_glm$deviance
cat("\n")
cat("n :", n, "\n")
cat("p :", p, "\n")
cat("deviance:", dev, "\n")
AIC <- dev + 2 * p
AICc <- AIC + (2 * p^2 + 2 * p)/(n - p - 1)
BIC <- dev + p * log(n)
# make a constant adjustment to AIC/AICc/BIC to give consistency with R's built in AIC/BIC functions applied to glm objects
# can do this because rxGlm() supplies AIC already (consistent with R/glm()) - as long as computeAIC = TRUE in the function call
deviance_constant_adjustment <- rx_glm$aic[1] - AIC
AIC <- AIC + deviance_constant_adjustment
AICc <- AICc + deviance_constant_adjustment
BIC <- BIC + deviance_constant_adjustment
cat("\n")
cat("AIC: ", AIC , "\n")
cat("AICc:", AICc, "\n")
cat("BIC: ", BIC , "\n")
}
Let's test it...
data(mtcars)
glm_a1 <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
data = mtcars,
family = gaussian(link = "identity"),
trace = TRUE)
glm_b1 <- rxGlm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
data = mtcars,
family = gaussian(link = "identity"),
verbose = 1,
computeAIC = TRUE)
AIC(glm_a1)
AICc(glm_a1)
BIC(glm_a1)
wrc_information_criteria(glm_b1) # gives same results for glm_b1 as I got for glm_a1

map() model output to a dataframe

I've been using map() to calculate and extract certain statistics from multiple lm() models.
To give a reproducible example, using the mtcars dataset, I start with an input vector of formulae to be estimated using lm() models:
library(tidyverse)
df <- mtcars
input_char <- c("mpg ~ disp",
"mpg ~ disp + hp")
input_formula <- map(input_char, formula)
I've then got a function that calculates and extracts the relevant statistics for each model. For simplicity and reproducibility, here's a simplified function that just extracts the R-squared of the model.
get_rsquared <- function(a_formula) {
model1 <- lm(a_formula, data = df)
rsquared <- summary(model1)$r.squared
c(model = a_formula, rsquared = rsquared)
}
I've then used map to iterate through the formulae and extract the R-squared from each model.
models <- map(input_formula, get_rsquared)
models
which gives the output:
[[1]]
[[1]]$model
mpg ~ disp
<environment: 0x7f98987f4000>
[[1]]$rsquared
[1] 0.7183433
[[2]]
[[2]]$model
mpg ~ disp + hp
<environment: 0x7f98987f4000>
[[2]]$rsquared
[1] 0.7482402
My question is regarding the output being a list.
Is there a simple way to make the output a dataframe?
My desired output is:
#> model rsquared
#> 1 mpg ~ disp 0.7183433
#> 2 mpg ~ disp + hp 0.7482402
Keep the formulas as character strings and use as.formula() as part of the the get_rsquared() function as it's easier to work with them as character strings than formula objects.
library(purrr)
library(dplyr)
df <- mtcars
input_char <- c("mpg ~ disp",
"mpg ~ disp + hp")
get_rsquared <- function(a_formula) {
model1 <- lm(as.formula(a_formula), data = df)
rsquared <- summary(model1)$r.squared
list(model = a_formula, rsquared = rsquared)
}
map_df(input_char, get_rsquared)
# A tibble: 2 x 2
model rsquared
<chr> <dbl>
1 mpg ~ disp 0.718
2 mpg ~ disp + hp 0.748

calculating log likelihood for multivariate linear regression using R

I want to calculate the loglikelihood for multivariate linear regression. I'm not sure whether this code is true or not.
I’ve been calculated the log likelihood using dmvnorm function in mvtnorm r package.
sdmvn_mle <- function(obj){
sdmvn_mle_1 <- apply(obj$residuals^2,2,mean)
sdmvn_mle_2 <- mean(residuals(obj)[,1] * residuals(obj)[,2])
return(matrix(c(sdmvn_mle_1[1], sdmvn_mle_2, sdmvn_mle_2, sdmvn_mle_1[2]), nrow = 2))
}
llmvn <- function(obj, sd){
lr <- c()
for( i in 1: nrow(obj$fitted.values)){
lr <- c(lr, mvtnorm::dmvnorm(model.response(model.frame(obj))[i,], mean=fitted(obj)[i,], sigma=sd, log=TRUE))
}
return(sum(lr))
}
Y <- as.matrix(mtcars[,c("mpg","disp")])
(mvmod <- lm(Y ~ hp + drat + wt, data=mtcars))
# Call:
# lm(formula = Y ~ hp + drat + wt, data = mtcars)
# Coefficients:
# mpg disp
# (Intercept) 29.39493 64.52984
# hp -0.03223 0.66919
# drat 1.61505 -40.10238
# wt -3.22795 65.97577
llmvn(mvmod, sdmvn_mle(mvmod))
# [1] -238.7386
I’m not sure the result is correct or not.
Additionally, Please let me know if there is another strategies for calculating log likelihood for multivariate linear regression.

R - how to pass formula to a with(df, glm(y ~ x)) construction inside a function

I'm using the mice package in R to multiply-impute some missing data. I need to be able to specify a formula that is passed to a with(df, glm(y ~ x)) construction inside of a function. This with() construction is the format used by the mice package to fit the regression model separately within each of the imputed datasets.
However, I cannot figure out the scoping problems preventing me from successfully passing the formula as an argument. Here is a reproducible example:
library(mice)
data(mtcars)
mtcars[5, 5] <- NA # introduce a missing value to be imputed
mtcars.imp = mice(mtcars, m = 5)
# works correctly outside of function
with(mtcars.imp, glm(mpg ~ cyl))
fit_model_mi = function(formula) {
with(mtcars.imp, glm(formula))
}
# doesn't work when trying to pass formula into function
fit_model_mi("mpg ~ cyl")
Also see here for the same question being asked on R help, although it does not receive an answer.
Try wrapping the formula in as.formula
fit_model_mi = function(formula) {
with(mtcars.imp, glm(as.formula(formula)) )
}
Seems to work:
> fit_model_mi("mpg ~ cyl")
call :
with.mids(data = mtcars.imp, expr = glm(as.formula(formula)))
call1 :
mice(data = mtcars, m = 5)
nmis :
mpg cyl disp hp drat wt qsec vs am gear carb
0 0 0 0 1 0 0 0 0 0 0
analyses :
[[1]]
Call: glm(formula = as.formula(formula))
Coefficients:
(Intercept) cyl
37.885 -2.876
Degrees of Freedom: 31 Total (i.e. Null); 30 Residual
Null Deviance: 1126
Residual Deviance: 308.3 AIC: 169.3
You can also attach your data by
attach(mtcars)
Result shown
fit_model_mi("mpg ~ cyl")
call :
with.mids(data = mtcars.imp, expr = glm(formula))
call1 :
mice(data = mtcars, m = 5)
nmis :
mpg cyl disp hp drat wt qsec vs am gear carb
0 0 0 0 1 0 0 0 0 0 0
analyses :
[[1]]
Call: glm(formula = formula)
Coefficients:
(Intercept) cyl
37.885 -2.876
Degrees of Freedom: 31 Total (i.e. Null); 30 Residual
Null Deviance: 1126
Residual Deviance: 308.3 AIC: 169.3

Resources