I want to calculate the log likelihood for a multivariate linear regression, and I'm not sure whether this code is correct.
I have been calculating the log likelihood with the dmvnorm function from the mvtnorm R package.
sdmvn_mle <- function(obj){
  sdmvn_mle_1 <- apply(obj$residuals^2, 2, mean)
  sdmvn_mle_2 <- mean(residuals(obj)[,1] * residuals(obj)[,2])
  return(matrix(c(sdmvn_mle_1[1], sdmvn_mle_2, sdmvn_mle_2, sdmvn_mle_1[2]), nrow = 2))
}
llmvn <- function(obj, sd){
  lr <- c()
  for (i in 1:nrow(obj$fitted.values)){
    lr <- c(lr, mvtnorm::dmvnorm(model.response(model.frame(obj))[i,], mean = fitted(obj)[i,], sigma = sd, log = TRUE))
  }
  return(sum(lr))
}
Y <- as.matrix(mtcars[,c("mpg","disp")])
(mvmod <- lm(Y ~ hp + drat + wt, data=mtcars))
# Call:
# lm(formula = Y ~ hp + drat + wt, data = mtcars)
# Coefficients:
# mpg disp
# (Intercept) 29.39493 64.52984
# hp -0.03223 0.66919
# drat 1.61505 -40.10238
# wt -3.22795 65.97577
llmvn(mvmod, sdmvn_mle(mvmod))
# [1] -238.7386
I'm not sure whether this result is correct.
Additionally, please let me know if there are other strategies for calculating the log likelihood for a multivariate linear regression.
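One possible alternative, sketched here under the assumption that the error covariance is the MLE (residual cross-product divided by n, which is what sdmvn_mle builds for two responses), is the closed form of the maximized Gaussian log likelihood. It avoids the row-by-row loop and needs only base R. The helper name loglik_mlm is hypothetical.

```r
# Closed-form maximized log likelihood for a multivariate lm ("mlm") fit.
# Assumes Sigma is estimated by the MLE, crossprod(E) / n (no df correction).
loglik_mlm <- function(obj) {
  E <- residuals(obj)            # n x q matrix of residuals
  n <- nrow(E)
  q <- ncol(E)
  Sigma <- crossprod(E) / n      # MLE of the error covariance
  log_det <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  # At the MLE, sum_i e_i' Sigma^-1 e_i equals n * q, so the quadratic term collapses
  -0.5 * n * (q * log(2 * pi) + log_det + q)
}

Y <- as.matrix(mtcars[, c("mpg", "disp")])
mvmod <- lm(Y ~ hp + drat + wt, data = mtcars)
loglik_mlm(mvmod)
```

If the same MLE covariance is passed to dmvnorm, the summed log densities should agree with this value up to rounding.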
If my model looks like this, Y=β0+β1X1+β2X2+β3X3+β4X4, and I want to perform an F test (5%) in R for β1=β2, how do I do it?
The only tutorials I can find online deal with β1=β2=0, but that's not what I'm looking for here.
Here's an example in R testing whether the coefficient for vs is the same as the coefficient for am:
data(mtcars)
mod <- lm(mpg ~ hp + disp + vs + am, data=mtcars)
library(car)
linearHypothesis(mod, "vs=am")
# Linear hypothesis test
#
# Hypothesis:
# vs - am = 0
#
# Model 1: restricted model
# Model 2: mpg ~ hp + disp + vs + am
#
# Res.Df RSS Df Sum of Sq F Pr(>F)
# 1 28 227.07
# 2 27 213.52 1 13.547 1.7131 0.2016
The glht function from multcomp package can do this (among others). For example, if your model is
mod1 <-lm( y ~ x1 + x2 + x3 + x4)
then you can use:
summary(multcomp::glht(mod1, "x1-x2=0"))
Run the model with and without the constraint and then use anova to compare them. No packages are used.
mod1 <- lm(mpg ~ cyl + disp + hp + drat, mtcars)
mod2 <- lm(mpg ~ I(cyl + disp) + hp + drat, mtcars) # constraint imposed
anova(mod2, mod1)
giving:
Analysis of Variance Table
Model 1: mpg ~ I(cyl + disp) + hp + drat
Model 2: mpg ~ cyl + disp + hp + drat
Res.Df RSS Df Sum of Sq F Pr(>F)
1 28 252.95
2 27 244.90 1 8.0513 0.8876 0.3545
The underlying calculation is the following. It gives the same result as above.
L <- matrix(c(0, 1, -1, 0, 0), 1) # hypothesis is L %*% beta == 0
q <- nrow(L) # 1
co <- coef(mod1)
resdf <- df.residual(mod1) # = nobs(mod1) - length(co) = 32 - 5 = 27
SSH <- t(L %*% co) %*% solve(L %*% vcov(mod1) %*% t(L)) %*% L %*% co
SSH/q # F value
## [,1]
## [1,] 0.8876363
pf(SSH/q, q, resdf, lower.tail = FALSE) # p value
## [,1]
## [1,] 0.3544728
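As a further check, the same F value can be recovered from the residual sums of squares of the restricted and unrestricted fits. This is a minimal base R sketch; the variable names rss_u, rss_r, f_value and p_value are chosen here for illustration only.

```r
# F statistic for a linear restriction from restricted and unrestricted fits
mod1 <- lm(mpg ~ cyl + disp + hp + drat, data = mtcars)      # unrestricted
mod2 <- lm(mpg ~ I(cyl + disp) + hp + drat, data = mtcars)   # beta_cyl == beta_disp imposed

rss_u <- deviance(mod1)   # residual sum of squares, unrestricted
rss_r <- deviance(mod2)   # residual sum of squares, restricted
q <- df.residual(mod2) - df.residual(mod1)   # number of restrictions (1 here)

# F = ((RSS_r - RSS_u) / q) / (RSS_u / residual df of the unrestricted model)
f_value <- ((rss_r - rss_u) / q) / (rss_u / df.residual(mod1))
p_value <- pf(f_value, q, df.residual(mod1), lower.tail = FALSE)
c(F = f_value, p = p_value)
```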
I'm trying to check that I understand how R calculates the AIC, AICc (corrected AIC) and BIC statistics for a glm() model object, so that I can perform the same calculations on revoScaleR::rxGlm() objects (particularly the AICc, which isn't available by default).
I had understood that these were defined as follows:
let p = number of model parameters
let n = number of data points
AIC = deviance + 2p
AICc = AIC + (2p^2 + 2p)/(n-p-1)
BIC = deviance + 2p.log(n)
So I tried to replicate these numbers and compare them to the corresponding R function calls. It didn't work:
library(AICcmodavg) # for the AICc() function
data(mtcars)
glm_a1 <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
data = mtcars,
family = gaussian(link = "identity"),
trace = TRUE)
summary(glm_a1)
n <- nrow(glm_a1$data) # 32
p <- glm_a1$rank # 11
dev <- glm_a1$deviance# 147.49
my_AIC <- dev + 2 * p
my_AICc <- my_AIC + (2 * p^2 + 2 * p)/(n - p - 1)
my_BIC <- dev + 2 * p * log(n)
AIC(glm_a1) # 163.71
my_AIC # 169.49
AICc(glm_a1) # 180.13 (from AICcmodavg package)
my_AICc # 182.69
BIC(glm_a1) # 181.30
my_BIC # 223.74
By using debug(AIC) I can see that the calculation is different. It's based on 12 parameters (one extra for the estimated dispersion/scale parameter?). Also the log likelihood is obtained using logLik() which brings back a number -69.85, which suggests to me that the model deviance would be -2*-69.85 = 139.71 (which it isn't).
Does anyone know what I've done wrong please?
Thank you.
From the extractAIC manual page, the criterion used is AIC = -2 log L + k * edf, where:
L is the likelihood and edf the equivalent degrees of freedom (i.e., the number of parameters for usual parametric models) of fit.
For generalized linear models (i.e., for lm, aov, and glm), -2log L is the deviance, as computed by deviance(fit).
k = 2 corresponds to the traditional AIC, using k = log(n) provides the BIC (Bayes IC) instead.
Thus AIC = -2 log L + 2 * edf, and BIC = -2 log L + log(n) * edf.
Edits following discussion in the comments and input of #user20650
glm_a1$rank returns the number of fitted coefficients, without counting the variance that is also estimated for gaussian families.
?glm states
deviance: up to a constant, minus twice the maximized log-likelihood. Where sensible, the constant is chosen so that a saturated model has deviance zero.
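That is why deviance(glm_a1) and -2*logLik(glm_a1) differ: 147.49 - 139.71 = 7.78.
summary(glm_a1) returns the line "Dispersion parameter for gaussian family taken to be 7.023544". This is numerically close to that difference, but the closeness appears to be coincidental: for the gaussian family, -2 log L = n * (log(2 * pi * RSS / n) + 1), where RSS is the deviance, so the gap between the two is not the dispersion estimate.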
library(AICcmodavg)
data(mtcars)
glm_a1 <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
data = mtcars,
family = gaussian(link = "identity"),
trace = TRUE)
#> Deviance = 147.4944 Iterations - 1
#> Deviance = 147.4944 Iterations - 2
(loglik <- logLik(glm_a1))
#> 'log Lik.' -69.85491 (df=12)
# thus the degrees of freedom r uses are 12 instead of 11
n <- attributes(loglik)$nobs # following user20650 recommendation
p <- attributes(loglik)$df # following user20650 recommendation
dev <- -2*as.numeric(loglik)
my_AIC <- dev + 2 * p
my_AICc <- my_AIC + (2 * p^2 + 2 * p)/(n - p - 1)
my_BIC <- dev + p * log(n)
BIC(glm_a1)
#> [1] 181.2986
my_BIC
#> [1] 181.2986
AIC(glm_a1)
#> [1] 163.7098
my_AIC
#> [1] 163.7098
AICc(glm_a1)
#> [1] 180.1309
my_AICc
#> [1] 180.1309
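The link between the deviance and logLik() for the gaussian family can also be checked directly in base R. The sketch below assumes the variance is estimated by RSS / n (the maximum likelihood estimate), which is what logLik() uses for this family; the names fit, rss and neg2ll are illustrative.

```r
fit <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
           data = mtcars, family = gaussian(link = "identity"))

n   <- nobs(fit)
rss <- deviance(fit)      # for the gaussian family the deviance is the RSS
p   <- fit$rank + 1       # regression coefficients plus the variance

# -2 * maximized log likelihood with sigma^2 estimated by rss / n
neg2ll <- n * (log(2 * pi * rss / n) + 1)

c(manual = neg2ll, logLik = -2 * as.numeric(logLik(fit)))
c(manual_AIC = neg2ll + 2 * p, AIC = AIC(fit))
# AIC() with k = log(n) gives the BIC
c(manual_BIC = neg2ll + p * log(n), BIC = BIC(fit), AIC_k = AIC(fit, k = log(n)))
```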
Function to calculate these quantities for an rxGlm() object consistent with treatment of glm() (adjusting for the "up to a constant" difference in deviance):
wrc_information_criteria <- function(rx_glm) # an object created by rxGlm()
{
# add 1 to parameter count for cases where the GLM scale parameter needs to be estimated (notably Gamma/gaussian)
extra_parameter_flag <- dplyr::case_when( # case_when() comes from the dplyr package
rx_glm$family$family == "gaussian" ~ 1,
rx_glm$family$family == "Gamma" ~ 1,
rx_glm$family$family == "poisson" ~ 0,
rx_glm$family$family == "binomial" ~ 0,
TRUE ~ 999999999
)
n <- rx_glm$nValidObs
p <- rx_glm$rank + extra_parameter_flag
dev <- rx_glm$deviance
cat("\n")
cat("n :", n, "\n")
cat("p :", p, "\n")
cat("deviance:", dev, "\n")
AIC <- dev + 2 * p
AICc <- AIC + (2 * p^2 + 2 * p)/(n - p - 1)
BIC <- dev + p * log(n)
# make a constant adjustment to AIC/AICc/BIC to give consistency with R's built in AIC/BIC functions applied to glm objects
# can do this because rxGlm() supplies AIC already (consistent with R/glm()) - as long as computeAIC = TRUE in the function call
deviance_constant_adjustment <- rx_glm$aic[1] - AIC
AIC <- AIC + deviance_constant_adjustment
AICc <- AICc + deviance_constant_adjustment
BIC <- BIC + deviance_constant_adjustment
cat("\n")
cat("AIC: ", AIC , "\n")
cat("AICc:", AICc, "\n")
cat("BIC: ", BIC , "\n")
}
Let's test it...
data(mtcars)
glm_a1 <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
data = mtcars,
family = gaussian(link = "identity"),
trace = TRUE)
glm_b1 <- rxGlm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
data = mtcars,
family = gaussian(link = "identity"),
verbose = 1,
computeAIC = TRUE)
AIC(glm_a1)
AICc(glm_a1)
BIC(glm_a1)
wrc_information_criteria(glm_b1) # gives same results for glm_b1 as I got for glm_a1
I've been using map() to calculate and extract certain statistics from multiple lm() models.
To give a reproducible example, using the mtcars dataset, I start with an input vector of formulae to be estimated using lm() models:
library(tidyverse)
df <- mtcars
input_char <- c("mpg ~ disp",
"mpg ~ disp + hp")
input_formula <- map(input_char, formula)
I've then got a function that calculates and extracts the relevant statistics for each model. For simplicity and reproducibility, here's a simplified function that just extracts the R-squared of the model.
get_rsquared <- function(a_formula) {
model1 <- lm(a_formula, data = df)
rsquared <- summary(model1)$r.squared
c(model = a_formula, rsquared = rsquared)
}
I've then used map to iterate through the formulae and extract the R-squared from each model.
models <- map(input_formula, get_rsquared)
models
which gives the output:
[[1]]
[[1]]$model
mpg ~ disp
<environment: 0x7f98987f4000>
[[1]]$rsquared
[1] 0.7183433
[[2]]
[[2]]$model
mpg ~ disp + hp
<environment: 0x7f98987f4000>
[[2]]$rsquared
[1] 0.7482402
My question is regarding the output being a list.
Is there a simple way to make the output a dataframe?
My desired output is:
#> model rsquared
#> 1 mpg ~ disp 0.7183433
#> 2 mpg ~ disp + hp 0.7482402
Keep the formulas as character strings and call as.formula() inside the get_rsquared() function, since character strings are easier to work with than formula objects.
library(purrr)
library(dplyr)
df <- mtcars
input_char <- c("mpg ~ disp",
"mpg ~ disp + hp")
get_rsquared <- function(a_formula) {
model1 <- lm(as.formula(a_formula), data = df)
rsquared <- summary(model1)$r.squared
list(model = a_formula, rsquared = rsquared)
}
map_df(input_char, get_rsquared)
# A tibble: 2 x 2
model rsquared
<chr> <dbl>
1 mpg ~ disp 0.718
2 mpg ~ disp + hp 0.748
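If a plain data frame is preferred and loading purrr is not desired, a base R variant of the same idea is sketched below. The function name get_rsquared_df is hypothetical; each call returns a one-row data frame and the rows are stacked with rbind.

```r
df <- mtcars
input_char <- c("mpg ~ disp", "mpg ~ disp + hp")

# Return a one-row data frame per formula string
get_rsquared_df <- function(a_formula) {
  model1 <- lm(as.formula(a_formula), data = df)
  data.frame(model = a_formula,
             rsquared = summary(model1)$r.squared,
             stringsAsFactors = FALSE)
}

# Stack the one-row data frames into a single data frame
result <- do.call(rbind, lapply(input_char, get_rsquared_df))
result
```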
This question was asked in stackoverflow.com/q/38378118 but there was no satisfactory answer.
LASSO with λ = 0 is equivalent to ordinary least squares, but this does not seem to be the case for glmnet() and lm() in R. Why?
library(glmnet)
options(scipen = 999)
X = model.matrix(mpg ~ 0 + ., data = mtcars)
y = as.matrix(mtcars["mpg"])
coef(glmnet(X, y, lambda = 0))
lm(y ~ X)
Their regression coefficients agree to at most 2 significant figures, perhaps due to slightly different termination conditions in their optimization algorithms:
glmnet lm
(Intercept) 12.19850081 12.30337
cyl -0.09882217 -0.11144
disp 0.01307841 0.01334
hp -0.02142912 -0.02148
drat 0.79812453 0.78711
wt -3.68926778 -3.71530
qsec 0.81769993 0.82104
vs 0.32109677 0.31776
am 2.51824708 2.52023
gear 0.66755681 0.65541
carb -0.21040602 -0.19942
The difference is much worse when we add interaction terms.
X = model.matrix(mpg ~ 0 + . + . * disp, data = mtcars)
y = as.matrix(mtcars["mpg"])
coef(glmnet(X, y, lambda = 0))
lm(y ~ X)
Regression coefficients:
glmnet lm
(Intercept) 36.2518682237 139.9814651
cyl -11.9551206007 -26.0246050
disp -0.2871942149 -0.9463428
hp -0.1974440651 -0.2620506
drat -4.0209186383 -10.2504428
wt 1.3612184380 5.4853015
qsec 2.3549189212 1.7690334
vs -25.7384282290 -47.5193122
am -31.2845893123 -47.4801206
gear 21.1818220135 27.3869365
carb 4.3160891408 7.3669904
cyl:disp 0.0980253873 0.1907523
disp:hp 0.0006066105 0.0006556
disp:drat 0.0040336452 0.0321768
disp:wt -0.0074546428 -0.0228644
disp:qsec -0.0077317305 -0.0023756
disp:vs 0.2033046078 0.3636240
disp:am 0.2474491353 0.3762699
disp:gear -0.1361486900 -0.1963693
disp:carb -0.0156863933 -0.0188304
If you check out these two posts, you will get a sense as to why you are not getting the same results.
In essence, glmnet fits the model by penalized maximum likelihood along a regularization path, whereas lm solves the least squares problem using a QR decomposition. So the estimates will never be exactly the same.
However, note in the manual for ?glmnet under "lambda":
WARNING: use with care. Do not supply a single value for lambda (for
predictions after CV use predict() instead). Supply instead a
decreasing sequence of lambda values. glmnet relies on its warms
starts for speed, and its often faster to fit a whole path than
compute a single fit.
You can do (at least) three things to bring the coefficients close enough that the difference is trivial: (1) supply a range of values for lambda, (2) decrease the convergence threshold thresh, and (3) increase the maximum number of iterations maxit.
library(glmnet)
options(scipen = 999)
X = model.matrix(mpg ~ 0 + ., data = mtcars)
y = as.matrix(mtcars["mpg"])
lfit <- glmnet(X, y, lambda = rev(0:99), thresh = 1E-10)
lmfit <- lm(y ~ X)
coef(lfit, s = 0) - coef(lmfit)
11 x 1 Matrix of class "dgeMatrix"
1
(Intercept) 0.004293053125
cyl -0.000361655351
disp -0.000002631747
hp 0.000006447138
drat -0.000065394578
wt 0.000180943607
qsec -0.000079480187
vs -0.000462099248
am -0.000248796353
gear -0.000222035415
carb -0.000071164178
X = model.matrix(mpg ~ 0 + . + . * disp, data = mtcars)
y = as.matrix(mtcars["mpg"])
lfit <- glmnet(X, y, lambda = rev(0:99), thresh = 1E-12, maxit = 10^7)
lmfit <- glm(y ~ X)
coef(lfit, s = 0) - coef(lmfit)
20 x 1 Matrix of class "dgeMatrix"
1
(Intercept) -0.3174019115228
cyl 0.0414909318817
disp 0.0020032493403
hp 0.0001834076765
drat 0.0188376047769
wt -0.0120601219002
qsec 0.0019991131315
vs 0.0636756040430
am 0.0439343002375
gear -0.0161102501755
carb -0.0088921918062
cyl:disp -0.0002714213271
disp:hp -0.0000001211365
disp:drat -0.0000859742667
disp:wt 0.0000462418947
disp:qsec -0.0000175276420
disp:vs -0.0004657059892
disp:am -0.0003517289096
disp:gear 0.0001629963377
disp:carb 0.0000085312911
Some of the differences for the model with interactions are probably still non-trivial, but they are much smaller than before.
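For reference, the QR-based least squares solve mentioned above can be reproduced in base R. This is only a sketch of the idea, not a description of lm's internals beyond what is stated above; the names beta_qr and beta_lm are illustrative. Because this is a direct (non-iterative) solve, it has no convergence threshold, which is one way to see why an iterative fit at lambda = 0 only approaches it.

```r
X <- model.matrix(mpg ~ ., data = mtcars)   # design matrix including the intercept column
y <- mtcars$mpg

# Least squares via a QR decomposition of the design matrix
beta_qr <- qr.coef(qr(X), y)

# Compare with lm(); any differences should be at the level of rounding error
beta_lm <- coef(lm(mpg ~ ., data = mtcars))
max(abs(beta_qr - beta_lm))
```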
I am interested in calculating estimates and standard errors for linear combinations of coefficients after a linear regression in R. For example, suppose I have the regression and test:
data(mtcars)
library(multcomp)
lm1 <- lm(mpg ~ cyl + hp, data = mtcars)
summary(glht(lm1, linfct = 'cyl + hp = 0'))
This will estimate the value of the sum of the coefficients on cyl and hp, and provide the standard error based on the covariance matrix produced by lm.
But, suppose I want to cluster my standard errors, on a third variable:
data(mtcars)
library(multcomp)
library(lmtest)
library(multiwayvcov)
lm1 <- lm(mpg ~ cyl + hp, data = mtcars)
vcv <- cluster.vcov(lm1, cluster = mtcars$am)
ct1 <- coeftest(lm1,vcov. = vcv)
ct1 contains the SEs clustered by am. However, if I try to use the ct1 object in glht, I get an error saying
Error in modelparm.default(model, ...) :
no ‘coef’ method for ‘model’ found!
Any advice on how to do the linear hypothesis with the clustered variance covariance matrix?
Thanks!
glht(ct1, linfct = 'cyl + hp = 0') won't work, because ct1 is not a glht object and cannot be coerced to one via as.glht. I don't know whether there is a package or an existing function to do this, but it is not difficult to work out ourselves. The following small function does it:
LinearCombTest <- function (lmObject, vars, .vcov = NULL) {
## if `.vcov` missing, use the one returned by `lm`
if (is.null(.vcov)) .vcov <- vcov(lmObject)
## estimated coefficients
beta <- coef(lmObject)
## sum of `vars`
sumvars <- sum(beta[vars])
## get standard errors for sum of `vars`
se <- sum(.vcov[vars, vars]) ^ 0.5
## perform t-test on `sumvars`
tscore <- sumvars / se
pvalue <- 2 * pt(abs(tscore), lmObject$df.residual, lower.tail = FALSE)
## return a matrix
matrix(c(sumvars, se, tscore, pvalue), nrow = 1L,
dimnames = list(paste0(paste0(vars, collapse = " + "), " = 0"),
c("Estimate", "Std. Error", "t value", "Pr(>|t|)")))
}
Let's have a test:
data(mtcars)
lm1 <- lm(mpg ~ cyl + hp, data = mtcars)
library(multiwayvcov)
vcv <- cluster.vcov(lm1, cluster = mtcars$am)
If we leave .vcov unspecified in LinearCombTest, it gives the same result as multcomp::glht:
LinearCombTest(lm1, c("cyl","hp"))
# Estimate Std. Error t value Pr(>|t|)
#cyl + hp = 0 -2.283815 0.5634632 -4.053175 0.0003462092
library(multcomp)
summary(glht(lm1, linfct = 'cyl + hp = 0'))
#Linear Hypotheses:
# Estimate Std. Error t value Pr(>|t|)
#cyl + hp == 0 -2.2838 0.5635 -4.053 0.000346 ***
If we provide a covariance, it does what you want:
LinearCombTest(lm1, c("cyl","hp"), vcv)
# Estimate Std. Error t value Pr(>|t|)
#cyl + hp = 0 -2.283815 0.7594086 -3.00736 0.005399071
Remark
An upgraded version of LinearCombTest is given in "Get p-value for group mean difference without refitting linear model with a new reference level", where we can test any combination with combination coefficients alpha:
alpha[1] * vars[1] + alpha[2] * vars[2] + ... + alpha[k] * vars[k]
rather than just the sum
vars[1] + vars[2] + ... + vars[k]
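A minimal sketch of that weighted version is shown below. It is not the code from the linked answer; the function name linear_comb_alpha and its argument layout are assumptions made here for illustration. The standard error uses Var(alpha' beta) = alpha' V alpha, which reduces to the sum of the covariance block when every weight is 1.

```r
# Hypothetical helper: t-test for sum(alpha * beta[vars]) == 0
linear_comb_alpha <- function(lmObject, vars, alpha = rep(1, length(vars)),
                              .vcov = vcov(lmObject)) {
  beta <- coef(lmObject)[vars]
  V <- .vcov[vars, vars, drop = FALSE]
  est <- sum(alpha * beta)
  se <- sqrt(drop(t(alpha) %*% V %*% alpha))   # Var(a'b) = a' V a
  tscore <- est / se
  pvalue <- 2 * pt(abs(tscore), df.residual(lmObject), lower.tail = FALSE)
  c(Estimate = est, `Std. Error` = se, `t value` = tscore, `Pr(>|t|)` = pvalue)
}

lm1 <- lm(mpg ~ cyl + hp, data = mtcars)
linear_comb_alpha(lm1, c("cyl", "hp"))              # plain sum, as in LinearCombTest
linear_comb_alpha(lm1, c("cyl", "hp"), c(1, -1))    # difference cyl - hp
```

A clustered covariance matrix such as vcv can be passed through the .vcov argument in the same way as for LinearCombTest.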