linearHypothesis equivalent for ols command (rms package) in R

I am trying to use the "linearHypothesis" function from the "car" package to test coefficients of a model estimated with "ols" from the "rms" package. The function works with "lrm" objects but not with "ols" objects. Does anyone have an alternative? I know that using "lm" would sort the issue, but I want to use "ols" since it is easier to get clustered standard errors there.

You can use glht from the multcomp package.
library(rms)
library(multcomp)
d <- datadist(swiss); options(datadist = "d")   # rms needs a datadist for summary() etc.
fit <- ols(Fertility ~ ., data = swiss, x = TRUE)
summary(fit)
# test the single restriction Agriculture = 0
test <- glht(fit, linfct = "Agriculture = 0")
summary(test)
# Fit: ols(formula = Fertility ~ ., data = swiss, x = TRUE)
#
# Linear Hypotheses:
# Estimate Std. Error z value Pr(>|z|)
# Agriculture == 0 -0.1721 0.0703 -2.448 0.0144 *
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
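If you need a joint test of several coefficients (closer to what linearHypothesis(fit, c(...)) does), glht also accepts a vector of restrictions, and multcomp's Chisqtest() gives the corresponding global Wald test. A minimal sketch with the same fit:
# joint Wald test of two restrictions on the same ols fit
jt <- glht(fit, linfct = c("Agriculture = 0", "Examination = 0"))
summary(jt, test = Chisqtest())   # global chi-square test of both restrictions
summary(jt)                       # individual tests, multiplicity-adjusted by default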

Related

Difference between glm with sandwich package and glmrob for Poisson distribution

I'm having some doubts with glmrob (package robustbase). I want to use glmrob to get the same results I was getting with glm + sandwich.
I was writing:
library(sandwich)
library(lmtest)
p_3 <- glm(formula = var1 ~ var2,
           family = poisson(link = log),
           data = p3,
           na.action = na.omit)
coeftest(p_3, vcov = sandwich)
Both variables are categorical. var1 has two categories and var2 has four.
Now I'm trying to use glmrob to get everything in the same step:
library(robustbase)
p_2 <- glmrob(formula = var1 ~ var2,
              family = poisson(link = log),
              data = p3,
              na.action = na.omit,
              method = "Mqle",
              control = glmrobMqle.control(tcc = 1.2))
summary(p_2) and summary(p_3) don't yield the same results, so I think I need to change these two lines: method = "Mqle", control = glmrobMqle.control(tcc = 1.2), but I don't really know how.
Maybe I have to use method = "MT", since it works for Poisson models, but I'm not sure.
The outputs:
With glmrob:
summary(p_2)
>Call: glmrob(formula = dummy28_n ~ coexistencia, family = poisson(link = log), data = p3, na.action = na.omit, method = "Mqle", control = glmrobMqle.control(tcc = 1.2))
>Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.7020 0.5947 -2.862 0.00421 **
coexistenciaPobreza energética 1.1627 0.9176 1.267 0.20514
coexistenciaInseguridad residencial 0.7671 0.6930 1.107 0.26830
coexistenciaCoexistencia de inseguridades 1.3688 0.6087 2.249 0.02453 *
---
>Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Robustness weights w.r * w.x:
143 weights are ~= 1. The remaining 3 ones are
71 124 145
0.6266 0.6266 0.6266
>Number of observations: 146
Fitted by method ‘Mqle’ (in 8 iterations)
>(Dispersion parameter for poisson family taken to be 1)
>No deviance values available
Algorithmic parameters:
acc tcc
0.0001 1.2000
maxit
50
test.acc
"coef"
with glm and sandwich:
coef <- coeftest(p_3, vcov = sandwich)
coef
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.79176 0.52705 -3.3996 0.0006748 ***
coexistenciaPobreza energética 1.09861 0.72648 1.5122 0.1304744
coexistenciaInseguridad residencial 0.69315 0.60093 1.1535 0.2487189
coexistenciaCoexistencia de inseguridades 1.32972 0.53259 2.4967 0.0125349 *
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
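As a self-contained way to explore this comparison without the original p3 data, here is a minimal sketch on the built-in warpbreaks data (formula and variable names are stand-ins only). One point worth keeping in mind: glm + sandwich keeps the maximum-likelihood estimates and only replaces the standard errors with heteroskedasticity-consistent ones, while glmrob with method = "Mqle" downweights outlying observations and therefore changes the estimates themselves, so exact agreement between the two outputs is not expected.
library(robustbase)
library(sandwich)
library(lmtest)
# ML Poisson fit with sandwich (robust) standard errors
m_glm <- glm(breaks ~ tension, family = poisson(link = "log"), data = warpbreaks)
coeftest(m_glm, vcov = sandwich)
# robust quasi-likelihood fit; tcc is the tuning constant controlling the downweighting
m_rob <- glmrob(breaks ~ tension, family = poisson(link = "log"), data = warpbreaks,
                method = "Mqle", control = glmrobMqle.control(tcc = 1.2))
summary(m_rob)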

R plm vs fixest package - different results?

I'm trying to understand why R packages "plm" and "fixest" give me different standard errors when I'm estimating a panel model using heteroscedasticity-robust standard errors ("HC1") and state fixed effects.
Does anyone have a hint for me?
Here is the code:
library(AER) # For the Fatality Dataset
library(plm) # PLM
library(fixest) # Fixest
library(tidyverse) # Data Management
data("Fatalities")
# Create new variable : fatality rate
Fatalities <- Fatalities %>%
mutate(fatality_rate = (fatal/pop)*10000)
# Estimate Fixed Effects model using the plm package
plm_reg <- plm(fatality_rate ~ beertax,
data = Fatalities,
index = c("state", "year"),
effect = "individual")
# Print Table with adjusted standard errors
coeftest(plm_reg, vcov. = vcovHC, type = "HC1")
# Output
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
beertax -0.65587 0.28880 -2.271 0.02388 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Estimate the very same model using the fixest package
# fixest is much faster and user friendly (in my opinion)
fixest_reg <- feols(fatality_rate ~ beertax | state ,
data = Fatalities,
vcov = "HC1",
panel.id = ~ state + year)
# print table
etable(fixest_reg)
#output
> fixest_reg
Dependent Var.: fatality_rate
beertax -0.6559** (0.2033)
Fixed-Effects: ------------------
state Yes
_______________ __________________
S.E. type Heteroskedas.-rob.
Observations 336
R2 0.90501
Within R2 0.04075
In this example, the standard error is larger when using plm than in the fixest results (the same is true if state + year fixed effects are used). Does anyone know why this happens?
Actually the VCOVs are different.
In plm, vcovHC defaults to Arellano (1987), which also takes serial correlation into account. See the documentation here.
If you add the argument method = "white1", you end up with the same type of VCOV.
Finally, you also need to change how the fixed effects are accounted for in fixest to obtain the same standard errors (see details on the small-sample correction here).
Here are the results:
# Requesting "White" VCOV
coeftest(plm_reg, vcov. = vcovHC, type = "HC1", method = "white1")
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> beertax -0.65587 0.18815 -3.4858 0.0005673 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Changing the small sample correction in fixest (discarding the fixed-effects)
etable(fixest_reg, vcov = list("hc1", hc1 ~ ssc(fixef.K = "none")), fitstat = NA)
#> fixest_reg fixest_reg
#> Dependent Var.: fatality_rate fatality_rate
#>
#> beertax -0.6559** (0.2033) -0.6559*** (0.1882)
#> Fixed-Effects: ------------------ -------------------
#> state Yes Yes
#> _______________ __________________ ___________________
#> S.E. type Heteroskedas.-rob. Heteroskedast.-rob.
# Final comparison
rbind(se(vcovHC(plm_reg, type = "HC1", method = "white1")),
se(fixest_reg, hc1 ~ ssc(fixef.K = "none")))
#> beertax
#> [1,] 0.1881536
#> [2,] 0.1881536
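Conversely, if you want to keep plm's default Arellano-type (cluster-robust) VCOV, the fixest side can be asked for clustered standard errors instead; exact agreement again depends on the small-sample correction settings, so treat this as a sketch:
# cluster-robust comparison, clustering by state on both sides
coeftest(plm_reg, vcov. = vcovHC, type = "HC1")   # Arellano (plm default), clustered by state
summary(fixest_reg, vcov = ~ state)               # fixest, clustered by state
# the numbers will be close but not necessarily identical, because the two
# packages use different small-sample corrections by default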

GAM with binomial distribution and with spatial autocorrelation in R

I am using gam (from the mgcv package in R) to model presence/absence data in 3355 cells of 1x1 km (151 presences and 3204 absences). Even though I include a smooth of the spatial locations in the model to address the spatial dependence in my data, a variogram shows spatial autocorrelation in the residuals of my gam (range = 6000 metres). Since I am modelling a binary response, using gamm with a correlation structure is not advisable because it "performs poorly with binary data", nor is gamm4, because (although it is supposed to be appropriate for binary data) it has "no facility for nlme style correlation structures".
The alternative I have found is to fit my model using the function magic from the same mgcv package. Because I found no examples of how to use magic for spatially correlated data, I have adapted the ?magic example for temporally correlated data. The output (see below) shows that the model coefficients change, but the spatial autocorrelation is not removed and the smooth plots show the same effect.
> summary(data)
Object of class SpatialPolygonsDataFrame
Coordinates:
min max
x 670000 780000
y 140000 234000
Is projected: TRUE
proj4string :
[+proj=tmerc +lat_0=0 +lon_0=19 +k=0.9993 +x_0=500000 +y_0=-5300000 +datum=WGS84 +units=m
+no_defs +ellps=WGS84 +towgs84=0,0,0]
Data attributes:
f_edge lat long dam
Min. : 0.0 Min. :670500 Min. :140500 0:3204
1st Qu.: 0.0 1st Qu.:722500 1st Qu.:165500 1: 151
Median : 8.0 Median :743500 Median :181500
Mean :11.5 Mean :736992 Mean :180949
3rd Qu.:21.0 3rd Qu.:756500 3rd Qu.:194500
Max. :50.0 Max. :779500 Max. :233500
> gam1x1 <- gam(dam ~ s(lat, long) + s(f_edge),
+ family = binomial, data=data,
+ method = "ML", select = TRUE, na.action = "na.fail")
> # generate standardized residuals
> res <- residuals(gam1x1, type = "pearson")
> # calculate the sample variogram and fit the variogram model
> var <- variogram(res ~ long + lat, data=data)
> fit <- fit.variogram(var, vgm(c("Sph", "Exp"))); plot(var, fit)
> # extract the nugget and range from the variogram model to create the correlation structure
> # to include in the magic formula
> fit
model psill range
1 Nug 0.5324180 0.000
2 Sph 0.2935823 6002.253
> cs1Exp <- corSpher(c(6002.253, 0.5324180), form = ~ long + lat, nugget=TRUE)
> cs1Exp <- Initialize(cs1Exp, data)
> # follow instructions from mgcv 'magic' function
> V <- corMatrix(cs1Exp)
> Cv <- chol(V)
> ## gam ignoring correlation
> b <- gam(dam ~ s(lat, long) +
+ s(f_edge),
+ family = binomial,
+ data=data,
+ method = "ML",
+ select = TRUE,
+ na.action = "na.fail")
> ## Fit smooth, taking account of *known* correlation...
> w <- solve(t(Cv)) # V^{-1} = w'w
> ## Use `gam' to set up model for fitting...
> G <- gam(dam ~ s(lat, long) +
+ s(f_edge),
+ family = binomial,
+ data=data,
+ method = "ML",
+ select = TRUE,
+ na.action = "na.fail",
+ fit=FALSE)
> ## fit using magic, with weight *matrix*
> mgfit <- magic(G$y,G$X,G$sp,G$S,G$off,rank=G$rank,C=G$C,w=w)
> mg.stuff <- magic.post.proc(G$X,mgfit,w)
> b$edf <- mg.stuff$edf;b$Vp <- mg.stuff$Vb
> b$coefficients <- mgfit$b
> summary(gam1x1)
Family: binomial
Link function: logit
Formula:
dam ~ s(lat, long) + s(f_edge)
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.6634 0.1338 -27.37 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(lat,long) 15.208 29 83.35 2.70e-14 ***
s(f_edge) 3.176 9 54.05 2.85e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.0595 Deviance explained = 15.6%
-ML = 551.92 Scale est. = 1 n = 3355
> summary(b)
Family: binomial
Link function: logit
Formula:
dam ~ s(lat, long) + s(f_edge)
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
[1,] 0.06537 0.01131 5.778 7.56e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(lat,long) 3.716 29 0.046 1
s(f_edge) 6.753 9 0.295 1
R-sq.(adj) = 0.0618 Deviance explained = 15.6%
-ML = 551.92 Scale est. = 1 n = 3355
plot(b, select=2)
plot(gam1x1, select =2)
> resMag <- residuals.gam(b, type = "pearson")
> varMag <- variogram(resMag ~ long + lat, data=data)
> fitMag <- fit.variogram(varMag, vgm(c("Sph", "Exp"))); plot(varMag, fitMag)
> fitMag
model psill range
1 Nug 0.5324180 0.000
2 Sph 0.2935823 6002.253
Is something wrong in my script? Is there any other alternative to remove the residuals' spatial autocorrelation from my binomial gam?
Thanks a lot!
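One alternative worth sketching (under the assumption that the data layout is as summarised above) is to keep everything inside gam() and give the spatial smooth a Gaussian-process basis whose correlation range is taken from the variogram; in mgcv, bs = "gp" with m = c(3, 6000) requests a Matérn-type correlation with a range of about 6000 m. This may or may not remove the residual autocorrelation in practice:
library(mgcv)
library(gstat)   # for variogram(), as used above
# binomial GAM with a Gaussian-process spatial smooth;
# m = c(3, 6000): Matern-type correlation with range ~ 6000 m (taken from the variogram fit)
gam_gp <- gam(dam ~ s(lat, long, bs = "gp", m = c(3, 6000)) + s(f_edge),
              family = binomial, data = data, method = "ML", select = TRUE)
summary(gam_gp)
# re-check the residual variogram as before
resGP <- residuals(gam_gp, type = "pearson")
plot(variogram(resGP ~ long + lat, data = data))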

Contrasts for geeglm (binomial(link="logit")) using glht (multcomp package)

I'm using the multcomp package to generate contrasts for a geeglm (binomial(link = "logit")) model in R. I fit the geeglm model with the following script:
library(geepack)
u1 <- geeglm(outcome ~ px_race_jama, id = npi_gp, family = binomial(link = "logit"), data = mf)
summary(u1)
Call:
geeglm(formula = outcome ~ px_race_jama, family = binomial(link = "logit"),
data = mf, id = npi_gp)
Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) -0.4671 0.1541 9.19 0.0024 **
px_race_jama1 0.0959 0.1155 0.69 0.4067
px_race_jama2 -0.0293 0.1503 0.04 0.8453
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Estimated Scale Parameters:
Estimate Std.err
(Intercept) 1 0.0506
Correlation: Structure = independence
Number of clusters: 83   Maximum cluster size: 792
To get the contrasts for the model, I run:
library(multcomp)
glht(u1, mcp(px_race_jama = "Tukey"))
I receive the error:
Error in match.arg(type) :
'arg' should be one of “pearson”, “working”, “response”
Error in modelparm.default(model, ...) :
no ‘vcov’ method for ‘model’ found!
Alternatively, I have tried creating a contrast matrix:
contrast.matrix <- rbind(
`Other-Black` = c(0, -1, 1))
comps <- glht(u1, contrast.matrix)
summary(comps)
However, I receive the same error. Any help on how to correctly generate the contrasts would be greatly appreciated.
Respectfully,
Jdukes
Something like this?
contrast.matrix <- matrix(c(0, -1, 1,
                            0, 1, -1), nrow = 2, byrow = TRUE)
library(dplyr)  # for %>% and mutate()
contrasts_geeglm <- function(fit, model_matrix, vcov_type = "robust") {
  # robust (sandwich) vs. naive model-based covariance from the geese component
  vcov_gee <- if (vcov_type == "robust") fit$geese$vbeta else fit$geese$vbeta.naiv
  contrast_est <- coef(fit) %*% t(model_matrix)
  contrast_se  <- sqrt(diag(model_matrix %*% vcov_gee %*% t(model_matrix)))
  data.frame(Estimate = contrast_est[1, ],
             SE = contrast_se) %>%
    mutate(LCI = Estimate - 1.96 * SE,
           UCI = Estimate + 1.96 * SE)
}
contrasts_geeglm(u1, contrast.matrix, vcov_type = "robust")
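As a usage note, since the model uses a logit link, the contrast estimates and confidence limits from contrasts_geeglm() can be exponentiated to read them on the odds-ratio scale:
res <- contrasts_geeglm(u1, contrast.matrix, vcov_type = "robust")
exp(res[, c("Estimate", "LCI", "UCI")])   # odds ratios with 95% confidence limits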

Error comparing linear mixed effects models

I want to see whether the fixed effect Group2 in my model is significant. The model is:
Response ~ Group1 + Group2 + Gender + Age + BMI + (1 | Subject)
To check the significance I create a null model not containing the effect Group2:
Resp.null = lmer(Response~Group1+Gender+Age+BMI+(1|Subject),
data=mydata,REML=FALSE)
and the full model containing the effect Group2:
Resp.model = lmer(Response~Group1+Group2+Gender+Age+BMI+(1|Subject),
data=mydata,REML=FALSE)
Then I use anova() to compare the two, but I get an error:
anova(Resp.null, Resp.model)
## Error in anova.merMod(Resp.null, Resp.model) :
## models were not all fitted to the same size of dataset
I think that the problem is that Group1 contains NaN, but I thought that linear mixed models were robust to missing data.
How can I solve this problem and compare the two models?
Do I have to delete the rows corresponding to NaN and fit Resp.null without these rows?
The data can be downloaded here.
Please note that you should replace "<undefined>" with NA, like this:
mydata = read.csv("mydata.csv")
mydata[mydata == "<undefined>"] <- NA
To avoid the "models were not all fitted to the same size of dataset" error in anova, you must fit both models on the exact same subset of data.
There are two simple ways to do this, and while this reproducible example uses lm and update, for lmer objects the same approach should work:
# 1st approach
# define a convenience wrapper
update_nested <- function(object, formula., ..., evaluate = TRUE) {
  update(object = object, formula. = formula., data = object$model, ..., evaluate = evaluate)
}
# prepare data with NAs
data(mtcars)
for(i in 1:ncol(mtcars)) mtcars[i,i] <- NA
xa <- lm(mpg~cyl+disp, mtcars)
xb <- update_nested(xa, .~.-cyl)
anova(xa, xb)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 256.91
## 2 27 301.32 -1 -44.411 4.4945 0.04371 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 2nd approach
xc <- update(xa, .~.-cyl, data=na.omit(mtcars[ , all.vars(formula(xa))]))
anova(xa, xc)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 256.91
## 2 27 301.32 -1 -44.411 4.4945 0.04371 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
If, however, you're only interested in testing a single variable (e.g. Group2), then perhaps Anova() or linearHypothesis() from the car package would work as well for this use case.
See also:
How to update `lm` or `glm` model on same subset of data?
R error which says "Models were not all fitted to the same size of dataset"
Fit Resp.model first, then use Resp.model@frame as the data argument:
Resp.null = lmer(Response~Group1+Gender+Age+BMI+(1|Subject),
                 data=Resp.model@frame, REML=FALSE)
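Putting that together for the lmer models themselves, a sketch (assuming mydata has been read in and recoded as shown in the question):
library(lme4)
Resp.model <- lmer(Response ~ Group1 + Group2 + Gender + Age + BMI + (1 | Subject),
                   data = mydata, REML = FALSE)
# refit the null model on exactly the rows used by the full model
Resp.null <- update(Resp.model, . ~ . - Group2, data = Resp.model@frame)
anova(Resp.null, Resp.model)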
