I have fitted a binomial logistic GLM with a three-way interaction between sex (male and female), tree cover including a quadratic term (1-100%), and the mean tree cover of an area (1-100%).
(case is 1 for used and 0 for not used.)
glm.winter.3 <- glm(case ~
sex. * mean95 * poly(tree.cover,2),
data = rsf.winter.3, family = binomial (link = "logit"))
I found a nice plot in a paper and would like to make something similar, but I cannot find a way to approach it.
My data set is large, so it is hard to share. Maybe somebody has an idea how to approach it anyway? Thanks
Here's an example with some built-in data. Using the mtcars data, I made a binary variable identifying gas guzzlers, gg, which is 1 for cars that get less than 19 MPG and 0 for all others. I then modelled it as a multiplicative (fully interacted) function of wt (weight), hp (horsepower) and vs (engine shape: 0 = V-shaped, 1 = straight). Here are the results:
library(lattice)
library(dplyr)
data(mtcars)
mtcars <- mtcars %>%
mutate(fast = as.numeric(qsec < mean(qsec)),
gg = as.numeric(mpg< 19),
carb2 = ifelse(carb <= 2, 0, 1))
mod <- glm(gg ~ hp* wt * vs, data=mtcars, family=binomial)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(mod)
#>
#> Call:
#> glm(formula = gg ~ hp * wt * vs, family = binomial, data = mtcars)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -2.24867 -0.00002 0.00000 0.28763 1.17741
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -3.299e+01 4.336e+01 -0.761 0.447
#> hp 1.346e-01 2.341e-01 0.575 0.565
#> wt 8.744e+00 1.271e+01 0.688 0.491
#> vs -1.447e+03 5.600e+05 -0.003 0.998
#> hp:wt -3.231e-02 6.798e-02 -0.475 0.635
#> hp:vs 9.144e+00 6.074e+03 0.002 0.999
#> wt:vs 4.512e+02 1.701e+05 0.003 0.998
#> hp:wt:vs -2.906e+00 1.815e+03 -0.002 0.999
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 44.236 on 31 degrees of freedom
#> Residual deviance: 11.506 on 24 degrees of freedom
#> AIC: 27.506
#>
#> Number of Fisher Scoring iterations: 22
The next step is to make sequences of the two continuous variables that run from their minima to their maxima across some number of values; 25 is usually enough to give a smooth-looking graph.
hp_seq <- seq(min(mtcars$hp, na.rm=TRUE),
max(mtcars$hp, na.rm=TRUE),
length=25)
wt_seq <- seq(min(mtcars$wt, na.rm=TRUE),
max(mtcars$wt, na.rm=TRUE),
length=25)
Next, we make all combinations of the two sequences of values and the binary variable (in this case vs).
eg <- expand.grid(hp = hp_seq,
wt = wt_seq,
vs = c(0,1))
We can then make predictions from those data and save them back into the data object. We also make vs a factor - the levels of the factor will show up in the strip above each plot.
eg$fit <- predict(mod, newdata=eg, type="response")
eg$vs <- factor(eg$vs, labels=c("VS = 0", "VS = 1"))
We use wireframe() from the lattice package to make the plots:
wireframe(fit ~ hp + wt | vs, data=eg, drape=TRUE,
default.scales = list(arrows=FALSE))
Created on 2022-05-06 by the reprex package (v2.0.1)
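The same recipe should carry over to the model in the question. Below is a minimal sketch, assuming a data frame with hypothetical columns case, sex, mean95 and tree.cover; the data are simulated purely for illustration and the coefficients used to simulate them are arbitrary. Because poly() is called inside the formula, predict() rebuilds the orthogonal-polynomial basis from the training data, so newdata only needs the raw tree.cover values.

```r
library(lattice)

# Simulated stand-in for rsf.winter.3 (hypothetical data, illustration only)
set.seed(42)
n <- 2000
d <- data.frame(
  sex        = factor(sample(c("female", "male"), n, replace = TRUE)),
  mean95     = runif(n, 1, 100),
  tree.cover = runif(n, 1, 100)
)
eta <- -1 + 0.04 * d$tree.cover - 0.0004 * d$tree.cover^2 + 0.01 * d$mean95
d$case <- rbinom(n, size = 1, prob = plogis(eta))

# Same structure as glm.winter.3: three-way interaction with a quadratic term
fit <- glm(case ~ sex * mean95 * poly(tree.cover, 2),
           data = d, family = binomial(link = "logit"))

# Grid over both continuous variables, crossed with both sexes
grid <- expand.grid(
  tree.cover = seq(1, 100, length.out = 25),
  mean95     = seq(1, 100, length.out = 25),
  sex        = levels(d$sex)
)
# predict() re-uses the poly() coefficients stored when the model was fitted
grid$fit <- predict(fit, newdata = grid, type = "response")

# One draped surface per sex, as in the mtcars example
p <- wireframe(fit ~ tree.cover + mean95 | sex, data = grid, drape = TRUE,
               scales = list(arrows = FALSE),
               xlab = "Tree cover (%)", ylab = "Mean tree cover (%)",
               zlab = "P(used)")
print(p)
```

With the real data, replace d with rsf.winter.3 and fit with glm.winter.3; the grid limits can then come from min() and max() of the observed variables.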
Note that there is another answer about making 3D surface plots that demonstrates persp() from base R, plot_ly() from the plotly package and persp3D() from the plot3D package.
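For a single panel without lattice, base R's persp() takes the two grid sequences plus a matrix of heights. A sketch, assuming the same mtcars model as above (refitted here so the block runs on its own); the viewing angles theta and phi are arbitrary choices.

```r
# Refit the gas-guzzler model so this block is self-contained
cars <- mtcars
cars$gg <- as.numeric(cars$mpg < 19)
mod <- suppressWarnings(glm(gg ~ hp * wt * vs, data = cars, family = binomial))

hp_seq <- seq(min(cars$hp), max(cars$hp), length.out = 25)
wt_seq <- seq(min(cars$wt), max(cars$wt), length.out = 25)

# expand.grid() varies hp fastest, so filling a matrix column-wise puts
# hp along the rows and wt along the columns, which is what persp() expects
grid0 <- expand.grid(hp = hp_seq, wt = wt_seq, vs = 0)
z <- matrix(predict(mod, newdata = grid0, type = "response"),
            nrow = length(hp_seq), ncol = length(wt_seq))

persp(hp_seq, wt_seq, z, theta = 40, phi = 25, col = "lightblue",
      xlab = "hp", ylab = "wt", zlab = "P(gg = 1)",
      ticktype = "detailed", main = "VS = 0")
```

Setting vs = 1 in grid0 gives the second panel.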
I have the following multinomial logit regression
FLAG ~ Cash + Debt + Other variables
where FLAG is a categorical variable with three levels that distinguishes three types of firms, and Cash and Debt are firm-level variables.
The results of the regression are the following
Variable   Type 0 vs Type 1   Type 0 vs Type 2
Cash       0.543***           0.321***
Debt       -0.124***          0.452***
I now want to test whether the difference between these coefficients is statistically significant (here, between 0.543 and 0.321, and between -0.124 and 0.452).
I know I have to use a Wald test (it was used in several papers I looked at that performed the same analysis), but I don't know how to implement it in R.
What's the R code to implement a Wald test of the difference between the coefficients of the model?
Here's an example using the Chile data from the carData package.
data(Chile, package="carData")
mod <- nnet::multinom(vote ~ sex + I(income/10000), data=Chile)
#> # weights: 16 (9 variable)
#> initial value 3395.034890
#> iter 10 value 3105.730891
#> final value 3043.519706
#> converged
summary(mod)
#> Call:
#> nnet::multinom(formula = vote ~ sex + I(income/10000), data = Chile)
#>
#> Coefficients:
#> (Intercept) sexM I(income/10000)
#> N 1.212092 0.5676465119 0.02101608
#> U 1.430206 -0.2235933346 -0.06761914
#> Y 1.477171 -0.0003631522 0.02018734
#>
#> Std. Errors:
#> (Intercept) sexM I(income/10000)
#> N 0.1326298 0.1655421 0.02108768
#> U 0.1352691 0.1738994 0.02478598
#> Y 0.1300724 0.1657089 0.02108419
#>
#> Residual Deviance: 6087.039
#> AIC: 6105.039
Let's say that in the model above, you wanted to compare the coefficients for sexM for the categories N and U. You could save the variance-covariance matrix of the estimators in v and the coefficients themselves in a vector called b.
v <- vcov(mod)
b <- c(t(coef(mod)))
names(b) <- rownames(v)
Then, you could create the relevant z-statistic by dividing the difference in coefficients by its standard error (the square root of the sum of the two variances minus two times the covariance of the relevant parameters):
z_wt <- (b["N:sexM"] - b["U:sexM"])/sqrt(v["N:sexM", "N:sexM"] + v["U:sexM", "U:sexM"] - 2*v["N:sexM", "U:sexM"])
Then, you could use pnorm() to calculate the p-value.
2*pnorm(abs(z_wt), lower.tail=FALSE)
#> N:sexM
#> 1.255402e-12
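The same calculation generalises to any linear combination of coefficients: with a contrast vector L, the estimate is L'b and its variance is L'VL. A minimal base-R sketch, where wald_contrast() is a hypothetical helper (not from any package); for the sexM comparison above, L would be +1 at "N:sexM", -1 at "U:sexM" and 0 elsewhere.

```r
# Wald z-test for H0: sum(L * b) = 0, given coefficients b and their vcov V.
# wald_contrast() is a hypothetical helper written for this example.
wald_contrast <- function(b, V, L) {
  stopifnot(length(b) == length(L), all(dim(V) == length(b)))
  est <- sum(L * b)                      # L'b
  se  <- sqrt(drop(t(L) %*% V %*% L))    # sqrt(L'VL)
  z   <- est / se
  c(estimate = est, se = se, z = z,
    p.value = 2 * pnorm(abs(z), lower.tail = FALSE))
}

# Toy check: two independent coefficients with unit variance
b_toy <- c(1, 0)
V_toy <- diag(2)
res <- wald_contrast(b_toy, V_toy, L = c(1, -1))
print(res)

# Usage with the multinom fit above (needs b and v from that answer):
# L <- setNames(rep(0, length(b)), names(b))
# L[c("N:sexM", "U:sexM")] <- c(1, -1)
# wald_contrast(b, v, L)
```

Writing the contrast as a vector makes it easy to test, for example, the Cash coefficient across the two equations in the question by placing +1 and -1 on the two relevant entries.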
Another option would be to just change the reference category. For example, if I wanted the U-N comparison, I could change the reference to either U or N and re-estimate the model:
Chile$vote2 <- relevel(Chile$vote, "U")
mod2 <- nnet::multinom(vote2 ~ sex + I(income/10000), data=Chile, maxit=1000)
#> # weights: 16 (9 variable)
#> initial value 3395.034890
#> iter 10 value 3081.860210
#> final value 3043.519705
#> converged
z_wt2 <- coef(mod2)["N", "sexM"]/summary(mod2)$standard.errors["N", "sexM"]
2*pnorm(abs(z_wt2), lower.tail=FALSE)
#> [1] 1.256475e-12
Note that the two p-values are very close (though not identical), as are the differences in the coefficients from the two methods (not printed).
I wrote a package called factorplot which will calculate all of the pairwise differences for a single variable's coefficients in the MNL model:
library(factorplot)
f <- factorplot(mod, variable="sexM")
print(f)
#> Difference SE p.val
#> A - N -0.568 0.166 0.001
#> A - U 0.224 0.174 0.199
#> N - U 0.791 0.111 0.000
#> A - Y 0.000 0.166 0.998
#> N - Y 0.568 0.098 0.000
#> U - Y -0.223 0.112 0.046
Created on 2022-09-29 by the reprex package (v2.0.1)
This method is a bit easier and is based on the same calculations I proposed in the first method above.
Suppose I have a linear mixed model with two continuous variables, and I use contrast coding for two factors, each with two categories (A, B). A random effect is optional.
library(nlme)
contrasts(data$fac1) <- c(-.5, .5)
contrasts(data$fac2) <- c(-.5, .5)
model <- lme(Y ~ x1 + x2 + x1:fac1 + x2:fac1 + x1:fac2 + x2:fac2 + fac1 + fac2 + fac1:fac2, random = ~1|group, data = data)
The output then gives me the main effects of x1 and x2 and the differences between slopes for fac1 and fac2.
But how can I calculate individual p-values for, say, the slope of x1 when fac1 == "A" and fac2 == "B"?
Is there an R package for this, or do I have to calculate them manually?
And if so, how? By calling vcov(), adding up the respective matrix entries, and calling pt()? Which degrees of freedom should I use?
Thanks!
You could try the marginaleffects package. (Disclaimer: I am the author.)
There are many vignettes on the website, including one with simple examples of mixed effects models with the lme4 package: https://vincentarelbundock.github.io/marginaleffects/articles/lme4.html
You can specify the values of covariates using the newdata argument and the datagrid function. The covariates you do not specify in datagrid will be held at their means or modes:
library(lme4)
library(marginaleffects)
mod <- glmer(am ~ mpg * hp + (1 | gear),
data = mtcars,
family = binomial)
marginaleffects(mod, newdata = datagrid(hp = c(100, 110), gear = 4))
#> rowid type term dydx std.error statistic p.value
#> 1 1 response mpg 0.077446700 0.33253683 0.2328966 0.8158417
#> 2 2 response mpg 0.337725702 0.90506056 0.3731526 0.7090349
#> 3 1 response hp 0.006199167 0.02647471 0.2341543 0.8148652
#> 4 2 response hp 0.025604198 0.06770870 0.3781522 0.7053175
#> conf.low conf.high mpg hp gear
#> 1 -0.57431351 0.72920691 20.09062 100 4
#> 2 -1.43616041 2.11161181 20.09062 110 4
#> 3 -0.04569032 0.05808865 20.09062 100 4
#> 4 -0.10710242 0.15831082 20.09062 110 4
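If you prefer to do it by hand, as the question suggests, the slope of x1 in a given cell is a linear combination of coefficients, so its standard error comes from vcov(). A sketch with simulated data and lm() (so the residual degrees of freedom are unambiguous); the names x1, x2, c1, c2 and Y are hypothetical, and the numeric ±0.5 codes reproduce the contrasts(...) <- c(-.5, .5) coding with A = -0.5 and B = +0.5. For an lme() fit the same linear combination applies, but the choice of degrees of freedom is less clear-cut; summary(model)$tTable in nlme reports a DF column per coefficient, which is one option.

```r
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                fac1 = factor(sample(c("A", "B"), n, TRUE)),
                fac2 = factor(sample(c("A", "B"), n, TRUE)))
# Numeric versions of the c(-.5, .5) contrast coding: A = -0.5, B = +0.5
d$c1 <- ifelse(d$fac1 == "B", 0.5, -0.5)
d$c2 <- ifelse(d$fac2 == "B", 0.5, -0.5)
d$Y <- 1 + 0.8 * d$x1 - 0.3 * d$x2 + 0.4 * d$x1 * d$c1 + rnorm(n)

fit <- lm(Y ~ x1 + x2 + x1:c1 + x2:c1 + x1:c2 + x2:c2 + c1 + c2 + c1:c2,
          data = d)

# Slope of x1 when fac1 == "A" (c1 = -0.5) and fac2 == "B" (c2 = +0.5):
#   b_x1 + (-0.5) * b_{x1:c1} + 0.5 * b_{x1:c2}
L <- setNames(rep(0, length(coef(fit))), names(coef(fit)))
L[c("x1", "x1:c1", "x1:c2")] <- c(1, -0.5, 0.5)

est    <- sum(L * coef(fit))
se     <- sqrt(drop(t(L) %*% vcov(fit) %*% L))
t_stat <- est / se
rdf    <- df.residual(fit)        # n minus number of coefficients
p      <- 2 * pt(abs(t_stat), df = rdf, lower.tail = FALSE)
round(c(slope = est, se = se, t = t_stat, df = rdf, p = p), 4)
```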
I'd like to put an lm model with HC3-corrected standard errors and a fixest model with cluster-robust standard errors in the same table.
See my MRE below:
library(fixest)
df <- mtcars
models <- list()
models[[1]] <- lm(cyl~disp, data = df)
models[[2]] <- feols(cyl~disp|as.factor(gear), data = df)
library(modelsummary)
# this works
modelsummary::modelsummary(models)
# but these do not
modelsummary::modelsummary(models, vcov = c("HC3", "cluster"))
modelsummary::modelsummary(models, vcov = c(HC3, cluster))
modelsummary::modelsummary(models, vcov = list(HC3, cluster))
modelsummary::modelsummary(models, vcov = list(vcovHC, cluster))
modelsummary::modelsummary(models, vcov = list(vcovHC, vcovHC))
modelsummary::modelsummary(models, vcov = c(vcovHC, vcovHC))
Okay, I figured out a hack, but I'm leaving the question open in case someone finds a slicker solution.
library(lmtest)
library(sandwich)
library(fixest)
df <- mtcars
models <- list()
fit <- lm(cyl~disp, data = df)
models[[1]] <- coeftest(fit, vcovHC(fit, type = "HC3"))
models[[2]] <- summary(feols(cyl~disp|as.factor(gear), data = df), "cluster")
library(modelsummary)
# this works
modelsummary::modelsummary(models)
By default fixest automatically computes cluster-robust standard errors when you include fixed effects. If you set vcov=NULL, modelsummary will return the default standard errors, which will then be cluster-robust.
Alternatively, you can set vcov=~gear to compute the standard errors using the sandwich::vcovCL function under the hood.
Using version 0.10.0 of modelsummary and 0.10.4 of fixest, we can do:
library(modelsummary)
library(fixest)
models <- list(
lm(cyl~disp, data = mtcars),
feols(cyl ~ disp | gear, data = mtcars),
feols(cyl ~ disp | gear, data = mtcars))
modelsummary(models,
fmt = 6,
vcov = list("HC3", NULL, ~gear))
                 Model 1      Model 2      Model 3
(Intercept)      3.188568
                 (0.289130)
disp             0.012998     0.012061     0.012061
                 (0.001310)   (0.002803)   (0.002803)
Num.Obs.         32           32           32
R2               0.814        0.819        0.819
R2 Adj.          0.807        0.800        0.800
R2 Within                     0.614        0.614
R2 Pseudo
AIC              79.1         80.2         80.2
BIC              83.5         86.1         86.1
Log.Lik.         -36.573      -36.107      -36.107
F                98.503
Std.Errors       HC3          by: gear     by: gear
FE: gear                      X            X
Notice that the results are the same in the 2nd and 3rd columns, and also equal to the plain fixest summary:
summary(models[[2]])
#> OLS estimation, Dep. Var.: cyl
#> Observations: 32
#> Fixed-effects: gear: 3
#> Standard-errors: Clustered (gear)
#> Estimate Std. Error t value Pr(>|t|)
#> disp 0.012061 0.002803 4.30299 0.049993 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.747808 Adj. R2: 0.799623
#> Within R2: 0.614334
In some past versions of modelsummary there were issues with labelling of the standard errors at the bottom of the table. I will soon be working on a more robust system to make sure the label matches the standard errors. In most cases, modelsummary calculates the robust standard errors itself, so it has full control over labelling and computation. Packages like fixest and estimatr make things a bit more tricky because they sometimes hold several SEs, and because the default is not always “classical”.
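As a sanity check on what the "HC3" string requests, here is a base-R sketch of the HC3 sandwich estimator, (X'X)^-1 X' diag(e_i^2 / (1 - h_ii)^2) X (X'X)^-1, for the lm column. This is the formula sandwich::vcovHC(type = "HC3") implements, so the disp standard error should reproduce the 0.001310 reported for Model 1 in the table.

```r
fit <- lm(cyl ~ disp, data = mtcars)

X <- model.matrix(fit)    # n x p design matrix
e <- residuals(fit)       # OLS residuals
h <- hatvalues(fit)       # leverages h_ii

bread <- solve(crossprod(X))               # (X'X)^-1
meat  <- crossprod(X * (e / (1 - h)))      # sum_i x_i x_i' e_i^2 / (1 - h_ii)^2
V_hc3 <- bread %*% meat %*% bread

se_hc3 <- sqrt(diag(V_hc3))
round(se_hc3, 6)
```

Multiplying X by the length-n vector e / (1 - h) scales each row, so crossprod() of the result gives the weighted "meat" directly without forming an n x n diagonal matrix.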
I am using R to conduct a quantile regression with bootstrapped standard errors to test whether one variable is higher than a second variable at the 5th, 50th, and 95th percentiles of the distributions. The output does not include the degrees of freedom for the t statistics. How can I calculate them?
summary(rq(data$var1~data$var2, tau=.05), se="boot")
summary(rq(data$var1~data$var2, tau=.5), se="boot")
Assuming you used the quantreg package, if you call rq() by itself you get the degrees of freedom.
It looks like you're fairly new to SO; welcome to the community! If you want great answers quickly, it's best to make your question reproducible. This includes the libraries you used and sample data, such as the output from dput(head(dataObject)). Check it out: making R reproducible questions.
Capturing the degrees of freedom, in this case, should be relatively easy.
In truth, the total degrees of freedom equal the number of observations. The residual degrees of freedom are the number of observations minus the number of estimated coefficients (including the intercept).
The t statistics reported by summary() are compared against a t distribution with these residual degrees of freedom, not with one degree of freedom per variable.
If you call the regression directly (instead of nesting it in the summary function), it gives you information about the degrees of freedom as well. That being said, if you don't run the model on its own, it is more difficult to test the assumptions that the data must meet for the analysis. Lastly, in this form, you can't test the model for overfitting, either.
library(quantreg)
data(mtcars)
(fit <- rq(mpg ~ wt, data = mtcars, tau = .05))
# Call:
# rq(formula = mpg ~ wt, tau = 0.05, data = mtcars)
#
# Coefficients:
# (Intercept) wt
# 37.561538 -6.515837
#
# Degrees of freedom: 32 total; 30 residual
(fit2 <- rq(mpg ~ wt, data = mtcars, tau = .5))
# Call:
# rq(formula = mpg ~ wt, tau = 0.5, data = mtcars)
#
# Coefficients:
# (Intercept) wt
# 34.232237 -4.539474
#
# Degrees of freedom: 32 total; 30 residual
summary(fit, se = "boot")
#
# Call: rq(formula = mpg ~ wt, tau = 0.05, data = mtcars)
#
# tau: [1] 0.05
#
# Coefficients:
# Value Std. Error t value Pr(>|t|)
# (Intercept) 37.56154 5.30762 7.07690 0.00000
# wt -6.51584 1.58456 -4.11208 0.00028
summary(fit2, se = "boot")
#
# Call: rq(formula = mpg ~ wt, tau = 0.5, data = mtcars)
#
# tau: [1] 0.5
#
# Coefficients:
# Value Std. Error t value Pr(>|t|)
# (Intercept) 34.23224 3.20718 10.67362 0.00000
# wt -4.53947 1.04645 -4.33798 0.00015
I would like to point out that se = "boot" doesn't appear to be doing much here. Additionally, you can run both tau settings in the same model; the quantreg package has several tools for comparing the models when they are run together.
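To make the degrees-of-freedom point concrete: for the tau = 0.05 fit above there are n = 32 observations and p = 2 coefficients, so the residual df is 30. Assuming summary.rq() refers its t statistics to a t distribution with those residual df, the wt p-value can be reproduced from the printed t value with pt(). Because the bootstrap is random, the t value will differ slightly between runs unless you set a seed first.

```r
n   <- nrow(mtcars)   # 32 observations
p   <- 2              # intercept + slope for mpg ~ wt
rdf <- n - p          # residual degrees of freedom: 30

t_wt <- -4.11208      # t value for wt printed by summary(fit, se = "boot")
p_wt <- 2 * pt(abs(t_wt), df = rdf, lower.tail = FALSE)
round(p_wt, 5)
```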
I have a glm model for which I use coeftest from the lmtest package to estimate robust standard errors. When I use stargazer to produce regression tables I get the correct results but without the number of observations and other relevant statistics like the null deviance and the model deviance.
Here's an example:
library(lmtest)
library(stargazer)
m1 <- glm(am ~ mpg + cyl + disp, mtcars, family = binomial)
# Simple binomial regression
# For whatever reason, let's say I want to use coeftest to estimate something
m <- coeftest(m1)
stargazer(m, type = "text", single.row = T) # This is fine, but I want to also include the number of observations
# the null deviance and the model deviance.
I'm specifically interested in the number of observations, the null deviance and the residual deviance.
I thought that if I replaced the old coefficient matrix with the new one, I'd get the correct estimates with the correct statistics, and stargazer would recognize the model and print it correctly. For that, I've tried substituting the coefficients, SEs, z statistics and p values from the coeftest model into the m1 model, but some of these statistics are computed by summary.glm and are not included in the m1 object. I could easily substitute these coefficients in the summary output, but stargazer doesn't recognize the summary.glm class. I've also tried adding attributes with the specific statistics to the m object, but they don't show up in the output and stargazer doesn't recognize them.
Note: I know stargazer can compute robust SEs, but I'm also doing other computations, so the example needs to include the coeftest output.
Any help is appreciated.
It may be easiest to pass the original models into stargazer, and then use coeftest to pass in custom values for standard errors (se = ), confidence intervals (ci.custom = ) and/or p values (p = ). See below for how to easily handle a list containing multiple models.
suppressPackageStartupMessages(library(lmtest))
suppressPackageStartupMessages(library(stargazer))
mdls <- list(
m1 = glm(am ~ mpg, mtcars, family = poisson),
m2 = glm(am ~ mpg + cyl + disp, mtcars, family = poisson)
)
# Calculate robust standard errors
se_robust <- function(x)
coeftest(x, vcov. = sandwich::sandwich)[, "Std. Error"]
# Original SE
stargazer(mdls, type = "text", single.row = T, report = "vcsp")
#>
#> ===============================================
#> Dependent variable:
#> -----------------------------
#> am
#> (1) (2)
#> -----------------------------------------------
#> mpg 0.106 (0.042) 0.028 (0.083)
#> p = 0.012 p = 0.742
#> cyl 0.435 (0.496)
#> p = 0.381
#> disp -0.014 (0.009)
#> p = 0.151
#> Constant -3.247 (1.064) -1.488 (3.411)
#> p = 0.003 p = 0.663
#> -----------------------------------------------
#> Observations 32 32
#> Log Likelihood -21.647 -20.299
#> Akaike Inf. Crit. 47.293 48.598
#> ===============================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
# With robust SE
stargazer(
mdls, type = "text", single.row = TRUE, report = "vcsp",
se = lapply(mdls, se_robust))
#>
#> ===============================================
#> Dependent variable:
#> -----------------------------
#> am
#> (1) (2)
#> -----------------------------------------------
#> mpg 0.106 (0.025) 0.028 (0.047)
#> p = 0.00002 p = 0.560
#> cyl 0.435 (0.292)
#> p = 0.137
#> disp -0.014 (0.007)
#> p = 0.042
#> Constant -3.247 (0.737) -1.488 (2.162)
#> p = 0.00002 p = 0.492
#> -----------------------------------------------
#> Observations 32 32
#> Log Likelihood -21.647 -20.299
#> Akaike Inf. Crit. 47.293 48.598
#> ===============================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
Created on 2020-11-09 by the reprex package (v0.3.0)
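To see what sandwich::sandwich computes for these glm fits (for example, to check the numbers or drop the dependency), here is a base-R sketch of the same HC0-type estimator for a Poisson model with log link. The score for observation i is x_i (y_i - mu_i), and the covariance is vcov(fit) %*% crossprod(scores) %*% vcov(fit). This assumes the dispersion is fixed at 1, as it is for the Poisson family; for m1 the mpg SE should come out near the 0.025 shown in the robust-SE table.

```r
fit <- glm(am ~ mpg, data = mtcars, family = poisson)

X <- model.matrix(fit)
# Score contributions: working residuals times working weights equals
# (y - mu) for a Poisson log-link model, so each row is x_i * (y_i - mu_i)
scores <- X * (residuals(fit, type = "working") * weights(fit, type = "working"))

# Sandwich: model-based vcov (the "bread", dispersion 1) around the "meat"
V_robust <- vcov(fit) %*% crossprod(scores) %*% vcov(fit)
se_robust_manual <- sqrt(diag(V_robust))
round(se_robust_manual, 3)
```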
If I get you right, you could try the following:
First, assign your stargazer output to an object like this:
stargazer.values <- stargazer(m, type = "text", single.row = T)
Then check the code of the stargazer function with body(stargazer).
Hopefully you can find objects for values that stargazer uses but does not report. You can then address them like this (if there is, for example, an object named "null.deviance"):
stargazer.values$null.deviance
Or, if it is part of another data frame, say df, it could go like this:
stargazer.values$df$null.deviance
Maybe code like this could be helpful:
print(null.deviance <- stargazer.values$null.deviance)
Hope this helps!
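A different route to the statistics asked about in the question: they are stored on the glm object itself (nobs(), $null.deviance, deviance()), so one option is to build the extra rows yourself and pass them to stargazer's add.lines argument alongside the coeftest-based SEs and p values. A sketch, assuming the binomial model from the question; the stargazer call only runs if stargazer and lmtest are installed, and omit.stat = "n" is there so the observation count is not printed twice.

```r
m1 <- glm(am ~ mpg + cyl + disp, data = mtcars, family = binomial)
mdls <- list(m1 = m1)

# Extra table rows: first element is the row label, then one value per model
extra <- list(
  c("Observations", sapply(mdls, nobs)),
  c("Null deviance", sprintf("%.3f", sapply(mdls, function(m) m$null.deviance))),
  c("Residual deviance", sprintf("%.3f", sapply(mdls, deviance)))
)
print(extra)

if (requireNamespace("stargazer", quietly = TRUE) &&
    requireNamespace("lmtest", quietly = TRUE)) {
  ct <- lapply(mdls, lmtest::coeftest)
  stargazer::stargazer(mdls, type = "text",
                       se = lapply(ct, function(x) x[, "Std. Error"]),
                       p  = lapply(ct, function(x) x[, "Pr(>|z|)"]),
                       add.lines = extra,
                       omit.stat = "n")
}
```

Any other coeftest variant (for example with a robust vcov.) can be swapped into the lapply() call without changing the extra rows.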