Marginal effects of a conditional logit model in R using the clogit function

I am trying to figure out how to calculate the marginal effects of my model fit with the clogit function in the survival package. The margins package does not seem to work with this type of model, although it does work with multinom and mclogit. However, I am investigating the effects of choice characteristics, not individual characteristics, so it needs to be a conditional logit model. The mclogit function works with the margins package, but its results are wildly different from the results of clogit. Why is that? Any help calculating marginal effects from a clogit fit would be greatly appreciated.
mclogit output:
Call:
mclogit(formula = cbind(selected, caseID) ~ SysTEM + OWN + cost +
ENVIRON + NEIGH + save, data = atl)
Estimate Std. Error z value Pr(>|z|)
SysTEM 0.139965 0.025758 5.434 5.51e-08 ***
OWN 0.008931 0.026375 0.339 0.735
cost -0.103012 0.004215 -24.439 < 2e-16 ***
ENVIRON 0.675341 0.037104 18.201 < 2e-16 ***
NEIGH 0.419054 0.031958 13.112 < 2e-16 ***
save 0.532825 0.023399 22.771 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 18380
Residual Deviance: 16670
Number of Fisher Scoring iterations: 4
Number of observations: 8364
clogit output:
Call:
coxph(formula = Surv(rep(1, 25092L), selected) ~ SysTEM + OWN +
cost + ENVIRON + NEIGH + save + strata(caseID), data = atl,
method = "exact")
n= 25092, number of events= 8364
coef exp(coef) se(coef) z Pr(>|z|)
SysTEM 0.133184 1.142461 0.034165 3.898 9.69e-05 ***
OWN -0.015884 0.984241 0.036346 -0.437 0.662
cost -0.179833 0.835410 0.005543 -32.442 < 2e-16 ***
ENVIRON 1.186329 3.275036 0.049558 23.938 < 2e-16 ***
NEIGH 0.658657 1.932195 0.042063 15.659 < 2e-16 ***
save 0.970051 2.638079 0.031352 30.941 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
SysTEM 1.1425 0.8753 1.0685 1.2216
OWN 0.9842 1.0160 0.9166 1.0569
cost 0.8354 1.1970 0.8264 0.8445
ENVIRON 3.2750 0.3053 2.9719 3.6091
NEIGH 1.9322 0.5175 1.7793 2.0982
save 2.6381 0.3791 2.4809 2.8053
Concordance= 0.701 (se = 0.004 )
Rsquare= 0.103 (max possible= 0.688 )
Likelihood ratio test= 2740 on 6 df, p=<2e-16
Wald test = 2465 on 6 df, p=<2e-16
Score (logrank) test = 2784 on 6 df, p=<2e-16
margins output for mclogit
margins(model2A)
SysTEM OWN cost ENVIRON NEIGH save
0.001944 0.000124 -0.001431 0.00938 0.00582 0.0074
margins output for clogit
margins(model2A)
Error in match.arg(type) :
'arg' should be one of “risk”, “expected”, “lp”
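Since margins does not support clogit objects, one workaround is to compute average marginal effects by hand from the conditional-logit derivative dP/dx = beta * P * (1 - P), evaluated within each choice set and averaged over alternatives. A minimal sketch, assuming the clogit fit is stored in model2A and that atl is the long-format data with one row per alternative and a caseID column (both names taken from the question):
library(survival)
b <- coef(model2A)                      # clogit coefficients
X <- model.matrix(~ SysTEM + OWN + cost + ENVIRON + NEIGH + save,
                  data = atl)[, -1]     # drop the intercept column
# within-set choice probabilities: exp(Xb) normalised within each caseID
xb <- as.vector(X %*% b)
p  <- ave(exp(xb), atl$caseID, FUN = function(z) z / sum(z))
# own-attribute average marginal effects: mean of beta_k * p * (1 - p)
ame <- sapply(b, function(bk) mean(bk * p * (1 - p)))
ame
This gives own-effects only; cross-effects (the change in one alternative's probability from changing another alternative's attribute) take the form -beta_k * P_i * P_j and would need a separate calculation.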

Related

Changing base category in latent class analysis

I'm using the glca package to run a latent class analysis. I want to see how covariates (other than the indicators used to construct the latent classes) affect the probability of class assignment. I understand this is a multinomial logistic regression, so my question is: is there a way to change the base (reference) latent class? For example, my model is currently a 4-class model, and by default the output shows the effect of covariates on class prevalence with respect to Class 4 (the base category). I want to change this base category to, for example, Class 2.
My code is as follows:
fc <- item(intrst, respect, expert, inclu, contbt,secure,pay,bonus, benft, innov, learn, rspons, promote, wlb, flex) ~ atenure+super+sal+minority+female+age40+edu+d_bpw+d_skill
lca4_cov <- glca(fc, data = bpw, nclass = 4, seed = 1)
and I get the following output.
> coef(lca4_cov)
Class 1 / 4 :
Odds Ratio Coefficient Std. Error t value Pr(>|t|)
(Intercept) 1.507537 0.410477 0.356744 1.151 0.24991
atenure 0.790824 -0.234679 0.102322 -2.294 0.02183 *
super 1.191961 0.175600 0.028377 6.188 6.29e-10 ***
sal 0.937025 -0.065045 0.035490 -1.833 0.06686 .
minority 2.002172 0.694233 0.060412 11.492 < 2e-16 ***
female 1.210653 0.191160 0.059345 3.221 0.00128 **
age40 1.443603 0.367142 0.081002 4.533 5.89e-06 ***
edu 1.069771 0.067444 0.042374 1.592 0.11149
d_bpw 0.981104 -0.019077 0.004169 -4.576 4.78e-06 ***
d_skill 1.172218 0.158898 0.036155 4.395 1.12e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Class 2 / 4 :
Odds Ratio Coefficient Std. Error t value Pr(>|t|)
(Intercept) 3.25282 1.17952 0.43949 2.684 0.00729 **
atenure 0.95131 -0.04992 0.12921 -0.386 0.69926
super 1.16835 0.15559 0.03381 4.602 4.22e-06 ***
sal 1.01261 0.01253 0.04373 0.287 0.77450
minority 0.72989 -0.31487 0.08012 -3.930 8.55e-05 ***
female 0.45397 -0.78971 0.07759 -10.178 < 2e-16 ***
age40 1.26221 0.23287 0.09979 2.333 0.01964 *
edu 1.29594 0.25924 0.05400 4.801 1.60e-06 ***
d_bpw 0.97317 -0.02720 0.00507 -5.365 8.26e-08 ***
d_skill 1.16223 0.15034 0.04514 3.330 0.00087 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Class 3 / 4 :
Odds Ratio Coefficient Std. Error t value Pr(>|t|)
(Intercept) 0.218153 -1.522557 0.442060 -3.444 0.000575 ***
atenure 0.625815 -0.468701 0.123004 -3.810 0.000139 ***
super 1.494112 0.401532 0.031909 12.584 < 2e-16 ***
sal 1.360924 0.308164 0.044526 6.921 4.72e-12 ***
minority 0.562590 -0.575205 0.081738 -7.037 2.07e-12 ***
female 0.860490 -0.150253 0.072121 -2.083 0.037242 *
age40 1.307940 0.268453 0.100376 2.674 0.007495 **
edu 1.804949 0.590532 0.054522 10.831 < 2e-16 ***
d_bpw 0.987353 -0.012727 0.004985 -2.553 0.010685 *
d_skill 1.073519 0.070942 0.045275 1.567 0.117163
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I would appreciate it if anyone could point me to code or references that address my problem. Thanks in advance.
Try using the decreasing option.
lca4_cov <- glca(fc, data = bpw, nclass = 4, seed = 1, decreasing = T)
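If the decreasing option does not give the ordering you want, remember that in a multinomial logit you can re-express the coefficients against a different base class by subtraction: the log-odds of Class k versus Class 2 equal (Class k vs Class 4) minus (Class 2 vs Class 4). A small worked sketch using the female coefficients from the output above:
# 'female' coefficients with Class 4 as the base (from coef(lca4_cov))
b1_vs_4 <- 0.191160    # Class 1 / 4
b2_vs_4 <- -0.789710   # Class 2 / 4
b3_vs_4 <- -0.150253   # Class 3 / 4
# re-based against Class 2
b1_vs_2 <- b1_vs_4 - b2_vs_4   # 0.98087
b3_vs_2 <- b3_vs_4 - b2_vs_4   # 0.63946
b4_vs_2 <- 0        - b2_vs_4  # 0.78971
exp(c(b1_vs_2, b3_vs_2, b4_vs_2))   # odds ratios versus Class 2
Note that the standard errors of the re-based contrasts depend on the covariance between the original estimates, so for tests and p-values you would still want the model refit with Class 2 as the base category.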

How to compute R squared for a glmmTMB negative binomial mixed model with zero-inflation in R

I made a zero-inflated negative binomial model with glmmTMB, as below:
M2<- glmmTMB(psychological100~ (1|ID) + time*MNM01, data=mnmlong,
ziformula=~ (1|ID) + time*MNM01, family=nbinom2())
summary(M2)
Here is the output
Family: nbinom2 ( log )
Formula: psychological100 ~ (1 | ID) + time * MNM01
Zero inflation: ~(1 | ID) + time * MNM01
Data: mnmlong
AIC BIC logLik deviance df.resid
3507.0 3557.5 -1742.5 3485.0 714
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
ID (Intercept) 0.2862 0.535
Number of obs: 725, groups: ID, 337
Zero-inflation model:
Groups Name Variance Std.Dev.
ID (Intercept) 0.5403 0.7351
Number of obs: 725, groups: ID, 337
Overdispersion parameter for nbinom2 family (): 3.14
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.89772 0.09213 31.451 < 2e-16 ***
time -0.08724 0.01796 -4.858 1.18e-06 ***
MNM01 0.02094 0.12433 0.168 0.866
time:MNM01 -0.01193 0.02420 -0.493 0.622
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Zero-inflation model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.29940 0.17298 -1.731 0.083478 .
time 0.12204 0.03338 3.656 0.000256 ***
MNM01 0.06771 0.24217 0.280 0.779790
time:MNM01 -0.02821 0.04462 -0.632 0.527282
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I wanted to know the R squared of the model and tried the following two methods, but neither was successful.
MuMIn::r.squaredGLMM(M2)
Error in r.squaredGLMM.glmmTMB(M2) : r.squaredGLMM cannot (yet)
handle 'glmmTMB' object with zero-inflation
performance::r2_zeroinflated(M2)
Error in residuals.glmmTMB(model, type = "pearson") : pearson
residuals are not implemented for models with zero-inflation or
variable dispersion
What do you advise?
Try the pseudo-R^2 based on a likelihood ratio (MuMIn::r.squaredLR). You may need to explicitly provide a null model for comparison.
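For example (a sketch only: the null model here keeps just the random intercepts in both the conditional and zero-inflation parts, and you may prefer a different null):
library(glmmTMB)
library(MuMIn)
# intercept-only null model with the same random-effect and ZI structure
M0 <- glmmTMB(psychological100 ~ (1 | ID),
              ziformula = ~ (1 | ID),
              family = nbinom2(), data = mnmlong)
# likelihood-ratio based pseudo-R^2 of M2 relative to M0
r.squaredLR(M2, null = M0)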

Do I drop this variable from my GLM? The variable is not significant but its interaction with another is

I'm creating a GLM with a quasipoisson distribution, and when I do an analysis of deviance one of my variables is not significant, but its interaction with another is. It's my understanding that you include interactions when you expect a relationship between the two, so that as one goes up the other will also go up.
Worked.out.vol.hours is Total Time.
AAB...BW is the organisers.
Sorry about the terrible variable names.
Call:
glm(formula = total.debris ~ Beach.Region + Volunteers..n. *
worked.out.vol.hour + Survey.Window + AAB...BW, family = quasipoisson,
data = ltype.all)
Deviance Residuals:
Min 1Q Median 3Q Max
-128.45 -22.71 -10.72 7.98 242.77
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.298e+00 4.650e-01 13.544 < 2e-16 ***
Beach.RegionNorth East 5.523e-01 1.142e-01 4.838 1.36e-06 ***
Beach.RegionNorth West 7.873e-01 1.233e-01 6.385 1.92e-10 ***
Beach.RegionNorthern Ireland 6.919e-01 1.554e-01 4.452 8.77e-06 ***
Beach.RegionScotland 6.168e-01 1.023e-01 6.030 1.80e-09 ***
Beach.RegionSouth East 7.663e-01 9.997e-02 7.665 2.27e-14 ***
Beach.RegionSouth West 8.261e-01 1.008e-01 8.196 3.38e-16 ***
Beach.RegionWales 6.714e-01 1.104e-01 6.079 1.33e-09 ***
Volunteers..n. 1.710e-02 1.235e-03 13.852 < 2e-16 ***
worked.out.vol.hour 3.579e-03 6.620e-04 5.407 6.83e-08 ***
Survey.Window2000 3.944e-01 1.893e-01 2.083 0.0373 *
Survey.Window2001 1.199e-01 1.851e-01 0.647 0.5174
Survey.Window2002 1.804e-01 1.773e-01 1.017 0.3090
Survey.Window2003 2.789e-01 1.747e-01 1.596 0.1106
Survey.Window2004 1.441e-01 1.738e-01 0.829 0.4069
Survey.Window2005 1.008e-01 1.722e-01 0.586 0.5581
Survey.Window2006 8.810e-02 1.718e-01 0.513 0.6081
Survey.Window2007 7.097e-02 1.726e-01 0.411 0.6809
AAB...BWAAB Combined -7.903e-01 6.679e-01 -1.183 0.2368
AAB...BWAdopt a Beach -6.070e-01 4.234e-01 -1.434 0.1517
AAB...BWBeachwatch Only -4.539e-01 4.227e-01 -1.074 0.2829
AAB...BWBW Combined -6.548e-01 4.863e-01 -1.347 0.1782
Volunteers..n.:worked.out.vol.hour -2.232e-05 1.586e-06 -14.071 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasipoisson family taken to be 1238.943)
Null deviance: 3637808 on 3737 degrees of freedom
Residual deviance: 2952919 on 3715 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5
When I run the code to see which variables are significant:
anova(actmod1, test = "Chisq")
Analysis of Deviance Table
Model: quasipoisson, link: log
Response: total.debris
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 3737 3637808
Beach.Region 7 141546 3730 3496262 < 2.2e-16 ***
Volunteers..n. 1 255212 3729 3241050 < 2.2e-16 ***
worked.out.vol.hour 1 1227 3728 3239823 0.3196126
Survey.Window 8 17788 3720 3222035 0.0729141 .
AAB...BW 4 27536 3716 3194499 0.0001807 ***
Volunteers..n.:worked.out.vol.hour 1 241579 3715 2952919 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
worked.out.vol.hour is not significant in the analysis of deviance, but its interaction with Volunteers..n. is, which is expected since the total hours surveyed will naturally increase with more volunteers. I, however, want to keep these values separate in the model. How do I go about this issue? Do I just drop the variable altogether, or do I keep it in because the interaction is significant?
Also, any help with how to succinctly report these values would be greatly appreciated, since I am quite new to this.
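One way to see what can legitimately be dropped is drop1(), which respects marginality: it will not offer to remove worked.out.vol.hour while its interaction with Volunteers..n. is still in the model, and for a quasipoisson fit an F test is appropriate. A sketch, assuming the fitted object is called actmod1 as in the question:
# single-term deletions that respect marginality
drop1(actmod1, test = "F")
# to test the interaction itself, refit without it and compare
actmod_noint <- update(actmod1, . ~ . - Volunteers..n.:worked.out.vol.hour)
anova(actmod_noint, actmod1, test = "F")
The usual advice is to keep a main effect in the model whenever its interaction is retained, even if the main effect's own test is not significant.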

How to Interpret binomial GLMM results

I have a large dataset (24765 obs).
I am trying to look at how cleaning method affects emergence success (ES).
I have several fixed factors: beach (4 levels) and cleaning method (3 levels).
I also have a few random variables: Zone (128 levels), Year (18 years) and Index (24765).
This is an OLRE model (observation-level random effect) to account for overdispersion.
My best-fit model based on AIC scores is:
mod8a<-glmer(ES.test~beach+method+(1|Year)+(1|index),data=y5,weights=egg.total,family=binomial)
The summary showed:
summary(mod8a)  # AIC = 216732.9, same effect at every beach
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: ES.test ~ beach + method + (1 | Year) + (1 | index)
Data: y5
Weights: egg.total
AIC BIC logLik deviance df.resid
214834.2 214891.0 -107410.1 214820.2 24758
Scaled residuals:
Min 1Q Median 3Q Max
-1.92900 -0.09344 0.00957 0.14682 1.62327
Random effects:
Groups Name Variance Std.Dev.
index (Intercept) 1.6541 1.286
Year (Intercept) 0.6512 0.807
Number of obs: 24765, groups: index, 24765; Year, 19
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.65518 0.18646 3.514 0.000442 ***
beachHillsboro -0.06770 0.02143 -3.159 0.001583 **
beachHO/HA 0.31927 0.03716 8.591 < 2e-16 ***
methodHTL only 0.18106 0.02526 7.169 7.58e-13 ***
methodno clean 0.05989 0.03170 1.889 0.058853 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) bchHll bHO/HA mtHTLo
beachHllsbr -0.002
beachHO/HA -0.054 0.047
mthdHTLonly -0.107 -0.242 0.355
methodnclen -0.084 -0.060 0.265 0.628
What is my "intercept" (as seen above)? I am missing levels of fixed factors, is that because R could not compute it?
I tested for Overdispersion:
overdisp_fun <- function(mod8a) {
  ## number of variance parameters in
  ## an n-by-n variance-covariance matrix
  vpars <- function(m) {
    nrow(m) * (nrow(m) + 1) / 2
  }
  model8a.df <- sum(sapply(VarCorr(mod8a), vpars)) + length(fixef(mod8a))
  rdf <- nrow(model.frame(mod8a)) - model8a.df
  rp <- residuals(mod8a, type = "pearson")
  Pearson.chisq <- sum(rp^2)
  prat <- Pearson.chisq / rdf
  pval <- pchisq(Pearson.chisq, df = rdf, lower.tail = FALSE)
  c(chisq = Pearson.chisq, ratio = prat, rdf = rdf, p = pval)
}
> overdisp_fun(mod8a)
chisq ratio rdf p
2.064765e+03 8.339790e-02 2.475800e+04 1.000000e+00
This shows the plot of mod8a; I would like to know why I am getting such a curve and what it means.
Lastly, I did a multiple-comparison analysis using multcomp:
ls1<- glht(mod8a, mcp(beach = "Tukey"))$linfct
ls2 <- glht(mod8a, mcp(method= "Tukey"))$linfct
summary(glht(mod8a, linfct = rbind(ls1, ls2)))
Simultaneous Tests for General Linear Hypotheses
Fit: glmer(formula = ES.test ~ beach + method + (1 | Year) + (1 |
index), data = y5, family = binomial, weights = egg.total)
Linear Hypotheses:
Estimate Std. Error z value Pr(>|z|)
Hillsboro - FTL/P == 0 -0.06770 0.02143 -3.159 0.00821 **
HO/HA - FTL/P == 0 0.31927 0.03716 8.591 < 0.001 ***
HO/HA - Hillsboro == 0 0.38696 0.04201 9.211 < 0.001 ***
HTL only - HTL and SB == 0 0.18106 0.02526 7.169 < 0.001 ***
no clean - HTL and SB == 0 0.05989 0.03170 1.889 0.24469
no clean - HTL only == 0 -0.12117 0.02524 -4.800 < 0.001 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)
At this point, help with interpreting the analysis would be greatly appreciated (especially with that sigmoid curve in my residuals).
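For the multiple comparisons, it may also help to report simultaneous confidence intervals and to exponentiate them so the contrasts read as odds ratios; a sketch based on the glht calls above:
library(multcomp)
gl <- glht(mod8a, linfct = rbind(ls1, ls2))
ci <- confint(gl)        # simultaneous 95% confidence intervals
ci
exp(ci$confint)          # the same contrasts on the odds-ratio scale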

How to extract adjusted R squared in vars package?

This question is closely related to this one: How to extract p-value in var package?
I would just like to extract the adjusted R squared from the vars package.
Even though there is a similar question, I don't know how to modify that approach to get the adjusted R squared. Please help me.
I just followed the previous example.
library(vars)
library(quantmod)  # getSymbols() and periodReturn() come from quantmod
symbols=c('^N225','^FTSE','^GSPC')
getSymbols(symbols,src='yahoo', from="2003-04-28", to="2007-10-29")
period="daily"
A1=periodReturn(N225$N225.Adjusted,period=period)
B1=periodReturn(FTSE$FTSE.Adjusted,period=period)
C1=periodReturn(GSPC$GSPC.Adjusted,period=period)
datap_1<-cbind(A1,B1,C1)
datap_1<-na.omit(datap_1)
datap_1<-(datap_1)^2
vardatap_3<-VAR(datap_1,p=3,type="none")
summary(vardatap_3)
Then the summary is presented like this:
VAR Estimation Results:
=========================
Endogenous variables: N225, FTSE, SP500
Deterministic variables: none
Sample size: 1055
Log Likelihood: 23637.848
Roots of the characteristic polynomial:
0.8639 0.6224 0.6224 0.5711 0.5711 0.5471 0.5471 0.4683 0.4683
Call:
VAR(y = datap_1, p = 3, type = "none")
Estimation results for equation N225:
=====================================
N225 = N225.l1 + FTSE.l1 + SP500.l1 + N225.l2 + FTSE.l2 + SP500.l2 + N225.l3 + FTSE.l3 + SP500.l3
Estimate Std. Error t value Pr(>|t|)
N225.l1 0.03436 0.03116 1.103 0.270
FTSE.l1 0.47025 0.06633 7.089 2.48e-12 ***
SP500.l1 0.60717 0.07512 8.083 1.74e-15 ***
N225.l2 0.14938 0.03057 4.886 1.19e-06 ***
FTSE.l2 -0.05440 0.06744 -0.807 0.420
SP500.l2 -0.09024 0.07782 -1.160 0.246
N225.l3 0.16809 0.02924 5.749 1.18e-08 ***
FTSE.l3 0.04480 0.06597 0.679 0.497
SP500.l3 -0.01007 0.07941 -0.127 0.899
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0002397 on 1046 degrees of freedom
Multiple R-Squared: 0.3099, Adjusted R-squared: 0.304
F-statistic: 52.2 on 9 and 1046 DF, p-value: < 2.2e-16
The adjusted R squared values can be accessed in the output of summary(), under the list element varresult, which contains a summary table for each of the daily return series.
> lapply(summary(vardatap_3)$varresult, "[", "adj.r.squared")
$daily.returns
$daily.returns$adj.r.squared
[1] 0.3039812
$daily.returns.1
$daily.returns.1$adj.r.squared
[1] 0.3201587
$daily.returns.2
$daily.returns.2$adj.r.squared
[1] 0.1972104
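If you would rather have a plain named vector than a nested list, sapply() does the same extraction:
# adjusted R^2 of each equation as a named numeric vector
sapply(summary(vardatap_3)$varresult, function(s) s$adj.r.squared)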
