I have run a quasi-Poisson GLM with the following code:
Output3 <- glm(GCN ~ DHSI + N + P, data = PondsTask2, family = quasipoisson(link = "log"))
and received this output:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.69713 0.56293 -3.015 0.00272 **
DHSI 3.44795 0.74749 4.613 0.00000519 ***
N -0.59648 0.36357 -1.641 0.10157
P -0.01964 0.37419 -0.052 0.95816
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
DHSI is statistically significant, but the other two variables are not. How do I go about dropping variables until I have the minimum adequate model?
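A minimal sketch of manual backward elimination, assuming the Output3 fit above. Quasi-Poisson models have no likelihood (and hence no AIC), so drop1() with an F test is the usual tool:
# F test for dropping each term; remove the least significant one
drop1(Output3, test = "F")
# P has the largest p-value, so drop it and re-test the rest
Output3a <- update(Output3, . ~ . - P)
drop1(Output3a, test = "F")
# Repeat until all remaining terms are significant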
I'm using gamlss() from the 'gamlss' package (version 5.4-1) in R for a generalized additive model for location, scale and shape.
My model looks like this:
propvoc3 <- gamlss(proporcion.voc ~ familiaridad * proporcion)
When I ask for the ANOVA table, I get this output:
Mu link function: identity
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.625e-01 9.476e-02 5.936 1.9e-06 ***
familiaridaddesconocido -1.094e-01 1.059e-01 -1.032 0.31042
proporcionmayor 4.375e-01 1.340e-01 3.265 0.00281 **
proporcionmenor 1.822e-17 1.340e-01 0.000 1.00000
familiaridaddesconocido:proporcionmayor -3.281e-01 1.708e-01 -1.921 0.06464 .
familiaridaddesconocido:proporcionmenor 5.469e-01 1.708e-01 3.201 0.00331 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
------------------------------------------------------------------
Is there a way to get the values by variable rather than by every term?
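A hedged sketch, assuming the propvoc3 fit above: drop1() tests whole terms with likelihood-ratio tests instead of individual dummy coefficients. Because of marginality, only the interaction is testable while it is in the model; refit without it to test the main effects:
drop1(propvoc3)
# Refit without the interaction to test each main effect as a whole term
propvoc3b <- update(propvoc3, . ~ . - familiaridad:proporcion)
drop1(propvoc3b)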
I'm using the glca package to run a latent class analysis. I want to see how covariates (other than the indicators used to construct the latent classes) affect the probability of class assignment. I understand this is a multinomial logistic regression, so my question is: is there a way to change the base reference latent class? For example, my model is currently a 4-class model, and by default the output shows the effect of covariates on class prevalence with respect to Class 4 (the base category). I want to change this base category to, for example, Class 2.
My code is as follows:
fc <- item(intrst, respect, expert, inclu, contbt, secure, pay, bonus, benft,
           innov, learn, rspons, promote, wlb, flex) ~
  atenure + super + sal + minority + female + age40 + edu + d_bpw + d_skill
lca4_cov <- glca(fc, data = bpw, nclass = 4, seed = 1)
and I get the following output:
> coef(lca4_cov)
Class 1 / 4 :
Odds Ratio Coefficient Std. Error t value Pr(>|t|)
(Intercept) 1.507537 0.410477 0.356744 1.151 0.24991
atenure 0.790824 -0.234679 0.102322 -2.294 0.02183 *
super 1.191961 0.175600 0.028377 6.188 6.29e-10 ***
sal 0.937025 -0.065045 0.035490 -1.833 0.06686 .
minority 2.002172 0.694233 0.060412 11.492 < 2e-16 ***
female 1.210653 0.191160 0.059345 3.221 0.00128 **
age40 1.443603 0.367142 0.081002 4.533 5.89e-06 ***
edu 1.069771 0.067444 0.042374 1.592 0.11149
d_bpw 0.981104 -0.019077 0.004169 -4.576 4.78e-06 ***
d_skill 1.172218 0.158898 0.036155 4.395 1.12e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Class 2 / 4 :
Odds Ratio Coefficient Std. Error t value Pr(>|t|)
(Intercept) 3.25282 1.17952 0.43949 2.684 0.00729 **
atenure 0.95131 -0.04992 0.12921 -0.386 0.69926
super 1.16835 0.15559 0.03381 4.602 4.22e-06 ***
sal 1.01261 0.01253 0.04373 0.287 0.77450
minority 0.72989 -0.31487 0.08012 -3.930 8.55e-05 ***
female 0.45397 -0.78971 0.07759 -10.178 < 2e-16 ***
age40 1.26221 0.23287 0.09979 2.333 0.01964 *
edu 1.29594 0.25924 0.05400 4.801 1.60e-06 ***
d_bpw 0.97317 -0.02720 0.00507 -5.365 8.26e-08 ***
d_skill 1.16223 0.15034 0.04514 3.330 0.00087 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Class 3 / 4 :
Odds Ratio Coefficient Std. Error t value Pr(>|t|)
(Intercept) 0.218153 -1.522557 0.442060 -3.444 0.000575 ***
atenure 0.625815 -0.468701 0.123004 -3.810 0.000139 ***
super 1.494112 0.401532 0.031909 12.584 < 2e-16 ***
sal 1.360924 0.308164 0.044526 6.921 4.72e-12 ***
minority 0.562590 -0.575205 0.081738 -7.037 2.07e-12 ***
female 0.860490 -0.150253 0.072121 -2.083 0.037242 *
age40 1.307940 0.268453 0.100376 2.674 0.007495 **
edu 1.804949 0.590532 0.054522 10.831 < 2e-16 ***
d_bpw 0.987353 -0.012727 0.004985 -2.553 0.010685 *
d_skill 1.073519 0.070942 0.045275 1.567 0.117163
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I would appreciate any code or references that address this problem. Thanks in advance.
Try using the decreasing option:
lca4_cov <- glca(fc, data = bpw, nclass = 4, seed = 1, decreasing = TRUE)
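As far as I can tell, decreasing = TRUE relabels the classes in decreasing order of prevalence; since the output uses the last class as the base category, reordering the labels is the practical way to change which class serves as the reference. Check the class-prevalence estimates after refitting to see which of your original classes is now labelled Class 4.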
I was working on my GAM model:
y <- c(0.0000943615,0.0074918919,0.0157332851,0.0783308615,
0.1546375803,0.5558444681,0.8583806898,0.9617216854,
0.9848004112,0.9964662546)
x <- log(c(0.05, 0.1, 0.15, 0.2, 0.4, 0.8, 1.6, 3.2, 4.5, 6.4))
fit.gam <- mgcv::gam(y ~ s(x, k = -1, bs = "cr"))
summary(fit.gam)
coef(fit.gam)
The model summary tells me that the edf of s(x) is 6.893 with p-value = 0.0017:
> summary(fit.gam)
Family: gaussian
Link function: identity
Formula:
y ~ s(x, k = -1, bs = "cr")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.46135 0.00629 73.34 0.000126 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x) 6.893 7.902 585.7 0.0017 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.998 Deviance explained = 100%
GCV = 0.0018783 Scale est. = 0.0003957 n = 10
The fitted model still contains 9 coefficients for s(x):
> coef(fit.gam)
(Intercept) s(x).1 s(x).2 s(x).3 s(x).4 s(x).5 s(x).6
0.4613501 -0.3450787 -0.3229509 -0.2895761 -0.1783854 0.1976228 0.5040469
s(x).7 s(x).8 s(x).9
0.6135856 0.6338979 0.6470116
My question: I understand that the GAM penalizes the smooth of x to some extent, so that the estimated degrees of freedom of s(x) is 6.893 < 9, but from the coefficients of s(x) it is hard to tell which basis function is penalized. How should I understand the relationship between the edf and the coefficients of s(x)? Thanks!
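A minimal sketch of where to look in mgcv: the penalty shrinks all basis coefficients of s(x) jointly rather than zeroing out particular ones, so the edf cannot be attributed to individual coefficients; the fitted object does, however, store per-coefficient edf contributions:
# Per-coefficient effective degrees of freedom (intercept first);
# no entry is exactly 0 or 1 because the penalty acts on the whole smooth
fit.gam$edf
# The contributions sum to the total edf (1 for the intercept + 6.893 for s(x))
sum(fit.gam$edf)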
I made a zero-inflated negative binomial model with glmmTMB as below:
M2 <- glmmTMB(psychological100 ~ (1 | ID) + time * MNM01, data = mnmlong,
              ziformula = ~ (1 | ID) + time * MNM01, family = nbinom2())
summary(M2)
Here is the output:
Family: nbinom2 ( log )
Formula: psychological100 ~ (1 | ID) + time * MNM01
Zero inflation: ~(1 | ID) + time * MNM01
Data: mnmlong
AIC BIC logLik deviance df.resid
3507.0 3557.5 -1742.5 3485.0 714
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
ID (Intercept) 0.2862 0.535
Number of obs: 725, groups: ID, 337
Zero-inflation model:
Groups Name Variance Std.Dev.
ID (Intercept) 0.5403 0.7351
Number of obs: 725, groups: ID, 337
Overdispersion parameter for nbinom2 family (): 3.14
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.89772 0.09213 31.451 < 2e-16 ***
time -0.08724 0.01796 -4.858 1.18e-06 ***
MNM01 0.02094 0.12433 0.168 0.866
time:MNM01 -0.01193 0.02420 -0.493 0.622
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Zero-inflation model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.29940 0.17298 -1.731 0.083478 .
time 0.12204 0.03338 3.656 0.000256 ***
MNM01 0.06771 0.24217 0.280 0.779790
time:MNM01 -0.02821 0.04462 -0.632 0.527282
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I wanted to know the R-squared of the model and tried the following two methods, but neither was successful:
MuMIn::r.squaredGLMM(M2)
Error in r.squaredGLMM.glmmTMB(M2) : r.squaredGLMM cannot (yet)
handle 'glmmTMB' object with zero-inflation
performance::r2_zeroinflated(M2)
Error in residuals.glmmTMB(model, type = "pearson") : pearson
residuals are not implemented for models with zero-inflation or
variable dispersion
What do you advise?
Try the pseudo-R^2 based on a likelihood ratio (MuMIn::r.squaredLR). You may need to provide a null model for comparison explicitly.
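A minimal sketch, assuming the M2 fit and mnmlong data from the question; the intercept-only null model (keeping the random effects and family) is my assumption of a sensible baseline:
# Hypothetical null model for the likelihood-ratio pseudo-R^2
M0 <- glmmTMB(psychological100 ~ (1 | ID), data = mnmlong,
              ziformula = ~ (1 | ID), family = nbinom2())
MuMIn::r.squaredLR(M2, null = M0)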
I produced the graph below using ggplot2:
# pd is assumed to be a position adjustment defined earlier, e.g. position_dodge()
PlotEchi <- ggplot(data = Echinoidea,
                   aes(x = Year, y = mean, group = aspect, linetype = aspect, shape = aspect)) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = .025, position = pd) +
  geom_point(position = pd, size = 2) +
  geom_smooth(method = "gam", formula = y ~ s(x, k = 3), se = FALSE, size = 0.5, colour = "black") +
  xlab("") +
  ylab("Abundance (mean +/- SE)") +
  facet_wrap(~ species, scales = "free", ncol = 1) +
  scale_y_continuous(limits = c(0, max(Echinoidea$mean + Echinoidea$se))) +
  scale_x_continuous(limits = c(min(Echinoidea$Year - 0.125), max(Echinoidea$Year + 0.125)))
What I would like to do is easily retrieve the adjusted R-squared for each of the fitted lines without fitting an individual mgcv::gam for each plotted line via model <- gam(df, formula = y ~ s(x1) ...). Any ideas?
This is not really possible, because ggplot2 throws away the fitted object. You can see this in the ggplot2 source (the predictdf.glm function).
1. Solving the problem by patching ggplot2
One ugly workaround is to patch the ggplot2 code on the fly so that it prints out the results. You can do this as follows. The initial assignment throws an error, but things work anyway. To undo this, just restart your R session.
library(ggplot2)
# assignInNamespace patches `predictdf.glm` from ggplot2 and adds
# a line that prints the summary of the model. For some reason, this
# creates an error, but things work nonetheless.
assignInNamespace("predictdf.glm", function(model, xseq, se, level) {
  pred <- stats::predict(model, newdata = data.frame(x = xseq), se.fit = se,
                         type = "link")
  print(summary(model))  # this is the line I added
  if (se) {
    std <- stats::qnorm(level / 2 + 0.5)
    data.frame(
      x = xseq,
      y = model$family$linkinv(as.vector(pred$fit)),
      ymin = model$family$linkinv(as.vector(pred$fit - std * pred$se.fit)),
      ymax = model$family$linkinv(as.vector(pred$fit + std * pred$se.fit)),
      se = as.vector(pred$se.fit)
    )
  } else {
    data.frame(x = xseq, y = model$family$linkinv(as.vector(pred)))
  }
}, "ggplot2")
Now we can make a plot with the patched ggplot2:
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  geom_smooth(se = FALSE, method = "gam", formula = y ~ s(x, bs = "cs"))
Console output:
Family: gaussian
Link function: identity
Formula:
y ~ s(x, bs = "cs")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.4280 0.0365 93.91 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x) 1.546 9 5.947 5.64e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.536 Deviance explained = 55.1%
GCV = 0.070196 Scale est. = 0.066622 n = 50
Family: gaussian
Link function: identity
Formula:
y ~ s(x, bs = "cs")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.77000 0.03797 72.96 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x) 1.564 9 1.961 8.42e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.268 Deviance explained = 29.1%
GCV = 0.075969 Scale est. = 0.072074 n = 50
Family: gaussian
Link function: identity
Formula:
y ~ s(x, bs = "cs")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.97400 0.04102 72.5 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x) 1.279 9 1.229 0.001 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.191 Deviance explained = 21.2%
GCV = 0.088147 Scale est. = 0.08413 n = 50
Note: I do not recommend this approach.
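Patching a namespace at runtime affects every plot in the session and can break whenever ggplot2's internals change, which is why I'd treat it as a debugging trick rather than a workflow.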
2. Solving the problem by fitting models via tidyverse
I think it's better to just fit your models separately. Doing so is quite easy with the tidyverse and broom, so I'm not sure why you wouldn't want to do it.
library(tidyverse)
library(broom)
iris %>%
  nest(data = -Species) %>%   # with older tidyr (< 1.0): nest(-Species)
  mutate(fit = map(data, ~ mgcv::gam(Sepal.Width ~ s(Sepal.Length, bs = "cs"), data = .)),
         results = map(fit, glance),
         R.square = map_dbl(fit, ~ summary(.)$r.sq)) %>%
  unnest(results) %>%
  select(-data, -fit)
# Species R.square df logLik AIC BIC deviance df.residual
# 1 setosa 0.5363514 2.546009 -1.922197 10.93641 17.71646 3.161460 47.45399
# 2 versicolor 0.2680611 2.563623 -3.879391 14.88603 21.69976 3.418909 47.43638
# 3 virginica 0.1910916 2.278569 -7.895997 22.34913 28.61783 4.014793 47.72143
As you can see, the extracted R-squared values are exactly the same in both cases.