Post hoc test for a binary GLMM (lme4) and plotting the results - R
So I'm an R novice attempting a GLMM and post hoc analysis... help! I've collected binary data on 9 damselflies at 6 light levels: 1 = response to movement of an optomotor drum, 0 = no response. My data were imported into R with the headings 'Animal_ID', 'Light_Intensity' and 'Response'; each Animal_ID (1-9) is repeated for each light intensity (3.36 down to 0.61; see the UPDATE below for the raw data).
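For reference, a minimal sketch of the import step (the file name 'damselfly.csv' is hypothetical; any data frame with those three columns will do):
data <- read.csv("damselfly.csv")  # columns: Animal_ID, Light_Intensity, Response
str(data)  # Animal_ID should be a grouping factor; Response should be 0/1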
Using the following code (lme4 package), I've performed a GLMM and found a light level to have a significant effect on response:
library(lme4)
d <- data.frame(Animal_ID = data$Animal_ID, var = data$Light_Intensity, Response = data$Response)
model <- glmer(Response ~ var + (1 | Animal_ID), family = binomial, data = d)
summary(model)
Returns
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
Family: binomial ( logit )
Formula: Response ~ var + (1 | Animal_ID)
Data: d
AIC BIC logLik deviance df.resid
66 72 -30 60 51
Scaled residuals:
Min 1Q Median 3Q Max
-3.7704 -0.6050 0.3276 0.5195 1.2463
Random effects:
Groups Name Variance Std.Dev.
Animal_ID (Intercept) 1.645 1.283
Number of obs: 54, groups: Animal_ID, 9
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.7406 1.0507 -1.657 0.0976 .
var 1.1114 0.4339 2.561 0.0104 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
var -0.846
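Since the slope for var is on the log-odds scale, it may help to back-transform it: a one-unit increase in light intensity multiplies the odds of a response by exp(1.1114), roughly 3.04.
exp(fixef(model)["var"])  # odds ratio per unit of light intensity, ~3.04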
Then running:
m1 <- update(model, ~.-var)
anova(model, m1, test = 'Chisq')
Returns
Data: d
Models:
m1: Response ~ (1 | Animal_ID)
model: Response ~ var + (1 | Animal_ID)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
m1 2 72.555 76.533 -34.278 68.555
model 3 66.017 71.983 -30.008 60.017 8.5388 1 0.003477 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I've installed the multcomp and lsmeans packages in an attempt to perform a Tukey post hoc test and see where the difference lies, but I've run into difficulties with both.
Running:
summary(glht(m1, linfct = mcp("Animal_ID" = "Tukey")))
Returns:
"Error in mcp2matrix(model, linfct = linfct) :
Variable(s) ‘Animal_ID’ have been specified in ‘linfct’ but cannot be found in ‘model’! "
Running:
lsmeans(model, pairwise ~ Animal_ID, adjust = "tukey")
Returns:
"Error in lsmeans.character.ref.grid(object = new("ref.grid", model.info = list( :
No variable named Animal_ID in the reference grid"
I'm aware that I'm probably being very stupid here, but any help would be very much appreciated. My confusion is snowballing.
Also, does anyone have any suggestions as to how I might best visualize my results (and how to do this)?
Thank you very much in advance!
UPDATE:
New code:
# Same data as before, built with rep() instead of typing out every value;
# Light and Subject are made factors explicitly, Value is numeric 0/1
Light <- factor(rep(c("3.36", "2.98", "2.73", "2.15", "1.72", "0.61"), each = 9))
Subject <- factor(rep(1:9, times = 6))
Value <- c(1, 0, 1, 0, 1, 1, 1, 0, 1,   # 3.36
           1, 0, 1, 1, 1, 1, 1, 1, 1,   # 2.98
           0, 1, 1, 1, 1, 1, 1, 0, 1,   # 2.73
           0, 0, 1, 1, 1, 1, 1, 1, 1,   # 2.15
           0, 0, 0, 1, 0, 0, 1, 0, 1,   # 1.72
           0, 0, 0, 1, 1, 0, 1, 0, 0)   # 0.61
data <- data.frame(Light, Subject, Value)
library(lme4)
model <- glmer(Value ~ Light + (1 | Subject), family = binomial, data = data)
summary(model)
Returns:
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [
glmerMod]
Family: binomial ( logit )
Formula: Value ~ Light + (1 | Subject)
Data: data
AIC BIC logLik deviance df.resid
67.5 81.4 -26.7 53.5 47
Scaled residuals:
Min 1Q Median 3Q Max
-2.6564 -0.4884 0.2193 0.3836 1.2418
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 2.687 1.639
Number of obs: 54, groups: Subject, 9
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.070e+00 1.053e+00 -1.016 0.3096
Light1.72 -7.934e-06 1.227e+00 0.000 1.0000
Light2.15 2.931e+00 1.438e+00 2.038 0.0416 *
Light2.73 2.931e+00 1.438e+00 2.038 0.0416 *
Light2.98 4.049e+00 1.699e+00 2.383 0.0172 *
Light3.36 2.111e+00 1.308e+00 1.613 0.1067
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Lg1.72 Lg2.15 Lg2.73 Lg2.98
Light1.72 -0.582
Light2.15 -0.595 0.426
Light2.73 -0.595 0.426 0.555
Light2.98 -0.534 0.361 0.523 0.523
Light3.36 -0.623 0.469 0.553 0.553 0.508
Then running:
m1 <- update(model, ~.-Light)
anova(model, m1, test= 'Chisq')
Returns:
Data: data
Models:
m1: Value ~ (1 | Subject)
model: Value ~ Light + (1 | Subject)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
m1 2 72.555 76.533 -34.278 68.555
model 7 67.470 81.393 -26.735 53.470 15.086 5 0.01 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Finally, running:
library(lsmeans)
lsmeans(model, list(pairwise ~ Light), adjust = "tukey")
Returns (it actually works now!):
$`lsmeans of Light`
Light lsmean SE df asymp.LCL asymp.UCL
0.61 -1.070208 1.053277 NA -3.1345922 0.9941771
1.72 -1.070216 1.053277 NA -3.1345997 0.9941687
2.15 1.860339 1.172361 NA -0.4374459 4.1581244
2.73 1.860332 1.172360 NA -0.4374511 4.1581149
2.98 2.978658 1.443987 NA 0.1484964 5.8088196
3.36 1.040537 1.050317 NA -1.0180467 3.0991215
Results are given on the logit (not the response) scale.
Confidence level used: 0.95
$`pairwise differences of contrast`
contrast estimate SE df z.ratio p.value
0.61 - 1.72 7.933829e-06 1.226607 NA 0.000 1.0000
0.61 - 2.15 -2.930547e+00 1.438239 NA -2.038 0.3209
0.61 - 2.73 -2.930539e+00 1.438237 NA -2.038 0.3209
0.61 - 2.98 -4.048866e+00 1.699175 NA -2.383 0.1622
0.61 - 3.36 -2.110745e+00 1.308395 NA -1.613 0.5897
1.72 - 2.15 -2.930555e+00 1.438239 NA -2.038 0.3209
1.72 - 2.73 -2.930547e+00 1.438238 NA -2.038 0.3209
1.72 - 2.98 -4.048874e+00 1.699175 NA -2.383 0.1622
1.72 - 3.36 -2.110753e+00 1.308395 NA -1.613 0.5897
2.15 - 2.73 7.347728e-06 1.357365 NA 0.000 1.0000
2.15 - 2.98 -1.118319e+00 1.548539 NA -0.722 0.9793
2.15 - 3.36 8.198019e-01 1.302947 NA 0.629 0.9889
2.73 - 2.98 -1.118326e+00 1.548538 NA -0.722 0.9793
2.73 - 3.36 8.197945e-01 1.302947 NA 0.629 0.9889
2.98 - 3.36 1.938121e+00 1.529202 NA 1.267 0.8029
Results are given on the log odds ratio (not the response) scale.
P value adjustment: tukey method for comparing a family of 6 estimates
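A side note on the last lines of that output: the means and contrasts are reported on the logit / log-odds scale. To report probabilities instead, you can back-transform the summary; a sketch (summary(..., type = "response") is the documented route in lsmeans/emmeans):
lsm <- lsmeans(model, list(pairwise ~ Light), adjust = "tukey")
summary(lsm, type = "response")  # back-transforms the means from logit to probability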
The errors above have a common cause: your model specifies Animal_ID as a random effect, and both glht() and lsmeans() construct pairwise comparisons among the levels of fixed effects only, so neither can find Animal_ID in the model or in the reference grid. (Running glht() on m1 rather than model makes no difference; Animal_ID is random in both.) There was also nothing to compare pairwise in the original model: light was fitted as a continuous covariate (var), which has a single slope rather than levels. Your update fixes exactly that by recoding Light as a factor, which is why lsmeans() now works. Two sketches follow: the equivalent multcomp call, and one way to plot the results.
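First, the multcomp equivalent of the lsmeans call, as a sketch against the updated model (it assumes model is the glmer fit with Light as a factor):
library(multcomp)
# Tukey all-pairwise comparisons among the levels of the fixed-effect factor Light
summary(glht(model, linfct = mcp(Light = "Tukey")))
For visualisation, a common choice with a binary response is to plot the predicted probability of response at each light level with its confidence interval. A minimal ggplot2 sketch, assuming the model and data from the update; the column names prob, asymp.LCL and asymp.UCL are what lsmeans typically returns for a binomial model on the response scale (check names(lsm_df) if yours differ):
library(ggplot2)
# Estimated marginal means back-transformed to the probability scale
lsm_df <- summary(lsmeans(model, ~ Light), type = "response")
ggplot(lsm_df, aes(x = Light, y = prob)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = asymp.LCL, ymax = asymp.UCL), width = 0.2) +
  labs(x = "Light intensity", y = "Predicted probability of response") +
  theme_bw()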