Testing the difference between marginal effects calculated across factors - r

I'm trying to test the difference between two marginal effects. I can get R to calculate the effects, but I can't find any resource explaining how to test their difference.
I've looked through the margins documentation and other marginal-effects packages but have not been able to find anything that tests the difference.
data("mtcars")
mod<-lm(mpg~as.factor(am)*disp,data=mtcars)
(marg<-margins(model = mod,at = list(am = c("0","1"))))
 at(am)     disp    am1
      0 -0.02758 0.4518
      1 -0.05904 0.4518
summary(marg)
 factor     am     AME     SE       z      p   lower   upper
    am1 1.0000  0.4518 1.3915  0.3247 0.7454 -2.2755  3.1791
    am1 2.0000  0.4518 1.3915  0.3247 0.7454 -2.2755  3.1791
   disp 1.0000 -0.0276 0.0062 -4.4354 0.0000 -0.0398 -0.0154
   disp 2.0000 -0.0590 0.0096 -6.1353 0.0000 -0.0779 -0.0402
I want to produce a test that decides whether the marginal effects in each row of marg are significantly different; i.e., whether the slopes in the marginal-effects plots differ. This appears to be the case, because the confidence intervals do not overlap, indicating that the effect of displacement is different for am = 0 vs am = 1.
We discuss in the comments below that we can test contrasts using emmeans, but that is a test of the average response across am=0 and am=1.
library(emmeans)
emm <- emmeans(mod, ~ as.factor(am) * disp)
emm
 am disp emmean    SE df lower.CL upper.CL
  0  231   18.8 0.763 28     17.2     20.4
  1  231   19.2 1.164 28     16.9     21.6
cont<-contrast(emm,list(`(0-1)`=c(1,-1)))
cont
 contrast estimate   SE df t.ratio p.value
 (0-1)      -0.452 1.39 28  -0.325  0.7479
Here the p-value is large, indicating that the average response when am = 0 is not significantly different from the average response when am = 1.
Is it reasonable to do this (like testing the difference of two means)?
smarg <- summary(marg)
# Wald z statistic for the difference of the two disp slopes
(z <- as.numeric((smarg$AME[3] - smarg$AME[4]) / sqrt(smarg$SE[3]^2 + smarg$SE[4]^2)))
[1] 2.745
2 * pnorm(-abs(z))  # two-sided p-value
[1] 0.006044
This p-value appears to agree with the analysis of the non-overlapping confidence intervals.

If I understand your question, it can be answered using emtrends:
library(emmeans)
emt = emtrends(mod, "am", var = "disp")
emt # display the estimated slopes
## am disp.trend SE df lower.CL upper.CL
## 0 -0.0276 0.00622 28 -0.0403 -0.0148
## 1 -0.0590 0.00962 28 -0.0787 -0.0393
##
## Confidence level used: 0.95
pairs(emt) # test the difference of slopes
## contrast estimate SE df t.ratio p.value
## 0 - 1 0.0315 0.0115 28 2.745 0.0104
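If you also want an interval estimate for the difference of the two slopes (not shown in the output above), one option is to apply confint() to the same contrast:
confint(pairs(emt))  # 95% CI for the disp-slope difference between am = 0 and am = 1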

For the question "Are the slopes statistically different, indicating that the effect of displacement is different for am = 0 vs am = 1?", you can get the p-value for the comparison directly from the summary of the lm() fit.
> summary(mod)
Call:
lm(formula = mpg ~ as.factor(am) * disp, data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max 
-4.6056 -2.1022 -0.8681  2.2894  5.2315 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)         25.157064   1.925053  13.068 1.94e-13 ***
as.factor(am)1       7.709073   2.502677   3.080  0.00460 ** 
disp                -0.027584   0.006219  -4.435  0.00013 ***
as.factor(am)1:disp -0.031455   0.011457  -2.745  0.01044 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.907 on 28 degrees of freedom
Multiple R-squared:  0.7899,  Adjusted R-squared:  0.7674
F-statistic: 35.09 on 3 and 28 DF,  p-value: 1.27e-09
Notice that the p-value for the as.factor(am)1:disp term is 0.01044, which matches the output from pairs(emt) in Russ Lenth's answer.
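If you prefer to pull that row out of the fit programmatically rather than reading the printed table, one small sketch is:
coef(summary(mod))["as.factor(am)1:disp", ]  # estimate, SE, t value and p-value for the slope difference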
(Posting as an answer because I don't yet have enough reputation to comment.)

I'm not sure, but perhaps you're looking for contrasts or pairwise comparisons of marginal effects? You can do this using the emmeans package:
library(margins)
library(emmeans)
library(magrittr)
data("mtcars")
mod <- lm(mpg ~ as.factor(am) * disp, data = mtcars)
marg <- margins(model = mod, at = list(am = c("0", "1")))
marg
#> Average marginal effects at specified values
#> lm(formula = mpg ~ as.factor(am) * disp, data = mtcars)
#> at(am) disp am1
#> 0 -0.02758 0.4518
#> 1 -0.05904 0.4518
emmeans(mod, c("am", "disp")) %>%
contrast(method = "pairwise")
#> contrast estimate SE df t.ratio p.value
#> 0,230.721875 - 1,230.721875 -0.452 1.39 28 -0.325 0.7479
emmeans(mod, c("am", "disp")) %>%
contrast()
#> contrast estimate SE df t.ratio p.value
#> 0,230.721875 effect -0.226 0.696 28 -0.325 0.7479
#> 1,230.721875 effect 0.226 0.696 28 0.325 0.7479
#>
#> P value adjustment: fdr method for 2 tests
Or simply use summary():
library(margins)
data("mtcars")
mod <- lm(mpg ~ as.factor(am) * disp, data = mtcars)
marg <- margins(model = mod, at = list(am = c("0", "1")))
marg
#> Average marginal effects at specified values
#> lm(formula = mpg ~ as.factor(am) * disp, data = mtcars)
#> at(am) disp am1
#> 0 -0.02758 0.4518
#> 1 -0.05904 0.4518
summary(marg)
#>  factor     am     AME     SE       z      p   lower   upper
#>     am1 1.0000  0.4518 1.3915  0.3247 0.7454 -2.2755  3.1791
#>     am1 2.0000  0.4518 1.3915  0.3247 0.7454 -2.2755  3.1791
#>    disp 1.0000 -0.0276 0.0062 -4.4354 0.0000 -0.0398 -0.0154
#>    disp 2.0000 -0.0590 0.0096 -6.1353 0.0000 -0.0779 -0.0402
Created on 2019-06-07 by the reprex package (v0.3.0)

Related

How do I extract variables that have a low p-value in R

I have a logistic model with plenty of interactions in R.
I want to extract only the terms that are significant, whether they are plain predictor variables or interactions.
It's fine if I can just see every interaction that's significant, since I can still look up which non-significant fields were used to build it.
Thank you.
This is the most I have:
broom::tidy(logmod)[, c("term", "estimate", "p.value")]
Here is a way. After fitting the logistic model, use a logical condition to get the significant predictors and a regex (grepl) to get the interactions. These two index vectors can be combined with &; in the case below this returns no significant interactions at the alpha = 0.05 level.
fit <- glm(am ~ hp + qsec*vs, mtcars, family = binomial)
summary(fit)
#>
#> Call:
#> glm(formula = am ~ hp + qsec * vs, family = binomial, data = mtcars)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -1.93876 -0.09923 -0.00014 0.05351 1.33693
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 199.02697 102.43134 1.943 0.0520 .
#> hp -0.12104 0.06138 -1.972 0.0486 *
#> qsec -10.87980 5.62557 -1.934 0.0531 .
#> vs -108.34667 63.59912 -1.704 0.0885 .
#> qsec:vs 6.72944 3.85348 1.746 0.0808 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 43.230 on 31 degrees of freedom
#> Residual deviance: 12.574 on 27 degrees of freedom
#> AIC: 22.574
#>
#> Number of Fisher Scoring iterations: 8
alpha <- 0.05
pval <- summary(fit)$coefficients[,4]
sig <- pval <= alpha
intr <- grepl(":", names(coef(fit)))
coef(fit)[sig]
#> hp
#> -0.1210429
coef(fit)[sig & intr]
#> named numeric(0)
Created on 2022-09-15 with reprex v2.0.2
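If you prefer a tidy-data route, building on the broom::tidy call already mentioned in the question, roughly the same filtering could be written as a sketch like this:
tidy_fit <- broom::tidy(fit)                                        # one row per coefficient
tidy_fit[tidy_fit$p.value <= alpha & grepl(":", tidy_fit$term), ]   # significant interaction terms (empty here)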

Is it possible to add multiple variables in the same regression model?

These are the regression models that I want to obtain. I want to select many variables at the same time to develop a multivariate model, since my data frame has 357 variables.
summary(lm(formula = bci_bci ~ bti_acp, data = qog))
summary(lm(formula = bci_bci ~ wdi_pop, data = qog))
summary(lm(formula = bci_bci ~ ffp_sl, data = qog))
Instead of listing all your variables with + signs, you can use the shorthand notation . to add all variables in the data as explanatory variables (except, of course, the target variable on the left-hand side).
data("mtcars")
mod <- lm(mpg ~ ., data = mtcars)
summary(mod)
#>
#> Call:
#> lm(formula = mpg ~ ., data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.4506 -1.6044 -0.1196 1.2193 4.6271
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 12.30337 18.71788 0.657 0.5181
#> cyl -0.11144 1.04502 -0.107 0.9161
#> disp 0.01334 0.01786 0.747 0.4635
#> hp -0.02148 0.02177 -0.987 0.3350
#> drat 0.78711 1.63537 0.481 0.6353
#> wt -3.71530 1.89441 -1.961 0.0633 .
#> qsec 0.82104 0.73084 1.123 0.2739
#> vs 0.31776 2.10451 0.151 0.8814
#> am 2.52023 2.05665 1.225 0.2340
#> gear 0.65541 1.49326 0.439 0.6652
#> carb -0.19942 0.82875 -0.241 0.8122
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.65 on 21 degrees of freedom
#> Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
#> F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
par(mfrow=c(2,2))
plot(mod)
par(mfrow=c(1,1))
Created on 2021-12-21 by the reprex package (v2.0.1)
If you want to include all two-way interactions, the notation would be this:
lm(mpg ~ (.)^2, data = mtcars)
If you want to include all three-way interactions, the notation would be this:
lm(mpg ~ (.)^3, data = mtcars)
If you create very large models (with many variables or interactions), make sure you also perform some model-size reduction afterwards, e.g. using the function step(), as sketched below. It's very likely that not all of your predictors are actually informative, and many could be correlated, which causes problems in multivariate models. One way around this is to remove predictors that are highly correlated with other predictors.
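As a rough sketch of such a reduction (using the full mtcars model fitted above), AIC-based stepwise selection with step() could look like this:
mod_full <- lm(mpg ~ ., data = mtcars)
mod_reduced <- step(mod_full, direction = "both", trace = FALSE)  # AIC-based forward/backward selection
summary(mod_reduced)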

How to add a continuous predictor in an aggregated (logistic) regression using glm in R

When performing an aggregated regression using the weights argument in glm, I can add categorical predictors to match results with a regression on individual data (ignoring differences in df), but when I add a continuous predictor the results no longer match.
e.g.,
summary(glm(am ~ as.factor(cyl) + carb,
data = mtcars,
family = binomial(link = "logit")))
##
## Call:
## glm(formula = am ~ as.factor(cyl) + carb, family = binomial(link = "logit"),
## data = mtcars)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8699 -0.5506 -0.1869 0.6185 1.9806
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.6718 1.0924 -0.615 0.53854
## as.factor(cyl)6 -3.7609 1.9072 -1.972 0.04862 *
## as.factor(cyl)8 -5.5958 1.9381 -2.887 0.00389 **
## carb 1.1144 0.5918 1.883 0.05967 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 43.230 on 31 degrees of freedom
## Residual deviance: 26.287 on 28 degrees of freedom
## AIC: 34.287
##
## Number of Fisher Scoring iterations: 5
The results above match the following:
library(dplyr)
mtcars_percent <- mtcars %>%
  group_by(cyl, carb) %>%
  summarise(
    n = n(),
    am = sum(am) / n
  )
summary(glm(am ~ as.factor(cyl) + carb,
data = mtcars_percent,
family = binomial(link = "logit"),
weights = n
))
##
## Call:
## glm(formula = am ~ as.factor(cyl) + carb, family = binomial(link = "logit"),
## data = mtcars_percent, weights = n)
##
## Deviance Residuals:
## 1 2 3 4 5 6 7 8
## 0.9179 -0.9407 -0.3772 -0.0251 0.4468 -0.3738 -0.5602 0.1789
## 9
## 0.3699
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.6718 1.0925 -0.615 0.53858
## as.factor(cyl)6 -3.7609 1.9074 -1.972 0.04865 *
## as.factor(cyl)8 -5.5958 1.9383 -2.887 0.00389 **
## carb 1.1144 0.5919 1.883 0.05971 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 19.6356 on 8 degrees of freedom
## Residual deviance: 2.6925 on 5 degrees of freedom
## AIC: 18.485
##
## Number of Fisher Scoring iterations: 5
The coefficients and standard errors above match.
However adding a continuous predictor (e.g., mpg) to this experiment produces differences. Individual data:
summary(glm(formula = am ~ as.factor(cyl) + carb + mpg,
family = binomial,
data = mtcars))
##
## Call:
## glm(formula = am ~ as.factor(cyl) + carb + mpg, family = binomial,
## data = mtcars)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8933 -0.4595 -0.1293 0.1475 1.6969
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -18.3024 9.3442 -1.959 0.0501 .
## as.factor(cyl)6 -1.8594 2.5963 -0.716 0.4739
## as.factor(cyl)8 -0.3029 2.8828 -0.105 0.9163
## carb 1.6959 0.9918 1.710 0.0873 .
## mpg 0.6771 0.3645 1.858 0.0632 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 43.230 on 31 degrees of freedom
## Residual deviance: 18.467 on 27 degrees of freedom
## AIC: 28.467
##
## Number of Fisher Scoring iterations: 6
And now aggregating:
mtcars_percent <- mtcars %>%
  group_by(cyl, carb) %>%
  summarise(
    n = n(),
    am = sum(am) / n,
    mpg = mean(mpg)
  )
# A tibble: 9 x 5
# Groups: cyl [3]
cyl carb n am mpg
<dbl> <dbl> <int> <dbl> <dbl>
1 4 1 5 0.8 27.6
2 4 2 6 0.667 25.9
3 6 1 2 0 19.8
4 6 4 4 0.5 19.8
5 6 6 1 1 19.7
6 8 2 4 0 17.2
7 8 3 3 0 16.3
8 8 4 6 0.167 13.2
9 8 8 1 1 15
glm(formula = am ~ as.factor(cyl) + carb + mpg,
family = binomial,
data = mtcars_percent,
weights = n
) %>%
summary()
##
## Call:
## glm(formula = am ~ as.factor(cyl) + carb + mpg, family = binomial,
## data = mtcars_percent, weights = n)
##
## Deviance Residuals:
## 1 2 3 4 5 6 7 8
## 0.75845 -0.73755 -0.24505 -0.02649 0.34041 -0.50528 -0.74002 0.46178
## 9
## 0.17387
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -11.3593 19.9611 -0.569 0.569
## as.factor(cyl)6 -1.7932 3.7491 -0.478 0.632
## as.factor(cyl)8 -1.4419 7.3124 -0.197 0.844
## carb 1.4059 1.0718 1.312 0.190
## mpg 0.3825 0.7014 0.545 0.585
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 19.6356 on 8 degrees of freedom
## Residual deviance: 2.3423 on 4 degrees of freedom
## AIC: 20.134
##
## Number of Fisher Scoring iterations: 6
The coefficients, standard errors and p-values are now different. I would like to understand why, and what can be done to match the individual-data model.
In the help section of glm(), it states "weights can be used to indicate that different observations have different dispersions (with the values in weights being inversely proportional to the dispersions); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations."
I take that to mean I can calculate the mean(mpg) for each grouping factor as I've done and the regression should work. Obviously I am misunderstanding something...
Thanks for your help

Different independent variables and table from summary-values

I have a problem that I have been trying to solve for a couple of hours now, but I simply can't figure it out (I'm new to R, btw).
Basically, what I'm trying to do (using mtcars to illustrate) is make R test different independent variables (while adjusting for "cyl" and "disp") against the same dependent variable ("mpg"). The best solution I have been able to come up with is:
lm <- lapply(mtcars[, 4:6], function(x) lm(mpg ~ cyl + disp + x, data = mtcars))
summary <- lapply(lm, summary)
... where 4:6 corresponds to the columns "hp", "drat" and "wt".
This actually works OK, but the problem is that the summary appears with an "x" instead of, for instance, "hp":
$hp
Call:
lm(formula = mpg ~ cyl + disp + x, data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max 
-4.0889 -2.0845 -0.7745  1.3972  6.9183 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  34.18492    2.59078  13.195 1.54e-13 ***
cyl          -1.22742    0.79728  -1.540   0.1349    
disp         -0.01884    0.01040  -1.811   0.0809 .  
x            -0.01468    0.01465  -1.002   0.3250    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.055 on 28 degrees of freedom
Multiple R-squared: 0.7679, Adjusted R-squared: 0.743
F-statistic: 30.88 on 3 and 28 DF, p-value: 5.054e-09
Questions:
Is there a way to fix this? And have I done this in the smartest way using lapply, or would it be better to use, for instance, for loops or other options?
Ideally, I would also very much like to make a table showing, for instance, only the estimate and p-value for each predictor. Can this somehow be done?
Best regards
One approach to get the name of the variable displayed in the summary is by looping over the names of the variables and setting up the formula using paste and as.formula:
lm <- lapply(names(mtcars)[4:6], function(x) {
formula <- as.formula(paste0("mpg ~ cyl + disp + ", x))
lm(formula, data = mtcars)
})
summary <- lapply(lm, summary)
summary
#> [[1]]
#>
#> Call:
#> lm(formula = formula, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -4.0889 -2.0845 -0.7745 1.3972 6.9183
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 34.18492 2.59078 13.195 1.54e-13 ***
#> cyl -1.22742 0.79728 -1.540 0.1349
#> disp -0.01884 0.01040 -1.811 0.0809 .
#> hp -0.01468 0.01465 -1.002 0.3250
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.055 on 28 degrees of freedom
#> Multiple R-squared: 0.7679, Adjusted R-squared: 0.743
#> F-statistic: 30.88 on 3 and 28 DF, p-value: 5.054e-09
Concerning the second part of your question: one way to achieve this is by making use of broom::tidy from the broom package, which gives you a summary of the regression results as a tidy data frame:
lapply(lm, broom::tidy)
#> [[1]]
#> # A tibble: 4 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 34.2 2.59 13.2 1.54e-13
#> 2 cyl -1.23 0.797 -1.54 1.35e- 1
#> 3 disp -0.0188 0.0104 -1.81 8.09e- 2
#> 4 hp -0.0147 0.0147 -1.00 3.25e- 1
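To get the table asked for in the second part of the question (only the estimate and p-value for each added predictor), one possible sketch is to name the list of models and bind the tidied results:
names(lm) <- names(mtcars)[4:6]              # label each model by its added predictor
tab <- do.call(rbind, lapply(names(lm), function(nm) {
  d <- broom::tidy(lm[[nm]])[, c("term", "estimate", "p.value")]
  d$model <- nm
  d
}))
tab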
We could use reformulate to create the formula for the lm:
lst1 <- lapply(names(mtcars)[4:6], function(x) {
fmla <- reformulate(c("cyl", "disp", x),
response = "mpg")
model <- lm(fmla, data = mtcars)
model$call <- deparse(fmla)
model
})
Then, get the summary
summary1 <- lapply(lst1, summary)
summary1[[1]]
#Call:
#"mpg ~ cyl + disp + hp"
#Residuals:
# Min 1Q Median 3Q Max
#-4.0889 -2.0845 -0.7745 1.3972 6.9183
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 34.18492 2.59078 13.195 1.54e-13 ***
#cyl -1.22742 0.79728 -1.540 0.1349
#disp -0.01884 0.01040 -1.811 0.0809 .
#hp -0.01468 0.01465 -1.002 0.3250
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Residual standard error: 3.055 on 28 degrees of freedom
#Multiple R-squared: 0.7679, Adjusted R-squared: 0.743
#F-statistic: 30.88 on 3 and 28 DF, p-value: 5.054e-09

R logistic regression and marginal effects - how to exclude NA values in categorical independent variable

I am a beginner with R. I am using glm for logistic regression and then the 'margins' package to calculate marginal effects, but I don't seem to be able to exclude the missing values in my categorical independent variable.
I have tried to ask R to exclude NAs from the regression. The categorical variable is weight status at age 9 (wgt9); it has three levels (1, 2, 3) and some NAs.
What am I doing wrong? Why do I get a wgt9NA result in my outputs, and how can I correct it?
Thanks in advance for any help/advice.
Conduct logistic regression
summary(logit.phbehav <- glm(obese13 ~ gender + as.factor(wgt9) + aded08b,
data = gui, weights = bdwg01, family = binomial(link = "logit")))
Regression output
  term              estimate std.error statistic   p.value
  <chr>                <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)        -3.99      0.293     -13.6  2.86e- 42
2 gender              0.387     0.121       3.19 1.42e-  3
3 as.factor(wgt9)2    2.49      0.177      14.1  3.28e- 45
4 as.factor(wgt9)3    4.65      0.182      25.6  4.81e-144
5 as.factor(wgt9)NA   2.60      0.234      11.1  9.94e- 29
6 aded08b            -0.0755    0.0224     -3.37 7.47e-  4
Calculate the marginal effects
effects_logit_phtotal = margins(logit.phtot)
print(effects_logit_phtotal)
summary(effects_logit_phtotal)
Marginal effects output
> summary(effects_logit_phtotal)
  factor     AME     SE       z      p   lower   upper
 aded08a -0.0012 0.0002 -4.8785 0.0000 -0.0017 -0.0007
  gender  0.0115 0.0048  2.3899 0.0169  0.0021  0.0210
   wgt92  0.0941 0.0086 10.9618 0.0000  0.0773  0.1109
   wgt93  0.4708 0.0255 18.4569 0.0000  0.4208  0.5207
  wgt9NA  0.1027 0.0179  5.7531 0.0000  0.0677  0.1377
First of all, welcome to Stack Overflow. Please check the answer here on how to make a great R question. Not providing a sample of your data sometimes makes it impossible to answer the question. Taking a guess, however, I think that your NA values are not coded as real NAs but as the string "NA". This behavior can be seen in the dummy data below.
First let's create the dummy data:
v1 <- c(2, 3, 3, 3, 2, 2, 2, 2, NA, NA, NA)
v2 <- c(2, 3, 3, 3, 2, 2, 2, 2, "NA", "NA", "NA")
v3 <- c(11, 5, 6, 7, 10, 8, 7, 6, 2, 5, 3)
obese <- c(0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0)
df <- data.frame(obese, v1, v2, v3)
Using the variable v1, NA is not included as a category:
glm(formula = obese ~ as.factor(v1) + v3, family = binomial(link = "logit"),
data = df)
Deviance Residuals:
1 2 3 4 5 6 7 8
-2.110e-08 2.110e-08 1.168e-05 -1.105e-05 -2.110e-08 3.094e-06 2.110e-08 2.110e-08
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 401.48 898581.15 0 1
as.factor(v1)3 -96.51 326132.30 0 1
v3 -46.93 106842.02 0 1
Whereas using v2, where the missing values are the string "NA", gives an extra factor level similar to the one in your output:
glm(formula = obese ~ as.factor(v2) + v3, family = binomial(link = "logit"),
data = df)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.402e-05 -2.110e-08 -2.110e-08 2.110e-08 1.472e-05
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 394.21 744490.08 0.001 1
as.factor(v2)3 -95.33 340427.26 0.000 1
as.factor(v2)NA -327.07 613934.84 -0.001 1
v3 -45.99 84477.60 -0.001 1
Try the following to replace NAs that are strings:
gui$wgt9[ gui$wgt9 == "NA" ] <- NA
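Applied to the dummy data above, the same recoding would look like this (a quick sketch):
df$v2[df$v2 == "NA"] <- NA                                     # turn the string "NA" into a real NA
glm(obese ~ as.factor(v2) + v3, family = binomial, data = df)  # the NA rows are now dropped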
Don't forget to accept any answer that solved your problem.
