I am trying to write a .csv file that appends the important information from the summary of a glmer analysis (from the package lme4).
I have been able to isolate the coefficients, AIC, and random effects , but I have not been able to isolate the scaled residuals (Min, 1Q, Median, 3Q, Max).
I have tried using $residuals, but I get a very long output, not the information shown in the summary.
> library(lme4)
> setwd("C:/Users/Arthur Scully/Dropbox/! ! ! ! PHD/Chapter 2 Lynx Bobcat BC/ResourceSelection")
> #simple vectors
>
> x <- c("a","b","b","b","b","d","b","c","c","a")
>
> y <- c(1,1,0,1,0,1,1,1,1,0)
>
>
> # Simple data frame
>
> aes.samp <- data.frame(x,y)
> aes.samp
x y
1 a 1
2 b 1
3 b 0
4 b 1
5 b 0
6 d 1
7 b 1
8 c 1
9 c 1
10 a 0
>
> # Simple glmer
>
> aes.glmer <- glmer(y~(1|x),aes.samp,family ="binomial")
boundary (singular) fit: see ?isSingular
>
> summary(aes.glmer)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: y ~ (1 | x)
Data: aes.samp
AIC BIC logLik deviance df.resid
16.2 16.8 -6.1 12.2 8
I can isolate information above by using the call summary(aes.glmer)$AIC
Scaled residuals:
Min 1Q Median 3Q Max
-1.5275 -0.9820 0.6546 0.6546 0.6546
I do not know the call to isolate the above information
Random effects:
Groups Name Variance Std.Dev.
x (Intercept) 0 0
Number of obs: 10, groups: x, 4
I can isolate this information using the ranef function
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.8473 0.6901 1.228 0.22
And I can isolate the information above using summary(aes.glmer)$coefficient
convergence code: 0
boundary (singular) fit: see ?isSingular
>
> #Pull important
> ##write call to select important output
> aes.glmer.coef <- summary(aes.glmer)$coefficient
> aes.glmer.AIC <- summary(aes.glmer)$AIC
> aes.glmer.ran <-ranef(aes.glmer)
>
> ##
> data.frame(c(aes.glmer.coef, aes.glmer.AIC, aes.glmer.ran))
X0.847297859077025 X0.690065555425105 X1.22785125618255 X0.219502810378876 AIC BIC logLik deviance df.resid X.Intercept.
a 0.8472979 0.6900656 1.227851 0.2195028 16.21729 16.82246 -6.108643 12.21729 8 0
b 0.8472979 0.6900656 1.227851 0.2195028 16.21729 16.82246 -6.108643 12.21729 8 0
c 0.8472979 0.6900656 1.227851 0.2195028 16.21729 16.82246 -6.108643 12.21729 8 0
d 0.8472979 0.6900656 1.227851 0.2195028 16.21729 16.82246 -6.108643 12.21729 8 0
If anyone knows what call I can use to isolate the "scaled residuals" I would be very greatful.
I haven't got your data, so we'll use example data from the lme4 vignette.
library(lme4)
library(lattice)
library(broom)
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial)
This is for the residuals. tidy from the broom package puts it in to a tibble, which you can then export to a csv.
x <- tidy(quantile(residuals(gm1, "pearson", scaled = TRUE)))
x
# A tibble: 5 x 2
names x
<chr> <dbl>
1 0% -2.38
2 25% -0.789
3 50% -0.203
4 75% 0.514
5 100% 2.88
Also here are some of the other bits that you might find useful, using glance from broom.
y <- glance(gm1)
y
# A tibble: 1 x 6
sigma logLik AIC BIC deviance df.residual
<dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 1 -92.0 194. 204. 73.5 51
And
z <- tidy(gm1)
z
# A tibble: 5 x 6
term estimate std.error statistic p.value group
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 (Intercept) -1.40 0.231 -6.05 1.47e-9 fixed
2 period2 -0.992 0.303 -3.27 1.07e-3 fixed
3 period3 -1.13 0.323 -3.49 4.74e-4 fixed
4 period4 -1.58 0.422 -3.74 1.82e-4 fixed
5 sd_(Intercept).herd 0.642 NA NA NA herd
Related
I'm trying to use ggeffects::ggpredict to make some effects plots for my model. I find that the standard errors and confidence limits are missing for many of the results. I can reproduce the problem with some simulated data. It seems specifically for observations where the standard error puts the predicted probability close to 0 or 1.
I tried to get predictions on the link scale to diagnose if it's a problem with the translation from link to response, but I don't believe this is supported by the package.
Any ideas how to address this? Many thanks.
library(tidyverse)
library(lme4)
library(ggeffects)
# number of simulated observations
n <- 1000
# simulated data with a numerical predictor x, factor predictor f, response y
# the simulated effects of x and f are somewhat weak compared to the noise, so expect high standard errors
df <- tibble(
x = seq(-0.1, 0.1, length.out = n),
g = floor(runif(n) * 3),
f = letters[1 + g] %>% as.factor(),
y = pracma::sigmoid(x + (runif(n) - 0.5) + 0.1 * (g - mean(g))),
z = if_else(y > 0.5, "high", "low") %>% as.factor()
)
# glmer model
model <- glmer(z ~ x + (1 | f), data = df, family = binomial)
print(summary(model))
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#> Approximation) [glmerMod]
#> Family: binomial ( logit )
#> Formula: z ~ x + (1 | f)
#> Data: df
#>
#> AIC BIC logLik deviance df.resid
#> 1373.0 1387.8 -683.5 1367.0 997
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -1.3858 -0.9928 0.7317 0.9534 1.3600
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> f (Intercept) 0.0337 0.1836
#> Number of obs: 1000, groups: f, 3
#>
#> Fixed effects:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.02737 0.12380 0.221 0.825
#> x -4.48012 1.12066 -3.998 6.39e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Correlation of Fixed Effects:
#> (Intr)
#> x -0.001
# missing standard errors
ggpredict(model, c("x", "f")) %>% print()
#> Data were 'prettified'. Consider using `terms="x [all]"` to get smooth plots.
#> # Predicted probabilities of z
#>
#> # f = a
#>
#> x | Predicted | 95% CI
#> --------------------------------
#> -0.10 | 0.62 | [0.54, 0.69]
#> 0.00 | 0.51 |
#> 0.10 | 0.40 |
#>
#> # f = b
#>
#> x | Predicted | 95% CI
#> --------------------------------
#> -0.10 | 0.62 | [0.56, 0.67]
#> 0.00 | 0.51 |
#> 0.10 | 0.40 |
#>
#> # f = c
#>
#> x | Predicted | 95% CI
#> --------------------------------
#> -0.10 | 0.62 | [0.54, 0.69]
#> 0.00 | 0.51 |
#> 0.10 | 0.40 |
ggpredict(model, c("x", "f")) %>% as_tibble() %>% print(n = 20)
#> Data were 'prettified'. Consider using `terms="x [all]"` to get smooth plots.
#> # A tibble: 9 x 6
#> x predicted std.error conf.low conf.high group
#> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 -0.1 0.617 0.167 0.537 0.691 a
#> 2 -0.1 0.617 0.124 0.558 0.672 b
#> 3 -0.1 0.617 0.167 0.537 0.691 c
#> 4 0 0.507 NA NA NA a
#> 5 0 0.507 NA NA NA b
#> 6 0 0.507 NA NA NA c
#> 7 0.1 0.396 NA NA NA a
#> 8 0.1 0.396 NA NA NA b
#> 9 0.1 0.396 NA NA NA c
Created on 2022-04-12 by the reprex package (v2.0.1)
I think this may be due to the singular model fit.
I dug down into the guts of the code as far as here, where there appears to be a mismatch between the dimensions of the covariance matrix of the predictions (3x3) and the number of predicted values (15).
I further suspect that the problem may happen here:
rows_to_keep <- as.numeric(rownames(unique(model_matrix_data[
intersect(colnames(model_matrix_data), terms)])))
Perhaps the function is getting confused because the conditional modes/BLUPs for every group are the same (which will only be true, generically, when the random effects variance is zero) ... ?
This seems worth opening an issue on the ggeffects issues list ?
I don't have any 'treatment' except the passage of time (date), and 10 times points. I have a total of 43190 measurements, they are continuous binomial data (0.0 to 1.0) of the percentual response variable (canopycov). In glm logic, this is a quasibinomial case, but I find just only glmmPQL in MASS package for use, but the model is not OK and I have NA for p-values in all the dates. In my case, I try:
#Packages
library(MASS)
# Dataset
ds<-read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/pred_attack_F.csv")
str(ds)
# 'data.frame': 43190 obs. of 3 variables:
# $ date : chr "2021-12-06" "2021-12-06" "2021-12-06" "2021-12-06" ...
# $ canopycov: int 22 24 24 24 25 25 25 25 26 26 ...
# $ rep : chr "r1" "r1" "r1" "r1" ...
# Binomial Generalized Linear Mixed Models
m.1 <- glmmPQL(canopycov/100~date,random=~1|date,
family="quasibinomial",data=ds)
summary(m.1)
#Linear mixed-effects model fit by maximum likelihood
# Data: ds
# AIC BIC logLik
# NA NA NA
# Random effects:
# Formula: ~1 | date
# (Intercept) Residual
# StdDev: 1.251838e-06 0.1443305
# Variance function:
# Structure: fixed weights
# Formula: ~invwt
# Fixed effects: canopycov/100 ~ date
# Value Std.Error DF t-value p-value
# (Intercept) -0.5955403 0.004589042 43180 -129.77442 0
# date2021-06-14 -0.1249648 0.006555217 0 -19.06341 NaN
# date2021-07-09 0.7661870 0.006363749 0 120.39868 NaN
# date2021-07-24 1.0582366 0.006434893 0 164.45286 NaN
# date2021-08-03 1.0509474 0.006432295 0 163.38607 NaN
# date2021-08-08 1.0794612 0.006442704 0 167.54784 NaN
# date2021-09-02 0.9312346 0.006395722 0 145.60274 NaN
# date2021-09-07 0.9236196 0.006393780 0 144.45595 NaN
# date2021-09-22 0.7268144 0.006359224 0 114.29293 NaN
# date2021-12-06 1.3109809 0.006552314 0 200.07907 NaN
# Correlation:
# (Intr) d2021-06 d2021-07-0 d2021-07-2 d2021-08-03 d2021-08-08
# date2021-06-14 -0.700
# date2021-07-09 -0.721 0.505
# date2021-07-24 -0.713 0.499 0.514
# date2021-08-03 -0.713 0.499 0.514 0.509
# date2021-08-08 -0.712 0.499 0.514 0.508 0.508
# date2021-09-02 -0.718 0.502 0.517 0.512 0.512 0.511
# date2021-09-07 -0.718 0.502 0.518 0.512 0.512 0.511
# date2021-09-22 -0.722 0.505 0.520 0.515 0.515 0.514
# date2021-12-06 -0.700 0.490 0.505 0.499 0.500 0.499
# d2021-09-02 d2021-09-07 d2021-09-2
# date2021-06-14
# date2021-07-09
# date2021-07-24
# date2021-08-03
# date2021-08-08
# date2021-09-02
# date2021-09-07 0.515
# date2021-09-22 0.518 0.518
# date2021-12-06 0.503 0.503 0.505
# Standardized Within-Group Residuals:
# Min Q1 Med Q3 Max
# -6.66259139 -0.47887669 0.09634211 0.54135914 4.32231889
# Number of Observations: 43190
# Number of Groups: 10
I'd like to correctly specify that my data is temporally pseudo replicated in a mixed-effects, but I don't find another approach for this. Please, I need any help to solve it.
I don't understand the motivation for a quasi-binomial model here, there's some nice discussion of the binomial and quasi binomial densities here and here that might be worth reading (including applications).
The problem with the code is that you have date as a character, so R doesn't know its a date. You will have to decide the units of measurement for time as well as the reference point, but the model works fine once you fix this.
ds<-read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/pred_attack_F.csv")
str(ds)
head(ds)
class(ds$date)
ds$datex <- as.Date(ds$date)
summary(as.numeric(difftime(ds$datex, as.Date(Sys.Date(), "%d%b%Y"), units = "days")))
ds$date_time <- as.numeric(difftime(ds$datex, as.Date(Sys.Date(), "%d%b%Y"), units = "days"))
# scale for laplace
ds$sdate_time <- scale(ds$date_time)
m1 <- glmer(cbind(canopycov,100 - canopycov)~ date_time + (1|date_time),
family="binomial",data=ds)
summary(m1)
# Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
# Family: binomial ( logit )
# Formula: canopycov/100 ~ date_time + (1 | date_time)
# Data: ds
#
# AIC BIC logLik deviance df.resid
# 35109.7 35135.7 -17551.9 35103.7 43187
#
# Scaled residuals:
# Min 1Q Median 3Q Max
# -19.0451 -1.5605 -0.5155 -0.1594 0.8081
#
# Random effects:
# Groups Name Variance Std.Dev.
# date_time (Intercept) 0 0
# Number of obs: 43190, groups: date_time, 10
#
# Fixed effects:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 8.5062394 0.0866696 98.15 <2e-16 ***
# date_time 0.0399010 0.0004348 91.77 <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Correlation of Fixed Effects:
# (Intr)
# date_time 0.988
# optimizer (Nelder_Mead) convergence code: 0 (OK)
# boundary (singular) fit: see ?isSingular
m2 <- MASS::glmmPQL(canopycov/100~ date_time,random=~1|date_time,
family="quasibinomial",data=ds)
summary(m2)
# Linear mixed-effects model fit by maximum likelihood
# Data: ds
# AIC BIC logLik
# NA NA NA
#
# Random effects:
# Formula: ~1 | date_time
# (Intercept) Residual
# StdDev: 0.3082808 0.1443456
#
# Variance function:
# Structure: fixed weights
# Formula: ~invwt
# Fixed effects: canopycov/100 ~ date_time
# Value Std.Error DF t-value p-value
# (Intercept) 0.1767127 0.0974997 43180 1.812443 0.0699
# date_time 0.3232878 0.0975013 8 3.315728 0.0106
# Correlation:
# (Intr)
# date_time 0
#
# Standardized Within-Group Residuals:
# Min Q1 Med Q3 Max
# -6.66205315 -0.47852364 0.09635514 0.54154467 4.32129236
#
# Number of Observations: 43190
# Number of Groups: 10
Does exist any package which can help me to export results of multinomial logit to excel for example like a table?
The broom package does a reasonable job of tidying multinomial output.
library(broom)
library(nnet)
fit.gear <- multinom(gear ~ mpg + factor(am), data = mtcars)
summary(fit.gear)
Call:
multinom(formula = gear ~ mpg + factor(am), data = mtcars)
Coefficients:
(Intercept) mpg factor(am)1
4 -11.15154 0.5249369 11.90045
5 -18.39374 0.3662580 22.44211
Std. Errors:
(Intercept) mpg factor(am)1
4 5.317047 0.2680456 66.895845
5 67.931319 0.2924021 2.169944
Residual Deviance: 28.03075
AIC: 40.03075
tidy(fit.gear)
# A tibble: 6 x 6
y.level term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 4 (Intercept) 1.44e-5 5.32 -2.10 3.60e- 2
2 4 mpg 1.69e+0 0.268 1.96 5.02e- 2
3 4 factor(am)1 1.47e+5 66.9 0.178 8.59e- 1
4 5 (Intercept) 1.03e-8 67.9 -0.271 7.87e- 1
5 5 mpg 1.44e+0 0.292 1.25 2.10e- 1
6 5 factor(am)1 5.58e+9 2.17 10.3 4.54e-25
Then use the openxlsx package to send that to Excel.
library(openxlsx)
write.xlsx(file="E:/.../fitgear.xlsx", tidy(fit.gear))
(Note that the tidy function exponentiates the coefficients by default, although the help page incorrectly says the default is FALSE). So these are relative risk ratios, which is why they don't match the output of summary. And if you want confidence intervals, you have to ask for them.)
I am a beginner with R. I am using glm to conduct logistic regression and then using the 'margins' package to calculate marginal effects but I don't seem to be able to exclude the missing values in my categorical independent variable.
I have tried to ask R to exclude NAs from the regression. The categorical variable is weight status at age 9 (wgt9), and it has three levels (1, 2, 3) and some NAs.
What am I doing wrong? Why do I get a wgt9NA result in my outputs and how can I correct it?
Thanks in advance for any help/advice.
Conduct logistic regression
summary(logit.phbehav <- glm(obese13 ~ gender + as.factor(wgt9) + aded08b,
data = gui, weights = bdwg01, family = binomial(link = "logit")))
Regression output
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -3.99 0.293 -13.6 2.86e- 42
2 gender 0.387 0.121 3.19 1.42e- 3
3 as.factor(wgt9)2 2.49 0.177 14.1 3.28e- 45
4 as.factor(wgt9)3 4.65 0.182 25.6 4.81e-144
5 as.factor(wgt9)NA 2.60 0.234 11.1 9.94e- 29
6 aded08b -0.0755 0.0224 -3.37 7.47e- 4
Calculate the marginal effects
effects_logit_phtotal = margins(logit.phtot)
print(effects_logit_phtotal)
summary(effects_logit_phtotal)
Marginal effects output
> summary(effects_logit_phtotal)
factor AME SE z p lower upper
aded08a -0.0012 0.0002 -4.8785 0.0000 -0.0017 -0.0007
gender 0.0115 0.0048 2.3899 0.0169 0.0021 0.0210
wgt92 0.0941 0.0086 10.9618 0.0000 0.0773 0.1109
wgt93 0.4708 0.0255 18.4569 0.0000 0.4208 0.5207
wgt9NA 0.1027 0.0179 5.7531 0.0000 0.0677 0.1377
First of all welcome to stack overflow. Please check the answer here to see how to make a great R question. Not providing a sample of your data, some times makes it impossible to answer the question. However taking a guess, I think that you have not set your NA values correctly but as strings. This behavior can be seen in the dummy data below.
First let's create the dummy data:
v1 <- c(2,3,3,3,2,2,2,2,NA,NA,NA)
v2 <- c(2,3,3,3,2,2,2,2,"NA","NA","NA")
v3 <- c(11,5,6,7,10,8,7,6,2,5,3)
obese <- c(0,1,1,0,0,1,1,1,0,0,0)
df <- data.frame(obese,v1,v2)
Using the variable named v1, does not include NA as a category:
glm(formula = obese ~ as.factor(v1) + v3, family = binomial(link = "logit"),
data = df)
Deviance Residuals:
1 2 3 4 5 6 7 8
-2.110e-08 2.110e-08 1.168e-05 -1.105e-05 -2.110e-08 3.094e-06 2.110e-08 2.110e-08
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 401.48 898581.15 0 1
as.factor(v1)3 -96.51 326132.30 0 1
v3 -46.93 106842.02 0 1
While making the string "NA" to factor gives an output similar to the one in question:
glm(formula = obese ~ as.factor(v2) + v3, family = binomial(link = "logit"),
data = df)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.402e-05 -2.110e-08 -2.110e-08 2.110e-08 1.472e-05
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 394.21 744490.08 0.001 1
as.factor(v2)3 -95.33 340427.26 0.000 1
as.factor(v2)NA -327.07 613934.84 -0.001 1
v3 -45.99 84477.60 -0.001 1
Try the following to replace NAs that are strings:
gui$wgt9[ gui$wgt9 == "NA" ] <- NA
Don't forget to accept any answer that solved your problem.
I have a data frame of values from individuals linked to groups. I want to identify those groups who have mean values greater than the mean value plus one standard deviation for the whole data set. To do this, I'm calculating the mean value and standard deviation for the entire data frame and then running pairwise t-tests to compare to each group mean. I'm running into trouble outputting the results.
> head(df)
individual group value
1 11559638 75 0.371
2 11559641 75 0.367
3 11559648 75 0.410
4 11559650 75 0.417
5 11559652 75 0.440
6 11559654 75 0.395
> allvalues <- data.frame(mean=rep(mean(df$value), length(df$individual)), sd=rep(sd(df$value), length(df$individual)))
> valueplus <- with(df, by(df, df$individual, function(x) t.test(allvalues$mean + allvalues$sd, df$value, data=x)))
> tmpplus
--------------------------------------------------------------------------
df$individuals: 10
NULL
--------------------------------------------------------------------------
df$individuals: 20
NULL
--------------------------------------------------------------------------
df$individuals: 21
Welch Two Sample t-test
data: allvalues$mean + allvalues$sd and df$value
t = 84.5217, df = 4999, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.04676957 0.04899068
sample estimates:
mean of x mean of y
0.4719964 0.4241162
How do I get the results into a data frame? I'd expect the output to look something like this:
groups t df p-value mean.x mean.y
1 10 NULL NULL NULL NULL NULL
2 20 NULL NULL NULL NULL NULL
3 21 84.5217 4999 2.2e-16 0.4719964 0.4241162
From a purely programming perspective, you are asking how to get the output of t.test into a data.frame. Try the following, using mtcars:
library(broom)
tidy(t.test(mtcars$mpg))
estimate statistic p.value parameter conf.low conf.high
1 20.09062 18.85693 1.526151e-18 31 17.91768 22.26357
Or for multiple groups:
library(dplyr)
mtcars %>% group_by(vs) %>% do(tidy(t.test(.$mpg)))
# A tibble: 2 x 9
# Groups: vs [2]
vs estimate statistic p.value parameter conf.low conf.high method alternative
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 0 16.6 18.3 1.32e-12 17 14.7 18.5 One Sample t-test two.sided
2 1 24.6 17.1 2.75e-10 13 21.5 27.7 One Sample t-test two.sided
Needless to say, you'll need to adjust the code to fit your specific setting.