I am new to glmmTMB models, so I have run into a problem.
I built a model, and based on AICtab and the DHARMa diagnostics this one was the best:
library(glmmTMB)
Insecticide_2 <- glmmTMB(Insect_abundace ~ field_element + land_distance + sampling_time + year + treatment_day + (1 | field_id),
                         data = Insect_002,
                         family = nbinom2)
After fitting the glmmTMB model I ran Anova() (from car) and then emmeans, but the emmeans contrasts report the same p-values in both years, and they show p-values rather than lower.CL and upper.CL. What may be the problem? Is the model overfitted? Is the way I am doing the emmeans wrong?
Anova also showed that land_distance, sampling_time, and treatment_day were significant; year was almost significant (p = 0.07).
comp_emmeans1 <- emmeans(Insecticide_2, pairwise ~ land_distance | year, type = "response")
> comp_emmeans1
$emmeans
year = 2018:
land_distance response SE df lower.CL upper.CL
30m 2.46 0.492 474 1.658 3.64
50m 1.84 0.369 474 1.241 2.73
80m 1.36 0.283 474 0.906 2.05
110m 1.25 0.259 474 0.836 1.88
year = 2019:
land_distance response SE df lower.CL upper.CL
30m 3.42 0.593 474 2.434 4.81
50m 2.56 0.461 474 1.799 3.65
80m 1.90 0.335 474 1.343 2.68
110m 1.75 0.317 474 1.222 2.49
Results are averaged over the levels of: field_element, sampling_time, treatment_day
Confidence level used: 0.95
Intervals are back-transformed from the log scale
$contrasts
year = 2018:
contrast ratio SE df null t.ratio p.value
30m / 50m 1.34 0.203 474 1 1.906 0.2268
30m / 80m 1.80 0.279 474 1 3.798 0.0009
30m / 110m 1.96 0.311 474 1 4.239 0.0002
50m / 80m 1.35 0.213 474 1 1.896 0.2311
50m / 110m 1.47 0.234 474 1 2.405 0.0776
80m / 110m 1.09 0.176 474 1 0.516 0.9552
year = 2019:
contrast ratio SE df null t.ratio p.value
30m / 50m 1.34 0.203 474 1 1.906 0.2268
30m / 80m 1.80 0.279 474 1 3.798 0.0009
30m / 110m 1.96 0.311 474 1 4.239 0.0002
50m / 80m 1.35 0.213 474 1 1.896 0.2311
50m / 110m 1.47 0.234 474 1 2.405 0.0776
80m / 110m 1.09 0.176 474 1 0.516 0.9552
Results are averaged over the levels of: field_element, sampling_time, treatment_day
P value adjustment: tukey method for comparing a family of 4 estimates
Tests are performed on the log scale
Should I use a different comparison method? I saw that some people use poly ~; I tried that, and the picture of the results is the same. Also, am I comparing the right things?
The last, and also important, question: how should I report the glmmTMB, Anova, and emmeans results?
I don't recall seeing this question before, but it's been 8 months, and maybe I just forgot.
Anyway, I am not sure exactly what the question is, but there are three things going on that might possibly have caused some confusion:
1. The emmeans() call has the specification pairwise ~ land_distance|year, which causes it to compute both means and pairwise comparisons thereof. I think users are almost always better served by separating those steps, because estimating means and estimating contrasts are two different things.
2. The default way in which means are summarized (estimates, SEs, and confidence intervals) is different from the default for comparisons and other contrasts (estimates, SEs, t ratios, and adjusted P values). That's because, as I said before, these are two different things, and usually people want CIs for means and P values for contrasts. See below.
3. There is a log link in this model, and that has special properties when it comes to contrasts, because a difference on the log scale is the log of a ratio. So we display a ratio when we have type = "response". (With most other link functions, there is no way to back-transform the differences of transformed values.) A minimal illustration follows.
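Using two of the means from your own output (30m and 50m in 2018), back-transforming the difference of the log-scale means gives exactly the ratio reported in the contrasts:

log30 <- log(2.46)  # 30m mean for 2018, on the log scale
log50 <- log(1.84)  # 50m mean for 2018, on the log scale
exp(log30 - log50)  # 1.34 = 2.46/1.84, the 30m / 50m ratio shown above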
What I suggest, per (1), is to get the means (and not comparisons) first:
EMM <- emmeans(Insecticide_2, ~ land_distance | year, type = "response")
EMM # see the estimates
You can get pairwise comparisons next:
CON <- pairs(EMM) # or contrast(EMM, "pairwise")
CON # see the ratios as shown in the OP
confint(CON) # see confidence intervals instead of tests
confint(CON, type = "link") # See the pairwise differences on the log scale
If you actually want differences on the response scale rather than ratios, that's possible too:
pairs(regrid(EMM)) # tests
confint(pairs(regrid(EMM)))  # CIs
Hey guys, so I taught myself time-to-event analysis recently and I need some help understanding it. I made some Kaplan-Meier survival curves.
Sure, the number of observations within each node is small but let's pretend that I have plenty.
library(dplyr); library(survival)
K <- HF %>%
  filter(serum_creatinine <= 1.8, ejection_fraction <= 25)
summary(survfit(Surv(time, DEATH_EVENT) ~ 1, data = K))
## Call: survfit(formula = Surv(time, DEATH_EVENT) ~ 1, data = K)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 20 36 5 0.881 0.0500 0.788 0.985
## 45 33 3 0.808 0.0612 0.696 0.937
## 60 31 3 0.734 0.0688 0.611 0.882
## 80 23 6 0.587 0.0768 0.454 0.759
## 100 17 1 0.562 0.0776 0.429 0.736
## 110 17 0 0.562 0.0776 0.429 0.736
## 120 16 1 0.529 0.0798 0.393 0.711
## 130 14 0 0.529 0.0798 0.393 0.711
## 140 14 0 0.529 0.0798 0.393 0.711
## 150 13 1 0.488 0.0834 0.349 0.682
If someone were to ask me about the third node, would the following statements be valid?
For any new patient who walks into this hospital with serum creatinine <= 1.8 and ejection fraction <= 25, the probability of survival is 53% after 140 days.
What about:
The survival distributions for the samples analyzed, and no other future incoming samples, are visualized above.
I want to make sure these statements are correct.
I would also like to know whether logistic regression could be used to predict the binary variable DEATH_EVENT. Since the TIME variable determines how much weight one patient's death at 20 days carries relative to another patient's death at 175 days, I understand that this needs to be accounted for.
If logistic regression can be used, does that imply anything about keeping or removing the TIME variable?
Here are some thoughts:
Logistic regression is not appropriate in your case, as it is not the correct method for time-to-event analysis.
If the clinical outcome observed is "either-or," such as whether a patient suffers an MI or not, logistic regression can be used.
However, if the time to MI is the observed outcome, the data are analyzed using statistical methods for survival analysis.
Text from here
If you want to use a regression model in survival analysis, then you should use a Cox proportional hazards model. To understand the difference between a Kaplan-Meier analysis and a Cox proportional hazards model, you need to understand both of them.
The next step would be to understand the difference between a univariable and a multivariable Cox proportional hazards model.
Once you understand all three methods (Kaplan-Meier, univariable Cox, and multivariable Cox), you can decide whether this is a valid statement:
For any new patient who walks into this hospital with serum creatinine <= 1.8 and ejection fraction <= 25, the probability of survival is 53% after 140 days.
There is nothing wrong with stating the results of a subgroup from a Kaplan-Meier analysis, but such a statement has a different value than one coming from a multivariable Cox regression analysis.
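For concreteness, here is a minimal sketch of fitting a multivariable Cox model in R (assuming the survival package and the HF data frame and variable names from the question; the covariate list is only illustrative):

library(survival)
cox_fit <- coxph(Surv(time, DEATH_EVENT) ~ serum_creatinine + ejection_fraction,
                 data = HF)
summary(cox_fit)  # hazard ratios (exp(coef)) with CIs and p-values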
I have been trying to convert some PROC MIXED SAS code into R, but without success. The code is:
proc mixed data=rmanova4;
class randomization_arm cancer_type site wk;
model chgpf=randomization_arm cancer_type site wk;
repeated / subject=study_id;
contrast '12 vs 4' randomization_arm 1 -1;
lsmeans randomization_arm / cl pdiff alpha=0.05;
run;quit;
I have tried something like
library(nlme)
mod4 <- lme(chgpf ~ Randomization_Arm + Cancer_Type + site + wk, data = rmanova.data, random = ~ 1 | Study_ID, na.action = na.exclude)
but I am getting different estimate values.
Perhaps I am misunderstanding something basic. Any comment/suggestion would be greatly appreciated.
(Additional edit)
Part of the output from the SAS code is below:
Least Squares Means
Effect Randomization_Arm Estimate Standard Error DF t Value Pr > |t| Alpha Lower Upper
Randomization_Arm 12 weekly BTA -4.5441 1.3163 222 -3.45 0.0007 0.05 -7.1382 -1.9501
Randomization_Arm 4 weekly BTA -6.4224 1.3143 222 -4.89 <.0001 0.05 -9.0126 -3.8322
Differences of Least Squares Means
Effect Randomization_Arm _Randomization_Arm Estimate Standard Error DF t Value Pr > |t| Alpha Lower Upper
Randomization_Arm 12 weekly BTA 4 weekly BTA 1.8783 1.4774 222 1.27 0.2049 0.05 -1.0332 4.7898
The output from the R code is below:
Linear mixed-effects model fit by REML
Data: rmanova.data
AIC BIC logLik
6522.977 6578.592 -3249.488
Random effects:
Formula: ~1 | Study_ID
(Intercept) Residual
StdDev: 16.59143 12.81334
Fixed effects: chgpf ~ Randomization_Arm + Cancer_Type + site + wk
Value Std.Error DF t-value p-value
(Intercept) 2.332268 2.314150 539 1.0078294 0.3140
Randomization_Arm4 weekly BTA -1.708401 2.409444 222 -0.7090435 0.4790
Cancer_TypeProsta -4.793787 2.560133 222 -1.8724761 0.0625
site2 -1.492911 3.665674 222 -0.4072678 0.6842
site3 -4.002252 3.510111 222 -1.1402066 0.2554
site4 -12.013758 5.746988 222 -2.0904442 0.0377
site5 -3.823504 4.938590 222 -0.7742097 0.4396
wk2 0.313863 1.281047 539 0.2450052 0.8065
wk3 -3.606267 1.329357 539 -2.7127905 0.0069
wk4 -4.246526 1.345526 539 -3.1560334 0.0017
Correlation:
(Intr) R_A4wB Cnc_TP site2 site3 site4 site5 wk2 wk3
Randomization_Arm4 weekly BTA -0.558
Cancer_TypeProsta -0.404 0.046
site2 -0.257 0.001 -0.087
site3 -0.238 0.004 -0.163 0.201
site4 -0.255 0.031 0.151 0.101 0.095
site5 -0.172 -0.016 -0.077 0.139 0.151 0.073
wk2 -0.254 -0.008 0.010 0.011 -0.003 0.005 -0.001
wk3 -0.257 0.005 0.020 0.014 0.006 -0.001 -0.002 0.464
wk4 -0.251 -0.007 0.022 0.020 0.002 0.006 -0.002 0.461 0.461
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-5.6784364 -0.3796392 0.1050812 0.4588555 3.1055046
Number of Observations: 771
Number of Groups: 229
Adding some comments and observations
Since my original posting, I have tried various pieces of R code, but I am still getting estimates different from those given by SAS.
More importantly, the standard errors are almost double those given by SAS.
Any suggestions would be greatly appreciated.
I got the solution to the problem from someone after posting the question to the R-sig-ME list. It seems that the SAS code above actually fits a simple linear regression model, assuming independence across observations, which is equivalent to
proc glm data=rmanova4;
class randomization_arm cancer_type site wk;
model chgpf = randomization_arm cancer_type site wk;
run;
which of course in R is equivalent to
lm(chgpf ~ Randomization_Arm + Cancer_Type + site + wk, data=rmanova.data)
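As a side note, the SAS lsmeans statement can also be reproduced from that lm() fit; here is a sketch (untested against the original data, assuming the emmeans package):

library(emmeans)
fit <- lm(chgpf ~ Randomization_Arm + Cancer_Type + site + wk, data = rmanova.data)
emmeans(fit, ~ Randomization_Arm)         # analogue of lsmeans randomization_arm / cl
pairs(emmeans(fit, ~ Randomization_Arm))  # analogue of pdiff (the '12 vs 4' contrast)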
Hi Stack Overflow community,
I am a recent R starter, and today I spent several hours trying to figure out how to get a p-value in scientific notation (e.g. 3e-1) from a competing risks analysis using the cmprsk package.
I used:
library(cmprsk)
sumary_J1 <- crr(ftime, fstatus, cov1, failcode = 2)
summary(sumary_J1)
And got
Call:
crr(ftime = ftime, fstatus = fstatus, cov1 = cov1, failcode = 2)
coef exp(coef) se(coef) z p-value
group1 0.373 1.45 0.02684 13.90 0.00
age 0.122 1.13 0.00384 31.65 0.00
sex 0.604 1.83 0.04371 13.83 0.00
bmi 0.012 1.01 0.00611 1.96 0.05
exp(coef) exp(-coef) 2.5% 97.5%
group1 1.45 0.689 1.38 1.53
age 1.13 0.886 1.12 1.14
sex 1.83 0.546 1.68 1.99
bmi 1.01 0.988 1.00 1.02
Num. cases = 470690 (1900 cases omitted due to missing values)
Pseudo Log-likelihood = -28721
Pseudo likelihood ratio test = 2229 on 4 df,
I can see the p-value column, but I only get two decimal places. I would like to see as many decimal places as possible, or to print those p-values in a format like 3.0e-3.
I tried all of the following, but nothing has worked so far:
summary(sumary_J1, digits=max(options()$digits - 5,10))
print.crr(sumary_J1, digits = 20)
print.crr(sumary_J1, digits = 3, scipen = -2)
print.crr(sumary_J1, format = "e", digits = 3)
Maybe someone is able to help me! Thanks!
Best,
Carolin
The digits argument limits the number of digits displayed when it is passed to a summary() call, and it does affect how results are displayed by summary.crr.
summary(z, digits=3) # using first example in `?cmprsk::crr`
#----------------------
#Competing Risks Regression
Call:
crr(ftime = ftime, fstatus = fstatus, cov1 = cov)
coef exp(coef) se(coef) z p-value
x1 0.2668 1.306 0.421 0.633 0.526
x2 -0.0557 0.946 0.381 -0.146 0.884
x3 0.2805 1.324 0.381 0.736 0.462
exp(coef) exp(-coef) 2.5% 97.5%
x1 1.306 0.766 0.572 2.98
x2 0.946 1.057 0.448 2.00
x3 1.324 0.755 0.627 2.79
Num. cases = 200
Pseudo Log-likelihood = -320
Pseudo likelihood ratio test = 1.02 on 3 df,
You can use formatC to control format:
formatC( summary(z, digits=5)$coef , format="e")
#------------>
coef exp(coef) se(coef) z p-value
x1 "2.6676e-01" "1.3057e+00" "4.2115e-01" "6.3340e-01" "5.2647e-01"
x2 "-5.5684e-02" "9.4584e-01" "3.8124e-01" "-1.4606e-01" "8.8387e-01"
x3 "2.8049e-01" "1.3238e+00" "3.8098e-01" "7.3622e-01" "4.6159e-01"
You might also search on [r] very small p-value.
Here's the first of over 100 hits on that topic which, despite not having received very much attention, still has very useful information and coding examples: Reading a very small p-value in R
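One standard trick in that vein, sketched below: very small p-values underflow double precision, but R's distribution functions can return the log of the p-value via log.p = TRUE (check the linked post for the full discussion):

pnorm(40, lower.tail = FALSE)                # prints 0: the p-value underflows
pnorm(40, lower.tail = FALSE, log.p = TRUE)  # about -804.6, the log of the p-value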
By looking at the function that prints the output of crr() (cmprsk::print.crr) you can see what is done to create the p-values displayed in the summary. The code below is taken from that function.
x <- sumary_J1
v <- sqrt(diag(x$var))
signif(v, 4)  # gives you the standard errors
v <- 2 * (1 - pnorm(abs(x$coef)/v))
signif(v, 4)  # gives you the two-sided p-values
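If you then want those values in scientific notation, formatC works on this vector as well (continuing the code above):

formatC(v, format = "e", digits = 3)  # e.g. "3.000e-03"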
I'm hoping you can clear up some confusion in my head.
A linear mixed model is constructed with lmerTest:
MODEL <- lmer(Ca_content ~ SYSTEM + (1 | YEAR/replicate) +
              (1 | YEAR:SYSTEM), data = IOSDV1)
The fun starts when I try to get confidence intervals for specific levels of the main effect.
The emmeans and lsmeans commands produce the same intervals (example: SYSTEM A3: 23.9-128.9, mean 76.4, SE 8.96).
However, the command as.data.frame(effect("SYSTEM", MODEL)) produces different, narrower confidence intervals (example: SYSTEM A3: 58.0-94.9, mean 76.4, SE 8.96).
What am I missing and what number should I report?
To summarize: for Ca content, I have six measurements in total per treatment (three per year, each from a different replicate). I will leave the names in the code in my own language, as used. The idea is to test whether certain production practices affect the content of specific minerals in the grains. Random effects whose variance was estimated as zero were left in the model for this example.
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: CA ~ SISTEM + (1 | LETO/ponovitev) + (1 | LETO:SISTEM)
Data: IOSDV1
REML criterion at convergence: 202.1
Scaled residuals:
Min 1Q Median 3Q Max
-1.60767 -0.74339 0.04665 0.73152 1.50519
Random effects:
Groups Name Variance Std.Dev.
LETO:SISTEM (Intercept) 0.0 0.0
ponovitev:LETO (Intercept) 0.0 0.0
LETO (Intercept) 120.9 11.0
Residual 118.7 10.9
Number of obs: 30, groups: LETO:SISTEM, 10; ponovitev:LETO, 8; LETO, 2
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 76.417 8.959 1.548 8.530 0.0276 *
SISTEM[T.C0] -5.183 6.291 24.000 -0.824 0.4181
SISTEM[T.C110] -13.433 6.291 24.000 -2.135 0.0431 *
SISTEM[T.C165] -7.617 6.291 24.000 -1.211 0.2378
SISTEM[T.C55] -10.883 6.291 24.000 -1.730 0.0965 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) SISTEM[T.C0 SISTEM[T.C11 SISTEM[T.C16
SISTEM[T.C0 -0.351
SISTEM[T.C11 -0.351 0.500
SISTEM[T.C16 -0.351 0.500 0.500
SISTEM[T.C5 -0.351 0.500 0.500 0.500
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular
> ls_means(MODEL, ddf="Kenward-Roger")
Least Squares Means table:
Estimate Std. Error df t value lower upper Pr(>|t|)
SISTEMA3 76.4167 8.9586 1.5 8.5299 23.9091 128.9243 0.02853 *
SISTEMC0 71.2333 8.9586 1.5 7.9514 18.7257 123.7409 0.03171 *
SISTEMC110 62.9833 8.9586 1.5 7.0305 10.4757 115.4909 0.03813 *
SISTEMC165 68.8000 8.9586 1.5 7.6797 16.2924 121.3076 0.03341 *
SISTEMC55 65.5333 8.9586 1.5 7.3151 13.0257 118.0409 0.03594 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Confidence level: 95%
Degrees of freedom method: Kenward-Roger
> emmeans(MODEL, spec = c("SISTEM"))
SISTEM emmean SE df lower.CL upper.CL
A3 76.4 8.96 1.53 23.9 129
C0 71.2 8.96 1.53 18.7 124
C110 63.0 8.96 1.53 10.5 115
C165 68.8 8.96 1.53 16.3 121
C55 65.5 8.96 1.53 13.0 118
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
> as.data.frame(effect("SISTEM", MODEL))
SISTEM fit se lower upper
1 A3 76.41667 8.958643 57.96600 94.86734
2 C0 71.23333 8.958643 52.78266 89.68400
3 C110 62.98333 8.958643 44.53266 81.43400
4 C165 68.80000 8.958643 50.34933 87.25067
5 C55 65.53333 8.958643 47.08266 83.98400
Many thanks.
I'm pretty sure this has to do with the dreaded "denominator degrees of freedom" question, i.e. what kind (if any) of finite-sample correction is being employed. tl;dr emmeans is using a Kenward-Roger correction, which is more or less the most accurate available option — the only reason not to use K-R is if you have a large data set for which it becomes unbearably slow.
load packages, simulate data, fit model
library(lmerTest)
library(emmeans)
library(effects)
dd <- expand.grid(f=factor(letters[1:3]),g=factor(1:20),rep=1:10)
set.seed(101)
dd$y <- simulate(~f+(1|g), newdata=dd, newparams=list(beta=rep(1,3),theta=1,sigma=1))[[1]]
m <- lmer(y~f+(1|g), data=dd)
compare default emmeans with effects
emmeans(m, ~f)
## f emmean SE df lower.CL upper.CL
## a 0.848 0.212 21.9 0.409 1.29
## b 1.853 0.212 21.9 1.414 2.29
## c 1.863 0.212 21.9 1.424 2.30
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
as.data.frame(effect("f",m))
## f fit se lower upper
## 1 a 0.8480161 0.2117093 0.4322306 1.263802
## 2 b 1.8531805 0.2117093 1.4373950 2.268966
## 3 c 1.8632228 0.2117093 1.4474373 2.279008
effects doesn't explicitly tell us what/whether it's using a finite-sample correction: we could dig around in the documentation or the code to try to find out. Alternatively, we can tell emmeans not to use finite-sample correction:
emmeans(m, ~f, lmer.df="asymptotic")
## f emmean SE df asymp.LCL asymp.UCL
## a 0.848 0.212 Inf 0.433 1.26
## b 1.853 0.212 Inf 1.438 2.27
## c 1.863 0.212 Inf 1.448 2.28
## Degrees-of-freedom method: asymptotic
## Confidence level used: 0.95
Testing shows that these are equivalent to about a tolerance of 0.001 (probably close enough). In principle we should be able to specify KR=TRUE to get effects to use Kenward-Roger correction, but I haven't been able to get that to work yet.
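For instance, a quick check of that agreement might look like this (a sketch, assuming the objects fitted above):

e1 <- as.data.frame(emmeans(m, ~ f, lmer.df = "asymptotic"))
e2 <- as.data.frame(effect("f", m))
all.equal(e1$asymp.LCL, e2$lower, tolerance = 0.005)  # TRUE: agreement to roughly three decimals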
However, I will also say that there's something a little bit funky about your example. If we compute the distance between the mean and the lower CI in units of standard error, for emmeans we get (76.4-23.9)/8.96 = 5.86, which implies a very small effect degrees of freedom (e.g. about 1.55). That seems questionable to me unless your data set is extremely small ...
From your updated post, it appears that Kenward-Roger is indeed estimating only 1.5 denominator df.
In general it is dicey/not recommended to try fitting random effects where the grouping variable has a small number of levels (although see here for a counterargument). I would try treating LETO (which has only two levels) as a fixed effect, i.e.
CA ~ SISTEM + LETO + (1 | LETO:ponovitev) + (1 | LETO:SISTEM)
and see if that helps. (I would expect you would then get on the order of 7 df, which would make your CIs ± 2.4 SE instead of ± 6 SE ...)
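A sketch of that refit, using the variable names from your output:

MODEL2 <- lmer(CA ~ SISTEM + LETO + (1 | LETO:ponovitev) + (1 | LETO:SISTEM),
               data = IOSDV1)
emmeans(MODEL2, ~ SISTEM)  # the denominator df, and hence the CI widths, should improve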
Is there a way to get an effect size (such as Cohen's d, or whatever is most appropriate) directly using emmeans()?
I cannot find anything about obtaining effect sizes with emmeans().
post <- emmeans(fit, pairwise ~ favorite.pirate | sex)
emmip(fit, ~ favorite.pirate | sex)
There is not a built-in provision for effect-size calculations, but you can cobble one together by defining a custom contrast function that divides each pairwise comparison by a value of sigma:
mypw.emmc = function(..., sigma = 1) {
    result = emmeans:::pairwise.emmc(...)  # start from the ordinary pairwise contrasts
    for (i in seq_along(result[1, ]))      # loop over the columns of contrast coefficients
        result[[i]] = result[[i]] / sigma  # scale each contrast by sigma
    result
}
Here's a test run:
> mypw.emmc(1:3, sigma = 4)
1 - 2 1 - 3 2 - 3
1 0.25 0.25 0.00
2 -0.25 0.00 0.25
3 0.00 -0.25 -0.25
With your model, the error SD is 9.246 (look at summary(fit)); so, ...
> emmeans(fit, mypw ~ sex, sigma = 9.246, name = "effect.size")
NOTE: Results may be misleading due to involvement in interactions
$emmeans
sex emmean SE df lower.CL upper.CL
female 63.8 0.434 3.03 62.4 65.2
male 74.5 0.809 15.82 72.8 76.2
other 68.8 1.439 187.08 65.9 71.6
Results are averaged over the levels of: favorite.pirate
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$contrasts
effect.size estimate SE df t.ratio p.value
female - male -1.158 0.0996 399 -11.624 <.0001
female - other -0.537 0.1627 888 -3.299 0.0029
male - other 0.621 0.1717 981 3.617 0.0009
Results are averaged over the levels of: favorite.pirate
Degrees-of-freedom method: kenward-roger
P value adjustment: tukey method for comparing a family of 3 estimates
Some words of caution, though:
1. The SEs of the effect sizes are misleading because they don't account for the variation in sigma.
2. This is not a very good example, because:
a. The factors interact (Edward Low has a different profile); see also the warning message above.
b. The model is singular (as warned when the model was fitted), yielding an estimated variance of zero for college.
library(yarrr)
View(pirates)
library(lme4)
library(lmerTest)
library(emmeans)
fit <- lmer(weight ~ favorite.pirate * sex + (1 | college), data = pirates)
anova(fit, ddf = "Kenward-Roger")
post <- emmeans(fit, pairwise ~ sex)
post