Converting PROC MIXED to R

I have been trying to convert some PROC MIXED SAS code into R, but without success. The code is:
proc mixed data=rmanova4;
class randomization_arm cancer_type site wk;
model chgpf=randomization_arm cancer_type site wk;
repeated / subject=study_id;
contrast '12 vs 4' randomization_arm 1 -1;
lsmeans randomization_arm / cl pdiff alpha=0.05;
run;quit;
I have tried something like
library(nlme)  # lme() comes from the nlme package
mod4 <- lme(chgpf ~ Randomization_Arm + Cancer_Type + site + wk, data = rmanova.data, random = ~ 1 | Study_ID, na.action = na.exclude)
but I am getting different estimate values.
Perhaps I am misunderstanding something basic. Any comment/suggestion would be greatly appreciated.
(Additional editing)
I am adding the output here. Part of the output from the SAS code is below:
Least Squares Means
Effect             Randomization_Arm  Estimate  Standard Error  DF   t Value  Pr > |t|  Alpha  Lower    Upper
Randomization_Arm  12 weekly BTA      -4.5441   1.3163          222  -3.45    0.0007    0.05   -7.1382  -1.9501
Randomization_Arm  4 weekly BTA       -6.4224   1.3143          222  -4.89    <.0001    0.05   -9.0126  -3.8322
Differences of Least Squares Means
Effect             Randomization_Arm  _Randomization_Arm  Estimate  Standard Error  DF   t Value  Pr > |t|  Alpha  Lower    Upper
Randomization_Arm  12 weekly BTA      4 weekly BTA        1.8783    1.4774          222  1.27     0.2049    0.05   -1.0332  4.7898
The output from the R code is below:
Linear mixed-effects model fit by REML
Data: rmanova.data
AIC BIC logLik
6522.977 6578.592 -3249.488
Random effects:
Formula: ~1 | Study_ID
(Intercept) Residual
StdDev: 16.59143 12.81334
Fixed effects: chgpf ~ Randomization_Arm + Cancer_Type + site + wk
Value Std.Error DF t-value p-value
(Intercept) 2.332268 2.314150 539 1.0078294 0.3140
Randomization_Arm4 weekly BTA -1.708401 2.409444 222 -0.7090435 0.4790
Cancer_TypeProsta -4.793787 2.560133 222 -1.8724761 0.0625
site2 -1.492911 3.665674 222 -0.4072678 0.6842
site3 -4.002252 3.510111 222 -1.1402066 0.2554
site4 -12.013758 5.746988 222 -2.0904442 0.0377
site5 -3.823504 4.938590 222 -0.7742097 0.4396
wk2 0.313863 1.281047 539 0.2450052 0.8065
wk3 -3.606267 1.329357 539 -2.7127905 0.0069
wk4 -4.246526 1.345526 539 -3.1560334 0.0017
Correlation:
(Intr) R_A4wB Cnc_TP site2 site3 site4 site5 wk2 wk3
Randomization_Arm4 weekly BTA -0.558
Cancer_TypeProsta -0.404 0.046
site2 -0.257 0.001 -0.087
site3 -0.238 0.004 -0.163 0.201
site4 -0.255 0.031 0.151 0.101 0.095
site5 -0.172 -0.016 -0.077 0.139 0.151 0.073
wk2 -0.254 -0.008 0.010 0.011 -0.003 0.005 -0.001
wk3 -0.257 0.005 0.020 0.014 0.006 -0.001 -0.002 0.464
wk4 -0.251 -0.007 0.022 0.020 0.002 0.006 -0.002 0.461 0.461
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-5.6784364 -0.3796392 0.1050812 0.4588555 3.1055046
Number of Observations: 771
Number of Groups: 229
Adding some comments and observations
Since my original posting, I have tried various pieces of R code but I am getting different estimates from those given in SAS.
More importantly, the standard errors are almost double those given by SAS.
Any suggestions would be greatly appreciated.

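I got the solution after posting the question to the R-sig-ME mailing list. It seems that the SAS code above actually fits a simple linear regression model that assumes independent observations. A REPEATED statement without a TYPE= option uses the default TYPE=VC, which for the residual covariance means independent errors with a common variance. The model is therefore equivalent to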
proc glm data=rmanova4;
class randomization_arm cancer_type site wk;
model chgpf = randomization_arm cancer_type site wk;
run;
which of course in R is equivalent to
lm(chgpf ~ Randomization_Arm + Cancer_Type + site + wk, data=rmanova.data)
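Here is a rough check on the observation that the standard errors are "almost double", under stated assumptions. For a between-subject factor such as Randomization_Arm, an independence model (lm, or the SAS code above) understates the SE by roughly sqrt(1 + (m - 1) * ICC). This is the Kish design effect, where m is the average number of observations per subject and ICC is the share of variance due to subjects. The approximation assumes equal cluster sizes and the exchangeable correlation implied by a random intercept, and neither holds exactly here. The inputs are copied from the lme output and the SAS difference table above.

```r
# Variance components reported by lme() in the question
sd_subject  <- 16.59143   # StdDev of the Study_ID random intercept
sd_residual <- 12.81334   # residual StdDev

# Intraclass correlation: share of total variance due to subjects
icc <- sd_subject^2 / (sd_subject^2 + sd_residual^2)

# Average cluster size: 771 observations on 229 subjects
m <- 771 / 229

# Kish design effect for a between-subject comparison (approximation)
deff <- 1 + (m - 1) * icc

# Expected SE inflation factor relative to an independence model
se_ratio_approx <- sqrt(deff)

# Observed ratio: lme SE of the arm effect vs. SAS SE of the same difference
se_ratio_observed <- 2.409444 / 1.4774

round(c(icc = icc, deff = deff,
        approx = se_ratio_approx, observed = se_ratio_observed), 3)
```

Both ratios come out at about 1.6. This is consistent with the two programs fitting different covariance structures, rather than one of them computing the same model incorrectly.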

Related

Linear mixed model confidence intervals question

I'm hoping you can clear up some confusion in my head.
The linear mixed model is fitted with lmerTest:
library(lmerTest)  # lmer() plus denominator-df methods for tests
MODEL <- lmer(CA ~ SISTEM + (1 | LETO/ponovitev) +
(1 | LETO:SISTEM), data = IOSDV1)
The fun starts when I try to get confidence intervals for the individual levels of the main effect.
The emmeans and lsmeans commands produce the same intervals (e.g. SISTEM A3: 23.9-128.9, mean 76.4, SE 8.96).
However, as.data.frame(effect("SISTEM", MODEL)) produces different, narrower confidence intervals (e.g. SISTEM A3: 58.0-94.9, mean 76.4, SE 8.96).
What am I missing and what number should I report?
To summarize: for the Ca content, I have 6 measurements in total per treatment (three per year, each from a different replicate). I will leave the names in the code in my language, as used. The idea is to test whether certain production practices affect the content of specific minerals in the grain. Random-effect terms with zero estimated variance were left in the model for this example.
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: CA ~ SISTEM + (1 | LETO/ponovitev) + (1 | LETO:SISTEM)
Data: IOSDV1
REML criterion at convergence: 202.1
Scaled residuals:
Min 1Q Median 3Q Max
-1.60767 -0.74339 0.04665 0.73152 1.50519
Random effects:
Groups Name Variance Std.Dev.
LETO:SISTEM (Intercept) 0.0 0.0
ponovitev:LETO (Intercept) 0.0 0.0
LETO (Intercept) 120.9 11.0
Residual 118.7 10.9
Number of obs: 30, groups: LETO:SISTEM, 10; ponovitev:LETO, 8; LETO, 2
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 76.417 8.959 1.548 8.530 0.0276 *
SISTEM[T.C0] -5.183 6.291 24.000 -0.824 0.4181
SISTEM[T.C110] -13.433 6.291 24.000 -2.135 0.0431 *
SISTEM[T.C165] -7.617 6.291 24.000 -1.211 0.2378
SISTEM[T.C55] -10.883 6.291 24.000 -1.730 0.0965 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) SISTEM[T.C0 SISTEM[T.C11 SISTEM[T.C16
SISTEM[T.C0 -0.351
SISTEM[T.C11 -0.351 0.500
SISTEM[T.C16 -0.351 0.500 0.500
SISTEM[T.C5 -0.351 0.500 0.500 0.500
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular
> ls_means(MODEL, ddf="Kenward-Roger")
Least Squares Means table:
Estimate Std. Error df t value lower upper Pr(>|t|)
SISTEMA3 76.4167 8.9586 1.5 8.5299 23.9091 128.9243 0.02853 *
SISTEMC0 71.2333 8.9586 1.5 7.9514 18.7257 123.7409 0.03171 *
SISTEMC110 62.9833 8.9586 1.5 7.0305 10.4757 115.4909 0.03813 *
SISTEMC165 68.8000 8.9586 1.5 7.6797 16.2924 121.3076 0.03341 *
SISTEMC55 65.5333 8.9586 1.5 7.3151 13.0257 118.0409 0.03594 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Confidence level: 95%
Degrees of freedom method: Kenward-Roger
> emmeans(MODEL, spec = c("SISTEM"))
SISTEM emmean SE df lower.CL upper.CL
A3 76.4 8.96 1.53 23.9 129
C0 71.2 8.96 1.53 18.7 124
C110 63.0 8.96 1.53 10.5 115
C165 68.8 8.96 1.53 16.3 121
C55 65.5 8.96 1.53 13.0 118
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
> as.data.frame(effect("SISTEM", MODEL))
SISTEM fit se lower upper
1 A3 76.41667 8.958643 57.96600 94.86734
2 C0 71.23333 8.958643 52.78266 89.68400
3 C110 62.98333 8.958643 44.53266 81.43400
4 C165 68.80000 8.958643 50.34933 87.25067
5 C55 65.53333 8.958643 47.08266 83.98400
Many thanks.
I'm pretty sure this has to do with the dreaded "denominator degrees of freedom" question, i.e. what kind (if any) of finite-sample correction is being employed. tl;dr: emmeans is using a Kenward-Roger correction, which is more or less the most accurate available option. The only reason not to use K-R is if you have a large data set for which it becomes unbearably slow.
load packages, simulate data, fit model
library(lmerTest)
library(emmeans)
library(effects)
dd <- expand.grid(f=factor(letters[1:3]),g=factor(1:20),rep=1:10)
set.seed(101)
dd$y <- simulate(~f+(1|g), newdata=dd, newparams=list(beta=rep(1,3),theta=1,sigma=1))[[1]]
m <- lmer(y~f+(1|g), data=dd)
compare default emmeans with effects
emmeans(m, ~f)
## f emmean SE df lower.CL upper.CL
## a 0.848 0.212 21.9 0.409 1.29
## b 1.853 0.212 21.9 1.414 2.29
## c 1.863 0.212 21.9 1.424 2.30
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
as.data.frame(effect("f",m))
## f fit se lower upper
## 1 a 0.8480161 0.2117093 0.4322306 1.263802
## 2 b 1.8531805 0.2117093 1.4373950 2.268966
## 3 c 1.8632228 0.2117093 1.4474373 2.279008
effects doesn't explicitly tell us what/whether it's using a finite-sample correction: we could dig around in the documentation or the code to try to find out. Alternatively, we can tell emmeans not to use finite-sample correction:
emmeans(m, ~f, lmer.df="asymptotic")
## f emmean SE df asymp.LCL asymp.UCL
## a 0.848 0.212 Inf 0.433 1.26
## b 1.853 0.212 Inf 1.438 2.27
## c 1.863 0.212 Inf 1.448 2.28
## Degrees-of-freedom method: asymptotic
## Confidence level used: 0.95
Testing shows that these are equivalent to within a tolerance of about 0.001 (probably close enough). In principle we should be able to specify KR=TRUE to get effects to use the Kenward-Roger correction, but I haven't been able to get that to work yet.
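The following base-R check of that tolerance uses the numbers printed above for level a. The effects bound is consistent with a t quantile on the residual degrees of freedom (600 observations minus 3 fixed-effect parameters = 597), not exactly the normal quantile. At 597 df the two give bounds that differ by less than 0.001. That effects uses n - p here is an inference from the printed numbers, not something confirmed from its documentation or code.

```r
# Numbers printed above for level "a" of the simulated example
fit <- 0.8480161
se  <- 0.2117093
effects_lower <- 0.4322306

# Multiplier implied by the effects interval, in SE units
implied_mult <- (fit - effects_lower) / se

# Candidate multipliers: t on residual df (600 obs - 3 fixed effects) vs. normal
t_mult <- qt(0.975, df = 600 - 3)
z_mult <- qnorm(0.975)

c(implied = implied_mult, t_597 = t_mult, z = z_mult)

# Size of the discrepancy between t- and z-based bounds on the response scale
(t_mult - z_mult) * se
```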
However, I will also say that there's something a little bit funky about your example. If we compute the distance between the mean and the lower CI in units of standard error, for emmeans we get (76.4-23.9)/8.96 = 5.86, which implies very few denominator degrees of freedom (about 1.55). That seems questionable to me unless your data set is extremely small ...
From your updated post, it appears that Kenward-Roger is indeed estimating only 1.5 denominator df.
In general it is dicey/not recommended to try fitting random effects where the grouping variable has a small number of levels (although see here for a counterargument). I would try treating LETO (which has only two levels) as a fixed effect, i.e.
CA ~ SISTEM + LETO + (1 | LETO:ponovitev) + (1 | LETO:SISTEM)
and see if that helps. (I would expect you would then get on the order of 7 df, which would make your CIs ± 2.4 SE instead of ± 6 SE ...)
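The multipliers discussed above can be recovered in base R from the SISTEM A3 row of the question's output (estimate 76.41667, SE 8.958643). The sketch below does three things. It backs out the half-width in SE units for each method. It solves for the df that would reproduce the Kenward-Roger multiplier. It also lists reference quantiles for about 7 df, 25 df and the normal. The effects interval in the question lines up with 25 df (30 observations minus 5 fixed effects). This is an inference from the printed numbers, not from the effects documentation.

```r
# SISTEM A3 row from the question's output
est <- 76.41667
se  <- 8.958643

# Multipliers (half-width / SE) implied by each reported lower bound
kr_mult      <- (est - 23.9091) / se   # ls_means / emmeans, Kenward-Roger
effects_mult <- (est - 57.96600) / se  # effects::effect

# Degrees of freedom that would reproduce the Kenward-Roger multiplier
kr_df <- uniroot(function(d) qt(0.975, d) - kr_mult, interval = c(1, 3))$root

# Reference multipliers: ~7 df (LETO as fixed effect), residual df 30 - 5, normal
ref <- c(df7 = qt(0.975, 7), df25 = qt(0.975, 25), z = qnorm(0.975))

round(c(kr_mult = kr_mult, kr_df = kr_df, effects_mult = effects_mult, ref), 3)
```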

How do I include p-value and R-square for the estimates in semPaths?

I am using semPaths (semPlot package) to draw my structural equation models. After some trial and error, I have a pretty good script that shows what I want. However, I haven't been able to figure out how to include the p-values/significance levels of the estimates (regression coefficients) in the figure.
Can I include significance levels, and if so how: as a p-value in the edge label below the estimate, as a dashed line for non-significant paths, or …?
I am also interested in including the R-square, but not as critically as the significance level.
This is the script I am using so far:
semPaths(fitmod.bac.class2,
what = "std",
whatLabels = "std",
style="ram",
edge.label.cex = 1.3,
layout = 'tree',
intercepts=FALSE,
residuals=FALSE,
nodeLabels = c("Negati-\nvicutes","cand_class\n_MB_A2_108", "CO2", "Bacilli","Ignavi-\nbacteria","C/N", "pH","Water\ncontent"),
sizeMan=7 )
[Figure: example of one of the semPaths outputs]
In this example the following are not significant:
Ignavibacteria -> First_C_CO2_ugC_gC_day, p = 0.096
pH -> Ignavibacteria, p = 0.151
cand_class_MB_A2_108 <-> Bacilli correlation, p = 0.054
I am an R user and not really a coder, so I might just be missing a crucial point in the arguments.
I am testing a lot of different models at the moment, and would really like not to have to draw them all up by hand.
update:
Using semPlotModel: Am I right in understanding that semPlotModel doesn’t include the significance levels from the sem function (see my script and output below)? I am specifically looking to include the P(>|z|) for regressions and covariance.
Is it just me missing it, or is it not included? If it is not included, my solution is simply to customize the edge labels.
library(lavaan)  # sem(), parameterEstimates()
{model.NA.UP.bac.class2 <- '
#LATENT VARIABLES
#REGRESSIONS
#soil organic carbon quality
c_Negativicutes ~ CN
#microorganisms
First_C_CO2_ugC_gC_day ~ c_Bacilli
First_C_CO2_ugC_gC_day ~ c_Ignavibacteria
First_C_CO2_ugC_gC_day ~ c_cand_class_MB_A2_108
First_C_CO2_ugC_gC_day ~ c_Negativicutes
#pH
c_Bacilli ~pH
c_Ignavibacteria ~pH
c_cand_class_MB_A2_108~pH
c_Negativicutes ~pH
#COVARIANCE
initial_water ~~ CN
c_cand_class_MB_A2_108 ~~ c_Bacilli
'
fitmod.bac.class2 <- sem(model.NA.UP.bac.class2, data=datapNA.UP.log, missing="ml", meanstructure=TRUE, fixed.x=FALSE, std.lv=FALSE, std.ov=FALSE)
summary(fitmod.bac.class2, standardized=TRUE, fit.measures=TRUE, rsq=TRUE)
out <- capture.output(summary(fitmod.bac.class2, standardized=TRUE, fit.measures=TRUE, rsq=TRUE))
}
Output:
lavaan 0.6-5 ended normally after 188 iterations
Estimator ML
Optimization method NLMINB
Number of free parameters 28
Number of observations 30
Number of missing patterns 1
Model Test User Model:
Test statistic 17.816
Degrees of freedom 16
P-value (Chi-square) 0.335
Model Test Baseline Model:
Test statistic 101.570
Degrees of freedom 28
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.975
Tucker-Lewis Index (TLI) 0.957
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) 472.465
Loglikelihood unrestricted model (H1) 481.373
Akaike (AIC) -888.930
Bayesian (BIC) -849.697
Sample-size adjusted Bayesian (BIC) -936.875
Root Mean Square Error of Approximation:
RMSEA 0.062
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.185
P-value RMSEA <= 0.05 0.414
Standardized Root Mean Square Residual:
SRMR 0.107
Parameter Estimates:
Information Observed
Observed information based on Hessian
Standard errors Standard
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
c_Negativicutes ~
CN 0.419 0.143 2.939 0.003 0.419 0.416
c_cand_class_MB_A2_108 ~
CN -0.433 0.160 -2.707 0.007 -0.433 -0.394
First_C_CO2_ugC_gC_day ~
c_Bacilli 0.525 0.128 4.092 0.000 0.525 0.496
c_Ignavibacter 0.207 0.124 1.667 0.096 0.207 0.195
c_c__MB_A2_108 0.310 0.125 2.475 0.013 0.310 0.301
c_Negativicuts 0.304 0.137 2.220 0.026 0.304 0.271
c_Bacilli ~
pH 0.624 0.135 4.604 0.000 0.624 0.643
c_Ignavibacteria ~
pH 0.245 0.171 1.436 0.151 0.245 0.254
c_cand_class_MB_A2_108 ~
pH 0.393 0.151 2.597 0.009 0.393 0.394
c_Negativicutes ~
pH 0.435 0.129 3.361 0.001 0.435 0.476
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
CN ~~
initial_water 0.001 0.000 2.679 0.007 0.001 0.561
.c_cand_class_MB_A2_108 ~~
.c_Bacilli -0.000 0.000 -1.923 0.054 -0.000 -0.388
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.c_Negativicuts 0.145 0.198 0.734 0.463 0.145 3.826
.c_c__MB_A2_108 1.038 0.226 4.594 0.000 1.038 25.076
.Frs_C_CO2_C_C_ -0.346 0.233 -1.485 0.137 -0.346 -8.115
.c_Bacilli 0.376 0.135 2.778 0.005 0.376 9.340
.c_Ignavibacter 0.754 0.170 4.424 0.000 0.754 18.796
CN 0.998 0.007 145.158 0.000 0.998 26.502
pH 0.998 0.008 131.642 0.000 0.998 24.034
initial_water 0.998 0.008 125.994 0.000 0.998 23.003
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.c_Negativicuts 0.001 0.000 3.873 0.000 0.001 0.600
.c_c__MB_A2_108 0.001 0.000 3.833 0.000 0.001 0.689
.Frs_C_CO2_C_C_ 0.001 0.000 3.873 0.000 0.001 0.408
.c_Bacilli 0.001 0.000 3.873 0.000 0.001 0.586
.c_Ignavibacter 0.002 0.000 3.873 0.000 0.002 0.936
CN 0.001 0.000 3.873 0.000 0.001 1.000
initial_water 0.002 0.000 3.873 0.000 0.002 1.000
pH 0.002 0.000 3.873 0.000 0.002 1.000
R-Square:
Estimate
c_Negativicuts 0.400
c_c__MB_A2_108 0.311
Frs_C_CO2_C_C_ 0.592
c_Bacilli 0.414
c_Ignavibacter 0.064
Warning message:
In lav_model_hessian(lavmodel = lavmodel, lavsamplestats = lavsamplestats, :
lavaan WARNING: Hessian is not fully symmetric. Max diff = 5.15131396241486e-05
This example is taken from ?semPaths since we don't have your object.
library('semPlot')
modFile <- tempfile(fileext = '.OUT')
download.file('http://sachaepskamp.com/files/mi1.OUT', modFile)
Use semPlotModel to get the object without plotting. There you can inspect what is to be plotted. I just dug around without reading the docs until I found what it seems to be using.
After you run semPlotModel, the object has a slot x@Pars which contains the edges, the nodes, and the std values that are used for the edge labels in your case. semPaths also has an argument (edgeLabels) that lets you make custom edge labels, so you can take the data you need from x@Pars and add your p-values:
x <- semPlotModel(modFile)
x@Pars
# label lhs edge rhs est std group fixed par
# 1 lambda[11]^{(y)} perfIQ -> pc 1.000 0.6219648 Group 1 TRUE 0
# 2 lambda[21]^{(y)} perfIQ -> pa 0.923 0.5664888 Group 1 FALSE 1
# 3 lambda[31]^{(y)} perfIQ -> oa 1.098 0.6550159 Group 1 FALSE 2
# 4 lambda[41]^{(y)} perfIQ -> ma 0.784 0.4609990 Group 1 FALSE 3
# 5 theta[11]^{(epsilon)} pc <-> pc 5.088 0.6131598 Group 1 FALSE 5
# 10 theta[22]^{(epsilon)} pa <-> pa 5.787 0.6790905 Group 1 FALSE 6
# 15 theta[33]^{(epsilon)} oa <-> oa 5.150 0.5709541 Group 1 FALSE 7
# 20 theta[44]^{(epsilon)} ma <-> ma 7.311 0.7874800 Group 1 FALSE 8
# 21 psi[11] perfIQ <-> perfIQ 3.210 1.0000000 Group 1 FALSE 4
# 22 tau[1]^{(y)} int pc 10.500 NA Group 1 FALSE 9
# 23 tau[2]^{(y)} int pa 10.374 NA Group 1 FALSE 10
# 24 tau[3]^{(y)} int oa 10.663 NA Group 1 FALSE 11
# 25 tau[4]^{(y)} int ma 10.371 NA Group 1 FALSE 12
# 11 lambda[11]^{(y)} perfIQ -> pc 1.000 0.6515609 Group 2 TRUE 0
# 27 lambda[21]^{(y)} perfIQ -> pa 0.923 0.5876948 Group 2 FALSE 1
# 31 lambda[31]^{(y)} perfIQ -> oa 1.098 0.6981974 Group 2 FALSE 2
# 41 lambda[41]^{(y)} perfIQ -> ma 0.784 0.4621919 Group 2 FALSE 3
# 51 theta[11]^{(epsilon)} pc <-> pc 5.006 0.5754684 Group 2 FALSE 14
# 101 theta[22]^{(epsilon)} pa <-> pa 5.963 0.6546148 Group 2 FALSE 15
# 151 theta[33]^{(epsilon)} oa <-> oa 4.681 0.5125204 Group 2 FALSE 16
# 201 theta[44]^{(epsilon)} ma <-> ma 8.356 0.7863786 Group 2 FALSE 17
# 211 psi[11] perfIQ <-> perfIQ 3.693 1.0000000 Group 2 FALSE 13
# 221 tau[1]^{(y)} int pc 10.500 NA Group 2 FALSE 9
# 231 tau[2]^{(y)} int pa 10.374 NA Group 2 FALSE 10
# 241 tau[3]^{(y)} int oa 10.663 NA Group 2 FALSE 11
# 251 tau[4]^{(y)} int ma 10.371 NA Group 2 FALSE 12
# 26 alpha[1] int perfIQ -2.469 NA Group 2 FALSE 18
As you can see, there are more edge labels than edges plotted, and I have no idea how it chooses which to use. So I am just taking the first four from each group, since there are four edges shown and the stds match those. Maybe there is an option to plot all of them or to select the ones you need; I haven't read the docs.
## take first four stds from each group, generate some p-values
l <- sapply(split(x@Pars$std, x@Pars$group), function(x) head(x, 4))
set.seed(1)
l <- sprintf('%.3f, p=%s', l, format.pval(runif(length(l)), digits = 2))
l
# [1] "0.622, p=0.27" "0.566, p=0.37" "0.655, p=0.57" "0.461, p=0.91" "0.652, p=0.20" "0.588, p=0.90" "0.698, p=0.94" "0.462, p=0.66"
Then you can plot the object with your new labels by passing edgeLabels = l:
layout(1:2)
semPaths(
x,
edgeLabels = l,
ask = FALSE, title = FALSE,
what = 'std',
whatLabels = 'std',
style = 'ram',
edge.label.cex = 1.3,
layout = 'tree',
intercepts = FALSE,
residuals = FALSE,
sizeMan = 7
)
With help from @rawr, I have worked it out. If anybody else needs to include estimates and p-values from lavaan in their semPaths plot, here is how it can be done.
library(lavaan)    # parameterEstimates()
library(magrittr)  # %>% pipe
#extracting the parameters from the sem model and selecting the interactions relevant for the semPaths (here, I need 12 estimates and p-values)
table2 <- parameterEstimates(fitmod.bac.class2, standardized = TRUE) %>% head(12)
#turning the chosen parameters into text
# one label per row: std.all, a line break, then the p-value (both to 3 decimals)
b <- gettextf('%.3f \n p=%.3f', table2$std.all, table2$pvalue)
I can honestly say that I do not fully understand how the last bit of script works. It is adapted from rawr's answer, followed by a lot of trial and error until it worked. There might (quite possibly) be a nicer way to write it, but it works :)
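Since the formatting line is the unclear part, here is a minimal base-R illustration of what it does. gettextf() passes its format string and values on to sprintf(), after looking up a translation of the format. The format is applied element-wise to the vectors, so you get one label per parameter row. A `digits =` name in such a call is not a special argument; the p-values are simply the second value filled into the format. The numbers below are copied from the lavaan output in the question for illustration.

```r
# Standardized estimates and p-values copied from the question's lavaan output
std_all <- c(0.416, 0.195, -0.388)
pvalue  <- c(0.003, 0.096, 0.054)

# One label per element: estimate on the first line, p-value on the second
labels <- sprintf("%.3f\np=%.3f", std_all, pvalue)
labels

# The same call through gettextf(), which forwards its values to sprintf()
identical(labels, gettextf("%.3f\np=%.3f", std_all, pvalue))
```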
#putting that list into edgeLabels in sempaths
semPaths(fitmod.bac.class2,
what = "std",
edgeLabels = b,
style="ram",
edge.label.cex = 1,
layout = 'tree',
intercepts=FALSE,
residuals=FALSE,
nodeLabels = c("Negati-\nvicutes","cand_class\n_MB_A2_108", "CO2", "Bacilli","Ignavi-\nbacteria","C/N", "pH","Water\ncontent"),
sizeMan=7
)
Here is a small but relevant improvement to the answer above.
The code above requires inspecting the parameter table to count how many rows to keep, as in %>% head(12).
Instead, we can drop from the extracted parameter table those rows whose lhs and rhs are equal (the variances and residual variances).
#extracting the parameters from the sem model and selecting the interactions relevant for the semPaths
table2 <- as.data.frame(parameterEstimates(fitmod.bac.class2, standardized = TRUE))
# keep only rows where lhs and rhs differ, i.e. drop (residual) variances
table2 <- table2[table2$lhs != table2$rhs, ]
If the model syntax also contains extra lines, such as those defined with ':=', they will also appear in the parameter table and should be removed.
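One way to drop those rows without counting is to filter on the op column as well. Keep only regression (~), loading (=~) and covariance (~~) rows whose lhs differs from rhs. The sketch uses a small hypothetical table with the same lhs/op/rhs/pvalue column names that lavaan's parameterEstimates() returns. With meanstructure = TRUE (as in the question), intercepts appear with op "~1" and an empty rhs, so the lhs != rhs test alone would keep them. Filtering on op removes them and any := rows. Whether the remaining row order matches the edge order semPaths uses should still be checked on the plot.

```r
# Hypothetical miniature of a lavaan parameter table (same column names)
pe <- data.frame(
  lhs    = c("c_Negativicutes", "CN", "CN", "c_Bacilli", "ind"),
  op     = c("~", "~~", "~~", "~1", ":="),
  rhs    = c("CN", "initial_water", "CN", "", "a*b"),
  pvalue = c(0.003, 0.007, 0.000, 0.005, 0.020),
  stringsAsFactors = FALSE
)

# Keep drawn paths only: regressions, loadings and covariances between
# different variables (drops variances, intercepts and := definitions)
keep  <- pe$op %in% c("~", "=~", "~~") & pe$lhs != pe$rhs
edges <- pe[keep, ]
edges
```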
The rest stays the same:
#turning the chosen parameters into text
b <- gettextf('%.3f \n p=%.3f', table2$std.all, table2$pvalue)
#putting that list into edgeLabels in sempaths
semPaths(fitmod.bac.class2,
what = "std",
edgeLabels = b,
style="ram",
edge.label.cex = 1,
layout = 'tree',
intercepts=FALSE,
residuals=FALSE,
nodeLabels = c("Negati-\nvicutes","cand_class\n_MB_A2_108", "CO2", "Bacilli","Ignavi-\nbacteria","C/N", "pH","Water\ncontent"),
sizeMan=7
)

Different outputs using ggpredict for glmer and glmmTMB model

I am trying to predict and graph models with species presence as the response. However, I've run into the following problem: the ggpredict outputs are wildly different for the same data in glmer and glmmTMB, even though the estimates and AIC are very similar. These are simplified models including only date (centered and scaled), which seems to be the most problematic variable to predict.
library(lme4)  # glmer()
yntest <- glmer(MYOSOD.P ~ jdate.z + I(jdate.z^2) + I(jdate.z^3) +
(1|area/SiteID), family = binomial, data = sodpYN)
> summary(yntest)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: MYOSOD.P ~ jdate.z + I(jdate.z^2) + I(jdate.z^3) + (1 | area/SiteID)
Data: sodpYN
AIC BIC logLik deviance df.resid
1260.8 1295.1 -624.4 1248.8 2246
Scaled residuals:
Min 1Q Median 3Q Max
-2.0997 -0.3218 -0.2013 -0.1238 9.4445
Random effects:
Groups Name Variance Std.Dev.
SiteID:area (Intercept) 1.6452 1.2827
area (Intercept) 0.6242 0.7901
Number of obs: 2252, groups: SiteID:area, 27; area, 9
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.96778 0.39190 -7.573 3.65e-14 ***
jdate.z -0.72258 0.17915 -4.033 5.50e-05 ***
I(jdate.z^2) 0.10091 0.08068 1.251 0.21102
I(jdate.z^3) 0.25025 0.08506 2.942 0.00326 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) jdat.z I(.^2)
jdate.z 0.078
I(jdat.z^2) -0.222 -0.154
I(jdat.z^3) -0.071 -0.910 0.199
The glmmTMB model + summary:
library(glmmTMB)  # glmmTMB()
Tyntest <- glmmTMB(MYOSOD.P ~ jdate.z + I(jdate.z^2) + I(jdate.z^3) +
(1|area/SiteID), family = binomial("logit"), data = sodpYN)
> summary(Tyntest)
Family: binomial ( logit )
Formula: MYOSOD.P ~ jdate.z + I(jdate.z^2) + I(jdate.z^3) + (1 | area/SiteID)
Data: sodpYN
AIC BIC logLik deviance df.resid
1260.8 1295.1 -624.4 1248.8 2246
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
SiteID:area (Intercept) 1.6490 1.2841
area (Intercept) 0.6253 0.7908
Number of obs: 2252, groups: SiteID:area, 27; area, 9
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.96965 0.39638 -7.492 6.78e-14 ***
jdate.z -0.72285 0.18250 -3.961 7.47e-05 ***
I(jdate.z^2) 0.10096 0.08221 1.228 0.21941
I(jdate.z^3) 0.25034 0.08662 2.890 0.00385 **
---
ggpredict outputs
testg<-ggpredict(yntest, terms ="jdate.z[all]")
> testg
# Predicted probabilities of MYOSOD.P
# x = jdate.z
x predicted std.error conf.low conf.high
-1.95 0.046 0.532 0.017 0.120
-1.51 0.075 0.405 0.036 0.153
-1.03 0.084 0.391 0.041 0.165
-0.58 0.072 0.391 0.035 0.142
-0.14 0.054 0.390 0.026 0.109
0.35 0.039 0.399 0.018 0.082
0.79 0.034 0.404 0.016 0.072
1.72 0.067 0.471 0.028 0.152
Adjusted for:
* SiteID = 0 (population-level)
* area = 0 (population-level)
Standard errors are on link-scale (untransformed).
testgTMB<- ggpredict(Tyntest, "jdate.z[all]")
> testgTMB
# Predicted probabilities of MYOSOD.P
# x = jdate.z
x predicted std.error conf.low conf.high
-1.95 0.444 0.826 0.137 0.801
-1.51 0.254 0.612 0.093 0.531
-1.03 0.136 0.464 0.059 0.280
-0.58 0.081 0.404 0.038 0.163
-0.14 0.054 0.395 0.026 0.110
0.35 0.040 0.402 0.019 0.084
0.79 0.035 0.406 0.016 0.074
1.72 0.040 0.444 0.017 0.091
Adjusted for:
* SiteID = NA (population-level)
* area = NA (population-level)
Standard errors are on link-scale (untransformed).
The estimates are completely different and I have no idea why.
I did try to use both the ggeffects package from CRAN and the developer version in case that changed anything. It did not. I am using the most up to date version of glmmTMB.
This is my first time asking a question here so please let me know if I should provide more information to help explain the problem.
I checked, and the issue is the same when using predict() instead of ggpredict(), which suggests that it is a glmmTMB issue?
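One quick way to tell which prediction is right is to compute the population-level prediction by hand from the fixed effects, which are essentially the same in both fits. This is the inverse logit of the linear predictor with all random effects set to zero. The coefficients below are copied from the two summaries. At jdate.z = -1.95, -0.14 and 1.72, both coefficient sets give about 0.046, 0.054 and 0.067. This matches the glmer/ggpredict output, not the 0.44 and 0.040 reported for glmmTMB at the ends of the range.

```r
# Fixed-effect estimates from the two summaries above
b_glmer   <- c(-2.96778, -0.72258, 0.10091, 0.25025)
b_glmmTMB <- c(-2.96965, -0.72285, 0.10096, 0.25034)

# Population-level prediction: inverse logit of the cubic linear predictor
pop_pred <- function(b, x) plogis(b[1] + b[2] * x + b[3] * x^2 + b[4] * x^3)

x <- c(-1.95, -0.14, 1.72)
round(rbind(glmer = pop_pred(b_glmer, x), glmmTMB = pop_pred(b_glmmTMB, x)), 3)
```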
GLMER:
dayplotg<-expand.grid(jdate.z=seq(min(sodp$jdate.z), max(sodp$jdate.z), length=92))
Dfitg<-predict(yntest, re.form=NA, newdata=dayplotg, type='response')
dayplotg<-data.frame(dayplotg, Dfitg)
head(dayplotg)
> head(dayplotg)
jdate.z Dfitg
1 -1.953206 0.04581691
2 -1.912873 0.04889584
3 -1.872540 0.05195598
4 -1.832207 0.05497553
5 -1.791875 0.05793307
6 -1.751542 0.06080781
glmmTMB:
dayplot<-expand.grid(jdate.z=seq(min(sodp$jdate.z), max(sodp$jdate.z), length=92),
SiteID=NA,
area=NA)
Dfit<-predict(Tyntest, newdata=dayplot, type='response')
head(Dfit)
dayplot<-data.frame(dayplot, Dfit)
head(dayplot)
> head(dayplot)
jdate.z SiteID area Dfit
1 -1.953206 NA NA 0.4458236
2 -1.912873 NA NA 0.4251926
3 -1.872540 NA NA 0.4050944
4 -1.832207 NA NA 0.3855801
5 -1.791875 NA NA 0.3666922
6 -1.751542 NA NA 0.3484646
I contacted the ggpredict developer and figured out that if I used poly(jdate.z,3) rather than jdate.z + I(jdate.z^2) + I(jdate.z^3) in the glmmTMB model, the glmer and glmmTMB predictions were the same.
I'll leave this post up even though I was able to answer my own question in case someone else has this question later.
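Switching to poly() should not change the answer. A cubic written as jdate.z + I(jdate.z^2) + I(jdate.z^3) and one written as poly(jdate.z, 3) span the same columns, so the fitted model and its predictions are identical; only the coefficients differ. The sketch below checks this on simulated data with an ordinary glm (no random effects; the variable names and data are hypothetical). When two equivalent parameterizations give different predictions, as in the question, that points to how predictions are computed for new data rather than to the model fit.

```r
set.seed(42)

# Simulated presence/absence data with a cubic trend in a scaled date (hypothetical)
n <- 500
d <- data.frame(x = runif(n, -2, 2))
d$y <- rbinom(n, 1, plogis(-3 - 0.7 * d$x + 0.1 * d$x^2 + 0.25 * d$x^3))

# Same cubic in two parameterizations; tight convergence so the fits match closely
ctrl <- glm.control(epsilon = 1e-12)
fit_raw  <- glm(y ~ x + I(x^2) + I(x^3), family = binomial, data = d, control = ctrl)
fit_poly <- glm(y ~ poly(x, 3), family = binomial, data = d, control = ctrl)

# Predictions on a new grid; predict() re-uses the stored poly() basis for newdata
grid   <- data.frame(x = seq(-2, 2, length.out = 20))
p_raw  <- predict(fit_raw,  newdata = grid, type = "response")
p_poly <- predict(fit_poly, newdata = grid, type = "response")

# Coefficients differ, but the fitted curves should agree to numerical precision
max(abs(p_raw - p_poly))
```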

Can I use emmeans with LME model?

I am using an LME model defined like this:
library(nlme)  # lme()
mod4.lme <- lme(pRNFL ~ Init.Age + Status + I(Time^2), random = ~1|Patient/EyeID, data = long1, na.action = na.omit)
The output is:
> summary(mod4.lme)
Linear mixed-effects model fit by REML
Data: long1
AIC BIC logLik
2055.295 2089.432 -1018.647
Random effects:
Formula: ~1 | Patient
(Intercept)
StdDev: 7.949465
Formula: ~1 | EyeID %in% Patient
(Intercept) Residual
StdDev: 12.10405 2.279917
Fixed effects: pRNFL ~ Init.Age + Status + I(Time^2)
Value Std.Error DF t-value p-value
(Intercept) 97.27827 6.156093 212 15.801950 0.0000
Init.Age 0.02114 0.131122 57 0.161261 0.8725
StatusA -27.32643 3.762155 212 -7.263504 0.0000
StatusF -23.31652 3.984353 212 -5.852023 0.0000
StatusN -0.28814 3.744980 57 -0.076940 0.9389
I(Time^2) -0.06498 0.030223 212 -2.149921 0.0327
Correlation:
(Intr) Int.Ag StatsA StatsF StatsN
Init.Age -0.921
StatusA -0.317 0.076
StatusF -0.314 0.088 0.834
StatusN -0.049 -0.216 0.390 0.365
I(Time^2) -0.006 -0.004 0.001 -0.038 -0.007
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.3565641 -0.4765840 0.0100608 0.4670792 2.7775392
Number of Observations: 334
Number of Groups:
Patient EyeID %in% Patient
60 119
I wanted to get comparisons between the levels of my 'Status' factor (A, N, F and H), so I ran emmeans using this code:
library(emmeans)
emmeans(mod4.lme, pairwise ~ Status, adjust="bonferroni")
The output is:
> emmeans(mod4.lme, pairwise ~ Status, adjust="bonferroni")
$emmeans
Status emmean SE df lower.CL upper.CL
H 98.13515 2.402248 57 93.32473 102.94557
A 70.80872 2.930072 57 64.94135 76.67609
F 74.81863 3.215350 57 68.38000 81.25726
N 97.84701 2.829706 57 92.18062 103.51340
Degrees-of-freedom method: containment
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
H - A 27.3264289 3.762155 212 7.264 <.0001
H - F 23.3165220 3.984353 212 5.852 <.0001
H - N 0.2881375 3.744980 57 0.077 1.0000
A - F -4.0099069 2.242793 212 -1.788 0.4513
A - N -27.0382913 4.145370 57 -6.523 <.0001
F - N -23.0283844 4.359019 57 -5.283 <.0001
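The answer is yes: emmeans supports models fitted with nlme::lme and computes the estimated marginal means and contrasts from the fitted model's fixed effects and their covariance matrix.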
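To see that the emmeans results follow directly from the model, the pairwise estimates can be reproduced from the fixed effects in the summary. H is the reference level, so each Status coefficient is that level minus H. The Bonferroni p-value is the unadjusted two-sided t p-value multiplied by the number of comparisons (6) and capped at 1. All numbers are copied from the output above.

```r
# Status coefficients from summary(mod4.lme): level minus reference level H
coef_status <- c(A = -27.32643, F = -23.31652, N = -0.28814)

# Estimated marginal means: H from emmeans, other levels shifted by their coefficients
emm <- c(H = 98.13515, 98.13515 + coef_status)
round(emm, 4)

# Contrast A - F is the difference of the two coefficients
a_minus_f <- unname(coef_status["A"] - coef_status["F"])

# Bonferroni adjustment over 6 pairwise comparisons, using SE and df from emmeans
t_af    <- a_minus_f / 2.242793
p_unadj <- 2 * pt(-abs(t_af), df = 212)
p_bonf  <- min(1, 6 * p_unadj)
round(c(estimate = a_minus_f, t = t_af, p_bonferroni = p_bonf), 4)
```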

