R lavaan sem(): categorical variable, no standard errors

I want to compute a structural equation model with the sem() function in R with the package lavaan.
There are two categorical variables, one latent exogenous and one latent endogenous, that I want to include in the final version of the model.
When I include one of the categorical variables in the model, however, R produces the following warnings:
1: In estimateVCOV(lavaanModel, samplestats = lavaanSampleStats,
options = lavaanOptions, : lavaan WARNING: could not compute
standard errors!
2: In computeTestStatistic(lavaanModel, partable = lavaanParTable, :
lavaan WARNING: could not compute scaled test statistic
Code used:
model1 <- '
Wertschaetzung_Essen =~ abwechslungsreiche_M + schnell_zubereitbar + koche_sehr_gerne + koche_sehr_haeufig
Fleischverzicht =~ Ern_Index1
Fleischverzicht ~ Wertschaetzung_Essen
'
fit_model1 <- sem(model1, data=survey2_subset, ordered = c("Ern_Index1"))
Note: This is only a small version of the final model, in which I introduce only one categorical variable. The warning, however, is the same for more complex versions of the model.
Output
str(survey2_subset):
'data.frame': 3676 obs. of 116 variables:
$ abwechslungsreiche_M : num 4 2 3 4 3 3 4 3 3 3 ...
$ schnell_zubereitbar : num 0 3 2 0 0 1 3 2 1 1 ...
$ koche_sehr_gerne : num 1 3 3 1 3 1 4 4 4 3 ...
$ koche_sehr_haeufig : num 2 2 3 NA 3 2 2 4 3 3 ...
$ Ern_Index1 : num 1 1 1 1 0 0 1 0 1 0 ...
summary(fit_model1, fit.measures = TRUE, standardized=TRUE)
lavaan (0.5-15) converged normally after 31 iterations
Used Total
Number of observations 3469 3676
Estimator DWLS Robust
Minimum Function Test Statistic 13.716 NA
Degrees of freedom 4 4
P-value (Chi-square) 0.008 NA
Scaling correction factor NA
Shift parameter
for simple second-order correction (Mplus variant)
Model test baseline model:
Minimum Function Test Statistic 2176.159 1582.139
Degrees of freedom 10 10
P-value 0.000 0.000
User model versus baseline model:
Comparative Fit Index (CFI) 0.996 NA
Tucker-Lewis Index (TLI) 0.989 NA
Root Mean Square Error of Approximation:
RMSEA 0.026 NA
90 Percent Confidence Interval 0.012 0.042 NA NA
P-value RMSEA <= 0.05 0.994 NA
Parameter estimates:
Information Expected
Standard Errors Robust.sem
Estimate Std.err Z-value P(>|z|) Std.lv Std.all
Latent variables:
Wertschaetzung_Essen =~
abwchslngsr_M 1.000 0.363 0.436
schnll_zbrtbr 1.179 0.428 0.438
koche_shr_grn 2.549 0.925 0.846
koche_shr_hfg 2.530 0.918 0.775
Fleischverzicht =~
Ern_Index1 1.000 0.249 0.249
Regressions:
Fleischverzicht ~
Wrtschtzng_Es 0.302 0.440 0.440
Intercepts:
abwchslngsr_M 3.133 3.133 3.760
schnll_zbrtbr 1.701 1.701 1.741
koche_shr_grn 2.978 2.978 2.725
koche_shr_hfg 2.543 2.543 2.148
Wrtschtzng_Es 0.000 0.000 0.000
Fleischvrzcht 0.000 0.000 0.000
Thresholds:
Ern_Index1|t1 0.197 0.197 0.197
Variances:
abwchslngsr_M 0.562 0.562 0.810
schnll_zbrtbr 0.771 0.771 0.808
koche_shr_grn 0.339 0.339 0.284
koche_shr_hfg 0.559 0.559 0.399
Ern_Index1 0.938 0.938 0.938
Wrtschtzng_Es 0.132 1.000 1.000
Fleischvrzcht 0.050 0.806 0.806
Is the model not identified? There should be enough degrees of freedom and the loadings of the first manifest items are set to one.
How can I resolve this issue?

My first thought was:
You can't have missing values in the data frame, because with categorical variables WLSMV is used, and FIML (missing = "ml") is only available with ML estimation. Perhaps that's the problem.
Also: does lavaan automatically fix the residual variance of "Fleischverzicht" to 0 (or some other value)? A single-item latent variable would not be identified without that, I think.
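The summary above reports 3469 used out of 3676 total observations, which suggests that rows with a missing value on any model variable were dropped listwise rather than causing the fit to fail. A minimal base-R sketch for checking this count, using the first five rows printed by str(survey2_subset) plus one hypothetical extra column (other_column) that is not part of the model:

```r
# Variables that appear in model1
model_vars <- c("abwechslungsreiche_M", "schnell_zubereitbar",
                "koche_sehr_gerne", "koche_sehr_haeufig", "Ern_Index1")

# First five rows as printed by str(survey2_subset);
# other_column is hypothetical and not used by the model
toy <- data.frame(
  abwechslungsreiche_M = c(4, 2, 3, 4, 3),
  schnell_zubereitbar  = c(0, 3, 2, 0, 0),
  koche_sehr_gerne     = c(1, 3, 3, 1, 3),
  koche_sehr_haeufig   = c(2, 2, 3, NA, 3),
  Ern_Index1           = c(1, 1, 1, 1, 0),
  other_column         = c(NA, NA, 1, 2, 3)
)

# Rows that listwise deletion on the model variables would keep
used <- complete.cases(toy[, model_vars])
sum(used)                 # 4: only row 4 (NA in koche_sehr_haeufig) is dropped
# Checking every column instead would (wrongly) drop rows 1, 2 and 4 as well
sum(complete.cases(toy))  # 2
```

Only the variables named in the model syntax matter for the count, so an NA in an unrelated column of survey2_subset should not reduce the number of used observations.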

Related

How do I include p-value and R-square for the estimates in semPaths?

I am using semPaths (semPlot package) to draw my structural equation models. After some trial and error, I have a pretty good script that shows what I want. However, I haven't been able to figure out how to include the p-values/significance levels of the estimates (regression coefficients) in the figure.
Can I include significance levels, and if so, how: e.g. as a p-value in the edge label below the estimate, as a broken line for non-significant paths, or …?
I am also interested in including the R-squared values, but that is less critical than the significance levels.
This is the script I am using so far:
semPaths(fitmod.bac.class2,
what = "std",
whatLabels = "std",
style="ram",
edge.label.cex = 1.3,
layout = 'tree',
intercepts=FALSE,
residuals=FALSE,
nodeLabels = c("Negati-\nvicutes","cand_class\n_MB_A2_108", "CO2", "Bacilli","Ignavi-\nbacteria","C/N", "pH","Water\ncontent"),
sizeMan=7 )
[Figure: example of one of the semPaths outputs]
In this example the following are not significant:
Ignavibacteria -> First_C_CO2_ugC_gC_day, p = 0.096
pH -> Ignavibacteria, p = 0.151
cand_class_MB_A2_108 <-> Bacilli correlation, p = 0.054
I am an R user and not really a coder, so I might just be missing a crucial point in the arguments.
I am testing a lot of different models at the moment, and would really like not to have to draw them all up by hand.
Update:
Using semPlotModel: Am I right in understanding that semPlotModel doesn't include the significance levels from the sem function (see my script and output below)? I am specifically looking to include the P(>|z|) for regressions and covariances.
Is it just me missing it, or is it not included? If it is not included, my solution is simply to customize the edge labels.
{model.NA.UP.bac.class2 <- '
#LATENT VARIABLES
#REGRESSIONS
#soil organic carbon quality
c_Negativicutes ~ CN
#microorganisms
First_C_CO2_ugC_gC_day ~ c_Bacilli
First_C_CO2_ugC_gC_day ~ c_Ignavibacteria
First_C_CO2_ugC_gC_day ~ c_cand_class_MB_A2_108
First_C_CO2_ugC_gC_day ~ c_Negativicutes
#pH
c_Bacilli ~pH
c_Ignavibacteria ~pH
c_cand_class_MB_A2_108~pH
c_Negativicutes ~pH
#COVARIANCE
initial_water ~~ CN
c_cand_class_MB_A2_108 ~~ c_Bacilli
'
fitmod.bac.class2 <- sem(model.NA.UP.bac.class2, data=datapNA.UP.log, missing="ml", meanstructure=TRUE, fixed.x=FALSE, std.lv=FALSE, std.ov=FALSE)
summary(fitmod.bac.class2, standardized=TRUE, fit.measures=TRUE, rsq=TRUE)
out <- capture.output(summary(fitmod.bac.class2, standardized=TRUE, fit.measures=TRUE, rsq=TRUE))
}
Output:
lavaan 0.6-5 ended normally after 188 iterations
Estimator ML
Optimization method NLMINB
Number of free parameters 28
Number of observations 30
Number of missing patterns 1
Model Test User Model:
Test statistic 17.816
Degrees of freedom 16
P-value (Chi-square) 0.335
Model Test Baseline Model:
Test statistic 101.570
Degrees of freedom 28
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.975
Tucker-Lewis Index (TLI) 0.957
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) 472.465
Loglikelihood unrestricted model (H1) 481.373
Akaike (AIC) -888.930
Bayesian (BIC) -849.697
Sample-size adjusted Bayesian (BIC) -936.875
Root Mean Square Error of Approximation:
RMSEA 0.062
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.185
P-value RMSEA <= 0.05 0.414
Standardized Root Mean Square Residual:
SRMR 0.107
Parameter Estimates:
Information Observed
Observed information based on Hessian
Standard errors Standard
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
c_Negativicutes ~
CN 0.419 0.143 2.939 0.003 0.419 0.416
c_cand_class_MB_A2_108 ~
CN -0.433 0.160 -2.707 0.007 -0.433 -0.394
First_C_CO2_ugC_gC_day ~
c_Bacilli 0.525 0.128 4.092 0.000 0.525 0.496
c_Ignavibacter 0.207 0.124 1.667 0.096 0.207 0.195
c_c__MB_A2_108 0.310 0.125 2.475 0.013 0.310 0.301
c_Negativicuts 0.304 0.137 2.220 0.026 0.304 0.271
c_Bacilli ~
pH 0.624 0.135 4.604 0.000 0.624 0.643
c_Ignavibacteria ~
pH 0.245 0.171 1.436 0.151 0.245 0.254
c_cand_class_MB_A2_108 ~
pH 0.393 0.151 2.597 0.009 0.393 0.394
c_Negativicutes ~
pH 0.435 0.129 3.361 0.001 0.435 0.476
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
CN ~~
initial_water 0.001 0.000 2.679 0.007 0.001 0.561
.c_cand_class_MB_A2_108 ~~
.c_Bacilli -0.000 0.000 -1.923 0.054 -0.000 -0.388
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.c_Negativicuts 0.145 0.198 0.734 0.463 0.145 3.826
.c_c__MB_A2_108 1.038 0.226 4.594 0.000 1.038 25.076
.Frs_C_CO2_C_C_ -0.346 0.233 -1.485 0.137 -0.346 -8.115
.c_Bacilli 0.376 0.135 2.778 0.005 0.376 9.340
.c_Ignavibacter 0.754 0.170 4.424 0.000 0.754 18.796
CN 0.998 0.007 145.158 0.000 0.998 26.502
pH 0.998 0.008 131.642 0.000 0.998 24.034
initial_water 0.998 0.008 125.994 0.000 0.998 23.003
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.c_Negativicuts 0.001 0.000 3.873 0.000 0.001 0.600
.c_c__MB_A2_108 0.001 0.000 3.833 0.000 0.001 0.689
.Frs_C_CO2_C_C_ 0.001 0.000 3.873 0.000 0.001 0.408
.c_Bacilli 0.001 0.000 3.873 0.000 0.001 0.586
.c_Ignavibacter 0.002 0.000 3.873 0.000 0.002 0.936
CN 0.001 0.000 3.873 0.000 0.001 1.000
initial_water 0.002 0.000 3.873 0.000 0.002 1.000
pH 0.002 0.000 3.873 0.000 0.002 1.000
R-Square:
Estimate
c_Negativicuts 0.400
c_c__MB_A2_108 0.311
Frs_C_CO2_C_C_ 0.592
c_Bacilli 0.414
c_Ignavibacter 0.064
Warning message:
In lav_model_hessian(lavmodel = lavmodel, lavsamplestats = lavsamplestats, :
lavaan WARNING: Hessian is not fully symmetric. Max diff = 5.15131396241486e-05
This example is taken from ?semPaths since we don't have your object.
library('semPlot')
modFile <- tempfile(fileext = '.OUT')
download.file('http://sachaepskamp.com/files/mi1.OUT', modFile)
Use semPlotModel to get the object without plotting; you can then inspect what is to be plotted. I just dug around, without reading the docs, until I found what it seems to be using.
After you run semPlotModel, the object has a slot x@Pars that contains the edges, the nodes, and the std values used for the edge labels in your case. semPaths also has an argument for custom edge labels, so you can take the data you need from x@Pars and add your p-values:
x <- semPlotModel(modFile)
x@Pars
# label lhs edge rhs est std group fixed par
# 1 lambda[11]^{(y)} perfIQ -> pc 1.000 0.6219648 Group 1 TRUE 0
# 2 lambda[21]^{(y)} perfIQ -> pa 0.923 0.5664888 Group 1 FALSE 1
# 3 lambda[31]^{(y)} perfIQ -> oa 1.098 0.6550159 Group 1 FALSE 2
# 4 lambda[41]^{(y)} perfIQ -> ma 0.784 0.4609990 Group 1 FALSE 3
# 5 theta[11]^{(epsilon)} pc <-> pc 5.088 0.6131598 Group 1 FALSE 5
# 10 theta[22]^{(epsilon)} pa <-> pa 5.787 0.6790905 Group 1 FALSE 6
# 15 theta[33]^{(epsilon)} oa <-> oa 5.150 0.5709541 Group 1 FALSE 7
# 20 theta[44]^{(epsilon)} ma <-> ma 7.311 0.7874800 Group 1 FALSE 8
# 21 psi[11] perfIQ <-> perfIQ 3.210 1.0000000 Group 1 FALSE 4
# 22 tau[1]^{(y)} int pc 10.500 NA Group 1 FALSE 9
# 23 tau[2]^{(y)} int pa 10.374 NA Group 1 FALSE 10
# 24 tau[3]^{(y)} int oa 10.663 NA Group 1 FALSE 11
# 25 tau[4]^{(y)} int ma 10.371 NA Group 1 FALSE 12
# 11 lambda[11]^{(y)} perfIQ -> pc 1.000 0.6515609 Group 2 TRUE 0
# 27 lambda[21]^{(y)} perfIQ -> pa 0.923 0.5876948 Group 2 FALSE 1
# 31 lambda[31]^{(y)} perfIQ -> oa 1.098 0.6981974 Group 2 FALSE 2
# 41 lambda[41]^{(y)} perfIQ -> ma 0.784 0.4621919 Group 2 FALSE 3
# 51 theta[11]^{(epsilon)} pc <-> pc 5.006 0.5754684 Group 2 FALSE 14
# 101 theta[22]^{(epsilon)} pa <-> pa 5.963 0.6546148 Group 2 FALSE 15
# 151 theta[33]^{(epsilon)} oa <-> oa 4.681 0.5125204 Group 2 FALSE 16
# 201 theta[44]^{(epsilon)} ma <-> ma 8.356 0.7863786 Group 2 FALSE 17
# 211 psi[11] perfIQ <-> perfIQ 3.693 1.0000000 Group 2 FALSE 13
# 221 tau[1]^{(y)} int pc 10.500 NA Group 2 FALSE 9
# 231 tau[2]^{(y)} int pa 10.374 NA Group 2 FALSE 10
# 241 tau[3]^{(y)} int oa 10.663 NA Group 2 FALSE 11
# 251 tau[4]^{(y)} int ma 10.371 NA Group 2 FALSE 12
# 26 alpha[1] int perfIQ -2.469 NA Group 2 FALSE 18
As you can see, there are more parameters than edges plotted, and I have no idea how it chooses which ones to use, so I am just taking the first four from each group (there are four edges shown, and the stds match those). Maybe there is an option to plot all of them or to select the ones you need; I haven't read the docs.
## take first four stds from each group, generate some p-values
l <- sapply(split(x@Pars$std, x@Pars$group), function(x) head(x, 4))
set.seed(1)
l <- sprintf('%.3f, p=%s', l, format.pval(runif(length(l)), digits = 2))
l
# [1] "0.622, p=0.27" "0.566, p=0.37" "0.655, p=0.57" "0.461, p=0.91" "0.652, p=0.20" "0.588, p=0.90" "0.698, p=0.94" "0.462, p=0.66"
Then you can plot the object with your new labels via edgeLabels = l:
layout(1:2)
semPaths(
x,
edgeLabels = l,
ask = FALSE, title = FALSE,
what = 'std',
whatLabels = 'std',
style = 'ram',
edge.label.cex = 1.3,
layout = 'tree',
intercepts = FALSE,
residuals = FALSE,
sizeMan = 7
)
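The split()/head() step in the answer above does not depend on semPlot itself. A base-R sketch, with the std values and group labels copied from the first five rows of each group in the x@Pars listing, shows which values end up as labels:

```r
# std values and groups copied from the first five rows per group of x@Pars
pars <- data.frame(
  std   = c(0.6219648, 0.5664888, 0.6550159, 0.4609990, 0.6131598,
            0.6515609, 0.5876948, 0.6981974, 0.4621919, 0.5754684),
  group = rep(c("Group 1", "Group 2"), each = 5),
  stringsAsFactors = FALSE
)

# One column per group, holding that group's first four std values
l <- sapply(split(pars$std, pars$group), function(v) head(v, 4))

# Flatten column by column (Group 1 first) and format to 3 decimals
labels <- sprintf("%.3f", as.vector(l))
labels
# "0.622" "0.566" "0.655" "0.461" "0.652" "0.588" "0.698" "0.462"
```

These are the same numbers that appear before ", p=" in the labels printed above; taking the first four rows per group only works because the four loadings happen to come first in the table.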
With help from @rawr, I have worked it out. If anybody else needs to include estimates and p-values from lavaan in their semPaths, here is how it can be done.
#extracting the parameters from the sem model and selecting the interactions relevant for the semPaths (here, I need 12 estimates and p-values)
table2<-parameterEstimates(fitmod.bac.class2,standardized=TRUE) %>% head(12)
#turning the chosen parameters into text
b<-gettextf('%.3f \n p=%.3f', table2$std.all, table2$pvalue)
I can honestly say that I do not understand how the last bit of script works. It is adapted from rawr's answer after a lot of trial and error until it worked. There might (quite possibly) be a nicer way to write it, but it works :)
#putting that list into edgeLabels in sempaths
semPaths(fitmod.bac.class2,
what = "std",
edgeLabels = b,
style="ram",
edge.label.cex = 1,
layout = 'tree',
intercepts=FALSE,
residuals=FALSE,
nodeLabels = c("Negati-\nvicutes","cand_class\n_MB_A2_108", "CO2", "Bacilli","Ignavi-\nbacteria","C/N", "pH","Water\ncontent"),
sizeMan=7
)
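What the labelling line does: gettextf() passes its format string and the remaining arguments to sprintf(), which fills each %.3f with one value rounded to three decimals and works element-wise along the vectors, giving one label per row of table2. A base-R sketch using the std.all and p-values of two regressions from the summary above (CN to c_Negativicutes, c_Ignavibacteria to First_C_CO2_ugC_gC_day):

```r
# std.all and p-values for two paths, copied from the summary output above
std_all <- c(0.416, 0.195)
pvalue  <- c(0.003, 0.096)

# gettextf() hands its format string to sprintf(), so both calls agree
b1 <- gettextf("%.3f\np=%.3f", std_all, pvalue)
b2 <- sprintf("%.3f\np=%.3f", std_all, pvalue)
identical(b1, b2)  # TRUE
b2
# "0.416\np=0.003" "0.195\np=0.096"
```

The "\n" inside each label is a line break, which is why semPaths draws the p-value below the estimate.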
A small but relevant improvement to the answer above.
The code above requires inspecting the parameter table to count how many rows to keep in the head() call.
Instead, we can drop from the extracted parameter table the rows whose lhs and rhs are equal (the variances).
#extracting the parameters from the sem model and selecting the interactions relevant for the semPaths
table2<-parameterEstimates(fitmod.bac.class2,standardized=TRUE)%>%as.data.frame()
table2<-table2[table2$lhs!=table2$rhs,]
If the model syntax also contains extra lines, such as defined parameters (:=), they also appear in the parameter table and should be removed.
The rest stays the same:
#turning the chosen parameters into text
b<-gettextf('%.3f \n p=%.3f', table2$std.all, table2$pvalue)
#putting that list into edgeLabels in sempaths
semPaths(fitmod.bac.class2,
what = "std",
edgeLabels = b,
style="ram",
edge.label.cex = 1,
layout = 'tree',
intercepts=FALSE,
residuals=FALSE,
nodeLabels = c("Negati-\nvicutes","cand_class\n_MB_A2_108", "CO2", "Bacilli","Ignavi-\nbacteria","C/N", "pH","Water\ncontent"),
sizeMan=7
)
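One caveat with the lhs != rhs filter: with meanstructure = TRUE, as in the model above, parameterEstimates() also returns intercept rows (op "~1", empty rhs), and that filter keeps them. A base-R sketch that also filters on the op column, using a hypothetical mini parameter table (the first five rows mimic values from the summary output above; the := row and its name indirect are made up for illustration):

```r
# Hypothetical mini version of parameterEstimates(fit, standardized = TRUE)
pe <- data.frame(
  lhs     = c("c_Negativicutes", "First_C_CO2_ugC_gC_day", "CN",
              "c_Negativicutes", "c_Negativicutes", "indirect"),
  op      = c("~", "~", "~~", "~~", "~1", ":="),
  rhs     = c("CN", "c_Bacilli", "initial_water",
              "c_Negativicutes", "", "a*b"),
  std.all = c(0.416, 0.496, 0.561, 0.600, 3.826, NA),
  pvalue  = c(0.003, 0.000, 0.007, 0.000, 0.463, NA),
  stringsAsFactors = FALSE
)

# Keep regressions (~) and covariances (~~) but drop variances (lhs == rhs);
# intercepts (~1) and defined parameters (:=) are removed by the op test
keep  <- pe$op %in% c("~", "~~") & pe$lhs != pe$rhs
edges <- pe[keep, ]

labels <- sprintf("%.3f\np=%.3f", edges$std.all, edges$pvalue)
labels
# "0.416\np=0.003" "0.496\np=0.000" "0.561\np=0.007"
```

That semPaths() draws its edges in the same order as these rows is an assumption the answers above also rely on, so it is worth checking the plot against the table.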

Why am I getting different predicted probabilities on random forest rf$votes vs. predict()?

I ran randomForest on a dataset with a binary outcome and want the predicted probabilities (on the same dataset; I don't need a separate train/test split for this). I was expecting the values for p1 and p2 below to be the same, but clearly they are not. I haven't been able to find a clear description of how they differ. Any help would be appreciated.
library(randomForest)
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
rf = randomForest(factor(admit)~., data = mydata)
p1 = predict(rf, mydata[,c(2:4)], type = "prob")
p2 <- rf$votes
> head(p1)
0 1
1 0.926 0.074
2 0.584 0.416
3 0.166 0.834
4 0.722 0.278
5 0.968 0.032
6 0.258 0.742
> head(p2)
0 1
1 0.8324324 0.16756757
2 0.7663043 0.23369565
3 0.2447917 0.75520833
4 0.9695431 0.03045685
5 0.9264706 0.07352941
6 0.3351351 0.66486486
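According to the randomForest documentation, rf$votes holds out-of-bag (OOB) votes: each row is voted on only by the trees whose bootstrap sample did not contain it (roughly 37% of the trees). predict() with newdata lets every tree vote, including the trees grown on that very row, so the two generally differ, and the all-trees version is optimistic on the training data. A base-R sketch with a hypothetical vote matrix (3 observations, 4 trees) and a hypothetical in-bag matrix illustrates the two averages:

```r
# Hypothetical per-tree votes for class "1" (1 = vote for "1", 0 = vote for "0")
tree_votes <- matrix(c(1, 1, 0, 1,
                       0, 0, 1, 0,
                       1, 1, 1, 1),
                     nrow = 3, byrow = TRUE)

# Hypothetical in-bag indicator: TRUE if the row was in that tree's bootstrap sample
in_bag <- matrix(c(TRUE,  TRUE,  FALSE, FALSE,
                   FALSE, TRUE,  TRUE,  FALSE,
                   TRUE,  FALSE, FALSE, TRUE),
                 nrow = 3, byrow = TRUE)

# Like predict(rf, newdata, type = "prob"): every tree votes
p_all <- rowMeans(tree_votes)

# Like rf$votes: only trees for which the row was out-of-bag vote
oob_votes <- tree_votes
oob_votes[in_bag] <- NA
p_oob <- rowMeans(oob_votes, na.rm = TRUE)

cbind(all_trees = p_all, oob = p_oob)
```

If I read the documentation correctly, calling predict(rf, type = "prob") without newdata returns the OOB version instead, which should then line up with p2.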

Different outputs using ggpredict for glmer and glmmTMB model

I am trying to predict and graph models with species presence as the response. However I've run into the following problem: the ggpredict outputs are wildly different for the same data in glmer and glmmTMB. However, the estimates and AIC are very similar. These are simplified models only including date (which has been centered and scaled), which seems to be the most problematic to predict.
yntest<- glmer(MYOSOD.P~ jdate.z + I(jdate.z^2) + I(jdate.z^3) +
(1|area/SiteID), family = binomial, data = sodpYN)
> summary(yntest)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: MYOSOD.P ~ jdate.z + I(jdate.z^2) + I(jdate.z^3) + (1 | area/SiteID)
Data: sodpYN
AIC BIC logLik deviance df.resid
1260.8 1295.1 -624.4 1248.8 2246
Scaled residuals:
Min 1Q Median 3Q Max
-2.0997 -0.3218 -0.2013 -0.1238 9.4445
Random effects:
Groups Name Variance Std.Dev.
SiteID:area (Intercept) 1.6452 1.2827
area (Intercept) 0.6242 0.7901
Number of obs: 2252, groups: SiteID:area, 27; area, 9
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.96778 0.39190 -7.573 3.65e-14 ***
jdate.z -0.72258 0.17915 -4.033 5.50e-05 ***
I(jdate.z^2) 0.10091 0.08068 1.251 0.21102
I(jdate.z^3) 0.25025 0.08506 2.942 0.00326 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) jdat.z I(.^2)
jdate.z 0.078
I(jdat.z^2) -0.222 -0.154
I(jdat.z^3) -0.071 -0.910 0.199
The glmmTMB model + summary:
Tyntest<- glmmTMB(MYOSOD.P ~ jdate.z + I(jdate.z^2) + I(jdate.z^3) +
(1|area/SiteID), family = binomial("logit"), data = sodpYN)
> summary(Tyntest)
Family: binomial ( logit )
Formula: MYOSOD.P ~ jdate.z + I(jdate.z^2) + I(jdate.z^3) + (1 | area/SiteID)
Data: sodpYN
AIC BIC logLik deviance df.resid
1260.8 1295.1 -624.4 1248.8 2246
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
SiteID:area (Intercept) 1.6490 1.2841
area (Intercept) 0.6253 0.7908
Number of obs: 2252, groups: SiteID:area, 27; area, 9
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.96965 0.39638 -7.492 6.78e-14 ***
jdate.z -0.72285 0.18250 -3.961 7.47e-05 ***
I(jdate.z^2) 0.10096 0.08221 1.228 0.21941
I(jdate.z^3) 0.25034 0.08662 2.890 0.00385 **
---
ggpredict outputs
testg<-ggpredict(yntest, terms ="jdate.z[all]")
> testg
# Predicted probabilities of MYOSOD.P
# x = jdate.z
x predicted std.error conf.low conf.high
-1.95 0.046 0.532 0.017 0.120
-1.51 0.075 0.405 0.036 0.153
-1.03 0.084 0.391 0.041 0.165
-0.58 0.072 0.391 0.035 0.142
-0.14 0.054 0.390 0.026 0.109
0.35 0.039 0.399 0.018 0.082
0.79 0.034 0.404 0.016 0.072
1.72 0.067 0.471 0.028 0.152
Adjusted for:
* SiteID = 0 (population-level)
* area = 0 (population-level)
Standard errors are on link-scale (untransformed).
testgTMB<- ggpredict(Tyntest, "jdate.z[all]")
> testgTMB
# Predicted probabilities of MYOSOD.P
# x = jdate.z
x predicted std.error conf.low conf.high
-1.95 0.444 0.826 0.137 0.801
-1.51 0.254 0.612 0.093 0.531
-1.03 0.136 0.464 0.059 0.280
-0.58 0.081 0.404 0.038 0.163
-0.14 0.054 0.395 0.026 0.110
0.35 0.040 0.402 0.019 0.084
0.79 0.035 0.406 0.016 0.074
1.72 0.040 0.444 0.017 0.091
Adjusted for:
* SiteID = NA (population-level)
* area = NA (population-level)
Standard errors are on link-scale (untransformed).
The estimates are completely different and I have no idea why.
I did try to use both the ggeffects package from CRAN and the developer version in case that changed anything. It did not. I am using the most up to date version of glmmTMB.
This is my first time asking a question here so please let me know if I should provide more information to help explain the problem.
I checked and the issue is the same when using predict instead of ggpredict, which would imply that it is a glmmTMB issue?
GLMER:
dayplotg<-expand.grid(jdate.z=seq(min(sodp$jdate.z), max(sodp$jdate.z), length=92))
Dfitg<-predict(yntest, re.form=NA, newdata=dayplotg, type='response')
dayplotg<-data.frame(dayplotg, Dfitg)
head(dayplotg)
> head(dayplotg)
jdate.z Dfitg
1 -1.953206 0.04581691
2 -1.912873 0.04889584
3 -1.872540 0.05195598
4 -1.832207 0.05497553
5 -1.791875 0.05793307
6 -1.751542 0.06080781
glmmTMB:
dayplot<-expand.grid(jdate.z=seq(min(sodp$jdate.z), max(sodp$jdate.z), length=92),
SiteID=NA,
area=NA)
Dfit<-predict(Tyntest, newdata=dayplot, type='response')
head(Dfit)
dayplot<-data.frame(dayplot, Dfit)
head(dayplot)
> head(dayplot)
jdate.z SiteID area Dfit
1 -1.953206 NA NA 0.4458236
2 -1.912873 NA NA 0.4251926
3 -1.872540 NA NA 0.4050944
4 -1.832207 NA NA 0.3855801
5 -1.791875 NA NA 0.3666922
6 -1.751542 NA NA 0.3484646
I contacted the ggpredict developer and figured out that if I used poly(jdate.z,3) rather than jdate.z + I(jdate.z^2) + I(jdate.z^3) in the glmmTMB model, the glmer and glmmTMB predictions were the same.
I'll leave this post up even though I was able to answer my own question in case someone else has this question later.
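Both codings describe the same cubic in jdate.z, so the fitted model is the same and the predictions should agree; a discrepancy points at how the prediction is computed rather than at the fit. A minimal base-R sketch with a plain glm on simulated data (no random effects; the data, sample size and coefficients, loosely based on the fixed effects above, are hypothetical) shows the raw and orthogonal codings giving the same predicted probabilities:

```r
set.seed(42)
n <- 500
x <- rnorm(n)

# Simulate a binary outcome with a cubic effect on the logit scale
eta <- -3 - 0.7 * x + 0.1 * x^2 + 0.25 * x^3
y <- rbinom(n, size = 1, prob = plogis(eta))
d <- data.frame(y = y, x = x)

# Raw polynomial terms versus orthogonal polynomials
fit_raw  <- glm(y ~ x + I(x^2) + I(x^3), family = binomial, data = d)
fit_poly <- glm(y ~ poly(x, 3), family = binomial, data = d)

# Predict on a grid, as in the question (92 points across the range)
newd   <- data.frame(x = seq(min(x), max(x), length.out = 92))
p_raw  <- predict(fit_raw,  newdata = newd, type = "response")
p_poly <- predict(fit_poly, newdata = newd, type = "response")

max(abs(p_raw - p_poly))  # effectively zero
```

The coefficients differ between the two fits, but the predictions do not, because poly() stores its basis so that predict() rebuilds the same columns for new data.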

Linear model in R - Multiplication Expression

I have 3 numerical variables A, B and C. I am trying to create a linear model that predicts A from the product B*C; however, when looking at the output I cannot get my equation, because I get extra terms and I don't know what they are.
Here is my code
MyData<-read.csv("...", header = T)
head(MyData,6)
str(MyData)
#Linear Model
#Expression A= B*C
Model1<-lm(MyData$A~MyData$B*MyData$C)
summary(Model1)
Output of str(MyData)
> str(MyData)
'data.frame': 6 obs. of 3 variables:
$ A: num 2.5 3.4 2.7 3.6 2.5 2.1
$ B: num 0.01 0.02 0.015 0.017 0.018 0.01
$ C: num 0.1 0.2 0.27 0.19 0.17 0.16
Output of summary(Model1)
Call:
lm(formula = MyData$A ~ MyData$B * MyData$C)
Residuals:
1 2 3 4 5 6
-0.03945 -0.08386 -0.13925 0.67703 -0.40055 -0.01393
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.473 5.774 0.948 0.443
MyData$B -222.431 454.508 -0.489 0.673
MyData$C -26.482 36.222 -0.731 0.541
MyData$B:MyData$C 1938.961 2679.207 0.724 0.544
Residual standard error: 0.5688 on 2 degrees of freedom
Multiple R-squared: 0.6149, Adjusted R-squared: 0.03723
F-statistic: 1.064 on 3 and 2 DF, p-value: 0.5178
lm uses Wilkinson-Rogers notation, so "*" means an interaction, judging by the output, right? Is this true? How do I create my model using the product of my two variables?
If you just want a single term that is the literal product of the two variables, not an interaction, you can use I():
Model1 <- lm(MyData$A ~ I(MyData$B * MyData$C))
I think in practice, with 2 numeric variables, this ends up the same as Dan's suggestion to use x1:x2 to get just the interaction without the terms for each individual predictor, but it might differ in other cases.
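A small base-R check using the six rows shown by str(MyData): B * C expands to B + C + B:C, while I(B * C) and B:C each add a single product column, so for two numeric variables those two models give the same coefficients:

```r
MyData <- data.frame(
  A = c(2.5, 3.4, 2.7, 3.6, 2.5, 2.1),
  B = c(0.01, 0.02, 0.015, 0.017, 0.018, 0.01),
  C = c(0.1, 0.2, 0.27, 0.19, 0.17, 0.16)
)

m_star  <- lm(A ~ B * C, data = MyData)     # B + C + B:C
m_prod  <- lm(A ~ I(B * C), data = MyData)  # one literal product term
m_colon <- lm(A ~ B:C, data = MyData)       # interaction term only

names(coef(m_star))
# "(Intercept)" "B" "C" "B:C"
names(coef(m_prod))
# "(Intercept)" "I(B * C)"

# Same design matrix, so the same fit
all.equal(unname(coef(m_prod)), unname(coef(m_colon)))  # TRUE
```

Using data = MyData also gives cleaner coefficient names than MyData$B * MyData$C.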

car::Anova Way to have a covariate that does not interact with the within-subject factors

I would like to run an ANCOVA using car::Anova but cannot find out if there is a way to add a covariate only as a main effect (i.e., should not interact with anything).
As far as I understand ANCOVA, covariates are just another main effect added to the model (i.e., one more effect), thereby controlling for the overall additive influence of the covariate. Consequently, the covariate(s) do not interact with the other factors. However, I cannot add a variable to Anova that does not interact with the within-subject factors (i.e., my final model does not seem to be an ANCOVA).
Let me illustrate my problem with an example from ?Anova. The OBrienKaiser data set has 2 between-subject factors (treatment and gender) and 2 within-subject factors (phase and hour). Now let's assume we also recorded the age of the participants and would like to add it as a covariate to the analysis.
require(car)
set.seed(1)
n.OBrienKaiser <- within(OBrienKaiser, age <- sample(18:35, size = 16, replace = TRUE))
# the next part is taken from ?Anova
# I only modified the mod.ok <- ... call by adding + age
phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5, 5)), levels=c("pretest", "posttest", "followup"))
hour <- ordered(rep(1:5, 3))
idata <- data.frame(phase, hour)
mod.ok <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5, post.1, post.2, post.3, post.4, post.5,
fup.1, fup.2, fup.3, fup.4, fup.5) ~ treatment*gender + age, data=n.OBrienKaiser)
(av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type = 3))
As the output shows, it contains interactions of the covariate age with the within-subject (repeated-measures) factors phase and hour, and with their interaction phase:hour:
Type III Repeated Measures MANOVA Tests: Pillai test statistic
Df test stat approx F num Df den Df Pr(>F)
(Intercept) 1 0.129 1.33 1 9 0.278
treatment 2 0.443 3.58 2 9 0.072 .
gender 1 0.305 3.95 1 9 0.078 .
age 1 0.054 0.52 1 9 0.490
treatment:gender 2 0.222 1.28 2 9 0.323
phase 1 0.418 2.87 2 8 0.115
treatment:phase 2 0.871 3.47 4 18 0.029 *
gender:phase 1 0.084 0.37 2 8 0.703
age:phase 1 0.393 2.59 2 8 0.136
treatment:gender:phase 2 0.545 1.69 4 18 0.197
hour 1 0.565 1.95 4 6 0.222
treatment:hour 2 0.580 0.72 8 14 0.676
gender:hour 1 0.310 0.68 4 6 0.633
age:hour 1 0.508 1.55 4 6 0.301
treatment:gender:hour 2 0.707 0.96 8 14 0.504
phase:hour 1 0.975 9.56 8 2 0.098 .
treatment:phase:hour 2 1.145 0.50 16 6 0.873
gender:phase:hour 1 0.693 0.56 8 2 0.770
age:phase:hour 1 0.974 9.40 8 2 0.100 .
treatment:gender:phase:hour 2 1.314 0.72 16 6 0.723
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
My question is: Can one run an ANCOVA with car::Anova, and if so, is there a way to specify this ANCOVA without any interactions involving age?
Update (July 22, 2012): I asked this question on R-help, but so far there have been no responses. If there is news, I will post it here.
I asked this question on R-help which started a helpful discussion with John Fox (later joined by Peter Dalgaard). Unfortunately it got split up into two threads: one, two.
The punchline is:
"The within-subjects contrasts are constructed by Anova() to be orthogonal in the row-basis of the design, so you should be able to safely ignore the effects in which (for some reason that escapes me) you are uninterested." (John Fox)
So the answer to the question is: no, one can't, but it doesn't matter, because these interactions do not alter the other effects, as they are orthogonal.
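To make "orthogonal" concrete, here is a base-R sketch of the general idea, not of car's internal construction: the polynomial contrasts R uses by default for an ordered factor such as hour are mutually orthonormal, and each sums to zero, so each within-subject contrast is orthogonal both to the others and to the constant (the subject mean).

```r
# Default contrasts for a 5-level ordered factor such as hour
P <- contr.poly(5)

# Cross-products: identity matrix, i.e. orthonormal columns
round(crossprod(P), 10)

# Column sums: zero, i.e. orthogonal to the constant vector
round(colSums(P), 10)
```

Because the between-subject effects (including age) are estimated on the subject mean, and the within-subject terms on contrasts orthogonal to it, the extra age:phase, age:hour and age:phase:hour rows can be ignored without changing the other tests.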
