I'm trying to fit a GAM using the mgcv package with a response variable that is proportion data. The data are overdispersed, so I initially used a quasibinomial distribution. However, since I'm doing model selection, that isn't particularly useful because it does not produce AIC scores.
Instead I'm trying the betar distribution, as I've read that it can be appropriate for proportions.
mRI_br <- bam(ri ~ SE_score + s(deg, k = 7) + s(gs, k = 7) + TL + species +
                sex + season + year + s(code, bs = "re") + s(station, bs = "re"),
              family = betar(), data = node_dat, na.action = "na.fail")
However, I get these warnings when I run the model.
Warning messages:
1: In estimate.theta(theta, family, y, mu, scale = scale1, ... :
step failure in theta estimation
And when I try to check the model summary, I get this error.
> summary(mRI_br)
Error in chol.default(diag(p) + crossprod(S2 %*% t(R2))) :
the leading minor of order 62 is not positive definite
I would like to know:
What is causing these warnings and errors, and how can they be solved?
If they cannot be solved, are there other distributions that can be used with proportion data and that still allow model selection techniques afterwards (such as the dredge function from the MuMIn package)?
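One common cause of "step failure in theta estimation" with family = betar() is response values at or very close to the boundaries 0 and 1, which the beta distribution cannot accommodate. A sketch of the standard Smithson–Verkuilen squeeze, assuming ri is the proportion column in node_dat as above (whether this applies to your data is an assumption worth checking with range(node_dat$ri) first):

```r
# Squeeze proportions off the {0, 1} boundaries before fitting with betar()
# (Smithson & Verkuilen 2006): y' = (y * (n - 1) + 0.5) / n
n <- nrow(node_dat)  # sample size used in the squeeze
node_dat$ri_sq <- (node_dat$ri * (n - 1) + 0.5) / n

mRI_br <- bam(ri_sq ~ SE_score + s(deg, k = 7) + s(gs, k = 7) + TL + species +
                sex + season + year + s(code, bs = "re") + s(station, bs = "re"),
              family = betar(), data = node_dat, na.action = "na.fail")
```

If the raw values already lie strictly inside (0, 1), the failure is more likely due to near-unidentifiable smooth or random-effect terms, and simplifying the model is the next thing to try.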
A copy of the dataset can be found here
Related
I ran a linear mixed-effects model looking at the effect of Stress and Lifestyle (HLI) on cognitive change over Time using the lme4 package in R...
mod <- lmer(3MS ~ age + sex + edu + Stress*Time*HLI + (1|ID), data=dflong, na.action = na.omit)
I'm most interested in decomposing the 3-way interaction between Stress*Lifestyle*Time. Specifically, I want to get the interaction contrasts to look at the conditional effects of Stress*Time at -1SD, mean, and +1SD of HLI. To do this, I am using the interactions package to decompose the interaction...
sim_slopes(
model=mod,
pred=Stress*Time,
modx=HLI,
data = dflong)
But I'm receiving the following error
Error in as(x, "matrix")[i = i, j = j, drop = drop] :
subscript out of bounds
In addition: Warning message:
Stress * Time and HLI are not included in an interaction with one another in
the model.
I'm not sure why I'm getting this error or how to go about fixing it. Is there another package that can do what I need in a better way? I'm not familiar with any alternatives.
Thanks so much in advance!!
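The warning points at the immediate problem: pred, modx, and mod2 each take a single variable, not an expression like Stress*Time; sim_slopes finds the interaction in the fitted model itself. A sketch of how the three-way decomposition is usually requested (untested against your data, and assuming mod contains the Stress*Time*HLI interaction as above):

```r
library(interactions)

# pred/modx/mod2 must each be one variable; sim_slopes picks up the
# Stress:Time:HLI interaction from the fitted model.
sim_slopes(model = mod,
           pred  = Stress,  # focal predictor
           modx  = Time,    # first moderator
           mod2  = HLI,     # second moderator; continuous moderators default
           data  = dflong)  # to -1 SD, mean, and +1 SD
```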
I am running a linear mixed-effects model using the "nlme" package, looking at stress and lifestyle as predictors of change in cognition over 4 years in a longitudinal dataset. All variables in the model are continuous.
I am able to create the model and get the summary statistics using this code:
mod1 <- lme(MS ~ age + sex + edu + GDST1*Time + HLI*Time + GDST1*HLI*Time, random= ~ 1|ID, data=NuAge_long, na.action=na.omit)
summary(mod1)
I am trying to use the "interactions" package to probe the 3-way interaction:
sim_slopes(model = mod1, pred = Time, modx = GDST1, mod2 = HLI, data = NuAge_long)
but am receiving this error:
Error in if (tcol == "df") tcol <- "t val." : argument is of length zero
I am also trying to plot the interaction using the same "interactions" package:
interact_plot(model = mod1, pred = Time, modx = GDST1, mod2 = HLI, data = NuAge_long)
and am receiving this error:
Error in UseMethod("family") : no applicable method for 'family' applied to an object of class "lme"
I can't seem to find what these errors mean and why I'm getting them. Any help would be appreciated!
From ?interactions::sim_slopes:
The function is tested with ‘lm’, ‘glm’,
‘svyglm’, ‘merMod’, ‘rq’, ‘brmsfit’, ‘stanreg’ models. Models
from other classes may work as well but are not officially
supported. The model should include the interaction of
interest.
Note this does not include lme models. On the other hand, merMod models are those generated by lme4::[g]lmer(), and as far as I can tell you should be able to fit this model equally well with lmer():
library(lme4)
mod1 <- lmer(MS ~ age + sex + edu + GDST1*Time + HLI*Time + GDST1*HLI*Time
+ (1|ID), data=NuAge_long)
(things will get harder if you want to specify correlation structures, e.g. correlation = corAR1(), which works for lme() but not lmer() ...)
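Putting the two pieces together, a sketch of the full workflow (assuming NuAge_long as in the question):

```r
library(lme4)
library(interactions)

# Refit with lmer() so the object is of class merMod, which sim_slopes
# and interact_plot officially support
mod1 <- lmer(MS ~ age + sex + edu + GDST1 * Time + HLI * Time + GDST1 * HLI * Time +
               (1 | ID), data = NuAge_long, na.action = na.omit)

# Simple slopes of Time at levels of GDST1, within levels of HLI
sim_slopes(model = mod1, pred = Time, modx = GDST1, mod2 = HLI, data = NuAge_long)

# Corresponding interaction plot
interact_plot(model = mod1, pred = Time, modx = GDST1, mod2 = HLI, data = NuAge_long)
```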
My mixed models usually contain several categorical variables with many unique levels, so the X matrix is very sparse.
I use the glmmTMB package, which can handle the X and Z matrices as sparse. This significantly reduces RAM usage when fitting the model.
The glmmTMB package is great, but there is one problem for me (maybe I'm missing something):
when I use an interaction between a numeric variable and a categorical variable (as a fixed effect), the model fits without errors.
For example, this model works well:
fit = glmmTMB(Y ~ 0 + num1:factor1 + num2:factor1 + factor2 +
(0 + num3|subject) + (0 + num4|subject) + (1|subject),
model_data, REML = TRUE, sparseX=c(cond=TRUE))
But when I use any interaction between two categorical variables, i.e. the formula looks like this:
fit = glmmTMB(Y ~ 0 + num1:factor1 + factor3:factor1 + factor2 +
(0 + num2|subject) + (0 + num3|subject) + (1|subject),
model_data, REML = TRUE, sparseX=c(cond=TRUE)),
I get the following error:
iter: 5 Error in newton(par = c(beta = 1, beta = 1, beta = 1, beta = 1, beta = 1, :
Newton failed to find minimum.
In addition: Warning message:
In (function (start, objective, gradient = NULL, hessian = NULL, :
NA/NaN function evaluation
outer mgc: NaN
Error in (function (start, objective, gradient = NULL, hessian = NULL, :
gradient function must return a numeric vector of length 4
At the same time, interactions between two categorical variables are perfectly valid in mixed-model theory.
Moreover, such a model (with an interaction between two factors) is fitted successfully by the Julia MixedModels package.
Could you please help me understand the root of this error?
Is there a way to avoid it in a model with interactions between two categorical variables?
Why do such models work with Julia MixedModels but not with glmmTMB?
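Without the data this can't be reproduced, but one thing worth checking is rank deficiency in the fixed-effects design: factor3:factor1 without the corresponding main effects often produces empty or aliased cells, and it is plausible (an assumption, not confirmed) that the sparse-X code path does not drop aliased columns before optimization the way the dense path does, leading to NaN gradients. A diagnostic sketch:

```r
library(glmmTMB)
library(Matrix)

# 1) Check whether the fixed-effects design matrix is rank deficient
X <- model.matrix(~ 0 + num1:factor1 + factor3:factor1 + factor2, model_data)
ncol(X)        # number of coefficients requested
rankMatrix(X)  # if smaller than ncol(X), some columns are aliased

# 2) As a workaround, try the same model with a dense X (more RAM,
#    but aliased columns are handled automatically):
fit_dense <- glmmTMB(Y ~ 0 + num1:factor1 + factor3:factor1 + factor2 +
                       (0 + num2 | subject) + (0 + num3 | subject) + (1 | subject),
                     model_data, REML = TRUE)
```

If the dense fit succeeds where the sparse one fails, that supports the rank-deficiency explanation and suggests reparameterizing the interaction (e.g. a combined factor via interaction(factor1, factor3, drop = TRUE)) before returning to sparseX.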
I have run a zero-inflated negative binomial model using the glmmADMB package in R. From what I understand, the pz parameter is the zero-inflation parameter, and the package fits it to the model you run: it searches for the pz value that best fits your data, starting from pz = 0.2. This is the default and can be changed.
After you run the model, does anyone know how to find what pz value is chosen for the data?
The zero-inflation estimate can be obtained (along with its standard deviation) from the model object. See below using built-in data from the glmmADMB package:
library(glmmADMB)
# munge data
Owls = transform(Owls, Nest = reorder(Nest, NegPerChick),
logBroodSize = log(BroodSize), NCalls = SiblingNegotiation)
# fit model
fit_zinb = glmmadmb(NCalls ~ (FoodTreatment + ArrivalTime) * SexParent +
offset(logBroodSize) + (1 | Nest),
data = Owls, zeroInflation = TRUE,
family = "nbinom")
# overall summary, check for match
summary(fit_zinb)
# zero-inflation estimate
fit_zinb$pz
# zero-inflation standard deviation
fit_zinb$sd_pz
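If a rough uncertainty interval around the estimate is useful, a simple Wald-type construction from these two slots (a sketch; an interval built on the logit scale would respect the [0, 1] bounds better):

```r
# 95% Wald confidence interval for the zero-inflation probability
pz_hat <- fit_zinb$pz
pz_se  <- fit_zinb$sd_pz
pz_hat + c(-1.96, 1.96) * pz_se
```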
I have built a Cox survival model that includes a covariate * time interaction (non-proportionality was detected).
I am now wondering how I could most easily get survival predictions from my model.
My model was specified:
coxph(formula = Surv(event_time_mod, event_indicator_mod) ~ Sex +
    ageC + HHcat_alt + Main_Branch + Acute_seizure + TreatmentType_binary +
    ICH + IVH_dummy + IVH_dummy:log(event_time_mod))
And now I was hoping to get a prediction using survfit, providing new data for the combination of variables I am predicting for:
survfit(cox, new.data=new)
Now as I have event_time_mod in the right-hand side in my model I need to specify it in the new data frame passed on to survfit. This event_time would need to be set at individual times of the predictions. Is there an easy way to specify event_time_mod to be the correct time to survfit?
Or are there any other options for achieving predictions from my model?
Of course I could create as many rows in the new data frame as there are distinct times in the predictions and set event_time_mod to the correct values, but that feels really cumbersome and I thought there must be a better way.
You have done what is referred to as
An obvious but incorrect approach ...
as stated in the Using Time Dependent Covariates and Time Dependent Coefficients in the Cox Model vignette of version 2.41-3 of the R survival package. Instead, you should use the time-transform functionality, i.e., the tt function, as described in the same vignette. The code would be something similar to the example in the vignette:
> library(survival)
> vfit3 <- coxph(Surv(time, status) ~ trt + prior + karno + tt(karno),
+ data=veteran,
+ tt = function(x, t, ...) x * log(t+20))
>
> vfit3
Call:
coxph(formula = Surv(time, status) ~ trt + prior + karno + tt(karno),
data = veteran, tt = function(x, t, ...) x * log(t + 20))
coef exp(coef) se(coef) z p
trt 0.01648 1.01661 0.19071 0.09 0.9311
prior -0.00932 0.99073 0.02030 -0.46 0.6462
karno -0.12466 0.88279 0.02879 -4.33 1.5e-05
tt(karno) 0.02131 1.02154 0.00661 3.23 0.0013
Likelihood ratio test=53.8 on 4 df, p=5.7e-11
n= 137, number of events= 128
survfit, however, does not work when the model has a tt term:
> survfit(vfit3, veteran[1, ])
Error in survfit.coxph(vfit3, veteran[1, ]) :
The survfit function can not yet process coxph models with a tt term
However, you can easily get the terms, linear predictor, or mean response with predict. Further, you can construct the tt term over time using the answer here.
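As a sketch of the predict route, using the vfit3 model above (how you combine the pieces into a survival curve is up to you, and the tt reconstruction below simply reuses the time-transform from the model call):

```r
# Linear predictor and per-term contributions for the fitted data
lp     <- predict(vfit3, type = "lp")
trms   <- predict(vfit3, type = "terms")
head(trms)

# Reconstruct the tt(karno) contribution over time for a hypothetical
# new karno value, using the same transform x * log(t + 20):
karno_new <- 60                      # assumed illustrative value
t_grid    <- c(30, 90, 180)          # assumed illustrative times
tt_term   <- coef(vfit3)["tt(karno)"] * karno_new * log(t_grid + 20)
tt_term
```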