R equivalent to random residual by subject in SAS

I can code this problem in SAS with the residual as a random effect (I believe this is an R-side random intercept by fish):
proc glimmix data=one method=mmpl ;
class fish;
model increment =age growth_year age*growth_year;
random residual / subject=fish ;
run;
Here is the same analysis with AR(1) covariance structure.
proc glimmix data=one method=mmpl ;
class fish;
model increment =age growth_year age*growth_year;
random residual/ subject=fish type = ar(1) ;
run;
Here is my attempt to reproduce the first model in R; it doesn't work:
model = lmer(increment ~ age + growth_year+ age*growth_year
+ (resid()|fish), data = SR_data)
Please help. A solution using lmer, glmer (with a gamma instead of a normal distribution), lme, or any other package that I am unaware of would be welcome.

The lme4 package doesn't allow R-side models, but nlme does. If you want correlation within fish without random effects of fish (i.e. R-side effects only, without any G-side effects), then I think you want to use gls: here's an example using the Orthodont data from the nlme package:
library("nlme")
gls(distance~age*Sex, correlation=corAR1(form=~1|Subject), data=Orthodont)
If you want to allow variation in the baseline value/intercept by group (both G- and R-side), then you'd use:
lme(distance~age*Sex, random = ~1|Subject,
correlation=corAR1(form=~1|Subject), data=Orthodont)
If you want variation in the baseline but not correlated residuals within subject (G-side only):
lme(distance~age*Sex, random=~1|Subject, data=Orthodont)
or
library(lme4)
lmer(distance~age*Sex + (1|Subject), data=Orthodont)

Related

Is there a way to include an autocorrelation structure in the gam function of mgcv?

I am building a model using the mgcv package in R. The data has serial measures (data collected during scans 15 minutes apart in time, but discontinuously, e.g. there might be 5 consecutive scans on one day, and then none until the next day, etc.). The model has a binomial response, a random effect of day, a fixed effect, and three smooth effects. My understanding is that REML is the best fitting method for binomial models, but that this method cannot be specified using the gamm function for a binomial model. Thus, I am using the gam function, to allow for the use of REML fitting. When I fit the model, I am left with residual autocorrelation at a lag of 2 (i.e. at 30 minutes), assessed using ACF and PACF plots.
So, we wanted to include an autocorrelation structure in the model, but my understanding is that only the gamm function and not the gam function allows for the inclusion of such structures. I am wondering if there is anything I am missing and/or if there is a way to deal with autocorrelation with a binomial response variable in a GAMM built in mgcv.
My current model structure looks like:
gam(Response ~
s(Day, bs = "re") +
s(SmoothVar1, bs = "cs") +
s(SmoothVar2, bs = "cs") +
s(SmoothVar3, bs = "cs") +
as.factor(FixedVar),
family=binomial(link="logit"), method = "REML",
data = dat)
I tried thinning my data (using only every 3rd data point from consecutive scans), but found this overly restrictive to allow effects to be detected due to my relatively small sample size (only 42 data points left after thinning).
I also tried using the prior value of the binomial response variable as a factor in the model to account for the autocorrelation. This did appear to resolve the residual autocorrelation (based on the updated ACF/PACF plots), but it doesn't feel like the most elegant way to do so and I worry this added variable might be adjusting for more than just the autocorrelation (though it was not collinear with the other explanatory variables; VIF < 2).
I would use bam() for this. You don't need to have big data to fit a model with bam(); you just lose some of the guarantees about convergence that you get with gam(). bam() will fit a GEE-like model with an AR(1) working correlation matrix, but you need to specify the AR parameter via rho. This only works for non-Gaussian families if you also set discrete = TRUE when fitting the model.
You could use gamm() with family = binomial() but this uses PQL to estimate the GLMM version of the GAMM and if your binomial counts are low this method isn't very good.
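A minimal sketch of the bam() approach described above, on simulated data. The data frame, the Scan column, the ar_start column, and the value rho = 0.3 are all hypothetical placeholders; in practice rho would be chosen from the residual ACF of a model fitted without the AR term. AR.start marks the first observation of each run of consecutive scans, so the AR(1) structure restarts after every gap.

```r
library(mgcv)

set.seed(1)
# Hypothetical data: 20 days with 8 consecutive scans each
dat <- data.frame(
  Day        = factor(rep(1:20, each = 8)),
  Scan       = rep(1:8, times = 20),
  SmoothVar1 = runif(160)
)
dat$Response <- rbinom(160, 1, plogis(-0.5 + 1.5 * dat$SmoothVar1))

# The AR(1) model follows row order, so sort by time within day
dat <- dat[order(dat$Day, dat$Scan), ]
# TRUE at the first scan of each uninterrupted run (assumed: one run per day)
dat$ar_start <- dat$Scan == 1

fit <- bam(Response ~ s(Day, bs = "re") + s(SmoothVar1, bs = "cs"),
           family   = binomial(link = "logit"),
           data     = dat,
           method   = "fREML",       # bam's fast REML
           discrete = TRUE,          # needed for rho with non-Gaussian families
           rho      = 0.3,           # placeholder AR(1) parameter
           AR.start = dat$ar_start)
summary(fit)
```

Note that an AR(1) working correlation is parameterised by the lag-1 correlation only, so it is worth re-checking the residual ACF after fitting.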

Longitudinal analysis using sampling weights in R

I have longitudinal data from two surveys and I want to do a pre-post analysis. Normally, I would use survey::svyglm() or svyVGAM::svy_vglm (for multinomial family) to include sampling weights, but these functions don't account for the random effects. On the other hand, lme4::lmer accounts for the repeated measures, but not the sampling weights.
For continuous outcomes, I understand that I can do
w_data_wide <- svydesign(ids = ~1, data = data_wide, weights = data_wide$weight)
svyglm((post-pre) ~ group, w_data_wide)
and get the same estimates that I would get if I could use lmer(outcome ~ group*time + (1|id), data_long) with weights [please correct me if I'm wrong].
However, for categorical variables, I don't know how to do the analyses. WeMix::mix() has a parameter weights, but I'm not sure if it treats them as sampling weights. Still, this function can't support multinomial family.
So, to summarize: can you enlighten me on how to do a pre-post test analysis of categorical outcomes with 2 or more levels? Any tips about packages/functions in R and how to use/write them would be appreciated.
I give below some data sets with binomial and multinomial outcomes:
library(data.table)
set.seed(1)
data_long <- data.table(
id=rep(1:5,2),
time=c(rep("Pre",5),rep("Post",5)),
outcome1=sample(c("Yes","No"),10,replace=T),
outcome2=sample(c("Low","Medium","High"),10,replace=T),
outcome3=rnorm(10),
group=rep(sample(c("Man","Woman"),5,replace=T),2),
weight=rep(c(1,0.5,1.5,0.75,1.25),2)
)
data_wide <- dcast(data_long, id~time, value.var = c('outcome1','outcome2','outcome3','group','weight'))[, `:=` (weight_Post = NULL, group_Post = NULL)]
EDIT
As I said below in the comments, I've been using lmer and glmer with the variables used to calculate the weights as predictors. It turns out that glmer runs into a lot of problems (convergence, high eigenvalues...), so I took another look at @ThomasLumley's answer in this post and others (https://stat.ethz.ch/pipermail/r-help/2012-June/315529.html | https://stats.stackexchange.com/questions/89204/fitting-multilevel-models-to-complex-survey-data-in-r).
So, my question is now whether I can use participant id as clusters in svydesign
library(survey)
w_data_long_cluster <- svydesign(ids = ~id, data = data_long, weights = data_long$weight)
summary(svyglm(factor(outcome1) ~ group*time, w_data_long_cluster, family="quasibinomial"))
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)         1.875e+01  1.000e+00  18.746   0.0339 *
groupWoman         -1.903e+01  1.536e+00 -12.394   0.0513 .
timePre             5.443e-09  5.443e-09   1.000   0.5000
groupWoman:timePre  2.877e-01  1.143e+00   0.252   0.8431
and still interpret groupWoman:timePre as differences in the average rate of change/improvement in the outcome over time between sex groups, as if I was using mixed models with participants as random effects.
Thank you once again!
A linear model with svyglm does not give the same parameter estimates as lme4::lmer. It does estimate the same parameters as lme4::lmer if the model is correctly specified, though.
Generalised linear models with svyglm or svy_vglm don't estimate the same parameters as lme4::glmer, as you note. However, they do estimate perfectly good regression parameters, and if you aren't specifically interested in the variance components or in estimating the realised random effects (BLUPs) I would recommend just using svyglm (or svy_vglm for the multinomial case).
Another option if you have non-survey software for random effects versions of the models is to use that. If you scale the weights to sum to the sample size and if all the clustering in the design is modelled by random effects in the model, you will get at least a reasonable approximation to valid inference. That's what I've seen recommended for Bayesian survey modelling, for example.
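The rescaling step mentioned above (weights that sum to the sample size) can be sketched in base R as follows. The raw weights and the w_scaled column name are hypothetical; the rescaled column would then be passed to whatever weights argument the random-effects software provides.

```r
# Hypothetical raw sampling weights for 5 participants measured twice
data_long <- data.frame(
  id     = rep(1:5, 2),
  time   = rep(c("Pre", "Post"), each = 5),
  weight = rep(c(120, 60, 180, 90, 150), 2)
)

# Rescale so that the weights sum to the number of rows (the sample size)
n <- nrow(data_long)
data_long$w_scaled <- data_long$weight * n / sum(data_long$weight)

sum(data_long$w_scaled)  # equals n
```

Rescaling changes only the overall magnitude of the weights, not their ratios, so the relative contribution of each participant is unchanged.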

How to test significant improvement of LRM model

Using the rms package of Frank Harrell I constructed a predictive model using the lrm function.
I want to test whether this model has significantly better predictive value for a binomial event than another (lrm) model.
I tried functions such as anova(model1, model2) and the pR2 function of the pscl library to compare the pseudo R^2, but none of them work with an lrm-based model.
What is the best way to see whether my new model is significantly better than the earlier model?
Update: Here is an example (where I want to predict the chance of bone metastasis) to check whether size or stage (in addition to other variables) gives the best model:
library(rms)
getHdata(prostate)
ddd <- datadist(prostate)
options( datadist = "ddd" )
mod1 = lrm(as.factor(bm) ~ age + sz + rx, data=prostate, x=TRUE, y=TRUE)
mod2 = lrm(as.factor(bm) ~ age + stage + rx, data=prostate, x=TRUE, y=TRUE)
It seems fundamentally the question is about comparing two non-nested models.
If you fit your models using the glm function you can use the -vuong- function in -pscl- package.
To test the fit of 2 nested models, you can use the lrtest function from the "rms" package.
lrtest(mod1,mod2)
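Because a likelihood ratio test is only meaningful for nested models, here is a small sketch of lrtest on a genuinely nested pair. It uses simulated data rather than the prostate data, and the names d, small, and large are hypothetical; both models must be fitted to the same observations.

```r
library(rms)

set.seed(1)
n <- 300
# Simulated predictors and a binary outcome (hypothetical data)
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.3 + 0.8 * d$x1 + 0.5 * d$x2))

# 'small' is nested in 'large': large adds x2 to the same formula
small <- lrm(y ~ x1, data = d)
large <- lrm(y ~ x1 + x2, data = d)

# Likelihood ratio chi-square, degrees of freedom, and p-value
res <- lrtest(small, large)
res
```

For the non-nested pair in the question (sz versus stage), this test does not apply, which is why the Vuong test is suggested above.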

How to extract the value of the loss function of Cox models from glmnet in R?

I fit a Cox model to some data using the glmnet R package. My small R example is:
library(fastcox);data(FHT);attach(FHT) #
library(glmnet)
library(survival)
fit = glmnet(x,Surv(y,status),family="cox",alpha=1)
From the help document, we know glmnet fits penalized models like
-loglik/nobs + λ*penalty
i.e., objective function = loss function + penalty function.
I want to fetch -loglik/nobs (the loss function value, i.e. the negative partial log-likelihood of the fitted model, or the two-term Taylor series expansion of the log-likelihood) from the fit object.
Any ideas? Thanks.
BTW, we also tried
fit0 = glmnet(x,Surv(y,status),family="cox",alpha=1,lambda=0)
according to -loglik/nobs + λ*penalty, but it shows errors.

How do you fit a linear mixed model with an AR(1) random effects correlation structure in R?

I am trying to use R to rerun someone else's project, so we need to use some macros in R.
Here comes a very basic question:
m1.nlme = lme(log.bp.dia ~ M25.9to9.ma5iqr + temp.c.9to9.ma4iqr + o3.ma5iqr + sea_spring + sea_summer + sea_fall + BMI + male + age_ini, data=barbara.1.clean, random = ~ 1|study_id)
The SAS model uses an AR(1) (first-order autoregressive) covariance structure for the within-person variance, and I am not sure how to do this in R.
Also, where can I see the index for different models, like unstructured?
Thanks
I don't know what you mean by "index" for different models, but to specify an AR(1) covariance structure for the residuals, you can add corr=corAR1() to your lme call (corr is matched to lme's correlation argument).
The correlation at lag $1$ is, say, $r$, where $-1 < r < 1$ for a stationary $AR(1)$ model. The correlation at lag $k \geq 1$ is $r^k$. Multiplying each correlation by the variance of $X_t$ gives the autocovariance matrix.
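As an illustration of the description above, the following sketch builds the AR(1) autocovariance matrix for n equally spaced observations. The helper name ar1_cov and the example values of r and sigma2 are hypothetical.

```r
# Autocovariance matrix of a stationary AR(1) process:
# entry (i, j) is sigma2 * r^|i - j|
ar1_cov <- function(n, r, sigma2 = 1) {
  stopifnot(abs(r) < 1)                           # stationarity condition
  lags <- abs(outer(seq_len(n), seq_len(n), "-")) # |i - j| for every pair
  sigma2 * r^lags
}

Sigma <- ar1_cov(4, r = 0.5, sigma2 = 2)
Sigma
```

The diagonal holds the variance of $X_t$, and each step away from the diagonal multiplies the covariance by another factor of $r$.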
