lmmlasso - how to specify a random intercept, and make a prediction in R?

I'm new to R and statistical modelling, and am looking to use the lmmlasso package in R to fit a mixed-effects model, selecting only the best fixed effects out of ~300 candidate variables.
For this model I'd like to include a fixed intercept, a random effect, and a random intercept. Looking at the manual on CRAN, I've come across the following:
x: matrix of dimension ntot x p including the fixed-effects covariables. An intercept has to be included in the first column as (1,...,1).
z: random effects matrix of dimension ntot x q. It has to be a matrix, even if q=1.
While it's obvious what I need to do for the fixed intercept, I'm not quite sure how to include both a random intercept and a random effect in z. Is it exactly the same as the fixed-effects matrix, where I include (1,...,1) in the first column?
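For what it's worth, the manual excerpt suggests z is built the same way. A minimal sketch, assuming a hypothetical data frame mydata with a grouping factor group, candidate fixed-effects columns fixed_vars, and one variable slope_var getting a random slope (check ?lmmlasso for the exact argument list):
library(lmmlasso)
n <- nrow(mydata)
# Fixed-effects matrix: intercept as a first column of 1s, then the candidates
x <- cbind(Intercept = rep(1, n), as.matrix(mydata[, fixed_vars]))
# Random-effects matrix: a first column of 1s gives the random intercept,
# further columns give additional random effects (here a random slope)
z <- cbind(rep(1, n), mydata$slope_var)
fit <- lmmlasso(x = x, y = mydata$y, z = z,
                grp = mydata$group,   # grouping factor
                lambda = 50)          # penalty; tune e.g. by BIC in practice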
In addition, I'm looking to validate the resulting model on another dataset. Does lmmlasso have a function similar to predict in lme4 that can be used to compute predictions for new data? Alternatively, is it viable/correct to fit a new model with lmer using the variables with non-zero coefficients returned by lmmlasso, and then call predict on that model?
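A hedged sketch of the refit-and-predict route, continuing the illustrative names above (it assumes the fitted object exposes its fixed-effect estimates as fit$coefficients; check str(fit) for the actual slot name):
library(lme4)
# Keep the fixed effects whose lasso estimates are non-zero
keep <- setdiff(colnames(x)[fit$coefficients != 0], "Intercept")
form <- reformulate(c(keep, "(1 + slope_var | group)"), response = "y")
refit <- lmer(form, data = mydata)
# lme4's predict() handles new data; allow.new.levels covers groups that
# appear only in the validation set
preds <- predict(refit, newdata = validation_data, allow.new.levels = TRUE)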
Thanks in advance.

Related

Heteroscedastic mixed-effects model via the lmer function

I am fitting a mixed-effects model in which, due to observed heteroscedasticity, it was necessary to include an effect to accommodate it. Using the lme function of the nlme package this was easy to solve; see the code below:
library(nlme)
library(lme4)
Model1 <- lme(log(Var1) ~ log(Var2) + log(Var3) + Var4 + Var5,
              random = ~ 1 | Var6, data = Data1, method = "REML",
              weights = varIdent(form = ~ 1 | Var7))
# Var6: a factor with several levels.
# Var7: a dummy variable.
However, I need to refit the model above using the lme4 package, i.e. with the lmer function. It is well known, and widely documented, that lme4 has some limitations, for example in modeling heteroscedasticity. What motivates me to refit this model is that I want to use a specific package that only accepts mixed models fitted with lmer. How can I resolve this situation? Below is the model fitted with lmer; however, this model does not include the effect that models the observed heteroscedasticity.
Model2 <- lmer(log(Var1) ~ log(Var2) + log(Var3) + Var4 + Var5 +
                 (1 | Var6),
               Data1, REML = TRUE)
The choice of the random effect (Var6) and the inclusion of the effect accounting for heterogeneity across levels of Var7 were carefully analyzed; I won't reproduce the whole procedure here, to keep the post short and objective.
This is hackable. You need to add an observation-level random effect that is only applied to the group with the larger residual variance (you need to know this in advance!), via (0+dummy(Var7,"1")|obs); this has the effect of multiplying each observation-level random effect value by 1 if the observation is in group "1" of Var7, 0 otherwise. You also need to use lmerControl() to override a few checks that lmer does to try to make sure you are not adding redundant random effects.
# One observation-level grouping factor
Data1$obs <- factor(seq(nrow(Data1)))
Model2 <- lmer(log(Var1) ~ log(Var2) + log(Var3) + Var4 + Var5 +
                 (1 | Var6) +
                 (0 + dummy(Var7, "1") | obs),
               Data1, REML = TRUE,
               control = lmerControl(check.nobs.vs.nlev = "ignore",
                                     check.nobs.vs.nRE = "ignore"))
all.equal(REMLcrit(Model2), c(-2*logLik(Model1))) ## TRUE
all.equal(fixef(Model1), fixef(Model2), tolerance=1e-7)
If you want to use this model with hnp you need to work around the fact that hnp doesn't pass the lmerControl option properly.
library(hnp)
d <- function(obj) resid(obj, type = "pearson")  # diagnostic function
s <- function(n, obj) simulate(obj)[[1]]         # simulation function
f <- function(y.) refit(Model2, y.)              # refitting function
hnp(Model2, newclass = TRUE, diagfun = d, simfun = s, fitfun = f)
You might also be interested in the DHARMa package, which does similar simulation-based diagnostics.
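For example, a minimal DHARMa sketch (assuming the package is installed):
library(DHARMa)
sims <- simulateResiduals(Model2, n = 250)  # simulation-based scaled residuals
plot(sims)  # QQ plot plus residual-vs-predicted diagnostics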

Covariance structure in lme - AR(1)

My response variable is Y_ijk, the recovery time of patient i (i = 1,...,I) under treatment j (j = 1,...,J), measured at time k (k = 1,...,K).
I would like to fit the following model:
Y_ijk = μ + α_j + b_ik + u_ijk
where:
μ is a global fixed intercept
α_j is a fixed effect for treatment j
b_ik is a random effect with the following covariance structure: denoting by b_i the K-dimensional vector of effects for patient i, its variance-covariance matrix R has the AR(1) structure R[k,l] = σ_b² ρ^|k-l|
u_ijk is the usual error term with variance σ²
Consider the following command:
lme(recovery ~ treatment, random = ~ 1 | patient,
    correlation = corAR1(form = ~ time | patient),
    method = "REML", data = data)
Several questions:
What does this correlation argument correspond to? The covariance structure of what? Is it the var-cov matrix that I defined as R?
Does the line actually do what I would like to?
If not, what does it do?
If not, is there a way to do what I would like to?
Thank you in advance!
First, you use the command lme; I will assume you mean lme from the nlme package, because (a) nlme is the only package I know of that provides an lme function and (b) correlation isn't an argument in lme4.
Second, the documentation for lme in nlme describes the correlation argument as follows:
an optional corStruct object describing the within-group correlation
structure. See the documentation of corClasses for a description of
the available corStruct classes. Defaults to NULL, corresponding to no
within-group correlations.
and in corClasses it says
corAR1 autoregressive process of order 1.
So the answers to your first two questions appear to be "Yes".
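For completeness, a hedged sketch of the corrected call and of inspecting the fitted AR(1) parameter:
library(nlme)
fit <- lme(recovery ~ treatment, random = ~ 1 | patient,
           correlation = corAR1(form = ~ time | patient),
           method = "REML", data = data)
# summary(fit) reports the estimated Phi (the AR(1) rho); it can also be
# pulled directly from the correlation structure:
coef(fit$modelStruct$corStruct, unconstrained = FALSE)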

Use of multiple imputation in R for two-level binary logistic regression model

I am using the glmer function from the R package lme4 to fit a generalized linear mixed model (GLMM) using the Laplace approximation, with a binary response variable, BINARY_r, say. I have one level-two fixed-effects variable ('FIXED', say) and two level-two cross-classified random-effects variables ('RANDOM1' and 'RANDOM2', say). The level-one binary response variable, BINARY_r, is nested separately within each of the two level-two variables. The logit link is used to represent the non-Gaussian nature of the response variable. The interaction between the two random-effects variables is represented by 'RANDOM1:RANDOM2'. All three independent variables are categorical. The model takes the form,
BINARY_r ~ FIXED + RANDOM1 + RANDOM2 + RANDOM1:RANDOM2.
There are missing data for 'FIXED' and 'BINARY_r', and I wish to explore the improvement in the model from applying multiple imputation to each of these two variables.
I am very unclear, however, about how to use MI in R to fit a glmer model identical to the original one but based on imputed data for FIXED and BINARY_r. Can you help, please?
Many thanks in advance
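One common route is the mice package: impute, fit glmer on each completed dataset with with(), and pool the results. A hedged sketch (the formula translates the model above into lme4's random-effects syntax; mydata, m and seed are illustrative):
library(mice)
library(lme4)
library(broom.mixed)  # pool() uses it to tidy mixed models
imp <- mice(mydata, m = 5, seed = 1)  # imputes FIXED and BINARY_r
fits <- with(imp,
             glmer(BINARY_r ~ FIXED + (1 | RANDOM1) + (1 | RANDOM2) +
                     (1 | RANDOM1:RANDOM2),
                   family = binomial(link = "logit")))
pooled <- pool(fits)  # combine the m fits via Rubin's rules
summary(pooled)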

PLM falling into the dummy variable trap -- how to fix?

An example:
load(url('BROKEN LINK'))
head(sdat)
library(plm)
fem <- plm(y ~ T + G:t, data = sdat, effect = "twoways",
           model = "within", index = c("ID", "t"))
summary(fem)
lsdvm <- lm(y ~ ID + T + G:t, data = sdat)
summary(lsdvm)
fem$coef
fem is the fixed-effects model (fit with plm), and lsdvm is the equivalent least-squares dummy-variable model (fit with lm).
It is clear that plm is estimating the coefficients, and indeed the coefficients are identical in the two models, as they should be. But when I go to summarize the results, plm has a hard time, and I'm pretty sure the reason is the time-by-group fixed effects, some of which need to be automatically omitted because of the dummy variable trap. (lm, for example, knows how to automatically remove variables that are exact linear combinations of each other.)
How do I get around this? I'd prefer to stay with plm, as it gives much more parsimonious output than lm with dummy variables for each cross-sectional unit.
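One hedged possibility: plm provides a detect.lindep() helper that reports linearly dependent columns of a model matrix, which should identify the redundant group-by-time dummies so they can be dropped from the formula before refitting (treat the exact calls as assumptions and check ?detect.lindep):
library(plm)
X <- model.matrix(fem)   # the design matrix plm used, after the within transformation
detect.lindep(X)         # lists columns that are linear combinations of others
# Drop the offending G:t levels from the formula (or recode G:t as a single
# factor so one level is absorbed as the reference), then refit with plm.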

Using stepAIC to make out of sample predictions

Just a quick question on using stepAIC to make predictions. I'm a beginner in R, so please pardon me if the solution is obvious; I tried searching around but couldn't really find what I was looking for.
I'm trying to predict the response variable after running stepwise AIC on a main model (the main model has all the explanatory variables). stepAIC returns a new model with a reduced number of variables. My question is how to make an out-of-sample prediction with the reduced model. In other words, how do I reduce the dataset so that when I feed it into predict.lm, it only has the variables that were selected in the reduced model?
Here's my code below:
library(MASS)  # provides stepAIC

# Specify start and end rows of the first 5-year window
start_row <- 1
end_row <- 60
# Declare the matrix that will contain the predicted returns
predicted <- matrix(0, 179, 7)
y_var <- as.matrix(orig_data[start_row:end_row, 2:7])
x_var <- as.matrix(orig_data[start_row:end_row, 8:27])
# Regress on all factors, then select factors by stepwise AIC
initial_model <- lm(y_var[, 1] ~ x_var[, 1] + x_var[, 2] + x_var[, 3] +
                      x_var[, 4] + x_var[, 5] + x_var[, 6] + x_var[, 7] +
                      x_var[, 8] + x_var[, 9] + x_var[, 10] + x_var[, 11] +
                      x_var[, 12] + x_var[, 13] + x_var[, 14] + x_var[, 15] +
                      x_var[, 16] + x_var[, 17] + x_var[, 18] + x_var[, 19] +
                      x_var[, 20])
reduced_model <- stepAIC(initial_model, direction = "both")
reduced_coefs <- t(as.matrix(coef(reduced_model)))
x_input <- as.matrix(x_var[60, ])
Basically, how do I multiply the coefficients from the reduced model by only the corresponding explanatory variables in x_var (which contains all the explanatory variables)?
Thanks a lot for your help!
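One hedged way around this is to fit on a data frame with named columns instead of matrix slices; predict() then picks out exactly the columns the reduced model kept, so there is no manual subsetting (names below are illustrative):
library(MASS)
# Training data as a data frame with stable column names
train <- data.frame(y = y_var[, 1], x_var)
initial_model <- lm(y ~ ., data = train)
reduced_model <- stepAIC(initial_model, direction = "both", trace = FALSE)
# Out-of-sample rows, with the same column names as the training predictors
new_x <- data.frame(as.matrix(orig_data[(end_row + 1):nrow(orig_data), 8:27]))
names(new_x) <- names(train)[-1]
predicted_vals <- predict(reduced_model, newdata = new_x)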
