Use FE estimates for OLS - r

I am analyzing a panel data set and I am interested in some time-invariant explanatory variables (z). The Hausman test indicates that I should use a fixed effects model instead of a random effects model.
The downside is that a fixed effects model will not estimate coefficients for the time-invariant explanatory variables.
So one idea is to take the estimated coefficients (b) for the time-varying variables (x) from the FE model and apply them to the raw data, i.e., subtract the effects of the already-estimated explanatory variables. Then use these corrected values as the dependent variable in an OLS model with the time-invariant variables as explanatory variables. This leads to:
y - x'b = z'j + u (with j as the coefficients of interest)
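A minimal sketch of the two-step idea on simulated data (all variable names here are invented for illustration; the within estimator b is computed via LSDV, which is numerically identical to the usual FE estimator):

```r
# Two-step sketch on simulated panel data: n individuals, T periods.
set.seed(1)
n <- 50; T <- 5
id <- rep(1:n, each = T)
z  <- rep(rnorm(n), each = T)       # time-invariant regressor
x  <- rnorm(n * T)                  # time-varying regressor
a  <- rep(rnorm(n), each = T)       # individual effect (independent of z here)
y  <- 2 * x + 1.5 * z + a + rnorm(n * T)

# Step 1: fixed effects estimate of b (LSDV = within estimator)
b <- coef(lm(y ~ x + factor(id)))["x"]

# Step 2: OLS of the corrected outcome y - x*b on the time-invariant z
step2 <- lm(I(y - x * b) ~ z)
coef(step2)["z"]
```

Note that step 2 recovers the true coefficient here only because the individual effects a were simulated independent of z; if they are correlated with z (the very concern the Hausman test raised), the second stage is inconsistent. The second-stage standard errors also ignore the sampling error in b.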
Do the assumptions required by these two models conflict, or is it just that the standard errors of the second-stage OLS model need to be corrected (because b is estimated rather than known)?
Thanks for every hint!


Is there a way to include an autocorrelation structure in the gam function of mgcv?

I am building a model using the mgcv package in R. The data has serial measures (data collected during scans 15 minutes apart in time, but discontinuously, e.g. there might be 5 consecutive scans on one day, and then none until the next day, etc.). The model has a binomial response, a random effect of day, a fixed effect, and three smooth effects. My understanding is that REML is the best fitting method for binomial models, but that this method cannot be specified using the gamm function for a binomial model. Thus, I am using the gam function, to allow for the use of REML fitting. When I fit the model, I am left with residual autocorrelation at a lag of 2 (i.e. at 30 minutes), assessed using ACF and PACF plots.
So, we wanted to include an autocorrelation structure in the model, but my understanding is that only the gamm function and not the gam function allows for the inclusion of such structures. I am wondering if there is anything I am missing and/or if there is a way to deal with autocorrelation with a binomial response variable in a GAMM built in mgcv.
My current model structure looks like:
gam(Response ~
s(Day, bs = "re") +
s(SmoothVar1, bs = "cs") +
s(SmoothVar2, bs = "cs") +
s(SmoothVar3, bs = "cs") +
as.factor(FixedVar),
family=binomial(link="logit"), method = "REML",
data = dat)
I tried thinning my data (using only every 3rd data point from consecutive scans), but with my relatively small sample this left too few observations (only 42 data points) for effects to be detected.
I also tried using the prior value of the binomial response variable as a factor in the model to account for the autocorrelation. This did appear to resolve the residual autocorrelation (based on the updated ACF/PACF plots), but it doesn't feel like the most elegant way to do so and I worry this added variable might be adjusting for more than just the autocorrelation (though it was not collinear with the other explanatory variables; VIF < 2).
I would use bam() for this. You don't need big data to fit a model with bam(); you just lose some of the guarantees about convergence that you get with gam(). bam() will fit a GEE-like model with an AR(1) working correlation matrix, but you need to specify the AR parameter via rho. This only works for non-Gaussian families if you also set discrete = TRUE when fitting the model.
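A sketch of that bam() call on simulated stand-in data (Response, Day, SmoothVar1, and the start.event column are all invented here; in the real data, start.event should be TRUE at the first scan of each uninterrupted run of 15-minute scans, so the AR(1) process restarts there):

```r
# bam() with an AR(1) working correlation structure, per the answer above.
library(mgcv)
set.seed(2)
n   <- 300
dat <- data.frame(
  Day         = factor(rep(1:10, each = 30)),
  SmoothVar1  = runif(n),
  start.event = rep(c(TRUE, rep(FALSE, 29)), 10)  # first scan of each run
)
dat$Response <- rbinom(n, 1, plogis(sin(2 * pi * dat$SmoothVar1)))

m <- bam(Response ~ s(Day, bs = "re") + s(SmoothVar1, bs = "cs"),
         family   = binomial(link = "logit"),
         data     = dat,
         rho      = 0.5,              # AR(1) parameter: you supply it, e.g.
                                      # from the lag-1 ACF of working residuals
         AR.start = dat$start.event,
         discrete = TRUE)             # needed for AR(1) with non-Gaussian families
summary(m)
```

Since rho is not estimated by bam(), a common approach is to refit over a grid of rho values and compare the resulting residual ACFs.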
You could use gamm() with family = binomial(), but this uses PQL to estimate the GLMM version of the GAMM, and if your binomial counts are low this method isn't very good.

Multilevel model using glmer: Singularity issue

I'm using R to run a logistic multilevel model with random intercepts. I'm using the frequentist approach (glmer). I'm not able to use Bayesian methods due to the research centre's policy.
When I run my code it says that my model is singular. I'm not sure why or how to fix the issue. Any advice would be appreciated!
More information about the multilevel model I used:
I'm using a multilevel modelling method used in intersectionality research called multilevel analysis of individual heterogeneity and discriminatory accuracy (MAIHDA). The method uses individual level data as level 2 (the intersection group) and nests individuals within their intersections.
My outcome is binary and I have three categorical variables as fixed effects (gender, marital status, and disability). The random effect (level 2) is called intersect1, which includes each unique combination of the categorical variables (gender x marital x disability).
This is the code:
MAIHDA_full <- glmer(IPV_pos ~ factor(sexgender) + factor(marital) + factor(disability) + (1|intersect1), data=Data, family=binomial, control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))
The usual reason for a singular fit with mixed effects models is that the random structure is overfitted: typically because of the inclusion of random slopes, or, in a case such as this where we only have random intercepts, because the variation in the intercepts is so small that the model cannot detect it.
Looking at your model formula I suspect the issue is:
The random effect (level 2) is called intersect1 which includes each unique combination of the categorical variables (gender x marital x disability).
If I have understood this correctly, the model is equivalent to:
IPV_pos ~ sexgender + marital + disability + (1 | sexgender:marital:disability)
It is likely that any variation in sexgender:marital:disability is captured by the fixed effects, leading to near-zero variation in the random intercepts.
I suspect you will find almost identical results if you don't use any random effect.
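A simulated illustration of this point (all data invented): when the outcome is driven only by the main effects, there is no extra between-intersection variation for the random intercepts to pick up, so the variance estimate collapses towards zero and the fixed effects match a plain glm almost exactly.

```r
library(lme4)
set.seed(3)
n <- 2000
d <- data.frame(
  sexgender  = sample(1:2, n, replace = TRUE),
  marital    = sample(1:3, n, replace = TRUE),
  disability = sample(1:2, n, replace = TRUE)
)
d$intersect1 <- interaction(d$sexgender, d$marital, d$disability)
# Outcome depends only on the main effects: no extra group-level variation
eta <- -1 + 0.5 * (d$sexgender == 2) + 0.3 * (d$marital == 2) -
       0.4 * (d$disability == 2)
d$IPV_pos <- rbinom(n, 1, plogis(eta))

m_glmer <- glmer(IPV_pos ~ factor(sexgender) + factor(marital) +
                   factor(disability) + (1 | intersect1),
                 data = d, family = binomial,
                 control = glmerControl(optimizer = "bobyqa"))
m_glm <- glm(IPV_pos ~ factor(sexgender) + factor(marital) + factor(disability),
             data = d, family = binomial)

VarCorr(m_glmer)                     # intercept SD at or near zero
cbind(fixef(m_glmer), coef(m_glm))   # fixed effects nearly identical
```

You will typically see the boundary (singular) fit message here, which is exactly the situation described above.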

Model formula for Multilevel Model where different sets of predictors are used for random intercept and random slope

I have been searching for many web sources just to try to get any information on this issue. I am working on a two-level multilevel model.
Problem Background
In level one, I have one response variable with one predictor variable. Therefore, I have one intercept and one slope coefficient at this level.
In level two, I have one model that predicts the level-one intercept and another that predicts the level-one slope. In my study, I will be using different predictors in the two level-two models. Nearly all tutorials on the internet and the books I have read assume the same level-two predictors are used to predict both the level-one intercept and the level-one slope.
Question
How should I specify my model in R? (Packages in use: lmerTest, blmer, and brms; they all use the same model formulation.)
List of Variables
y - level 1 response variable
L1x - level 1 predictor variable
L2a - level 2 predictor for the level 1 intercept
L2b - level 2 predictor for the level 1 slope
g - grouping variable
What I Know
Null Model: This is simple. I think I have done it correctly.
y ~ (1|g)
Random Intercept Model: I am pretty sure this is correct too. L1x will be a fixed-effect predictor, and this only allows the intercept to vary across the different groups.
y ~ L1x + (1 | g)
What I Don't Know
How do I make a formula for random intercept and random slope and beyond? I know that when you have the same level-two predictors for both the intercept and the slope, it is
y ~ L1x + (L2b | g)
But to my understanding, this assumes L2b to be the level-two predictor of both Level-one intercept and slope. How do I formulate my model when the level-two predictors for level-one intercept and slope are different? I hope my question makes sense to everybody.
Note:
This is my first time posting. Please let me know what I should do to make the question clearer to you. Thank you.
I could not figure out how to use LaTeX code here, so I am adding the model as images.
Level One Model
Level Two Models
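For what it's worth, the standard two-level derivation answers this: each level-two predictor enters the combined (lmer-style) formula as a main effect for the intercept equation, and as a cross-level interaction with L1x for the slope equation. A hedged sketch on simulated data (all names match the variable list above; the simulated effect sizes are arbitrary):

```r
# Level-2 equations: intercept_j = g00 + g01*L2a + u0j
#                    slope_j     = g10 + g11*L2b + u1j
# Substituting into level 1 gives:
#   y = g00 + g01*L2a + g10*L1x + g11*(L1x * L2b) + u0j + u1j*L1x + e
library(lme4)
set.seed(4)
ng <- 40; m <- 10
dat <- data.frame(
  g   = factor(rep(1:ng, each = m)),
  L1x = rnorm(ng * m),
  L2a = rep(rnorm(ng), each = m),   # level-2 predictor of the intercept
  L2b = rep(rnorm(ng), each = m)    # level-2 predictor of the slope
)
u0 <- rep(rnorm(ng, sd = 0.5), each = m)
u1 <- rep(rnorm(ng, sd = 0.5), each = m)
dat$y <- (1 + 0.8 * dat$L2a + u0) + (2 + 0.6 * dat$L2b + u1) * dat$L1x +
  rnorm(ng * m)

# L2a appears only as a main effect, L2b only via its interaction with L1x
fit <- lmer(y ~ L1x + L2a + L1x:L2b + (1 + L1x | g), data = dat)
fixef(fit)
```

Whether to also include the L2b main effect (or an L1x:L2a interaction) is a substantive modelling choice; omitting a term imposes a zero coefficient on it.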

How to get each independent variable's R^2 in a multivariate linear regression

When you do a multivariate linear regression you get the multiple R-squared, like this:
My question is, if I can get the R-squared for each independent variable, without having to make a regression for each of the predictor variables.
For example, is it possible to get the R-squared for each of the predictor variables, next to the p value:
In regression models, individual variables do not have an R-squared; there is only ever an R-squared for the complete model. The variance explained by any single independent variable in a regression model depends on the other independent variables.
If you want the added value of an independent variable, that is, the variance this IV explains above all others, you can compute two regression models: one with this IV and one without. The difference in R-squared is the variance this IV explains after all the others have explained their share. But if you do this for all variables, the differences won't add up to the total R-squared.
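A sketch of this comparison using the built-in mtcars data (the variable choice is arbitrary): the gain in R-squared from adding wt to a model that already contains hp and disp.

```r
# Added value of wt: difference in R-squared between the full and reduced model
full     <- lm(mpg ~ wt + hp + disp, data = mtcars)
reduced  <- lm(mpg ~ hp + disp, data = mtcars)
delta_r2 <- summary(full)$r.squared - summary(reduced)$r.squared
delta_r2  # variance wt explains beyond hp and disp
```

Repeating this for hp and disp gives their respective increments, but as noted above the three increments will not sum to the full model's R-squared when the predictors are correlated.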
Alternatively, you may use squared Beta weights to roughly estimate the effect size of a variable in a model. But this value is not directly comparable to R-squared.
That said, this question would be better posted on Cross Validated than on Stack Overflow.

Fixed effects logit lasso model

My data set has a binary dependent variable (0/1) and a lot of continuous independent variables for many individuals and three time periods. Therefore, I am facing a panel data set with a binary dependent variable, which asks for the use of a non-linear panel data model. However, I also have a lot of independent variables, which asks for the use of a variable selection method. Therefore, I want to apply lasso on a fixed effects logit model.
As far as I know, cv.glmnet can only estimate a logit lasso model via the function cv.glmnet(x, y, weights, offset, lambda, type.measure, nfolds, foldid, grouped, keep, parallel, ...) with family='binomial'. This estimation procedure pools all individuals, as it is a cross-sectional procedure, and does not take the panel component of my data set into account.
Therefore, I would like to adjust the cv.glmnet function so that it accepts, for example, family='fe binomial' and runs a fixed effects logit lasso model.
In conclusion, it is possible to run a fixed effects logit model and a lasso model separately but I want to combine both. How can I do this in R?
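One pragmatic workaround, sketched below on simulated data (this is my suggestion, not a true conditional-logit FE estimator): add one dummy per individual and leave the dummies unpenalized via glmnet's penalty.factor argument, so the lasso only selects among the substantive regressors. With only three time periods, the dummy-variable logit suffers from the incidental-parameters problem, so treat the results as an approximation.

```r
# Approximate FE logit lasso: unpenalized individual dummies + penalized X
library(glmnet)
set.seed(5)
n <- 100; T <- 3; p <- 20
id <- rep(1:n, each = T)
X  <- matrix(rnorm(n * T * p), n * T, p)
a  <- rep(rnorm(n), each = T)                  # individual effects
y  <- rbinom(n * T, 1, plogis(X[, 1] - X[, 2] + a))

D   <- model.matrix(~ factor(id) - 1)          # one dummy per individual
fit <- cv.glmnet(cbind(X, D), y, family = "binomial",
                 penalty.factor = c(rep(1, p), rep(0, n)))  # dummies free
coef(fit, s = "lambda.min")[2:3]               # coefficients on X1 and X2
```

An alternative with better theoretical grounding would be a penalized conditional logit, but to my knowledge that is not available off the shelf in glmnet.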
(Also, in the attachment I wrote my model down in more detail)
Explanation model
