saturated regression with interaction variables - r

I'm working with the following data in R. I need to fit a saturated regression with ed76 as the dependent variable. From what I understand, a saturated regression has to include all of the explanatory variables plus the interactions between the dummy variables. So, say I have the column variables nearc2, nearc4, momdad14, step14, ed76, south66, wage, and iq. My understanding is that the regression should look like this:
Reg <- lm(ed76 ~ nearc2 + nearc4 + momdad14 + step14 + ed76 + south66 + wage + iq + nearc2*nearc4 + nearc2*momdad14 + nearc2*step14 + ... +)
Is there a more efficient way to create the interaction terms between all of one's dummy variables for the purpose of building a saturated regression model?

A saturated model requires as many parameters as data points; see e.g. this answer. So this is likely not what you want, as saturated models are not commonly used in linear models AFAIK. At least, I am not sure what you would use one for. However, such a model can be fit with
lm(ed76 ~ as.factor(seq_along(ed76)))
@G. Grothendieck's answer will only give you a saturated model with lm if it leads to as many parameters as there are observations.
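If the goal is just to generate all of the pairwise interactions among the dummy variables without typing them out, R's formula operators can do this compactly. A minimal sketch using the column names from the question (with ed76 dropped from the right-hand side, since it is the response; mydata is a placeholder name for your data frame):
# ^2 expands to all main effects plus all pairwise interactions of the listed terms
Reg <- lm(ed76 ~ (nearc2 + nearc4 + momdad14 + step14 + south66 + wage + iq)^2,
          data = mydata)
# a*b is shorthand for a + b + a:b, and (a + b + c)^3 would add three-way interactions as well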

Related

Multilevel model using glmer: Singularity issue

I'm using R to run a logistic multilevel model with random intercepts. I'm using the frequentist approach (glmer). I'm not able to use Bayesian methods due to the research centre's policy.
When I run my code it says that my model is singular. I'm not sure why or how to fix the issue. Any advice would be appreciated!
More information about the multilevel model I used:
I'm using a multilevel modelling method used in intersectionality research called multilevel analysis of individual heterogeneity and discriminatory accuracy (MAIHDA). The method uses individual level data as level 2 (the intersection group) and nests individuals within their intersections.
My outcome is binary and I have three categorical variables as fixed effects (gender, marital status, and disability). The random effect (level 2) is called intersect1, which includes each unique combination of the categorical variables (gender x marital x disability).
This is the code:
MAIHDA_full <- glmer(IPV_pos ~ factor(sexgender) + factor(marital) + factor(disability) + (1|intersect1), data=Data, family=binomial, control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))
The usual reason for a singular fit with mixed-effects models is that the random structure is overfitted, typically because of the inclusion of random slopes. In a case such as this, where we only have random intercepts, it is because the variation in the intercepts is so small that the model cannot detect it.
Looking at your model formula I suspect the issue is:
The random effect (level 2) is called intersect1 which includes each unique combination of the categorical variables (gender x marital x disability).
If I have understood this correctly, the model is equivalent to:
IPV_pos ~ sexgender + marital + disability + (1 | sexgender:marital:disability)
It is likely that any variation in sexgender:marital:disability is captured by the fixed effects, leading to near-zero variation in the random intercepts.
I suspect you will find almost identical results if you don't use any random effect.
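One way to check this is to look at the estimated random-intercept variance and to compare the mixed model against a plain fixed-effects logistic regression. A minimal sketch, reusing the variable and data names from the question:
library(lme4)

# A near-zero variance for intersect1 is what triggers the singular-fit message
VarCorr(MAIHDA_full)
isSingular(MAIHDA_full)

# The same fixed effects without the random intercepts, for comparison
fixed_only <- glm(IPV_pos ~ factor(sexgender) + factor(marital) + factor(disability),
                  data = Data, family = binomial)

# If the two sets of coefficients are essentially identical, the random
# intercepts are adding nothing beyond the fixed effects
cbind(mixed = fixef(MAIHDA_full), fixed = coef(fixed_only))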

Linear regression with ordinal variable

I have five cognitive variables (memory, cognitive flexibility, critical thinking, verbal, and attention) and one ordinal variable (adversity scores from 1-10). I have cortical thickness as my outcome variable (or dependent variable).
I was wondering how I should set up my regression.
I was thinking of doing this:
lm(cortical_thickness ~ memory + cognitive_flexibility + critical_thinking + verbal + attention + adversity_score)
or should I set it up like this instead:
lm(cortical_thickness ~ (memory + cognitive_flexibility + critical_thinking + verbal + attention) * adversity_score)
In my opinion this is more of a statistical question, and it is not as trivial as it sounds. The question is how to deal with ordinal predictors in the context of multiple linear regression.
The simple answer is to treat your 10-point ordinal predictor as a continuous variable, in which case I would use:
model1 <- lm(cortical_thickness ~ memory + cognitive_flexibility + critical_thinking + verbal + attention + adversity_score, data=yourdataset)
How to model this depends strongly on your data. Therefore I think you should ask this question, together with your data, at https://stats.stackexchange.com/
Although you can run lm, ordinal regression models do not satisfy the assumptions of linear regression so model comparisons and other tests won't be valid.
R does have ordinal regression functions which you may wish to try. Four such packages are listed here.
Regarding which model to use, run both models and compare them. If fm1 and fm2 are the two models then
anova(fm1, fm2)
will compare them and works for at least clm and polr. See ?clm in the ordinal package or ?polr in the MASS package for examples, including the use of anova.
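For the two lm specifications in the question, the same kind of nested comparison works directly; a small sketch, assuming the data frame is called yourdataset as in the earlier answer:
# Main-effects-only model
fm1 <- lm(cortical_thickness ~ memory + cognitive_flexibility + critical_thinking +
            verbal + attention + adversity_score, data = yourdataset)

# Model that lets each cognitive effect vary with adversity_score
fm2 <- lm(cortical_thickness ~ (memory + cognitive_flexibility + critical_thinking +
            verbal + attention) * adversity_score, data = yourdataset)

# F-test of whether the interaction terms improve the fit
anova(fm1, fm2)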

plot interaction terms in a mixed-model

I would like to visualize the effect of a significant interaction term in the following mixed effects model.
c1 <- lmer(log.weight ~ time + I(time^2) + temp + precip + time:precip + time:temp + (1|indiv), data = noctrl)
This model includes fixed effects of 'time' (linear and quadratic terms), 'temperature', 'precipitation', and two interactions, on the log-transformed response 'weight'. All terms are significant and the model's assumptions of normality and homogeneity are met.
I’ve been using the 'effects' package to produce interaction plots to show the effect of the interactions. When I try to show the interaction of time and temperature (time:temp) with the following code I’m not sure whether the resulting plot correctly shows this interaction.
ef2 <- effect(term="time:temp", c1, multiline=TRUE)
y <- as.data.frame(ef2)
ggplot(y , aes(time, fit, color=temp)) + geom_point() + geom_errorbar(aes(ymin=fit-se, ymax=fit+se), width=0.4)
I need help understanding the resulting plot, please. How come the SEs overlap at each value of x even though this interaction term is highly significant?
Am I using the effects package correctly? Or is it because I need to include the quadratic term of time I(time^2) in the interaction terms as well?
Thank you very much for the clarification.
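For reference, a self-contained version of that plotting workflow (library calls included, and the fitted values connected by lines per temperature level) might look like the sketch below; variable and data names follow the question, and xlevels is used only to evaluate the effect at a handful of temperature values:
library(lme4)
library(effects)
library(ggplot2)

# Model from the question
c1 <- lmer(log.weight ~ time + I(time^2) + temp + precip + time:precip + time:temp +
             (1 | indiv), data = noctrl)

# Evaluate the time:temp interaction at four temperature levels
ef2 <- effect(term = "time:temp", c1, xlevels = list(temp = 4))
y <- as.data.frame(ef2)

# One line per temperature level, with +/- 1 SE ribbons around the fitted values
ggplot(y, aes(time, fit, colour = factor(temp), group = factor(temp))) +
  geom_line() +
  geom_ribbon(aes(ymin = fit - se, ymax = fit + se, fill = factor(temp)),
              alpha = 0.2, colour = NA)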

Testing a General Linear Hypothesis in R

I'm working my way through a Linear Regression Textbook and am trying to replicate the results from a section on the Test of the General Linear Hypothesis, but I need a little bit of help on how to do so in R.
I've already taken a look at a number of other posts, but am hoping someone can give me some example code. I have data on twenty-six subjects which has the following form:
Group, Weight (lb), HDL Cholesterol (mg/dL)
1,163.5,75
1,180,72.5
1,178.5,62
2,106,57.5
2,134,49
2,216.5,74
3,163.5,76
3,154,55.5
3,139,68
Given this data I am trying to test whether the regression lines fitted to the three groups of subjects have a common slope. The models postulated are:
y = β0 + β1·x + ε   (group 1)
y = γ0 + γ1·x + ε   (group 2)
y = δ0 + δ1·x + ε   (group 3)
So the hypothesis of interest is H0: β1 = γ1 = δ1
I have been trying to do this using the linearHypothesis function in the car library, but have been having trouble knowing what the model object should be, and am not confident that this is the correct approach (or package) to be using.
Any help would be much appreciated – Thanks!
Tim, your question doesn't seem so much to be about R code. Instead, it appears that you have questions about how to test the interaction of your Group and Weight (lb) variables on the outcome HDL Cholesterol (mg/dL). You don't state this specifically, but I'm taking a guess that these are your predictors and outcome, respectively.
So essentially, you're trying to see if the predictor Weight (lb) has differential effects depending on the level of the variable Group. This can be done in a number of ways using the linear model. A simple regression approach would be lm(hdl ~ 1 + group + weight + group*weight). And then the coefficient for the interaction term group*weight would tell you whether or not there is a significant interaction (i.e., moderation) effect.
However, I think we would have a major concern. In particular, we ought to worry that the hypothesized effect is that the group and weight variables do not interact. That is, you're essentially predicting the null. Furthermore, you're predicting the null despite having a small sample size, so it is rather unlikely that you have sufficient statistical power to detect an effect, even if there were one to be observed.
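To tie this back to the original question about the general linear hypothesis: the common-slope test can be written either as a comparison of nested models or as an explicit linearHypothesis call on the interaction coefficients. A sketch under the assumption that the data sit in a data frame dat with columns group (a factor), weight, and hdl:
library(car)

fit_common   <- lm(hdl ~ group + weight, data = dat)   # common slope, separate intercepts
fit_separate <- lm(hdl ~ group * weight, data = dat)   # a separate slope for each group

# F-test of H0: the slopes are equal across the three groups
anova(fit_common, fit_separate)

# Equivalent test via car::linearHypothesis, setting all interaction coefficients to zero
linearHypothesis(fit_separate, matchCoefs(fit_separate, ":"))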

Mixed Modelling - Different Results between lme and lmer functions

I am currently working through Andy Field's book, Discovering Statistics Using R. Chapter 14 is on Mixed Modelling and he uses the lme function from the nlme package.
The model he creates, using speed dating data, is such:
speedDateModel <- lme(dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality,
random = ~1|participant/looks/personality)
I tried to recreate a similar model using the lmer function from the lme4 package; however, my results are different. I thought I had the proper syntax, but maybe not?
speedDateModel.2 <- lmer(dateRating ~ looks + personality + gender +
looks:gender + personality:gender +
(1|participant) + (1|looks) + (1|personality),
data = speedData, REML = FALSE)
Also, when I run the coefficients of these models I notice that it only produces random intercepts for each participant. I was trying to then create a model that produces both random intercepts and slopes. I can't seem to get the syntax correct for either function to do this. Any help would be greatly appreciated.
The only difference between the lme and the corresponding lmer formula should be that the random and fixed components are aggregated into a single formula:
dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality+ (1|participant/looks/personality)
Using (1|participant) + (1|looks) + (1|personality) is only equivalent if looks and personality have unique values at each nested level.
It's not clear which continuous variable you want to use to define your slopes: if you have a continuous variable x and groups g, then (x|g), or equivalently (1+x|g), will give you a random-slopes model (x should also be included in the fixed-effects part of the model, i.e. the full formula should be y ~ x + (x|g) ...)
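As an illustration of that syntax on simulated data (the names x, g, and dat below are hypothetical, not part of the speed-dating example):
library(lme4)
set.seed(1)

# Simulate 20 groups with group-specific intercepts and slopes
dat <- data.frame(g = factor(rep(1:20, each = 10)), x = rnorm(200))
b0 <- rnorm(20, sd = 1)    # random intercepts
b1 <- rnorm(20, sd = 0.3)  # random slopes
dat$y <- (1 + b0[dat$g]) + (0.5 + b1[dat$g]) * dat$x + rnorm(200, sd = 0.5)

# Random intercepts and random slopes for x within g
fit <- lmer(y ~ x + (1 + x | g), data = dat)
summary(fit)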
update: I got the data, or rather a script file that allows one to reconstruct the data, from here. Field makes a common mistake in his book, which I have made several times in the past: since there is only a single observation in the data set for each participant/looks/personality combination, the three-way interaction has one level per observation. In a linear mixed model, this means the variance at the lowest level of nesting will be confounded with the residual variance.
You can see this in two ways:
lme appears to fit the model just fine, but if you try to calculate confidence intervals via intervals(), you get
intervals(speedDateModel)
## Error in intervals.lme(speedDateModel) :
## cannot get confidence intervals on var-cov components:
## Non-positive definite approximate variance-covariance
If you try this with lmer you get:
## Error: number of levels of each grouping factor
## must be < number of observations
In both cases, this is a clue that something's wrong. (You can overcome this in lmer if you really want to: see ?lmerControl.)
If we leave out the lowest grouping level, everything works fine:
sd2 <- lmer(dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality+
(1|participant/looks),
data=speedData)
Compare lmer and lme fixed effects:
all.equal(fixef(sd2),fixef(speedDateModel)) ## TRUE
The starling example here gives another example and further explanation of this issue.
