Linear models for ANOVA in R

I'm a bit desperate, as my exam is tomorrow.
Say I have the data for an ANOVA with two independent factors. According to my teacher, I would write the linear model in RStudio as:
lm(score ~ 1+ A + B + A:B, data=mydata1, contrasts=list(A=contr.sum, B=contr.sum))
I've come across an exercise in which he asks, essentially: would this model be correct for a two-way ANOVA?
lm(score ~ A + B + A:B, data=mydata1, contrasts=list(A=contr.sum, B=contr.sum))
I'm not sure what difference the "1 +" makes in the linear model; I assumed it changed the value of y at x = 0 (the intercept), but I'm really not sure. Would it be appropriate to use?
Could anyone help me? Sorry if my terms are wrong; my first language isn't English.
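For what it's worth, a quick check (my addition, not from the original post) shows the two calls fit the same model: R formulas include the intercept by default, and only a "0 +" or "- 1" removes it.
m1 <- lm(score ~ 1 + A + B + A:B, data = mydata1,
         contrasts = list(A = contr.sum, B = contr.sum))
m2 <- lm(score ~ A + B + A:B, data = mydata1,
         contrasts = list(A = contr.sum, B = contr.sum))
all.equal(coef(m1), coef(m2))  # TRUE: the "1 +" is implicit either way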

Related

What does a colon (:) do in a linear mixed effects model analysis?

Let me clarify that I am a complete beginner at R.
I'm working on a problem and having a bit of trouble understanding a formula I'm supposed to use in a linear mixed effects model analysis of a dataset, more specifically this one:
ModelName <- lmer(outcome ~ predictor1 + predictor2 + predictor1:predictor2 + (random_structure), data = DatasetName)
I don't know what the predictor1:predictor2 part of it means. Could anyone please help me understand, or link to something I can read to understand it?
I've run the code, and it gives an additional output row for the predictor1:predictor2 term, which doesn't happen when you don't include that part of the formula.
Wow! You may be new to R, but you ask a great question!
As you probably know already, the + operator separates terms in a model.
Y ~ a + b + c means that the response is modeled by a linear combination of a, b, and c.
The colon operator denotes interaction between the items it separates, for example:
Y ~ a + b + a:b means that the response is modeled by a linear combination of a, b, and the interaction between a and b.
I hope this helps!
Rose Hartman explains how interactions affect linear models, and why it’s important to consider them in Understanding Interactions in Linear Models https://education.arcus.chop.edu/understanding-interactions/
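A tiny illustration of this (my addition, using made-up toy data): the colon adds an interaction column to the design matrix, and a*b is shorthand for a + b + a:b.
set.seed(1)
d <- data.frame(a = rnorm(6), b = rnorm(6))
head(model.matrix(~ a + b + a:b, data = d))  # columns: (Intercept), a, b, a:b
all.equal(model.matrix(~ a * b, data = d),
          model.matrix(~ a + b + a:b, data = d))  # TRUE: * expands to +, :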

Fitting random factors for a linear model using lme4

I have 4 random factors and I want to specify a linear mixed model for them using lme4, but I have struggled to fit the model.
Assume A is nested within B (2 levels), which is in turn nested within each of xx preceptors (P). All responded to xx questions (M).
I want to fit my model to get variances for each factor and their interactions.
I have used the following code to fit the model, but I was unsuccessful.
lme4::lmer(value ~ A +
             (1 + A | B) +
             (1 + P | A),
             (1 + P | M),
           data = myData, na.action = na.exclude)
I also read interesting materials here, but Still, I struggle to fit the model. Any help?
At a guess, if the nesting structure is P (teachers) / B (occasions) / A (participants), meaning that the occasions for one teacher are assumed to be completely independent of the occasions for any other teacher, and that participants in turn are never shared across occasions or teachers, but questions (M) are shared across all teachers, occasions, and participants:
value ~ 1 + (1 | P/B/A) + (1 | M)
Some potential issues:
As you hint in the comments, it may not be practical to fit random effects for factors with small numbers of levels (say, < 5); this is likely to lead to the dreaded "singular model" message (see the GLMM FAQ for more detail).
If all of the questions (M) are answered by every participant, then in principle it's possible to fit a model that takes account of the among-question correlation within participants: the maximal model would be ~ 1 + (M | P/B/A) (which would look for among-question correlations at the level of teacher, occasion within teacher, and participant within occasion within teacher). However, this is very unlikely to work in practice (especially if each participant answers each question only once, in which case the teacher:occasion:participant:question variance will be confounded with the residual variance in a linear model). In this case, you will get an error about "probably unidentifiable": see e.g. this question for more explanation/detail.
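Putting that together, a minimal sketch of the suggested call (assuming the asker's column names value, P, B, A, and M in myData):
library(lme4)
fit <- lmer(value ~ 1 + (1 | P/B/A) + (1 | M),
            data = myData, na.action = na.exclude)
summary(fit)  # variance components for P, B within P, A within B within P, M, and residual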

R: Linear Regression with N Features

I've seen quite a few examples of how to do regression (linear, multiple, etc.), but in every example I saw, you had to define every single feature in the formula...
linearMod <- lm(Y ~ x1 + x2 + x3 + ..., data=myData)
Well, we used TSFresh to generate more features, around 100. So how am I supposed to do this now? I don't really want to type out x1 all the way to x100.
In Python's scikit-learn I could just put in all the data:
lm = linear_model.LinearRegression()
model = lm.fit(X,y)
And then repeat this for each 'feature group' to create a multiple linear regression.
Is there a way to do this in R? Or am I doing it wrong? Maybe another approach?
Originally we had 8 features/properties per row, and with TSFresh we generated more of them (mean, SD, and so on).
Every one of those features has a pretty linear influence on the Y result. So how can I define something like a multiple linear model that just uses all the extended features, ideally without me having to list them by hand each time?
So, for example, one formula would probably use features 1-12 for Y, the next one features 13-24 for Y, and so on. Is there an easy way to do this?
If you want to regress on all variables except Y, you can do
lm(Y ~ ., data = myData)
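If you instead need a separate model per feature group, one option (a sketch; the x1 ... x100 column names are an assumption) is to build each formula programmatically with reformulate():
group1 <- paste0("x", 1:12)                # names of the first feature group
f1 <- reformulate(group1, response = "Y")  # Y ~ x1 + x2 + ... + x12
mod1 <- lm(f1, data = myData)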

R: Understanding formulas

I'm trying to get a better understanding of what R formulas mathematically mean.
For example: lm(y ~ x) would fit a line to y = Ax + B
Would lm(y ~ x + z) be fitting to the plane y = Ax + Bz + C?
Would lm(y ~ x + z + x:z) be fitting to the plane y = Ax + Bz + Cxz + D?
Your understanding is correct! Though it may help to think of it a bit more abstractly: a linear model (lm) only means that it fits parameters for terms the response depends on linearly (Ax, not Ax^2 or A sin(x) or anything fancier than that).
But that does not mean it only fits 1 to 3 parameters. Imagine that foods represent dimensions: grains, fruits, vegetables, meats, and dairy make up our 5 "dimensions of food". These things are clearly related (and maybe not even independent), but still not totally describable in exactly the same ways. We can think of our model as the tool which gauges our coefficients, which in this food example we can imagine as "flavors", like sweet, spicy, sour, etc.
Our model then takes points in different dimensions (food groups) and attempts to relate them by their coefficient values (flavors) for a function. This model then allows us to describe other foods/flavors. This is really what most models "do": they "train" themselves on annotated data and build a relationship; linear models just treat flavors as directly proportional to the amount of each food group.
I hope this explanation was helpful. If there's anything that's unclear, please let me know. Also, I would have made this as a comment but have not yet accumulated the required 50 pts. Sorry!
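To make this concrete, here is a small simulation (my addition, not part of the original answer) in which lm(y ~ x + z + x:z) recovers the coefficients of y = Ax + Bz + Cxz + D:
set.seed(42)
x <- runif(200); z <- runif(200)
y <- 2*x + 3*z + 4*x*z + 1 + rnorm(200, sd = 0.1)
coef(lm(y ~ x + z + x:z))  # approximately: (Intercept) 1, x 2, z 3, x:z 4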

R - plotting the predictions from a mixed model with more than two predictors (continuous and factor)

I found this answer by Ben Bolker to a post and it is really helpful (How to plot random intercept and slope in a mixed model with multiple predictors?). However, if my model looks more like this:
mod <- lmer(resp ~ pred1 + pred2 + factor(pred3) + (1|RF1), data=d)
and I also want to plot the factor's influence on the response, keeping the other two constant, how would I create the nd data frame instead? Also, how would I go about plotting random slopes? Thank you very much in advance!
EDIT: Ben, thank you very much for the answer and I apologize, of course it makes sense to give a reproducible example.
So, the first question: how can I plot the influence of a predictor keeping the others constant (as described in your answer to the above linked question) if I have a factor variable in my model?
Here is my example data: https://www.dropbox.com/s/ytlocw868fsnpu7/realdatasample.csv?dl=0, please treat confidentially :).
So the model would be:
moddata <- lmer(meanQUALNEW ~ meanDBH + meanCRRATIO + richn_tar + (1|region),data=realdatasample)
From what I understand, the example given in the link above constructs a plot for one predictor while keeping the other constant (and then vice versa), taking the random effect into account. But how do I expand that code to handle three variables, especially when one of them is a factor?
The second question:
How can I visualize the random slopes in a model like this?
moddata1 <- lmer(meanQUALNEW ~ meanDBH + meanCRRATIO + richn_tar + (richn_tar-1|region),data=realdatasample)
As far as I understand, the packages visreg and effects provide ways to visualize the fixed part of such models in the accepted way (change in one predictor, keeping the others constant). But they don't work (as far as I know) for nice visualizations of the random-effects variance components.
I realize that there is probably a lot of information about this out there, but I like the clear code example from above very much and would like to understand how to do these things "by hand".
Thanks so much for any help!
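Not a full answer, but one way to build the nd data frame for the factor question (a sketch; it assumes richn_tar is the factor-like predictor and holds the two continuous predictors at their means):
nd <- expand.grid(
  meanDBH     = mean(realdatasample$meanDBH, na.rm = TRUE),
  meanCRRATIO = mean(realdatasample$meanCRRATIO, na.rm = TRUE),
  richn_tar   = unique(realdatasample$richn_tar))
nd$pred <- predict(moddata, newdata = nd, re.form = NA)  # fixed effects only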
