The model I want to construct/test is: dependent variable = factor A + factor B + factor C + interaction between factors A and C + interaction between factors B and C + factor B nested within factor A
An example I came across online is described in the file "ANOVA: advanced designs" (http://web.grinnell.edu/individuals/kuipers/stat2labs/Handouts/DOE%20Advancede.pdf) (thanks to the author(s) for sharing this file online). In this file, the example described in Split Plot/Repeated Measures Designs (slides 9-10) is similar to my case. Here, factor A is brand, factor B is box, and factor C is temp. If we assume (1) box is a fixed effect (i.e. those 3 boxes represent all possible levels of the factor "box"), (2) all bags within each box are assigned to a temp, and (3) there are more than two levels of temperatures (e.g. there are four levels of temperature, 10, 20, 30, 40) and the number of bags within each box assigned to a certain temp is randomly determined (i.e. the numbers of bags assigned to different temperatures are not equal and it could be that no bag is assigned to a certain temperature in some boxes), then this example is almost the same as what I am trying to describe. Also, my design is not balanced.
I want to test which factors and how these factors contribute to the dependent variable. The hypotheses are the hypotheses for a 3-way (in the example of popcorn, brand, temperature, box) anova. In the example of popcorn, the null hypothesis might be: brand, temp and/or box do not influence % popped kernels. The alternative hypothesis is just the opposite to null. Also, probably box in my case could also be a random effect, just as box, but I would like to take both these two situations into consideration (box as fixed and random effect).
What is the appropriate way to address this question?
Thanks.
I'm not 100% sure we agree on terminology, but I'll take a shot ...
You say you want
factor A + factor B + factor C + interaction between factors A and C + interaction between factors B and C + factor B nested within factor A
The main thing to note is that "B nested within A" is equivalent, at least in the world that I'm familiar with, to "include the main effect of A and the interaction between A and B, but not the main effect of B" (i.e. ~A/B == ~A+A:B). But then you say you do want the main effect of factor B, so this seems a little strange. Following your specification exactly would give
~ A + B + C + A:C + B:C + A/B
but this is equivalent to
~ A + B + C + A:C + B:C + A + A:B
R automatically discards the redundant A term, so this is also equivalent to
~ A + B + C + A:C + B:C + A:B
But since this is essentially the main effects plus all two-way interactions, you could also write it as
~(A+B+C)^2
Because redundant terms are discarded you could write this equivalently in many different ways: ~A*B+A*C+B*C (A*B is equivalent to A+B+A:B) or ~A*C+B*C+A/B ... if you want to check what R has actually produced, you can use colnames(model.matrix(my_formula,my_data)).
This is all assuming we're working in the lm()/fixed-effect context ...
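As a quick check of the equivalence above, here is a minimal base-R sketch. The toy data frame d and the helper norm_terms() are made up for illustration and are not part of the original question:

```r
# Toy factors; the levels are invented purely for illustration.
d <- expand.grid(A = factor(1:2), B = factor(1:3), C = factor(1:2))

f_literal <- ~ A + B + C + A:C + B:C + A/B   # the specification as written
f_compact <- ~ (A + B + C)^2                 # main effects + all two-way interactions

# Hypothetical helper: normalise term labels so "B:A" and "A:B" compare equal.
norm_terms <- function(f) {
  labs <- attr(terms(f), "term.labels")
  sort(vapply(strsplit(labs, ":", fixed = TRUE),
              function(v) paste(sort(v), collapse = ":"), character(1)))
}

same_terms <- identical(norm_terms(f_literal), norm_terms(f_compact))

# The design matrices should also have the same number of columns.
X_literal <- model.matrix(f_literal, d)
X_compact <- model.matrix(f_compact, d)
same_ncol <- ncol(X_literal) == ncol(X_compact)

print(same_terms)
print(colnames(X_literal))
```

Comparing term labels rather than fitted models keeps the check independent of any response variable.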
I am helping another researcher with their coding in R. I did not work with them during the planning of the experimental design, and now I could really use some help with this tricky design. I have four fixed factors: FactorA, FactorB, FactorC, and FactorD. The experiment is not a fully factorial design. There are missing cells (combinations of factors that are not available) in addition to unbalanced numbers of samples. For the combinations FactorA:FactorB, FactorA:FactorC, and FactorB:FactorC, I have the proper number of cells (treatment combinations). I also have a random factor: Block, which is nested within FactorD. In my field, it is common for people (even in high-impact journals) just to run separate ANOVAs for each factor to avoid dealing with this type of problem, but I wonder if I could write a model that comprises all those factors.
Please, could I use something like this?
lmerTest::lmer(Response ~ FactorA + FactorB + FactorC + FactorD +
FactorA:FactorB + FactorA:FactorC + FactorB:FactorC +
(1|FactorD/Block),indexes)
I appreciate any suggestions you may have!
Assuming that what you're missing from the design are some combinations of factor D with the other factors, this is close.
You can express this a little more compactly as
Response ~ (FactorA + FactorB + FactorC)^2 + FactorD + (1|FactorD:Block)
You shouldn't use (1|FactorD/Block), because that will expand to (1|FactorD) + (1|FactorD:Block) and give you a redundant term (FactorD will be specified as both a fixed and a random effect).
Unbalanced numbers of observations don't matter as long as a factor combination is not completely missing/has at least one observation.
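One way to see whether any factor combination is completely missing is to cross-tabulate the factors. A minimal sketch with made-up data (the data frame toy and its values are hypothetical; with the real data you would tabulate the columns of indexes instead):

```r
# Hypothetical 2 x 3 layout with unequal counts and one empty cell (a2, b3).
toy <- data.frame(
  FactorA = factor(c("a1", "a1", "a1", "a2", "a2", "a2", "a2")),
  FactorB = factor(c("b1", "b2", "b3", "b1", "b1", "b2", "b2"))
)

cells <- table(toy$FactorA, toy$FactorB)
print(cells)

# Unequal counts are fine; zero counts mark combinations that are missing.
n_empty <- sum(cells == 0)
print(n_empty)
```

Any pair of factors whose interaction is in the model can be checked the same way.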
I understand that * means that we check for the association between two predictor variables and + means that we add another predictor variable to the model we already have. But how do I write a model formula (for lm) for the following question: variable A (the dependent variable) is expected to be affected by variable B (a predictor) and variable C. How does C moderate B, how do both B and C directly affect A, how does the association between B and C affect A, and how is all this moderated by the variables D and E?
Variables A through D are all continuous variables describing personal characteristics; variable E is gender (male / female).
A part of the formula should look like this:
M1 <- lm(A ~ B_centered * C_centered ..... * gender, data = data)
The middle of the equation is complicated, because it is unclear to me when to use * and when to use +, how I need to connect the individual terms with each other, and whether * is used for both the correlations and the moderation.
Sorry if the question sounds strange I am new to R! Thank you for reading and trying to help.
I tried to find a way to connect the single terms of the formula to each other, but got confused. I listed the single interactions and calculations that need to be performed in the model one by one. Searched google, youtube etc. I put gender at the end as last independent variable in the formula and variable A as dependent variable and I put the two single independent variables B and C in. What is missing are the interactions and moderations between the independent variables and variable D.
I have 4 random factors and I want to fit a linear mixed model for them using lme4, but I have struggled to fit the model.
Assume A is nested within B (2 levels), which in turn is nested within each of xx preceptors (P). All responded to xx Ms (M).
I want to fit my model to get variances for each factor and their interactions.
I have used the following codes to fit the model, but I was unsuccessful.
lme4::lmer(value ~ A +
(1 + A|B) +
(1 + P|A),
(1+ P|M),
data = myData, na.action = na.exclude)
I also read interesting materials here, but I still struggle to fit the model. Any help?
At a guess, if the nesting structure is ( P (teachers) / B (occasions) / A (participants) ), meaning that the occasions for one teacher are assumed to be completely independent of the occasions for any other teacher, and that participants in turn are never shared across occasions or teachers, but questions (M) are shared across all teachers and occasions and participants:
value ~ 1 + (1| P / B / A) + (1|M)
Some potential issues:
as you hint in the comments, it may not be practical to fit random effects for factors with small numbers of levels (say, < 5); this is likely to lead to the dreaded "singular model" message (see the GLMM FAQ for more detail).
if all of the questions (M) are answered by every participant, then in principle it's possible to fit a model that takes account of the among-question correlation within participants: the maximal model would be ~ 1 + (M | P / B / A) (which would look for among-question correlations at the level of teacher, occasion within teacher, and participant within occasion within teacher). However, this is very unlikely to work in practice (especially if each participant answers each question only once, in which case the teacher:occasion:participant:question variance will be confounded with the residual variance in a linear model). In this case, you will get an error about "probably unidentifiable": see e.g. this question for more explanation/detail.
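To see what the nested grouping P / B / A stands for, base R's own formula expansion can serve as an analogy. This is only a sketch: it expands the grouping expression as an ordinary formula with terms(), not through lme4 itself, on the assumption that the / operator follows the same a/b == a + a:b rule in both places:

```r
# Expand the nesting operator on the grouping factors alone.
nested_labels <- attr(terms(~ P / B / A), "term.labels")
print(nested_labels)

# Read each label as one grouping term: teacher, occasion within teacher,
# and participant within occasion within teacher.
n_levels_of_nesting <- length(nested_labels)
```

Under that assumption, (1 | P / B / A) corresponds to one random-intercept term per label.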
I have 3 random variables, x, y and z (all random effects).
x is nested in y, but y is crossed with z.
I use the following function in lme4, but it does not work.
<- lmer(A ~ 1 + (1 | x/y) + (1 | y*z) + (1|x/y*z), my data)
Can anyone help me? Many thanks
I'm afraid this is still very unclear. More context would be useful. My guess is that you want
A ~ 1 + (1|y)+ (1|z) + (1|y:z) + (1|y:x)
or equivalently
A ~ 1 + (1|y*z) + (1|y:x)
but it's almost impossible to know for sure.
the first two random effects terms give among-y and among-z variances
the third term gives the variance among combinations of y and z -- you will only want this if you have multiple observations for each {y,z} combination
the last term gives the effect of x nested within y.
The expression A ~ 1 + (1|y/x) + (1|z/y) should give you the same results, because a/b expands in general to a + a:b (order matters for / but not for :), but it's less clear.
Crossed random effects are generally denoted by (1|y) + (1|z), or by (1|y*z) (which expands to (1|y) + (1|z) + (1|y:z)) if as discussed above there are multiple observations per {y,z} combination.
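The claimed equivalences between these grouping structures can be checked with base R's formula expansion. A minimal sketch, again expanding the grouping expressions as ordinary formulas rather than through lme4, with a hypothetical helper norm_terms():

```r
# Hypothetical helper: sorted term labels, with "z:y" and "y:z" treated as equal.
norm_terms <- function(f) {
  labs <- attr(terms(f), "term.labels")
  sort(vapply(strsplit(labs, ":", fixed = TRUE),
              function(v) paste(sort(v), collapse = ":"), character(1)))
}

f_explicit <- ~ y + z + y:z + y:x   # every grouping term written out
f_star     <- ~ y * z + y:x         # crossed y and z, plus x within y
f_slash    <- ~ y / x + z / y       # the "less clear" nested spelling

star_matches  <- identical(norm_terms(f_explicit), norm_terms(f_star))
slash_matches <- identical(norm_terms(f_explicit), norm_terms(f_slash))

print(norm_terms(f_explicit))
print(c(star_matches, slash_matches))
```

All three spellings reduce to the same four grouping terms, which is why the explicit version is usually the easiest to read.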
I am a complete novice when it comes to survival analysis. I am working on a project that requires I use the coxph function in the "survival" package, but I am running into trouble because I do not understand what is required by the formula object.
Most descriptions I can find about the function are as follows:
"a formula object, with the response on the left of a ~ operator, and the terms on the right. The response must be a survival object as returned by the Surv function. "
I know what needs to be on the left of the operator, the issue is what the function expects from the right-hand side.
Here is a link of what my data looks like (The actual data set is much larger, I'm only displaying the first 20 data points for brevity):
Short explanation of data:
-Row 1 is the header
-Each row after that is a separate patient
-The first column is the age of the patient at the time of the study
-columns 2 through 14 (headed by x2-x13), and 19 (x18) and 20 (x19) are covariates such as race, relationship status, medical conditions that take on either true (1) or false (0) values.
-columns 15 (x14) through 18 (x17) are covariates such as tumor size, which take on whole number values greater than 0.
-The second to last column "sur" is the number of months survived, and "index" is whether or not that is a right-censored time (1 for true, 0 for false).
Given this data I need to plot a Cox Proportional hazard curve, but I end up with an incorrect plot because the right hand side of the formula object is wrong.
Here is my code, "temp4" is the name I gave to the data table:
library("survival")
temp4 <- read.table("~/data.txt", header=TRUE)
seerCox <- coxph(Surv(sur, index)~ temp4$x1 + temp4$x2 + temp4$x3 + temp4$x4 + temp4$x5 + temp4$x6 + temp4$x7 + temp4$x8 + temp4$x9 + temp4$x10 + temp4$x11 + temp4$x12 + temp4$x13 + temp4$x14 + temp4$x15 + temp4$x16 + temp4$x17 + temp4$x18 + temp4$x19, data=temp4, singular.ok=TRUE)
plot(survfit(seerCox), main= "Cox Estimate", mark.time=FALSE, ylab="Probability", xlab="Survival Time in Months", col=c("blue", "red", "green"))
I should also note that I have tried replacing the right-hand side that you're seeing with the number 1, with a period, or leaving it blank. These methods produce a Kaplan-Meier curve.
The following is the console output:
Each new line is an example of the error produced depending on how I filter the data. (ie if I only include patients with ages greater than 85, etc.)
If someone could explain how it works, it would be greatly appreciated.
PS- I have searched for over a week for a solution, and I am asking for help here as a last resort.
You should not be using the prefix temp4$ if you are also using a data argument. The whole purpose of supplying a data argument is to allow dropping those prefixes in the formula.
seerCox <- coxph( Surv(sur, index) ~ . , data=temp4, singular.ok=TRUE)
The above would use all of the x-variables in your temp4 data.frame. This will use just the first 3:
seerCox <- coxph( Surv(sur, index) ~ x1+x2+x3 , data=temp4)
Exactly what the warnings signify depends on the data (as you have in one sense already exemplified by producing different sorts of collinearity with different subsets.) If you have collinear columns, then you get singularities in the inversion of the model matrix and the software will attempt to drop aliased columns with a warning. This is really telling you that you do not have enough data to build the large models you are attempting. Exploring that possibility with table calls is often informative.
Bottom line: This is not a problem with your formula construction so much as a problem of not understanding the limitations of the chosen method with the dataset you have assembled. You need to be more careful about defining your goals. What is the highest priority in this research? Do you really need every variable? Is it possible to aggregate some of these anonymous variables into clinically meaningful categories such as diagnostic categories or comorbidities?
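As an illustration of the table-based exploration mentioned above, here is a minimal base-R sketch with invented data (the data frame toy and its columns are hypothetical stand-ins for the real covariates). It shows how a cross-tabulation and the rank of the model matrix reveal an aliased column:

```r
# Invented covariates: x3 is an exact copy of x2, so the two are collinear.
toy <- data.frame(
  x1 = rep(c(0, 1), times = 15),
  x2 = rep(c(0, 0, 1), times = 10)
)
toy$x3 <- toy$x2

# A table with empty off-diagonal cells means the two columns carry the same information.
print(table(toy$x2, toy$x3))

# A rank below the number of columns signals at least one redundant covariate.
X <- model.matrix(~ x1 + x2 + x3, data = toy)
rank_X <- qr(X)$rank
print(c(columns = ncol(X), rank = rank_X))
```

Running the same kind of check on each filtered subset of the real data should show which covariates become redundant once, say, only older patients are kept.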