Code contrasts for dummy variable with three groups

I have a data set and I am struggling to define the following contrasts.
In my data I want to predict the level of intention (intention score) from a set of predictors. I also want to determine whether the impact of a given predictor differs between countries A, B, and C. I know that I can use, e.g., country A as the reference category in a dummy coding. However, that only gives me the comparisons A vs. B and A vs. C, but not B vs. C. Is there any possibility to define contrast codings that allow me to make all three possible comparisons?
Thank you
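One way to obtain the missing B vs. C comparison is to refit the same model with a different reference category. The sketch below uses simulated data; the column names (`intention`, `x`, `country`), the slopes, and the sample size are all assumptions made for illustration, not part of the original data.

```r
# Simulated stand-in for the real data (all names and values are hypothetical)
set.seed(1)
n <- 150
d <- data.frame(
  country = factor(rep(c("A", "B", "C"), each = n / 3)),
  x       = rnorm(n)
)
slope <- c(A = 0.2, B = 0.5, C = 0.9)
d$intention <- unname(1 + slope[as.character(d$country)] * d$x + rnorm(n, sd = 0.5))

# Reference = A: the x:country terms are the slope differences B - A and C - A
fit_A <- lm(intention ~ x * country, data = d)

# Reference = B: the x:country_BC term is now the slope difference C - B
d$country_B <- relevel(d$country, ref = "B")
fit_B <- lm(intention ~ x * country_B, data = d)

cf_A <- coef(fit_A)
cf_B <- coef(fit_B)
c_minus_b_direct  <- unname(cf_B["x:country_BC"])
c_minus_b_derived <- unname(cf_A["x:countryC"] - cf_A["x:countryB"])

# Both routes describe the same fitted model, so the two numbers agree
all.equal(c_minus_b_direct, c_minus_b_derived)
```

`summary(fit_B)` then reports the standard error and test for the B vs. C difference. Since this yields three pairwise tests in total, it may be worth adjusting the p-values for multiple comparisons, for example with `p.adjust()`.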

Related

How do I build a multiple Regression with four predictors that include interactions (moderators) and in which correlations shall be calculated?

I understand that * means that we check for the association between two predictor variables and + means that we add another predictor variable to the model we already have. But how do I write a function (lm) for the following question: Variable A (dependent variable) shall be impacted by variable B (predictor variable) and variable C. How does C moderate B, how do both B and C directly impact A, how does the association between B and C impact A, and how is all this moderated by the variables D and E?
Variables A through D are all continuous variables describing personal characteristics; variable E is gender (male / female).
A part of the formula should look like this:
M1 <- lm(A ~ B_centered * C_centered ..... * gender, data = data)
The middle of the equation is complicated, because it remains unclear to me when to use * and +, how I need to connect the single calculations with each other, and whether * is used for both the correlations and the moderation.
Sorry if the question sounds strange, I am new to R! Thank you for reading and trying to help.
I tried to find a way to connect the single terms of the formula to each other, but got confused. I listed the single interactions and calculations that need to be performed in the model one by one and searched Google, YouTube, etc. I put gender at the end as the last independent variable in the formula, variable A as the dependent variable, and the two single independent variables B and C in. What is missing are the interactions and moderations between the independent variables and variable D.
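To see what the formula operators do, it can help to let R expand a formula into its individual terms. In an R formula, `+` adds a term and `*` adds the main effects plus their interaction (the moderation term); neither operator computes a correlation, which would be done separately with `cor()`. The sketch below is only one possible reading of the question; `D_centered` is a hypothetical name, and whether D and gender should moderate everything is an assumption.

```r
# Helper: list the terms a formula expands to (no data needed)
expand_terms <- function(f) attr(terms(f), "term.labels")

expand_terms(A ~ B + C)   # main effects only
expand_terms(A ~ B * C)   # main effects plus the B:C interaction

# Hypothetical full model: B, C and their interaction, each moderated
# by D and by gender (D_centered is an assumed variable name)
f_moderated <- A ~ (B_centered * C_centered) * (D_centered + gender)
moderated_terms <- expand_terms(f_moderated)
moderated_terms
```

The expanded list can be checked against the effects you actually want to test before passing the formula to `lm()`; unwanted terms can be dropped by writing the formula out with `+` and `:` instead.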

Trying to run regression analysis with multiple factors

If I run glm(company rank ~ sales + region, ...) I get a rough set of company rank to sales. Company rank is a factor from A:D, so even when I try something like multinom(), I get factors B:D but not A. I understand A acts as the intercept but I'd like to get the individual values to every single sale in a region.
What would be the best way to go about getting every comparable value from A:D?
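If the goal is a value for every rank including A, one option is to look at predicted probabilities rather than coefficients. A minimal sketch, assuming the `nnet` package (a recommended package that normally ships with R) and simulated data; `company_rank`, `sales`, `region`, and all numbers are made up for illustration.

```r
library(nnet)  # provides multinom()

# Simulated stand-in for the real data (hypothetical names and values)
set.seed(2)
n <- 400
d <- data.frame(
  sales  = round(runif(n, 1, 10), 1),
  region = factor(sample(c("north", "south"), n, replace = TRUE))
)
score <- (d$sales - 5.5) / 2.5 + rnorm(n)
d$company_rank <- cut(score, breaks = c(-Inf, -0.7, 0, 0.7, Inf),
                      labels = c("D", "C", "B", "A"))
d$company_rank <- factor(d$company_rank, levels = c("A", "B", "C", "D"))

fit <- multinom(company_rank ~ sales + region, data = d, trace = FALSE)

# Coefficients exist only for the non-reference ranks: each row is a
# log-odds contrast against the reference level A
coef_ranks <- rownames(coef(fit))

# Predicted probabilities cover every rank, including A
probs <- predict(fit, newdata = d, type = "probs")
head(round(probs, 3))
```

Rank A has no coefficient row because it is the baseline that the other rows are measured against; changing the baseline with `relevel()` changes which rank is omitted, not the fitted probabilities.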

Model with Matched pairs and repeated measures

I will delete this if it is too loosely related to programming, but my search has turned up NULL, so I'm hoping someone can help.
I have a case/control matched-pairs design with repeated measurements and am looking for a model/function/package in R.
I have 2 measures at time=1 and 2 measures at time=2. I have Case/Control status as Group (2 levels) and the matched-pairs id as match_id, and I want to estimate the effect of Group, time, and their interaction on speed, a continuous variable.
I wanted to do something like this:
(reg_id is the actual participant ID)
speed_model <- geese(speed ~ time*Group, id = c(reg_id,match_id),
data=dataforGEE, corstr="exchangeable", family=gaussian)
Where I want to model the autocorrelation within a person via reg_id, but also within the matched pairs via match_id
But I get:
Error in model.frame.default(formula = speed ~ time * Group, data = dataFullGEE, :
variable lengths differ (found for '(id)')
Can geese or GEE in general not handle clustering around 2 sets of id? Is there a way to even do this? I'm sure there is.
Thank you for any help you can provide.
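The error itself can be reproduced without any GEE package: `c(reg_id, match_id)` concatenates the two columns into one vector that is twice as long as the data, so it no longer lines up row by row with `speed`. A small sketch with made-up values:

```r
# Hypothetical toy columns, one entry per row of the data
reg_id   <- c(1, 1, 2, 2, 3, 3, 4, 4)   # participant IDs
match_id <- c(1, 1, 1, 1, 2, 2, 2, 2)   # matched-pair IDs
speed    <- c(5.1, 5.4, 4.8, 5.0, 6.2, 6.0, 5.9, 6.3)

n_rows  <- length(speed)
n_idvec <- length(c(reg_id, match_id))  # concatenation, not a two-level id

c(rows = n_rows, id_length = n_idvec)
```

The `id` argument expects a single vector with one cluster label per row, which is why the model frame reports differing variable lengths.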
This is definitely a better question for Cross Validated, but since you have exactly 2 observations per subject, I would consider the ANCOVA model:
geese(speed_at_time_2 ~ speed_at_time_1*Group, id = c(match_id),
data=dataforGEE, corstr="exchangeable", family=gaussian)
Regarding the use of ANCOVA, you might find this reference useful.
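The ANCOVA formulation needs the data in wide form, with one row per participant. A minimal base-R sketch of that reshaping step, assuming one speed value per participant and time point; the simulated values and the way `reg_id` is built are assumptions, while the column names follow the question and the answer.

```r
# Simulated long-format data (hypothetical values)
set.seed(3)
n_pairs <- 20
long <- expand.grid(time = 1:2,
                    Group = c("case", "control"),
                    match_id = seq_len(n_pairs))
long$reg_id <- paste(long$match_id, long$Group, sep = "_")
long$speed  <- rnorm(nrow(long), mean = 10)

# Split by time point and rename the outcome column
t1 <- long[long$time == 1, c("reg_id", "match_id", "Group", "speed")]
t2 <- long[long$time == 2, c("reg_id", "speed")]
names(t1)[names(t1) == "speed"] <- "speed_at_time_1"
names(t2)[names(t2) == "speed"] <- "speed_at_time_2"

# One row per participant, with both measurements side by side
wide <- merge(t1, t2, by = "reg_id")

# Keep rows of the same matched pair next to each other
wide <- wide[order(wide$match_id), ]
head(wide)
```

The resulting `wide` data frame has the `speed_at_time_1`, `speed_at_time_2`, `Group`, and `match_id` columns that the formula above refers to. Sorting by `match_id` matters for GEE functions that assume the rows of each cluster are contiguous.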

Partially nested/blocked experimental design in R

The design of the experiment involves 10 participants. All of them go through conditions A, B, C, D for treatment; however, participants 1-5 also go through conditions E, F and participants 6-10 go through conditions G, H.
I'm using the nlme package with the lme function to deal with missing data and prevent list-wise deletion of participants (measured variable = DV, fixed effect = condition, random effect = participant). When everything is just crossed this is what I have:
lme(DV~cond, random =~1|ppt, data = OutcomeData, method = "ML", na.action = na.exclude)
What is the statistical setup when the first part (conditions A, B, C, D) is crossed whereas the second part (E, F and G, H) is nested? Any help or guidance would be greatly appreciated! Thanks.
I think your design can be considered a planned "missing" design, where a portion of subjects are not exposed to certain conditions in a planned way (see Enders, 2010). If these values are "missing completely at random" you can treat your data as obtained from a one-way repeated-measures design with missing values in conditions E-H.
I suggest you include a variable "block" that distinguishes subjects going through conditions A-D plus E and F from the other subjects. Then you can specify your model as
summary(m1 <- lme(DV ~ cond, random=~1|block/ppt, data=OutcomeData, method = "REML"))
If you randomize the subjects into 2 blocks properly, there should not be significant variability associated with the blocks. You can test this by fitting another model without the block random effect and compare the 2 models like this:
summary(m0 <- lme(DV ~ cond, random=~1|ppt, data=OutcomeData, method = "REML"))
anova(m0, m1)
Both models use method = "REML" because we are comparing nested models that differ only in their random effects. To estimate the fixed effects, you can refit the better-fitting model (hopefully m0) with method = "ML".
If you have not collected data yet, I strongly encourage you to randomly assign the subjects to the 2 blocks. Assigning subjects 1-5 to block 1 (i.e., going through conditions E and F) and subjects 6-10 to the other block can introduce confounding variables (e.g., time, technicians getting used to the procedure).
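The block variable suggested above could be set up as in the following sketch, which only builds the design layout (no outcome values). The column names `ppt`, `cond`, and `block` follow the thread; the block labels "EF" and "GH" are made up, and the 1-5 / 6-10 split mirrors the question rather than the random assignment recommended above.

```r
# Layout of the partially nested design: every participant sees A-D,
# plus E/F (participants 1-5) or G/H (participants 6-10)
common <- c("A", "B", "C", "D")
design <- do.call(rbind, lapply(1:10, function(p) {
  extra <- if (p <= 5) c("E", "F") else c("G", "H")
  data.frame(ppt = p, cond = c(common, extra))
}))

# Block membership: which pair of extra conditions a participant received
design$block <- factor(ifelse(design$ppt <= 5, "EF", "GH"))
design$ppt   <- factor(design$ppt)
design$cond  <- factor(design$cond)

# Cross-tabulation shows the planned-missing cells as zeros
obs <- table(design$ppt, design$cond)
obs
```

With random assignment, the `ifelse()` line would be replaced by a randomly drawn split of the ten participants into two groups of five.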

Multiple comparisons using glht with repeated measures ANOVA

I'm using the following code to try to get at post-hoc comparisons for my cell means:
result.lme3<-lme(Response~Pressure*Treatment*Gender*Group, mydata, ~1|Subject/Pressure/Treatment)
aov.result<-aov(result.lme3, mydata)
TukeyHSD(aov.result, "Pressure:Treatment:Gender:Group")
This gives me a result, but most of the adjusted p-values are incredibly small - so I'm not convinced the result is correct.
Alternatively I'm trying this:
summary(glht(result.lme3, linfct = mcp(???? = "Tukey")))
I don't know how to get the Pressure:Treatment:Gender:Group in the glht code.
Help is appreciated - even if it is just a link to a question I didn't find previously.
I have 504 observations, Pressure has 4 levels and is repeated in each subject, Treatment has 2 levels and is repeated in each subject, Group has 3 levels, and Gender is obvious.
Thanks
I solved a similar problem by creating an interaction dummy variable with the interaction() function, which contains all combinations of the levels of your 4 variables.
I ran many tests: the estimates shown for the various levels of this variable give the joint effect of the active levels plus the interaction effect.
For example if:
temperature ~ interaction(infection(y/n), acetaminophen(y/n))
(I put the possible levels in the parentheses for clarity) the interaction variable will have a level like "infection.y:acetaminophen.y", which shows the combined effect on temperature of infection, acetaminophen, and the interaction of the two, in comparison with the intercept (where both variables are n).
Instead if the model was:
temperature ~ infection(y/n) * acetaminophen(y/n)
to get the same coefficient for the case when both variables are y, you would have to add the two simple effects plus the interaction effect. The result is the same, but I prefer using interaction() since it is cleaner and more elegant.
Then in glht you use:
summary(glht(model, linfct = mcp(interaction_var = 'Tukey')))
to obtain your post-hoc comparisons, where interaction_var <- interaction(infection, acetaminophen).
NOTE: I have never tested this approach with nested and mixed models, so beware!
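The equivalence described above can be checked on simulated data with base R only. This is a sketch for a plain fixed-effects model, not for the nested `lme` fit in the question; the effect sizes are made up, and `TukeyHSD()` is used here as a base-R stand-in for the `glht()` call, which requires the multcomp package.

```r
# Balanced simulated data (all effect sizes are hypothetical)
set.seed(4)
d <- expand.grid(infection = c("n", "y"),
                 acetaminophen = c("n", "y"),
                 rep = 1:10)
both <- d$infection == "y" & d$acetaminophen == "y"
d$temperature <- 37 + 1.5 * (d$infection == "y") -
  0.2 * (d$acetaminophen == "y") - 0.8 * both +
  rnorm(nrow(d), sd = 0.3)

# Single factor holding every combination; reference level is "n.n"
d$cell <- interaction(d$infection, d$acetaminophen)

m_cell <- lm(temperature ~ cell, data = d)
m_fact <- lm(temperature ~ infection * acetaminophen, data = d)

# Coefficient of the y.y cell vs. sum of both simple effects + interaction
cell_yy <- unname(coef(m_cell)["celly.y"])
fact_yy <- sum(coef(m_fact)[c("infectiony", "acetaminopheny",
                              "infectiony:acetaminopheny")])
all.equal(cell_yy, fact_yy)

# All pairwise comparisons between the four cells
tukey <- TukeyHSD(aov(temperature ~ cell, data = d))
tukey$cell
```

Note that `interaction()` names its levels with a dot separator by default (for example `y.y`), so the coefficient is labelled `celly.y` here.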
