Repeated measures ANOVA with ezANOVA and the wid argument - R

So I'm trying to run a repeated measures ANOVA. I basically have two IVs with scores from 1-7 and a DV. Every participant has a score on both IVs and the DV, of course, which makes them within-subjects. Even after consulting ?ezANOVA I can't figure out what my "wid" is.
My code looks like this:
ezANOVA(data = df,
        wid = ,
        dv = df$scoredv,
        between = .(df$scoreIV1, df$scoreIV2))
or this:
model<- lme(df$dv ~ df$iv1 + df$iv2, random = ~1|subject/df$iv1 + df$iv2, data=df)
In this case I don't know what belongs in the subject function.
As far as I understand, wid (or subject) is the participant's ID number, but I don't have a special column for that. Do I have to create one?

According to ez's docs, wid should be the participant's unique ID, in this case.
It's good practice to always have one (especially for data-wrangling operations or individual-level analyses). You can probably create one quickly, just for this purpose, with
df$id <- 1:nrow(df)
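A minimal sketch of how the pieces fit together, assuming long-format data (all column names here are hypothetical). Note that ezANOVA expects one row per participant-by-condition cell, so if a participant occupies several rows the ID must repeat across those rows:

```r
# toy long-format data: 2 participants x 2 levels of a within-subjects IV
df <- data.frame(id      = factor(rep(1:2, each = 2)),  # one ID per participant, repeated per row
                 iv1     = factor(rep(c("low", "high"), times = 2)),
                 scoredv = c(4.1, 5.0, 3.2, 6.2))

# the call would then be (requires the ez package):
# ezANOVA(data = df, dv = scoredv, wid = id, within = .(iv1))
```

Since the IVs vary within subjects, they go in `within = .()`, not `between = .()`.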

Related

mixed models: R vs SPSS difference

I want to do a mixed model analysis on my data. I used both R and SPSS to verify whether my R results were correct, but the results differ enormously for one variable. I can't figure out why there is such a large difference myself; your help would be appreciated! I have already run various checks on the dataset.
DV: score on questionnaire (QUES)
IV: time (after intervention, 3 month follow-up, 9 month follow-up)
IV: group (two different interventions)
IV: score on questionnaire before the intervention (QUES_pre)
random intercept for participants
SPSS code:
MIXED QUES BY TIME GROUP WITH QUES_pre
/CRITERIA=CIN(95) MXITER(100) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0,
ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
/FIXED=TIME GROUP QUES_pre TIME*GROUP | SSTYPE(3)
/METHOD=REML
/PRINT=SOLUTION TESTCOV
/RANDOM=INTERCEPT | SUBJECT(ID) COVTYPE(AR1)
/REPEATED=Index1 | SUBJECT(ID) COVTYPE(AR1).
R code:
model1 <- lme(QUES ~ group + time + time:group + QUES_pre,
              random = ~1|ID,
              correlation = corAR1(0, form = ~1|Onderzoeksnummer),
              data = data, na.action = na.omit, method = "REML")
The biggest difference lies in the effect of group. For the SPSS code, the p-value is .045; for the R code, the p-value is .28. Is there a mistake in my code, or does anyone have a suggestion about what else might be going wrong?
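One difference worth checking (an observation from the two code listings, not a confirmed fix): the SPSS syntax attaches the AR(1) structure to SUBJECT(ID), while the R call groups the correlation by a different variable (Onderzoeksnummer). Below is a sketch of an lme call with both the random intercept and the AR(1) correlation grouped by the same subject ID, run on simulated stand-in data (all values and sample sizes are made up):

```r
library(nlme)  # ships with standard R installations

set.seed(42)
# simulated stand-in data: 30 subjects x 3 time points
d <- data.frame(ID       = factor(rep(1:30, each = 3)),
                time     = factor(rep(c("post", "fu3", "fu9"), times = 30)),
                group    = factor(rep(rep(c("A", "B"), each = 3), times = 15)),
                QUES_pre = rep(rnorm(30, 50, 10), each = 3))
d$QUES <- d$QUES_pre + rep(rnorm(30, 0, 3), each = 3) + rnorm(90, 0, 5)

m <- lme(QUES ~ time * group + QUES_pre,
         random = ~ 1 | ID,
         correlation = corAR1(form = ~ 1 | ID),  # AR(1) per subject, like SUBJECT(ID)
         data = d, method = "REML")
```

Note also that the SPSS /RANDOM line specifies COVTYPE(AR1) for a single intercept, which has no direct counterpart in the R call and may be worth examining.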

lm(): saving residuals with group_by in R - confused SPSS user

This is a complete re-edit of my original question.
Let's assume I'm working on RT data gathered in a repeated-measures experiment. As part of my usual routine I always transform RT to natural logarithms and then compute a Z score for each RT within each participant, adjusting for trial number. This is typically done with a simple regression in SPSS syntax:
split file by subject.
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT rtLN
/METHOD=ENTER trial
/SAVE ZRESID.
split file off.
To reproduce the same procedure in R, generate the data:
#load libraries
library(dplyr); library(magrittr)
#generate data
ob<-c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3)
ob<-factor(ob)
trial<-c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6)
rt<-c(300,305,290,315,320,320,350,355,330,365,370,370,560,565,570,575,560,570)
cond<-c("first","first","first","snd","snd","snd","first","first","first","snd","snd","snd","first","first","first","snd","snd","snd")
#Following variable is what I would get after using SPSS code
ZreSPSS<-c(0.4207,0.44871,-1.7779,0.47787,0.47958,-0.04897,0.45954,0.45487,-1.7962,0.43034,0.41075,0.0407,-0.6037,0.0113,0.61928,1.22038,-1.32533,0.07806)
sym<-data.frame(ob, trial, rt, cond, ZreSPSS)
I could apply a formula (a blend of Mark's and Daniel's solutions) to compute residuals from an lm(log(rt) ~ trial) regression, but for some reason group_by is not working here:
sym %<>%
  group_by(ob) %>%
  mutate(z = residuals(lm(log(rt) ~ trial)),
         obM = mean(rt), obSd = sd(rt), zRev = z * obSd + obM)
Resulting values clearly show that grouping hasn't kicked in.
Any idea why it didn't work out?
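Whatever is tripping up group_by() here, the same per-participant regression can be run in base R with by(), which splits the data frame by ob, fits the model in each piece, and hands back the residuals. A self-contained sketch (rebuilding a minimal version of the sym data frame from above):

```r
ob    <- factor(rep(1:3, each = 6))
trial <- rep(1:6, times = 3)
rt    <- c(300, 305, 290, 315, 320, 320,
           350, 355, 330, 365, 370, 370,
           560, 565, 570, 575, 560, 570)
sym <- data.frame(ob, trial, rt)

# fit log(rt) ~ trial within each participant; rows are already sorted by ob,
# so unlist() reassembles the residuals in the original row order
sym$z <- unlist(by(sym, sym$ob, function(d) residuals(lm(log(rt) ~ trial, data = d))))
```

Note these are raw residuals; SPSS's /SAVE ZRESID standardizes them, so a further division of each group's residuals by that fit's residual standard error should be needed for an exact match.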
Using dplyr and magrittr, you should be able to calculate z-scores within individual with this code (it breaks things into the groups you tell it to, then calculates within that group).
experiment %<>%
group_by(subject) %>%
mutate(rtLN = log(rt)
, ZRE1 = scale(rtLN))
You should then be able to use that in your model. However, one thing that may help your shift to R thinking is that you can likely build your model directly, instead of having to create all of these columns ahead of time. For example, using lme4 to treat subject as a random effect:
withRandVar <-
lmer(log(rt) ~ cond + (1|as.factor(subject))
, data = experiment)
Then, the residuals should already be on the correct scale. Further, if you use the z-scores, you probably should be plotting on that scale. I am not actually sure what running with the z-scores as the response gains you -- it seems like you would lose information about the degree of difference between the groups.
That is, if the groups are tight but the difference between them varies by subject, a z-score may always show them as a similar number of standard deviations apart. Imagine, for example, that you have two subjects: one scores (1,1,1) on condition A and (3,3,3) on condition B, and a second scores (1,1,1) and (5,5,5). Both will give z-scores of about (-.9,-.9,-.9) vs (.9,.9,.9), losing the information that the difference between A and B is larger for subject 2.
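The arithmetic in that example is easy to check with base R's scale():

```r
# subject 1: conditions differ by 2; subject 2: conditions differ by 4
z1 <- as.numeric(scale(c(1, 1, 1, 3, 3, 3)))
z2 <- as.numeric(scale(c(1, 1, 1, 5, 5, 5)))
# both come out as roughly (-0.91, -0.91, -0.91, 0.91, 0.91, 0.91),
# so the twice-as-large condition difference for subject 2 is invisible
```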
If, however, you really want to convert back, you can probably use this to store the subject means and sds, then multiply the residuals by subjSD and add subjMean.
experiment %<>%
group_by(subject) %>%
mutate(rtLN = log(rt)
, ZRE1 = scale(rtLN)
, subjMean = mean(rtLN)
, subjSD = sd(rtLN))
mylm <- lm(y ~ x)  # y and x stand in for your response and predictor
rstandard(mylm)
This returns the standardized residuals of the fitted model. To bind these to a variable you can do:
zresid <- rstandard(mylm)
EXAMPLE:
a <- rnorm(10, mean = 10)
b <- rnorm(10, mean = 10)
mylm <- lm(a ~ b)
mylm.zresid <- rstandard(mylm)
See also:
summary(mylm)
and
mylm$coefficients
mylm$fitted.values
mylm$xlevels
mylm$residuals
mylm$assign
mylm$call
mylm$effects
mylm$qr
mylm$terms
mylm$rank
mylm$df.residual
mylm$model

Model with matched pairs and repeated measures

I will delete if this is only loosely programming-related, but my search has turned up NULL, so I'm hoping someone can help.
I have a case/control matched-pairs design with repeated measurements, and I'm looking for a model/function/package in R.
I have 2 measures at time=1 and 2 measures at time=2. I have case/control status as Group (2 levels) and matched-pair ID as match_id, and I want to estimate the effect of Group, time, and their interaction on speed, a continuous variable.
I wanted to do something like this:
(reg_id is the actual participant ID)
speed_model <- geese(speed ~ time*Group, id = c(reg_id, match_id),
                     data = dataforGEE, corstr = "exchangeable", family = gaussian)
Where I want to model the autocorrelation within a person via reg_id, but also within the matched pairs via match_id
But I get:
Error in model.frame.default(formula = speed ~ time * Group, data = dataFullGEE, :
variable lengths differ (found for '(id)')
Can geese, or GEE in general, not handle clustering on 2 sets of IDs? Is there a way to do this at all? I'm sure there is.
Thank you for any help you can provide.
This is definitely a better question for Cross Validated, but since you have exactly 2 observations per subject, I would consider the ANCOVA model:
geese(speed_at_time_2 ~ speed_at_time_1*Group, id = c(match_id),
      data = dataforGEE, corstr = "exchangeable", family = gaussian)
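The ANCOVA form needs one row per subject, with the time-1 and time-2 speeds in separate columns. A sketch of that reshape in base R (toy data; the column names follow the question but the values are made up):

```r
# toy long-format data: 4 subjects forming 2 case/control matched pairs
dataforGEE <- data.frame(reg_id   = rep(1:4, each = 2),
                         match_id = rep(1:2, each = 4),
                         Group    = rep(c("case", "control"), each = 2, times = 2),
                         time     = rep(1:2, times = 4),
                         speed    = c(10, 12, 11, 13, 9, 10, 8, 11))

wide <- reshape(dataforGEE, idvar = c("reg_id", "match_id", "Group"),
                timevar = "time", direction = "wide")
# wide now has speed.1 and speed.2 columns; the geese() call would then model
# speed.2 ~ speed.1 * Group with id = match_id
```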
Regarding the use of ANCOVA, you might find this reference useful.

Partially nested/blocked experimental design in R

The design of the experiment involves 10 participants. All of them go through conditions A, B, C, D for treatment; however, participants 1-5 go through conditions E, F and participants 6-10 go through conditions G, H.
I'm using the nlme package with the lme function to deal with missing data and prevent listwise deletion of participants. The measured variable is DV, the fixed effect is condition, and the random effect is participant. When everything is fully crossed, this is what I have:
lme(DV~cond, random =~1|ppt, data = OutcomeData, method = "ML", na.action = na.exclude)
What is the statistical setup when the first part (conditions A, B, C, D) is crossed while the second part (E, F and G, H) is nested? Any help or guidance would be greatly appreciated! Thanks.
I think your design can be considered a planned "missing" design, where a portion of subjects are not exposed to certain conditions in a planned way (see Enders, 2010). If these values are "missing completely at random" you can treat your data as obtained from a one-way repeated-measures design with missing values in conditions E-H.
I suggest you include a variable "block" that distinguishes subjects going through conditions A-D plus E and F from the other subjects. Then you can specify your model as
summary(m1 <- lme(DV ~ cond, random=~1|block/ppt, data=OutcomeData, method = "REML"))
If you randomize the subjects into the 2 blocks properly, there should not be significant variability associated with the blocks. You can test this by fitting another model without the block random effect and comparing the 2 models:
summary(m0 <- lme(DV ~ cond, random=~1|ppt, data=OutcomeData, method = "REML"))
anova(m0, m1)
We use method = "REML" because we are comparing nested models that differ in their random effects. To estimate the fixed effects, refit the better-fitting model (hopefully m0) with method = "ML".
If you have not collected data yet, I strongly encourage you to randomly assign the subjects to the 2 blocks. Assigning subjects 1-5 to block 1 (i.e., going through conditions E and F) and subjects 6-10 to the other block can introduce confounding variables (e.g., time, technicians getting used to the procedure).
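A minimal sketch of such a random assignment in base R (the participant IDs are placeholders):

```r
set.seed(123)  # make the assignment reproducible

ppt   <- 1:10
block <- sample(rep(1:2, each = 5))  # shuffle 5 slots per block across subjects
assignment <- data.frame(ppt, block)
# each subject's block membership is now random rather than tied to their ID number
```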

"Simulating" a large number of regressions with different predictor values

Let's say I have the following data and I'm interested in examining some counterfactuals. In particular, I want to examine whether there would be changes in predicted income given a change in one of the predictors. The best way I can think to do this is to write a loop that runs this regression n times. However, how do I also make adjustments to the data frame while running through the loop? I'm really hoping there is a base R function, or something in a package, that someone can point me to.
df = data.frame(year   = c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010),
                income = c(100, 50, 70, 80, 50, 40, 60, 100, 90, 80),
                age    = c(26, 30, 35, 30, 28, 29, 31, 34, 20, 35),
                gpa    = c(2.8, 3.5, 3.9, 4.0, 2.1, 2.65, 2.9, 3.2, 3.3, 3.1))
df
mod = lm(income ~ age + gpa, data=df)
summary(mod)
Here are some counterfactuals that may be worth considering when looking at the relationship between age, gpa, and income.
# What if everyone in the class had a lower/higher gpa?
df$gpa2 = df$gpa + 0.55
# what if one person had a lower/higher gpa?
df$gpa2[3] = 1.6
# what if the most recent employee/person had a lower/higher gpa?
df[10,4] = 4.0
With or without looping, what would be the best way to "simulate" a large number (1000+) of regression models in order to examine various counterfactuals, and then save those results in some data structure? Is there a "counterfactual analysis" package that could save me a bit of work?
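In the absence of a dedicated package, a plain lapply() loop over modified copies of the data frame gets this done. The sketch below (rebuilding df so it is self-contained, and using a random gpa perturbation purely as an illustration) refits the model 1,000 times and stacks the coefficients:

```r
df <- data.frame(year   = c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010),
                 income = c(100, 50, 70, 80, 50, 40, 60, 100, 90, 80),
                 age    = c(26, 30, 35, 30, 28, 29, 31, 34, 20, 35),
                 gpa    = c(2.8, 3.5, 3.9, 4.0, 2.1, 2.65, 2.9, 3.2, 3.3, 3.1))

set.seed(1)
runs <- lapply(1:1000, function(i) {
  df2 <- df
  df2$gpa <- df2$gpa + rnorm(nrow(df2), sd = 0.25)  # illustrative counterfactual tweak
  coef(lm(income ~ age + gpa, data = df2))
})
coefs <- do.call(rbind, runs)  # 1000 x 3 matrix: (Intercept), age, gpa
```

If the goal is predicted income under one specific counterfactual rather than refitted coefficients, predict(mod, newdata = df2) on the original model fit avoids refitting entirely.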
