Model with Matched pairs and repeated measures - r

I will delete if this is too loosely programming but my search has turned up NULL so I'm hoping someone can help.
I have a case/control matched-pairs design with repeated measurements, and I am looking for a model/function/package in R to analyse it.
I have 2 measures at time=1 and 2 measures at time=2. I have Case/Control status as Group (2 levels) and the matched-pair ID as match_id, and I want to estimate the effect of Group, time, and their interaction on speed, a continuous variable.
I wanted to do something like this:
(reg_id is the actual participant ID)
speed_model <- geese(speed ~ time*Group, id = c(reg_id,match_id),
data=dataforGEE, corstr="exchangeable", family=gaussian)
Where I want to model the autocorrelation within a person via reg_id, but also within the matched pairs via match_id
But I get:
Error in model.frame.default(formula = speed ~ time * Group, data = dataFullGEE, :
variable lengths differ (found for '(id)')
Can geese or GEE in general not handle clustering around 2 sets of id? Is there a way to even do this? I'm sure there is.
Thank you for any help you can provide.

This is definitely a better question for Cross Validated, but since you have exactly 2 observations per subject, I would consider the ANCOVA model:
geese(speed_at_time_2 ~ speed_at_time_1*Group, id = match_id,
data=dataforGEE, corstr="exchangeable", family=gaussian)
Regarding the use of ANCOVA, you might find this reference useful.
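As a side note on the error itself: c(reg_id, match_id) concatenates the two columns into a single vector twice as long as the data, which is why model.frame complains that variable lengths differ. Below is a minimal sketch of the ANCOVA route, assuming one speed value per participant per time point. The simulated data and the construction of reg_id are made up for illustration, and the geese call only runs if geepack is installed.

```r
set.seed(1)
n_pairs <- 20

# Simulated long data: one row per participant per time point (illustrative only)
long <- expand.grid(time = 1:2,
                    Group = c("Case", "Control"),
                    match_id = seq_len(n_pairs))
long$reg_id <- paste(long$match_id, long$Group, sep = "_")
long$speed <- rnorm(nrow(long), mean = 10)

# c() concatenates, so this "id" has 2 * nrow(long) elements
id_len <- length(c(long$reg_id, long$match_id))

# One row per participant, with the two time points side by side
wide <- reshape(long, direction = "wide",
                idvar = c("reg_id", "match_id", "Group"),
                timevar = "time", v.names = "speed",
                sep = "_at_time_")

# ANCOVA-style GEE clustered on the matched pair (needs geepack)
if (requireNamespace("geepack", quietly = TRUE)) {
  fit <- geepack::geese(speed_at_time_2 ~ speed_at_time_1 * Group,
                        id = match_id, data = wide,
                        corstr = "exchangeable", family = gaussian)
  print(summary(fit))
}
```

After reshaping, each participant is a single row, so the only clustering left to model is the matched pair, and a single id is enough.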

Related

Metric for evaluating agreement at inter-rater reliability for a single subject by multiple raters

I'm making a rating survey in R (Shiny) and I'm trying to find a metric that can evaluate the agreement for only one of the "questions" in the survey. The ratings range from 1 to 5. There are multiple raters, and each rater rates a set of 10 questions on that scale.
I've used Fleiss Kappa and Krippendorff Alpha for the whole set of questions and raters and it works but when evaluating each question separately these metrics give negative value. I tried calculating them by hand (formulas) and I still get the same results so I guess that they don't work for a small sample of subjects (in this case a sample of 1).
I've looked at other metrics like rwg in the multilevel package but thus far I can't seem to make it work. According to r documentation:
rwg(x, grpid, ranvar=2)
Where:
x = A vector representing the construct on which to estimate agreement.
grpid = A vector identifying the groups from which x originated.
Can someone explain to me what the rwg function expects from me?
If someone knows of another agreement metric that might work better, please let me know.
Thanks.
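For what it's worth, the single-item r_wg of James, Demaree and Wolf can be computed by hand, which may make it clearer what rwg expects: x is the vector of ratings and grpid labels the group each rating belongs to (for one question, all raters form one group). The default ranvar = 2 is the variance of a uniform distribution over a 5-point scale, (5^2 - 1)/12. A rough base R sketch with made-up ratings, not the multilevel package's own code:

```r
# Hypothetical ratings of ONE question by 8 raters on a 1-5 scale
ratings <- c(4, 4, 5, 3, 4, 4, 5, 4)

# Expected variance if raters answered uniformly at random over A = 5 categories
A <- 5
ranvar <- (A^2 - 1) / 12

# Single-item r_wg: 1 minus observed variance over the random-response variance
rwg_by_hand <- 1 - var(ratings) / ranvar
rwg_by_hand

# Assumed equivalent call if the multilevel package is installed:
# the same group id is repeated for every rater
if (requireNamespace("multilevel", quietly = TRUE)) {
  print(multilevel::rwg(ratings, grpid = rep(1, length(ratings)), ranvar = ranvar))
}
```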

Panel Data including Subgroups or Pooled OLS

I am analysing a dataset that is separated by country, and also by age group and gender cohort; 5 annual periods are included. An intervention took place in between the years.
As the data are sparse, I want to compare the effects for each subgroup, so that I may reach statistical significance after all (e.g. females between 10 and 20 years for both countries; I will call this variable ID, and each ID value exists only once per country).
I have tried panel analysis with the plm package, indexing by country, year, and ID, but this does not work because the index is not unique.
Is it even possible to include country effects but have subgroups within the country? (see code below)
I have also tried difference-in-differences, using lmList and saving the coefficients, fitted separately for each subgroup defined by ID. (see code below)
This has worked, but because of the limited number of periods no statistical significance is reached, even though the coefficients all point in the same direction. So I wonder whether it is possible to combine those models again and thereby reach reliable results?
1. fixed <- plm(FE ~ x , data=df, index=c("ID","country", "year"), model="within")
2. list <- coef(lmList(y~ treated + time + did | ID, data=df))
Error from 1.
duplicate couples (id-time)
In addition: Warning messages:
1: In pdata.frame(data, index) :
duplicate couples (id-time) in resulting pdata.frame
to find out which, use e.g. table(index(your_pdataframe), useNA = "ifany")
2: In is.pbalanced.default(index[[1]], index[[2]]) :
duplicate couples (id-time)
For 2.
I do get a dataframe which contains all coefficients, but any ideas how I could properly summarize or display those? Just taking the mean of a coefficient seems a bit crude.
Any help is highly appreciated.
I address the first issue (just coding). plm requires a panel structure of index=c("individual", "time"). But you can define a new ID for whatever your unit of observation is. Here you can combine the variables that make up your identifier (transform() is base R, so dplyr is not needed for this step):
df <- transform(df, GID = paste0(ID, country))
library(plm)
summary(plm(y ~ x, index=c("GID", "year"), data = df,
model = "within"))
In general, you can define any other kind of observational group. Is your "ID" numeric or string? You should add a more detailed data description or give some example data.
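To see why the composite key removes the "duplicate couples (id-time)" problem, here is a small sketch on simulated data. The data, the values of x and y, and the "_" separator are assumptions for illustration; the plm call only runs if plm is installed.

```r
set.seed(42)
# Simulated panel: 2 countries x 3 subgroup IDs x 5 years (illustrative only)
df <- expand.grid(year = 2001:2005, ID = 1:3, country = c("A", "B"),
                  stringsAsFactors = FALSE)
df$x <- rnorm(nrow(df))
df$y <- 1 + 0.5 * df$x + rnorm(nrow(df))

# ID alone repeats across countries, so (ID, year) is not unique
dup_before <- anyDuplicated(df[c("ID", "year")]) > 0

# Composite key; a separator avoids accidental collisions between pasted values
df$GID <- paste(df$ID, df$country, sep = "_")
dup_after <- anyDuplicated(df[c("GID", "year")]) > 0

c(dup_before = dup_before, dup_after = dup_after)

# Within (fixed effects) model on the now-unique panel index
if (requireNamespace("plm", quietly = TRUE)) {
  fit <- plm::plm(y ~ x, data = df, index = c("GID", "year"), model = "within")
  print(summary(fit))
}
```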

Count-process datasets for Non-proportional Hazard (Cox) models with interaction variables

I am trying to run a non-proportional Cox regression model featuring an interaction-with-time variable, as described in Chapter 15 (section 15.3) of Applied Longitudinal Data Analysis by Singer and Willett. However, I cannot seem to get answers that agree with the book.
The data used in this book and source code are supplied at this fantastic website. Unfortunately, no R code is supplied for the final chapter, and the R dataset supplied for the example discussed in the text is incomplete and gives incorrect answers even for the simplest model (which I do know how to run). Instead, to obtain the complete dataset for this example, one must click the 'Download' link in the 'SAS' column (which has the correct dataset) and then, after installing the haven package (which allows one to read in foreign data formats), read in the dataset in question via:
haven::read_sas("alda/lengthofstay.sas7bdat")
This dataset indicates participants' (variable ID) length of stay (variable DAYS) in inpatient treatment in a hospital. The censoring variable is CENSOR. The researchers hypothesised that two different types of treatment (binary variable TREAT) would predict different hazards of checking out of treatment. In addition, they anticipated that the between-group difference in hazard would not be constant over time, which requires an interaction term. I can get the simple main-effect model to work, returning the same hazard coefficients reported in the book (which is how I eventually found out that the .csv file supplied with the R code was incomplete).
summary(modA <- coxph(Surv(DAYS,1-CENSOR) ~ TREAT, data = los))
        coef exp(coef) se(coef)     z Pr(>|z|)
TREAT 0.1457    1.1568   0.1541 0.945    0.345
I tried to follow the procedure laid out here, and here, and the sources listed therein (e.g. Therneau vignette on time-varying covariates in the survival package), and, of course, when I am copy-pasting someone else's code and running that it all works fine. But I am trying to do this for myself from scratch with a dataset whose results I can compare against mine. And I just can't make it work.
First, I created an EVENT variable:
los$EVENT <- 1 - los$CENSOR
There is a duplicate ID number in the dataset that causes issues, so we have to change it to a new ID number:
los$ID[which(duplicated(los$ID))] <- 842
Now, based on what I read here and here the dataframe needs to be split so that, for every participant, there is one row indicating the EVENT status at every point prior to their event (or censorship) time when any other participant experienced an event. Therefore we need to create a vector of all the unique event times, then split the dataset on those event times
cutPoints <- sort(unique(los$DAYS[los$EVENT == 1]))
# now split the dataset
longLOS <- survSplit(Surv(DAYS,EVENT)~ ., data = los, cut = cutPoints)
# and (just because I'm anal) rename the interval upper bound column (formerly "DAYS")
names(longLOS)[5] <- "tstop"
When I looked at this dataset it appeared to be what I was after, with (1) as many rows for each participant as there are intervals prior to their event time when anyone else in the dataset experienced an event, (2) two columns indicating the lower and upper bounds of each interval, and (3) an event column with a 0 for all rows in which the respondent did not experience the event, and a 1 in the final row if they did experience the event (censored participants keep a 0 throughout).
Next I created the interaction-with-time variable, subtracting 1 from the 'interval upper bound' column so that main effect of TREAT represents the treatment effect on the first day of hospitalisation.
longLOS$TREATINT <- longLOS$EVENT*(longLOS$tstop - 1)
And ran the model
summary(modB <- coxph(Surv(tstart, tstop, EVENT) ~ TREAT + TREATINT, data = longLOS))
But it doesn't work! I got the (fairly unhelpful) error message
Error in fitter(X, Y, strats, offset, init, control, weights = weights, :
routine failed due to numeric overflow.This should never happen. Please contact the author.
What am I doing wrong? I have been slowly working through Singer and Willett for almost three years (I started while still a grad student), and now the final chapter is proving to be by far my greatest challenge. I have thirty pages to go; any help would be incredibly appreciated.
I figured out what I was doing wrong: a stupid error when I created the interaction variable TREATINT. Instead of
longLOS$TREATINT <- longLOS$EVENT*(longLOS$tstop - 1)
it should have been
longLOS$TREATINT <- longLOS$TREAT*(longLOS$tstop - 1)
Now when you run the model
summary(modB <- coxph(Surv(tstart, tstop, EVENT) ~ TREAT + TREATINT, data = longLOS))
Not only does it work, it yields coefficients that match those reported in the Singer and Willett book.
              coef exp(coef)  se(coef)      z Pr(>|z|)
TREAT     0.706411  2.026705  0.292404  2.416   0.0157
TREATINT -0.020833  0.979383  0.009207 -2.263   0.0237
Given how dumb my mistake was I was tempted to just delete this whole post but I think I'll leave it up for others like me who want to know how to do interaction with time Cox models in R.
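For anyone who wants to check the workflow without the book's data, here is a sketch on simulated data (the data frame d and its columns are made up, not the Singer and Willett dataset). It repeats the split-at-every-event-time approach and compares it with the tt() argument of coxph, which builds the same treatment-by-time term internally, so the two sets of coefficients should agree.

```r
library(survival)

set.seed(123)
n <- 200
# Simulated data (illustrative only): binary treatment, censoring at day 30
d <- data.frame(id = seq_len(n), treat = rbinom(n, 1, 0.5))
raw_time <- rexp(n, rate = 0.1 * exp(0.5 * d$treat))
d$status <- as.integer(raw_time <= 30)
d$time <- pmin(raw_time, 30)

# Approach 1: counting-process data split at every observed event time
cuts <- sort(unique(d$time[d$status == 1]))
long <- survSplit(Surv(time, status) ~ ., data = d, cut = cuts)
names(long)[names(long) == "time"] <- "tstop"   # rename by name, not position
long$treat_int <- long$treat * (long$tstop - 1)
fit_split <- coxph(Surv(tstart, tstop, status) ~ treat + treat_int, data = long)

# Approach 2: let coxph build the interaction with time via tt()
fit_tt <- coxph(Surv(time, status) ~ treat + tt(treat), data = d,
                tt = function(x, t, ...) x * (t - 1))

# Side-by-side coefficients from the two approaches
rbind(split = coef(fit_split), tt = coef(fit_tt))
```

Renaming the interval column by name rather than by position avoids depending on the column order that survSplit happens to return.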

Nested model in R

I'm having a huge problem with a nested model I am trying to fit in R.
I have response time experiment with 2 conditions with 46 people each and 32 measures each. I would like measures to be nested within people and people nested within conditions, but I can't get it to work.
The code I thought should make sense was:
nestedmodel <- lmer(responsetime ~ 1 + condition +
(1|condition:person) + (1|person:measure), data=dat)
However, all I get is an error:
Error in checkNlevels(reTrms$flist, n = n, control) :
number of levels of each grouping factor must be < number of observations
Unfortunately, I do not even know where to start looking what the problem is here.
Any ideas? Please, please, please? =)
Cheers!
This might be more appropriate on CrossValidated, but: lme4 is trying to tell you that one or more of your random effects is confounded with the residual variance. As you've described your data, I don't quite see why: you should have 2*46*32=2944 total observations, 2*46=92 combinations of condition and person, and 46*32=1472 combinations of measure and person.
If you do
lf <- lFormula(responsetime ~ 1 + condition +
(1|condition:person) + (1|person:measure), data=dat)
and then
lapply(lf$reTrms$Ztlist,dim)
to look at the transposed random-effect design matrices for each term, what do you get? You should (based on your description of your data) see that these matrices are 1472 by 2944 and 92 by 2944, respectively.
As @MrFlick says, a reproducible example would be nice. Other things you could show us are:
fit the model anyway, using lmerControl(check.nobs.vs.nRE="ignore") to ignore the test, and show us the results (especially the random effects variances and the statement of the numbers of groups)
show us the results of with(dat,table(table(interaction(condition,person)))) to give information on the number of replicates per combination (and similarly for measure)
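One possible explanation, assuming each person saw each of the 32 measures exactly once and people are labelled uniquely across the two conditions: person:measure then has one level per row, which is exactly what checkNlevels objects to. A base R sketch on simulated data with that layout (column names follow the question, the data are made up):

```r
# Simulated layout: 92 uniquely labelled people (46 per condition) x 32 measures
dat <- expand.grid(measure = factor(1:32), person = factor(1:92))
dat$condition <- factor(ifelse(as.integer(dat$person) <= 46, "A", "B"))

n_obs <- nrow(dat)
# Number of levels actually present for each grouping factor in the model
n_cond_person <- nlevels(interaction(dat$condition, dat$person, drop = TRUE))
n_person_measure <- nlevels(interaction(dat$person, dat$measure, drop = TRUE))

c(obs = n_obs, cond_person = n_cond_person, person_measure = n_person_measure)

# Replicates per person:measure cell; every cell has one row here, so that
# term cannot be separated from the residual variance
table(table(interaction(dat$person, dat$measure, drop = TRUE)))
```

If your own counts look like this, the (1|person:measure) term is the one triggering the check, but do run the counts on your real data first.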

Multiple comparisons using glht with repeated measure anova

I'm using the following code to try to get at post-hoc comparisons for my cell means:
result.lme3<-lme(Response~Pressure*Treatment*Gender*Group, mydata, ~1|Subject/Pressure/Treatment)
aov.result<-aov(result.lme3, mydata)
TukeyHSD(aov.result, "Pressure:Treatment:Gender:Group")
This gives me a result, but most of the adjusted p-values are incredibly small - so I'm not convinced the result is correct.
Alternatively I'm trying this:
summary(glht(result.lme3, linfct=mcp(????="Tukey")))
I don't know how to get the Pressure:Treatment:Gender:Group in the glht code.
Help is appreciated - even if it is just a link to a question I didn't find previously.
I have 504 observations, Pressure has 4 levels and is repeated in each subject, Treatment has 2 levels and is repeated in each subject, Group has 3 levels, and Gender is obvious.
Thanks
I solved a similar problem by creating an interaction dummy variable with the interaction() function, which contains all combinations of the levels of your 4 variables.
I ran many tests: the estimates shown for the various levels of this variable give the joint effect of the active levels plus the interaction effect.
For example if:
temperature ~ interaction(infection(y/n), acetaminophen(y/n))
(I put the possible levels in the parentheses for clarity) the interaction variable will have a level like "infection.y:acetaminophen.y", which shows the combined effect on temperature of infection, acetaminophen, and the interaction of the two, in comparison with the intercept (where both variables are n).
Instead if the model was:
temperature ~ infection(y/n) * acetaminophen(y/n)
to get the same coefficient for the case when both variables are y, you would have to add the two simple effects plus the interaction effect. The result is the same, but I prefer using interaction() since it is cleaner and more elegant.
Then in glht you use:
summary(glht(model, linfct= mcp(interaction_var = 'Tukey')))
to achieve your post-hoc, where interaction_var <- interaction(infection, acetaminophen).
NOTE: I have never tested this approach with nested or mixed models, so beware!
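The equivalence described above can be checked on a plain lm with simulated data. This is only a sketch: the data and the variable name cell are made up, it uses a fixed-effects model rather than lme, and the glht call runs only if multcomp is installed.

```r
set.seed(7)
# Simulated data (illustrative only): two yes/no factors, 25 observations per cell
d <- expand.grid(rep = 1:25,
                 infection = factor(c("n", "y")),
                 acetaminophen = factor(c("n", "y")))
d$temperature <- 37 + 1.2 * (d$infection == "y") -
  0.4 * (d$acetaminophen == "y") + rnorm(nrow(d), sd = 0.3)

# Single factor with one level per cell: n.n, y.n, n.y, y.y
d$cell <- interaction(d$infection, d$acetaminophen)

fit_cell <- lm(temperature ~ cell, data = d)
fit_full <- lm(temperature ~ infection * acetaminophen, data = d)

# The y.y cell coefficient equals both simple effects plus the interaction
yy_from_cell <- unname(coef(fit_cell)["celly.y"])
yy_from_full <- sum(coef(fit_full)[c("infectiony", "acetaminopheny",
                                     "infectiony:acetaminopheny")])
c(cell = yy_from_cell, full = yy_from_full)

# All pairwise (Tukey) comparisons between the four cells
if (requireNamespace("multcomp", quietly = TRUE)) {
  print(summary(multcomp::glht(fit_cell,
                               linfct = multcomp::mcp(cell = "Tukey"))))
}
```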
