How to make an emmeans plot with interaction and many factors

I would like to make an emmeans plot that takes all of my behaviour variables into account. Each behaviour variable is a column and is related to two other columns. The behaviour variables were scored on a scale from 1 to 10.
colA (numeric) | colB to colJ (factor) | colH (2 ordinal/factors)
I have fitted a clmm model for each behaviour variable in interaction with colA, and I don't know whether I have to use all of these models to make the emmeans plot or whether I can just use the raw data from my dataset.
Then, I would like the scale of the behaviour variables on the x-axis and all the behaviour variables, grouped by colH, on the y-axis. I tried this, but obviously it doesn't work:
Beh <- as.matrix(mydata[,c(6:33)]) #select all behaviour variables
Pers.emmeans <- emmeans(mydata,colA~Beh, by= colH)
plot(Pers.emmeans, comparison = TRUE) +theme_bw()
I also searched online for how to do this but found no code examples.
Any ideas, please?

Related

How can I visualize an interaction in cox model in r?

I fitted a model and found a significant interaction effect. How can I plot this?
Here is a toy example (for illustration purposes only):
library(survival)
# includes bladder data set
library(survminer)
fit2 <- coxph(Surv(stop, event) ~ rx*enum, data = bladder )
# It plots only one single curve
ggadjustedcurves(fit2, data = bladder, variable = "rx")
I would like something like these:
ggadjustedcurves(fit2, data = bladder, variable = "rx") +
facet_wrap(~enum)
ggadjustedcurves(fit2, data = bladder, variable = c("enum","rx"))
It would be nice if the answer worked for both a categorical x categorical interaction and a categorical x continuous interaction.
Categorical x categorical
If you consider your variables categorical, in variable "rx" you have 2 groups and in variable "enum" 4 groups, which gives you a total of 8 curves.
(1) One way to visualize them would be to plot all curves on the same graph:
bladder$rx_enum <- paste(as.character(bladder$rx), as.character(bladder$enum), sep="_")
ggadjustedcurves(fit2, data = bladder, method='average', variable = "rx_enum")
This is probably not the most elegant way, and you would also have to adjust the colours and line types to make it look nicer. In this case I would probably try to set the line type according to "rx" and the colour according to "enum". Modifying the colour is relatively easy with the palette argument:
ggadjustedcurves(fit2, data = bladder, method='average', palette = c(1,2,3,4,1,2,3,4), variable = "rx_enum")
...while modifying line type is probably more tricky.
(2) Obviously, you can also make separate panels for the different levels of either variable. With the "rx" variable you'll have one panel for the data frame subset where "rx"==1 and another where "rx"==2. I probably wouldn't use separate panels/graphs, because you can visually represent all of the information on one plot, unless it is necessary or justified by your narrative. But if you want to go that way, let me know.
Categorical x Continuous
The same approach will work with a continuous variable as well, if you categorize it. I am not sure how one could make a KM plot for a continuous variable while keeping it continuous (or whether that is even possible).
NB: This answer considered only the KM-plots which are the most common for survival analysis, but there are probably other options as well.
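As a minimal sketch of the "categorize it" step, the lines below cut a continuous covariate into tertiles with base R and combine it with the treatment indicator in the same way rx_enum was built above. The vectors age and rx are hypothetical stand-ins (age is not a column of the bladder data), and tertiles are only one of many possible cut points.

```r
# Hypothetical values for illustration only; not taken from the bladder data
age <- c(42, 47, 51, 55, 58, 61, 64, 67, 70, 73, 76, 79)
rx  <- rep(c(1, 2), times = 6)

# Cut at the tertiles so each category holds about a third of the observations
breaks  <- quantile(age, probs = c(0, 1/3, 2/3, 1))
age_cat <- cut(age, breaks = breaks,
               labels = c("low", "mid", "high"),
               include.lowest = TRUE)  # keep the minimum value in the first bin

# One combined grouping variable, analogous to rx_enum
rx_age <- paste(rx, age_cat, sep = "_")
table(rx_age)
```

The combined variable can then be passed as the grouping variable in the same way as rx_enum; how many categories to use is a judgment call that depends on the data.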

How do I perform a TukeyHSD like test on a GLM in R?

I'm trying to analyse a glm I created in R, and what I'd like to do is get pairwise comparisons showing which levels of my factors are significantly different from each other, similar to the TukeyHSD test for ANOVAs. However, I have been told that TukeyHSD does not work for GLMs. After doing some research I've found a couple of options, the glht and contrast commands, and I'm not sure which one is correct or applicable.
Here is the code for the GLM.
glm.mod <- glm(as.numeric(Ostra..Avg.body.size) ~ as.factor(Macrophytes)*as.factor(Leaves)*as.factor(MacrophyteintLeaves), family = 'gaussian', data = main)
The body size variable is what I expect to change based on my factors: Macrophytes (the presence/absence of macrophyte species, with the options Without for none, and C or E for different species) and Leaves (which has three options: without, q1, q2).
Here is an example of what my data looks like (with made up values)
Macrophyte  Leaves   Animals  CODE  Ostra. Avg body size
Without     Q1       N        1     11000
E           Q2       Y        2     11853
C           without  N        3     13422
Without     Q1       Y        4     13838
How would I get an output that shows me whether there is an effect, for example:
without Macrophytes, Q1 leaf - without Macrophytes, Q2 leaf (followed by a value denoting whether they are significantly different from each other, such as a p-value).
Any help would be greatly appreciated, and thanks in advance. If there is any important info I have missed, please tell me.
With a numeric response variable, and one (or several) categorical predictors, I would typically use the following function to get pairwise comparisons of significance for each main effect (for example, Macrophytes alone; Leaves alone) and interaction effects:
TukeyHSD(aov(as.numeric(Ostra..Avg.body.size) ~ as.factor(Macrophytes)*as.factor(Leaves)*as.factor(MacrophyteintLeaves), data = main))
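To make the shape of the output concrete, here is a small runnable sketch of the same call on the built-in warpbreaks data, used purely as a stand-in for the asker's data frame (breaks plays the role of body size, wool and tension the roles of the two factors). A gaussian glm with the default identity link fits the same model as lm/aov, which is why the aov route is available here.

```r
# warpbreaks ships with R: breaks (numeric), wool (2 levels), tension (3 levels)
fit <- aov(breaks ~ wool * tension, data = warpbreaks)

# Pairwise comparisons for every term in the model
tk <- TukeyHSD(fit)
names(tk)               # one table per main effect and per interaction

# The interaction table has one row per pair of cells,
# with columns diff, lwr, upr and "p adj" (the adjusted p-value)
tk[["wool:tension"]]

# Restrict the output to a single term with `which`
TukeyHSD(fit, which = "wool:tension")
```

Each row of the interaction table is a comparison between two factor-level combinations, which is the "cell A - cell B, with a p-value" output asked for above.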

R Survival: KM plots and statistics on two of three group values

I'm brand new and could really use some help.
I'm working with a cancer cohort in the 'survival' package. For each patient, there are values for survival (continuous, in days), censor (0/1) information, and about 30 genes with values of "High", "Low", or "Neither" for each gene.
When I've been working with just two category values, running the survival analysis to get the Kaplan-Meier plots and log-rank test values is straightforward for me. My brain has been melting trying to figure out the correct code to compare only two of three groups (i.e. "High" vs. "Low" for a particular gene). If I use survdiff for, say, ~data$geneA, it will only return the statistic comparing all three groups. I basically want to exclude the "Neither" group. Can someone help me with the code for a survdiff call that restricts the test to only two groups when there are 3 (or more) values?
Similarly (but less of an issue, as I can set the "Neither" group to the transparent color), how do I generate the KM plots for only two groups?
Thanks much,
Edit: Some of the code I'm currently using
> B<-read.table("~/Desktop/breast.csv",header=T,sep=",")
> BR1=survfit(Surv(B$death,B$status)~B$GeneA)
> BR2=survfit(Surv(B$death,B$status)~B$GeneB)
And so on for all 30 genes. Then for the statistics and KM curves:
> survdiff(Surv(B$death,B$status)~B$GeneB,rho=0)
> plot(BR1, xlim=c(0,3000), col=c("yellow3", "blue3", "transparent"))
I understand I can use 'subset' to define one value, such as
> BR3=survfit(Surv(B$death,B$status)~B$GeneB, subset=B$GeneB=="High")
But can it work with two values? Doing what makes logical sense to me:
> BR4=survfit(Surv(B$death,B$status)~B$GeneB, subset=B$GeneB==c("Low", "High"))
Doesn't work correctly, it splits up one of the groups into two?
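A likely explanation for the behaviour above, offered as a sketch rather than a confirmed diagnosis: == recycles the two-element vector along the data and compares element by element, whereas %in% tests membership. The small vector g below is made up for illustration, and the built-in lung data (with its numeric ph.ecog column) stands in for the asker's gene columns.

```r
library(survival)

# Made-up group labels for illustration
g <- c("High", "Low", "Neither", "High", "Low", "Neither")

# `==` recycles c("Low", "High") along g: position 1 is compared with "Low",
# position 2 with "High", position 3 with "Low", and so on
g == c("Low", "High")

# `%in%` asks, for each element, whether it is one of the listed values
g %in% c("Low", "High")

# The same idea in survdiff, using lung$ph.ecog as a stand-in grouping variable:
# keep only the rows whose group is 0 or 1, then run the log-rank test
fit <- survdiff(Surv(time, status) ~ ph.ecog, data = lung,
                subset = ph.ecog %in% c(0, 1))
fit
```

The subset argument of survfit accepts the same kind of logical expression, so the KM curves can be restricted to two groups in the same way.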

lmList diagnostic plots - is it possible to subset data during a procedure or do data frames have to be subset and then passed in?

I am new to R and am trying to produce a vast number of diagnostic plots for linear models for a huge data set.
I discovered the lmList function from the nlme package.
This works a treat but what I now need is a means of passing in a fraction of this data into the plot function so that the resulting plots are not minute and unreadable.
In the example below 27 plots are nicely displayed. I want to produce diagnostics for much more data.
Is it necessary to subset the data first? (presumably with loops) or is it possible to subset within the plotting function (presumably with some kind of loop) rather than create 270 data frames and pass them all in separately?
I'm sorry to say that my R is so basic that I do not even know how to pass variables into names and values together in for loops (I tried using the paste function but it failed).
The data and function for the example are below. I would be picking values of Subject by their row numbers within the data frame. I grant that the 27 plots here show nicely, but for the sake of example it would be nice to split them into, say, 3 sets of 9.
fm1 <- lmList(distance ~ age | Subject, Orthodont)
# observed versus fitted values by Subject
plot(fm1, distance ~ fitted(.) | Subject, abline = c(0,1))
Examples from:
https://stat.ethz.ch/R-manual/R-devel/library/nlme/html/plot.lmList.html
I would be most grateful for help and hope that my question isn't insulting to anyone's intelligence or otherwise annoying.
I can't see how to pass a subset to the plot.lmList function. But, here is a way to do it using standard split-apply-combine strategy. Here, the Subjects are just split into three arbitrary groups of 9, and lmList is applied to each group.
library(nlme)  # provides lmList and the Orthodont data
## Make 3 lmLists
fits <- lapply(split(unique(Orthodont$Subject), rep(1:3, each = 9)), function(x) {
  eval(substitute(
    lmList(distance ~ age | Subject,                       # fit one model per Subject
           data = Orthodont[Orthodont$Subject %in% x, ]),  # use only this group's rows
    list(x = x)))  # substitute the actual x-values so the proper call gets stored
})
## Make plots, one graphics device per group of 9 Subjects
for (i in seq_along(fits)) {
  dev.new()
  print(plot(fits[[i]], distance ~ fitted(.) | Subject, abline = c(0, 1)))
}

Multiple comparisons using glht with repeated measures ANOVA

I'm using the following code to try to get at post-hoc comparisons for my cell means:
result.lme3<-lme(Response~Pressure*Treatment*Gender*Group, mydata, ~1|Subject/Pressure/Treatment)
aov.result<-aov(result.lme3, mydata)
TukeyHSD(aov.result, "Pressure:Treatment:Gender:Group")
This gives me a result, but most of the adjusted p-values are incredibly small - so I'm not convinced the result is correct.
Alternatively I'm trying this:
summary(glht(result.lme3,linfct=mcp(????="Tukey")
I don't know how to get the Pressure:Treatment:Gender:Group in the glht code.
Help is appreciated - even if it is just a link to a question I didn't find previously.
I have 504 observations, Pressure has 4 levels and is repeated in each subject, Treatment has 2 levels and is repeated in each subject, Group has 3 levels, and Gender is obvious.
Thanks
I solved a similar problem by creating an interaction dummy variable with the interaction() function, which contains all combinations of the levels of your 4 variables.
I ran many tests: the estimates shown for the various levels of this variable give the joint effect of the active levels plus the interaction effect.
For example, if:
temperature ~ interaction(infection(y/n), acetaminophen(y/n))
(I put the possible levels in parentheses for clarity) the interaction variable will have a level like "infection.y:acetaminophen.y", which shows the effect on temperature of infection, acetaminophen, and the interaction of the two, in comparison with the intercept (where both variables are n).
If instead the model was:
temperature ~ infection(y/n) * acetaminophen(y/n)
then to get the same coefficient for the case when both variables are y, you would have to add the two simple effects plus the interaction effect. The result is the same, but I prefer using interaction() since it is cleaner and more elegant.
Then in glht you use:
summary(glht(model, linfct = mcp(interaction_var = 'Tukey')))
to get your post-hoc comparisons, where interaction_var <- interaction(infection, acetaminophen).
NOTE: I never tested this methodology with nested or mixed models, so beware!
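The equivalence described above can be checked on a small example. The sketch below uses the built-in warpbreaks data as a stand-in (wool and tension play the roles of infection and acetaminophen); the names wb, wt, m_cell and m_fact are made up for illustration, and the glht step only runs if the multcomp package is installed. It uses a plain lm, so the caveat about nested and mixed models still applies.

```r
wb <- warpbreaks
# One factor holding every wool x tension combination, e.g. "B.M"
wb$wt <- interaction(wb$wool, wb$tension)

m_cell <- lm(breaks ~ wt, data = wb)              # single interaction factor
m_fact <- lm(breaks ~ wool * tension, data = wb)  # usual factorial coding

# Coefficient of cell B.M in the first model ...
cell_BM <- unname(coef(m_cell)["wtB.M"])
# ... equals the two simple effects plus the interaction effect in the second
sum_BM <- unname(coef(m_fact)["woolB"] + coef(m_fact)["tensionM"] +
                 coef(m_fact)["woolB:tensionM"])
all.equal(cell_BM, sum_BM)

# Tukey-style comparisons between all cells, if multcomp is available
if (requireNamespace("multcomp", quietly = TRUE)) {
  print(summary(multcomp::glht(m_cell, linfct = multcomp::mcp(wt = "Tukey"))))
}
```

Both models have the same fitted values; the single-factor coding simply makes every cell a level that mcp() can compare directly.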
