How can I visualize an interaction in cox model in r? - r

I fitted a model and found a significant interaction effect. How can I plot this interaction?
A toy example follows (for illustration purposes only):
library(survival)
# includes bladder data set
library(survminer)
fit2 <- coxph(Surv(stop, event) ~ rx*enum, data = bladder )
# It plots only one single curve
ggadjustedcurves(fit2, data = bladder, variable = "rx")
I would like something like these:
ggadjustedcurves(fit2, data = bladder, variable = "rx") +
facet_wrap(~enum)
ggadjustedcurves(fit2, data = bladder, variable = c("enum","rx"))
It would be nice if the answer worked for both a categorical x categorical interaction and a categorical x continuous interaction.

Categorical x categorical
If you consider your variables categorical, in variable "rx" you have 2 groups and in variable "enum" 4 groups, which gives you a total of 8 curves.
(1) One way to visualize them would be to plot all curves on the same graph:
bladder$rx_enum <- paste(as.character(bladder$rx), as.character(bladder$enum), sep="_")
ggadjustedcurves(fit2, data = bladder, method='average', variable = "rx_enum")
This is probably not the most elegant way, and you would also have to adjust the colours/line types to make it look nicer. In this case I would probably try to set the line type according to "rx" and the colour according to "enum". Modifying the colour is relatively easy with the palette argument:
ggadjustedcurves(fit2, data = bladder, method='average', palette = c(1,2,3,4,1,2,3,4), variable = "rx_enum")
...while modifying line type is probably more tricky.
(2) Obviously, you can also make separate panels for the different levels of either variable. With the "rx" variable you'll have one panel for the data frame subset where "rx"==1 and another where "rx"==2. I probably wouldn't use separate panels/graphs, because you can visually represent all of the information on one plot, unless it is necessary/justified by your narrative. But if you want to go that way, let me know.
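One way around the line-type difficulty is to skip ggadjustedcurves and draw the model-predicted curves directly with ggplot2. The following is only a sketch under assumptions: it uses survfit() with a newdata grid (a different technique from the averaged curves above), it treats rx and enum as numeric exactly as fit2 does, and the names newdat, sf, curves and p are made up for illustration.

```r
library(survival)  # coxph(), survfit() and the bladder data
library(ggplot2)

fit2 <- coxph(Surv(stop, event) ~ rx * enum, data = bladder)

# One row per rx/enum combination (2 x 4 = 8 curves)
newdat <- expand.grid(rx = 1:2, enum = 1:4)

# Predicted survival for each row of newdat; sf$surv has one column per row
sf <- survfit(fit2, newdata = newdat)

# Reshape to long format: as.vector() stacks the columns of sf$surv
curves <- data.frame(
  time = rep(sf$time, times = nrow(newdat)),
  surv = as.vector(sf$surv),
  rx   = factor(rep(newdat$rx, each = length(sf$time))),
  enum = factor(rep(newdat$enum, each = length(sf$time)))
)

# Colour follows enum, line type follows rx
p <- ggplot(curves, aes(x = time, y = surv, colour = enum, linetype = rx)) +
  geom_step() +
  labs(x = "Time", y = "Predicted survival")
p
# For separate panels instead: p + facet_wrap(~ rx)
```

Because the result is an ordinary ggplot object, colours and line types can be changed with the usual scale_colour_manual() and scale_linetype_manual() calls.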
Categorical x Continuous
The same approach works with a continuous variable as well, if you categorize it. I am not sure how one could draw a KM curve for a continuous variable while keeping it continuous (or whether that is even possible).
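As a rough illustration of the categorizing step, the sketch below splits a continuous covariate at its median and then reuses the combined-variable trick from the categorical case. The choice of the size column, the median cut-off, and the names size_grp, rx_size and fit3 are assumptions made for this example only.

```r
library(survival)   # coxph() and the bladder data
library(survminer)  # ggadjustedcurves()

# Assumption: treat tumour size as the continuous variable and split it
# at its median into two groups
bladder$size_grp <- ifelse(bladder$size > median(bladder$size), "large", "small")

fit3 <- coxph(Surv(stop, event) ~ rx * size_grp, data = bladder)

# Combined grouping variable, as in the categorical x categorical case
bladder$rx_size <- paste(bladder$rx, bladder$size_grp, sep = "_")

ggadjustedcurves(fit3, data = bladder, method = "average", variable = "rx_size")
```

A median split is only one option; any cut-offs that are meaningful for the variable (for example clinical thresholds) can be used in the same way.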
NB: This answer considered only the KM-plots which are the most common for survival analysis, but there are probably other options as well.

Related

How to do an Exploratory Factor Analysis on data where some variables are ordinal and others dichotomous in R?

I am trying to do an exploratory factor analysis using a data set that contains items from questionnaires. Some questions have 4 response levels and some have only two. However, I am not sure how to implement this in R when using both types of variables.
I have seen that psych is a good toolbox and have used it successfully on the ordinal and dichotomous items separately, but I am not sure how to combine both. The tetrachoric and polychoric functions do not seem to work, for example, when I use both data types together.
My code as it stands is this:
poly_cor = polychoric(data)
rho = poly_cor$rho
cor.plot(rho, numbers=T, upper=FALSE, main = "Polychoric Correlation", show.legend = FALSE)
fa.parallel(data, fm="pa", fa="fa", main = "Scree Plot")
poly_model = fa(data, nfactor=4, cor="poly", fm="mle", rotate = "oblimin")
However, the polychoric function says that there is not an equal number of response alternatives.
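To see which items trigger that message, it can help to count the distinct response alternatives per item. This is a minimal base R sketch on a made-up data frame (items, q1 to q4 and n_levels are hypothetical names, not the asker's data):

```r
# Hypothetical toy items: two 4-level ordinal items and two dichotomous items
set.seed(1)
items <- data.frame(
  q1 = sample(rep(1:4, length.out = 50)),
  q2 = sample(rep(1:4, length.out = 50)),
  q3 = sample(rep(0:1, length.out = 50)),
  q4 = sample(rep(0:1, length.out = 50))
)

# Number of distinct (non-missing) response alternatives per item
n_levels <- sapply(items, function(x) length(unique(na.omit(x))))
n_levels

# Item names grouped by their number of alternatives
split(names(n_levels), n_levels)
```

Items that end up in different groups are the ones with unequal numbers of response alternatives.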

How does the stratum function work in the clusrank package in R?

I'm working with the clusrank package in R to analyse insect abundance data, by using the clusWilcox.test function for clustered data. As far as I understand, this package allows you to add both a 'cluster' and a 'stratum' function when using the rgl method to cluster by multiple factors.
When I add a single factor to my code as either only a cluster or only a stratum, the Z- and p-values are the same in both cases, which seems to indicate that the stratum function works. However, when I take the first factor as a cluster and add a second, different one as a stratum, the output is still identical to the cluster-only model. This makes me think that only the cluster is taken into account and the stratum function is ignored.
This problem should be reproducible with a random test dataset (in this example called df) with four columns: the dependent variable (in my case 'abundance'), the grouping factor whose effect I want to know (in my case 'treatment'), and two factors to add as cluster/stratum, let's call them 'factorA' and 'factorB'. In my own test dataset the factors have 2 levels each, in my real dataset 6 levels each, and the problem arises in both datasets.
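A test dataset of the kind described could be generated along these lines. This is only a sketch: the sample size, the Poisson abundances and the level labels are arbitrary assumptions, and only the four column names come from the description above.

```r
set.seed(42)
n <- 40  # arbitrary number of observations

df <- data.frame(
  abundance = rpois(n, lambda = 10),                 # dependent variable
  treatment = rep(c(0, 1), times = n / 2),           # grouping factor of interest
  factorA   = factor(rep(c("a1", "a2"), each = n / 2)),
  factorB   = factor(rep(rep(c("b1", "b2"), each = n / 4), times = 2))
)

str(df)
table(df$factorA, df$factorB)  # factorA and factorB are fully crossed
```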
My code is then as follows:
clusWilcox.test(abundance ~ treatment + cluster(factorA), data = df, method = "rgl")
This gives the same Z- and p-values as adding factorA as a stratum, the only difference being that the number of clusters is now the number of rows in the test dataset instead of the number of factor levels.
clusWilcox.test(abundance ~ treatment + stratum(factorA), data = df, method = "rgl")
And both exactly the same Z- and p-values as:
clusWilcox.test(abundance ~ treatment + cluster(factorA) + stratum(factorB), data = df, method = "rgl")
Which makes me think that the stratum function is ignored in this third line of code. If you switch factorA and factorB, the same problem arises, though with different output values, as the calculation is now based on factorB instead of factorA.
Does anyone know what happens here? Is my code wrong, or is the stratum function indeed not taken into account?

Visualising variance from random effects in a mixed model by group

I have run a linear mixed model in R using lmer. I am attempting to visualise the random effect structure. To produce a graph I have used print(dotplot(ranef(RT.model.4, condVar=T))[['part_no']]) where part_no is the random effect from the mixed model. It creates something like this:
This is great. However, I want to be able to visually tell the difference between my two groups of participants (the random effect being discussed) in the graph. I have group A and group B. In my dataset I have a column for participant type, and for each row it gives a value of A or B.
I would like either to colour-code the graph to show which participants are from group A and which from group B or, perhaps better, to create two separate panels, one for each group.
Any suggestions on how to do this would be very much appreciated.
This is a way using ggplot rather than lattice (just because I am more familiar with it) using code from the examples in ?dotplot.ranef.mer. You need to match your treatment group in the data to the random effects grouping variables returned by ranef. I don't see how this can be done automatically within dotplot.ranef.mer.
Create a small example with a treatment group; each subject is assigned to one treatment group.
library(lme4)
library(ggplot2)
sleepstudy$trt = as.integer(sleepstudy$Subject %in% 308:340)
m = lmer(Reaction ~ trt + (1|Subject), sleepstudy)
Convert the random effects to a dataframe and match in the treatment groups
dd = as.data.frame(ranef(m, condVar=TRUE))
dd$trt = with(sleepstudy, trt[match(dd$grp, Subject)])
You can then plot how you want, say using facet_'s or assigning a colour to each group, or ...
# Option 1: colour the points by treatment group
ggplot(dd, aes(y=grp, x=condval, colour=factor(trt))) +
  geom_point() + facet_wrap(~term, scales="free_x") +
  geom_errorbarh(aes(xmin=condval - 2*condsd,
                     xmax=condval + 2*condsd), height=0)
# Option 2: one panel per treatment group
ggplot(dd, aes(y=grp, x=condval)) +
  geom_point() +
  geom_errorbarh(aes(xmin=condval - 2*condsd,
                     xmax=condval + 2*condsd), height=0) +
  facet_wrap(~trt)
You should be able to use the groups= option in dotplot(). Assuming your data is in a dataframe called df with the group variable being in group, you could use
print(dotplot(ranef(RT.model.4, condVar=T), groups=df$group)[['part_no']])

Vegan RDA and biplot, remove values contributing >10% of variance

I am using the vegan package to do RDA and want to plot the data using biplot. In my data I have hundreds of values. What I would like to do is limit the variance explained to a set threshold, 0.1 in the example below. So instead of having 44 arrows I might only have, say, 8.
library (vegan) # Load library
library(MASS) # load library
data(varespec) # Dummy data
vare.pca <- rda(varespec, scale = TRUE) # RDA analysis
biplot(vare.pca, scaling = 3,display = "species") # Plot data but includes all
## extracts the percentage##
x =(sort(round(100*scores(vare.pca, display = "sp", scaling = 0)[,1]^2, 3), decreasing = TRUE))
## Plot percentage
plot(length(x):1,sort(x)) # plot rank on value of y
Any help would be appreciated :)
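One possible way to draw only the species above a chosen contribution, sketched with base graphics on top of an empty vegan plot. The cut-off min_pct and the names contrib, keep and sp are assumptions for illustration; the contribution is the same squared axis-1 score (in percent) that the code above computes.

```r
library(vegan)
data(varespec)
vare.pca <- rda(varespec, scale = TRUE)

# Percent contribution of each species to axis 1 (squared raw scores)
contrib <- 100 * scores(vare.pca, display = "species", scaling = 0)[, 1]^2

min_pct <- 5               # hypothetical cut-off in percent
keep <- contrib >= min_pct # TRUE for species that pass the cut-off

# Scores in the scaling used for plotting, restricted to the kept species
sp <- scores(vare.pca, display = "species", scaling = 3)[keep, , drop = FALSE]

# Empty ordination frame, then arrows and labels for the kept species only
plot(vare.pca, type = "n", scaling = 3)
if (nrow(sp) > 0) {
  arrows(0, 0, sp[, 1], sp[, 2], length = 0.08, col = "red")
  text(sp, labels = rownames(sp), pos = 3, cex = 0.8)
}
```

Changing min_pct changes how many arrows are drawn; the same idea extends to axis 2 by using the second column of the raw scores.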
Depending on the size of the data set, it would be possible to use either ordistep or ordiR2step to reduce the number of "unimportant" variables in your plot (see https://www.rdocumentation.org/packages/vegan/versions/2.4-2/topics/ordistep). However, these functions use step-wise selection, which needs to be used cautiously. Step-wise selection can select the included parameters based on AIC values, R2 values or p-values. It does not select values based on their importance for the purpose of your question. It also does not mean that these variables have any meaning with respect to organisms or biochemical interactions. Nevertheless, step-wise selection can be helpful in giving an idea of which parameters might have a strong influence on the overall variation in the data set. A simple example is given below.
rda0 <- rda(varespec ~1, varespec)
rda1 <- rda(varespec ~., varespec)
rdaplotp <- ordistep(rda0, scope = formula(rda1))
plot(rdaplotp, display = "species", type = "n")
text(rdaplotp, display="bp")
Thus, by using the ordistep function, the number of species displayed in the plot has been greatly reduced (see Fig 1 below). If you want to remove more variables (which I do not suggest), an option could be to look at the output of the biplot and throw out the variables that have the weakest correlation with the principal components (see below), but I would advise against it.
sumrda <- summary(rdaplotp)
sumrda$biplot
It would be wise to first decide which question you want to answer and see whether any of the included variables could be left out beforehand. This would already reduce the number. Minor edit: I am also a bit confused as to why you want to remove parameters that contribute strongly to your captured variation.

text rpart decision tree model -- how to suppress long list of values at each split node

I created a decision tree model with all categorical variables. Some of these categorical variables have over 100 possible values.
Here is my code:
library(rpart)
model = rpart(score ~ ., data = dataset)
plot(model)
text(model)
The problem is that text(model) annotates each split node with a long list of values for the corresponding categorical variable. The values are squeezed into each other and hard to read. I am looking for an option for text(model) that displays only the variable name and suppresses all the values. That way the plotted tree is at least clear and shows which variables are used at each node.
Thanks in advance!
Leo
The prp function in rpart.plot might help?
There are a number of options for plotting different tree layouts, and you can abbreviate the split levels using the faclen argument.
Something like:
library(rpart.plot)
model = rpart(score ~., data = dataset)
prp(model, faclen = 2)
Might help tidy it up. (Note: Setting faclen to 1 means each factor level will be assigned a single letter in alphabetical order).
