Plot how the estimated survival depends upon the value of a covariate of interest. Problems with relevel - r

I want to plot how the estimated survival from a Cox model depends upon the value of a covariate of interest, while the rest of variables are fixed to their average values (if they are continuous variables) or lowest values for dummy. Following this example http://www.sthda.com/english/wiki/cox-proportional-hazards-model , I have construct a new data frame with three rows, one for each value of my variable of interest; and the other covariates are fixed. Among these covariates I have two factor vectors. I created the new dataset and later it is passed to survfit() via the newdata argument.
When I passed the data frame to survfit(), I obtain the following error message error in relevel.default(occupation) : 'relevel' only for factors. Where is the source of problem? If the source of problem is related to the factor vectors, how I can solve it? Below find an example of the code. Unfortunately, I cannot share the data or find a dataset that produces the same error message:
I have transformed the factor variables into integer vectors in the cox model and in the new dataset. it did not work.
I have deleated all the factor variables and it works.
I have tried to implement this strategy, but it did not work: Plotting predicted survival curves for continuous covariates in ggplot
fit <- coxph(Surv(entry, exit, event == 1) ~ status_plot +
exp_national + relevel(occupation, 5) + age + gender + EDUCATION , data = data)
data_rank <- with(data,
data.frame(status_plot = c(1,2,3), # factor vector of interest
exp_national=rep(mean(exp_national, na.rm = TRUE), 3),
occupation = c(5,5,5), # factor with 6 categories, number 5 is the category of reference in the cox model
age=rep(mean(age, na.rm = TRUE), 3),
gender = c(1,1,1),
EDUCATION=rep(mean(EDUCATION, na.rm = TRUE), 3) ))
surv.fin <- survfit(fit, newdata=data_rank) # this produces the error

Looking at the code it appears you probably attempted to take the mean of a factor. So do post at least str(data) as an edit to the body of your question. You should also realize that you can give a single value to a column in a data.frame call and have it recycled to the correct length, you all the meanss could be entered as a single item rather thanrep`-ng.

Related

R Error in mclogit::mblogit() solve.default(X[[i]], ...) : 'a' (4 x 1) must be square

I am trying to use a multinomial logistic regression model to determine how different factors influence the liklihood of several behavioral states among two species of shark.
19 individual animals, comprising two distinct species, were each tracked for ~100 days each and a different behavioral state was identified for each day data were collected.
I would like to code individual shark as a categorical random effect variable (with 19 levels) within species, a categorical fixed effect variable (with 2 levels).
With this general idea, the code that I am currently trying to run is:
mclogit::mblogit(cluster ~ species, random = ~1|individual %in% species, data = df, method = "MQL")
The model appears to run normally but produces the error message:
Error in *tmp*[[k]] : subscript out of bounds
Reversing the order of the random effect interaction term produces a different error message. Now the code reads:
mclogit::mblogit(cluster ~ species, random = ~1|species %in% individual, data = df, method = "MQL")
And produces the error:
Error in solve.default(X[[i]], ...) : 'a' (6 x 1) must be square
Here is a sample of the raw data with which I am trying to fit my model:
df <- data.frame(
Date = c("2015-11-25", "2016-01-24", "2016-02-27", "2016-03-27", "2017-12-02", "2017-12-06", "2015-10-30", "2015-10-31"),
cluster = factor(c(3,3,4,6,3,1,3,2)),
species = factor(c("I.oxyrinchus", "I.oxyrinchus", "I.oxyrinchus", "I.oxyrinchus", "P.glauca", "P.glauca", "P.glauca", "P.glauca")),
individual = factor(c("141257", "141257", "141254", "141254", "141256", "141256", "141255", "141255")))
Attempting to run the code with this reduced dataset produces only the second of the two error messages.
My questions are two fold:
What are the meanings of these two error messages, and how might I address one or both of them?
Why might the order of the terms in the random effect portion of the model formula produce two different results?
Thank you.

Issue with multiple logistical regresstion code with multiple predictors

I am trying to perform multiple logistical regression with some of the variables that came out as statistically significant for a diseased conditions with univariate analysis. We took the cut off for that as p<0.2 since our sample size was ~300. I made a new dataframe for these variables
regression1df <- data.frame(dgfcriteria, recipientage, ESRD_dx,bmirange,graftnumber, dsa_class_1, organ_tx, transfuse01m, transfuse1yr, readmission1yr, citrange1, switrange, anastamosisrange, donorage, donorgender, donorcriteria, donorionotrope, intubaterange, kdpirange, kdrirange, eptsrange, proteinuria, terminalurea, na.rm=TRUE)
I'm using variables to predict for disease condition, which is DGF (dgfcriteria==1), and non-disease is no DGF (dgfcriteria==0).
Here is structure of the data.
When I tried to run the entire list of variables with the glm code I got:
predictors1 <- glm(dgfcriteria ~.,
data = predictors1df,
family = "binomial" )
Error in contrasts<-(*tmp*, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels.
But when I run it with only some of the variables of the dataframe, there is an output.
predictors1 <- glm(dgfcriteria ~ recipientage + ESRD_dx + bmirange + graftnumber + dsa_class_1 + organ_tx + transfuse01m + transfuse1yr + readmission1yr +citrange1 +switrange + anastamosisrange+ donorage+ donorgender + donorcriteria + donorionotrope,
data = predictors1df,
family = "binomial" )
This output looks really strange though with alot of NAs.
Where have I gone wrong?
Looking at your data structure, you've got a lot of missing values. Quite a few of your variables look to have only 2 or 3 non-missing values in the first 10 rows. When you run regression on data with missing values, the default is to drop all rows that have any missing values.
Apparently some of your data has bad overlaps, so that when all the rows with missing values are dropped (see na.omit(your_data) for what is left over), some variables only have one level left and are therefore no longer fit for regression. Of course, when you only use some variables, fewer rows will be dropped and you may be in a better situation.
So, you'll have to decide what to do with your missing values. This should depend on your goals and your understanding of the reasons for missingness. Common possibilities include omission, imputation, creating new "missing" levels, and taking level of missingness into account in your variable selection.

How would I run an ANOVA in R on this long form data?

My factors are constraint (high or low), picture type type (a,b,c,d), and electrode (29 values).
My dependent variable is the amplitude measured from the electrodes. So it is 2 x 4 x 29. How can I run the ANOVA so that R does not think that each of the electrodes is another measurement but included in the 'electrode' factor?
This is what I tried so far but I get an error
anova1 <- ezANOVA(data=dat, dv=n_400, wid=subject, within=.(constraint, ending, electrode), type="III")
>Error in ezANOVA_main(data = data, dv = dv, wid = wid, within = within, :
One or more cells is missing data. Try using ezDesign() to check your data.

How to obtain Tukey compact letter display from a GLM with interactions

I have set of data that I've analyzed with a generalized linear model that has three categorical factors in 3-way interaction (factorA, factorB, factorC) and a fourth continuous factor (factorD) that is simply added in the model. I am trying to obtain a set of Tukey letter groups (ie, compact letter display) from the model but haven't found a way to include the interaction successfully. I'm not interested in including factorD, just the three in the interaction.
I have gotten the Tukey-adjusted pairwise comparisons with this:
lsmeans(my.glm, factorA*factorB*factorC)
But I was not able to figure out how to produce a compact letters display from that. It can be done with multcomp package but I could only find ways to do it with main effects with that package, not interactions.
So then I tried the agricolae package, as this post (https://stats.stackexchange.com/questions/31547/how-to-obtain-the-results-of-a-tukey-hsd-post-hoc-test-in-a-table-showing-groupe) discusses that that should work. However, following the instructions in that answer led to a non-functional response from HSD.test. Specifically, I could get the main effects tests to work fine, e.g. HSD.test(my.glm,"factorA") but I could not get the interactions to work. I tried this:
intxns<-with(my.data, interaction(factorA,factorB,factorC))
HSD.test(my.glm,"intxns",group=TRUE)
But a get an error that indicates the HSD.test function didn't recognize "intxns" as a valid object, it looks like this (also, I checked the intxns object and it looks good and the number of rows matched the number of residuals of my glm):
Name: inxtns
factorA factorB factorC factorD
I get that same error if I just put nonsense into the factor field in the HSD.test function call. I checked the inxtns object and it looks good and the number of rows matched the number of residu
The agricolae notes don't actually cover the use of interactions in HSD.test, but I assume it can work.
Does anyone know how to get HSD.test to work with interactions? Or is there any other function you've gotten to work to produce compact letter displays for a glm with interactions?
I've been working on this for a number of days now and haven't been able to find a solution, hopefully I'm not missing something obvious.
Thanks!
I don't know how you've specified your glm model, but for HSD.test, it's looking to match the particular treatment name with the same name specified in the glm formula as well as the data frame. This is why your main effect, factorA will work, but not the 3-way interaction. For multiple comparison tests on interactions, I find it easiest to generate the interactions separately and add them to the data frame as additional columns. The glm model can then be specified using the new variables which code for the interaction.
For example,
set.seed(42)
glm.dat <- data.frame(y = rnorm(1000), factorA = sample(letters[1:2],
size = 1000, replace = TRUE),
factorB = sample(letters[1:2], size = 1000, replace = TRUE),
factorC = sample(letters[1:2], size = 1000, replace = TRUE))
# Generate interactions explicitly and add them to the data.frame
glm.dat$factorAB <- with(glm.dat, interaction(factorA, factorB))
glm.dat$factorAC <- with(glm.dat, interaction(factorA, factorC))
glm.dat$factorBC <- with(glm.dat, interaction(factorB, factorC))
glm.dat$factorABC <- with(glm.dat, interaction(factorA, factorB, factorC))
# General linear model
glm.mod <- glm(y ~ factorA + factorB + factorC + factorAB + factorAC +
factorBC + factorABC, family = 'gaussian', data = glm.dat)
# Multiple comparison test
library(agricolae)
comp <- HSD.test(glm.mod, trt = "factorABC", group = TRUE)
giving
comp$groups giving
trt means M
1 a.a.a 0.070052189 a
2 a.b.b 0.035684571 a
3 b.a.a 0.020517535 a
4 b.b.b -0.008153257 a
5 a.b.a -0.036136140 a
6 a.a.b -0.078891136 a
7 b.a.b -0.080845419 a
8 b.b.a -0.115808772 a

R: How to make column of predictions for logistic regression model?

So I have a data set called x. The contents are simple enough to just write out so I'll just outline it here:
the dependent variable, Report, in the first column is binary yes/no (0 = no, 1 = yes)
the subsequent 3 columns are all categorical variables (race.f, sex.f, gender.f) that have all been converted to factors, and they're designated by numbers (e.g. 1= white, 2 = black, etc.)
I have run a logistic regression on x as follows:
glm <- glm(Report ~ race.f + sex.f + gender.f, data=x,
family = binomial(link="logit"))
And I can check the fitted probabilities by looking at summary(glm$fitted).
My question: How do I create a fifth column on the right side of this data set x that will include the predictions (i.e. fitted probabilities) for Report? Of course, I could just insert the glm$fitted as a column, but I'd like to try to write a code that predicts it based on whatever is in the race, sex, gender columns for a more generalized use.
Right now I the follow code which I will hope create a predicted column as well as lower and upper bounds for the confidence interval.
xnew <- cbind(xnew, predict(glm5, newdata = xnew, type = "link", se = TRUE))
xnew <- within(xnew, {
PredictedProb <- plogis(fit)
LL <- plogis(fit - (1.96 * se.fit))
UL <- plogis(fit + (1.96 * se.fit))
})
Unfortunately I get the error:
Error in eval(expr, envir, enclos) : object 'race.f' not found
after the cbind code.
Anyone have any idea?
There appears to be a few typo in your codes; First Xnew calls on glm5 but your model as far as I can see is glm (by the way using glm as name of your output is probably not a good idea). Secondly make sure the variable race.f is actually in the dataset you wish to do the prediction from. My guess is R can't find that variable hence the error.

Resources