iterating a coxph() model using various sets of covariates - r

I'm still a little new to R, so this may be a basic question.
I am estimating risks from a joint Cox model using coxph(). I have to run the model about 60 times using various combinations of variables. Since each run will have different covariates (and main exposures), I want to write one function to do it. In the age-adjusted model, where I had only the main exposure, everything runs fine. If I write the covariates out by hand, it also runs. I just need a way to write a single function where "covars" can be whatever I pass in the function call.
Note: this is a simplified version and it runs just fine. I just want to make it work without writing out 60 unique versions of it.
subtype <- function(expo, covars){
  temp <- coxph(Surv(FAIL, OUTCOME) ~ joint[[expo]]*strata(EVENT2)+
                  covars+
                  cluster(ID)+strata(AGE_INT),
                na.action=na.exclude,
                data=joint)
  return(summary(temp))
}
results <- subtype("RACE", covars=...)
results2 <- subtype("GENDER", covars=...)
When I did this kind of macro programming in SAS, it was easy.
Thank you for your help.
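One possible approach, sketched under assumptions: build the right-hand side from character vectors with reformulate() and pass the resulting formula to coxph(). The helper name fit_cox and the lung data set from the survival package are illustrative choices, not part of the question. The joint data, the strata() terms and cluster(ID) are left out to keep the sketch short.

```r
library(survival)

# Hypothetical helper (not from the original post): 'expo' and 'covars'
# are character vectors naming columns of 'data'.
fit_cox <- function(expo, covars = character(0), data) {
  # reformulate() joins the term labels with "+" to form the right-hand side
  form <- reformulate(c(expo, covars),
                      response = quote(Surv(time, status)))
  coxph(form, data = data)
}

# Illustration on the 'lung' data shipped with the survival package
fit1 <- fit_cox("sex", data = lung)
fit2 <- fit_cox("sex", covars = c("age", "ph.ecog"), data = lung)
summary(fit2)
```

Since the term labels are plain strings, terms such as "strata(AGE_INT)" should be usable as elements of covars in the same way.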

Related

How to remove the models that failed convergence from a set of random questions?

I want to include some random replications of model estimations (e.g., GARCH model) in the question. The code uses a different data series randomly. In this process, some GARCH estimations for some random data series may not achieve numerical convergence. Therefore, I need to code the question/problem in such a way that it has to remove the models that failed convergence from the set of questions. How can I code this when I use R-exams?
Basic idea
In general when using random data in the generation of exercises, there is a chance that sometimes something goes wrong, e.g., the solution does not fall into a desired range (i.e., becomes too large or too small), or the solution does not even exist due to mathematical intractability or numerical problems (as you point out) etc.
Of course, it is best to avoid such problems in the data-generating process so that they do not occur at all. However, it is not always possible to do so, or it is not worth the effort because the problems occur very rarely. In such situations I typically use a while() loop to re-generate the random data if necessary. As this might potentially run for several iterations, it is important, though, to make the probability that it is needed sufficiently small.
Worked example
A worked example can be found in the fourfold exercise that ships with the package. It randomly generates a fourfold table with probabilities that should subsequently be reconstructed from partial information in the actual exercise. In order for the exercise to be well-defined all entries of the table must be (strictly) between 0 and 1 and they must sum up to 1. The simulation code actually tries to assure that but edge cases might occur. Rather than writing more code to avoid these edge cases, a simple while() loop tries to catch them and sample a new table if needed:
ok <- FALSE
while(!ok) {
  [...generate probabilities...]
  tab <- cbind(c(prob1, prob3), c(prob2, prob4))
  [...compute solutions...]
  ok <- sum(tab) == 1 & all(tab > 0) & all(tab < 1)
}
Application to catching errors
The same type of strategy could also be used for other problems such as the ones you describe. You can wrap the model estimation in code like
fit <- try(mymodel(...), silent = TRUE)
and then use something like
ok <- !inherits(fit, "try-error")
In addition to not producing an error, you might require, say, that all coefficients are positive (or something like that). Then you would do:
ok <- !inherits(fit, "try-error") && all(coef(fit) > 0)
Analogously, you could check the convergence of the model etc.
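Putting the pieces together, a minimal sketch might look as follows. It uses glm() on simulated data as a stand-in for the GARCH estimation, checks the converged component that glm objects carry, and adds a max_tries cap. The cap, the simulated data and the variable names are assumptions made for this illustration only.

```r
set.seed(123)

max_tries <- 100  # assumed cap so the loop cannot run forever
tries <- 0
ok <- FALSE
while (!ok && tries < max_tries) {
  tries <- tries + 1
  # regenerate the random data on every pass
  x <- rnorm(50)
  y <- rbinom(50, 1, plogis(0.5 + x))
  fit <- try(glm(y ~ x, family = binomial), silent = TRUE)
  # accept only fits that ran without error, report convergence,
  # and have all-positive coefficients
  ok <- !inherits(fit, "try-error") &&
    isTRUE(fit$converged) &&
    isTRUE(all(coef(fit) > 0))
}
if (!ok) stop("no acceptable data set found in ", max_tries, " tries")
```

For another model class the convergence check would have to be adapted to whatever that fitting function returns.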

How to use get() to construct a complicated model of variables

I know how to use get() to construct a model on the fly from a variable, for example:
dvar="myResponse"
ivar="someIndependentVariable"
myFamily="binomial"
myGLM <- glm(data=ds, get(dvar) ~ get(ivar),family=myFamily)
This is handy for looping through a list of variables, of course -- you could feed it a list of independent variables in a for() loop, and look at a number of different models. My question is, how would I use get(), eval(), or some similar commands to create more complex calls? For example, suppose I have two independent variables in a list:
dvar="myResponse"
ivar=c("independentVar1","independentVar2")
and what I want, in the end, is this:
myGLM<-glm(data=ds, myResponse ~ independentVar1 + independentVar2)
I know I could do this with three get() statements, given that I only have 1 dependent and 2 independent variables, but is there a general way to do it for an n-item list of independent variables? Basically, what I'm up to is something like stepwise regression, but I'm not happy with any of the existing options in caret, MASS, and so forth.
You want ?reformulate ...
dvar="myResponse"
ivar <- c("independentVar1","independentVar2")
form <- reformulate(ivar, response=dvar)
glm(form, family = myFamily, data= ...)
As a general rule,
solutions that use reformulate() or that manipulate the formula directly (with quote(), substitute(), as.symbol() etc.) are more idiomatic/safer/more robust than ...
string-based solutions (deparse()/as.formula()), which are in turn more idiomatic/safer/more robust than ...
solutions with [m]get(), eval(), etc.
(I'm actually cheating on this hierarchy a little bit here since reformulate() is actually string-based, but since it's a built-in function ...)
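As a runnable illustration of the reformulate() approach, here is a small sketch on the built-in mtcars data. The choice of am as response and wt and hp as predictors is an assumption made for the example and stands in for myResponse and the independent variables.

```r
dvar <- "am"
ivar <- c("wt", "hp")   # any number of predictor names works here

# builds the formula am ~ wt + hp from the character vectors
form <- reformulate(ivar, response = dvar)
print(form)

fit <- glm(form, family = binomial, data = mtcars)
coef(fit)
```

The same two lines work unchanged for an n-item ivar, which is what makes this convenient inside a loop over candidate variable sets.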

For Loop with MCMCglmm Regression

I've looked at some of the answers for this question already, there were only two I found helpful and I still cannot get my loop to execute. I am struggling to use a fixed formula for the MCMCglmm package. I have a lot of models to test with this package, and I would like to make a loop to make the work easier. Each time I run MCMCglmm my intention is to do so with a "fixed" formula, and through each iteration of the loop I want to change one of the variables and input a modified version of the "fixed" formula. Here is my code thus far:
for (i in 5:10){
  fixed <- as.formula(paste(as$area_pva ~ as$apva_1yr + as$year + as.numeric(unlist(as[i]))))
  print(fixed)
  model <- MCMCglmm(fixed=fixed,
                    rcov=~units, family="gaussian",
                    data=as,start=NULL, prior=NULL, random=NULL, tune=NULL,
                    pedigree=NULL, nodes=NULL, scale=FALSE, nitt=30000,
                    thin=30, burnin=1000, pr=TRUE, pl=TRUE, verbose=TRUE,
                    DIC=TRUE, singular.ok=FALSE, saveX=TRUE, saveZ=TRUE,
                    saveXL=TRUE, slice=FALSE, ginverse=NULL)
  summary(model)
}
Please, if you can help me make this loop execute properly I would appreciate it.
Never mind, I've got the answer. I needed to make the whole formula a series of strings, like this:
fixed <- as.formula(paste("as$area_pva~as$apva_1yr+as$year+", colnames(as)[i], sep=""))
It works perfectly now.
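The same string-building pattern can be sketched without MCMCglmm. The example below swaps in lm() and the built-in mtcars data purely for illustration, refers to columns by bare name (relying on the data argument rather than a as$ prefix), and stores each fit in a list so it can be inspected after the loop.

```r
dat <- mtcars
fits <- list()

for (i in 3:5) {                      # columns disp, hp, drat
  extra <- colnames(dat)[i]
  # the whole formula is assembled as one string, then converted
  fixed <- as.formula(paste("mpg ~ wt +", extra))
  fits[[extra]] <- lm(fixed, data = dat)
}

# summaries are not auto-printed inside a for loop, so print them explicitly
for (nm in names(fits)) print(summary(fits[[nm]]))
```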

How to make a for loop to find interactions between several variables in R?

I have a data set with 17 variables
the data is available at this link
http://www.uwyo.edu/crawford/stat3050/final%20project/maxwellchandler.txt
I want to find significant interactions between the variables.
For example
fitcivilian<-lm(Civilian~Stock+Terrorism+log(Firepower)+Payload+Bombs*Temperature+FirstAid+Spies+Personnel+IG88, data=data)
where Bombs*Temperature is significant
What I want to do is test EVERY variable against EVERY OTHER variable,
Like doing
Bombs*Temperature
Bombs*Napalm
IG88* Weapons
Missles*Firepower
etc., until every combination of two is exhausted.
That way, I could find out if there are significant interactions between every variable.
I know how to do it manually, creating a linear model and then taking a summary of that model but I want to be able to create a loop that tests every variable because it would be a lot of entries to test everything.
I had done something similar. You will need to modify the loop for your need. Let me know if you need help with that.
vars <- colnames(mydata)[-1]
for (i in vars) {
  for (j in vars) {
    if (i != j) {
      # fit the model only for distinct pairs; 'term' avoids masking base::factor
      term <- paste(i, j, sep = "*")
      lm.fit <- lm(as.formula(paste("Sales ~", term)), data = mydata)
      print(summary(lm.fit))
    }
  }
}
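A variation on the loop above, sketched on the built-in mtcars data: combn() enumerates each unordered pair once, so Bombs*Temperature and Temperature*Bombs would not both be fitted, and the p-value of the interaction term is collected into one table. The data set, the response mpg and the column subset are assumptions made for the example.

```r
dat <- mtcars[, c("mpg", "wt", "hp", "qsec", "drat")]
response <- "mpg"
predictors <- setdiff(colnames(dat), response)

# every unordered pair of predictors, as a list of length-2 vectors
pairs <- combn(predictors, 2, simplify = FALSE)

results <- do.call(rbind, lapply(pairs, function(p) {
  form <- reformulate(paste(p, collapse = "*"), response = response)
  fit <- lm(form, data = dat)
  coefs <- summary(fit)$coefficients
  inter <- paste(p, collapse = ":")   # name of the interaction coefficient
  data.frame(term = inter, p_value = coefs[inter, "Pr(>|t|)"])
}))

# smallest p-values first
results <- results[order(results$p_value), ]
print(results)
```

This fits each pair on its own, as in the loop above. Fitting each interaction alongside the full set of main effects would need the other predictors added to the formula.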

Combining ROCR performance objects

I have multiple performance objects created using ROCR. Each of these contains auc or fpr/tpr values for a class. In turn, they have results for multiple test runs. So,
length(first.perf.obj@y.values)
gives something > 1.
I can plot average for a single class using
plot(first.perf.obj, avg="vertical")
as described in the ROCR manual. I want to combine these objects to calculate and plot their global average. Something like
global.perf.obj <- combine.perf.objects(first.perf.obj, second.perf.obj, third.perf.obj)
Is there an easy way to do this, or should I decompose each object and calculate values by hand?
I went back to recreating prediction objects for the global case.
I'm calling the prediction function like
global.prediction <- prediction(c(cls1.likelihood,
                                  cls2.likelihood,
                                  cls3.likelihood,
                                  cls4.likelihood,
                                  cls5.likelihood),
                                c(duplicate.cols(cls1.labels, ncol(cls1.likelihood)),
                                  duplicate.cols(cls2.labels, ncol(cls2.likelihood)),
                                  duplicate.cols(cls3.labels, ncol(cls3.likelihood)),
                                  duplicate.cols(cls4.labels, ncol(cls4.likelihood)),
                                  duplicate.cols(cls5.labels, ncol(cls5.likelihood))),
                                label.ordering=c(FALSE, TRUE))
where duplicate.cols simply builds a data.frame of repeating labels.
Then I'm able to get any statistic for the global case by e.g. performance(global.prediction, "auc")
It's a bit slow, but I think it's simpler than trying to combine values from multiple performance objects.
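The answer does not show duplicate.cols itself. One plausible base-R version, offered only as a guess at its behaviour, repeats a label vector across n identical columns so that it lines up with n columns of scores:

```r
# Assumed behaviour of the helper (the original definition is not shown):
# repeat one label vector across n identical columns.
duplicate.cols <- function(labels, n) {
  as.data.frame(matrix(rep(labels, times = n), ncol = n))
}

labels <- c(TRUE, FALSE, TRUE)
dup <- duplicate.cols(labels, 4)
dim(dup)   # rows = number of labels, columns = n
```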

Resources