I've made three Cox regression models predicting an outcome. I have the concordance index for each of them, but I want to compare them. In multiple articles, I've read that DeLong's test is the way to go; however, it does not work for me.
I used

    coxph(Surv(daysprim, prim_outcome1) ~ variable1 + variable2, data = DB)

to create the models.
How can I compare the C-statistics of the different non-nested models?
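If the models are fitted with the survival package, recent versions of its concordance() function accept several coxph fits at once and (if I recall the interface correctly) return the C-statistics together with their covariance matrix, from which a z-test for the difference can be built. A sketch, reusing the formula from the question with a placeholder second model:

    library(survival)

    # Two competing (non-nested) models; the second formula is a placeholder
    fit1 <- coxph(Surv(daysprim, prim_outcome1) ~ variable1 + variable2, data = DB)
    fit2 <- coxph(Surv(daysprim, prim_outcome1) ~ variable3 + variable4, data = DB)

    # concordance() on several fits returns both C-statistics plus their covariance
    cc <- concordance(fit1, fit2)
    cc$concordance                      # the two C-indices

    # z-test for the difference between the two correlated C-indices
    d  <- cc$concordance[1] - cc$concordance[2]
    se <- sqrt(cc$var[1, 1] + cc$var[2, 2] - 2 * cc$var[1, 2])
    2 * pnorm(-abs(d / se))             # two-sided p-value

The compareC package on CRAN implements a similar test for two correlated C-indices, which may be worth a look as well.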
I am currently running some linear models (lm) and lmer models (with replicate as a random effect) for continuous data, and a glm and glmer (again with replicate as a random effect) for count data.
I was wondering whether lm, lmer, glm and glmer all need the data to be normally distributed, and if not, whether I need an alternative test.
Also, I have run a glm and looked at the pairwise differences, but when reporting them I don't know what I should report beyond P < 0.001, as my glm output doesn't really give me that much. Thanks!
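For concreteness, here is a minimal sketch of the kinds of models described, using lme4 and made-up data-frame and column names. Note that for lm/lmer the normality assumption concerns the residuals rather than the raw data, which the quick check below illustrates:

    library(lme4)

    # Hypothetical data frame 'dat' with columns response, count, treatment, replicate
    m_lm    <- lm(response ~ treatment, data = dat)
    m_lmer  <- lmer(response ~ treatment + (1 | replicate), data = dat)
    m_glm   <- glm(count ~ treatment, family = poisson, data = dat)
    m_glmer <- glmer(count ~ treatment + (1 | replicate), family = poisson, data = dat)

    # Normality is checked on the residuals, not on the raw response
    qqnorm(resid(m_lmer)); qqline(resid(m_lmer))

    # For reporting: the coefficient table has estimates, SEs, test statistics, p-values
    summary(m_glm)$coefficients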
I usually use R to build my own statistical models from data that I have.
However, I have recently read about a logistic regression model in a scientific publication and I want to replicate this model to make predictions on some of my own data, which includes the same variables.
Is there a way to "declare" a model in R, based on the coefficients published in the paper?
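One simple route, sketched here with made-up coefficient values and variable names: skip the fitting step entirely, compute the linear predictor from the published coefficients, and apply the inverse link (plogis() for logistic regression):

    # Published coefficients (made-up values for illustration)
    b <- c("(Intercept)" = -2.30, age = 0.04, bmi = 0.11)

    # Your own data, containing the same variables as in the paper
    newdata <- data.frame(age = c(45, 60), bmi = c(24, 31))

    # Linear predictor, then inverse logit for predicted probabilities
    lp <- b["(Intercept)"] + b["age"] * newdata$age + b["bmi"] * newdata$bmi
    predicted_prob <- plogis(lp)
    predicted_prob

For plain prediction this arithmetic is all a fitted logistic model does at predict time, so no model object is strictly needed.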
I am currently working on regression analysis to model EQ-5D-5L health data. These data are inflated at the upper bound (i.e. 1), and one of the approaches I use is a two-part model: I combine a logistic model for the binary outcome (1 or not 1) with a model for the continuous data.
The issue comes when trying to cross-validate (K-fold) the two-part models: I cannot find a way to include both "parts" of the model in the caret package in R, and I have not found anybody who has solved this problem.
When I generate predictions from the two-part model, the predictions from the two separate models are essentially multiplied together. So the models are developed separately, since they model different aspects of the same variable (a binary and a continuous outcome), but they are joined when used to predict values.
Could it be possible to somehow cross-validate each part of the model separately, and get some kind of useful answer out of it?
Hope you guys can help.
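Since caret won't carry both parts at once, one workaround is to write the K-fold loop by hand and combine the two parts inside each fold. A minimal sketch, with made-up variable names and a standard two-part combination rule (swap in whatever rule you actually use to join the parts):

    set.seed(1)
    k <- 5
    folds <- sample(rep(1:k, length.out = nrow(dat)))
    cv_rmse <- numeric(k)

    for (i in 1:k) {
      train <- dat[folds != i, ]
      test  <- dat[folds == i, ]

      # Part 1: logistic model for P(utility == 1)
      part1 <- glm(I(eq5d == 1) ~ x1 + x2, family = binomial, data = train)
      # Part 2: continuous model, fitted on the observations below the bound
      part2 <- lm(eq5d ~ x1 + x2, data = train[train$eq5d < 1, ])

      p1 <- predict(part1, test, type = "response")
      m2 <- predict(part2, test)

      # Combine the two parts; replace with your own combination rule
      pred <- p1 * 1 + (1 - p1) * m2

      cv_rmse[i] <- sqrt(mean((test$eq5d - pred)^2))
    }
    mean(cv_rmse)

Because the combined prediction is scored against the observed outcome in each held-out fold, this evaluates the two-part model as a whole rather than each part in isolation.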
I have a data structure with a binary 0-1 variable (click & purchase; click & not-purchase) against a vector of attributes. I used logistic regression to get the probabilities of purchase. How can I use a random forest to get the same probabilities? Is it by using random forest regression, or random forest classification with type='prob' in R, which gives the probability of the categorical variable?
It won't give you the same result, since the structures of the two methods are different. Logistic regression is given by a definite linear specification, whereas RF is a collective vote from multiple independent/random trees. If the specification and input features are properly tuned for both, they can produce comparable results. Here is the major difference between the two:
RF will give a more robust fit against noise, outliers, overfitting, multicollinearity, etc., which are common pitfalls in regression-type solutions. Basically, if you don't know (or don't want to know) much about what's going on in the input data, RF is a good start.
Logistic regression will be good if you have expert knowledge of the data and know how to properly specify the equation, or if you want to engineer how the fit/prediction works. The explicit form of the GLM specification allows you to do that.
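To the mechanics of the question: in R's randomForest package, fitting a classification forest (factor outcome) and calling predict with type = "prob" gives class probabilities directly comparable to those from logistic regression. A minimal sketch with made-up data-frame and column names:

    library(randomForest)

    # 'purchase' must be a factor for a classification forest
    dat$purchase <- factor(dat$purchase)

    rf <- randomForest(purchase ~ ., data = dat)
    lr <- glm(purchase ~ ., family = binomial, data = dat)

    # Probability of purchase from each model, on new data 'newdat'
    # (assumes the factor levels are "0" and "1")
    p_rf <- predict(rf, newdata = newdat, type = "prob")[, "1"]
    p_lr <- predict(lr, newdata = newdat, type = "response")

Note that the forest's probabilities are vote proportions across trees rather than a smooth parametric function, so they need not match the logistic curve even on the same data.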
Is there a way in SAS to compare two regression models using an ANOVA? What I want to replicate is this: in R, if I have two models, model1 and model2, I can directly run anova(model1, model2) to find out whether there is a significant difference between the two.
Is there a way to do the same in SAS?
No, because SAS doesn't store fitted models that way. However, you can run each model (in PROC GLM or whatever) and then compare the results. You can also get some of this by looking at the different "Type" sums of squares.
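For reference, here is what the R call in the question computes, with made-up variable names: for nested models, anova() performs an F-test on the extra terms, which is the quantity to reproduce in SAS (e.g. via PROC REG's TEST statement, or by comparing the error sums of squares between the two runs).

    # Nested models: model2 adds x3 (names are placeholders)
    model1 <- lm(y ~ x1 + x2, data = dat)
    model2 <- lm(y ~ x1 + x2 + x3, data = dat)

    # F-test comparing the residual sums of squares of the two fits
    anova(model1, model2)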