Linear predictor from coefficients of Cox PH model - r

I need to calculate the linear predictor of a Cox PH model by hand.
I can get continuous and binary variables to match the output of predict.coxph (specifying 'lp'), but I can't figure out how to calculate it for categorical variables with more than 2 levels.
My aim is to assess the calibration of a published model in my own data; I only have the coefficients, so I need to be able to do this by hand.
This previous post describes how to calculate it for continuous variables...
(Coxph predictions don't match the coefficients)
Any advice would be appreciated! Thanks
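For what it's worth, here is a minimal sketch of what coxph does internally, using the veteran data from the survival package as a stand-in (the 4-level celltype plays the role of the categorical variable). The key detail is that predict(..., type = "lp") centres each design-matrix column, dummy columns included, at its sample mean:

library(survival)

# Stand-in example: celltype is a factor with 4 levels
fit <- coxph(Surv(time, status) ~ karno + celltype, data = veteran)

# coxph expands the factor into one 0/1 dummy column per non-reference
# level; model.matrix() on the fit reproduces that design matrix
X <- model.matrix(fit)

# predict(type = "lp") centres at the column means stored in fit$means
lp_by_hand <- as.vector(X %*% coef(fit)) - sum(coef(fit) * fit$means)

all.equal(unname(predict(fit, type = "lp")), lp_by_hand)  # TRUE

For a published model you won't have the original column means, but the linear predictor is only defined up to an additive constant: score each subject as the plain sum of coefficient times dummy indicator (the reference level contributes nothing), and centre at your own sample means if you need comparability.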

Related

mnrval Matlab function in R

Can someone please help me with the equivalent of the mnrval function in R? I have not been able to find one that returns predicted probabilities given coefficient estimates and predictor values as arguments. I tried to rewrite the Matlab function in R but was unable to, because one of the inner functions it uses is private. I would greatly appreciate your help on this.
The documentation page on mnrval() states
MNRVAL Predict values for a nominal or ordinal multinomial regression model.
PHAT = MNRVAL(B,X) computes predicted probabilities for the nominal
multinomial logistic regression model with predictor values X. B is the
intercept and coefficient estimates as returned by the MNRFIT function. X
is an N-by-P design matrix with N observations on P predictor variables.
MNRVAL automatically includes intercept (constant) terms in the model; do
not enter a column of ones directly into X. PHAT is an N-by-K matrix of
predicted probabilities for each multinomial category.
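As a rough sketch (not a drop-in replacement), the nominal case can be reproduced in a few lines of R. B is a (P+1)-by-(K-1) matrix of intercepts and coefficients for the first K-1 categories relative to the last (reference) category, so the probabilities are a softmax with the reference category's linear predictor fixed at zero:

# B: (P+1) x (K-1) coefficients as mnrfit returns them; X: N x P predictors
mnrval_r <- function(B, X) {
  eta  <- cbind(1, X) %*% B     # N x (K-1) linear predictors, intercept added
  expo <- cbind(exp(eta), 1)    # reference (last) category gets exp(0) = 1
  expo / rowSums(expo)          # N x K matrix of predicted probabilities
}

If you want a fitted-model route instead, nnet::multinom() followed by predict(fit, newdata, type = "probs") returns the same kind of probability matrix; note that multinom uses the first factor level as the reference, whereas mnrfit uses the last.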

Decision for category based on predicted probabilities of an ordinal logistic regression

Is there a simple way to decide for a category based on the predicted probabilities of an ordinal logistic regression?
In the binary case I have so far set the cutoff based on the distribution of the underlying data set, but this is not possible in the ordinal case.
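One simple rule is the modal category: pick the class with the highest predicted probability. A minimal sketch with MASS::polr and its housing example data:

library(MASS)
fit   <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
probs <- predict(fit, type = "probs")      # N x K matrix of probabilities
pred  <- colnames(probs)[max.col(probs)]   # argmax over categories per row
# predict(fit, type = "class") gives the same answer in one step

If misclassification costs differ across categories, you can weight the probability columns before taking the argmax instead.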

Logistic regression without any outcome data

I am trying to perform logistic regression on data that contains a binary outcome. However, I do not have access to the outcome data.
I've calculated probabilities of a "1" outcome for each subject by assigning "risk points" to certain values of each variable and adding them up for each subject, so that the probability of a "1" is (sum of subject's risk points) / (total number of possible risk points). I then took the log of the odds to calculate the logit, so I have a logit value between -3 and 2 for each subject.
However, I would like to use logistic regression to evaluate which variables have the greatest effect on the outcome probabilities. Is there a way in R to perform logistic regression using only the predictor variables and the logit, without the binary outcome data? I have tried glm and it does not work, because logistic regression needs binary outcome data.
Thank you!
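One option with the numbers you already have: since the logit is continuous, an ordinary linear model on the logit scale estimates each variable's effect on the log-odds. A hedged sketch, assuming a data frame dat with your per-subject logit and hypothetical predictors x1 and x2:

# 'logit', 'x1', 'x2' are hypothetical column names in your data
fit <- lm(logit ~ x1 + x2, data = dat)
summary(fit)  # coefficients are effects on the log-odds scale

Keep in mind these logits come from your own risk-point scheme, so the model describes that scheme rather than any observed outcome.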

Use FE estimates for OLS

I am analyzing a panel data set and I am interested in some time-invariant explanatory variables (z). The Hausman test shows that I should use a fixed effects model instead of a random effects model.
The downside is that the fixed effects model will not estimate coefficients for the time-invariant explanatory variables.
So one idea is to take the estimated coefficients (b) for the time-varying variables (x) from the FE model and apply them to the raw data, i.e. take out the effects of the already estimated explanatory variables. Then use these corrected values as the dependent variable in an OLS model with the time-invariant variables as regressors. This leads to:
y - x'b = z'j + u (with j as the coefficients of interest)
Do these two models rule each other out under any necessary assumption, or is it just that the standard errors of the OLS model need to be corrected?
Thanks for every hint!
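For concreteness, a hedged sketch of that two-step idea with the plm package, using hypothetical names (a panel data frame dat indexed by id/year, time-varying x1 and x2, time-invariant z):

library(plm)

# Step 1: within (fixed effects) estimates for the time-varying variables
fe <- plm(y ~ x1 + x2, data = dat, index = c("id", "year"), model = "within")
b  <- coef(fe)

# Step 2: strip the estimated x-effects from y, then regress the
# remainder on the time-invariant variable
dat$y_net <- dat$y - drop(as.matrix(dat[, c("x1", "x2")]) %*% b)
step2 <- lm(y_net ~ z, data = dat)

The second-stage standard errors ignore the sampling error in b and the within-group structure of u, so at a minimum they need to be corrected, for example by bootstrapping both steps together.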

Logistic Regression Model & Multicollinearity of Categorical Variables in R

I have a training dataset that has 3233 rows and 62 columns. The dependent variable is Happy (train$Happy), which is binary. The other 61 columns are categorical independent variables.
I've created a logistic regression model as follows:
logModel <- glm(Happy ~ ., data = train, family = binomial)
However, I want to reduce the number of independent variables that go into the model, perhaps down to 20 or so. I would like to start by getting rid of collinear categorical variables.
Can someone shed some light on how to determine which categorical variables are collinear and what threshold I should use when removing a variable from a model?
Thank you!
If your variables were numeric, the obvious solution would be penalized logistic regression (the lasso); in R it is implemented in glmnet.
With categorical variables the problem is much more difficult.
I was in a similar situation and I used the importance plot from the randomForest package to reduce the number of variables.
This will not help you find collinearity, only rank the variables by importance.
You have only about 60 variables, and you may have knowledge of the field, so you can try adding to your model some derived variables that make sense to you (like z = x1 - x3, if you think that difference is important) and then rank them with a random forest model, as in the sketch below.
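A minimal sketch of that importance-ranking step, assuming Happy is coded as a factor so randomForest runs in classification mode:

library(randomForest)
rf <- randomForest(as.factor(Happy) ~ ., data = train, importance = TRUE)
varImpPlot(rf)        # variables ranked by importance
head(importance(rf))  # the underlying importance measures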
You could use Cramér's V, or the related phi or contingency coefficient (see a great paper at http://www.harding.edu/sbreezeel/460%20files/statbook/chapter15.pdf), to measure collinearity among categorical variables. If two or more categorical variables have a Cramér's V value close to 1, it means they're highly "correlated" and you may not need to keep all of them in your logistic regression model.
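A small sketch of a pairwise Cramér's V computation in base R, with hypothetical column names in the example call:

cramers_v <- function(x, y) {
  tbl  <- table(x, y)
  chi2 <- suppressWarnings(chisq.test(tbl, correct = FALSE)$statistic)
  n    <- sum(tbl)
  k    <- min(nrow(tbl), ncol(tbl))
  as.numeric(sqrt(chi2 / (n * (k - 1))))  # 0 = independent, 1 = perfect association
}

cramers_v(train$Var1, train$Var2)  # 'Var1', 'Var2' are hypothetical names

Values near 1 flag pairs where one variable is nearly redundant given the other.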
