I am trying to perform logistic regression on data that contains a binary outcome. However, I do not have access to the outcome data.
I've calculated the probability of a "1" outcome for each subject by assigning "risk points" to certain values of each variable and adding them up, so that the probability of a "1" is (sum of the subject's risk points) / (total number of possible risk points). I then took the log of the odds to calculate the logit, so I have one logit value per subject, ranging from -3 to 2.
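To make the calculation concrete, here is a minimal sketch of the scoring described above. The variable names, point values, and per-variable maximums are all invented for illustration; they are not the real data.

```r
# Hypothetical risk points: rows are subjects, columns are variables.
# All values below are invented for illustration.
points <- matrix(c(1, 0, 2,
                   3, 2, 1,
                   0, 1, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(NULL, c("age", "smoking", "bmi")))
max_points <- c(age = 3, smoking = 2, bmi = 2)  # assumed maximum per variable

# Probability of a "1": subject's points over the total possible points
prob <- rowSums(points) / sum(max_points)

# Logit = log of the odds; qlogis(prob) computes the same quantity
logit <- log(prob / (1 - prob))
print(data.frame(prob = prob, logit = logit))
```

Note that a subject with zero points or the maximum number of points gets a probability of exactly 0 or 1, for which the logit is infinite.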
However, I would like to use logistic regression to evaluate which variables have the greatest effect on the outcome probabilities. Is there a way in R to perform logistic regression using only the predictor variables and the logit, without the binary outcome data? I have tried using glm and it does not work, because logistic regression requires binary outcome data.
Thank you!
Related
Is there a simple way to decide for a category based on the predicted probabilities of an ordinal logistic regression?
In the binary case, I have so far set the criterion based on the distribution of the base data set. But this is not possible in the ordinal case.
I'm an R newbie. I'm using "clogitL1" to run a regularized conditional logistic regression for a matched case-control study with 1021 independent variables (metabolites). I'm not able to extract the regression coefficients. I've tried summary(x), coef(x), coefficient(x), x$beta - none of them work. I'm able to run it OK, and if I follow it with "cv.clogitL1" I can extract the cross-validated estimated coefficients, but not the estimated coefficients for the original model. Here's some of my code:
strata <- sort(data.meta$MATCHED_NEW)
condlog <- clogitL1(y = data.meta$BCR, x = data.meta$ln_metab[, data.features], strata,
                    numLambda = 100, minLambdaRatio = 0.000001, alpha = 1.0)
"strata" is a vector indicating pairing of cases & controls.
"data.meta$BCR" is a vector indicating case or control status.
"data.meta$ln_metab" is a matrix with observations as rows and metabolite levels as columns
"data.features" is a vector indicating which metabolites passed several dimension reduction filters.
Appreciate any suggestions.
I need to calculate the linear predictor of a Cox PH model by hand.
I can get continuous and binary variables to match the output of predict.coxph (specifying 'lp') but I can't seem to figure out how to calculate it for categorical variables with more than 2 levels.
My aim is to assess the calibration of a published model in my own data. I only have the coefficients, so I need to be able to do this by hand.
This previous post describes how to calculate for continuous variables...
(Coxph predictions don't match the coefficients)
Any advice would be appreciated! Thanks
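For reference, a minimal sketch of the hand calculation with a three-level factor, using only base R. The coefficient values, the variable names (age, stage), and the choice of "I" as the reference level are all hypothetical. The idea is to expand the factor into 0/1 dummy columns with model.matrix() and multiply by the published coefficients. The factor's reference level has to match the reference category of the published model. predict.coxph reports a centered linear predictor by default, so its values can differ from this uncentered sum by a constant.

```r
# Hypothetical published coefficients (log hazard ratios); values invented.
# The reference level "I" of stage has no coefficient (implicitly 0).
beta <- c(age = 0.03, stageII = 0.40, stageIII = 0.90)

# Hypothetical new data; factor levels ordered so that "I" is the reference
dat <- data.frame(
  age   = c(50, 60, 70),
  stage = factor(c("I", "II", "III"), levels = c("I", "II", "III"))
)

# Expand the factor into dummy columns and drop the intercept column,
# since a Cox model has no intercept
X <- model.matrix(~ age + stage, data = dat)[, -1, drop = FALSE]

# Uncentered linear predictor: sum of coefficient * covariate value
lp <- drop(X %*% beta[colnames(X)])
print(lp)
```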
I have conducted a logistic regression with a binary dependent variable and 5 independent variables. The data frame these variables come from is survey data recording whether a person voted for or against a policy change (the binary dependent variable), with the other variables being questions about their income, location and other personal information that could inform whether they would vote for or against the policy change.
Having conducted the regression, I'd now like to calculate the predicted probability that each person would have voted yes/no to see how informative those variables are. In total my dataframe has information on 3000 people and I'd like to calculate the predicted probability of voting for/against for every single row/person.
What methods are available for doing so?
Appreciate the help!
You can use the predict function in order to calculate the predicted probabilities.
predict(model, newdata, type="response")
Here, model is the fitted logistic regression (the result of the glm() function), and newdata is a data frame containing all the variables used in the model, with one row for each individual whose probability you want.
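A minimal sketch of the call above on simulated data. The data frame survey and its columns (income, urban, vote) are invented stand-ins for the real survey, not the original variables.

```r
set.seed(1)

# Simulated stand-in for the survey data (all names and values are assumptions)
n <- 200
survey <- data.frame(income = rnorm(n), urban = rbinom(n, 1, 0.5))
survey$vote <- rbinom(n, 1, plogis(-0.5 + 0.8 * survey$income + 0.4 * survey$urban))

# Fit the logistic regression
model <- glm(vote ~ income + urban, data = survey, family = binomial)

# Predicted probability of a "yes" vote for every row, and its complement
survey$p_yes <- predict(model, newdata = survey, type = "response")
survey$p_no  <- 1 - survey$p_yes

head(survey)
```

When newdata is the same data frame the model was fitted on, the result matches fitted(model).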
I’m working with R and XLStat. I’ve conducted a one-way ANOVA (my categorical variable has three levels (1, 2, 3) and my response variable is quantitative, on a scale of 1 to 10).
I’ve run this ANOVA in both R and XLStat, and the outputs for the F statistic, p-value, coefficient estimates, t-values, standard errors, etc. are exactly the same.
However, XLStat offers an extra output: the standardized coefficients (also called beta coefficients). At first I was surprised, because I didn’t think a beta coefficient could be calculated for a categorical variable, and according to the literature I have read, it does not make sense.
Anyway, I tried to reproduce these coefficients in R using the only formula I found: beta = estimate * sd(x)/sd(y), where sd(x) is the standard deviation of the categorical variable (which R automatically converts to a numeric variable in order to calculate sd(x), which seems logical) and sd(y) is the standard deviation of my response variable.
The first beta I obtained with R is the same as in XLStat, but the second and third are not. Since the first one matches, I suppose that XLStat also converts the categorical variable to a numeric variable (which makes no sense, but that is not the question).
Moreover, I ran the ANOVA in Statistica to check whether XLStat had made a mistake, but its beta coefficients are the same as XLStat's.
So, my question is this: what is the formula for the beta coefficients in a one-way ANOVA?
I would also like to ask about the relevance of these beta coefficients for a categorical variable. From my own reasoning and the publications I have read, they do not make sense.
P.S. The contrasts in R and XLStat are sum(ai) = 0. For the beta coefficients, XLStat removes the intercept. I suspect this could be important, but I don't know how.
The formula for obtaining beta coefficients from unstandardized (metric) coefficients in an ANOVA is the same as for a linear regression. Standardized coefficients have no sensible interpretation for categorical variables, but they are useful for comparing the relative effects of independent variables measured on different scales.
In R, either use scale() to transform the data to z-scores before fitting the model, or apply lm.beta() (from an add-on package such as lm.beta or QuantPsyc) to the fitted lm() object.
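As a rough illustration of the scale() route, the sketch below uses the built-in mtcars data with two numeric predictors (an assumption made for the example; it is not the ANOVA data from the question). It checks that beta = estimate * sd(x)/sd(y) gives the same values as refitting the model on z-scores.

```r
# Fit on the original scale (mtcars ships with R)
fit <- lm(mpg ~ wt + hp, data = mtcars)
b <- coef(fit)[-1]  # drop the intercept

# Route 1: the formula beta = estimate * sd(x) / sd(y)
beta_formula <- b * sapply(mtcars[names(b)], sd) / sd(mtcars$mpg)

# Route 2: convert every variable to z-scores with scale(), then refit
z <- as.data.frame(scale(mtcars[c("mpg", "wt", "hp")]))
beta_scaled <- coef(lm(mpg ~ wt + hp, data = z))[-1]

print(rbind(beta_formula, beta_scaled))
```

The two rows agree up to floating-point error. For a factor, the same formula has to be applied to each dummy or contrast column separately, which is where the interpretation becomes questionable.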
It is not clear why you would obtain different beta coefficients with XLStat, but it could have something to do with degrees of freedom if it's not an error. This example compares the lm.beta() function in R with SAS and obtains the same coefficients.