Obtaining predicted (i.e. expected) values from the orm function (Ordinal Regression Model) in the rms package in R

I've run a simple model using orm (i.e. reg <- orm(formula = y ~ x)) and I'm having trouble understanding how to get predicted values for Y. I've never worked with models that use multiple intercepts. I want to know, for each and every value of Y in my dataset, what the predicted value from the model would be. I tried predict(reg, type="mean") and this produced values that are close to the predicted values from an OLS regression, but I'm not sure if this is what I want. I really just want something analogous to OLS, where you can obtain E(Y) given a set of predictors. If possible, please provide code I can run to do this with a very short explanation.
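For what it's worth, in rms predict(reg, type = "mean") is indeed the estimated E(Y); the Mean() function generator gives the same quantity as a function of the linear predictor and also works for new data. A minimal sketch with simulated data (variable names x and y assumed to match the question):

```r
library(rms)   # assumes the rms package is installed
set.seed(1)
x <- rnorm(100)
y <- x + rnorm(100)
reg <- orm(y ~ x)

# E(Y) for every observation used to fit the model:
yhat <- predict(reg, type = "mean")

# The same quantity via the Mean() function generator,
# evaluated at the linear predictor (works for new data too):
M  <- Mean(reg)
lp <- predict(reg, newdata = data.frame(x = c(-1, 0, 1)), type = "lp")
M(lp)
```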

Related

Regression result showing just an intercept with no variable estimator

I was trying to fit a regression model with several independent variables. When checking the summary() of that model I saw that one variable's estimate didn't show up. So I tried to fit a model with just that one independent variable, which you can see in the sample code below. I changed the variable names for easier understanding. Basically, no coefficient is being calculated for this variable, and the output shows just the intercept. In other regressions the variable worked fine and showed an estimate, so I don't know why this happens here. I have a panel dataset, in case that matters, and variable Y's value changes from data point to data point, so it's not just a constant.
Does anyone have an idea why this happens?
Sample code:
> TestFit = plm(Y ~ X, data = dataset, model = "between", index = c("Index", "DatesNum"))
> TestFit
Model Formula: Y ~ X
Coefficients:
(Intercept)
0.00014546
You'd better show your dataset (de-identified if necessary) so people can answer your question better.
If you define Y and X outside plm(), you probably don't need plm.
When you have data = dataset, Y and X should be column names of dataset.
Does changing plm to lm work?
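One common cause with model = "between" is that the regressor has no between-unit variation: the between estimator regresses per-unit means on per-unit means, so a variable that averages to the same value for every unit cannot get a slope. A base-R sketch of the check, with a hypothetical panel whose column names match the question:

```r
# Hypothetical panel where X follows the same path for every unit,
# so its per-unit mean is identical across units
set.seed(1)
dataset <- data.frame(
  Index    = rep(c("A", "B", "C"), each = 4),
  DatesNum = rep(1:4, times = 3),
  X        = rep(1:4, times = 3),
  Y        = rnorm(12)
)

# The between estimator only sees these per-unit means:
btw <- aggregate(cbind(Y, X) ~ Index, data = dataset, FUN = mean)
var(btw$X)  # zero between-variance => no estimable slope for X
```

If var(btw$X) is (near) zero, only the intercept can be estimated, which matches the output in the question.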

Predict Survival using RMS package in R?

I am using the function survest in the rms package to generate survival probabilities. I want to be able to take a subset of my data and pass it through survest. I have developed a for loop that does this. This runs and outputs survival probabilities for each set of predictors.
for (i in 1:nrow(df)) {
  row <- df[i, ]
  print(row)
  surv <- survest(fit, row, times = 365)
  print(surv)
}
My first question is whether there is a way to use survest to predict median survival rather than having to specify a specific time point, or alternatively, is there a better function to use?
Secondly, I want to be able to predict survival using only four of the five predictors of my Cox model, for example (as below). While I understand this will be less accurate, is it possible to do this using survest?
survest(fit, expand.grid(Years.to.birth = NA, Tumor.stage = 1, Date = 2000,
                         Somatic.mutations = 2, ttype = "brca"), times = 300)
To get median survival time, use the Quantile function generator, or the summary.survfit function in the survival package. The function created by Quantile can be evaluated for the 0.5 quantile. It is a function of the linear predictor. You'll need to use the predict function on the subset of observations to get the linear predictor values to pass in when computing the median.
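A self-contained sketch of the Quantile() route, using simulated data in place of the asker's model (the fit needs surv = TRUE for Quantile() to work):

```r
library(rms)        # assumes the rms package is installed
library(survival)
set.seed(1)
df <- data.frame(age = rnorm(200, 60, 10))
df$time  <- rexp(200, rate = exp(0.03 * (df$age - 60)))
df$event <- 1
fit <- cph(Surv(time, event) ~ age, data = df,
           surv = TRUE, x = TRUE, y = TRUE)

q   <- Quantile(fit)                           # a function of (q, lp)
lp  <- predict(fit, newdata = df, type = "lp") # linear predictor per row
med <- q(0.5, lp)                              # estimated median survival time
```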
For your other two questions, survest needs to use the full model you fitted (all the variables). You would need to use multiple imputation if a variable is not available, or a quick approximate refit to the model a la fastbw.
We are trying to do something similar with the missing data.
While MI is a good idea, a simpler idea for a single missing variable is to run the prediction multiple times, replacing the missing variable with values sampled at random from its distribution.
E.g. if we have x1, x2 and x3 as predictors, and we want to predict when x3 is missing, we run predictions using x1, x2 and take_random_sample_from(x3), and then average the survival times over all of the results.
The problem with reformulating the model (e.g. in this case re-fitting so we only consider x1 and x2) is that it doesn't let you explore the impact of x3 explicitly.
For simple cases this should work: it is essentially averaging the survival prediction over a large range of x3 values, which makes x3 relatively uninformative.
HTH,
Matt
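The averaging idea above can be sketched like this. Everything here is simulated stand-in data; only the x1/x2/x3 names come from the answer:

```r
library(rms)        # assumes the rms package is installed
library(survival)
set.seed(1)
training <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
training$time  <- rexp(200, rate = exp(0.5 * training$x1 + 0.3 * training$x3))
training$event <- 1
fit <- cph(Surv(time, event) ~ x1 + x2 + x3, data = training,
           surv = TRUE, x = TRUE, y = TRUE)

# Predict for a new subject whose x3 is missing: draw x3 from its
# empirical distribution and average the survival probabilities
draws <- sample(training$x3, 200, replace = TRUE)
surv_draws <- sapply(draws, function(v)
  survest(fit, data.frame(x1 = 0.2, x2 = -0.1, x3 = v), times = 0.5)$surv)
mean(surv_draws)   # averaged survival probability at the chosen time
```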

Can/Should I use the output of a log-linear model as the predictors in a logistic regression model?

I have a data set with both continuous and categorical variables. In the end I want to build a logistic regression model to calculate the probability of a dichotomous response variable.
Is it acceptable, or even a good idea, to apply a log linear model to the categorical variables in the model to test their interactions, and then use the indicated interactions as predictors in the logistic model?
Example in R:
Columns in df: CategoricalA, CategoricalB, CategoricalC, CategoricalD, CategoricalE, ContinuousA, ContinuousB, ResponseA
library(MASS)
#Isolate categorical variables in new data frame
catdf <- df[,c("CategoricalA","CategoricalB","CategoricalC", "CategoricalD", "CategoricalE")]
#Create cross table
crosstable <- table(catdf)
#build log-lin model
model <- loglm(formula = ~ CategoricalA * CategoricalB * CategoricalC * CategoricalD * CategoricalE, data = crosstable)
#Use step() to build better model
automodel <- step(object = model, direction = "backward")
Then build a logistic regression using the output of automodel and the values of ContinuousA and ContinuousB in order to predict ResponseA (which is binary).
My hunch is that this is not OK, but I can't find a definitive answer one way or the other.
Short answer: Yes. You can use any information in the model that will be available in an out-of-time or 'production' run of the model. Whether this information is good, powerful, significant, etc. is a different question.
The logic is that a model can have any type of RHS variable, be it categorical, continuous, logical, etc. Furthermore, you can combine RHS variables into one RHS variable and also apply transformations. The log-linear model of the categorical variables is nothing but a transformed linear combination of the raw variables (which happen to be categorical). This method would not violate any particular modeling framework.
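For concreteness, suppose the log-linear screen retained the CategoricalA:CategoricalB interaction; carrying it into the logistic model could look like this (a sketch with simulated data; which terms to keep depends on your step() output):

```r
set.seed(1)
df <- data.frame(
  CategoricalA = factor(sample(c("a1", "a2"), 300, replace = TRUE)),
  CategoricalB = factor(sample(c("b1", "b2"), 300, replace = TRUE)),
  ContinuousA  = rnorm(300),
  ContinuousB  = rnorm(300)
)
df$ResponseA <- rbinom(300, 1, plogis(0.5 * df$ContinuousA))

# A * B expands to the two main effects plus their interaction
logit_model <- glm(ResponseA ~ CategoricalA * CategoricalB +
                     ContinuousA + ContinuousB,
                   data = df, family = binomial)
coef(logit_model)
```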

R, Multinomial Regression: How to Find Conditional Probabilities?

In R, given a multinomial logit regression, I need to obtain the conditional probabilities of the response classes given some values of the predictors.
For example, using the function multinom from the package nnet, imagine you have computed fit <- multinom(response ~ predictor). From fit, how can I obtain the probability weights of the different response classes, given a certain value of the predictor?
I thought of using something like predict(fit, newdata, type = ???), but I have no idea how to continue.
I found a possible solution: predict(fit, newdata, type = "probs"), where newdata is a data frame of predictor values. In this way, I was able to find the probability weights for all the values of the predictor: every row corresponds to a certain value.
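A runnable sketch using iris as stand-in data (the response/predictor names in the question are replaced by Species and Sepal.Length):

```r
library(nnet)  # ships with R as a recommended package
fit <- multinom(Species ~ Sepal.Length, data = iris, trace = FALSE)

# Class probabilities at chosen predictor values:
newdata <- data.frame(Sepal.Length = c(5.0, 6.0, 7.0))
predict(fit, newdata = newdata, type = "probs")
# one row per predictor value; each row sums to 1 across the classes
```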

How to fit a model I built to another data set and get residuals?

I fitted a mixed model to Data A as follows:
model <- lme(Y~1+X1+X2+X3, random=~1|Class, method="ML", data=A)
Next, I want to see how the model fits Data B and also get the estimated residuals. Is there a function in R that I can use to do so?
(I tried the following method but got all new coefficients.)
model <- lme(Y~1+X1+X2+X3, random=~1|Class, method="ML", data=B)
The reason you are getting new coefficients in your second attempt with data = B is that lme fits a new model to that data set using the formula you provide, and stores that model in the variable model, as you have selected.
To get more information about a model you can type summary(model_name). The nlme library includes a method called predict.lme which allows you to make predictions based on a fitted model. You can type predict(my_model) to get the predictions using the original data set, or predict(my_model, some_other_data) to generate predictions from that same model on a different data set.
In your case, to get the residuals you just need to subtract the predicted values from the observed values. So use some_other_data$dependent_var - predict(my_model, some_other_data), or in your case B$Y - predict(model, B).
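Putting that together, a self-contained sketch with simulated data (the point is that predict() reuses the coefficients fitted to A rather than refitting on B):

```r
library(nlme)   # ships with R
set.seed(1)
make_data <- function(n) {
  d <- data.frame(Class = factor(rep(1:5, each = n / 5)),
                  X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
  d$Y <- 1 + d$X1 + 0.5 * d$X2 - d$X3 + rnorm(n)
  d
}
A <- make_data(100)   # data the model is fitted to
B <- make_data(100)   # new data with the same columns and Class levels

model  <- lme(Y ~ 1 + X1 + X2 + X3, random = ~1 | Class,
              method = "ML", data = A)
pred_B <- predict(model, newdata = B)  # predictions for B from A's fit
res_B  <- B$Y - pred_B                 # residuals on data set B
```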
Your model:
model <- lme(Y ~ 1 + X1 + X2 + X3, random = ~1 | Class, method = "ML", data = A)
Two predictions based on your model:
pred1 <- predict(model, newdata = A)
pred2 <- predict(model, newdata = B)
missed: a function that calculates the misclassification rate with the cut-off set to 0.5 (observations predicted positive that were not, and vice versa), which only makes sense if Y is binary:
missed <- function(values, prediction) {
  mean((prediction > 0.5) != values)
}
missed(A$Y, pred1)
missed(B$Y, pred2)
