Predict Survival using RMS package in R?

I am using the function survest in the RMS package to generate survival probabilities. I want to be able to take a subset of my data and pass it through survest. I have developed a for loop that does this. This runs and outputs survival probabilities for each set of predictors.
# Loop over the data frame and estimate survival at day 365 for each row
for (i in 1:nrow(df)) {
  row <- df[i, ]
  print(row)
  surv <- survest(fit, row, times = 365)
  print(surv)
}
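(As a side note, survest appears to accept a multi-row newdata, so the loop can likely be replaced by a single call; a sketch, assuming fit is a cph fit:)
# Estimate day-365 survival for all rows at once (assumes a cph fit)
all_surv <- survest(fit, newdata = df, times = 365)
all_surv$surv  # survival probabilities, one per row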
My first question is whether there is a way to use survest to predict median survival rather than having to specify a particular time point; alternatively, is there a better function to use?
Secondly, I want to be able to predict survival using only four of the five predictors in my Cox model, for example (as below). While I understand this will be less accurate, is it possible to do this using survest?
survest(fit, expand.grid(Years.to.birth = NA, Tumor.stage = 1, Date = 2000,
                         Somatic.mutations = 2, ttype = "brca"), times = 300)

To get median survival time, use the Quantile function generator in rms, or the summary.survfit function in the survival package. The function created by Quantile can be evaluated at the 0.5 quantile. It is a function of the linear predictor, so you'll need to use the predict function on the subset of observations to get the linear predictor values to pass to it to compute the medians.
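A minimal sketch of the Quantile route (hedged: it assumes fit was created with cph(..., surv = TRUE) so the underlying survival curve is stored):
library(rms)
med <- Quantile(fit)                        # function generator: quantiles of survival time
lp  <- predict(fit, df[i, ], type = "lp")   # linear predictor for the observation(s)
med(q = 0.5, lp = lp)                       # estimated median survival time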
For your other two questions: survest needs to use the full model you fitted (all the variables). You would need to use multiple imputation if a variable is not available, or a quick approximate refit of the model à la fastbw.
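A minimal sketch of the fastbw route (assuming the same cph fit as above):
# Approximate backward elimination; shows which predictors would be kept or dropped
reduced <- fastbw(fit, rule = "aic")
print(reduced)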

We are trying to do something similar with the missing data.
While MI is a good idea, a simpler approach for a single missing variable is to run the prediction multiple times, replacing the missing variable with values sampled at random from the distribution of that variable.
E.g. if we have x1, x2 and x3 as predictors, and we want to predict when x3 is missing, we run predictions using x1, x2 and take_random_sample_from(x3), and then average the survival estimates over all of the results.
The problem with reformulating the model (e.g. in this case re-modelling so we only consider x1 and x2) is that it doesn't let you explore the impact of x3 explicitly.
For simple cases this should work: it is essentially averaging the survival prediction over a large range of x3 values, which makes x3 relatively uninformative. A rough sketch follows.
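(Hypothetical variable names; assumes a cph fit and survest as in the question above. x1_new and x2_new stand in for the known predictor values.)
# Predict for one new observation where x3 is missing: draw x3 from its
# empirical distribution, predict for each draw, then average.
n_draws  <- 500
x3_draws <- sample(na.omit(df$x3), n_draws, replace = TRUE)
newdata  <- data.frame(x1 = x1_new, x2 = x2_new, x3 = x3_draws)
s <- survest(fit, newdata = newdata, times = 365)
mean(s$surv)  # averaged survival probability at day 365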
HTH,
Matt

Related

Fit multiple linear regression without an intercept with the function lm() in R

Can you please help with this question in R? I need to get more than one predictor.
Fit a multiple linear regression without an intercept with the function lm() to the training data, using the variable (y.train) as the target variable and the variables (X.mat.train) as predictors. Look at the vector of estimated coefficients of the model and compare it with the vector of 'true' values beta.vec graphically (Tip: build a plot of the differences of the absolute values of the estimated and true values).
I have already tried it with the code I will post at the end, but it gives me only one predictor, and in this example I need to get more than one. I think the wrong one is the first line, but I couldn't find a way to fix it.
I can't put the data set here (it's large), but I have a variable that stores 190 observations from a vector (y.train) and another that stores 190 observations from a matrix (X.mat.train). It should give more than one predictor, but for me it's giving one.
simple.fit <- lm(y.train ~ 0 + X.mat.train)  # target variable, no intercept, matrix of predictors
summary(simple.fit)  # show the linear regression output
plot(simple.fit)
abline(simple.fit)
n <- summary(simple.fit)$coefficients
estimated_coeff <- n[, 1]
estimated_coeff
plot(estimated_coeff)
# Coefficients: X.mat.train 0.5018
v <- sum(beta.vec)
# 0.5369
plot(beta.vec)
plot(beta.vec, simple.fit)
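For what it's worth, a minimal reproducible sketch (simulated data, since the original dataset isn't available) showing that lm() returns one coefficient per column when the predictor is a genuine matrix — a common cause of getting a single coefficient is X.mat.train not actually being a matrix:
set.seed(1)
X.mat.train <- matrix(rnorm(190 * 3), nrow = 190,
                      dimnames = list(NULL, c("x1", "x2", "x3")))
beta.vec <- c(0.5, -1, 2)
y.train  <- drop(X.mat.train %*% beta.vec + rnorm(190))

fit <- lm(y.train ~ 0 + X.mat.train)   # no intercept; one coefficient per column
coef(fit)                              # three estimates, not one
plot(abs(coef(fit)) - abs(beta.vec))   # compare with the 'true' values graphically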

How to use weights in multivariate linear regression in R with lm?

I've got a linear regression that looks like:
multivariateModel = lm(cbind(y1, y2, y3)~., data=temperature)
I need to do two things with this, both of which I've found difficult. The first is to extract the variances; right now I'm using sigma(multivariateModel), which has returned
      y1       y2       y3
31.22918 31.83245 31.01727
I would like to square those three sigmas to get variances (sd^2) and use them to weight my regression. Currently, weights=cbind(31.22918, 31.83245, 31.01727) is not working, and it also doesn't work to use a matrix three columns wide with those values repeated.
Is there a way to add these as a weight matrix so that I can get a fitted model out, or is there another package I need to use besides lm for this? Thanks.
Here is a link to the dataset: https://docs.google.com/spreadsheets/d/1zm9pPqOnkBdsPekOf8IoXN8yLr82CCFBuc9EtxN5JII/edit?usp=sharing
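One point worth illustrating (an assumption on my part, not from the post): lm() expects weights as a numeric vector with one entry per observation, not one per response, which is why the cbind() of three values fails. A sketch of fitting each response separately with its own inverse-variance weights:
# Per-response weighted fits; note that weights constant across rows leave
# the coefficients unchanged -- they only matter if they vary by observation.
sds   <- sigma(multivariateModel)              # residual SDs for y1, y2, y3
preds <- setdiff(names(temperature), c("y1", "y2", "y3"))
fits  <- lapply(names(sds), function(resp) {
  lm(reformulate(preds, response = resp), data = temperature,
     weights = rep(1 / sds[[resp]]^2, nrow(temperature)))
})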

Obtaining predicted (i.e. expected) values from the orm function (Ordinal Regression Model) from rms package in R

I've run a simple model using orm (i.e. reg <- orm(formula = y ~ x)) and I'm having trouble understanding how to get predicted values for Y. I've never worked with models that use multiple intercepts. I want to know, for each value of Y in my dataset, what the predicted value from the model would be. I tried predict(reg, type="mean"), which produced values close to the predicted values from an OLS regression, but I'm not sure if this is what I want. I really just want something analogous to OLS, where you can obtain E(Y) given a set of predictors. If possible, please provide code I can run, with a brief explanation.
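A hedged sketch of one way to get E(Y) from an orm fit, using rms's Mean() function generator (as far as I know this matches predict(reg, type = "mean"), so the values you saw are likely what you want):
library(rms)
# reg <- orm(y ~ x)  # as in the question
M <- Mean(reg)                              # builds a function mapping linear predictor -> E(Y)
expected_y <- M(predict(reg, type = "lp"))  # E(Y | x) for every observation
head(expected_y)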

Determining the degrees of freedom from 'chisq' result in ROCR package

Forgive me for what may be a simple question; I am relatively new to statistics and to R, for that matter.
I am currently performing multiple logistic regression on the following model: E6~S+FA+FR+DV, where the dependent variable E6 is dichotomous (0, 1), and S, FA and FR are ordinal categorical independent variables with scales of 0:7, 1:3 and 1:5 respectively. DV is a dichotomous independent variable (0, 1).
As a measure of performance I am currently using the ROCR package in R: I have created a prediction object and am producing a performance object with the measure 'chisq', which evaluates the model's fit by a test of independence.
This is all working fine and I receive the X2 value for the model fit; however, the degrees of freedom (DF) are not returned. I should mention that the model has 4110 observations.
Is there a simple way of determining the DF for X2 for the above model? For example, is there a reason why ROCR returns the X2 value but not the DF?
Any guidance would be great. Thanks in advance.
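If I understand ROCR's 'chisq' measure correctly (an assumption worth verifying against the package docs), it is a test of independence on the 2x2 table of predicted class at each cutoff versus the true label, so df = (2 - 1) * (2 - 1) = 1 regardless of the number of observations. A sketch reproducing one such statistic with chisq.test():
library(ROCR)
data(ROCR.simple)
# 2x2 table of predicted class vs. label at an illustrative cutoff of 0.5
tab <- table(ROCR.simple$predictions > 0.5, ROCR.simple$labels)
chisq.test(tab, correct = FALSE)   # reports X-squared with df = 1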

Pseudo R squared for cumulative link function

I have an ordinal dependent variable and am trying to use a number of independent variables to predict it. I use R. The function I use is clm in the ordinal package, which fits a cumulative link model, with a probit link to be precise.
I tried the function pR2 in the package pscl to get the pseudo R-squared, with no success.
How do I get pseudo R-squareds with the clm function?
Thanks so much for your help.
There are a variety of pseudo-R^2 measures. I don't like to use any of them, because I do not see the results as having a meaning in the real world. They do not estimate effect sizes of any sort, and they are not particularly good for statistical inference. Furthermore, in situations like this with multiple observations per entity, I think it is debatable which value for "n" (the number of subjects) or which degrees of freedom is appropriate. Some people use McFadden's R^2, which would be relatively easy to calculate, since clm returns a list with one of its elements named "logLik". You just need to know that the log-likelihood is only a multiplicative constant (-2) away from the deviance. If one had the model in the first example:
library(ordinal)
data(wine)
fm1 <- clm(rating ~ temp * contact, data = wine)  # full model
fm0 <- clm(rating ~ 1, data = wine)               # intercept-only (null) model
( McF.pR2 <- 1 - fm1$logLik/fm0$logLik )          # McFadden's pseudo-R^2
[1] 0.1668244
I had seen this question on CrossValidated and was hoping to see the more statistically sophisticated participants over there take this one on, but they saw it as a programming question and dumped it over here. Perhaps their opinion of R^2 as a worthwhile measure is as low as mine?
I recommend the function nagelkerke from the rcompanion package to get pseudo R-squared values.
When your predictor or outcome variables are categorical or ordinal, the R-squared will typically be lower than with truly numeric data. R-squared is only a weak indicator of a model's fit, and you shouldn't choose a model based on it.
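A minimal sketch of that suggestion, reusing the wine example above (assuming rcompanion is installed; I believe it supports clm fits):
library(rcompanion)
nagelkerke(fm1)  # prints McFadden, Cox and Snell, and Nagelkerke pseudo-R-squared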
