Relative variable importance from CoxBoost - r

I am fitting time-to-event survival data using surv.CoxBoost in the mlr package. My question: is there any way to get the relative importance of the variables in the fitted model? I have seen this post detailing variable importance for cv.glmnet, but I haven't seen any on CoxBoost.
Any ideas?
Below is an example of a model using CoxBoost. You may need to install CoxBoost from here, as it seems to be no longer on CRAN.
library(randomForestSRC)
library(mlr)
library(survival)
library(CoxBoost)
data(pbc, package="randomForestSRC")
data <- na.omit(pbc)
set.seed(9512)
train <- sample(1:nrow(data), round(nrow(data)*0.7))
data.train <- data[train, ]
data.test <- data[-train, ]
task = makeSurvTask( data=data.train, target=c('days', 'status'))
learner= makeLearner("surv.CoxBoost")
trained.learner=train(learner,task)
CoxBoostfit <- trained.learner$learner.model
CoxBoostfit$coefficients

Related

How can I calculate survival function in gbm package analysis?

I would like to analyze my data with a gradient boosted model.
However, as my data come from a cohort study, I have trouble understanding the results of this model.
Here's my code. The analysis was performed on the example data.
install.packages("randomForestSRC")
install.packages("gbm")
install.packages("survival")
library(randomForestSRC)
library(gbm)
library(survival)
data(pbc, package="randomForestSRC")
data <- na.omit(pbc)
set.seed(9512)
train <- sample(1:nrow(data), round(nrow(data)*0.7))
data.train <- data[train, ]
data.test <- data[-train, ]
set.seed(9741)
gbm <- gbm(Surv(days, status)~.,
data.train,
interaction.depth=2,
shrinkage=0.01,
n.trees=500,
distribution="coxph")
summary(gbm)
set.seed(9741)
gbm.pred <- predict.gbm(gbm,
n.trees=500,
newdata=data.test,
type="response")
As I read the package documentation, "gbm.pred" is the result of Cox's partial likelihood.
set.seed(9741)
lambda0 = basehaz.gbm(t=data.test$days,
delta=data.test$status,
t.eval=sort(data.test$days),
cumulative = FALSE,
f.x=gbm.pred,
smooth=T)
hazard=lambda0*exp(gbm.pred)
In this code, lambda0 is the baseline hazard function.
So, according to the formula h(t|x) = lambda0(t)*exp(f(x)),
"hazard" is the hazard function.
However, what I wanted to calculate was the survival function,
because I would like to compare the outcome in the original data (data$status) with the prediction (the survival function).
Please let me know how to calculate the survival function.
Thank you
Actually, what is returned is the cumulative baseline hazard function (the integral part: \int_0^t \lambda(z) dz), and the survival function can be computed as below:
S(t|X) = \exp\{-e^{f(X)} \int_0^t \lambda(z) dz\}
f(X) is the prediction of gbm, which is equal to the log hazard ratio relative to the baseline.
I think this tutorial about gbm-based survival analysis will help you!
https://github.com/liupei101/Tutorial-Machine-Learning-Based-Survival-Analysis/blob/master/Tutorial_Survival_GBM.ipynb
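To make the formula above concrete, here is a minimal base R sketch with toy numbers. The values of f.x, t.grid and Lambda0 are made up for illustration and are not taken from the pbc fit. In the question's setting, Lambda0 would presumably come from basehaz.gbm called with cumulative = TRUE and t.eval = t.grid, and f.x would be gbm.pred.

```r
# Toy inputs (hypothetical values, not taken from the pbc fit)
f.x     <- c(-0.8, 0.0, 1.2)            # gbm predictions: log relative hazards
t.grid  <- c(100, 500, 1000, 2000)      # evaluation times in days
Lambda0 <- c(0.02, 0.10, 0.25, 0.60)    # cumulative baseline hazard at t.grid

# S(t | x) = exp(-exp(f(x)) * Lambda0(t)); rows = subjects, columns = times
surv <- exp(-outer(exp(f.x), Lambda0))
dimnames(surv) <- list(paste0("subject", seq_along(f.x)),
                       paste0("t=", t.grid))
round(surv, 3)
```

Each row is one subject's estimated survival curve. It decreases over time, and a subject with a larger f(x) has lower survival at every time point.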

Why are the predict values of gbm (R package) negative?

I analyzed my data with the 'gbm' R package. My data come from a cohort study, so I fitted the 'gbm' model with the 'coxph' distribution.
After fitting the model, I would like to see how well it predicts. However, as in the code below, the predicted values are negative, and I have trouble understanding why.
Please let me know how to interpret these values.
Here's my code.
install.packages("survival")
install.packages("randomForestSRC")
install.packages("gbm")
library(survival)
library(randomForestSRC)
library(gbm)
data(pbc, package="randomForestSRC")
data <- na.omit(pbc)
# All columns except the survival time and event indicator
exposure <- setdiff(names(data), c("days", "status"))
formula <- as.formula(paste("Surv(days, status)~", paste(exposure, collapse="+")))
set.seed(123)
ex <- gbm(Surv(days, status)~.,
data=data,
distribution="coxph",
cv.folds=5,
shrinkage=.01,
n.trees=1000)
set.seed(123)
pred <- predict(ex, n.trees=1000, type="response")
Read the ?predict.gbm help page, particularly the parameter type. By default predictions are on the link scale.
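As a small illustration of why negative values are not a problem, assume (as in the Cox formula h(t|x) = lambda0(t)*exp(f(x))) that the prediction is f(x), the log of the hazard relative to the baseline. The numbers below are hypothetical, not output from the model above.

```r
# Hypothetical link-scale predictions for a coxph-type model
f.x <- c(-1.5, -0.2, 0, 0.7)

# exp(f) is the hazard relative to the baseline: below 1 for negative f,
# exactly 1 at f = 0, above 1 for positive f
hazard.ratio <- exp(f.x)
data.frame(f.x, hazard.ratio = round(hazard.ratio, 3))

# Ranking subjects by risk only needs the ordering of f.x
order(f.x, decreasing = TRUE)
```

A negative prediction therefore just means a hazard below the baseline, not an invalid result.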

How to plot SVM classifier using RTextTools package?

I am using the RTextTools package to create a Document Term Matrix, before using the associated container in a range of classification models.
I have reviewed the package information and associated articles but I cannot find any indication on how to plot the results of tuning and predicting the classification models. For example, I am building a linear SVM model using svm.fit <- train_model(container, "SVM", kernel="linear", cost=1). How can I visualise svm.fit?
I am ideally looking to obtain similar results as if I was using plot.svm from the e1071 package. However, I cannot use this here as the class(container) is matrix_container and not an expected data frame.
The code that I am utilising is below. Thanks for your help.
#Create Training Container#
dtMatrix <- create_matrix(cbind.data.frame(Train.df$Keyword1, Train.df$Keyword2), removeSparseTerms=.998)
Train_container <- create_container(dtMatrix, Train.df$Result, trainSize=1:10000, virgin=FALSE)
#Create Validation Container#
trace("create_matrix", edit=T)
Validate_dtMatrix <- create_matrix(cbind.data.frame(Validate.df$Keyword1, Validate.df$Keyword2), originalMatrix=dtMatrix)
predSize = nrow(Validate.df)
ValidateContainer <- create_container(Validate_dtMatrix, labels=rep(0,predSize), testSize=1:predSize, virgin=FALSE)
#===SUPPORT VECTOR MACHINE===#
svm_linear <- train_model(Train_container, "SVM", kernel="linear", cost=1)
predict_SVM.Linear <- classify_model(ValidateContainer, svm_linear)

Predict probabilities with bigrf

I am able to build a model with the bigrf package, but is there a way to predict probabilities instead of classes? For class prediction I use
predictions <- predict(forest, test, testset$y)
where forest is the model. I tried type = "prob", but it does not do anything. Is there a way to do this?
I have big data, so I need to use this package in order to be able to process it.
UPD:
library(bigrf)
library(doParallel)  # provides registerDoParallel(); attaches parallel for detectCores()
library(randomForest)
data("iris")
iris <- iris[iris$Species != "virginica",]
x <- iris[,1:4]
y <- iris$Species
vars <- c(1:4)
s = sample(1:nrow(x), 60)
registerDoParallel(cores=detectCores(all.tests=TRUE))
forest <- bigrfc(x[s, ], y[s], ntree=5L, varselect=vars)
predictions <- predict(forest, x[-s, ])
So, the question is: how do I get probabilities instead of classes in predictions from an object of class bigrfc?
According to this post, it should be possible to obtain the class probabilities with
predictions_probs <- predictions@testvotes / rowSums(predictions@testvotes)
I haven't tested it though. HTH.
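The normalization suggested above can be checked on a toy vote matrix in base R. The matrix below is hypothetical and only stands in for the testvotes slot of the prediction object (one row per test example, one column per class).

```r
# Hypothetical vote counts from a 5-tree forest
testvotes <- matrix(c(5, 0,
                      3, 2,
                      1, 4),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(NULL, c("setosa", "versicolor")))

# Divide each row by its total so the entries sum to 1 per example
probs <- testvotes / rowSums(testvotes)
probs
```

With only a handful of trees the resulting probabilities are coarse (multiples of 1/5 here), so a larger ntree gives finer-grained estimates.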

Random Forest Predictions

I am looking for some guidance on a homework assignment I am working on for a class. We are given a dataset with 14K observations and are asked to build a prediction model. I split the dataset into training and testing sets (4909 observations in the testing set) using the caret package; the model predicts the last variable, "classe". I pulled out the near-zero-variance variables and built the model, but when I tried to make predictions I only got 97 predictions back. I reviewed the help files but still can't figure out where I am going wrong. Any hints would be appreciated.
Here is the Code:
set.seed(1234)
pml.training <- read.csv("./data/pml-training.csv")
#
library(caret)
inTrain <- createDataPartition(y=pml.training$classe, p=0.75, list=FALSE)
training <- pml.training[inTrain,]
testing <- pml.training[-inTrain,]
# Pull out the Near Zero Value (NZV)
nzv <- nearZeroVar(training, saveMetrics=TRUE)
omit <- which(nzv$nzv==TRUE)
training <- training[,-omit]
testing <- testing[,-omit]
# Fit the model
modFit <- train(classe ~., method="rf", data=training)
modFit
print(modFit$finalModel)
plot(modFit)
# Try to predict on the testing set
pred <- predict(modFit, newdata=testing)
testing$predRight <- pred==testing$classe
print(table(pred, testing$classe))
Thanks, Pat C.
Have you checked
sum(complete.cases(subset(testing, select = -classe)))
?
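One possible reading of the hint above: if the predictors contain missing values, prediction may return values only for the complete rows, so the count of complete cases would match the number of predictions. This assumes that rows with NA are dropped at prediction time, which I have not verified for this dataset. The toy data frame below is hypothetical and shows what the check counts.

```r
# Toy test set: two predictors with missing values, plus the outcome
testing <- data.frame(x1 = c(1.2, NA, 3.1, 4.0, NA),
                      x2 = c(0.5, 0.7, NA, 1.1, 0.9),
                      classe = factor(c("A", "B", "A", "C", "B")))

# TRUE for rows with no missing predictor values (outcome column excluded)
ok <- complete.cases(subset(testing, select = -classe))

sum(ok)        # rows that can be scored
nrow(testing)  # rows supplied
```

If sum(ok) on the real testing set equals 97, the missing values are the likely cause, and dropping or imputing the mostly-NA columns before training would be the next thing to try.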

Resources