I analyzed my data with the 'gbm' R package. My data come from a cohort study, so I fitted a 'gbm' model with the 'coxph' distribution.
After building the model, I would like to see how well it predicts. However, as the code below shows, the predicted values are negative, and I have trouble understanding why.
Please let me know how to interpret these values.
Here's my code.
install.packages("survival")
install.packages("randomForestSRC")
install.packages("gbm")
library(survival)
library(randomForestSRC)
library(gbm)
data(pbc, package="randomForestSRC")
data <- na.omit(pbc)
exposure <- setdiff(names(data), c("days", "status"))  # all columns except the survival outcome
formula <- as.formula(paste("Surv(days, status)~", paste(exposure, collapse="+")))
set.seed(123)
ex <- gbm(Surv(days, status)~.,
data=data,
distribution="coxph",
cv.folds=5,
shrinkage=.01,
n.trees=1000)
set.seed(123)
pred <- predict(ex, n.trees=1000, type="response")
Read the ?predict.gbm help page, particularly the parameter type. By default predictions are on the link scale.
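To make the point about scale concrete: according to the ?predict.gbm help page, type="response" only changes the output for distributions such as bernoulli and poisson, and for coxph the predictions stay on the log hazard scale. A negative value therefore just means a hazard below the reference level. A minimal sketch, with hypothetical numbers standing in for the output of predict():

```r
# Hypothetical values standing in for predict(ex, n.trees = 1000) output
log_rel_hazard <- c(-1.2, -0.3, 0.0, 0.8)

# Exponentiating gives the hazard relative to the baseline (always positive)
rel_hazard <- exp(log_rel_hazard)

print(data.frame(log_rel_hazard, rel_hazard))
```

Values of rel_hazard below 1 correspond to negative predictions, that is, to a lower hazard than the baseline.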
I am fitting time-to-event survival data using surv.CoxBoost in the mlr package. My question: is there any way to get relative importance for the variables in the fitted model? I have seen this post detailing variable importance for cvglmnet but haven't seen any on CoxBoost.
Any idea?
Below is an example of a model using CoxBoost. You may need to install CoxBoost from here, as it seems to be no longer on CRAN.
library(randomForestSRC)
library(mlr)
library(survival)
library(CoxBoost)
data(pbc, package="randomForestSRC")
data <- na.omit(pbc)
set.seed(9512)
train <- sample(1:nrow(data), round(nrow(data)*0.7))
data.train <- data[train, ]
data.test <- data[-train, ]
task = makeSurvTask( data=data.train, target=c('days', 'status'))
learner= makeLearner("surv.CoxBoost")
trained.learner=train(learner,task)
CoxBoostfit <- trained.learner$learner.model
CoxBoostfit$coefficients
I would like to analyze my data with a gradient boosted model.
However, as my data come from a cohort study, I have trouble understanding the result of this model.
Here's my code. The analysis was performed on example data.
install.packages("randomForestSRC")
install.packages("gbm")
install.packages("survival")
library(randomForestSRC)
library(gbm)
library(survival)
data(pbc, package="randomForestSRC")
data <- na.omit(pbc)
set.seed(9512)
train <- sample(1:nrow(data), round(nrow(data)*0.7))
data.train <- data[train, ]
data.test <- data[-train, ]
set.seed(9741)
gbm <- gbm(Surv(days, status)~.,
data.train,
interaction.depth=2,
shrinkage=0.01,
n.trees=500,
distribution="coxph")
summary(gbm)
set.seed(9741)
gbm.pred <- predict.gbm(gbm,
n.trees=500,
newdata=data.test,
type="response")
As I read the package documentation, "gbm.pred" is the result of Cox's partial likelihood.
set.seed(9741)
lambda0 = basehaz.gbm(t=data.test$days,
delta=data.test$status,
t.eval=sort(data.test$days),
cumulative = FALSE,
f.x=gbm.pred,
smooth=T)
hazard=lambda0*exp(gbm.pred)
In this code, lambda0 is the baseline hazard function.
So, according to the formula h(t|x) = lambda0(t) * exp(f(x)),
"hazard" is the hazard function.
However, what I wanted to calculate was the survival function,
because I would like to compare the outcome in the original data (data$status) with the prediction result (survival function).
Please let me know how to calculate the survival function.
Thank you
Actually, basehaz.gbm can return the cumulative baseline hazard function (the integral \int_0^t \lambda_0(z) dz), and the survival function can be computed from it as below:
S(t|X) = \exp\{ -e^{f(X)} \int_0^t \lambda_0(z) dz \}
f(X) is the gbm prediction, which is the log relative hazard.
I think this tutorial on gbm-based survival analysis will help you!
https://github.com/liupei101/Tutorial-Machine-Learning-Based-Survival-Analysis/blob/master/Tutorial_Survival_GBM.ipynb
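The formula above can be sketched in base R. This is only an illustration under stated assumptions: the toy data and the helper breslow_cumhaz are hypothetical, and the cumulative baseline hazard is estimated here with a hand-written Breslow estimator rather than with basehaz.gbm. In practice f_x would be the gbm predictions.

```r
# Hypothetical toy data: follow-up times, event indicators (1 = event),
# and log relative hazards f(x) standing in for gbm predictions
time   <- c(5, 8, 12, 3, 9, 15)
status <- c(1, 0, 1, 1, 0, 1)
f_x    <- c(0.4, -0.2, 0.1, 0.9, -0.6, -0.3)

# Breslow estimate of the cumulative baseline hazard H0(t) at times t_eval
breslow_cumhaz <- function(time, status, f_x, t_eval) {
  event_times <- sort(unique(time[status == 1]))
  # Hazard increment at each event time: events / sum of exp(f) over the risk set
  increments <- sapply(event_times, function(s) {
    sum(status == 1 & time == s) / sum(exp(f_x[time >= s]))
  })
  sapply(t_eval, function(s) sum(increments[event_times <= s]))
}

t_eval <- sort(unique(time))
H0 <- breslow_cumhaz(time, status, f_x, t_eval)

# S(t|x) = exp(-exp(f(x)) * H0(t)); rows are subjects, columns are times
surv <- exp(-outer(exp(f_x), H0))
dimnames(surv) <- list(paste0("subject", seq_along(f_x)), paste0("t=", t_eval))
print(round(surv, 3))
```

Each row of surv is one subject's predicted survival curve, so the predicted survival probability at a subject's own follow-up time can be set against the observed status.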
I fitted a Cox proportional hazards regression to example data.
My R code is below.
install.packages("randomForestSRC")
install.packages("survival")
library(randomForestSRC)
library(survival)
data(pbc, package="randomForestSRC")
pbc.na <- na.omit(pbc)
surv.f <- as.formula(Surv(days, status) ~ .)
cox.obj <- coxph(surv.f, data=pbc.na)
cox.obj
cox.obj$linear.predictors
What I want to know about is "linear.predictors".
Every observation in my data seems to have this value, but I do not know exactly what it is.
Please let me know what it is.
Thanks always.
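As a hedged illustration of what this component holds: in a coxph fit, linear.predictors contains one value per observation, namely the covariates weighted by the estimated coefficients and shifted by the reference values stored in the fit's means component. The sketch below uses the lung data that ships with the survival package instead of pbc, and the variable choice is only an assumption for the example.

```r
library(survival)

# Built-in lung data, restricted to complete rows of the variables used
d <- na.omit(lung[, c("time", "status", "age", "ph.karno")])
fit <- coxph(Surv(time, status) ~ age + ph.karno, data = d)

# Rebuild the linear predictor by hand: (x - reference) %*% beta
X <- as.matrix(d[, c("age", "ph.karno")])
lp_manual <- as.vector(sweep(X, 2, fit$means) %*% coef(fit))

lp <- unname(fit$linear.predictors)
print(all.equal(lp, lp_manual))

# exp(lp) is each observation's hazard relative to the reference covariate values
rel_risk <- exp(lp)
print(head(rel_risk))
```

A positive linear predictor means a hazard above that of the reference (roughly the "average") observation, and a negative one means a hazard below it.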
I am using the RTextTools package to create a Document Term Matrix, before using the associated container in a range of classification models.
I have reviewed the package information and associated articles but I cannot find any indication on how to plot the results of tuning and predicting the classification models. For example, I am building a linear SVM model using svm.fit <- train_model(container, "SVM", kernel="linear", cost=1). How can I visualise svm.fit?
I am ideally looking to obtain similar results as if I was using plot.svm from the e1071 package. However, I cannot use this here as the class(container) is matrix_container and not an expected data frame.
The code that I am utilising is below. Thanks for your help.
#Create Training Container#
dtMatrix <- create_matrix(cbind.data.frame(Train.df$Keyword1, Train.df$Keyword2), removeSparseTerms=.998)
Train_container <- create_container(dtMatrix, Train.df$Result, trainSize=1:10000, virgin=FALSE)
#Create Validation Container#
trace("create_matrix", edit=T)
Validate_dtMatrix <- create_matrix(cbind.data.frame(Validate.df$Keyword1, Validate.df$Keyword2), originalMatrix=dtMatrix)
predSize = nrow(Validate.df)
ValidateContainer <- create_container(Validate_dtMatrix, labels=rep(0,predSize), testSize=1:predSize, virgin=FALSE)
#===SUPPORT VECTOR MACHINE===#
svm_linear <- train_model(Train_container, "SVM", kernel="linear", cost=1)
predict_SVM.Linear <- classify_model(ValidateContainer, svm_linear)
I am looking for some guidance on a homework assignment I am working on for a class. We are given a dataset with 14K observations and we are asked to build a prediction model. I split the dataset into training and testing sets (the testing set has 4909 observations). I am using the caret package to predict the last variable, "classe". I removed the near-zero-variance variables and built the model, but when I tried to make predictions I got only 97 predictions back. I reviewed the help files but still can't figure out where I am going wrong. Any hints would be appreciated.
Here is the Code:
set.seed(1234)
pml.training <- read.csv("./data/pml-training.csv")
#
library(caret)
inTrain <- createDataPartition(y=pml.training$classe, p=0.75, list=FALSE)
training <- pml.training[inTrain,]
testing <- pml.training[-inTrain,]
# Pull out the Near Zero Value (NZV)
nzv <- nearZeroVar(training, saveMetrics=TRUE)
omit <- which(nzv$nzv==TRUE)
training <- training[,-omit]
testing <- testing[,-omit]
# Fit the model
modFit <- train(classe ~., method="rf", data=training)
modFit
print(modFit$finalModel)
plot(modFit)
# Try and predict on the testing model
pred <- predict(modFit, newdata=testing)
testing$predRight <- pred==testing$classe
print(table(pred, testing$classe))
Thanks, Pat C.
Have you checked
sum(complete.cases(subset(testing, select = -classe)))
?
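The idea behind that check, sketched on a hypothetical toy data frame (testing_toy and its columns are made up for illustration): as far as I know, predict() on a caret train object uses na.action = na.omit by default, so rows with a missing predictor are dropped and the number of predictions matches the number of complete cases.

```r
# Hypothetical toy test set: two predictors with missing values plus the outcome
testing_toy <- data.frame(
  x1 = c(1.0, NA, 3.0, 4.0, 5.0),
  x2 = c(NA, 2.0, 3.0, 4.0, NA),
  classe = factor(c("A", "B", "A", "B", "A"))
)

predictors <- subset(testing_toy, select = -classe)

# Rows with no missing predictor: the only ones a model using na.omit can score
n_complete <- sum(complete.cases(predictors))
n_dropped <- nrow(predictors) - n_complete

# Missing values per column, to see which predictors cause the loss
na_per_column <- colSums(is.na(predictors))

print(c(complete = n_complete, dropped = n_dropped))
print(na_per_column)
```

If the count of complete cases in the real testing set is 97, the missing values in the predictors explain the short prediction vector.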