Forest plot from logistf - r

I have fitted a few penalized logistic regression models in R using the
logistf package, and I would like to draw forest plots of the results.
The sjPlot package (http://www.strengejacke.de/sjPlot/custplot/)
provides an excellent function for glm output, but nothing for logistf objects.
Any assistance?

logistf objects differ in structure from glm objects, but not by much. I've added support for logistf-fitted models; however, a) model summaries can't be printed and b) predicted probability plots are currently not supported for logistf models.
I'll update the code on GitHub tonight, so you can try the updated sjp.glm function...
library(sjPlot)
library(logistf)
data(sex2)
fit<-logistf(case ~ age+oc+vic+vicl+vis+dia, data=sex2)
# for this example, axisLimits need to be specified manually
sjp.glm(fit, axisLimits = c(0.05, 25), transformTicks = TRUE)
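If the installed sjPlot version does not handle logistf objects, a forest plot can also be drawn by hand. The following is a minimal sketch in base graphics, assuming the fitted object exposes the elements coefficients, ci.lower and ci.upper (as logistf objects do). The names or_tab and ypos are illustrative only and belong to neither package.

```r
library(logistf)

data(sex2)
fit <- logistf(case ~ age + oc + vic + vicl + vis + dia, data = sex2)

# Odds ratios with their confidence limits; the intercept is dropped
keep <- names(fit$coefficients) != "(Intercept)"
or_tab <- data.frame(
  term  = names(fit$coefficients)[keep],
  or    = exp(fit$coefficients[keep]),
  lower = exp(fit$ci.lower[keep]),
  upper = exp(fit$ci.upper[keep])
)

# One row per term, first term at the top, x axis on a log scale
ypos <- rev(seq_len(nrow(or_tab)))
plot(or_tab$or, ypos, log = "x", pch = 15,
     xlim = range(or_tab$lower, or_tab$upper),
     yaxt = "n", xlab = "Odds ratio (log scale)", ylab = "")
segments(or_tab$lower, ypos, or_tab$upper, ypos)   # confidence intervals
axis(2, at = ypos, labels = or_tab$term, las = 1)  # term labels
abline(v = 1, lty = 2)                             # line of no effect
```

The same or_tab data frame could be handed to any other plotting package if a more polished figure is needed.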

Related

Can I do added variable plots for a beta regression

I am trying to plot the results of a beta regression using added variable plots (partial regression plots). However, this only seems to be feasible with linear model objects. I am using the betareg() function from the betareg package to fit the model and avPlot() from the car package to draw the added variable plot. Is there a way to create added variable plots for beta regression? Thanks a lot!
This is what I have tried so far:
model_a <- betareg(crop_coverage ~ soil_moisture * weed_coverage, data = df)
avPlot(model_a)
Error in UseMethod("avPlot") :
no applicable method for 'avPlot' applied to an object of class "betareg"

computing concordance index with ranger (R package)

I'm trying to use predictions from a random survival forest computed using Ranger to calculate a c-index at specific time points. I know this can be done easily for a coxph model with the following code:
library(survival)
cox_model <- coxph(Surv(time, status == 1) ~ ., data = train)
# evaluate at 1 month, 3 months, and 2 years
c_index_test <- pec::cindex(cox_model, formula = cox_model$formula, data = test, eval.times = c(30, 90, 730))
However, although I can calculate a c-index at these time points easily with a random forest generated using rfsrc(), I haven't been able to do it using ranger.
In addition to the pec cindex() function (which doesn't work with objects of class "ranger"), I've also tried the concordance.index function (part of the survcomp package) and tried different ways of using predict.ranger to generate survival probability predictions, but nothing has worked.
If anyone can provide code showing how to calculate the c-index of a ranger RSF (at specific time points and on an external validation set) I would appreciate it immensely! I've been able to do it with randomForestSRC, but it takes so long that my R session often times out, and I haven't been able to get ANY results for runs with >10 trees...
The ranger package computes Harrell's c-index, which is similar to the concordance statistic. If you have a fitted model rf, the attribute prediction.error is equivalent to 1 - Harrell's c-index. Have a look at the following link for more details.
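As a rough sketch of the point above: the out-of-bag c-index can be read off the fitted forest, and a c-index on a held-out set can be approximated by passing the predicted survival probability at a chosen time to survival::concordance(). The veteran data from the survival package is used as a stand-in, and train, test, eval_time and surv_at_t are illustrative names. Note the assumption that this plain concordance is not the same estimator as the truncated, censoring-weighted one returned by pec::cindex, so the numbers need not agree.

```r
library(survival)
library(ranger)

set.seed(1)
vet   <- survival::veteran
idx   <- sample(nrow(vet), 100)
train <- vet[idx, ]
test  <- vet[-idx, ]

rf <- ranger(Surv(time, status) ~ ., data = train, num.trees = 200)

# Out-of-bag Harrell's c-index on the training data
oob_c <- 1 - rf$prediction.error

# Predicted survival curves for the validation set:
# rows are observations, columns are the unique death times of the training set
pred <- predict(rf, data = test)

# Survival probability at the last training death time <= eval_time
eval_time <- 90
col       <- max(which(pred$unique.death.times <= eval_time))
surv_at_t <- pred$survival[, col]

# Higher predicted survival should go with longer observed survival
cfit <- concordance(Surv(time, status) ~ surv_at_t, data = test)
cfit$concordance
```

Repeating the last three steps for each value in c(30, 90, 730) gives one concordance value per evaluation time.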

How to extract the value of the loss function of Cox models from glmnet in R?

I fitted a Cox model to some data with the glmnet R package. My
small R example is:
library(fastcox)
data(FHT)
attach(FHT)  # exposes x, y and status used below
library(glmnet)
library(survival)
fit = glmnet(x,Surv(y,status),family="cox",alpha=1)
From the help document, we know glmnet fits penalized models of the form
-loglik/nobs + λ*penalty
i.e., objective function = loss function + penalty function.
I want to extract -loglik/nobs (the loss function value, i.e. the negative partial log-likelihood of the fitted model, or its two-term Taylor series expansion) from the fit object.
Any ideas? Thanks.
By the way, we also tried setting lambda=0,
fit0 = glmnet(x,Surv(y,status),family="cox",alpha=1,lambda=0)
so that -loglik/nobs + λ*penalty reduces to the loss term alone, but this produces errors.
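One possible workaround, offered as a sketch rather than as glmnet's own accessor: take the coefficients at a given lambda and let survival::coxph() evaluate the partial log-likelihood at exactly those values by passing them as init and allowing zero iterations. The data here are simulated, and lam, beta, loss and objective are illustrative names. The division by the number of observations follows the -loglik/nobs form quoted above; the exact scaling constant glmnet uses internally for the Cox family is an assumption and should be checked against its documentation.

```r
library(glmnet)
library(survival)

set.seed(1)
n <- 200
p <- 5
x      <- matrix(rnorm(n * p), n, p)
time   <- rexp(n, rate = exp(0.5 * x[, 1]))
status <- rbinom(n, 1, 0.7)

# A two-column response (time, status) is accepted for family = "cox"
y   <- cbind(time = time, status = status)
fit <- glmnet(x, y, family = "cox", alpha = 1)

# Coefficients at one lambda on the path (the Cox model has no intercept)
lam  <- fit$lambda[ceiling(length(fit$lambda) / 2)]
beta <- as.matrix(coef(fit, s = lam))[, 1]

# Partial log-likelihood evaluated at beta, with no optimisation steps
ll <- coxph(Surv(time, status) ~ x, init = beta, ties = "breslow",
            control = coxph.control(iter.max = 0))$loglik[2]

loss      <- -ll / n               # assumed scaling: -loglik/nobs
penalty   <- lam * sum(abs(beta))  # lasso penalty (alpha = 1)
objective <- loss + penalty
```

Looping over fit$lambda in the same way would give the loss along the whole regularisation path.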

PCA in R using the caret package vs prcomp PCA

I have a dataframe data with more than 50 variables and I am trying to do a PCA in R using the caret package.
library(caret)
library(e1071)
trans <- preProcess(data,method=c("YeoJohnson", "center","scale", "pca"))
If I understand this code correctly, it applies a Yeo-Johnson transformation (because data has zeros in it), standardises the data and then applies PCA (by default, the function keeps only the PCs that are necessary to explain at least 95% of the variability in the data).
However, when I use the prcomp command,
model<-prcomp(data,scale=TRUE)
I can get more output, such as printing the summary or calling plot(model, type = "l"), which I am not able to do with trans. Does anyone know if there are any functions in the caret package that produce the same output as prcomp?
You can access the principal components themselves with the predict function.
df <- predict(trans, data)
summary(df)
You won't have exactly the same output as with prcomp: while caret uses prcomp(), it discards the original prcomp class object and does not return it.
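If the prcomp summary and scree plot are what you are after, one option is to let caret do only the preprocessing and then call prcomp() yourself on the result. This is a sketch under assumptions: the iris measurements stand in for your data, and only centering and scaling are applied here (the "YeoJohnson" step from the question could be added to the method vector in the same way). The names dat, pp, scaled and pca are illustrative.

```r
library(caret)

dat <- iris[, 1:4]

# Preprocess with caret, but leave the PCA step out
pp     <- preProcess(dat, method = c("center", "scale"))
scaled <- predict(pp, dat)

# Run prcomp on the preprocessed data to keep the full prcomp object
pca <- prcomp(scaled)
summary(pca)           # proportion of variance per component
plot(pca, type = "l")  # scree plot
```

Since the data are already centered and scaled by caret, prcomp() is called without its own scale argument.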

Random forest evaluation in R

I am a newbie in R and I am trying to do my best to create my first model. I am working on a two-class random forest project and so far I have programmed the model as follows:
library(randomForest)
set.seed(2015)
randomforest <- randomForest(as.factor(goodkit) ~ ., data=training1, importance=TRUE,ntree=2000)
varImpPlot(randomforest)
prediction <- predict(randomforest, test,type='prob')
print(prediction)
I am not sure why I don't get the overall prediction for my model. I must be missing something in my code. I get the OOB error and the prediction per case in the test set, but not the overall prediction of the model.
library(pROC)
auc <-roc(test$goodkit,prediction)
print(auc)
This doesn't work at all.
I have been through the pROC manual but I cannot get to understand everything. It would be very helpful if anyone can help with the code or post a link to a good practical sample.
Using the ROCR package, the following code should work for calculating the AUC:
library(ROCR)
predictedROC <- prediction(prediction[,2], as.factor(test$goodkit))
as.numeric(performance(predictedROC, "auc")@y.values)
Your problem is that predict on a randomForest object with type='prob' returns two predictions: each column contains the probability to belong to each class (for binary prediction).
You have to decide which of these predictions to use to build the ROC curve. Fortunately, for binary classification the two columns sum to 1, so they give the same result (the scores are just reversed):
auc1 <-roc(test$goodkit, prediction[,1])
print(auc1)
auc2 <-roc(test$goodkit, prediction[,2])
print(auc2)
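To see this without fitting a forest, here is a small self-contained sketch with simulated class probabilities. The names truth, p1 and prob are made up for illustration, and prob mimics the two-column matrix that predict(..., type='prob') returns. It relies on roc() choosing the comparison direction automatically, which is its default behaviour.

```r
library(pROC)

set.seed(42)
truth <- factor(rbinom(200, 1, 0.5))

# Simulated probability of class "1", higher on average for true "1" cases
p1   <- plogis(rnorm(200, mean = ifelse(truth == "1", 1, -1)))
prob <- cbind("0" = 1 - p1, "1" = p1)

# AUC from either column; quiet = TRUE suppresses the direction messages
auc0 <- auc(roc(truth, prob[, "0"], quiet = TRUE))
auc1 <- auc(roc(truth, prob[, "1"], quiet = TRUE))

all.equal(as.numeric(auc0), as.numeric(auc1))
```

If the direction is fixed by hand instead of detected automatically, the two columns give AUCs that sum to 1 rather than equal values.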

Resources