I want to produce a classification table for a logistic regression fitted with the lrm function from the rms package, and then plot the ROC curve. I have already done this with the glm function. Example code:
train<-sample(dim(data)[1],.8*dim(data)[1]) #80-20 training/test
datatrain<-data[train,]
datatest<-data[-train,]
fit<-glm(Target ~ ., data=datatrain,family=binomial()) #Target is 0/1 variable
prob <- predict(fit, newdata=datatest, type="response") #predicted probabilities on the test set
datatest$prob=prob
library(pROC)
ROC <- roc(Target==1 ~ prob, data = datatest)
plot(ROC)
confusion<-table(prob>0.5,datatest$Target)
errorrate<-1-sum(diag(confusion))/sum(confusion) #the diagonal holds the correct classifications, so subtract the accuracy from 1
errorrate
How can I get the confusion matrix using the lrm function?
The lrm function returns a fit object that inherits from the glm class. That is not explicitly stated on the lrm help page, but it is easy enough to verify. After running the setup code in the first example on the ?lrm page:
> f <- lrm(ch ~ age)
> class(f)
[1] "lrm" "rms" "glm"
So you should be able to call predict on the fit much as you did above; note that the lrm method returns predicted probabilities with type="fitted" rather than type="response". Prof. Harrell advises against split-sample validation and against the use of ROC curves for model comparison. He provides mechanisms for better methods in his package.
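A minimal sketch of how the confusion matrix might be built from an lrm fit. The data are simulated, and the data frame `d` with its columns `x1`, `x2` and `Target` is made up for illustration; it assumes the rms package is installed and that `predict()` on an lrm fit returns probabilities with `type = "fitted"`:

```r
library(rms)

set.seed(1)
# Simulated data, purely illustrative (not the asker's data)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$Target <- rbinom(n, 1, plogis(0.5 * d$x1 - 1 * d$x2))

# 80-20 training/test split, as in the question
train  <- sample(n, 0.8 * n)
dtrain <- d[train, ]
dtest  <- d[-train, ]

fit <- lrm(Target ~ x1 + x2, data = dtrain)

# Predicted probabilities for the held-out rows
prob <- predict(fit, newdata = dtest, type = "fitted")

# Fixed factor levels keep the table 2 x 2 even if one class is never predicted
pred <- factor(prob > 0.5, levels = c(FALSE, TRUE))
obs  <- factor(dtest$Target, levels = c(0, 1))

confusion <- table(predicted = pred, observed = obs)
accuracy  <- sum(diag(confusion)) / sum(confusion)
errorrate <- 1 - accuracy
```

Fixing the factor levels is a defensive choice: without it, `table()` can return a single row when every prediction falls on one side of the cutoff, and `diag()` would then pick the wrong cells.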
I have unbalanced panel data and I want to fit this type of regression:
Pr(y=1|xB) = G(xB+a)
where "y" is a binary variable, "x" is a vector of explanatory variables and "B" is my coefficient vector.
I want to fit a random-effects model by maximum likelihood, but I did not understand from the CRAN guide (vignette) what I need to change in the plm function (of package plm). So far I have used this code:
library(plm)
p_finale <- plm.data(p_finale, index=c("idnumber","Year"))
attach(p_finale)
y <- (TotalDebt_dummy)
X_tot <- cbind(Size,ln_Age,liquidity,Asset_Tangibility,profitability,growth, sd_cf_risk1, family_dummy,family_manager,
sd_cf_risk1*family_dummy,
Ateco_A,Ateco_C,Ateco_D,Ateco_E,Ateco_F,Ateco_G,Ateco_H,Ateco_I,Ateco_J,Ateco_M,Ateco_N,
Ateco_Q,Ateco_R)
model1 <- plm(y~X_tot+factor(Year),data = p_finale, model="random")
I included the whole code, but the only thing I believe needs to be changed is the last line, the call to plm.
The plm function from package plm does not use a maximum-likelihood approach for model estimation. It uses a GLS approach, as is common in econometrics.
Please see the section about plm versus nlme and lme4 in the package's first vignette, "Panel data econometrics with R: the plm package" (https://cran.rstudio.com/web/packages/plm/vignettes/A_plmPackage.html). The section explains the differences between the approaches and has code examples for both (and refers to packages nlme and lme4 for the maximum-likelihood approach).
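As a rough illustration of the maximum-likelihood route mentioned above, a random-intercept logit for a binary outcome could be sketched with `glmer` from lme4. This is a swapped-in technique, not something plm does, and the simulated panel with columns `id`, `year`, `x` and `y` is hypothetical:

```r
library(lme4)

set.seed(42)
# Simulated panel, purely illustrative: 50 units observed over 6 years
n_id <- 50
n_t  <- 6
panel <- data.frame(
  id   = rep(seq_len(n_id), each = n_t),
  year = rep(seq_len(n_t), times = n_id),
  x    = rnorm(n_id * n_t)
)
a <- rnorm(n_id, sd = 1)  # unit-specific random intercepts
panel$y <- rbinom(nrow(panel), 1, plogis(-0.5 + 0.8 * panel$x + a[panel$id]))

# Pr(y = 1 | x, a) = G(xB + a) with G the logistic cdf, estimated by maximum likelihood
fit <- glmer(y ~ x + factor(year) + (1 | id), data = panel, family = binomial)
summary(fit)
```

The `(1 | id)` term plays the role of the individual effect "a" in the model above; unbalanced panels need no special handling here, since rows are simply grouped by `id`.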
I estimated a robust mixed-effects model with the rlmer command from the robustlmm package. Is there a way to obtain the marginal and conditional R^2 values?
Just going to answer that myself. I could not find a package, or rather a function, in R that is equivalent to e.g. r.squaredGLMM for lmerMod objects, but I found a quick workaround that works with rlmerMod objects. Basically you just have to extract the variance components for the fixed effects, random effects and residuals, and then manually calculate the marginal and conditional R^2 based on the formula provided by Nakagawa & Schielzeth (2013).
library(robustlmm)
library(insight)
library(lme4)
data(Dyestuff, package = "lme4")
robust.model <- rlmer(Yield ~ 1 + (1|Batch), data=Dyestuff) # random-effects term written in parentheses, the usual lme4 form
var.fix <- get_variance_fixed(robust.model)
var.ran <- get_variance_random(robust.model)
var.res <- get_variance_residual(robust.model)
R2m <- var.fix/(var.fix+var.ran+var.res)           # marginal R^2: fixed effects only
R2c <- (var.fix+var.ran)/(var.fix+var.ran+var.res) # conditional R^2: fixed plus random effects
Literature:
Nakagawa, S. and Schielzeth, H. (2013), A general and simple method for obtaining R2 from generalized linear mixed‐effects models. Methods Ecol Evol, 4: 133-142. doi:10.1111/j.2041-210x.2012.00261.x
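The arithmetic behind the two ratios can be checked in isolation with a small helper. The function name `r2_from_variances` and the variance values below are made up for illustration; this is only a sketch of the calculation described above, not a function from any package:

```r
# Marginal and conditional R^2 from the three variance components
r2_from_variances <- function(var_fix, var_ran, var_res) {
  total <- var_fix + var_ran + var_res
  c(R2m = var_fix / total,               # variance explained by fixed effects
    R2c = (var_fix + var_ran) / total)   # variance explained by fixed + random effects
}

# Hypothetical variance components
r2 <- r2_from_variances(var_fix = 2, var_ran = 3, var_res = 5)
print(r2)
# R2m = 0.2, R2c = 0.5
```

For an intercept-only model such as the Dyestuff example, the fixed-effects variance is zero, so the marginal R^2 is zero and the conditional R^2 reduces to the share of variance attributed to the random effect.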
Using Frank Harrell's rms package, I built a predictive model with the lrm function.
I want to check whether this model predicts a binomial event significantly better than another (lrm) model.
I tried different functions, such as anova(model1, model2) or the pR2 function from the pscl library to compare the pseudo R^2, but none of them work with an lrm-based model.
What is the best way to see whether my new model is significantly better than the earlier model?
Update: Here is an example (where I want to predict the chance of bone metastasis) to check whether size or stage (in addition to other variables) gives the best model:
library(rms)
getHdata(prostate)
ddd <- datadist(prostate)
options( datadist = "ddd" )
mod1 = lrm(as.factor(bm) ~ age + sz + rx, data=prostate, x=TRUE, y=TRUE)
mod2 = lrm(as.factor(bm) ~ age + stage + rx, data=prostate, x=TRUE, y=TRUE)
Fundamentally, the question seems to be about comparing two non-nested models.
If you fit your models using the glm function, you can use the vuong function in the pscl package.
To test the fit of two nested models, you can use the lrtest function from the rms package (the likelihood-ratio test is only valid when one model is nested within the other):
lrtest(mod1,mod2)
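The distinction between the two cases can be sketched in base R. This swaps in glm, AIC and anova rather than lrm and lrtest, and the simulated data frame `d` with columns `age`, `size`, `stage` and `event` is hypothetical:

```r
set.seed(7)
# Simulated data, purely illustrative (not the prostate data set)
n <- 400
d <- data.frame(
  age   = rnorm(n, mean = 70, sd = 8),
  size  = rexp(n, rate = 1 / 15),
  stage = rbinom(n, 1, 0.4)
)
d$event <- rbinom(n, 1, plogis(-2 + 0.05 * d$size + 0.6 * d$stage))

m_size  <- glm(event ~ age + size,         data = d, family = binomial)
m_stage <- glm(event ~ age + stage,        data = d, family = binomial)
m_both  <- glm(event ~ age + size + stage, data = d, family = binomial)

# Non-nested pair: neither model is a special case of the other,
# so compare an information criterion instead of running a likelihood-ratio test
aic_values <- AIC(m_size, m_stage)
print(aic_values)

# Nested pair: m_size is m_both with the stage coefficient fixed at zero,
# so a likelihood-ratio test is appropriate
lrt <- anova(m_size, m_both, test = "Chisq")
print(lrt)
```

A lower AIC favours a model but does not give a p-value; the likelihood-ratio test does, at the cost of requiring nesting.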
I fit a Cox model to some data with the glmnet R package. My small R example is:
library(fastcox)
data(FHT)
attach(FHT) # makes x, y and status available
library(glmnet)
library(survival)
fit = glmnet(x,Surv(y,status),family="cox",alpha=1)
From the help document, we know that glmnet fits penalized models of the form
-loglik/nobs + λ*penalty
i.e., objective function = loss function + penalty function.
I want to extract -loglik/nobs (the value of the loss function, i.e. the negative partial log-likelihood of the fitted model, or the two-term Taylor series expansion of the log-likelihood) from the fit object.
Any ideas? Thanks.
By the way, we also tried
fit0 = glmnet(x,Surv(y,status),family="cox",alpha=1,lambda=0)
following -loglik/nobs + λ*penalty with λ = 0, but it throws an error.
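One way to see what the loss term measures is to compute the negative partial log-likelihood by hand from a vector of linear predictors. The helper `neg_partial_loglik` below is a hypothetical base-R sketch that assumes the Breslow convention for ties and scaling by the number of observations; whether it reproduces glmnet's internal value exactly has not been checked here:

```r
# Negative Cox partial log-likelihood per observation (Breslow handling of ties)
# time:   observed times
# status: 1 = event, 0 = censored
# eta:    linear predictor x %*% beta for each observation
neg_partial_loglik <- function(time, status, eta) {
  event_idx <- which(status == 1)
  # For each event, log of the summed relative risks over everyone still at risk
  log_risk <- vapply(
    event_idx,
    function(i) log(sum(exp(eta[time >= time[i]]))),
    numeric(1)
  )
  -sum(eta[event_idx] - log_risk) / length(time)
}

# Tiny worked example: three events, all linear predictors zero.
# The risk sets have sizes 3, 2 and 1, so loglik = -(log 3 + log 2 + log 1) = -log 6
loss <- neg_partial_loglik(time = c(1, 2, 3), status = c(1, 1, 1), eta = c(0, 0, 0))
print(loss)
# log(6) / 3
```

With a fitted glmnet object, the linear predictors for each λ could presumably be obtained with predict(fit, newx = x, type = "link") and passed column by column as eta; adding a constant to eta does not change the result, because it cancels inside each risk-set ratio.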
I have fitted a few penalized logistic regression models in R using the
logistf package. I would now like to draw some forest plots for them.
The sjPlot package (http://www.strengejacke.de/sjPlot/custplot/)
has an excellent function for glm output, but none for logistf output.
Any assistance?
The logistf objects differ in structure from glm objects, but not by much. I've added support for logistf-fitted models; however, 1) model summaries can't be printed and 2) predicted probability plots are currently not supported for logistf models.
I'll update the code on GitHub tonight, so you can try the updated sjp.glm function...
library(sjPlot)
library(logistf)
data(sex2)
fit<-logistf(case ~ age+oc+vic+vicl+vis+dia, data=sex2)
# for this example, axisLimits need to be specified manually
sjp.glm(fit, axisLimits = c(0.05, 25), transformTicks = TRUE)