How to detect Heteroscedasticity in Random Forest Model? - r

I am fitting a regression model with Random Forest and want to judge whether the model shows heteroscedasticity.
When I fit a linear model I can see heteroscedasticity in its residual plot, and I want to check a similar residual plot for the Random Forest model.
I am working in R.
It is an expense model based on Income, Branch, and TotalFamilyMember.

We can recreate the plot with the residuals from the predicted values:
library(randomForest)
#Using the regression example from ?randomForest
#Assumption: airq is airquality with incomplete rows removed
airq <- na.omit(airquality)
ozone.rf <- randomForest(Ozone ~ ., data=airq, mtry=3,
                         importance=TRUE)
#Find residuals by subtracting predicted from actual values
err <- airq$Ozone - ozone.rf$predicted
#Make data frame holding residuals and fitted values
df <- data.frame(Residuals=err, Fitted.Values=ozone.rf$predicted)
#Sort data by fitted values
df2 <- df[order(df$Fitted.Values),]
#Create plot
plot(Residuals~Fitted.Values, data=df2)
#Add horizontal reference line at zero (intercept 0, slope 0) with grey color #8
abline(0,0, col=8)
#Add the same smoothing line from lm regression with color red #2
lines(lowess(df2$Fitted.Values, df2$Residuals), col=2)
Update
There is a much easier way. I realized that the plot is just a regression of residuals on fitted values, therefore this gives essentially the same diagnostic (note that lm() removes any linear trend from the residuals and rescales the x-axis):
fitted.values <- ozone.rf$predicted
residuals <- ozone.rf$y - fitted.values
plot(lm(residuals ~ fitted.values), which=1)
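Beyond eyeballing the plot, a rough numeric check is possible. The sketch below is a Breusch-Pagan-style auxiliary regression (squared residuals on fitted values) plus a scale-location plot. The helper name bp_style_check and the simulated vectors are hypothetical and only stand in for the forest's residuals and predictions. The chi-squared reference distribution is derived for linear models, so for a random forest the p-value should be read as a heuristic rather than an exact test.

```r
# Hypothetical helper (not part of randomForest): Breusch-Pagan-style check.
# Regress squared residuals on fitted values; n * R^2 is compared with a
# chi-squared distribution with 1 degree of freedom.
bp_style_check <- function(res, fit_vals) {
  aux  <- lm(I(res^2) ~ fit_vals)
  stat <- length(res) * summary(aux)$r.squared
  list(statistic = stat,
       p.value   = pchisq(stat, df = 1, lower.tail = FALSE))
}

# Simulated stand-ins for the forest's predictions and residuals:
# the error spread grows with the fitted value, i.e. heteroscedastic.
set.seed(42)
fitted_sim <- runif(500, min = 1, max = 10)
res_sim    <- rnorm(500, mean = 0, sd = fitted_sim)

out <- bp_style_check(res_sim, fitted_sim)
out$p.value   # a small p-value points to non-constant variance

# Scale-location view: sqrt(|residual|) against fitted values
plot(fitted_sim, sqrt(abs(res_sim)),
     xlab = "Fitted values", ylab = "sqrt(|Residuals|)")
lines(lowess(fitted_sim, sqrt(abs(res_sim))), col = 2)
```

For the forest itself the call would be bp_style_check(ozone.rf$y - ozone.rf$predicted, ozone.rf$predicted). Because ozone.rf$predicted holds out-of-bag predictions, these residuals are not in-sample residuals.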

Related

How to obtain the QQ plot of a spline model R

I have a model that I've fitted using splines:
ssfit.3 <- smooth.spline(anage$lifespan ~ log(anage$Metabolic.by.mass),
df = 3)
I'm trying to obtain the model diagnostics such as the residual plot and the QQ plot for this model. I know for a linear model you can do
plot(lm)
which outputs all the different plots. How can I do this with spline models since plot(ssfit.3) does not output the same?
Extract the residuals and use qqnorm()/qqline().
example(smooth.spline) ## to get a model to work with
qqnorm(residuals(s2m))
qqline(residuals(s2m))
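A slightly fuller sketch of the same idea, pairing a residuals-vs-fitted plot with the QQ plot. It assumes the built-in cars data as a stand-in for anage, and it relies on the residuals() and fitted() methods that stats provides for smooth.spline objects (these need the default keep.data = TRUE).

```r
# Fit a smoothing spline to the built-in cars data (stand-in for anage)
fit <- smooth.spline(cars$speed, cars$dist, df = 3)

res <- residuals(fit)   # one residual per original observation
fv  <- fitted(fit)      # one fitted value per original observation

op <- par(mfrow = c(1, 2))
# Left panel: residuals vs fitted, with a zero line and a lowess smoother
plot(fv, res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs Fitted")
abline(h = 0, col = 8)
lines(lowess(fv, res), col = 2)
# Right panel: normal QQ plot of the residuals
qqnorm(res)
qqline(res)
par(op)
```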

Plotting Single Covariate Regression Line in a Multivariate Model

I am able to create a multivariate linear regression model using
lmex = lm(overweight$h_egfr_cystc96 ~ overweightlogblood + overweight$age_96, data = overweight)
Which returns values for the intercept, estimate, p-value, etc.
I want to plot a single regression line for one of my variables: overweightlogblood
If I use
ggplot(overweight,aes(y=h_egfr_cystc96,x=overweightlogblood))+geom_point()+geom_smooth(method="lm")
It gives me a nice plot, but this is for the univariate model. I would like the plot to feature a regression line (with 95% CI) for the intercept and estimate of a single covariate in a multivariate model. Any ideas?
Thank you in advance!
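One possible approach, sketched here with the built-in mtcars data as a stand-in (wt plays the role of the covariate of interest and hp the adjustment covariate; both are chosen only for illustration): predict from the full model over a grid of the covariate of interest while holding the other covariate at its mean, and ask predict() for a confidence interval.

```r
# Stand-in data: mtcars, with wt as the covariate of interest and hp as
# the adjustment covariate (both chosen only for illustration)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Grid over the covariate of interest; hold the other covariate at its mean
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 100),
                   hp = mean(mtcars$hp))
pred <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

# Raw points, adjusted regression line, and 95% confidence band
plot(mpg ~ wt, data = mtcars)
lines(grid$wt, pred[, "fit"], col = 2)
lines(grid$wt, pred[, "lwr"], col = 2, lty = 2)
lines(grid$wt, pred[, "upr"], col = 2, lty = 2)
```

The line uses the slope and intercept from the full model, so it can differ from the univariate geom_smooth(method="lm") fit. The plotted points are still the raw, unadjusted observations.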

plot multiple ROC curves for logistic regression model in R

I have a logistic regression model (using R) as
fit6 <- glm(formula = survived ~ ascore + gini + failed, data=records, family = binomial)
summary(fit6)
I'm using the pROC package to draw ROC curves and compute the AUC for 6 models, fit1 through fit6.
This is how I plot one ROC curve:
prob6=predict(fit6,type=c("response"))
records$prob6 = prob6
g6 <- roc(survived~prob6, data=records)
plot(g6)
But is there a way I can combine the ROC curves for all 6 models in one plot and display the AUCs for all of them and, if possible, the confidence intervals too?
You can use the add = TRUE argument of the plot function to plot multiple ROC curves.
Make up some fake data
library(pROC)
a=rbinom(100, 1, 0.25)
b=runif(100)
c=rnorm(100)
Get model fits
fit1=glm(a~b+c, family='binomial')
fit2=glm(a~c, family='binomial')
Predict on the same data you trained the model with (or hold some out to test on if you want)
preds=predict(fit1)
roc1=roc(a ~ preds)
preds2=predict(fit2)
roc2=roc(a ~ preds2)
Plot it up.
plot(roc1)
plot(roc2, add=TRUE, col='red')
This produces the different fits on the same plot. You can get the AUC of the ROC curve by roc1$auc, and can add it either using the text() function in base R plotting, or perhaps just toss it in the legend.
I don't know how to quantify confidence intervals...or if that is even a thing you can do with ROC curves. Someone else will have to fill in the details on that one. Sorry. Hopefully the rest helped though.
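On the confidence-interval question: pROC provides ci.auc(), which returns the lower bound, the AUC, and the upper bound (DeLong's method by default). The sketch below uses made-up data and hypothetical variable names (y, x1, x2) and puts both AUCs with their 95% CIs in the legend. The same pattern should extend to six models.

```r
library(pROC)

# Made-up data, for illustration only
set.seed(1)
y  <- rbinom(100, 1, 0.25)
x1 <- runif(100)
x2 <- rnorm(100)

fit1 <- glm(y ~ x1 + x2, family = binomial)
fit2 <- glm(y ~ x2, family = binomial)

# ROC curves from in-sample predicted probabilities
roc1 <- roc(y, predict(fit1, type = "response"), quiet = TRUE)
roc2 <- roc(y, predict(fit2, type = "response"), quiet = TRUE)

# 95% CI for each AUC: lower bound, AUC, upper bound
ci1 <- as.numeric(ci.auc(roc1))
ci2 <- as.numeric(ci.auc(roc2))

plot(roc1)
plot(roc2, add = TRUE, col = "red")
legend("bottomright", lty = 1, col = c("black", "red"),
       legend = c(sprintf("fit1: AUC %.2f (%.2f-%.2f)", ci1[2], ci1[1], ci1[3]),
                  sprintf("fit2: AUC %.2f (%.2f-%.2f)", ci2[2], ci2[1], ci2[3])))
```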

Quantile regression analysis in R

I have noticed that when I plot the quantile regression coefficients with their confidence intervals (CI) alongside the ordinary OLS coefficients and their CI, I get an error whenever I force the regression through the origin.
So if I use this code (engel is the data set for a quantile regression example in R):
data(engel)
fit1 <- rq(foodexp ~ income, tau = c(0.1,0.25,0.5,0.75,0.9), data = engel)
plot(summary(fit1))
I have no problem and my coefficient graphs are drawn. But if I use this:
data(engel)
fit1 <- rq(foodexp ~ 0+income, tau = c(0.1,0.25,0.5,0.75,0.9), data = engel)
plot(summary(fit1))
I have a problem because the regression is forced through the origin. How can I get the same plots as in the first code for the quantile regression without the intercept?

Plotting a ROC curve from a random forest classification

I'm trying to plot ROC curve of a random forest classification. Plotting works, but I think I'm plotting the wrong data since the resulting plot only has one point (the accuracy).
This is the code I use:
set.seed(55)
data.controls <- cforest_unbiased(ntree=100, mtry=3)
data.rf <- cforest(type ~ ., data = dataset ,controls=data.controls)
pred <- predict(data.rf, type="response")
preds <- prediction(as.numeric(pred), dataset$type)
perf <- performance(preds,"tpr","fpr")
performance(preds,"auc")@y.values
confusionMatrix(pred, dataset$type)
plot(perf,col='red',lwd=3)
abline(a=0,b=1,lwd=2,lty=2,col="gray")
To plot a receiver operating characteristic (ROC) curve you need to hand over continuous output of the classifier, e.g. posterior probabilities. That is, you need to call predict(data.rf, newdata, type = "prob").
Predicting with type = "response" already gives you the "hardened" factor as output. Thus, your working point is implicitly fixed already. With respect to that, your plot is correct.
Side note: in-bag predictions of random forests will be highly overoptimistic!
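To see why hardened labels collapse the curve to a single working point, here is a base-R sketch. The helper roc_points is hypothetical, and a logistic regression on simulated data stands in for the forest. Only the contrast between continuous scores and hard labels matters here.

```r
# Hypothetical helper: ROC points (FPR, TPR) for every threshold on `score`
roc_points <- function(score, label) {
  thr <- sort(unique(score), decreasing = TRUE)
  tpr <- sapply(thr, function(t) mean(score[label == 1] >= t))
  fpr <- sapply(thr, function(t) mean(score[label == 0] >= t))
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(1.5 * x))

# Stand-in classifier (a logistic regression, not a random forest)
fit  <- glm(y ~ x, family = binomial)
prob <- predict(fit, type = "response")   # continuous scores
hard <- as.numeric(prob > 0.5)            # "hardened" class labels

roc_soft <- roc_points(prob, y)   # one point per distinct score
roc_hard <- roc_points(hard, y)   # (0,0), one working point, (1,1)

# Full curve from the scores, single working point from the hard labels
plot(roc_soft$fpr, roc_soft$tpr, type = "s",
     xlab = "False positive rate", ylab = "True positive rate")
points(roc_hard$fpr, roc_hard$tpr, col = "red", pch = 19)
abline(a = 0, b = 1, lty = 2, col = "gray")
```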
