I have a model that I've fitted using splines:
ssfit.3 <- smooth.spline(anage$lifespan ~ log(anage$Metabolic.by.mass),
df = 3)
I'm trying to obtain the model diagnostics such as the residual plot and the QQ plot for this model. I know for a linear model you can do
plot(fit)  # where fit is an lm object
which outputs all the diagnostic plots. How can I do this for spline models, since plot(ssfit.3) does not produce the same plots?
Extract the residuals and use qqnorm()/qqline().
example(smooth.spline) ## to get a model to work with
qqnorm(residuals(s2m))
qqline(residuals(s2m))
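If you also want the residuals-vs-fitted plot, here is a minimal sketch that mimics the first two panels of plot() for an lm fit. It uses simulated x and y as a stand-in for the anage columns (hypothetical data) and relies on the fitted() and residuals() methods for smooth.spline objects, which need the default keep.data = TRUE:

```r
# Simulated stand-in for the anage data (hypothetical; substitute your own x and y)
set.seed(1)
x <- runif(100, 0, 10)
y <- sin(x) + rnorm(100, sd = 0.3)

fit <- smooth.spline(x, y, df = 3)

fitted_vals <- fitted(fit)   # fitted values at the observed x
res <- residuals(fit)        # observed y minus fitted values

op <- par(mfrow = c(1, 2))
# Residuals vs fitted (like plot(lm_fit, which = 1)), with a lowess smoother
plot(fitted_vals, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, col = "grey")
lines(lowess(fitted_vals, res), col = "red")
# Normal QQ plot of the residuals (like plot(lm_fit, which = 2))
qqnorm(res)
qqline(res)
par(op)
```

With your own model, replace fit with ssfit.3; the rest is unchanged.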
I am able to fit a multiple linear regression model using
lmex <- lm(h_egfr_cystc96 ~ overweightlogblood + age_96, data = overweight)
This gives me the intercept, the estimates, p-values, etc.
I want to plot a single regression line for one of my variables: overweightlogblood
If I use
ggplot(overweight,aes(y=h_egfr_cystc96,x=overweightlogblood))+geom_point()+geom_smooth(method="lm")
It gives me a nice plot, but this is for a simple regression with one predictor. I would like the plot to show a regression line (with 95% CI) based on the intercept and the estimate of a single covariate in the multiple regression model. Any ideas?
Thank you in advance!
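One common approach (a sketch, not the only option) is to predict from the fitted multiple regression over a grid of overweightlogblood while holding age_96 fixed at its mean, and draw that prediction with its 95% confidence band. The data below are simulated placeholders with the question's column names, and holding age_96 at its mean is an assumption you may want to change:

```r
# Simulated placeholder data with the question's column names (hypothetical)
set.seed(42)
overweight <- data.frame(
  overweightlogblood = rnorm(80, 1, 0.5),
  age_96 = rnorm(80, 50, 10)
)
overweight$h_egfr_cystc96 <- 90 - 8 * overweight$overweightlogblood -
  0.3 * overweight$age_96 + rnorm(80, sd = 5)

lmex <- lm(h_egfr_cystc96 ~ overweightlogblood + age_96, data = overweight)

# Grid over the covariate of interest; the other covariate is held at its mean
grid <- data.frame(
  overweightlogblood = seq(min(overweight$overweightlogblood),
                           max(overweight$overweightlogblood),
                           length.out = 100),
  age_96 = mean(overweight$age_96)
)
# Columns fit, lwr, upr: prediction and 95% confidence limits for the mean
band <- predict(lmex, newdata = grid, interval = "confidence", level = 0.95)

plot(h_egfr_cystc96 ~ overweightlogblood, data = overweight)
lines(grid$overweightlogblood, band[, "fit"], col = "blue", lwd = 2)
lines(grid$overweightlogblood, band[, "lwr"], lty = 2)
lines(grid$overweightlogblood, band[, "upr"], lty = 2)
```

The line's slope is the overweightlogblood coefficient from lmex, while its height depends on the value chosen for age_96. The same grid and band columns can be passed to ggplot2's geom_line() and geom_ribbon() if you prefer that style. Note that the points are still the raw data, not adjusted for age_96.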
I am working on a random forest regression model and want to judge whether or not there is heteroscedasticity in the model.
When I develop a linear model, I can see heteroscedasticity in the residual plot, and I want to check a similar residual plot for the random forest model.
I am working in R.
It's an expense model based on Income, Branch and TotalFamilyMember.
We can recreate the residuals-vs-fitted plot from the model's predicted values:
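#Using the regression example from ?randomForest
library(randomForest)
#Drop rows with NAs: the default na.action = na.fail stops on them
airq <- na.omit(airquality)
set.seed(131)
ozone.rf <- randomForest(Ozone ~ ., data=airq, mtry=3,
importance=TRUE)
#Find residuals by subtracting the (out-of-bag) predicted values from the actual values
err <- airq$Ozone - ozone.rf$predicted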
#Make data frame holding residuals and fitted values
df <- data.frame(Residuals=err, Fitted.Values=ozone.rf$predicted)
#Sort data by fitted values
df2 <- df[order(df$Fitted.Values),]
#Create plot
plot(Residuals~Fitted.Values, data=df2)
#Add a horizontal reference line at 0 in grey (palette color 8)
abline(0,0, col=8)
#Add a lowess smoother, as plot.lm does, in red (palette color 2)
lines(lowess(df2$Fitted.Values, df2$Residuals), col=2)
Update
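There is a quicker way. Since the plot is just the first diagnostic plot of a regression of residuals on fitted values, this gives essentially the same picture (strictly, plot.lm shows the residuals of that auxiliary regression against its own fitted values, so it matches the plot above only when the auxiliary intercept and slope are close to zero):
fitted.values <- ozone.rf$predicted
residuals <- ozone.rf$y - fitted.values
plot(lm(residuals ~ fitted.values), which=1)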
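To put a number on what the plot shows, one informal check (a Glejser-style auxiliary regression, not part of the answer above) is to regress the absolute residuals on the fitted values: a clearly positive slope means the spread grows with the prediction. The sketch uses simulated heteroscedastic data, and spread_check is a hypothetical helper name. Because out-of-bag random forest residuals do not satisfy the usual OLS assumptions, treat the p-value as a rough guide only:

```r
# Informal Glejser-style check: regress |residuals| on the predictions.
# A clearly positive slope suggests the residual spread grows with the fit.
spread_check <- function(observed, predicted) {
  res <- observed - predicted
  aux <- lm(abs(res) ~ predicted)
  coef(summary(aux))["predicted", ]
}

# Simulated heteroscedastic data (hypothetical stand-in for airq$Ozone and
# ozone.rf$predicted): the noise sd is proportional to the mean
set.seed(7)
mu <- 2 * runif(500, 1, 10)
obs <- mu + rnorm(500, sd = 0.25 * mu)

out <- spread_check(obs, mu)
print(out)
```

With the model above, the call would be spread_check(airq$Ozone, ozone.rf$predicted).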
I am using random forest for a regression problem to predict the label values Test-Y for a given set Test-X (new feature values). The model has been trained on a given Train-X (features) and Train-Y (labels). R's "randomForest" serves me very well in predicting the numerical values of Test-Y, but that is not all I want.
Instead of only a number, I want random forest to produce a probability density function. I searched for a solution for several days, and here is what I have found so far:
"randomForest" doesn't produce probabilities for regression, only for classification (via "predict" with type="prob").
Using "quantregForest" provides a nice way to make and visualize prediction intervals. But still not the probability density function!
Any other thought on this?
Please see the predict.all parameter of the predict.randomForest function.
library("ggplot2")
library("randomForest")
data(mpg)
rf = randomForest(cty ~ displ + cyl + trans, data = mpg)
# Predict the first car in the dataset
pred = predict(rf, newdata = mpg[1, ], predict.all = TRUE)
hist(pred$individual)
This draws a histogram of the 500 "elementary" predictions for that car, one from each tree (randomForest grows 500 trees by default).
You can also use quantregForest with a very fine grid of quantiles, convert them into a cumulative distribution function (cdf) with the R function ecdf, and turn this cdf into a density estimate with a kernel density estimator.
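One way to read that last suggestion, as a sketch: treat a fine, equally spaced grid of predicted quantiles for one observation as a sample from the predictive distribution, inspect it with ecdf(), and smooth it with density(). Here normal quantiles stand in for quantregForest output (hypothetical; this is not tied to a specific quantregForest call):

```r
# Stand-in for predicted quantiles on a fine grid of probabilities
# (hypothetical: in practice these would come from quantregForest for one row)
probs <- seq(0.005, 0.995, by = 0.005)
q <- qnorm(probs, mean = 20, sd = 3)

# Empirical CDF of the predicted quantiles
F_hat <- ecdf(q)

# Smooth the equally spaced quantiles with a kernel density estimator
d <- density(q)

op <- par(mfrow = c(1, 2))
plot(F_hat, main = "CDF from quantiles")
plot(d, main = "Density from quantiles")
par(op)
```

Because the quantiles are equally spaced in probability, each one carries the same weight, which is what makes treating them as a sample reasonable.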
I have a logistic regression model (using R) as
fit6 <- glm(formula = survived ~ ascore + gini + failed, data=records, family = binomial)
summary(fit6)
I'm using the pROC package to draw ROC curves and compute the AUC for 6 models, fit1 through fit6.
This is how I plot a single ROC curve:
prob6=predict(fit6,type=c("response"))
records$prob6 = prob6
g6 <- roc(survived~prob6, data=records)
plot(g6)
Is there a way to combine all 6 ROC curves in one plot and display the AUC for each of them, and, if possible, the confidence intervals too?
You can use the add = TRUE argument of the plot function to plot multiple ROC curves.
Make up some fake data
library(pROC)
a=rbinom(100, 1, 0.25)
b=runif(100)
c=rnorm(100)
Get model fits
fit1=glm(a~b+c, family='binomial')
fit2=glm(a~c, family='binomial')
Predict on the same data you trained the model with (or hold some out to test on if you want)
preds=predict(fit1)
roc1=roc(a ~ preds)
preds2=predict(fit2)
roc2=roc(a ~ preds2)
Plot it up.
plot(roc1)
plot(roc2, add=TRUE, col='red')
This draws the different fits on the same plot. You can get the AUC of each curve with roc1$auc and add it either with the text() function in base R graphics or, perhaps more simply, in the legend.
I don't know how to quantify confidence intervals...or if that is even a thing you can do with ROC curves. Someone else will have to fill in the details on that one. Sorry. Hopefully the rest helped though.
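For the confidence intervals, pROC itself provides ci.auc(), which by default uses the DeLong method for a 95% interval. A sketch that extends the fake-data example to a list of ROC curves with AUC and CI in the legend (c is renamed to x only to avoid masking base::c(); the colors and legend text are arbitrary choices):

```r
library(pROC)

# Fake data, as above
set.seed(1)
a <- rbinom(100, 1, 0.25)
b <- runif(100)
x <- rnorm(100)

fit1 <- glm(a ~ b + x, family = binomial)
fit2 <- glm(a ~ x, family = binomial)

# Keep the ROC objects in a named list so any number of models can be looped over
rocs <- list(fit1 = roc(a, predict(fit1, type = "response")),
             fit2 = roc(a, predict(fit2, type = "response")))
cols <- c("black", "red")

# 95% CI for each AUC: a vector of (lower, AUC, upper)
cis <- lapply(rocs, ci.auc)

plot(rocs[[1]], col = cols[1])
for (i in seq_along(rocs)[-1]) plot(rocs[[i]], add = TRUE, col = cols[i])

leg_text <- sprintf("%s: AUC %.2f (95%% CI %.2f-%.2f)", names(rocs),
                    sapply(cis, `[`, 2), sapply(cis, `[`, 1), sapply(cis, `[`, 3))
legend("bottomright", legend = leg_text, col = cols, lwd = 2)
```

For the six models in the question, the list would hold roc(survived ~ prob1, data = records) through roc(survived ~ prob6, data = records), with six colors.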
I've fitted a COZIGAM model, which is like a GAM but for zero-inflated data.
My model is:
library(COZIGAM)
t5.co<-cozigam(z1~s(y1, bs="cr")+s(lon1,lat1,sst1)+s(sst1,bs="cr"),
conv.crit.out=1e-3, family=poisson, data=data1)
How could I represent the second spline (s(lon1,lat1,sst1)) with a 3d plot?
I've tried this:
plot(t5.co, select=2, plot.2d="persp")
But it does not work.
Thanks!!
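A smooth of three covariates such as s(lon1,lat1,sst1) is a function of three variables, so a single perspective plot can only show a slice of it, for example lon1 by lat1 at a fixed sst1. I have not checked what COZIGAM's plot method supports, so as a sketch of the idea, here is the slicing done by hand with mgcv::gam on simulated data (a swapped-in model and hypothetical data, not COZIGAM). If the COZIGAM fit can predict on new data, the same grid approach should carry over, but that is an assumption:

```r
library(mgcv)

# Simulated stand-in data (hypothetical); the real model is COZIGAM, not gam
set.seed(3)
n <- 400
dat <- data.frame(lon1 = runif(n), lat1 = runif(n), sst1 = runif(n, 10, 20))
dat$z1 <- rpois(n, exp(1 + sin(2 * pi * dat$lon1) * cos(2 * pi * dat$lat1) +
                         0.05 * (dat$sst1 - 15)))

fit <- gam(z1 ~ s(lon1, lat1, sst1, k = 30), family = poisson, data = dat)

# Fix sst1 at its median and evaluate the linear predictor on a lon/lat grid
lon_seq <- seq(0, 1, length.out = 30)
lat_seq <- seq(0, 1, length.out = 30)
grid <- expand.grid(lon1 = lon_seq, lat1 = lat_seq, sst1 = median(dat$sst1))
eta <- predict(fit, newdata = grid, type = "link")

# expand.grid varies lon1 fastest, so z[i, j] corresponds to (lon_seq[i], lat_seq[j])
z <- matrix(eta, nrow = length(lon_seq), ncol = length(lat_seq))

persp(lon_seq, lat_seq, z, theta = 30, phi = 30,
      xlab = "lon1", ylab = "lat1", zlab = "linear predictor")
```

For mgcv fits, vis.gam(fit, view = c("lon1", "lat1"), cond = list(sst1 = 15), plot.type = "persp") draws a similar slice in one call. Repeating the plot for a few sst1 values shows how the surface changes with temperature.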