How to visualize natural cubic spline (ns) in the GAM - plot

> # load packages first (gamSim() comes from mgcv)
> library(mgcv)
> library(splines)
> # simulate some data...
> dat <- gamSim(1,n=400,dist="normal",scale=2)
> # fit model & plot
> b0 <- gam(y~s(x1),data=dat)
> plot(b0)
Following the code above, I get a plot of the estimated smooth of x1.
Now, I want to get a similar plot using the ns() function in GAM:
> b1 <- gam(y ~ ns(x1), data=dat)
> plot(b1)
But when I run this code, R reports "No terms to plot". How can I produce the equivalent plot for the ns() model? Thanks!

Because ns() is not a (penalised) spline specified via s(), te(), t2() or ti(), it is not a member of class "mgcv.smooth". When you plot the fitted GAM object, the code looks to see if there are any mgcv smooths to plot. All other terms in the model are parametric terms, including your natural spline. If you do summary(b1) you'll see the ns() term in the "Parametric coefficients" section of the output.
Basically, gam() is just looking at your model as if it were a bunch of linear parametric terms. It doesn't know that those terms in the model matrix map to basis functions and hence to a natural spline.
Visualisation is not easy; plot(b1, all.terms = TRUE) will plot the linear effects of each basis function, so at least you see something, but typically this is not what you want. You will have to predict from the model over the range of the covariate x1 and then plot the predicted values against the grid of x1 values.
This raises the question: what were you expecting gam() to do with the ns() basis?
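A minimal sketch of the predict-and-plot approach described above. The choice of df = 4, the 200-point grid, and the names new_x and p are assumptions made for illustration. Note that this draws predictions on the response scale (intercept included), not the centred partial effect that plot(b0) shows.

```r
library(mgcv)
library(splines)

set.seed(1)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)

# Natural spline fitted as an ordinary parametric term (df = 4 is an assumption)
b1 <- gam(y ~ ns(x1, df = 4), data = dat)

# Grid of x1 values spanning the observed range
new_x <- data.frame(x1 = seq(min(dat$x1), max(dat$x1), length.out = 200))

# Predicted values and standard errors on the grid
p <- predict(b1, newdata = new_x, se.fit = TRUE)
fit <- as.numeric(p$fit)
se  <- as.numeric(p$se.fit)

# Data, fitted curve, and an approximate +/- 2 standard error band
plot(y ~ x1, data = dat, col = "grey", pch = 16, cex = 0.6)
lines(new_x$x1, fit, lwd = 2)
lines(new_x$x1, fit + 2 * se, lty = 2)
lines(new_x$x1, fit - 2 * se, lty = 2)
```

If the model contained other covariates, they would also need columns in new_x, typically held at a fixed value such as their mean.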

Related

How to obtain the QQ plot of a spline model R

I have a model that I've fitted using splines:
ssfit.3 <- smooth.spline(anage$lifespan ~ log(anage$Metabolic.by.mass), df = 3)
I'm trying to obtain the model diagnostics such as the residual plot and the QQ plot for this model. I know for a linear model you can do
plot(lm)
which outputs all the different plots. How can I do this with spline models since plot(ssfit.3) does not output the same?
Extract the residuals and use qqnorm()/qqline().
example(smooth.spline) ## to get a model to work with
qqnorm(residuals(s2m))
qqline(residuals(s2m))
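For a self-contained version, here is a minimal sketch using the built-in cars data (an assumption, since the anage data are not available here). It draws a residuals-versus-fitted plot next to the normal QQ plot, which covers two of the panels that plot() produces for an lm fit.

```r
# Smoothing spline with 3 effective degrees of freedom on a built-in dataset
fit <- with(cars, smooth.spline(speed, dist, df = 3))

res <- residuals(fit)  # one residual per original observation
fv  <- fitted(fit)     # fitted value per original observation

op <- par(mfrow = c(1, 2))
# Residuals vs fitted values
plot(fv, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, col = "grey")
# Normal QQ plot of the residuals
qqnorm(res)
qqline(res)
par(op)
```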

View slope of computed line in the avPlots function

I'm using the avPlots function in R. The function places a fitted line on the graph, and I'm wondering if there is a way to view the equation of that line. I know I could reproduce it computationally using the lm function, but I'm curious if there is a way to view the "back-end" code being used to produce the graph.
Thanks!
Below is some code. The function takes a linear model followed by the variables you want to create avPlots for (all against the regressor).
avPlots(mlm1,terms=~pctUrban+log(ppgdp))
I am not very familiar with Added-Variable Plots, but had an idea, though I'm not entirely sure what you are looking for. I hope this might be helpful.
Say you have an example using a linear model lm such as this (also from the car package):
library(car)
res <- avPlots(lm(prestige~income+education+type, data=Duncan))
This includes data on the prestige and other characteristics of 45 U.S. occupations in 1950.
The returned data res will have the data points for each of the four plots generated (see below). The avPlot function uses lsfit (least squares fit) for the fitted line. This can also be done from the returned data for each factor (e.g., for typeprof):
fit <- lsfit(res$typeprof[,1], res$typeprof[,2])
You could then get your slope from the coefficients (16.7):
fit$coefficients
   Intercept            X
4.178364e-16 1.665751e+01
As mentioned, this would give the same slopes from the lm model:
Call:
lm(formula = prestige ~ income + education + type, data = Duncan)

Coefficients:
(Intercept)       income    education     typeprof       typewc
    -0.1850       0.5975       0.3453      16.6575     -14.6611
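Why the slopes agree can be checked by hand in base R: an added-variable plot regresses the residuals of the response on the residuals of the focal predictor, each taken after adjusting for the other predictors. The sketch below uses the built-in mtcars data and the names e_y, e_x and av_fit, all of which are assumptions for illustration and not part of the car package.

```r
# Full model: the coefficient of wt is the slope we want to recover
full <- lm(mpg ~ wt + hp, data = mtcars)

# Residuals of the response and of the focal predictor,
# each after regressing on the remaining predictor (hp)
e_y <- resid(lm(mpg ~ hp, data = mtcars))
e_x <- resid(lm(wt ~ hp, data = mtcars))

# The line drawn in an added-variable plot is this residual-on-residual fit
av_fit <- lm(e_y ~ e_x)

coef(av_fit)["e_x"]  # slope of the added-variable line
coef(full)["wt"]     # same value as the coefficient in the full model

plot(e_x, e_y, xlab = "wt | others", ylab = "mpg | others")
abline(av_fit)
```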

Plot Non-linear Mixed Model Over Original Fitted Data

I'm trying to plot the curve that results from fitting a non-linear mixed model. It should look something like a normal distribution curve, but skewed to the right. I followed previous links here and here, but I cannot make it work with my data because of several difficulties (see below).
Here is the dataset
and code
s=read.csv("GRVMAX tadpoles.csv")
t=s[s$SPP== levels(s$SPP)[1],]
head(t)
vmax=t[t$PERFOR=="VMAX",]
colnames(vmax)[6]="vmax"
vmax$TEM=as.numeric(as.character(vmax$TEM));
require(lme4)
start =c(TEM=25)
is.numeric(start)
nm1 <- nlmer ( vmax ~ deriv(TEM)~TEM|INDIVIDUO,nlpars=start, nAGQ =0,data= vmax) # this gives an error suggesting nlpars is not numeric, even though start is numeric
After that, I want to plot the curve over the original data
with(vmax,plot(vmax ~ (TEM)))
x=sort(unique(vmax$TEM)) # sorted temperatures, so lines() draws the curve left to right
lines(x, predict(nm1, newdata = data.frame(TEM = x, INDIVIDUO = "ACI5")))
Any hint?
Thanks in advance
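The overlay step itself (predict on a sorted grid, then add lines() to the scatter plot) can be sketched independently of the failing nlmer() call. The example below is an assumption-laden stand-in: it uses a fixed-effects nls() fit on the built-in Puromycin data with the self-starting SSmicmen model, not a mixed model and not the tadpole data, and only illustrates the plotting pattern.

```r
# Built-in data: reaction rate vs substrate concentration (treated cells only)
treated <- subset(Puromycin, state == "treated")

# Fixed-effects nonlinear fit (a stand-in for the mixed model in the question)
fit <- nls(rate ~ SSmicmen(conc, Vm, K), data = treated)

# Sorted, evenly spaced grid over the observed covariate range
grid <- data.frame(conc = seq(min(treated$conc), max(treated$conc),
                              length.out = 100))
grid$rate_hat <- predict(fit, newdata = grid)

# Original data with the fitted curve on top
plot(rate ~ conc, data = treated)
lines(grid$conc, grid$rate_hat, lwd = 2)
```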

How can I get the probability density function from a regression random forest?

I am using random-forest for a regression problem to predict the label values of Test-Y for a given set of Test-X (new values of features). The model has been trained over a given Train-X (features) and Train-Y (labels). "randomForest" of R serves me very well in predicting the numerical values of Test-Y. But this is not all I want.
Instead of only a number, I want to use random-forest to produce a probability density function. I searched for a solution for several days, and here is what I have found so far:
"randomForest" doesn't produce probabilities for regression, only for classification (via "predict" with type="prob").
Using "quantregForest" provides a nice way to make and visualize prediction intervals. But still not the probability density function!
Any other thought on this?
Please see the predict.all parameter of the predict.randomForest function.
library("ggplot2")
library("randomForest")
data(mpg)
rf = randomForest(cty ~ displ + cyl + trans, data = mpg)
# Predict the first car in the dataset
pred = predict(rf, newdata = mpg[1, ], predict.all = TRUE)
hist(pred$individual)
This produces a histogram of the 500 "elementary" (per-tree) predictions.
You can also use quantregForest with a very fine grid of quantiles, convert them into a "cumulative distribution function (cdf)" with R-function ecdf and convert this cdf into a density estimation with a kernel density estimator.
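As a rough sketch of turning the per-tree predictions into a smooth density, the same predict.all output can be passed to the base R density() function. The use of mtcars, the chosen predictors, and the names tree_preds and dens are assumptions for illustration. The resulting curve describes the spread of the individual trees, which is not necessarily a calibrated predictive distribution.

```r
library(randomForest)

set.seed(42)
rf <- randomForest(mpg ~ disp + cyl + wt, data = mtcars, ntree = 500)

# Per-tree predictions for the first car in the dataset
pred <- predict(rf, newdata = mtcars[1, ], predict.all = TRUE)
tree_preds <- as.numeric(pred$individual)  # one value per tree

# Kernel density estimate over the 500 tree-level predictions
dens <- density(tree_preds)

plot(dens, main = "Density of per-tree predictions")
rug(tree_preds)
abline(v = pred$aggregate, lty = 2)  # the usual forest prediction
```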

Plotting a ROC curve from a random forest classification

I'm trying to plot ROC curve of a random forest classification. Plotting works, but I think I'm plotting the wrong data since the resulting plot only has one point (the accuracy).
This is the code I use:
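library(party)  # cforest(), cforest_unbiased()
library(ROCR)   # prediction(), performance()
library(caret)  # confusionMatrix()
set.seed(55)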
data.controls <- cforest_unbiased(ntree=100, mtry=3)
data.rf <- cforest(type ~ ., data = dataset ,controls=data.controls)
pred <- predict(data.rf, type="response")
preds <- prediction(as.numeric(pred), dataset$type)
perf <- performance(preds,"tpr","fpr")
performance(preds,"auc")@y.values
confusionMatrix(pred, dataset$type)
plot(perf,col='red',lwd=3)
abline(a=0,b=1,lwd=2,lty=2,col="gray")
To plot a receiver operating characteristic (ROC) curve you need to hand over continuous output of the classifier, e.g. posterior probabilities. That is, you need predict(data.rf, newdata, type = "prob").
Predicting with type = "response" already gives you the "hardened" factor as output. Thus, your working point is implicitly fixed already. In that respect, your plot is correct.
Side note: in-bag predictions of random forests will be highly overoptimistic!
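The difference between hardened labels and continuous scores can be seen in a small base R sketch. Everything here is an assumption for illustration: a logistic regression on mtcars stands in for the random forest, roc_points() is a hypothetical helper and not part of ROCR, and the predictions are made on the training data, so the curve is optimistic.

```r
# Hypothetical helper: sweep a threshold over the scores and record FPR/TPR
roc_points <- function(score, truth) {
  thr <- c(Inf, sort(unique(score), decreasing = TRUE))
  data.frame(
    fpr = sapply(thr, function(t) mean(score[truth == 0] >= t)),
    tpr = sapply(thr, function(t) mean(score[truth == 1] >= t))
  )
}

# Stand-in classifier: logistic regression of engine shape on mpg
fit  <- glm(vs ~ mpg, data = mtcars, family = binomial)
prob <- predict(fit, type = "response")  # continuous scores
hard <- as.numeric(prob > 0.5)           # hardened 0/1 labels

roc_soft <- roc_points(prob, mtcars$vs)  # many operating points
roc_hard <- roc_points(hard, mtcars$vs)  # (0,0), one working point, (1,1)

plot(roc_soft$fpr, roc_soft$tpr, type = "s", col = "red", lwd = 3,
     xlab = "False positive rate", ylab = "True positive rate")
points(roc_hard$fpr[2], roc_hard$tpr[2], pch = 19)  # the single hardened point
abline(a = 0, b = 1, lwd = 2, lty = 2, col = "gray")
```

With the hardened labels the "curve" collapses to the two corners plus one interior point, which matches the single point seen in the question's plot.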
