Calculating AUC from nnet model - r

For a bit of background, I am using the nnet package building a simple neural network.
My dataset has a number of factor and continuous features. To handle the continuous variables I scale and center them, i.e. subtract each variable's mean and divide by its standard deviation.
I'm trying to produce an ROC curve and AUC from the results of the neural network model.
Below is the code used to build my basic neural network model:
model1 <- nnet(Cohort ~ . - Cohort,
               data = train.sample,
               size = 1)
To get some predictions, I call the following function:
train.predictions <- predict(model1, train.sample)
Now, this assigns a large matrix of 0 and 1 values to train.predictions. What I want is the class probability for each prediction, so that I can plot an ROC curve using the pROC package.
So, I tried adding the following parameter to my predict function:
train.predictions <- predict(model1, train.sample, type="prob")
But I get an error:
Error in match.arg(type) : 'arg' should be one of “raw”, “class”
How can I get class probabilities out of the model's predictions?

Assuming your test/validation data set is in train.test, and train.labels contains the true class labels:
train.predictions <- predict(model1, train.test, type="raw")
## This might not be necessary:
detach(package:nnet,unload = T)
library(ROCR)
## train.labels:= A vector, matrix, list, or data frame containing the true
## class labels. Must have the same dimensions as 'predictions'.
## computing a simple ROC curve (x-axis: fpr, y-axis: tpr)
pred = prediction(train.predictions, train.labels)
perf = performance(pred, "tpr", "fpr")
plot(perf, lwd=2, col="blue", main="ROC - Title")
abline(a=0, b=1)
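Since the question asks for AUC as well as the curve, ROCR can compute it from the same prediction object built above:
## AUC as a single number, from the pred object defined above
auc <- performance(pred, "auc")@y.values[[1]]
auc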

Related

Plotting precision@k and recall@k in ROCR (R)

I'm evaluating a binary classifier in R with the ROCR package. My classifier outputs a score between 0 and 1 for target 0/1 labels.
I'd like to plot precision and recall @ k but can't find a way to do it. Calling performance() without specifying the x-axis measure plots precision against the score cutoff:
library(ROCR)
#df <- a two-dimensional dataframe with prediction scores and actual labels of my classifier
pred <- prediction(df$score, df$label)
pr_curve <- performance(pred, measure="prec")
For precision (or recall) at k, I'd need to plot the precision against the rank of each prediction, ordered by descending score:
pred <- prediction(df$score, df$label)
pr_curve <- performance(pred, measure="prec", x.measure="rank") #but there seems to be no "rank" in ROCR!
Is there a way to do this in ROCR? I'm open to using alternative libraries if not.
Load libraries and define train and test set:
library(mlbench)
library(e1071)
library(ROCR)
data(BreastCancer)
df = BreastCancer
idx = sample(1:nrow(df),150)
trn = df[idx,]
test = df[-idx,]
Fit a naive Bayes model:
fit = naiveBayes(Class ~ .,data=trn)
The manual for performance states:
Precision/recall graphs: measure="prec", x.measure="rec".
Plot precision-recall:
pred = prediction(predict(fit,test,type="raw")[,2],test$Class)
#plot to see it is working correctly:
plot(performance(pred,measure="prec",x.measure="rec"))
Now, for your case of doing it at k, we can also compute precision and recall from scratch:
#combine prob, predicted labels, and actual labels
res = data.frame(prob = predict(fit,test,type="raw")[,2],
                 predicted_label = predict(fit,test),
                 label = test$Class)
res = res[order(res$prob,decreasing=TRUE),]
res$rank = 1:nrow(res)
# recall: cumulative fraction of all malignant cases recovered within the top k
res$recall = cumsum(res$label=="malignant")/sum(res$label=="malignant")
# precision: fraction of the top-k predictions that are actually malignant
res$precision = cumsum(res$label=="malignant")/res$rank
# check the two plots
par(mfrow=c(1,2))
plot(performance(pred,measure="prec",x.measure="rec"))
plot(res$recall,res$precision,type="l")
Now that you have verified it is correct, getting or plotting precision at k is simply:
par(mfrow=c(1,2))
with(res,
     plot(rank, precision, main="self-calculated", type="l"))
plot(pred@n.pos.pred[[1]],
     pred@tp[[1]]/(pred@fp[[1]]+pred@tp[[1]]),
     type="l", main="from ROCR")
I am not aware of a way to do this through the performance plotting interface, but you can use the slots stored in the prediction object: pred@tp holds the true positives, pred@fp the false positives, so tp / (tp + fp) gives precision, and pred@n.pos.pred essentially gives the rank.
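If you want the value at a single k rather than the whole curve, a small helper can read it straight off those slots (the function name here is mine, not part of ROCR):
## precision at rank k, read from the ROCR prediction object's slots
precision_at_k <- function(pred, k) {
  ranks <- pred@n.pos.pred[[1]]                         # positives predicted at each cutoff
  prec  <- pred@tp[[1]] / (pred@tp[[1]] + pred@fp[[1]])
  prec[which(ranks >= k)[1]]                            # first cutoff reaching at least k
}
precision_at_k(pred, 10)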

predicted probability plot with robust errors for logit model

I'm trying to make a predicted probability plot for a logit model, using clustered robust standard errors. Supposedly the margins package should let you do this, using cplot(), but there seems to be a bug, such that cplot() doesn't recognize the optional vcov input. Below is a minimum working example. Does anyone know how to fix the bug or do this another way?
require("margins")
require("sandwich")
##Generating random numbers
set.seed(10)
y <- factor(rbinom(n=1000, size=1, prob=.5))
x <- rnorm(n=1000, mean=100, sd=1)
z <- rbinom(n=1000, size=3, prob=.5)
#creating a "dataset"
dta <- data.frame(x, y, z)
##Basic logit model
model <- glm(y ~ x, family = binomial(link = "logit"), data = dta)
##Creating variance-covariance matrix, clustered by z
vcov <- vcovCL(model, cluster=z)
##Making a plot
cplot(model,"x",vcov=vcov,what="prediction")
#can see below that vcov has no effect (if not obvious from plot)
print(cplot(model,"x",vcov=vcov,what="prediction",draw=FALSE))
print(cplot(model,"x",what="prediction",draw=FALSE))
You could use the following code:
# Predict values
pred.dta <- ggeffects::ggpredict(
  model = model,
  terms = "x [all]",
  vcov.fun = "vcovCL",
  vcov.type = "HC1",
  vcov.args = list(cluster = z)
)
# Plot predictions
ggplot2::ggplot(data = pred.dta,
                ggplot2::aes(x = x, y = predicted)) +
  ggplot2::geom_line() +
  ggplot2::geom_errorbar(ggplot2::aes(ymin = conf.low, ymax = conf.high), width = .1)
For comparison, this is the same code but without the clustered errors:
# Predict values
pred.dta <- ggeffects::ggpredict(
  model = model,
  terms = "x [all]"
)
# Plot predictions
ggplot2::ggplot(data = pred.dta,
                ggplot2::aes(x = x, y = predicted)) +
  ggplot2::geom_line() +
  ggplot2::geom_errorbar(ggplot2::aes(ymin = conf.low, ymax = conf.high), width = .1)
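If you would rather not depend on ggeffects, the band can be built by hand from the robust covariance matrix; a minimal sketch, assuming the model, dta, and vcov objects from the question (95% Wald interval on the link scale, back-transformed):
## Hand-rolled predicted-probability band from the clustered vcov
X <- model.matrix(model)                    # design matrix of the fitted glm
eta <- drop(X %*% coef(model))              # linear predictor
se.eta <- sqrt(rowSums((X %*% vcov) * X))   # robust SE of eta: diag(X V X')
ord <- order(dta$x)
inv <- model$family$linkinv                 # inverse logit
plot(dta$x[ord], inv(eta[ord]), type = "l",
     xlab = "x", ylab = "Predicted probability")
lines(dta$x[ord], inv(eta[ord] + 1.96 * se.eta[ord]), lty = 2)
lines(dta$x[ord], inv(eta[ord] - 1.96 * se.eta[ord]), lty = 2)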

R: varying-coefficient GAMM models in mgcv - extracting 'by' variable coefficients?

I am fitting a varying-coefficient GAMM with 'mgcv' in R, using a continuous 'by' variable. However, I am having difficulty locating the parameter estimate of the effect of the 'by' variable. In this example we determine the spatially-varying effect of temperature t on sole eggs (i.e. how the linear effect of temperature on sole eggs changes across space):
require(mgcv)
require(gamair)
data(sole)
b = gam(eggs ~ s(la,lo) + s(la,lo, by = t), data = sole)
We can then plot the predicted effects of s(la,lo, by = t) against the predictor t:
pred <- predict(b, type = "terms", se.fit =T)
by.variable.prediction <- pred[[1]][,2]
plot(x= sole$t, y = by.variable.prediction)
However, I can't find a listing/function with the parameter estimates of the 'by' variable t for each sampling location. summary(), coef(), and predict() do not give you the parameter estimates.
Any help would be appreciated!
The coefficient for the variable t is the predicted value where t equals 1, conditional on latitude and longitude. So one way to get the coefficient/parameter estimate for t at each latitude and longitude is to construct your own data frame covering a grid of latitude/longitude combinations with t = 1, and run predict.gam on that (rather than on the data used to fit the model, as you have done):
preddf <- expand.grid(list(la = seq(min(sole$la), max(sole$la), length.out = 100),
                           lo = seq(min(sole$lo), max(sole$lo), length.out = 100),
                           t = 1))
preddf$parameter <- predict(b, preddf, type = "response")
And then if you want to visualize this coefficient over space, you could graph it with ggplot2.
library(ggplot2)
ggplot(preddf) +
  geom_tile(aes(x = lo, y = la, fill = parameter))
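One caveat worth adding (my note, not part of the original answer): with type = "response" the prediction at t = 1 also includes the intercept and the s(la,lo) main effect, so it is not the bare coefficient. mgcv can return each smooth's contribution separately via type = "terms"; the column for the by-smooth, evaluated at t = 1, is then the varying coefficient itself (check colnames() for the exact label):
## Isolate the varying coefficient via the terms-level prediction
term.pred <- predict(b, preddf, type = "terms")
colnames(term.pred)                  # the by-term is typically labelled "s(la,lo):t"
preddf$coef_t <- term.pred[, "s(la,lo):t"]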

ROC curve - model performance error

I am trying to plot an ROC curve to show my model's performance. The model is fitted using the randomForest package:
prediction <- predict(fit, test, type="prob")
pred <- prediction(test$prediction, test$flag_cross_over )
pred2 <- prediction(abs(test$prediction +
                        rnorm(length(test$prediction), 0, 0.1)), flag_cross_over)
perf <- performance( pred, "tpr", "fpr" )
perf2 <- performance(pred2, "tpr", "fpr")
plot( perf, colorize = TRUE)
plot(perf2, add = TRUE, colorize = TRUE)
Using the test data I am trying to check the model's performance. The prediction column holds the predictions made, and flag_cross_over holds the labels for the model.
The error says:
Number of cross-validation runs must be equal for predictions and labels.
The prediction's dimensions are 410 x 2; the labels' dimensions are 410 x 1. I am unsure why the prediction has one extra column.

Calculate AUC and GAM and set a scale in R

I have a data form as follows:
x y chla sst ssha eke tuna
: : : : : : :
: : : : : : :
I used a GAM model as follows:
GAM <- gam(tuna~s(chla), family = binomial, data = nonLinear)
Using this model, I can process the data for chla, sst and ssha. But when I processed the eke data it did not work; R gave me the following error:
error in eval(expr, envir, enclos) : object `eke` not found.
Can anybody help me solve this problem? I have already installed the ROCR package to calculate the AUC, but I do not know the syntax for calculating it. Can anybody help with that too?
I also used the following command to make a graph:
plot(GAM, xlab=..., ylab=..., font.lab=..., shade=...)
But when I run that command, the result is not so good: the scale on the y-axis is very strange. How do I set the y-axis and x-axis scales to intervals of 1 and 5 (for instance), respectively?
Since you didn't include any test data, I will use the test data in the gam package to calculate AUC and plot an ROC curve.
library(gam)
library(ROCR)
#sample binomial regression
data(kyphosis)
GAM<-gam(Kyphosis ~ poly(Age,2) + s(Start), data=kyphosis, family=binomial)
#get the predicted probabilities for each sample
gampred <- predict(GAM, type="response")
#make a ROCR prediction object using the predicted values from
# our model and the true values from the real data
rp <- prediction(gampred, kyphosis$Kyphosis)
#now calculate AUC
auc <- performance(rp, "auc")@y.values[[1]]
auc
#now plot the ROC curve
roc <- performance(rp, "tpr", "fpr")
plot( roc )
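The axis-scale part of the question wasn't covered above; a common approach is to suppress the default axes and redraw them with axis() at whatever step size you want. A sketch, assuming mgcv's plot.gam (the shade argument in the question suggests mgcv) and placeholder tick ranges you should adapt to your data:
## suppress default axes, then redraw at chosen intervals
plot(GAM, shade = TRUE, xaxt = "n", yaxt = "n")
axis(1, at = seq(0, 50, by = 5))   # x-axis ticks every 5 units (adjust range)
axis(2, at = seq(-5, 5, by = 1))   # y-axis ticks every 1 unit (adjust range)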
