Calculate AUC for a GAM and set the axis scale in R

My data have the following form:
x y chla sst ssha eke tuna
: : : : : : :
: : : : : : :
I used a GAM model as follows:
GAM <- gam(tuna~s(chla), family = binomial, data = nonLinear)
With the model above I can process the data for chla, sst and ssha. But when I processed the eke data it did not work, and R gave me the following error:
error in eval(expr, envir, enclos) : object `eke` not found.
Can anybody help me solve this problem? I have also installed the ROCR package to calculate the AUC, but I do not know the syntax for doing so. Can anybody help me with that too?
I also used the following command to make a graph:
plot(GAM, xlab=..., ylab=..... font.lab= ...shade=....)
But when I run that command, the result is not good: the scale on the y-axis looks very odd. How do I set the tick intervals on the y-axis and x-axis to, for instance, 1 and 5 respectively?

Since you didn't include any test data, I will use the test data in the gam package to calculate AUC and plot an ROC curve.
library(gam)
library(ROCR)
#sample binomial regression
data(kyphosis)
GAM<-gam(Kyphosis ~ poly(Age,2) + s(Start), data=kyphosis, family=binomial)
#get the predicted probabilities for each sample
gampred <- predict(GAM, type="response")
#make a ROCR prediction object using the predicted values from
# our model and the true values from the real data
rp <- prediction(gampred, kyphosis$Kyphosis)
#now calculate AUC
auc <- performance(rp, "auc")@y.values[[1]]
auc
#now plot ROC curve
roc <- performance( rp, "tpr", "fpr")
plot( roc )
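The other two parts of the question (the `eke` error and the axis intervals) can be sketched in base R. This is a minimal sketch under stated assumptions: the data frame below is a made-up stand-in for `nonLinear`, the misspelled column `EKE` is only one guess at why `eke` is not found, and whether `plot(GAM, ...)` honours `xaxt`/`yaxt` depends on the plot method of the GAM package in use.

```r
# Hypothetical stand-in for the asker's data frame; the column names are assumptions.
nonLinear <- data.frame(chla = runif(20), sst = runif(20, 20, 30),
                        EKE = runif(20), tuna = rbinom(20, 1, 0.5))

# 1. "object 'eke' not found" usually means no column has exactly that name.
"eke" %in% names(nonLinear)   # FALSE here: the column is spelled "EKE"
names(nonLinear) <- tolower(trimws(names(nonLinear)))
"eke" %in% names(nonLinear)   # TRUE after normalising the names

# 2. Suppress the default axes, then add ticks at the intervals you want.
x <- seq(0, 30, by = 0.5)
y <- 3 * sin(x / 5)
plot(x, y, type = "l", xaxt = "n", yaxt = "n",
     xlim = c(0, 30), ylim = c(-4, 4))
x_ticks <- seq(0, 30, by = 5)   # x-axis ticks every 5 units
y_ticks <- seq(-4, 4, by = 1)   # y-axis ticks every 1 unit
axis(1, at = x_ticks)
axis(2, at = y_ticks)
```

The same two `axis()` calls could be tried after `plot(GAM, ...)`, with `at` chosen to cover the range of the fitted smooth.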

Related

Finding area under the curve in a Gompertz distribution in R

I am trying to fit a Gompertz model to survival data. I am using the package 'flexsurv', and after setting up the data I use the following specification:
Gompertz.fit <- flexsurvreg(surv.age ~ Region + Sex + Income,
data = SA_Data, dist = "gompertz")
As I want to estimate the area under the curve but couldn't find a command, I thought about estimating the AUC "by hand", for which I need the gamma parameter. However, I can't seem to find it. When I try survreg to estimate the parameters, I get the following error:
> result.survreg.0 <- survreg(Surv(Age_At_DC, status)~1, data = SA_Data, dist = "gompertz")
Error in match.arg(dist, names(survreg.distributions)) :
  'arg' should be one of “extreme”, “logistic”, “gaussian”, “weibull”, “exponential”, “rayleigh”, “loggaussian”, “lognormal”, “loglogistic”, “t”
Has anyone else estimated the AUC with a gompertz distribution in R?
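One way to approach the "by hand" calculation is numerical integration of the Gompertz survival function in base R. The sketch below assumes that "AUC" means the area under the survival curve up to a time limit, and that the model uses the shape/rate form with hazard h(t) = rate * exp(shape * t), which is, as far as I know, the parameterisation flexsurv uses (so the "gamma" being looked for would be reported as `shape`). The numeric values and `tau` are illustrative, not estimates from `SA_Data`.

```r
# Gompertz survival function for hazard h(t) = rate * exp(shape * t)
gompertz_surv <- function(t, shape, rate) {
  exp(-(rate / shape) * (exp(shape * t) - 1))
}

# Illustrative values only; replace them with the fitted estimates.
shape <- 0.08
rate  <- 0.002
tau   <- 50     # hypothetical upper time limit for the area

# Area under the survival curve on [0, tau]
area <- integrate(gompertz_surv, lower = 0, upper = tau,
                  shape = shape, rate = rate)$value
area
```

With covariates in the model, the rate would differ between covariate patterns, so the area would need to be computed separately for each pattern of interest.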

ROC curve in R with rpart for a survival tree

I have an issue with creating a ROC curve for my survival tree built with the rpart package. My goal is to evaluate the survival tree through the area under the ROC curve (AUC). I have tried many ways to plot a ROC curve but failed. How should I approach plotting the ROC curve?
Here is the R code I have so far:
library(survival)
library("rpart")
library("partykit")
library(rattle)
library(rpart.plot)
temp = coxph(Surv(pgtime, pgstat) ~ age+eet+g2+grade+gleason+ploidy, stagec)
newtime = predict(temp, type = 'expected')
fit <- rpart(Surv(pgtime, pgstat) ~ age+eet+g2+grade+gleason+ploidy, data = stagec)
fancyRpartPlot(fit)
tfit <- as.party(fit) #Transfer "rpart" to "party"
predtree<-predict(tfit,newdata=stagec,type="prob") #Prediction
Here is the R code I have tried so far:
1.
library("ROCR")
predROCR <- prediction(predict(tfit, newdata = stagec, type = "prob")[, 2],labels=Surv(stagec$pgtime, stagec$pgstat))
Error in predict(tfit, newdata = stagec, type = "prob")[, 2] :
incorrect number of dimensions
It doesn't work. I checked the result of predict() and found that each element is a survival (survfit) object (code and results below). I guess this method fails because it is not suitable for such objects?
predict(tfit, newdata = stagec, type = "prob")[[1]]
Call: survfit(formula = y ~ 1, weights = w, subset = w > 0)
n events median 0.95LCL 0.95UCL
33 1 NA NA NA
I tried to derive the survival function value of each terminal node and use these values to draw the ROC curve. Is this correct? The ROC curve drawn in this way seems to treat the predicted classification results as continuous variables rather than categorical variables.
Here's the R code I tried:
tree2 = fit
tree2$frame$yval = as.numeric(rownames(tree2$frame))
#Get the survival function value of each sample
Surv_value = data.frame(predict(tree2, newdata=stagec,type = "matrix"))[,1]
Out=data.frame()
Out=cbind(stagec,Surv_value)
#ROC
library(survivalROC)
roc=survivalROC(Stime=Out$pgtime, status=Out$pgstat, marker = Out$Surv_value, predict.time =5, method="KM")
roc$AUC #Get the AUC of ROC plot
#Plot ROC
aucText=c()
par(oma=c(0.5,1,0,1),font.lab=1.5,font.axis=1.5)
plot(roc$FP, roc$TP, type="l", xlim=c(0,1), ylim=c(0,1),col="#f8766d",
xlab="False positive rate", ylab="True positive rate",
lwd = 2, cex.main=1.3, cex.lab=1.2, cex.axis=1.2, font=1.2)
aucText=c(aucText,paste0("498"," (AUC=",sprintf("%.3f",roc$AUC),")"))
legend("bottomright", aucText,lwd=2,bty="n",col=c("#f8766d","#00bfc4","blue","green"))
abline(0,1)

Calculating AUC from nnet model

For a bit of background, I am using the nnet package building a simple neural network.
My dataset has a number of factor and continuous features. To handle the continuous variables I center and scale them, which subtracts the mean from each and divides by its SD.
I'm trying to produce an ROC plot and AUC from the results of the neural network model.
Below is the code used to build my basic neural network model:
model1 <- nnet(Cohort ~ .-Cohort,
data = train.sample,
size = 1)
To get some predictions, I call the following function:
train.predictions <- predict(model1, train.sample)
Now, this assigns a large matrix of 0 and 1 values to train.predictions. What I want to do is get the class probabilities for each prediction so I can plot an ROC curve using the pROC package.
So, I tried adding the following parameter to my predict function:
train.predictions <- predict(model1, train.sample, type="prob")
But I get an error:
Error in match.arg(type) : 'arg' should be one of “raw”, “class”
How can I go about getting class probabilities from outputs?
Assuming your test/validation data set is in train.test, and train.labels contains the true class labels:
train.predictions <- predict(model1, train.test, type="raw")
## This might not be necessary:
detach(package:nnet, unload = TRUE)
library(ROCR)
## train.labels:= A vector, matrix, list, or data frame containing the true
## class labels. Must have the same dimensions as 'predictions'.
## computing a simple ROC curve (x-axis: fpr, y-axis: tpr)
pred = prediction(train.predictions, train.labels)
perf = performance(pred, "tpr", "fpr")
plot(perf, lwd=2, col="blue", main="ROC - Title")
abline(a=0, b=1)
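To see what the `prediction()`/`performance()` pair is computing, the ROC points can also be built directly from the scores. This is a base-R sketch on simulated data: `scores` and `labels` are hypothetical stand-ins for `train.predictions` and `train.labels`, `roc_points` is an illustrative helper, and the code assumes 0/1 labels with no tied scores.

```r
set.seed(1)
# Hypothetical 0/1 labels and probability-like scores
labels <- rbinom(200, 1, 0.4)
scores <- plogis(rnorm(200, mean = ifelse(labels == 1, 1, -1)))

# Sweep the threshold from the highest score downwards and record TPR/FPR
roc_points <- function(scores, labels) {
  ord <- order(scores, decreasing = TRUE)
  lab <- labels[ord]
  data.frame(fpr = c(0, cumsum(lab == 0) / sum(lab == 0)),
             tpr = c(0, cumsum(lab == 1) / sum(lab == 1)))
}

roc <- roc_points(scores, labels)

# Trapezoidal area under the ROC curve
auc <- sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
auc

plot(roc$fpr, roc$tpr, type = "l", lwd = 2, col = "blue",
     xlab = "False positive rate", ylab = "True positive rate")
abline(a = 0, b = 1)
```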

different results from confusionMatrix of caret package and ROC of Epi package in R

I am trying to do classification by logistic regression. To evaluate the model, I used confusionMatrix and ROC. The problem is that the results from the two packages are different. I want to figure out which one is right or wrong.
My data:
The data frame is named newoversample, with 29 variables and 4802 observations.
"q89" is the predicted variable.
My attempt:
(1) Confusion matrix from the 'caret' library
glm.fit = glm(q89 ~ ., newoversample, family = binomial)
summary(glm.fit)
glm.probs=predict(glm.fit,type="response")
glm.pred=rep(0,4802)
glm.pred[glm.probs>.5]="1"
library(caret)
confusionMatrix(data=glm.pred, reference=newoversample$q89)
the result is:
Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 2018  437
         1  383 1964

               Accuracy : 0.8292
                 95% CI : (0.8183, 0.8398)
    No Information Rate : 0.5
    P-Value [Acc > NIR] : < 2e-16

                  Kappa : 0.6585
 Mcnemar's Test P-Value : 0.06419

            Sensitivity : 0.8405
            Specificity : 0.8180
         Pos Pred Value : 0.8220
         Neg Pred Value : 0.8368
             Prevalence : 0.5000
         Detection Rate : 0.4202
   Detection Prevalence : 0.5112
      Balanced Accuracy : 0.8292

       'Positive' Class : 0
(2) ROC curve from 'Epi' library
library(Epi)
rocresult <- ROC(form = q89 ~ ., data = newoversample, MI = FALSE, main = "over")
rocresult
The result is:
[Figure: ROC curve]
As you can see, the sensitivity here is 91% and the specificity is 78%, which differ from the result of (1), the confusion matrix.
I cannot figure out why the results are different and which one is correct.
Additionally, if the second method (the ROC curve) is wrong, please let me know how to calculate the AUC or draw the ROC curve from the first method.
Please help me!
Thank you
You should plot the ROC curve of the same model you built using glm:
library(ROCR)
pred <- prediction(predict(glm.fit), newoversample$q89)
perf <- performance(pred,"tpr","fpr")
plot(perf)
Hope this helps!
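Note that `predict(glm.fit)` returns values on the link (log-odds) scale rather than probabilities. Since the ROC curve depends only on the ordering of the scores, this should not change the curve or the AUC. A small base-R sketch on simulated data illustrates this; the variable names and the helper `rank_auc` are illustrative and are not part of ROCR.

```r
set.seed(42)
# Simulated data; names are illustrative, not the asker's variables.
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + 1.5 * x))
fit <- glm(y ~ x, family = binomial)

# Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
# positive case gets a higher score than a randomly chosen negative case
rank_auc <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc_link <- rank_auc(predict(fit), y)                     # log-odds scale
auc_prob <- rank_auc(predict(fit, type = "response"), y)  # probability scale
c(link = auc_link, response = auc_prob)
```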
I think the confusion matrix is fine. Because you did not define the 'Positive' class, it was set to 0 by default.
The problem is with the ROC plot. You can still use Epi::ROC for the ROC curve, but you should call it as Epi::ROC(test = as.numeric(glm.pred), stat = newoversample$q89, MI = FALSE, main = "over"), where test is the numeric test variable and stat is the true status.
In this way, the sensitivity and specificity should not differ much from those in the matrix.
When you use ROC(form = q89 ~ ., data = newoversample, MI = FALSE, main = "over"), you pass a logistic regression formula to the form parameter, which is not the same as using your glm model. In this case, you should provide values for the test and stat parameters of the ROC function instead (see the Epi::ROC documentation for more detail).
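It may also help to remember that sensitivity and specificity are properties of a cutoff, not of the model alone, so two tools reporting them at different cutoffs will disagree even for identical fitted probabilities. The base-R sketch below uses simulated data to show this; the names are illustrative, class 1 is treated as positive (the caret output in the question used 0), and whether a cutoff difference fully explains the discrepancy in the question is an assumption.

```r
set.seed(7)
# Simulated data; names are illustrative.
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(1.2 * x))
p <- fitted(glm(y ~ x, family = binomial))   # fitted probabilities

# Sensitivity and specificity at a given probability cutoff (positive class = 1)
sens_spec <- function(cutoff) {
  pred <- as.integer(p > cutoff)
  c(sens = mean(pred[y == 1] == 1), spec = mean(pred[y == 0] == 0))
}

at_050 <- sens_spec(0.5)   # the cutoff used for glm.pred in the question
at_030 <- sens_spec(0.3)   # a lower cutoff trades specificity for sensitivity
rbind(at_050, at_030)
```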

R: varying-coefficient GAMM models in mgcv - extracting 'by' variable coefficients?

I am creating a varying-coefficient GAMM using 'mgcv' in R with a continuous 'by' variable, specified through the by argument. However, I am having difficulty locating the parameter estimate of the effect of the 'by' variable. In this example we determine the spatially-dependent effect of temperature t on sole eggs (i.e. how the linear effect of temperature on sole eggs changes across space):
require(mgcv)
require(gamair)
data(sole)
b = gam(eggs ~ s(la,lo) + s(la,lo, by = t), data = sole)
We can then plot the predicted effects of s(la,lo, by = t) against the predictor t:
pred <- predict(b, type = "terms", se.fit = TRUE)
by.variable.prediction <- pred[[1]][,2]
plot(x= sole$t, y = by.variable.prediction)
However, I can't find a listing/function with the parameter estimates of the 'by' variable t for each sampling location. summary(), coef(), and predict() do not give you the parameter estimates.
Any help would be appreciated!
So the coefficient for the variable t is the value of the by term where t is equal to 1, conditional on the latitude and longitude. One way to get the coefficient/parameter estimate for t at each latitude and longitude is therefore to construct your own data frame with a range of latitude/longitude combinations and t=1, and run predict.gam on that (rather than running predict.gam on the data used to fit the model, as you have done). So:
preddf <- expand.grid(list(la=seq(min(sole$la), max(sole$la), length.out=100),
lo=seq(min(sole$lo), max(sole$lo), length.out=100),
t=1))
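# type="terms" keeps only the s(la,lo):t contribution, i.e. the coefficient at t = 1,
# without the intercept and the s(la,lo) term
preddf$parameter <- predict(b, preddf, type="terms")[, "s(la,lo):t"]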
And then if you want to visualize this coefficient over space, you could graph it with ggplot2.
library(ggplot2)
ggplot(preddf) +
geom_tile(aes(x=lo, y=la, fill=parameter))
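The idea of predicting with the 'by' variable set to 1 can be checked on simulated data where the true coefficient function is known. This is a one-dimensional sketch, assuming mgcv is installed; the variable names (`x`, `tt`, `yy`) and `true_coef` are made up for illustration, and the by-term column is located by pattern rather than by a hard-coded label.

```r
library(mgcv)

set.seed(123)
n  <- 1000
x  <- runif(n)
tt <- rnorm(n)                                   # continuous 'by' variable
true_coef <- function(x) 1 + sin(2 * pi * x)     # coefficient of tt varies with x
yy <- 2 + cos(2 * pi * x) + true_coef(x) * tt + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, tt = tt, yy = yy)

m <- gam(yy ~ s(x) + s(x, by = tt), data = dat)

# Predict term-wise on a grid with tt = 1; the by-term column is then the
# estimated coefficient of tt at each x
grid   <- data.frame(x = seq(0, 1, length.out = 50), tt = 1)
terms  <- predict(m, newdata = grid, type = "terms")
by_col <- grep(":tt", colnames(terms), fixed = TRUE)
grid$coef_hat <- terms[, by_col]

# Agreement between the estimated and the true coefficient function
cor(grid$coef_hat, true_coef(grid$x))
```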
