Sensitivity and Specificity in R

I want to know how I can write functions Sensitivity() and Specificity() that compute sensitivity and specificity in R. What options can help me?

Here is a method using the caret package, with a reproducible example (i.e. a bit of code that someone can quickly run to help you out) taken from the caret help files. @llottmanhill is correct that you will get more help when you tell us what you are trying to do; right now your question is quite vague. However, give this a shot:
library(caret)
library(MASS)
fit <- lda(Species ~ ., data = iris)
model <- predict(fit)$class
irisTabs <- table(model, iris$Species)
## When passing factors, an error occurs with more
## than two levels
sensitivity(model, iris$Species)
## When passing a table, more than two levels can
## be used
sensitivity(irisTabs, "versicolor")
specificity(irisTabs, c("setosa", "virginica"))
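If you would rather write the two functions yourself, as the question asks, here is a minimal base R sketch for the two-class case. The function signatures and the `positive` argument are my own assumptions for illustration, not caret's API, and the toy vectors are made up.

```r
# Hypothetical helpers: 'pred' and 'truth' are vectors or factors of class
# labels, 'positive' is the label treated as the positive class.
Sensitivity <- function(pred, truth, positive) {
  tp <- sum(pred == positive & truth == positive)  # true positives
  fn <- sum(pred != positive & truth == positive)  # false negatives
  tp / (tp + fn)
}

Specificity <- function(pred, truth, positive) {
  tn <- sum(pred != positive & truth != positive)  # true negatives
  fp <- sum(pred == positive & truth != positive)  # false positives
  tn / (tn + fp)
}

# Made-up labels purely for demonstration.
truth <- factor(c("yes", "yes", "yes", "no", "no", "no", "no", "no"))
pred  <- factor(c("yes", "yes", "no",  "no", "no", "no", "yes", "no"))

sens <- Sensitivity(pred, truth, positive = "yes")  # 2 of 3 positives found
spec <- Specificity(pred, truth, positive = "yes")  # 4 of 5 negatives found
```

For more than two classes you would call these once per class, treating that class as positive and everything else as negative.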

Related

Error "t.haven_labelled()` not supported" when trying to substitute NA with mice package

Total R noob here, trying to figure out how to use the mice package to handle NAs in my dataset.
This is my code so far (I left out the unimportant stuff like trimming the data set down to relevant variables, recoding etc.)
install.packages("haven")
install.packages("survey")
library(haven)
library(data.table)
library(survey)
library(car)
dat <- read_dta("ZA5270_v2-0-0.dta")
dat_wght <- svydesign(ids= ~1, data=dat, weights =~wghtpew)
install.packages("mice")
library(mice)
dat_wght[["variables"]]$sex = as.factor(dat_wght[["variables"]]$sex)
dat_imp <- mice(dat_wght[["variables"]], m=5, maxit=10)
The error message I get is:
iter imp variable
1 1 px03
Error in `t()`:
! `t.haven_labelled()` not supported.
I already did some research and apparently it has to do with value labels, since the haven package causes lots of weird problems. I already tried to remove all value labels with sapply(dat_wght[["variables"]], haven::zap_labels) but the error still occurs (same when I try it with remove_val_labels()). Does anyone know how to solve this problem?
I'm really grateful for every single piece of advice :) Thanks in advance!
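One possible reason the zap_labels() attempt has no effect: sapply() returns a new object and does not modify the data in place, so unless its result is assigned back, the columns stay labelled. The sketch below uses a small made-up data frame (the values are invented, and only the column names sex and px03 come from the question) and assumes the haven package is installed. I have not tested it on the real .dta file.

```r
library(haven)

# Toy stand-in for the imported .dta data; the values are hypothetical.
dat <- data.frame(id = 1:4)
dat$sex  <- labelled(c(1, 2, NA, 2), labels = c(male = 1, female = 2))
dat$px03 <- labelled(c(3, NA, 1, 2), labels = c(low = 1, mid = 2, high = 3))

# sapply() builds a new object and leaves 'dat' untouched,
# so the labelled columns are still there afterwards.
invisible(sapply(dat, zap_labels))
still_labelled <- vapply(dat, is.labelled, logical(1))

# zap_labels() also accepts a whole data frame; assign the result back.
dat_clean <- zap_labels(dat)
now_labelled <- vapply(dat_clean, is.labelled, logical(1))
```

If this is the cause, stripping the labels from dat right after read_dta(), before building the survey design and calling mice(), would be the place to try it.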

Unnest nested tidydrc models

Problem
I've been using tidydrc, a tidy wrapper for the drc package, to build growth curves. It produces a tidy version of the normal output (best for ggplot). However, due to the inherent nesting of the models, I can't run simple drc functions, since the models are nested inside a dataframe. I've attached code that mirrors the drc and tidydrc packages below.
Goal
To compare information criteria from multiple model fits for the tidydrc output using the drc function mselect(), ultimately to efficiently select the best fitting model.
Ideal Result (works with drc)
library(tidydrc) # To load the Puromycin data
library(drc)
model_1 <- drm(rate ~ conc, state, data = Puromycin, fct = MM.3())
mselect(model_1, list(LL.3(), LL.5(), W1.3(), W1.4(), W2.4(), baro5()))
# DESIRED OUTPUT SIMILAR TO THIS
         logLik       IC Lack of fit  Res var
MM.3  -78.10685 170.2137   0.9779485 70.54874 # Best fitting model
LL.3  -78.52648 171.0530   0.9491058 73.17059
W1.3  -79.22592 172.4518   0.8763679 77.75903
W2.4  -77.87330 173.7466   0.9315559 78.34783
W1.4  -78.16193 174.3239   0.8862192 80.33907
LL.5  -77.53835 177.0767   0.7936113 87.80627
baro5 -78.00206 178.0041   0.6357592 91.41919
Not Working Example with tidydrc
library(tidyverse) # tidydrc utilizes tidyverse functions
model_2 <- tidydrc_model(data = Puromycin, conc, rate, state, model = MM.3())
summary(model_2)
Error: summary.vctrs_list_of() not implemented.
Now, I can manually tease apart the list of models in the dataframe model_2 but can't seem to figure out the correct apply statements (it's a mess) to get this working.
Progress Thus Far
These both produce the same error, so at least I've subsetted a level down but now I'm stuck and pretty sure this is not the ideal solution.
mselect(model_2$drmod, list(LL.3(), LL.5(), W1.3(), W1.4(), W2.4(), baro5()))
model_2_sub <- model_2$drmod # Manually subset the drmod column
apply(model_2_sub, 2, mselect(list(LL.3(), LL.5(), W1.3(), W1.4(), W2.4(), baro5())))
Error in UseMethod("logLik") :
no applicable method for 'logLik' applied to an object of class "list"
I've even tried the tidyverse function unnest() to no avail
model_2_unnest <- model_2 %>% unnest_longer(drmod, indices_include = FALSE)
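The logLik error above suggests that the drmod column is a list of fitted models, so a model generic has to be applied to each element, not to the column as a whole. Here is a generic base R sketch of that idea, with lm() standing in for drm() so that it runs without drc or tidydrc. The object names are made up. Whether mselect() itself works on models extracted this way is not something I have verified.

```r
# Stand-in for a nested model column: one lm() fit per level of 'state'.
# Puromycin ships with base R (datasets package).
groups <- split(Puromycin, Puromycin$state)
mods   <- lapply(groups, function(d) lm(rate ~ conc, data = d))

# Calling a model generic on the whole list fails, because there is
# no logLik method for class "list".
res <- try(logLik(mods), silent = TRUE)
failed_on_list <- inherits(res, "try-error")

# Applying the generic to each element works.
ll  <- vapply(mods, function(m) as.numeric(logLik(m)), numeric(1))
aic <- vapply(mods, AIC, numeric(1))
```

With the real data the analogous step would be something like lapply(model_2$drmod, ...) or model_2$drmod[[1]] to reach an individual fit.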

How do I program multiple output nodes using the neuralnet package in R?

I am building a neural network to predict "fit" based on a number of variables. "FitCls" is in three classes: "Excellent", "Good" and "Poor". I have 10 input variables, and have chosen one hidden layer with 6 neurons. I would like three output neurons so that I can classify the case that is presented to the neural network as a "fit" which is "excellent", "good" or "poor".
I have seen a similar example where this was done using iris data (on slide 40, et seq.) here: http://www.slideshare.net/DerekKane/data-science-part-viii-artifical-neural-network. I have tried to copy that structure, but I still only get a single output node when I plot the network.
Here is my code (after loading the 'nfit' dataframe):
nfit[nfit$FitCls=="Excellent", "Output"] <- 2
nfit[nfit$FitCls=="Good", "Output"] <- 1
nfit[nfit$FitCls=="Poor", "Output"] <- 0
nn <- neuralnet(Output~Universalism+Benevolence+Tradition+Conformity+Security+Power+Achievement+Hedonism+Stimulation+SelfDir, data = nfit, hidden = 6, err.fct = "ce", linear.output = FALSE)
When I run neuralnet, it gives me a warning message that it has forced err.fct to "sse" because the response is not binary. I am not sure what is going wrong because in the example that I am copying, the plot of the neural network shows three output nodes. Please let me know what I am doing wrong.
If this is not the right way to go about using neuralnet for classification I would also appreciate any help you can provide as to what I should be doing. Many thanks!
To more or less replicate the iris example you will need:
library(neuralnet)
library(nnet)
trainset <- cbind(iris[, 1:4], class.ind(iris$Species))
espnnet2=neuralnet(setosa + versicolor + virginica ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, trainset)
plot(espnnet2)
Unfortunately neuralnet is sensitive to the scale of the data; try scaling the explanatory variables.
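A rough sketch of the scaling step combined with the indicator columns, again on iris. The hidden = 3 setting and the seed are arbitrary choices of mine, and the neuralnet call is guarded so the data preparation still runs if the package is not installed.

```r
library(nnet)  # for class.ind()

# Standardise the four predictors (mean 0, sd 1) and add one 0/1
# indicator column per species.
predictors <- as.data.frame(scale(iris[, 1:4]))
trainset   <- cbind(predictors, class.ind(iris$Species))

# One response per class on the left-hand side gives three output nodes.
f <- setosa + versicolor + virginica ~
  Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

if (requireNamespace("neuralnet", quietly = TRUE)) {
  set.seed(1)  # arbitrary seed, only for repeatability
  nn <- neuralnet::neuralnet(f, data = trainset, hidden = 3,
                             linear.output = FALSE)
}
```

For the nfit data the same pattern would apply: class.ind(nfit$FitCls) in place of the single numeric Output column.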

Evaluating weka classifier J48 with missing values in test set, R RWeka

I have an error when evaluating a simple test set with evaluate_Weka_classifier. Trying to learn how the interface works from R to Weka with RWeka, but I still don't get this.
library("RWeka")
iris_input <- iris[1:140,]
iris_test <- iris[-(1:140),]
iris_fit <- J48(Species ~ ., data = iris_input)
evaluate_Weka_classifier(iris_fit, newdata = iris_test, numFolds=5)
No problems here, as we would assume (it is of course a stupid test, no random holdout data etc.). But now I want to simulate missing data (a lot). So I set Petal.Width as missing:
iris_test$Petal.Width <- NA
evaluate_Weka_classifier(iris_fit, newdata = iris_test, numFolds=5)
Which gives the error:
Error in .jcall(evaluation, "S", "toSummaryString", complexity) :
java.lang.IllegalArgumentException: Can't have more folds than instances!
Edit: This error should tell me that I have not enough instances, but I have 10
Edit: If I use write.arff, it can be exported and read in by Weka. Change Petal.Width {} into Petal.Width numeric to make the two files exactly the same. Then it works in Weka.
Is this a thinking error? When reading Machine Learning, Practical machine learning tools and techniques it seems to be legit. Maybe I just have to tell RWeka that I want to use fractions when a split uses a missing variable?
Thnx!
The issue is that you need to tell J48() what to do with missing values.
library(RWeka)
?J48()
#pertinent output
J48(formula, data, subset, na.action,
    control = Weka_control(), options = NULL)
na.action tells R what to do with missing values. When following up on na.action you will find that "The ‘factory-fresh’ default is na.omit". Under this setting of course there are not enough instances!
Instead of leaving na.action as the default omit, I have changed it as follows,
iris_fit<-J48(Species~., data = iris_input, na.action=NULL)
and it works like a charm!
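To see what na.omit does to this particular test set without involving Weka at all, here is a small base R illustration using model.frame(), which is where na.action is normally applied. That RWeka handles the test data the same way internally is my assumption.

```r
# Same test set as in the question: 10 rows, Petal.Width entirely missing.
iris_test <- iris[-(1:140), ]
iris_test$Petal.Width <- NA

# With na.omit, every row contains an NA, so every row is dropped.
mf_omit <- model.frame(Species ~ ., data = iris_test, na.action = na.omit)

# With na.action = NULL, no rows are removed.
mf_keep <- model.frame(Species ~ ., data = iris_test, na.action = NULL)

n_omit <- nrow(mf_omit)
n_keep <- nrow(mf_keep)
```

An evaluation with numFolds = 5 on a frame with no rows left would be consistent with the "more folds than instances" message.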

in R Plot importance variables of Random Forest model

What am I doing wrong here? What does "subscript out of bounds" mean?
I got the code below (first block) from a Revolution R online seminar on data mining in R. I'm trying to incorporate it into an RF model I ran, but I can't get past what I think is the ordering of the variables. I just want to plot the importance of the variables.
I included a little more than needed below to give context, but what is really erroring out is the third line of code. The second code block shows the errors I am getting when applying it to the data I am working with. Can anyone help me figure this out?
-------------------------------------------------------------------------
# List the importance of the variables.
rn <- round(importance(model.rf), 2)
rn[order(rn[,3], decreasing=TRUE),]
#### of
# Plot variable importance
varImpPlot(model.rf, main="",col="dark blue")
title(main="Variable Importance Random Forest weather.csv",
      sub=paste(format(Sys.time(), "%Y-%b-%d %H:%M:%S"), Sys.info()["user"]))
#--------------------------------------------------------------------------
My errors:
> rn[order(rn[,2], decreasing=TRUE),]
Error in order(rn[, 2], decreasing = TRUE) : subscript out of bounds
Think I understand the confusion. I bet you a 4-finger Kit Kat that if you type in ncol(rn) you'll see that rn has 2 columns, not 3 as you might expect. The first "column" you're seeing on the screen isn't really a column - it's just the row names for the object rn. Type rownames(rn) to confirm this. The final column of rn that you want to order by is therefore rn[,2] rather than rn[,3]. The "subscript out of bounds" message comes up because you've asked R to order by column 3, but rn doesn't have a column 3.
Here's my brief detective trail for anyone interested in what the "importance" object actually is... I installed and loaded the randomForest package and then ran an example from the documentation online:
set.seed(4543)
data(mtcars)
mtcars.rf <- randomForest(mpg ~ ., data=mtcars, ntree=1000,
                          keep.forest=FALSE, importance=TRUE)
importance(mtcars.rf)
Turns out the "importance" object in this case looks like this (first few rows only to save space):
       %IncMSE IncNodePurity
cyl  17.058932     181.70840
disp 19.203139     242.86776
hp   17.708221     191.15919
...
Obviously ncol(importance(mtcars.rf)) is 2, and the row names are likely to be the thing leading to confusion :)
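To avoid hard-coding a column number at all, you can check ncol() first and then order by the last column or by name. The sketch below rebuilds the three rows shown above as a plain matrix, so it runs without randomForest. The matrix is only a stand-in for the real importance() output.

```r
# Stand-in for importance(mtcars.rf), using the three rows printed above.
rn <- matrix(
  c(17.058932, 19.203139, 17.708221,
    181.70840, 242.86776, 191.15919),
  ncol = 2,
  dimnames = list(c("cyl", "disp", "hp"), c("%IncMSE", "IncNodePurity"))
)

n_cols <- ncol(rn)  # the row names are not counted as a column

# Order by the last column, whatever its index happens to be.
by_last <- rn[order(rn[, ncol(rn)], decreasing = TRUE), , drop = FALSE]

# Or order by column name, which fails loudly if the name is absent.
by_name <- rn[order(rn[, "IncNodePurity"], decreasing = TRUE), , drop = FALSE]
```

As far as I know, a forest fitted without importance=TRUE returns an importance matrix with only a single column, in which case even rn[,2] would be out of bounds. Worth checking ncol(rn) on your own model.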
