Creating a confusion matrix for multiclass classification in keras using R

I am running a Bidirectional LSTM for multiclass text classification in R using Keras. I have run my model and I need to create a confusion matrix. I tried using predict_classes() but my RStudio threw an error that predict_classes() was deprecated. I tried to use this bit of code that I found on the RStudio Keras website:
prediction1 <- model %>%
  predict(x.test) %>%
  k_argmax(axis = -1)
NOTE: x.test is my matrix that contains the text features.
I am not sure how to use it, and I have not found any examples of how to use it online, so I am quite confused. I would appreciate any help that anyone could provide!
Thanks

You can use the 'caret' library to achieve that.
# Install the required package
install.packages('caret')

# Load the library
library(caret)

# Create example vectors of expected (true) and predicted labels
expected_value <- factor(c(1,0,1,0,1,1,1,0,0,1))
predicted_value <- factor(c(1,0,0,1,1,1,0,0,0,1))

# Create the confusion matrix
example <- confusionMatrix(data = predicted_value, reference = expected_value)

# Display the results
example
Or use the table() function:
pred <- model %>% predict(x_test, batch_size = batch_size)
y_pred = round(pred)
# Confusion matrix
confusion_matrix = table(y_pred, y_test)
For the 'caret' example:
https://www.journaldev.com/46732/confusion-matrix-in-r
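Note that the round() approach above assumes a single-column (binary) output. For the multiclass case in the question, you want the index of the largest probability in each row, which is what k_argmax() computes. Here is a minimal sketch (untested against the asker's data) assuming x.test holds the test features and y.test the 0-based integer test labels; if y.test is one-hot encoded, apply the same max.col() trick to it:
library(keras)
library(caret)

## predicted class probabilities: one row per sample, one column per class
prob <- model %>% predict(x.test)

## column with the largest probability, shifted to 0-based labels
## (plain-R equivalent of the k_argmax(axis = -1) result)
pred_class <- max.col(prob) - 1

## build factors with a common level set before calling confusionMatrix()
lvls <- sort(unique(c(pred_class, y.test)))
confusionMatrix(factor(pred_class, levels = lvls),
                factor(y.test, levels = lvls))

## or a plain cross-tabulation
table(Predicted = pred_class, Actual = y.test)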

Related

Using parsnip model to predict a raster in R

I've been trying to predict a raster in R using an XGBoost model. I need to use raster::predict() because of the raster size. raster::predict(raster, xgboost_model, type="prob") and raster::predict(raster, xgboost_model, type="raw") work fine. But when I try to predict the classes, which is what I actually want, using raster::predict(raster, xgboost_model, type="class"), I get an error:
> predicted<-raster::predict(raster, xgboost_model, type="class")
Error in v[cells, ] <- predv : incorrect number of subscripts on matrix
Here's a reproducible example using tidymodels, which is what I used to train my model, just in case this is tidymodels-specific.
library(raster)
library(tidymodels)
library(tidyverse)
## Make a dummy raster stack, with "class" as the variable to predict.
band1 <- raster(ncol=10, nrow=10)
values(band1) <- runif(ncell(band1))
band2 <- raster(ncol=10, nrow=10)
values(band2) <- runif(ncell(band2))
band3 <- raster(ncol=10, nrow=10)
values(band3) <- runif(ncell(band3))
class <- raster(ncol=10, nrow=10)
values(class) <- floor(runif(ncell(class), min=1, max=5))
r <- stack(band1, band2, band3, class)
names(r) <- c("band1", "band2", "band3", "class")

## Convert the raster to a data frame for training.
train <- getValues(r) %>%
  as_tibble()
## Tune and train model.
xgb_spec <- boost_tree(
  trees = 50,
  tree_depth = tune(),
  min_n = tune(),
  loss_reduction = tune(),
  sample_size = tune(),
  mtry = tune(),
  learn_rate = tune()
) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

xgb_grid <- grid_latin_hypercube(
  tree_depth(),
  min_n(),
  loss_reduction(),
  sample_size = sample_prop(),
  finalize(mtry(), select(train, -class)),
  learn_rate(),
  size = 5
)
xgb_wf <- workflow() %>%
  add_formula(as.factor(class) ~ band1 + band2 + band3) %>%
  add_model(xgb_spec)

folds <- vfold_cv(train, v = 5)

xgb_res <- tune_grid(
  xgb_wf,
  resamples = folds,
  grid = xgb_grid,
  control = control_grid(save_pred = TRUE)
)

best_auc <- select_best(xgb_res, "roc_auc")

final_xgb <- finalize_workflow(
  xgb_wf,
  best_auc
)
last_fit <- fit(final_xgb, train)

## remove class layer for test to simulate real world example
test <- r %>%
  dropLayer(4)

## This works
raster::predict(test, last_fit, type="prob")

## This doesn't
raster::predict(test, last_fit, type="class")
Error produced for type="class" is:
> raster::predict(test, last_fit, type="class")
Error in v[cells, ] <- predv : incorrect number of subscripts on matrix
I've googled my face off and the only way I've figured out how to predict classes is to convert the raster to a matrix and then add the predictions back into the raster. But this is really, really slow.
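For reference, a rough sketch of that slow workaround (hedged, and using the objects from the example above): predict on the cell values as a tibble, then write the predictions back into a raster with the same geometry.
vals <- getValues(test) %>% as_tibble()                  # one row per cell
preds <- predict(last_fit, new_data = vals, type = "class")

out <- raster(test)                                      # empty raster, same extent/resolution
values(out) <- as.numeric(preds$.pred_class)             # integer level codes of the predicted classes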
Thanks in advance.
Aha. I figured it out. The problem is that a model produced by the parsnip package always returns a tibble when the prediction type is type="class", whereas raster::predict() expects a matrix to be returned. You can get around this by providing a function to raster::predict() that converts the parsnip prediction to a matrix.
Here's how I predicted a raster using the model created in my question:
fun <- function(...) {
  p <- predict(...)
  return(as.matrix(as.numeric(p[, 1, drop = TRUE])))
}
raster::predict(test, last_fit, type="class", fun=fun)
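A hedged follow-up to the wrapper above (variable names taken from the question): because as.numeric() is applied to a factor, the predicted raster stores the level index (1, 2, ...) of each class rather than the original label, so it helps to keep the level order around to map the codes back.
## levels of the outcome as the model saw them (the formula used as.factor(class))
class_levels <- levels(as.factor(train$class))   # e.g. "1" "2" "3" "4"

predicted <- raster::predict(test, last_fit, type = "class", fun = fun)
## a cell value of k in `predicted` corresponds to class_levels[k]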

How to drop variables from a model created from the mlr package in R?

This is somewhat similar to the question I asked here. However, that question has zero answers, and I think this question might be more fruitful in getting a response.
What I am trying to do is remove some features from an mlr-created model without having to fit the model again. For example, if we take the Boston data from the MASS library and create an mlr model, like so:
library(mlr)
library(MASS)
# Using the mlr package to train the data:
bTask <- makeRegrTask(data = Boston, target = "medv")
bLearn <- makeLearner("regr.randomForest")
bMod <- train(bLearn, bTask)
And then I use the task and trained model in some function, for example:
someFunc <- function(task, model) {
  pred <- predict(model, task)
  pred <- pred$data$response
  head(pred, 10)
}
someFunc(bTask,bMod)
Everything works fine. But I'm wondering if it's possible to remove some variables from bMod without having to fit the mlr-trained model again?
I know it's possible to drop features from the task using dropFeatures(), for example:
bTask1 <- dropFeatures(bTask, c("zn", "chas", "rad"))
But if I try to mix bTask1 and bMod like so:
pred1 <- predict(bMod, bTask1)
I get the sensible error:
Error in predict.randomForest(.model$learner.model, newdata =
.newdata, : variables in the training data missing in newdata
Is there a way of dropping some features from the mlr created model (i.e, bMod) without fitting it again?

R difference between class and DMwR package knn functions?

So I was working on a project in R and I ran into an issue fitting a KNN model to some data. I was getting different results when I ran knn() from the class library and kNN() from the DMwR library. I tried using the Weekly data from the ISLR package but I got similar results. The confusion matrices for the two fits give significantly different results, as does the straight-up comparison between the predictions.
I am not sure why these two functions are returning different results. Maybe someone can review my sample code and let me know what is going on.
library(ISLR)
WTrain <- subset(Weekly, Year <= 2008)
WTest <- subset(Weekly, Year >= 2009)
library(caret)
library(class)
fitClass <- knn(train = data.matrix(WTrain$Lag2), test = data.matrix(WTest$Lag2), cl=WTrain$Direction, k=5)
confusionMatrix(data = fitClass, reference = WTest$Direction)
library(DMwR)
fitDMwR <- kNN(Direction~Lag2,train = WTrain, test = WTest, norm=FALSE, k=5)
confusionMatrix(table(fitDMwR == 'Down', WTest$Direction =='Down'))
results <- cbind(fitClass,fitDMwR)
head(results)
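As a side note, cbind() coerces both factors to their integer level codes, so a cross-tabulation of the two prediction vectors may be easier to read:
## rows: predictions from class::knn, columns: predictions from DMwR::kNN
table(class_knn = fitClass, DMwR_kNN = fitDMwR)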

Understanding how to use nnet in R

This is my first attempt using a machine learning paradigm in R. I'm using a planet data set (url: https://www.kaggle.com/mrisdal/open-exoplanet-catalogue) and I simply want to predict a planet's size based on the size of its Sun. This is the code I currently have, using nnet():
library(nnet)
#Organize data:
cols_to_keep = c(1,4,21)
full_data <- na.omit(read.csv('Planet_Data.csv')[, cols_to_keep])
#Split data:
train_data <- full_data[sample(nrow(full_data), round(nrow(full_data)/2)), ]
test_data <- full_data[!rownames(full_data) %in% rownames(train_data), ]  # select test rows before resetting the train rownames
rownames(train_data) <- 1:nrow(train_data)
rownames(test_data) <- 1:nrow(test_data)
#nnet
nnet_attempt <- nnet(RadiusJpt~HostStarRadiusSlrRad, data=train_data, size=0, linout=TRUE, skip=TRUE, MaxNWts=10000, trace=FALSE, maxit=1000, decay=.001)
nnet_newdata <- predict(nnet_attempt, newdata=test_data)
nnet_newdata
When I print nnet_newdata I get a value for each row in my data, but I don't really understand what these values mean. Is this a proper way to use nnet() for a simple regression?
Thanks
When predict() is called on an object of class nnet you will get, by default, the raw output of the nnet model applied to your new dataset (here, that is the predicted RadiusJpt for each row of test_data). If, instead, yours is a classification problem, you can use type = "class".
See here.
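Since this is a regression (linout = TRUE), a quick way to see what the values mean is to check them against the observed responses. A small, hedged sketch (column names taken from the question's code):
predicted <- as.vector(predict(nnet_attempt, newdata = test_data))
observed <- test_data$RadiusJpt

## root mean squared error of the predictions
sqrt(mean((observed - predicted)^2))

## observed vs. predicted, with the perfect-prediction line for reference
plot(observed, predicted,
     xlab = "Observed RadiusJpt", ylab = "Predicted RadiusJpt")
abline(0, 1, lty = 2)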

Plot in SVM model (e1071 Package) using DocumentTermMatrix

I am trying to create a plot for my model, which was created using svm() from the e1071 package.
My code to build the model, predict, and build the confusion matrix is:
ptm <- proc.time()
svm.classifier <- svm(x = train.set.list[[0.999]][["0_0.1"]],
                      y = train.factor.list[[0.999]][["0_0.1"]],
                      kernel = "linear")
pred <- predict(svm.classifier, test.set.list[[0.999]][["0_0.1"]], decision.values = TRUE)
time[["svm"]] <- proc.time() - ptm

confmatrix <- confusionMatrix(pred, test.factor.list[[0.999]][["0_0.1"]])
confmatrix
train.set.list and test.set.list contain the train and test sets for several conditions, and train.factor.list and test.factor.list hold the true labels for each set. Both the train and test sets are DocumentTermMatrix objects.
Then I tried to plot my data with:
plot(svm.classifier, train.set.list[[0.999]][["0_0.1"]])
but i got the message:
"Error in plot.svm(svm.classifier, train.set.list[[0.999]][["0_0.1"]]) :
missing formula."
What am I doing wrong? The confusion matrix looks fine to me, even without using the formula interface in svm().
Without runnable code it's hard to say exactly what the problem is. My guess, given
?plot.svm
which says
formula: formula selecting the visualized two dimensions. Only needed if more than two input variables are used.
is that your data has more than two predictors, so you need to pass a formula selecting the two dimensions to plot:
plot(svm.classifier, train.set.list[[0.999]][["0_0.1"]], predictor1 ~ predictor2)
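For a self-contained illustration of the formula argument (using iris rather than the asker's DocumentTermMatrix, which has far more than two predictors):
library(e1071)

m <- svm(Species ~ ., data = iris, kernel = "linear")

## choose the two dimensions to visualise; any remaining predictors are held
## constant (see the `slice` argument in ?plot.svm)
plot(m, iris, Petal.Width ~ Petal.Length)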
