I've been trying to predict a raster in R using an XGBoost model. I need to use raster::predict() because of the raster size. raster::predict(raster, xgboost_model, type="prob") and raster::predict(raster, xgboost_model, type="raw") work fine, but when I try to predict the classes, which is what I actually want, using raster::predict(raster, xgboost_model, type="class"), I get an error:
> predicted<-raster::predict(raster, xgboost_model, type="class")
Error in v[cells, ] <- predv : incorrect number of subscripts on matrix
Here's a reproducible example using tidymodels, which is what I used to train my model, just in case this is tidymodels-specific.
library(raster)
library(tidymodels)
library(tidyverse)
## Make a dummy raster stack with a "class" layer as the class to predict.
band1<-raster(ncol=10, nrow=10)
values(band1)<-runif(ncell(band1))
band2<-raster(ncol=10, nrow=10)
values(band2)<-runif(ncell(band2))
band3<-raster(ncol=10, nrow=10)
values(band3)<-runif(ncell(band3))
class<-raster(ncol=10, nrow=10)
values(class)<-floor(runif(ncell(class), min=1, max=5))
r<-stack(band1, band2, band3, class)
names(r)<-c("band1", "band2", "band3", "class")
## Convert raster to df for training.
train<-getValues(r)%>%
as_tibble()
## Tune and train model.
xgb_spec<-boost_tree(
trees=50,
tree_depth = tune(),
min_n=tune(),
loss_reduction=tune(),
sample_size=tune(),
mtry=tune(),
learn_rate=tune()
)%>%
set_engine("xgboost")%>%
set_mode("classification")
xgb_grid<-grid_latin_hypercube(
tree_depth(),
min_n(),
loss_reduction(),
sample_size=sample_prop(),
finalize(mtry(), select(train, -class)),
learn_rate(),
size=5
)
xgb_wf<-workflow()%>%
add_formula(as.factor(class)~band1+band2+band3)%>%
add_model(xgb_spec)
folds <- vfold_cv(train, v = 5)
xgb_res<-tune_grid(
xgb_wf,
resamples=folds,
grid=xgb_grid,
control=control_grid(save_pred=T)
)
best_auc<-select_best(xgb_res, "roc_auc")
final_xgb<-finalize_workflow(
xgb_wf,
best_auc
)
last_fit<-fit(final_xgb, train)
## remove the class layer from the test data to simulate a real-world example
test<-r%>%
dropLayer(4)
## This works
raster::predict(test, last_fit, type="prob")
## This doesn't
raster::predict(test, last_fit, type="class")
Error produced for type="class" is:
> raster::predict(test, last_fit, type="class")
Error in v[cells, ] <- predv : incorrect number of subscripts on matrix
I've googled my face off, and the only way I've figured out to predict classes is to convert the raster to a matrix and then add the predictions back into the raster. But this is really, really slow.
Thanks in advance.
Aha. I figured it out. The problem is that a model produced by the parsnip package always returns a tibble when the prediction type is type="class", whereas raster::predict() expects a matrix to be returned. You can get around this by providing a function to raster::predict() that converts the tibble returned by parsnip's predict method to a matrix.
Here's how I predicted a raster using my original model created in my question:
fun <- function(...){
  p <- predict(...)
  ## parsnip returns a tibble whose first column (.pred_class) is a factor;
  ## convert it to numeric level codes and wrap it in a matrix for raster::predict
  return(as.matrix(as.numeric(p[, 1, drop = TRUE])))
}
raster::predict(test, last_fit, type="class", fun=fun)
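One caveat with this approach: as.numeric() on the predicted factor stores the level index in each cell, not the original label. A minimal sketch of mapping the codes back to labels, using the objects from the example above:
predicted <- raster::predict(test, last_fit, type = "class", fun = fun)
## factor levels the model was trained on, in index order
class_levels <- levels(as.factor(train$class))
## label for each cell: index the levels by the stored codes
cell_labels <- class_levels[values(predicted)]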
I am running a Bidirectional LSTM for multiclass text classification in R using Keras. I have run my model and I need to create a confusion matrix. I tried using predict_classes() but my RStudio threw an error that predict_classes() was deprecated. I tried to use this bit of code that I found on the RStudio Keras website:
prediction1 <- model %>%
predict(x.test) %>%
k_argmax(axis = -1)
NOTE: x.test is my matrix that contains the text features.
I am not sure how to use it, and I have not found any examples of how to use it online, so I am quite confused. I would appreciate any help that anyone could provide!
Thanks
You can use the caret library to achieve that.
#Install required packages
install.packages('caret')
#Import required library
library(caret)
#Creates vectors having data points
expected_value <- factor(c(1,0,1,0,1,1,1,0,0,1))
predicted_value <- factor(c(1,0,0,1,1,1,0,0,0,1))
#Creating confusion matrix
example <- confusionMatrix(data=predicted_value, reference = expected_value)
#Display results
example
Or the table function:
pred <- model %>% predict(x_test, batch_size = batch_size)
y_pred = round(pred)
# Confusion matrix
confusion_matrix = table(y_pred, y_test)
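Note that round(pred) only works for a binary sigmoid output. For the multiclass case in the question, here is a minimal sketch (assuming y.test holds integer labels 0..k-1, matching the 0-based output of k_argmax):
## probabilities from the softmax layer: one row per sample, one column per class
pred <- model %>% predict(x.test)
## 0-based predicted class per row (base-R equivalent of k_argmax)
pred_classes <- apply(pred, 1, which.max) - 1
## cross-tabulate predictions against the true labels
confusion_matrix <- table(Predicted = pred_classes, Actual = y.test)
confusion_matrix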
For the 'caret' example:
https://www.journaldev.com/46732/confusion-matrix-in-r
I have trained a classification model on 13,000 rows of labels with lasso in R's glmnet library. I have checked my accuracy and it looks decent; now I want to make predictions for the rest of the dataset, which is 300,000 rows. My approach was to label the rest of the rows using the trained model. I'm not sure if that's the most effective strategy for approximate labeling.
But, when I'm trying to label rest of the data, I'm running into this error:
Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
Even if I break the dataset to 5000 rows for predictions, I still get the same error.
Here's my code:
library(glmnet)
library(quanteda)  # corpus(), dfm(), dfm_trim()
library(dplyr)     # %>%, filter(), mutate()
#the subset of original dataset
data.text <- data.text_filtered %>% filter(!label1 == "NA")
#Quanteda corpus
data_corpus <- corpus(data.text$text, docvars = data.frame(labels = data.text$label1))
set.seed(1234)
dataShuffled <- corpus_sample(data_corpus, size = 12845)
dataDfm <- dfm_trim( dfm(dataShuffled, verbose = FALSE), min_termfreq = 10)
#model to train the classifier
lasso <- cv.glmnet(x = dataDfm[1:10000,], y = trainclass[1:10000],
alpha = 1, nfolds = 5, family = "binomial")
#plot the lasso plot
plot(lasso)
#predictions
dataPreds <- predict(lasso, dataDfm[10001:12845,], type="class")
(movTable <- table(dataPreds, docvars(dataShuffled, "labels")[10001:12845]))
#make predictions on the rest of the dataset, which has 300,000 rows
data.text_NAs <- data.text_filtered %>% filter(label1 == "NA")
data_NADfm <- dfm_trim( dfm(corpus(data.text_NAs$text), verbose = FALSE), min_termfreq = 10)
data.text_filtered <- data.text_filtered %>% mutate(label = predict(lasso, as.matrix(data_NADfm), type="class", s="lambda.1se"))
Thanks much for any help.
The problem lies in the as.matrix(data_NADfm) - this makes the dfm into a dense matrix, which makes it too large to handle.
Solution: Keep it sparse: either remove the as.matrix() wrapper, or if it does not like a raw dfm input, you can coerce it to a plain sparse matrix (from the Matrix package) using as(data_NADfm, "dgCMatrix"). This should be fine since both cv.glmnet() and its predict() method can handle sparse matrix inputs.
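A minimal sketch of the sparse-input fix, using the objects from the question (the dfm_match() step is an added assumption: glmnet's predict() needs the new dfm to carry exactly the features it was trained on, which quanteda can enforce):
library(Matrix)
library(quanteda)
## align the new dfm's features with the training dfm
data_NADfm <- dfm_match(data_NADfm, features = featnames(dataDfm))
## coerce to a plain sparse matrix instead of densifying with as.matrix()
x_new <- as(data_NADfm, "dgCMatrix")
preds <- predict(lasso, newx = x_new, type = "class", s = "lambda.1se")
data.text_NAs$label <- as.character(preds)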
I am trying to upload an R model to AzureML as a web service. The model uses the mlr package and its predict function, whose output is an object of class "PredictionClassif" "Prediction". For a linear model like regression I use
PredictAction <- function(inputdata){
predict(RegModel, inputdata, type="response")
}
This is working perfectly fine in Azure.
When I use the mlr package for classification with prediction type probability, I have to write the predict function as
PredictAction <- function(inputdata){
require(mlr)
predict(randomForest,newdata=inputdata)
}
When calling the function
publishWebService(ws, fun, name, inputSchema)
it produces an error:
converting `inputSchema` to data frame
Error in convertArgsToAMLschema(lapply(x, class)) :
Error: data type "table" not supported
Since the predict function produces a table which I don't know how to convert or modify, I supply the outputschema:
publishWebService(ws, fun, name, inputSchema,outputschema)
I am not sure how to specify the outputschema (https://cran.r-project.org/web/packages/AzureML/AzureML.pdf); outputschema is a list.
The predict function from mlr produces output of class
class(pred_randomForest)
"PredictionClassif" "Prediction"
and its data output is a data frame:
class(pred_randomForest$data)
"data.frame"
I am seeking help on the syntax for outputschema in the publishWebService function, or whether I have to add any other arguments to it. I am not sure where the issue is: whether AzureML can't read the wrapped model, or whether the predict function of mlr is executed properly in AzureML.
I am getting the following error in AzureML:
Execute R Script Piped (RPackage) : The following error occurred during evaluation of R script: R_tryEval: return error: Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "c('FilterModel', 'BaseWrapperModel', 'WrappedModel')"
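Setting the second error aside, one workaround for the original schema problem is worth sketching (hedged: the feature names in inputSchema below are placeholders, and the schema-as-named-list form follows the AzureML package documentation linked above). Since pred$data is a plain data frame, the scoring function can return that slot instead of the Prediction object:
require(mlr)
PredictAction <- function(inputdata){
  pred <- predict(randomForest, newdata = inputdata)
  ## return the data.frame inside the mlr Prediction object;
  ## AzureML cannot build a schema from class "Prediction" itself
  pred$data
}
## placeholder feature names and types; replace with the model's real inputs
api <- publishWebService(
  ws, fun = PredictAction, name = "mlrClassification",
  inputSchema  = list(x1 = "numeric", x2 = "numeric"),
  outputSchema = list(response = "character", prob.1 = "numeric")
)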
Here is an example of using the xgboost library in R:
library("xgboost") # the main algorithm
##Load the Azure workspace. You can find the ID and the pass in your workspace
ws <- workspace(
id = "Your workspace ID",
auth = "Your Auth Pass"
)
##Download the dataset
dataset <- download.datasets(ws, name = "Breast cancer data", quote="\"")
## split the dataset to get train and score data
## 75% of the sample size
smp_size <- floor(0.75 * nrow(dataset))
## set the seed to make your partition reproducible
set.seed(123)
## get index to split the dataset
train_ind <- sample(seq_len(nrow(dataset)), size = smp_size)
##Split train and test data
train_dataset <- dataset[train_ind, ]
test_dataset <- dataset[-train_ind, ]
#Get the features columns
features <- train_dataset[ , ! colnames(train_dataset) %in% c("Class") ]
#get the label column
labelCol <- train_dataset[, c("Class")]
#convert to data matrix; use only the feature columns so the label
#is not leaked into the predictors
test_gboost <- data.matrix(test_dataset[ , ! colnames(test_dataset) %in% c("Class") ])
train_gboost <- data.matrix(features)
#train model
bst <- xgboost(data = train_gboost, label = labelCol, max.depth = 2, eta = 1,
               nround = 2, objective = "binary:logistic")
#predict the model
pred <- predict(bst,test_gboost )
#Score model
test_dataset$Scorelabel<-pred
test_dataset$Scoreclasses<- as.factor(as.numeric(pred >= 0.5))
#Create the scoring function
predict_xgboost <- function(new_data){
predictions <- predict(bst, data.matrix(new_data))
output <- data.frame(new_data, ScoredLabels =predictions)
output
}
#Publish the score function
api <- publishWebService(
ws,
fun = predict_xgboost,
name = "xgboost classification",
inputSchema = as.data.frame(as.table(train_gboost)),
data.frame = TRUE)
I'm trying to learn a penalized logistic regression method with glmnet. I'm trying to predict if a car from the mtcars example data will have an automatic transmission or manual. I think my code is pretty straightforward, but I seem to be getting an error:
This first block simply splits mtcars into an 80% train set and a 20% test set
library(glmnet)
attach(mtcars)
smp_size <- floor(0.8 * nrow(mtcars))
set.seed(123)
train_ind <- sample(seq_len(nrow(mtcars)), size=smp_size)
train <- mtcars[train_ind,]
test <- mtcars[-train_ind,]
I know the x data is supposed to be in matrix form without the response, so I separate the training set into a non-response set (train_x) and a response vector (train_y)
train_x <- train[,!(names(train) %in% c("am"))]
train_y <- train$am
But when trying to train the model,
p1 <- glmnet(train_x, train_y)
I get the error:
Error in elnet(x, is.sparse, ix, jx, y, weights, offset, type.gaussian, :
  (list) object cannot be coerced to type 'double'
Am I missing something?
Coercing the first argument to a matrix solved it for me:
p1 <- glmnet(as.matrix(train_x), train_y)
In fact, from ?glmnet it looks like the first argument should be a matrix/sparse matrix:
x: input matrix, of dimension nobs x nvars; each row is an observation vector. Can be in sparse matrix format (inherit from class "sparseMatrix" as in package Matrix; not yet available for family="cox")
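Putting it together, a short sketch of the fix (using the objects from the question; since am is a 0/1 outcome, family = "binomial" gives the penalized logistic regression the question describes, and s = 0.01 is just an illustrative lambda):
## glmnet wants a numeric matrix, not a data frame
x_train <- as.matrix(train_x)
p1 <- glmnet(x_train, train_y, family = "binomial")
## predict the transmission class for the held-out 20%
x_test <- as.matrix(test[, !(names(test) %in% c("am"))])
predict(p1, newx = x_test, type = "class", s = 0.01)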
I'm learning to use the AMORE package for fitting data with an optimal neural network, so I'm following the examples on its wiki page. I'm trying to run the following code, but the train function triggers an error:
modelLookup(method) : value of model unknown
require(AMORE)
## We create two artificial data sets. ''P'' is the input data set. ''target'' is the output.
P <- matrix(sample(seq(-1,1,length=500), 500, replace=FALSE), ncol=1)
target <- P^2 + rnorm(500, 0, 0.5)
## We create the neural network object
net.start <- newff(n.neurons=c(1,3,1),
learning.rate.global=1e-2,
momentum.global=0.5,
error.criterium="LMS",
Stao=NA, hidden.layer="tansig",
output.layer="purelin",
method="ADAPTgdwm")
## We train the network according to P and target.
result <- train(net.start, P, target, error.criterium="LMS", report=TRUE, show.step=100, n.shows=5 )
## Several graphs, mainly to remark that
## now the trained network is an element of the resulting list.
y <- sim(result$net, P)
plot(P,y, col="blue", pch="+")
points(P,target, col="red", pch="x")
Any suggestion is appreciated!
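A guess, but grounded in the message itself: modelLookup() belongs to the caret package, so the likely cause is that caret (or something that attaches it) is loaded and its train() masks AMORE's. Calling AMORE's function with an explicit namespace sidesteps the clash:
## qualify the call so caret::train cannot shadow AMORE's train
result <- AMORE::train(net.start, P, target, error.criterium = "LMS",
                       report = TRUE, show.step = 100, n.shows = 5)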