Runtime Library error in R with Random forest (Rborist) - r

I am using the Rborist library in R. I built a random forest model and saved the object with saveRDS.
Then I shut down R and loaded the object with readRDS.
An error occurred when I tried to predict with the reloaded model.
This is the error message:
Microsoft Visual C++ Runtime Library
This application has requested the Runtime to terminate it in an unusual
way. Please contact the application's support team for more
information.
This is the code:
library(caret)
library(Rborist)
dat <- read.csv("data.csv", header=T)
dat <- transform(dat, y = as.factor(y))
index <- createDataPartition(dat$y, p=.8, list=F)
train <- dat[index, ];test <- dat[-index,]
model <- Rborist(train[,-1], train$y, predProb=0.1, nTree = 500)
table = table(predict(model, test[,-1])$yPred,test$y)
table
sum(diag(table))/sum(table)
saveRDS(model,file="model.rds")
# After shutting down and restarting R
library(Rborist)
test <- read.csv("test.csv", header=T)
model <- readRDS(file="model.rds")
pred = predict(model, test[,-1])$yPred # Error!!
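The thread does not establish the cause of the crash. One cheap check before calling predict on a reloaded model is whether the new data has the same predictor columns and types as the training data, since test.csv is read separately from data.csv here. The following is a minimal base-R sketch under that assumption: predictor_layout and check_layout are hypothetical helpers, not part of Rborist, and the toy data frames stand in for train[,-1] and test[,-1].

```r
# Hypothetical helper: record the name and first class of every column.
predictor_layout <- function(df) {
  vapply(df, function(col) class(col)[1], character(1))
}

# Hypothetical helper: list the differences between the layout recorded at
# training time and the layout of the new data. Returns character(0) if none.
check_layout <- function(expected, new_data) {
  actual <- predictor_layout(new_data)
  problems <- character(0)

  missing_cols <- setdiff(names(expected), names(actual))
  extra_cols <- setdiff(names(actual), names(expected))
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("missing column:", missing_cols))
  }
  if (length(extra_cols) > 0) {
    problems <- c(problems, paste("unexpected column:", extra_cols))
  }

  shared <- intersect(names(expected), names(actual))
  changed <- shared[expected[shared] != actual[shared]]
  if (length(changed) > 0) {
    problems <- c(problems, paste0("type changed: ", changed, " (",
                                   expected[changed], " -> ", actual[changed], ")"))
  }
  problems
}

# Toy stand-ins for the training predictors and the freshly read test data.
train_x <- data.frame(a = c(1.5, 2.5), b = factor(c("u", "v")))
new_x <- data.frame(a = c(1L, 2L), c = c("u", "v"), stringsAsFactors = FALSE)

layout <- predictor_layout(train_x)   # could be saved next to the model
problems <- check_layout(layout, new_x)
print(problems)
```

If the layout is stored alongside the model (for example in the same list passed to saveRDS), the check can run in the new session before predict is called.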

Related

How to save / load a random forest model created via h2o4gpu library in R? [duplicate]

I have created an R model using the mlr and h2o packages as below:
library(h2o)
rfh20.lrn = makeLearner("classif.h2o.randomForest", predict.type = "prob")
After tuning, training starts the h2o JVM and connects R to the h2o cluster. Once the modelling was done, I saved the model as an .rds file:
saveRDS(h2orf_mod, "h2orf_mod.rds")
I do the prediction as
pred_h2orf <- predict(h2orf_mod, newdata = newdata)
Then I shut down h2o:
h2o.shutdown()
Later I reload the saved model:
h2orf_mod <- readRDS("h2orf_mod.rds")
and start h2o again so that the JVM connects R to an h2o cluster:
h2o.init()
The model now comes from the local file, so the new cluster doesn't know it. When I predict, I get the following error, which is to be expected:
ERROR: Unexpected HTTP Status code: 404 Not Found (url = http://localhost:54321/4/Predictions/models/DRF_model_R_1553297204511_743/frames/data.frame_sid_b520_1)
water.exceptions.H2OKeyNotFoundArgumentException
[1] "water.exceptions.H2OKeyNotFoundArgumentException: Object 'DRF_model_R_1553297204511_743' not found in function: predict for argument: model"
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page, : ERROR MESSAGE: Object 'DRF_model_R_1553297204511_743' not found in function: predict for argument: model
How should I handle this? Can the saved model be uploaded to the cluster, or is there another way? Rebuilding the model every time is not practical.
As per the comment, instead of saving the model with saveRDS/readRDS, save it with h2o.saveModel:
h2oModelsaved <- h2o.saveModel(object = h2orf_model, path = "C:/User/Models/")
Reload the model:
h2o.init()
h2oModelLoaded <- h2o.loadModel(h2oModelsaved)
Convert the test data to an h2o frame:
newdata <- as.h2o(testdata)
Then call predict:
pred_h2orf2 <- predict(h2oModelLoaded, newdata = newdata)
This works perfectly.
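Why saveRDS is not enough here can be illustrated with a toy simulation in base R. It does not use h2o: the "cluster" is just an environment, and start_cluster, train_remote and predict_remote are hypothetical names made up for the sketch. The point it shows is that the R-side object is only a handle holding a key, so saving the handle does not save the model that lives in the cluster.

```r
# Toy stand-in for a cluster: an environment that maps model keys to models.
start_cluster <- function() new.env()

# "Training" stores the real model in the cluster and returns only a handle.
train_remote <- function(cluster, key) {
  assign(key, list(slope = 2), envir = cluster)
  structure(list(key = key), class = "remote_handle")
}

# Prediction looks the model up by key and fails if the cluster lacks it.
predict_remote <- function(cluster, handle, x) {
  if (!exists(handle$key, envir = cluster, inherits = FALSE)) {
    stop(sprintf("Object '%s' not found", handle$key))
  }
  get(handle$key, envir = cluster)$slope * x
}

cluster <- start_cluster()
handle <- train_remote(cluster, "DRF_model_1")

# saveRDS() on the handle stores the key, not the model held by the cluster.
handle_file <- tempfile(fileext = ".rds")
saveRDS(handle, handle_file)

# Simulate shutdown and restart: the new cluster is empty.
cluster <- start_cluster()
restored <- readRDS(handle_file)
result <- tryCatch(predict_remote(cluster, restored, 1:3),
                   error = function(e) conditionMessage(e))
print(result)
```

The restored handle still carries the old key, but the fresh cluster has nothing stored under it, which mirrors the "not found" error in the question. Exporting the model itself, as h2o.saveModel does in the answer above, avoids the problem.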

How to write a predict function for mlr predict to upload in AzureML as webservice?

I am trying to deploy an R model to AzureML as a web service. The model uses the mlr package and its predict function, whose output is an object of class "PredictionClassif" "Prediction". For a linear model such as regression I use:
PredictAction <- function(inputdata){
predict(RegModel, inputdata, type="response")
}
This works fine in Azure.
When I use the mlr package for classification with predict type probability, I have to write the predict function as:
PredictAction <- function(inputdata){
require(mlr)
predict(randomForest,newdata=inputdata)
}
When I call
publishWebService(ws, fun, name, inputSchema)
it produces this error:
converting `inputSchema` to data frame
Error in convertArgsToAMLschema(lapply(x, class)) :
Error: data type "table" not supported
The predict function produces a table, which I don't know how to convert or modify, so I supply the outputschema:
publishWebService(ws, fun, name, inputSchema,outputschema)
I am not sure how to specify the outputschema (https://cran.r-project.org/web/packages/AzureML/AzureML.pdf).
The outputschema is a list, while
the predict function from mlr produces output of class
class(pred_randomForest)
"PredictionClassif" "Prediction"
and the data output is a data frame:
class(pred_randomForest$data)
"data.frame"
I am seeking help with the syntax for outputschema in the publishWebService function, or with whether I have to pass any other arguments. I am not sure where the issue is: whether AzureML can't read the wrapped model, or whether the predict function of mlr is not executed properly in AzureML.
I get the following error in AzureML:
Execute R Script Piped (RPackage) : The following error occurred during evaluation of R script: R_tryEval: return error: Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "c('FilterModel', 'BaseWrapperModel', 'WrappedModel')"
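The "no applicable method" message is what R reports when no predict method is registered for any class of the object. That typically happens when the package defining the method is not attached in the session evaluating the call. Note that require(mlr) returns FALSE with a warning instead of stopping when the package is unavailable. The base-R sketch below reproduces the mechanism with a stand-in object. It does not use mlr or AzureML, fake_model is hypothetical, and the predict.WrappedModel defined here is a toy replacement for whatever the real package provides.

```r
# Stand-in object carrying the same class vector as in the error message.
fake_model <- structure(
  list(slope = 2),
  class = c("FilterModel", "BaseWrapperModel", "WrappedModel")
)

# With no predict method available for any of these classes, dispatch fails.
before <- tryCatch(
  predict(fake_model, newdata = data.frame(x = 1:3)),
  error = function(e) conditionMessage(e)
)
print(before)

# Toy method, standing in for what attaching the defining package provides.
# It returns a plain data frame rather than a richer prediction object.
predict.WrappedModel <- function(object, newdata, ...) {
  data.frame(response = object$slope * newdata$x)
}

after <- predict(fake_model, newdata = data.frame(x = 1:3))
print(class(after))
```

Calling library(mlr) inside the scoring function instead of require(mlr) would make a missing package fail loudly at that line. Returning a plain data frame (for example the $data element shown earlier) is one way to avoid handing a custom class to the service.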
Here is an example using the XGBoost library in R:
library("AzureML") # workspace(), download.datasets(), publishWebService()
library("xgboost") # the main algorithm
##Load the Azure workspace. You can find the ID and the pass in your workspace
ws <- workspace(
id = "Your workspace ID",
auth = "Your Auth Pass"
)
##Download the dataset
dataset <- download.datasets(ws, name = "Breast cancer data", quote="\"")
## split the dataset to get train and score data
## 75% of the sample size
smp_size <- floor(0.75 * nrow(dataset))
## set the seed to make your partition reproducible
set.seed(123)
## get index to split the dataset
train_ind <- sample(seq_len(nrow(dataset)), size = smp_size)
##Split train and test data
train_dataset <- dataset[train_ind, ]
test_dataset <- dataset[-train_ind, ]
#Get the features columns
features<-train_dataset[ , ! colnames(train_dataset) %in% c("Class") ]
#get the label column
labelCol <-train_dataset[,c("Class")]
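#convert to data matrix (features only, so the Class label is not used as a predictor)
test_gboost<-data.matrix(test_dataset[ , ! colnames(test_dataset) %in% c("Class") ])
train_gboost<-data.matrix(features)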
#train model
bst <- xgboost(data = train_gboost, label = train_dataset$Class, max.depth = 2, eta = 1,
nround = 2, objective = "binary:logistic")
#predict the model
pred <- predict(bst,test_gboost )
#Score model
test_dataset$Scorelabel<-pred
test_dataset$Scoreclasses<- as.factor(as.numeric(pred >= 0.5))
#Create
# Scoring Function
predict_xgboost <- function(new_data){
predictions <- predict(bst, data.matrix(new_data))
output <- data.frame(new_data, ScoredLabels =predictions)
output
}
#Publish the score function
api <- publishWebService(
ws,
fun = predict_xgboost,
name = "xgboost classification",
inputSchema = as.data.frame(as.table(train_gboost)),
data.frame = TRUE)

Rscript - long time of execution

I'm trying to build a predictive model with the caret package in R and run predictions for new data from the terminal/cmd. Here is a reproducible example:
# Sonar_training.R
## learning and saving model
library(caret)
library(mlbench)
data(Sonar)
set.seed(107)
inTrain <- createDataPartition(y = Sonar$Class, p = .75,list = FALSE)
training <- Sonar[ inTrain,]
testing <- Sonar[-inTrain,]
saveRDS(testing,"test.rds")
ctrl <- trainControl(method = "repeatedcv",
repeats = 3)
plsFit <- train(Class ~ .,data = training,method = "pls",
tuneLength = 15,
trControl = ctrl,
preProc = c("center", "scale"))
plsClasses <- predict(plsFit, newdata = testing)
saveRDS(plsFit,"fit.rds")
And here is the script to run with Rscript.exe:
# script.R
##reading model and predict test data
t <- Sys.time()
pls <- readRDS("fit.rds")
testing <- readRDS("test.rds")
head(predict(pls, newdata = testing))
print(Sys.time() - t)
I run this in the terminal with the following command:
pawel@pawel-MS-1753:~$ Rscript script.R
Loading required package: pls
Attaching package: ‘pls’
The following object is masked from ‘package:stats’:
loadings
[1] M M R M R R
Levels: M R
Time difference of 2.209697 secs
Is there any way to do this faster or more efficiently? For example, is it possible to avoid loading the packages on every execution? Is readRDS the right way to read models in this case?
You can try to profile your code with the "profvis" package:
library(profvis)
profvis({
for (i in 1:100){
#your code here
}
})
I tried it, and it turns out that 99% of the execution time is training time, 1% is saving/loading the RDS data, and the rest (loading packages, loading data, ...) costs about 0.
So if you don't want to optimize the training function itself, there seem to be very few ways to reduce execution time.
I've seen this occur for PLS classification models and I'm not sure of the issue. However, try using method = "simpls" instead. You will get approximately the same answers and it should complete quickly.
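In script.R the timer wraps everything from readRDS to predict, and the output shows "Loading required package: pls" appearing during that window, so the 2.2 seconds is a sum of several steps. Timing each step separately shows where the time goes. The sketch below uses base R only and assumes stand-ins throughout: timed is a hypothetical helper, and a small lm fit on mtcars takes the place of the caret model, so the numbers it prints say nothing about caret itself.

```r
# Hypothetical helper: run an expression, report its elapsed time,
# and return the value of the expression.
timed <- function(label, expr) {
  elapsed <- system.time(value <- expr)[["elapsed"]]
  cat(sprintf("%-10s %.3f s\n", label, elapsed))
  value
}

# Stand-in for the training script: fit a small base-R model and save it.
model_file <- tempfile(fileext = ".rds")
saveRDS(lm(mpg ~ wt + hp, data = mtcars), model_file)

# Stand-in for script.R: time each step on its own.
fit <- timed("readRDS", readRDS(model_file))
preds <- timed("predict", predict(fit, newdata = head(mtcars)))
print(preds)
```

In the real script, adding a separate timed("library", library(caret)) line before the predict step would separate package loading from prediction.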

Error in predict.randomForest

I was hoping someone would be able to help me out with an issue I am having with the prediction function of the randomForest package in R. I keep getting the same error when I try to predict my test data:
Here's my code so far:
extractFeatures <- function(RCdata) {
features <- c(4, 9:13, 17:20)
fea <- RCdata[, features]
fea$Week <- as.factor(fea$Week)
fea$Age_Range <- as.factor(fea$Age_Range)
fea$Race <- as.factor(fea$Race)
fea$Referral_Source <- as.factor(fea$Referral_Source)
fea$Referral_Source_Category <- as.factor(fea$Referral_Source_Category)
fea$Rehire <- as.factor(fea$Rehire)
fea$CLFPR_.HS <- as.factor(fea$CLFPR_.HS)
fea$CLFPR_HS <- as.factor(fea$CLFPR_HS)
fea$Job_Openings <- as.factor(fea$Job_Openings)
fea$Turnover <- as.factor(fea$Turnover)
return(fea)
}
gp <- runif(nrow(RCdata))
RCdata <- RCdata[order(gp), ]
train <- RCdata[1:4600, ]
test <- RCdata[4601:6149, ]
rf <- randomForest(extractFeatures(train), suppressWarnings(as.factor(train$disposition_category)), ntree=100, importance=TRUE)
testpredict <- predict(rf, extractFeatures(test))
"Error in predict.randomForest(rf, extractFeatures(test)) :
Type of predictors in new data do not match that of the training data."
I have tried adding in the following line to the code, and still receive the same error:
testpredict <- predict(rf, extractFeatures(test), type="prob")
I found that the source of the error is that the training data has a level or two not found in the test data. So I tried another suggestion I found online, adjusting the levels of the test data to match those of the training data, but I keep getting NULL for the levels of the fields I use in both the training and test sets.
levels(test$Referral)
NULL
I can see the levels when I use the function, however.
levels(as.factor(test$Referral))
Then I tried to set the levels of the test data equal to those of the training data with the following assignment, and received an error:
levels(as.factor(test$Referral)) -> levels(as.factor(train$Referral))
Error in `levels<-.factor`(`*tmp*`, value = c(... :
number of levels differs
I am sure there is something simple I am missing (I am still very new to R), so any insight you can provide would be unbelievably helpful. Thanks!
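Two things in the question can be shown with base R. First, levels() returns NULL for a column that is still character, which would explain the NULL if the column is only converted inside extractFeatures. Second, a common way to give a test column the same levels as the training column is to build the factor with an explicit levels argument rather than assigning levels afterwards. This is a minimal sketch on made-up toy data (the Referral values are invented), not a fix verified against the original RCdata:

```r
# Toy data: the column is plain character, as in a data frame that has
# not been passed through as.factor() yet.
train <- data.frame(Referral = c("web", "friend", "agency", "web"),
                    stringsAsFactors = FALSE)
test <- data.frame(Referral = c("friend", "web"),
                   stringsAsFactors = FALSE)

print(levels(test$Referral))   # a character column has no levels

# Build the training factor first, then reuse its levels for the test column.
train$Referral <- factor(train$Referral)
test$Referral <- factor(test$Referral, levels = levels(train$Referral))

print(levels(test$Referral))
print(table(test$Referral))
```

With this approach, any test value that never appeared in the training data becomes NA, so it is worth checking anyNA(test$Referral) before predicting.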

An error occurs when calling rpart for a large data set

I have a large data set with 100k data fields. When I run str() or view the full data, no glitch occurs, but when I run rpart on the training set it takes some time, and after about 3-4 minutes the following error shows up:
Error: Unable to establish connection with R session
My script looks like below:
# Decision tree
library(rpart)
library(rattle)
library(party)
train_set <- read.table('my_sample_trainset.csv', header=TRUE, sep=',', stringsAsFactors=FALSE)
test_set <- read.table('my_sample_testset.csv', header=TRUE, sep=',', stringsAsFactors=FALSE)
my_trained_tree <- rpart(Route ~ Bus_Id + week_days + time_slot, data=train_set, method="class")
# Error occurs on/after this line
my_prediction <- predict(my_trained_tree, test_set, type = "class")
my_solution <- data.frame(Route = my_prediction)
write.csv(my_solution, file = "solution.csv", row.names = FALSE)
Am I missing a library, or does this happen because of the big data set (6.5 MB)?
I am using RStudio version 0.99.447 on Mac OS X Yosemite.
That message means that R is still calculating the results. If you open Activity Monitor and sort by CPU usage on the CPU tab, you should see that rsession is using 100% of a CPU. So you can just click "ok" on that message and allow R to keep computing.
I wish there were a workaround though, this issue is plaguing me as we speak!
