I am trying to plot an ROC curve for an H2O model object in R, but I keep receiving the following error message:
"Error in as.double(y) :
cannot coerce type 'S4' to vector of type 'double'"
My code is as follows:
drf1 <- h2o.randomForest(x = x, y = y,
                         training_frame = train, validation_frame = valid,
                         nfolds = nfolds, fold_assignment = "Modulo",
                         keep_cross_validation_predictions = TRUE, seed = 1)
plot(h2o.performance(drf1, valid = TRUE), type = "roc")
I followed suggestions found here: How to directly plot ROC of h2o model object in R
Any help would be greatly appreciated!
From the error, I think your response variable is not being treated as binary (it is numeric, so H2O fits a regression model, which has no ROC curve). Change your response variable to a factor before putting it into the model, i.e.
df$y <- as.factor(df$y)
"ROC is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied".
Source: ROC wiki
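For instance, a minimal sketch of the idea, assuming the response column in your H2OFrames is named "class" (swap in your actual column name; everything else keeps the names from the question):
library(h2o)
h2o.init()   # start or connect to an H2O cluster if one is not already running

# Assumption: "class" is the (currently numeric) response column in train/valid
train$class <- as.factor(train$class)
valid$class <- as.factor(valid$class)

drf1 <- h2o.randomForest(x = x, y = "class",
                         training_frame = train, validation_frame = valid,
                         nfolds = nfolds, fold_assignment = "Modulo",
                         keep_cross_validation_predictions = TRUE, seed = 1)

# With a factor response the model is binomial, so the performance object has an ROC
plot(h2o.performance(drf1, valid = TRUE), type = "roc")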
I'm trying to check variable importance before running the actual regression, but when I attempt to do so, I get this error:
Error in varImp(regressor, scale = FALSE) :
trying to get slot "responses" from an object (class "randomForest.formula") that is not an S4 object
I've tried looking up the error, but there wasn't much information available online. What can I do to fix this?
all = read.csv('https://raw.githubusercontent.com/bandcar/massShootings/main/all.csv')
# Check Variable importance with randomForest
regressor <- randomForest::randomForest(total_victims ~ . , data = all, importance=TRUE) # fit the random forest with default parameter
caret::varImp(regressor, scale = FALSE) # get variable importance, based on mean decrease in accuracy
I am currently trying to run a logistic regression model on my data frame.
While creating a new model frame with the actual and predicted values, I get the following error message.
Error
Error in confusionMatrix(as.factor(log_class), lgtest$Satisfaction, positive = "satisfied") :
unused argument (positive = "satisfied")
This is my model:
#### Logistic regression model
log_model = glm(Satisfaction~., data = lgtrain, family = "binomial")
summary(log_model)
log_preds = predict(log_model, lgtest[,1:22], type = "response")
head(log_preds)
log_class = array(c(99))
for (i in 1:length(log_preds)) {
  if (log_preds[i] > 0.5) {
    log_class[i] = "satisfied"
  } else {
    log_class[i] = "neutral or dissatisfied"
  }
}
### Creating a new modelframe containing the actual and predicted values.
log_result = data.frame(Actual = lgtest$Satisfaction, Prediction = log_class)
lgtest$Satisfaction = factor(lgtest$Satisfaction, c(1,0),labels=c("satisfied","neutral or dissatisfied"))
lgtest
confusionMatrix(log_class, log_preds, threshold = 0.5) ####this works
mr1 = confusionMatrix(as.factor(log_class),lgtest$Satisfaction, positive = "satisfied") ## this is the line that causes the error
I had the same problem. I typed ?confusionMatrix and got this output:
Help on topic 'confusionMatrix' was found in the following packages:

  confusionMatrix
    (in package InformationValue in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
  Create a confusion matrix
    (in package caret in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
  Confusion Matrix
    (in package ModelMetrics in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
As we can see, the function exists in more than one loaded package, so we need to specify which package's version we want to use.
So I called it as caret::confusionMatrix(...) and it worked!
This is how to write the call to get rid of the unused-argument error when building a confusion matrix in R:
caret::confusionMatrix(
data = new_tree_predict$predicted,
reference = new_tree_predict$actual,
positive = "True"
)
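Applied to the code from the question above, that would look roughly like the sketch below. It assumes log_class and lgtest$Satisfaction are defined as in the question, and it re-levels the predictions so both factors share the same levels (caret requires that):
# make the predicted labels a factor with the same levels as the reference
log_class <- factor(log_class, levels = levels(lgtest$Satisfaction))
mr1 <- caret::confusionMatrix(log_class, lgtest$Satisfaction, positive = "satisfied")
mr1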
I am trying to run a confusion matrix in R for my decision tree model but get the following error:
"Error in table(data, reference, dnn = dnn, ...) : all arguments must have the same length"
I don't understand why it won't run.
dtree_test <- rpart(writeoff ~ education+employ_status+residential_status+loan_amount+loan_length+
net_income,method="class", data=testnew,parms=list(split="information"))
dtree_test$cptable
plotcp(dtree_test)
dtree_test.pruned <- prune(dtree_test, cp=.01639344)
prp(dtree_test.pruned, type = 2, extra = 104,
fallen.leaves = TRUE, main="Decision Tree")
dtree_test.pred <- predict(dtree_test.pruned, testnew, type="class")
dtree_test.perf <- table(testnew$writeoff, dtree_test.pred,
dnn=c("Actual", "Predicted"))
dtree_test.perf
confusionMatrix(predict(dtree_test.pruned, testnew, type="class"),train$writeoff)
The last line is:
confusionMatrix(predict(dtree_test.pruned, testnew, type="class"),train$writeoff)
which makes predictions for the dataset testnew but compares them against the response from the dataset train. Those two vectors have different numbers of rows, which is exactly what the "all arguments must have the same length" error is complaining about.
Also in rpart(...) you have data=testnew but perhaps you meant to use your training data to fit the model?
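A sketch of the corrected call, assuming you do want to evaluate on testnew, that caret's confusionMatrix is the one intended, and that writeoff in testnew uses the same class labels the tree was trained on (both arguments now come from the same data set, so they have the same length):
# predict on testnew and compare against the response from the same data set
dtree_test.pred <- predict(dtree_test.pruned, testnew, type = "class")
caret::confusionMatrix(dtree_test.pred,
                       factor(testnew$writeoff, levels = levels(dtree_test.pred)))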
I am trying to fit a logistic regression model with all predictors on the training data, but I keep getting errors. This is what I have:
library(kernlab)
data(spam)
tr_idx = sample(nrow(spam), 1000)
spam_tr = spam[tr_idx,] # training
spam_te = spam[-tr_idx] # testing
fit_tr = lm(spam_te ~ spam_tr, data=spam)
but this error always comes out:
Error in model.frame.default(formula = spam_te ~ spam_tr, data = spam, :
invalid type (list) for variable 'spam_te'
and when I input this:
fit_tr = lm(spam_te ~ spam_tr, data=tri_dx)
I got another error:
Error in is.data.frame(data) : object 'tri_dx' not found
There are multiple issues with your code.
1. Your third line is missing a comma when subsetting the data frame.
2. Your fourth line should fit the model on spam_tr only, because the model is fitted on the training data first and then evaluated on the testing data.
tr_idx = sample(nrow(spam), 1000)
spam_tr = spam[tr_idx,]
spam_te = spam[-tr_idx,]
fit_tr = lm(spam_tr , data = spam)
Hope this helps.
In the formula you need to specify the variables in the model, not the data sets.
Also, lm fits a linear model, not a logistic one; for logistic regression use glm() with family = binomial.
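For reference, a logistic fit on the kernlab spam data would look more like the sketch below; type is the spam/nonspam response column in that data set, and everything else keeps the names from the question:
library(kernlab)
data(spam)

set.seed(1)
tr_idx  <- sample(nrow(spam), 1000)
spam_tr <- spam[tr_idx, ]    # training rows
spam_te <- spam[-tr_idx, ]   # test rows (note the comma: keep all columns)

# logistic regression of the response on all predictors, fitted on the training data only
fit_tr <- glm(type ~ ., data = spam_tr, family = binomial)

# predicted spam probabilities for the held-out rows
p_te <- predict(fit_tr, newdata = spam_te, type = "response")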
I am building a decision tree using the tree library in R. I have attempted to fit my model as follows:
model <- tree(Outcome ~ Age + Sex + Income, data = train, type = "class")
Running the above line gives me an error as follows:
Error in tree.control(nobs, ...) : unused argument (type = "class")
I down-sampled so that each class is balanced, so I did not specify any weights. If I remove the type = "class" argument, the model runs, but when I predict with it, it seems to have built a regression tree, which I do not want.
Can someone help?
If you look at the help page ?tree, there is no argument called type. If you are getting a regression tree, that is because Outcome is a numeric variable. I expect you can fix this by adding
train$Outcome = factor(train$Outcome)
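i.e., something like this sketch, using the names from the question (note that type = "class" then belongs in predict(), not in tree()):
library(tree)

# make the response a factor so tree() fits a classification tree
train$Outcome <- factor(train$Outcome)

model <- tree(Outcome ~ Age + Sex + Income, data = train)
pred  <- predict(model, newdata = train, type = "class")  # class labels, not numbers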