I am currently trying to run a logistic regression model on my data frame.
While creating a new data frame with the actual and predicted values, I get the following error message.
Error
Error in confusionMatrix(as.factor(log_class), lgtest$Satisfaction, positive = "satisfied") :
unused argument (positive = "satisfied")
This is my model:
#### Logistic regression model
log_model = glm(Satisfaction~., data = lgtrain, family = "binomial")
summary(log_model)
log_preds = predict(log_model, lgtest[,1:22], type = "response")
head(log_preds)
log_class = array(c(99))
for (i in 1:length(log_preds)) {
  if (log_preds[i] > 0.5) {
    log_class[i] = "satisfied"
  } else {
    log_class[i] = "neutral or dissatisfied"
  }
}
### Creating a new modelframe containing the actual and predicted values.
log_result = data.frame(Actual = lgtest$Satisfaction, Prediction = log_class)
lgtest$Satisfaction = factor(lgtest$Satisfaction, c(1,0),labels=c("satisfied","neutral or dissatisfied"))
lgtest
confusionMatrix(log_class, log_preds, threshold = 0.5) ####this works
mr1 = confusionMatrix(as.factor(log_class),lgtest$Satisfaction, positive = "satisfied") ## this is the line that causes the error
I had the same problem. I typed "?confusionMatrix" and got this output:
Help on topic 'confusionMatrix' was found in the following packages:
confusionMatrix
(in package InformationValue in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
Create a confusion matrix
(in package caret in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
Confusion Matrix
(in package ModelMetrics in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
As this output shows, the function exists in more than one package, so we need to specify which package's version we want to use.
So I wrote the call as "caret::confusionMatrix(...)" and it worked!
This is how we can write the code to get rid of the unused argument error when building a confusion matrix in R:
caret::confusionMatrix(
data = new_tree_predict$predicted,
reference = new_tree_predict$actual,
positive = "True"
)
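To see why the bare call fails, here is a minimal base R sketch of the masking problem. The function defined below is a hypothetical stand-in, not the real code from any package; it only mimics a signature with a threshold argument and no positive argument, which is an assumption based on the call in the question that works.

```r
# Hypothetical stand-in: a confusionMatrix() whose signature has no
# `positive` argument (an assumption, not the real package code).
confusionMatrix <- function(actuals, predictedScores, threshold = 0.5) {
  table(predicted = as.integer(predictedScores > threshold), actual = actuals)
}

# Calling it with a caret-style `positive` argument fails in the same way
# as the call in the question.
msg <- tryCatch(
  confusionMatrix(c(1, 0), c(0.9, 0.2), positive = "satisfied"),
  error = function(e) conditionMessage(e)
)
print(msg)

# find() lists every attached location that defines the name, in search
# order; a bare call uses the first one.
print(find("confusionMatrix"))
```

In a real session, find("confusionMatrix") should list the attached packages in the order R searches them, which shows which version a bare call picks up.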
Related
I'm trying to test the variable importance before running the actual regression. But, when I attempt to do so, I get this error:
Error in varImp(regressor, scale = FALSE) :
trying to get slot "responses" from an object (class "randomForest.formula") that is not an S4 object
I've tried looking up the error, but there wasn't much information available online. What can I do to fix this?
all = read.csv('https://raw.githubusercontent.com/bandcar/massShootings/main/all.csv')
# Check Variable importance with randomForest
regressor <- randomForest::randomForest(total_victims ~ . , data = all, importance=TRUE) # fit the random forest with default parameter
caret::varImp(regressor, scale = FALSE) # get variable importance, based on mean decrease in accuracy
When trying to graph the conditional fixed effects of a glmmTMB model with two random intercepts in GGally I get the error:
There was an error calling "tidy_fun()". Most likely, this is because the
function supplied in "tidy_fun=" was misspelled, does not exist, is not
compatible with your object, or was missing necessary arguments (e.g. "conf.level=" or "conf.int="). See error message below.
Error: Error in "stop_vctrs()":
! Can't recycle "..1" (size 3) to match "..2" (size 2).
I have tinkered with this to figure out the issue, and it seems to be related to the two random intercepts included in the model. I have also tried extracting the coefficient and standard error information separately through broom.mixed::tidy and then feeding the data frame into GGally::ggcoef(), to no avail. Any suggestions?
library(glmmTMB)
library(GGally)
# Example with built-in randu data set
data(randu)
randu$A <- factor(rep(c(1,2), 200))
randu$B <- factor(rep(c(1,2,3,4), 100))
# Model
test <- glmmTMB(y ~ x + z + (0 +x|A) + (1|B), family="gaussian", data=randu)
# A few of my attempts at graphing--works fine when only one random effects term is in model
ggcoef_model(test)
ggcoef_model(test, tidy_fun = broom.mixed::tidy)
ggcoef_model(test, tidy_fun = broom.mixed::tidy, conf.int = T, intercept=F)
ggcoef_model(test, tidy_fun = broom.mixed::tidy(test, effects="fixed", component = "cond", conf.int = TRUE))
There are some (old!) bugs that have recently been fixed (here, here) that would make confidence interval reporting on RE parameters break for any model with multiple random terms (I think). I believe that if you are able to install updated versions of both glmmTMB and broom.mixed:
remotes::install_github("glmmTMB/glmmTMB/glmmTMB#ci_tweaks")
remotes::install_github("bbolker/broom.mixed")
then ggcoef_model(test) will work.
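After reinstalling, it may help to confirm which versions R actually finds. The following is a small sketch using only base R and utils; it assumes nothing beyond the two package names mentioned above and simply reports what is installed.

```r
pkgs <- c("glmmTMB", "broom.mixed")

# TRUE for each package that can be loaded from the current library paths
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Version strings for the packages that were found (empty if none are installed)
versions <- vapply(
  pkgs[installed],
  function(p) as.character(utils::packageVersion(p)),
  character(1)
)
print(versions)
```

If the versions printed are unchanged after installing from GitHub, restarting the R session before retrying is usually worth a try.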
I am trying to plot a ROC curve for an H2O model object in R, but I keep receiving the following error message:
"Error in as.double(y) :
cannot coerce type 'S4' to vector of type 'double'"
My code is as follows:
drf1 <- h2o.randomForest(x=x,y=y,training_frame = train,validation_frame = valid, nfolds = nfolds, fold_assignment = "Modulo",keep_cross_validation_predictions = TRUE,seed = 1)
plot((h2o.performance(drf1,valid = T)), type = "roc")
I followed suggestions found here: How to directly plot ROC of h2o model object in R
Any help would be greatly appreciated!
Judging from the error, I think your response variable is not binary. You can convert the response variable to a factor before fitting the model, i.e.:
df$y <- as.factor(df$y)
"ROC is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied".
source:
ROC wiki
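To illustrate why a binary response is needed, here is a minimal base R sketch that computes ROC points by hand. The labels and scores are made up for illustration, and roc_points is a hypothetical helper, not part of h2o.

```r
# Toy data (made up): a two-level factor response and predicted scores.
labels <- factor(c(1, 0, 1, 1, 0, 0, 1, 0))
scores <- c(0.9, 0.4, 0.7, 0.35, 0.2, 0.5, 0.8, 0.1)

# Hypothetical helper: true/false positive rates at each threshold.
roc_points <- function(labels, scores, positive = levels(labels)[2]) {
  # A ROC curve is only defined for exactly two classes.
  stopifnot(is.factor(labels), nlevels(labels) == 2)
  thresholds <- sort(unique(c(Inf, scores)), decreasing = TRUE)
  is_pos <- labels == positive
  data.frame(
    threshold = thresholds,
    fpr = vapply(thresholds, function(t) mean(scores[!is_pos] >= t), numeric(1)),
    tpr = vapply(thresholds, function(t) mean(scores[is_pos] >= t), numeric(1))
  )
}

roc <- roc_points(labels, scores)
print(roc)

# Draw the curve only in an interactive session.
if (interactive()) {
  plot(roc$fpr, roc$tpr, type = "s",
       xlab = "False positive rate", ylab = "True positive rate")
}
```

Each threshold splits the scores into predicted positives and negatives, which is why the response needs exactly two classes.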
I recently switched from Stata to R and am struggling to find some corresponding commands. I would like to get panel bootstrapped standard errors from a fixed effects model using the plm library, as described here for Stata users.
My questions concern the approach in general (whether boot is the appropriate library, or library(meboot)) and how to solve this particular error using boot:
First get some panel data:
library(plm)
data(EmplUK) # from plm library
test<-function(data, i) coef(plm(wage~emp+sector,data = data[i,],
index=c("firm","year"),model="within"))
Second:
library(boot)
boot<-boot(EmplUK, test, R = 100)
> boot<-boot(EmplUK, test, R = 100)
duplicate couples (time-id)
Error in pdim.default(index[[1]], index[[2]]) :
Called from: top level
Browse[1]>
Because boot resamples rows with replacement, the index it passes (original here) contains duplicated values, so plm sees duplicate (firm, year) pairs. You should remove the duplicated values so that the index is unique before passing the data to plm.
test <- function(data,original) {
coef(plm(wage~emp+sector,data = data[unique(original),],
index=c("firm","year"),model="within"))
}
boot(EmplUK, test, R = 100)
## ORDINARY NONPARAMETRIC BOOTSTRAP
## Call:
## boot(data = EmplUK, statistic = test, R = 100)
## Bootstrap Statistics :
## original bias std. error
## t1* -0.1198127 -0.01255009 0.05269375
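The duplicated index can be reproduced without boot or plm. The sketch below uses only base R and a made-up toy panel (10 firms, 5 years); it assumes that a nonparametric bootstrap draws row indices with replacement, which sample() imitates here.

```r
set.seed(1)

# Toy panel (made up for illustration): one row per firm-year pair.
panel <- data.frame(firm = rep(1:10, each = 5), year = rep(1:5, times = 10))
n_rows <- nrow(panel)

# Row indices drawn with replacement, as in a nonparametric bootstrap.
idx <- sample(n_rows, n_rows, replace = TRUE)
has_duplicates <- anyDuplicated(idx) > 0
n_distinct <- length(unique(idx))

# Resampled rows repeat (firm, year) pairs, which a panel index rejects.
resampled <- panel[idx, ]
dup_pairs <- any(duplicated(resampled[, c("firm", "year")]))

# Dropping repeated indices restores a unique index, on fewer rows.
deduplicated <- panel[unique(idx), ]
print(c(rows = n_rows, distinct = n_distinct, kept = nrow(deduplicated)))
```

Note that after unique(), each replicate is a subset of distinct rows rather than a full resample, so the resulting standard errors should be interpreted with that in mind.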
I am trying to tune rpart. I have already split my data into a training set and a cv set. The tune.rpart convenience function doesn't seem to have a way to specify a cv set, so I am using the regular tune() function.
I have 595 potential variables in my dataset, so I don't want to specify them using a formula. I get the following error when I do this:
Error in tune(rpart, train.x = trainset[, -1], train.y = trainset[, 1], :
Dependent variable has wrong type!
In addition: Warning message:
In if (y) ans$y <- Y :
the condition has length > 1 and only the first element will be used
Code:
load('train.dat')
load('cv.dat')
trainset$class<-factor(trainset$class)
cvset$class<-factor(cvset$class)
rpart.tune<-tune(rpart,train.x= trainset[,-1], train.y=trainset[,1],
validation.x=cvset[,-1], validation.y=cvset[,1],
ranges = list(
cp = c(0.002,0.005,0.01,0.015,0.02,0.03)),
tunecontrol = tune.control(sampling = "fix"))
Data is available at:
https://docs.google.com/folder/d/0B2_rKFnvrjMAM3FGbnFvZm5laUk/edit
You have to check the prediction output of the classifier you are using. For classification, tune() expects the true and predicted values to satisfy the following condition:
(is.logical(true.y) || is.factor(true.y)) && (is.logical(pred) || is.factor(pred) || is.character(pred))
The prediction from rpart, for example, produces a matrix as output. You can try svm, which handles this correctly, or try using just two classes.
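The condition above can be tried out in base R. In this sketch, passes_type_check is a hypothetical helper that copies the quoted condition, and the probability matrix is made up to stand in for a matrix-valued prediction.

```r
# Hypothetical helper reproducing the condition quoted above.
passes_type_check <- function(true.y, pred) {
  (is.logical(true.y) || is.factor(true.y)) &&
    (is.logical(pred) || is.factor(pred) || is.character(pred))
}

truth <- factor(c("a", "b", "a"))

# Made-up class-probability matrix, standing in for a matrix prediction.
prob_matrix <- matrix(
  c(0.8, 0.3, 0.6,
    0.2, 0.7, 0.4),
  ncol = 2, dimnames = list(NULL, c("a", "b"))
)
ok_matrix <- passes_type_check(truth, prob_matrix)

# Converting each row to its most probable class gives a factor instead.
pred_class <- factor(colnames(prob_matrix)[max.col(prob_matrix)],
                     levels = levels(truth))
ok_factor <- passes_type_check(truth, pred_class)

print(c(matrix = ok_matrix, factor = ok_factor))
```

With rpart itself, predict(fit, newdata, type = "class") returns a factor of predicted classes, which is the kind of output the condition accepts.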