tune() function e1071/libsvm - error with rpart - R

I am trying to tune rpart. I have already split my data into a training set and a cross-validation set. The tune.rpart convenience function doesn't seem to have a way to specify a validation set, so I am using the regular tune() function.
I have 595 potential variables in my dataset, so I don't want to specify them using a formula. I get the following error when I do this:
Error in tune(rpart, train.x = trainset[, -1], train.y = trainset[, 1], :
Dependent variable has wrong type!
In addition: Warning message:
In if (y) ans$y <- Y :
the condition has length > 1 and only the first element will be used
Code:
load('train.dat')
load('cv.dat')
trainset$class <- factor(trainset$class)
cvset$class <- factor(cvset$class)
rpart.tune <- tune(rpart, train.x = trainset[, -1], train.y = trainset[, 1],
                   validation.x = cvset[, -1], validation.y = cvset[, 1],
                   ranges = list(cp = c(0.002, 0.005, 0.01, 0.015, 0.02, 0.03)),
                   tunecontrol = tune.control(sampling = "fix"))
Data is available at:
https://docs.google.com/folder/d/0B2_rKFnvrjMAM3FGbnFvZm5laUk/edit

You have to check the prediction output of the classifier you are using. For classification, tune() expects the true labels and predictions to satisfy the following condition:
(is.logical(true.y) || is.factor(true.y)) && (is.logical(pred) || is.factor(pred) || is.character(pred))
The prediction with rpart, for example, produces a matrix of class probabilities by default. You can try svm, which handles this correctly, or try giving just two classes.
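If you want to stay with rpart, one possible workaround (a sketch only, not tested against the linked data) is to use the formula interface, which rpart understands and which spares you from listing the 595 variables, and to hand tune() a predict.func that returns class labels instead of the default probability matrix:
library(e1071)
library(rpart)

# Sketch: class ~ . avoids typing out 595 variable names; predict.func makes the
# fitted tree return class labels so tune()'s classification check passes.
# cvset serves as the fixed validation set and includes the response column.
rpart.tune <- tune(rpart, class ~ ., data = trainset,
                   validation.x = cvset,
                   ranges = list(cp = c(0.002, 0.005, 0.01, 0.015, 0.02, 0.03)),
                   predict.func = function(object, newdata)
                     predict(object, newdata, type = "class"),
                   tunecontrol = tune.control(sampling = "fix"))
summary(rpart.tune)
predict.func is a documented argument of e1071::tune(), so this keeps the fixed-validation-set setup from the question intact.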

Related

Unused argument error when building a Confusion Matrix in R

I am currently trying to run a logistic regression model on my data frame.
While I was creating a new model frame with the actual and predicted values, I got the following error message.
Error
Error in confusionMatrix(as.factor(log_class), lgtest$Satisfaction, positive = "satisfied") :
unused argument (positive = "satisfied")
This is my model:
#### Logistic regression model
log_model = glm(Satisfaction~., data = lgtrain, family = "binomial")
summary(log_model)
log_preds = predict(log_model, lgtest[,1:22], type = "response")
head(log_preds)
log_class = array(c(99))
for (i in 1:length(log_preds)) {
  if (log_preds[i] > 0.5) {
    log_class[i] = "satisfied"
  } else {
    log_class[i] = "neutral or dissatisfied"
  }
}
### Creating a new modelframe containing the actual and predicted values.
log_result = data.frame(Actual = lgtest$Satisfaction, Prediction = log_class)
lgtest$Satisfaction = factor(lgtest$Satisfaction, c(1,0),labels=c("satisfied","neutral or dissatisfied"))
lgtest
confusionMatrix(log_class, log_preds, threshold = 0.5) ####this works
mr1 = confusionMatrix(as.factor(log_class),lgtest$Satisfaction, positive = "satisfied") ## this is the line that causes the error
I had the same problem. I typed ?confusionMatrix and got this output:
Help on topic 'confusionMatrix' was found in the following packages:
confusionMatrix
(in package InformationValue in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
Create a confusion matrix
(in package caret in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
Confusion Matrix
(in package ModelMetrics in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
As we can see here, the function exists in more than one package, so we need to specify which package we want to use.
So I wrote the call as caret::confusionMatrix(...) and it worked!
This is how we can write the call to get rid of the unused-argument error when building a confusion matrix in R:
caret::confusionMatrix(
  data = new_tree_predict$predicted,
  reference = new_tree_predict$actual,
  positive = "True"
)
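Applied to the model in the question, the call would look something like the sketch below. It assumes log_class contains exactly the label strings used as levels of lgtest$Satisfaction; check that before trusting the output.
# Namespaced call so caret's version is used, not InformationValue's or ModelMetrics'.
# Align the predicted labels with the reference factor's levels first.
log_class <- factor(log_class, levels = levels(lgtest$Satisfaction))
mr1 <- caret::confusionMatrix(data = log_class,
                              reference = lgtest$Satisfaction,
                              positive = "satisfied")
mr1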

Fail to predict woe in R

I used this call to compute the WoE with the woe package:
library("woe")
woe.object <- woe(data, Dependent = "target", FALSE,
                  Independent = "shop_id", C_Bin = 20, Bad = 0, Good = 1)
Then I want to predict woe for the test data
test.woe <- predict(woe.object, newdata = test, replace = TRUE)
And it gives me an error
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "data.frame"
Any suggestions please?
You cannot do the prediction with the package woe; you need the package klaR. Take note of the masking of the function woe, see below:
# let's say woe and then klaR were loaded
library(klaR)
data = data.frame(target = sample(0:1, 100, replace = TRUE),
                  shop_id = sample(1:3, 100, replace = TRUE),
                  another_var = sample(letters[1:3], 100, replace = TRUE))
# make sure both the dependent and independent variables are factors
data$target = factor(data$target)
data$shop_id = factor(data$shop_id)
data$another_var = factor(data$another_var)
You need two or more independent variables:
woemodel <- klaR::woe(target ~ shop_id + another_var,
                      data = data)
If you provide only one, you get an error:
woemodel <- klaR::woe(target ~ shop_id,
                      data = data)
Error in woe.default(x, grouping, weights = weights, ...) :
  All factors with unique levels. No woes calculated!
In addition: Warning message:
In woe.default(x, grouping, weights = weights, ...) :
  Only one single input variable. Variable name in resulting object$woe is only conserved in formula call.
If you want to predict the dependent variable with only one independent variable, something like logistic regression will work:
mdl = glm(target ~ shop_id, data = data, family = "binomial")
prob = predict(mdl, data, type = "response")
# glm models the probability of the second factor level, so index 2 is the "success" label
predicted_label = ifelse(prob > 0.5, levels(data$target)[2], levels(data$target)[1])
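To get back to the original goal, the fitted klaR model on two or more predictors can then be used to compute WoE values for new data with klaR's predict method. A minimal sketch, assuming a test data frame with the same factor columns (here just a subsample of data):
# Sketch: klaR provides a predict() method for woe objects.
woemodel <- klaR::woe(target ~ shop_id + another_var, data = data)
test <- data[sample(nrow(data), 20), ]           # stand-in for real test data
test.woe <- predict(woemodel, newdata = test, replace = TRUE)
head(test.woe)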

Error trying to do cross validation after a classification tree

I am trying to run a simple classification tree using the tree package. I have taken the code from a textbook and copied it line by line, but it doesn't work, no matter what I do.
library(ISLR)
library(tree)
C = Carseats
C$HighSales = ifelse(C$Sales<=8,"No","Yes")
C = C[,-1]
set.seed(2)
train = sample(1:nrow(C), 200)
carseats.test = C[-train,]
high.test = C$HighSales[-train]
tree.carseats = tree(HighSales~., C, subset = train)
tree.predict = predict(tree.carseats, carseats.test, type = "class")
table(tree.predict,high.test)
(93+48)/200
set.seed(3)
cv.cs = cv.tree(tree.carseats, FUN = prune.misclass)
I am getting the following error:
Error in as.data.frame.default(data, optional = TRUE) :
cannot coerce class ‘"function"’ to a data.frame
I have looked at the help page of the function. It requires a tree object, which is what I passed in.
What could the problem be? The code is identical to the textbook and to other websites that quote the book.
There are two problems. One is related to the formula in tree:
formula - A formula expression. The left-hand-side (response) should be either a numerical vector when a regression tree will be fitted or a factor, when a classification tree is produced. The right-hand-side should be a series of numeric or factor variables separated by +; there should be no interaction terms. Both . and - are allowed: regression trees can have offset terms.
So, we should instead have
C$HighSales <- factor(ifelse(C$Sales <= 8, "No", "Yes"))
Next, there's a problem with how cv.tree deals with variables (see here). Doing something like
mydf <- C
tree.carseats <- tree(HighSales ~ ., mydf, subset = train)
works. The issue is that there is a base R function called C, and cv.tree ends up referring to that function rather than to your dataset.
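Putting both fixes together, a sketch of the corrected sequence (the same textbook code, just with a factor response and the data frame renamed so it no longer collides with the C() function; the numbers may differ from the textbook depending on your R version):
library(ISLR)
library(tree)

mydf <- Carseats
mydf$HighSales <- factor(ifelse(mydf$Sales <= 8, "No", "Yes"))  # factor response
mydf <- mydf[, -1]                                              # drop Sales

set.seed(2)
train <- sample(1:nrow(mydf), 200)
carseats.test <- mydf[-train, ]
high.test <- mydf$HighSales[-train]

tree.carseats <- tree(HighSales ~ ., mydf, subset = train)
tree.predict <- predict(tree.carseats, carseats.test, type = "class")
table(tree.predict, high.test)

set.seed(3)
cv.cs <- cv.tree(tree.carseats, FUN = prune.misclass)  # now finds mydf, not the C() function
cv.cs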

R: randomForest error message

Trying to run random forest on a data set that has ~400 samples and about 360 variables in a data frame df.
I'm trying to use the variables (s10, s100, etc.) to predict the Genotype. This is the code I'm using:
rf <- randomForest(Genotype ~ ., data = df, importance = TRUE, proximity = TRUE)
but I keep getting the error message:
Error in if (n == 0) stop("data (x) has 0 rows") :
argument is of length zero
What am I doing wrong?
First, don't name your objects the same as existing R functions (i.e., df, which is the F-distribution density function). Second, try the non-formula interface to randomForest. Let's see where that gets you.
( rf <- randomForest(y = my.df[, "Genotype"], x = my.df[, 2:ncol(my.df)], importance = TRUE, proximity = TRUE) )
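As a self-contained illustration of the same idea (a hypothetical toy data set standing in for the real 400 x 360 data frame; the column names are made up):
library(randomForest)

set.seed(1)
my.df <- data.frame(Genotype = factor(sample(c("A", "B"), 100, replace = TRUE)),
                    s10  = rnorm(100),
                    s100 = rnorm(100))

# Non-formula interface: Genotype as y, all remaining columns as x.
rf <- randomForest(x = my.df[, setdiff(names(my.df), "Genotype")],
                   y = my.df[, "Genotype"],
                   importance = TRUE, proximity = TRUE)
print(rf)
Making sure Genotype is a factor keeps randomForest in classification mode.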

library(e1071), tune Variable lengths differ

I have been attempting to use the iris dataset, and although I've gotten svm from the e1071 library to work, I keep getting a 'variable lengths differ' error when I attempt to make tune work:
library(e1071)
data <- data.frame(iris$Sepal.Width, iris$Petal.Length, iris$Species)
svm_tr <- data[sample(nrow(data), 100), ]  # sample 100 random rows
tuned <- tune(svm, svm_tr$iris.Species ~ .,
              data = svm_tr[1:2],
              kernel = "linear",
              ranges = list(cost = c(.001, .01, .1, 1, 10, 100)))
I have checked the lengths of each of the columns in svm_tr[1:2] and they are the same length. I know the function doesn't take a dataframe directly but maybe I'm missing something?
I can get it to work with:
tune(svm, iris.Species ~ ., data = svm_tr[1:3],
     kernel = "linear", ranges = list(cost = c(.001, .01, .1, 1, 10, 100)))
With a formula interface you shouldn't refer to a variable using $, because all the required variables are sourced from the object specified in the data= argument. Note that I've also used data = svm_tr[1:3] instead of [1:2] so that the iris.Species column is included.
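For completeness, a self-contained sketch of the corrected workflow (same column choices as the question; the tuning results will vary with the random sample):
library(e1071)

set.seed(42)  # only so the sampled rows are reproducible
svm_data <- data.frame(iris$Sepal.Width, iris$Petal.Length, iris$Species)
svm_tr <- svm_data[sample(nrow(svm_data), 100), ]  # 100 random rows

# Formula interface: no $ references; all three columns are passed via data=
tuned <- tune(svm, iris.Species ~ ., data = svm_tr,
              kernel = "linear",
              ranges = list(cost = c(.001, .01, .1, 1, 10, 100)))
summary(tuned)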
