pred <- predict(fit, x, type="response", s=cv$lambda.min)
confusion_matrix <- confusionMatrix(data = pred, reference = testXsp)
Error in confusionMatrix.matrix(data = pred, reference = testXsp) :
matrix must have equal dimensions
dim(pred)
[1] 751864 1
dim(testXsp)
[1] 751864 1
dim(testXsp) == dim(pred)
[1] TRUE TRUE
The dimensions seem to be the same, so why am I getting this error message?
If data is a matrix, confusionMatrix() requires it to be square: the matrix method expects an already-built contingency table, not a one-column matrix of predictions.
> caret:::confusionMatrix.matrix
function (data, positive = NULL, prevalence = NULL, mode = "sens_spec",
...)
{
if (length(unique(dim(data))) != 1) {
stop("matrix must have equal dimensions")
}
classTable <- as.table(data, ...)
confusionMatrix(classTable, positive, prevalence = prevalence,
mode = mode)
}
<bytecode: 0x126452f88>
<environment: namespace:caret>
Note that the method for class matrix does not even take a reference argument. It is the default method that uses reference. Perhaps you should review the help page for confusionMatrix?
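A minimal sketch of the usual fix, assuming a binary 0/1 outcome, a 0.5 cutoff, and that testXsp holds the true labels (coerced with as.matrix() in case it is sparse) — converting both to factors dispatches to the default method, which does take reference:
pred_class <- factor(ifelse(as.numeric(pred) > 0.5, 1, 0), levels = c(0, 1))
truth      <- factor(as.numeric(as.matrix(testXsp)), levels = c(0, 1))
confusionMatrix(data = pred_class, reference = truth)  # default factor method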
One possibility here is that there are one or more NA values in your prediction matrix. Try the following command:
pred <- na.omit(pred)
Afterwards, rerun the above code. If this does not work, please post the package you are using to fit your model; that will allow for a more detailed solution!
Best wishes,
-Matt
I am trying to use the naiveBayes() function on a training and a test set of data. I am following this helpful website: https://rpubs.com/riazakhan94/naive_bayes_classifier_e1071
However, for some reason it is not working, and this is the error I am getting: "Error in table(train$Class, trainPred) : all arguments must have the same length".
Here is the code I am using; I am guessing it's a super simple fix. The x and y columns of the data set predict the Class column:
https://github.com/samuelc12359/NaiveBayes.git
library(e1071)  # provides naiveBayes()

test  <- read.csv(file = "TestX.csv",  header = FALSE)
train <- read.csv(file = "TrainX.csv", header = FALSE)
Names <- c("x", "y", "Class")
colnames(test)  <- Names
colnames(train) <- Names

NBclassfier <- naiveBayes(Class ~ x + y, data = train)
print(NBclassfier)

trainPred  <- predict(NBclassfier, train, type = "class")
trainTable <- table(train$Class, trainPred)
testPred   <- predict(NBclassfier, newdata = test, type = "class")
testTable  <- table(test$Class, testPred)
print(trainTable)
print(testTable)
You need to turn the Class column into a factor, e.g. like this:
train$Class = factor(train$Class)
test$Class = factor(test$Class)
Then when you call naiveBayes() to train, and later to predict, it will do what you expect.
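For context, here is a compact, hedged sketch of the corrected flow (same variable names as the question); the underlying issue appears to be that predict() yields an empty result when the model was trained on a non-factor response, which is what made table() complain about unequal lengths:
train$Class <- factor(train$Class)
test$Class  <- factor(test$Class)
NBclassfier <- naiveBayes(Class ~ x + y, data = train)  # retrain on the factor response
trainPred   <- predict(NBclassfier, train, type = "class")
table(train$Class, trainPred)  # both arguments now have the same length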
Alternatively, you can change the prediction type to "raw" and turn the posterior probabilities into class outcomes yourself, e.g. like this:
train_predictions <- predict(NBclassfier, train, type = "raw")
trainPred  <- 1 * (train_predictions[, 2] >= 0.5)  # posterior probability of the second class
trainTable <- table(train$Class, trainPred)
test_predictions <- predict(NBclassfier, newdata = test, type = "raw")
testPred  <- 1 * (test_predictions[, 2] >= 0.5)
testTable <- table(test$Class, testPred)
print(trainTable)
print(testTable)
I could really use some help.
I am trying to use cross-validation to find the best model. I used the reference code from this website.
https://github.com/asadoughi/stat-learning/blob/master/ch6/lab.R
See line 63 onwards.
I used the same code for another data set and everything worked fine. When I use this new dataset, I get an error:
Error in plot.window(...) : need finite 'ylim' values
The error appears when I call plot(mean.cv.errors), but I saw that the problem arises before the plot function: the mean CV errors are not being calculated. For every number of predictors, I get NaN:
1 2 3 4 5 6 7 8
NaN NaN NaN NaN NaN NaN NaN NaN
Does anybody have a clue what's going on? I removed NAs from the data, and I am at a total loss for what may be going on, since the exact same code worked on another data set.
Here is the structure of the data
structure(list(tricepind = c(-0.174723355, -0.012222222, -0.197554452,
-0.042844901, -0.288806432, -0.340831629, -0.07727797, -0.016715831,
0.032448378, 0.223333333, -0.234205488, 0.152073733, 0.1, 0.066666667,
-0.09843684), mkcal = c(1451.990902, 1820.887165, 2025.580287,
1522.201067, 1296.587413, 936.4362557, 2626.190579, 1257.284695,
1583.382929, 1736.695, 1964.600102, 3557.202041, 1682.712691,
2025.962999, 2286.300483), mprot = c(82.15660833, 79.896551,
70.76528433, 68.026405, 40.859294, 45.39550133, 96.65918833,
82.80520367, 82.48830233, 76.22586667, 92.65016433, 164.821377,
67.04030333, 82.30652767, 59.10089967), mcarb = c(144.6609883,
207.803092, 301.791884, 154.252719, 192.215434, 125.836917, 326.8027877,
117.3693597, 151.8666383, 226.6798, 246.8333723, 455.0111473,
217.4003043, 209.0277287, 254.0715917), mtfat = c(64.452471,
73.34697467, 37.79965033, 72.50962033, 38.87718467, 31.354984,
111.493208, 56.441886, 73.22733933, 56.61331667, 67.261771, 121.9704157,
55.08478833, 94.518705, 100.8741383), PC1 = c(-0.447910206, -0.294634121,
-1.104462969, -0.547207734, -1.954444086, -2.196982329, 2.746913539,
-1.023090581, -0.764200454, -0.584591205, 0.77843409, 5.614654485,
-0.999691479, 0.279942766, 0.896578187), PC2 = c(-0.642332236,
0.049369806, -0.216059532, 1.160722893, 1.078477828, -0.150613681,
1.895259257, -1.909344827, 1.644354816, 1.614658854, 0.433529118,
-1.669928792, -0.560657387, -1.145066836, 1.866870422), PC3 = c(-0.451625917,
-0.772244866, 1.06416389, -0.408526673, 0.337918493, -0.254740649,
1.480378587, 0.583072925, -1.619576656, -1.637944088, -0.430379578,
-0.512822799, 2.018634475, 0.26331773, 3.128258848), PC4 = c(-0.968856054,
0.16683708, 0.914246075, -0.219132873, 0.670302106, 0.368790712,
0.642579887, -1.921774612, 0.016672151, 1.765303371, 0.683175144,
0.884292702, -0.388954363, -1.532636673, -1.199798116)), class = "data.frame", row.names = c(NA,
-15L))
Here is the code I worked with.
library(leaps)  # for regsubsets()

usdtricepby6predictors <- read.csv("usdtricepby8predictors2.csv", header = TRUE,
                                   na.strings = ".", stringsAsFactors = FALSE)
usdtricepby6predictors <- na.omit(usdtricepby6predictors)
# coerce every column to numeric, then back to a data frame
usdtricepby6predictors <- sapply(usdtricepby6predictors, as.numeric)
usdtricepby6predictors <- as.data.frame(usdtricepby6predictors)
predict.regsubsets <- function(object, newdata, id, ...) {  # predict method for regsubsets
  form  <- as.formula(object$call[[2]])  # extract the model formula
  mat   <- model.matrix(form, newdata)
  coefi <- coef(object, id = id)
  xvars <- names(coefi)
  mat[, xvars] %*% coefi
}
k <- 10
set.seed(1)
folds <- sample(1:k, nrow(usdtricepby6predictors), replace = TRUE)
cv.errors <- matrix(NA, k, 8, dimnames = list(NULL, paste(1:8)))
for (j in 1:k) {
  best.fit <- regsubsets(tricepind ~ ., data = usdtricepby6predictors[folds != j, ], nvmax = 8)
  for (i in 1:8) {
    pred <- predict(best.fit, usdtricepby6predictors[folds == j, ], id = i)
    cv.errors[j, i] <- mean((usdtricepby6predictors$tricepind[folds == j] - pred)^2)
  }
}
mean.cv.errors=apply(cv.errors,2,mean)
mean.cv.errors
par(mfrow=c(1,1))
plot(mean.cv.errors,type='b')
points(which.min(mean.cv.errors),mean.cv.errors[which.min(mean.cv.errors)],
col="red",cex=2,pch=20)
reg.best=regsubsets(tricepind~.,data=usdtricepby6predictors,nvmax=8)
coef(reg.best,which.min(mean.cv.errors))
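One hedged check worth running after the loop (my sketch, assuming the data really has only the 15 rows shown above): with k = 10 folds drawn by sample(), a fold can easily end up with zero test rows, and mean() over an empty fold returns NaN, which then propagates into every column mean. Something like:
table(factor(folds, levels = 1:k))  # any fold with 0 test observations?
rowSums(is.na(cv.errors))           # folds whose errors came out NaN
# if so, use fewer folds for so few rows, e.g. k <- 5, or leave-one-out CV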
I am a new user of R and am trying to use the mRMRe package (mRMR is a well-known feature-selection approach) to obtain a feature subset from a feature set. Please excuse me if my question is simple; I really want to know how I can fix an error. Below are the details.
Suppose I have a CSV file (gene.csv) with a feature set of 6 attributes ([G1.1.1.1], [G1.1.1.2], [G1.1.1.3], [G1.1.1.4], [G1.1.1.5], [G1.1.1.6]) and a target class variable [Output] ('1' indicates the positive class and '-1' the negative class). Here is a sample gene.csv file:
[G1.1.1.1] [G1.1.1.2] [G1.1.1.3] [G1.1.1.4] [G1.1.1.5] [G1.1.1.6] [Output]
11.688312 0.974026 4.87013 7.142857 3.571429 10.064935 -1
12.538226 1.223242 3.669725 6.116208 3.363914 9.174312 1
10.791367 0.719424 6.115108 6.47482 3.597122 10.791367 -1
13.533835 0.37594 6.766917 7.142857 2.631579 10.902256 1
9.737828 2.247191 5.992509 5.992509 2.996255 8.614232 -1
11.864407 0.564972 7.344633 4.519774 3.389831 7.909605 -1
11.931818 0 7.386364 5.113636 3.409091 6.818182 1
16.666667 0.333333 7.333333 4.333333 2 8.333333 -1
I am trying to get the best feature subset of 2 attributes (out of the above 6) and wrote the following R code.
library(mRMRe)
file_n <- paste0("E:\\gene", ".csv")
df <- read.csv(file_n, header = TRUE)
f_data <- mRMR.data(data = data.frame(df))
featureData(f_data)
mRMR.ensemble(data = f_data, target_indices = 7,
              feature_count = 2, solution_count = 1)
When I run this code, I get the following error for the statement f_data <- mRMR.data(data = data.frame(df)):
Error in .local(.Object, ...) :
data columns must be either of numeric, ordered factor or Surv type
However, the data in each column of the CSV file are real numbers, so how can I change the R code to fix this problem? Also, I am not sure what the value of target_indices should be in the statement mRMR.ensemble(data = f_data, target_indices = 7, feature_count = 2, solution_count = 1), as my target class variable is named "[Output]" in the gene.csv file.
I would appreciate it very much if anyone could help me obtain the best feature subset from the gene.csv file using the mRMRe package.
I solved the problem by modifying my code as follows.
library(mRMRe)
file_n <- paste0("E:\\gene", ".csv")
df <- read.csv(file_n, header = TRUE)
df[[7]] <- as.numeric(df[[7]])  # mRMR.data() requires numeric columns
f_data <- mRMR.data(data = data.frame(df))
results <- mRMR.classic("mRMRe.Filter", data = f_data, target_indices = 7,
                        feature_count = 2)
solutions(results)
It worked fine. The output of the code gives the indices of the selected 2 features.
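If it helps, a small follow-up sketch to map those indices back to feature names (my assumption: solutions() returns a list of index matrices, as its printed output suggests):
idx <- solutions(results)[[1]]   # assumed: a matrix of selected feature indices
colnames(df)[as.vector(idx)]     # names of the 2 selected features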
I think it has to do with your Output column, which is probably of class integer. You can check that using class(df[[7]]).
To convert it to numeric, as required by the error message, just type:
df[[7]] <- as.numeric(df[[7]])
That worked for me.
As for the other question: after reading the documentation, setting target_indices = 7 seems to be the right choice, since [Output] is the 7th column.
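A small defensive variant (hypothetical, not from the answer above): derive the target index from the column name instead of hard-coding 7, since read.csv() sanitises a header like "[Output]" and the column order could change:
colnames(df)                            # inspect the sanitised column names
target <- grep("Output", colnames(df))  # position of the target column
mRMR.ensemble(data = f_data, target_indices = target,
              feature_count = 2, solution_count = 1)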