library(tree)
set.seed(1)
train=1:80
sum(is.na(sobija1))
hist(sobija1$CCSI)
sss=ifelse(sobija1$CCSI<=100, "negative","positive" )
sss=as.factor(sss)
sobija1=data.frame(sobija1,sss)
Tree_Class=tree(sss~sobija1$unemployment_rate+sobija1$house_pirce_index,sobija1,subset=train)
print(summary(Tree_Class))
plot(Tree_Class)
text(Tree_Class, pretty=0, cex=0.75)
cat("\n Confusion table for classification trees \n")
print(table(predict(Tree_Class,newdata = sobija1[-train,],type = "class"), sss[-train]))
> print(table(predict(Tree_Class,newdata = sobija1[-train,],type = "class"), sss[-train]))
Error in table(predict(Tree_Class, newdata = sobija1[-train, ], type = "class"), :
all arguments must have the same length
In addition: Warning message:
'newdata' had 41 rows but variables found have 121 rows
I built a decision tree model and tried to produce a confusion matrix to check the error rate, but this error came up and I have no idea how to fix it.
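A likely cause, offered as a sketch rather than a confirmed diagnosis: the formula refers to columns as sobija1$unemployment_rate, so predict() looks those variables up in the full data frame and ignores newdata, which is what the warning about 41 versus 121 rows suggests. The example below reproduces the same pattern with base R's lm() on made-up data (the names dat, x and y are hypothetical). The same fix, bare column names in the formula, should apply to tree().

```r
set.seed(1)

# Made-up stand-in for the data in the question: 121 rows, first 80 for training
dat <- data.frame(x = rnorm(121))
dat$y <- 2 * dat$x + rnorm(121)
train <- 1:80

# Problematic pattern: dat$x in the formula always refers to the full column,
# so newdata is ignored at prediction time
fit_bad <- lm(y ~ dat$x, data = dat, subset = train)
pred_bad <- suppressWarnings(predict(fit_bad, newdata = dat[-train, ]))

# Fixed pattern: bare column names are looked up in data and in newdata
fit_ok <- lm(y ~ x, data = dat, subset = train)
pred_ok <- predict(fit_ok, newdata = dat[-train, ])

# Number of predictions from each fit versus the number of held-out rows
c(bad = length(pred_bad), ok = length(pred_ok), held_out = nrow(dat[-train, ]))
```

Applied to the question, that would mean calling tree(sss ~ unemployment_rate + house_pirce_index, data = sobija1, subset = train), with the column names spelled exactly as they are in the data frame.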
I've recently been trying to evaluate the output of k-modes (a cluster label) against a so-called true cluster label (labelled 'class' below).
In other words, I've been trying to externally validate the clustering output. However, when I tried the external validation measures from the 'fpc' package, I was unsuccessful (the error is posted below the script).
I've attached my code for the mushroom dataset. I would appreciate it if anyone could show me how to successfully run these external validation measures on categorical data.
Any help appreciated.
# LIBRARIES
install.packages('klaR')
install.packages('fpc')
library(klaR)
library(fpc)
#MUSHROOM DATA
mushrooms <- read.csv(file = "https://raw.githubusercontent.com/miachen410/Mushrooms/master/mushrooms.csv", header = FALSE)
names(mushrooms) <- c("edibility", "cap-shape", "cap-surface", "cap-color",
"bruises", "odor", "gill-attachment", "gill-spacing",
"gill-size", "gill-color", "stalk-shape", "stalk-root",
"stalk-surface-above-ring", "stalk-surface-below-ring",
"stalk-color-above-ring", "stalk-color-below-ring", "veil-type",
"veil-color", "ring-number", "ring-type", "spore-print-color",
"population", "habitat")
names(mushrooms)[names(mushrooms)=="edibility"] <- "class"
indexes <- apply(mushrooms, 2, function(x) any(is.na(x) | is.infinite(x)))
colnames(mushrooms)[indexes]
table(mushrooms$class)
str(mushrooms)
#REMOVING CLASS VARIABLE
mushroom.df <- subset(mushrooms, select = -c(class))
#KMODES ANALYSIS
result.kmode <- kmodes(mushroom.df, 2, iter.max = 50, weighted = FALSE)
#EXTERNAL VALIDATION ATTEMPT
mushrooms$class <- as.factor(mushrooms$class)
class <- as.numeric(mushrooms$class)
clust_stats <- cluster.stats(d = dist(mushroom.df),
class, result.kmode$cluster)
#ERROR TERM
Error in silhouette.default(clustering, dmatrix = dmat) :
NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In dist(mushroom.df) : NAs introduced by coercion
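The warning "NAs introduced by coercion" suggests that dist() tried to turn the categorical columns into numbers, so a dissimilarity designed for categorical data is probably needed. Below is a minimal base-R sketch of simple matching dissimilarity on a made-up toy data frame. The helper simple_matching() and the toy data are hypothetical, and the double loop is for illustration only, not something to run on every row of the mushroom data. cluster::daisy() with metric = "gower" is a packaged alternative worth checking.

```r
# Toy categorical data (made up for illustration)
toy <- data.frame(
  cap  = c("x", "b", "x", "x"),
  odor = c("p", "a", "p", "n"),
  stringsAsFactors = TRUE
)

# Simple matching dissimilarity: the share of columns on which two rows differ
simple_matching <- function(df) {
  m <- as.matrix(df)            # factor columns become a character matrix
  n <- nrow(m)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- mean(m[i, ] != m[j, ])
    }
  }
  as.dist(d)                    # same class of object that dist() returns
}

d <- simple_matching(toy)
d
```

The resulting dist object contains no NAs, so it could be passed as d to cluster.stats() in place of dist(mushroom.df).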
I have a problem training svmLinear with caret. The same data works just fine with svmRadial, though.
The data is accessible via (29/05/2016):
https://www.dropbox.com/s/ia2vc25uhxdgqn1/projetTest01.txt?dl=0
(8000 lines of 1021 variables, ~10% target)
Here's the code:
library(caret) # provides train() and trainControl()
projetTest01<-read.table("projetTest01.txt", sep="\t")
Test01<-list(data=projetTest01[,-c(2,3)],label=projetTest01[,3])
Test01N<-Test01
Test01N$label<-as.factor(Test01$label)
levels(Test01N$label)[levels(Test01N$label)=="0"] <- "No"
levels(Test01N$label)[levels(Test01N$label)=="1"] <- "Yes"
temp<-as.matrix(Test01$data)
storage.mode(temp) <- "numeric" #I need 'num' type
Test01N$data<-as.data.frame(temp)
svmTuneGrid_L <- data.frame(.C = 2^(-2:7))
trControl_SVML<-trainControl(method = "repeatedcv", repeats = 3, classProbs = TRUE)
svmFit_Lin <- train(Test01N$label ~ ., data = Test01N$data,
                    method = "svmLinear",
                    preProc = c("center", "scale"),
                    tuneGrid = svmTuneGrid_L,
                    trControl = trControl_SVML)
And I got these messages:
line search fails [..]
Warning in method$predict(modelFit = modelFit, newdata = newdata, submodels = param) :
kernlab class prediction calculations failed; returning NAs
Warning in data.frame(..., check.names = FALSE) :
row names were found from a short variable and have been discarded
I searched this site and the web for answers, but none of the usual causes seem to apply:
- the levels aren't numeric (they are Yes/No)
- classProbs is set to TRUE
- the labels can't be predicted perfectly from another variable (I know this from other algorithms)
- there isn't an empty class
- using preProc (scale) or not makes no difference
And the data works just fine with svmRadial!
I am using caret 6.0-68.
I am really at a loss. Does anyone have an idea?
library(nnet)
set.seed(9850)
train1<- sample(1:155,110)
test1 <- setdiff(1:155,train1) # held-out rows: all 155 indices minus the training sample
ideal <- class.ind(hepatitis$class)
hepatitisANN = nnet(hepatitis[train1,-20], ideal[train1,], size=10, softmax=TRUE)
j <- predict(hepatitisANN, hepatitis[test1,-20], type="class")
hepatitis[test1,]$class
table(predict(hepatitisANN, hepatitis[test1,-20], type="class"),hepatitis[test1,]$class)
confusionMatrix(hepatitis[test1,]$class, j)
Error:
Error in nnet.default(hepatitis[train1, -20], ideal[train1, ], size = 10, :
NA/NaN/Inf in foreign function call (arg 2)
In addition: Warning message:
In nnet.default(hepatitis[train1, -20], ideal[train1, ], size = 10, :
NAs introduced by coercion
The hepatitis variable holds the hepatitis dataset available from the UCI repository.
This error occurs because your data contains character values.
Try reading the hepatitis dataset with na.strings = "?". The "?" coding for missing values is defined in the description of the dataset on the UCI page.
headers <- c("Class","AGE","SEX","STEROID","ANTIVIRALS","FATIGUE","MALAISE","ANOREXIA","LIVER BIG","LIVER FIRM","SPLEEN PALPABLE","SPIDERS","ASCITES","VARICES","BILIRUBIN","ALK PHOSPHATE","SGOT","ALBUMIN","PROTIME","HISTOLOGY")
hepatitis <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/hepatitis/hepatitis.data", header = FALSE, na.strings = "?")
names(hepatitis) <- headers
library(nnet)
set.seed(9850)
train1<- sample(1:155,110)
test1 <- setdiff(1:155,train1) # held-out rows: all 155 indices minus the training sample
ideal <- class.ind(hepatitis$Class)
# will give error due to missing values
# 1st column of hepatitis dataset is the class variable
hepatitisANN <- nnet(hepatitis[train1,-1], ideal[train1,], size=10, softmax=TRUE)
This code will not give your error, but it will give an error about missing values. You will need to address those before you can continue.
Also be aware that the class variable is the first variable in the dataset when it comes straight from the UCI data repository.
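To illustrate both points, here is a small sketch on made-up inline data (not the real hepatitis file). Without na.strings = "?" the affected columns are read as character; with it they stay numeric and contain NA. complete.cases() is one simple way to drop incomplete rows before fitting. Whether dropping rows is acceptable for the analysis is a separate decision.

```r
# Three made-up records in the style of the UCI file; "?" marks a missing value
raw <- "2,30,?,1.0\n1,50,2,?\n2,78,1,0.7\n"

# Default read: any column containing "?" becomes character
bad <- read.csv(text = raw, header = FALSE, stringsAsFactors = FALSE)
sapply(bad, class)

# Treat "?" as NA so the columns stay numeric
good <- read.csv(text = raw, header = FALSE, na.strings = "?")
sapply(good, class)

# Count missing values per column, then keep only the complete rows
colSums(is.na(good))
complete <- good[complete.cases(good), ]
nrow(complete)
```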
Edit based on comments:
The na.action only works if you use the formula notation of nnet.
So in your case:
hepatitisANN <- nnet(class.ind(Class)~., hepatitis[train1,], size=10, softmax=TRUE, na.action = na.omit)
I'm working the npreg example in the R np package documentation (by T. Hayfield, J. Racine), section 3.1 Univariate Regression.
library("np")
data("cps71")
model.par = lm(logwage~age + I(age^2),data=cps71)
summary(model.par)
#
attach(cps71)
bw = npregbw(logwage~age) # this line is not in example 3.1
model.np = npreg(logwage~age,regtype="ll", bwmethod="cv.aic",gradients="TRUE",
data=cps71)
This is copied directly from the example, but the npreg call results in this error message:
Error in npreg.rbandwidth(txdat = txdat, tydat = tydat, bws = bws, ...) :
NAs in foreign function call (arg 15)
In addition: Warning message:
In npreg.rbandwidth(txdat = txdat, tydat = tydat, bws = bws, ...) :
NAs introduced by coercion
The npreg documentation indicates that the first argument should be a bandwidth specification, so I tried setting bws=1:
model.np = npreg(bws=1,logwage~age,regtype="ll",
bwmethod="cv.aic",gradients="TRUE", data=cps71)
which gives the following error:
Error in toFrame(xdat) :
xdat must be a data frame, matrix, vector, or factor
This is my first time working with density estimation in R. Please suggest how to resolve these errors.
I'm using RStudio v0.97. I want to draw a color scatterplot matrix. Here is my code:
library(gclus) # provides dmat.color(), order.single() and cpairs()
dt<- impact[c(3,4,7,8)]
dt.r <- cor(dt)
dt.color <- dmat.color(dt.r)
dt.order<- order.single(dt.r)
cpairs(dt, dt.order, panel.controls = dt.color, main= "Scatterplots")
But my output is a black-and-white scatterplot, along with the warning: "There were 50 or more warnings (use warnings() to see the first 50)"
How can I fix this?
Read the help for cpairs:
Usage:
cpairs(data, order = NULL,
panel.colors = NULL, border.color = "grey70", show.points = TRUE, ...)
The parameter is panel.colors, not panel.controls.
The warning is a clue - did you read the warning?
Warning messages:
1: In plot.window(...) : "panel.controls" is not a graphical parameter
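The same kind of warning can be reproduced with base graphics alone, which may help when reading such messages: an argument that a plotting function does not recognise is passed down to the low-level routines, and they report it as an unknown graphical parameter. A small sketch follows; the misspelled argument is deliberate, and pdf(NULL) is used only to avoid opening a plot window or writing a file.

```r
pdf(NULL)  # null PDF device: nothing is shown or saved

# Collect warning messages instead of printing them
msgs <- character(0)
withCallingHandlers(
  plot(1:10, panel.controls = "red"),   # not a real argument of plot()
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
invisible(dev.off())

# Distinct warning texts raised by the single plot() call
unique(msgs)
```

In the question, replacing panel.controls with panel.colors in the cpairs() call should both remove the warnings and apply the colors.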