Plotting C5.0 Tree in R

I am trying to plot a C5.0 object tree in R but it is giving the following error and I can't seem to find out how to fix it.
plot(model)
Error in partysplit(varid = as.integer(i), index = index, info = k, prob = NULL) :
minimum of ‘index’ is not equal to 1
In addition: Warning message:
In min(index, na.rm = TRUE) :
no non-missing arguments to min; returning Inf

It seems that the factor levels in your data frame contain spaces. I was facing the same issue; after I removed the spaces from the levels, plotting worked.
For example, if a variable has the levels " bad" and " good", change them to "bad" and "good".
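A minimal sketch of that cleanup in base R. The data frame and its column names are made up for illustration, and replacing inner spaces with underscores is an assumption about what the plotting code tolerates, not something the answer confirms:

```r
# Hypothetical toy data: factor levels with leading or inner spaces
df <- data.frame(
  rating = factor(c(" bad", " good", " bad")),
  region = factor(c("north east", "south west", "north east")),
  score  = c(1.5, 2.0, 3.5)
)

# Clean the levels of every factor column: trim outer whitespace, then
# replace any remaining inner whitespace with an underscore
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], function(f) {
  levels(f) <- gsub("\\s+", "_", trimws(levels(f)))
  f
})

levels(df$rating)
levels(df$region)
```

The model would then need to be refitted on the cleaned data frame before calling plot() again.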
"The error itself is due to NA values being passed in the index vector. The root cause is probably that the factor levels are being split on spaces" Found here https://github.com/topepo/C5.0/issues/10

Try this (fancyRpartPlot is written for rpart trees, so it may not accept a C5.0 model):
library(rattle)
fancyRpartPlot(model)

Related

Error in cor.test.default 'x' and 'y' must have the same length (Spearman’s Rank-Order Correlation)

I'm trying to test for correlation between x and y of my data using Spearman Rank-Order Correlation in R but encountered the following error:
Error in cor.test.default(x = Female_ChLO, y = Female_TBL, method = "spearman") :
'x' and 'y' must have the same length
This data "Female_ChLO" had an outlier removed. When tested on the data before removing the outlier, I didn't encounter this error message.
The data does have a lot of NA but they are vital to the test and I'm trying to include na.rm=T but have no idea how to. Would love to hear suggestions but not too complicated please as I'm new to R.
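One possible cause, assuming the outlier was dropped from Female_ChLO (which shortens that vector) instead of being set to NA: cor.test needs x and y to have the same length, and it leaves out incomplete pairs by itself, so no na.rm argument is needed. A minimal sketch with made-up numbers standing in for the real data:

```r
# Made-up measurements standing in for Female_ChLO and Female_TBL
chlo <- c(4.1, 5.3, NA, 6.0, 25.0, 5.8, 4.9, 6.4)
tbl  <- c(40, 52, 47, NA, 55, 50, 46, 61)

# Dropping the outlier shortens the vector, which triggers the
# "'x' and 'y' must have the same length" error in cor.test
chlo_dropped <- chlo[-5]
length(chlo_dropped) == length(tbl)   # FALSE

# Setting the outlier to NA keeps both vectors aligned
chlo_na <- chlo
chlo_na[5] <- NA

# cor.test only uses the pairs in which both values are present
res <- cor.test(chlo_na, tbl, method = "spearman")
res$estimate
```

The key point is that each position in x must still refer to the same individual as the same position in y.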

Problems with function MatchIt::matchit

Hi, I'd like to run a logistic regression adjusted for the propensity score. But first I'd like to match treated and untreated units on their propensity scores. Here's my first script:
mod_match<-matchit(Treatment~Prop.score, method = "nearest", data = Epidemio.prop,caliper = 0.05)
Here are the error messages
Error in matchit(Treatment~Prop.score, method = "nearest", data =
Epidemio.prop, : Missing values exist in the data
I have therefore removed from the model all other variables except the two variables of interest that have no missing data.
mod_match<-matchit(Treatment~Prop.score,
method = "nearest", data = Epidemio.prop[c("Treatment","Prop.score")],
caliper = 0.1)
I still have error messages.
Error in weights.matrix(match.matrix, treat, discarded) : No units
were matched In addition: Warning messages:
1: In max(pscore[treat == 0]) : no non-missing arguments to max;
returning -Inf
2: In max(pscore[treat == 1]) : no non-missing arguments to max;
returning -Inf
3: In min(pscore[treat == 0]) : no non-missing arguments to min;
returning Inf
4: In min(pscore[treat == 1]) : no non-missing arguments to min;
returning Inf
The problem is that you are not giving matchit any covariates from which to calculate the propensity score (you are only giving it Treatment and Prop.score, whose meaning is not clear to me).
You need to pass the set of auxiliary variables that will be used to fit the model predicting the propensity scores.
Also, in my experience, MatchIt throws a missing-values error even when the missing values are in variables that are not included in the model.
I recommend creating an auxiliary data frame with only the variables you want to use in the model, and deleting (or imputing) any observations with missing values in any of those variables.
Something like this:
vars_to_keep <- c("Treatment", "x1", "x2", "x3", ... )
aux_df <- df[vars_to_keep]
# Select only complete cases (i.e. drop observations with at least one missing)
aux_df <- aux_df[complete.cases(aux_df), ]
mod_match <- matchit(Treatment ~ x1 + x2 + x3 + ..., method = "nearest", data = aux_df)
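To see what the subsetting and complete.cases steps do, here is a small base-R sketch on a made-up data frame. The column names x1, x2 and notes are placeholders, and the matchit call itself is left out:

```r
# Hypothetical data frame; x1 and x2 stand in for the covariates,
# "notes" is an unrelated column that happens to contain NAs
df <- data.frame(
  Treatment = c(1, 0, 1, 0, 1),
  x1        = c(2.3, NA, 1.1, 4.0, 3.2),
  x2        = c(10, 12, 9, NA, 11),
  notes     = c(NA, "a", NA, "b", NA)
)

vars_to_keep <- c("Treatment", "x1", "x2")
aux_df <- df[vars_to_keep]

# Rows 2 and 4 have a missing covariate and are dropped;
# the NAs in "notes" no longer matter because that column is gone
aux_df <- aux_df[complete.cases(aux_df), ]

nrow(aux_df)
anyNA(aux_df)
```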
Nevertheless, this tutorial is a much more comprehensive help. I recommend having a look at it.
Good luck!

Error in plotting a C5.0 decision tree in R

I'm trying to plot my decision tree model.
cnt_c50 <- C5.0Control(CF=0.25,minCases=30,sample=0.7)
myTree <- C5.0(Y ~ X1+X2+X3+....+X30, data=data, control= cnt_c50,trials=100)
summary(myTree)
When I run summary(myTree) there is no warning, no error, everything is fine.
But when I try to visualize it with plot(myTree), I get this error message:
Error in partysplit(varid = as.integer(i), index = index, info = k,
prob = NULL) : minimum of ‘index’ is not equal to 1 In addition:
Warning message: In min(index, na.rm = TRUE) : no non-missing
arguments to min; returning Inf
I tried to specify which tree to visualize with plot(myTree,trial=9), and it returns this message:
Error in !all.equal(diff(sort(unique(index))), rep(1, max(index, na.rm
= TRUE) - : invalid argument type
When I try with trials=1 instead of trials=100, I get this message:
Error in if (!n.cat[i]) { : missing value where TRUE/FALSE needed
I also tried this because I saw it in a response on this website, but it seems to be meant for CART trees:
library(rattle)
fancyRpartPlot(myTree)
and I got this error:
Error in if (model$method == "class") { : argument is of length zero
I read that it could be due to a lack of values in a category (for example a category with only 1 or 2 people) so I ran without some variables with small categories but it was the same.
I also tried this function http://r-project-thanos.blogspot.fr/2014/09/plot-c50-decision-trees-in-r.html but I am not a good programmer and I got errors too...
Has anyone had the same problem and can help me?
Thanks a lot

Will only plot Factors?

I'm working with a new data set, and I always start with
options(stringsAsFactors = FALSE)
The problem I'm having now is that R will only plot the data if the stringsAsFactors option is set to TRUE.
Whenever I try to plot with stringsAsFactors = FALSE, it gives me the following error message.
plot(Data$Jobs, Data$RXH)
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
But when I set stringsAsFactors to TRUE, it plots without a problem...
This is the script.
#Setting WD.
getwd()
setwd("C:/Windows/System32/config/systemprofile/Documents/R proj")
options(stringsAsFactors = F)
get <- read.csv("WorkExcelR.csv", header = TRUE, sep = ",")
Data <- na.omit(get)
And this is Data$Jobs and Data$RXH
> Data$Jobs
[1] "Playstation" "RWC Heineken" "Jagermeister" "RWC Heineken"
[5] "RWC Heineken" "RWC Heineken"
> Data$RXH
[1] 90 90 100 90 90 90
The problem you are illustrating stems from the fact that there is a plot.factor function but no plot.character function. You can see the available plot.-methods by typing:
methods(plot)
This is not particularly well described in the help page for ?plot, but there is a separate help page for ?plot.factor. Functions in R are dispatched on the basis of their arguments: S3 functions on the basis only of the class of their first argument and S4 methods on the basis of their argument signatures. In a sense the plot.factor function elaborates on that strategy, because it then dispatches to different plotting routines based on the second argument's class as well, assuming it is matched by position or named y.
You have a couple of choices: force the plot.factor method, which needs to be called using the ::: operator since plot.factor is not exported; do the coercion yourself; or call a more specific plotting function.
graphics:::plot.factor(Data$Jobs, Data$RXH)
plot(factor(Data$Jobs), Data$RXH)
boxplot(Data$RXH ~Data$Jobs) # which is the result if x is factor and y is numeric
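The dispatch behaviour described above can be checked without plotting anything. This is a sketch under the assumption of a default R session with no extra packages loaded (a package could register its own plot method for characters); the values are copied from the question:

```r
# Values copied from the question
jobs <- c("Playstation", "RWC Heineken", "Jagermeister",
          "RWC Heineken", "RWC Heineken", "RWC Heineken")
rxh  <- c(90, 90, 100, 90, 90, 90)

# Look up S3 plot methods: one exists for factors, none for characters
has_factor_method    <- !is.null(getS3method("plot", "factor", optional = TRUE))
has_character_method <- !is.null(getS3method("plot", "character", optional = TRUE))

# Without a matching method, the strings end up being coerced to numbers,
# which yields only NAs and hence no finite x-axis limits
as_numbers <- suppressWarnings(as.numeric(jobs))
all(is.na(as_numbers))

# Coercing explicitly gives plot() a factor to dispatch on
jobs_f <- factor(jobs)
class(jobs_f)
levels(jobs_f)
```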

Screeplot in R with psych package

I have computed a PCA with the principal function in the psych package in R. I would like to build a screeplot from the eigenvalues, but both scree(PCA) and screeplot(PCA) give me errors and no plot. Is there a function within this package that I'm not aware of (I have very, very little R experience)??
NOTE: I've been simply working in the command line.
Error for scree(PCA):
Error in if (nvar != dim(rx)[1]) { : argument is of length zero
Error for screeplot(PCA):
Error in plot.window(xlim, ylim, log = log, ...) :
need finite 'xlim' values
In addition: Warning messages:
1: In min(w.l) : no non-missing arguments to min; returning Inf
2: In max(w.r) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
Without data it is hard for us to check this. The error message looks like the data is empty.
Here are some tips for R beginners.
Try getting help on the scree function. Are you missing a parameter? Type this in the command line:
help(scree)
Look at your variable PCA
head(PCA) - shows first few rows of your data
str(PCA) - shows structure of the variable. Is it what scree function is expecting?
Do you have missing values or text values in your data? The function may be thrown off by these. You can drop missing data; take a look at complete.cases. is.na() is how you check for NA values (e.g. to check for NAs in a variable mydata, sum(is.na(mydata)) tells you how many there are). Drop those rows and see if that gets your scree function working.
Take a look at the vignette for the package:
https://cran.r-project.org/web/packages/psych/vignettes/overview.pdf
Hope this gets you on track.
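The checks above might look like this on a small made-up data frame (mydata and its columns are hypothetical, chosen only to include one NA and one text column):

```r
# Hypothetical data with one missing value and one text column
mydata <- data.frame(
  q1 = c(3, 4, NA, 5),
  q2 = c(2, 5, 4, 4),
  id = c("a", "b", "c", "d")
)

str(mydata)               # structure: column types and first values
sum(is.na(mydata))        # total number of NAs
colSums(is.na(mydata))    # NAs per column

# Keep only the numeric columns and the rows without NAs
numeric_cols <- vapply(mydata, is.numeric, logical(1))
clean <- mydata[complete.cases(mydata), numeric_cols]
dim(clean)
```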
Did you enter a correlation matrix as your input to the scree( ) function?
Using my own data, I was able to generate a scree plot with the following two lines of code:
humor_cor <- cor(humor, use = "pairwise.complete.obs")
scree(humor_cor, factors = FALSE)
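For reference, a scree plot shows the eigenvalues of the correlation matrix in decreasing order, so it can also be sketched by hand in base R. This is only an illustration: it uses the built-in mtcars data as a stand-in for humor and does not reproduce everything that scree() from psych draws.

```r
# Built-in mtcars used as a stand-in for the real data
data_cor <- cor(mtcars, use = "pairwise.complete.obs")

# Eigenvalues of the correlation matrix, returned in decreasing order
ev <- eigen(data_cor, symmetric = TRUE, only.values = TRUE)$values

# Draw to a null device so the sketch also runs non-interactively;
# drop the pdf() and dev.off() lines to see the plot on screen
pdf(NULL)
plot(seq_along(ev), ev, type = "b",
     xlab = "Component number", ylab = "Eigenvalue", main = "Scree plot")
abline(h = 1, lty = 2)   # reference line at eigenvalue = 1
invisible(dev.off())
```

Because the input is a correlation matrix, the eigenvalues sum to the number of variables, which is a quick sanity check on the result.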
