How to prepare variables for nnet classification/predict in R?

For classification I pass the predictors as x and the class labels as y, as in this randomForest example:
iris_train_values <- iris[,c(1:4)]
iris_train_labels <- iris[,5]
model_RF <- randomForest(x = iris_train_values, y = iris_train_labels, importance = TRUE,
replace = TRUE, mtry = 4, ntree = 500, na.action=na.omit,
do.trace = 100, type = "classification")
This approach works for many classifiers, but when I try the same thing with nnet I get an error:
model_nnet <- nnet(x = iris_train_values, y = iris_train_labels, size = 1, decay = 0.1)
Error in nnet.default(x = iris_train_values, y = iris_train_labels, size = 1, :
NA/NaN/Inf in foreign function call (arg 2)
In addition: Warning message:
In nnet.default(x = iris_train_values, y = iris_train_labels, size = 1, :
NAs introduced by coercion
On another data set I get a different error:
Error in y - tmp : non-numeric argument to binary operator
How should I prepare the variables so that nnet can classify them?

The formula syntax works:
library(nnet)
model_nnet <- nnet(Species ~ ., data = iris, size = 1)
But the matrix syntax does not:
nnet::nnet(x = iris_train_values, y = as.matrix(iris_train_labels), size = 1)
I don't understand why this doesn't work, but at least there is a workaround.
predict works fine with the formula syntax:
?predict.nnet
predict(model_nnet,
iris[c(1,51,101), 1:4],
type = "class") # true classes are ['setosa', 'versicolor', 'virginica']
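A possible explanation for the matrix-syntax failure, offered as a sketch rather than a confirmed diagnosis: the x/y interface of nnet appears to expect a numeric target matrix, so a factor (or a character matrix built from one) is coerced to NA. One way around this, assuming the class.ind() helper from the nnet package and softmax = TRUE for a multi-class target, is to one-hot encode the labels first. The names iris_train_targets, model_nnet_xy, probs and pred are illustrative and not from the original post.

```r
library(nnet)

# Predictors and labels, as in the question
iris_train_values <- iris[, 1:4]
iris_train_labels <- iris[, 5]

# One-hot encode the factor: one 0/1 column per class level
iris_train_targets <- class.ind(iris_train_labels)

set.seed(1)
model_nnet_xy <- nnet(x = iris_train_values, y = iris_train_targets,
                      size = 1, decay = 0.1, softmax = TRUE, trace = FALSE)

# Assumption: type = "class" relies on class levels recorded by the formula
# interface, so here we request raw scores and map the largest column of
# each row back to a level by hand.
probs <- predict(model_nnet_xy, iris[c(1, 51, 101), 1:4], type = "raw")
pred  <- levels(iris_train_labels)[max.col(probs)]
```

With softmax = TRUE each row of probs sums to 1, so the columns can be read as class probabilities. The formula interface remains the simpler choice when the data already sit in a data frame.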

Related

R: passing a formula into an function as the first input

library(RSSL)
set.seed(1)
df <- generateSlicedCookie(1000,expected=FALSE) %>%
add_missinglabels_mar(Class~.,0.98)
class_erlr <- EntropyRegularizedLogisticRegression(Class ~., df, lambda=0.01,lambda_entropy = 100)
For the EntropyRegularizedLogisticRegression function from the RSSL package, the example in the documentation passes the formula Class ~. as the input. I looked at the source code, and these are the parameters of the function:
function (X, y, X_u = NULL, lambda = 0, lambda_entropy = 1, intercept = TRUE,
init = NA, scale = FALSE, x_center = FALSE)
I tried manually defining what X, y, X_u are based on the df I generated. But running the following gives me an error with the optimization:
y <- df$Class
X <- df[, -1]
ids <- which(is.na(y))
X_u <- X[ids, ]
class_erlr_manual <- EntropyRegularizedLogisticRegression(X = X, y = y, X_u = X_u, lambda=0.01,lambda_entropy = 100)
The error reads:
Error in optim(w, fn = loss_erlr, gr = grad_erlr, X, y, X_u, lambda = lambda, :
initial value in 'vmmin' is not finite
Why does changing the formula input Class ~. into X=X, y =y, X_u = X_u result in an error? Can anyone point me to where in the source code the formula input is being used?

Train function in caret returns error message

I am using caret's train() function to find an optimal cp value for a CART decision tree, using F1 as the metric through a custom summary function. train() returns an error I cannot understand. Perhaps the problem lies in how I set up the reproducible example?
> library(data.table)
> library(ROSE)
> data(hacide)
> train <- hacide.train
> test <- hacide.test
> numFolds = trainControl(method = "cv" , number = 10)
> cpGrid = expand.grid(.cp = seq(0.01, 0.5, 0.01))
> f1 <- function(data, lev = NULL, model = NULL) {
+ f1_val <- F1_Score(y_pred = data$pred, y_true = data$obs, positive = lev[1])
+ c(F1 = f1_val)
+ }
> set.seed(12)
> train(cls ~ ., data = train,
+ method = "rpart",
+ tuneLength = 5,
+ metric = "F1",
+ trControl = trainControl(summaryFunction = f1,
+ classProbs = TRUE))
Error in train.default(x, y, weights = w, ...) :
At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to X0, X1 . Please use factor levels that can be used as valid R variable names (see ?make.names for help).
> levels(train$cls)
[1] "0" "1"
> class(train$cls)
[1] "factor"
You can try this:
levels(train$cls) <- make.names(levels(train$cls))
Then rerun your model; this should fix the problem. Unfortunately, your example is not reproducible, because the question does not include the definition of F1_Score. See if this works.
The following works for me:
levels(train$cls) <- make.names(levels(train$cls))
set.seed(12)
train(cls ~ ., data = train,method = "rpart",tuneLength = 5,
metric = "ROC", trControl = trainControl(summaryFunction = twoClassSummary, classProbs = TRUE))
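To see what the make.names() fix does, here is a small base-R sketch. The factor cls_demo is a made-up stand-in for train$cls and is not taken from the ROSE data.

```r
# Hypothetical stand-in for train$cls: a factor with levels "0" and "1"
cls_demo <- factor(c(0, 1, 1, 0, 1))
levels(cls_demo)   # "0" "1": not syntactically valid variable names

# make.names() prepends "X" to names that start with a digit
levels(cls_demo) <- make.names(levels(cls_demo))
levels(cls_demo)   # "X0" "X1"
```

The renamed levels are the same X0 and X1 that the error message mentions, so caret can use them as column names when it builds the class-probability output.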

Error with tuning custom SVM model in caret

I'm having trouble with my custom training model in the caret package. I need to do an SVM regression and I want to tune all the parameters of the SVM model: cost, sigma and epsilon. The built-in version has only cost and sigma. I have already found quite a helpful tip here and here, but my model still does not work.
Error in models$grid(x = x, y = y, len = tuneLength, search = trControl$search) :
unused argument (search = trControl$search)
This is the error I am getting, and my code is below.
SVMrbf <- list(type = "Regression", library = "kernlab", loop = NULL)
prmrbf <- data.frame(parameters = data.frame(parameter = c('sigma', 'C', 'epsilon'),
class = c("numeric", "numeric", "numeric"),
label = c('Sigma', "Cost", "epsilon")))
SVMrbf$parameters <- prmrbf
svmGridrbf <- function(x, y, len = NULL) {
library(kernlab)
sigmas <- sigest(as.matrix(x), na.action = na.omit, scaled = TRUE, frac = 1)
expand.grid(sigma = mean(sigmas[-2]), epsilon = 10^(-5:0),
C = 2 ^(-5:len)) # len = tuneLength in train
}
SVMrbf$grid <- svmGridrbf
svmFitrbf <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
ksvm(x = as.matrix(x), y = y,
type = "eps-svr",
kernel = "rbfdot",
sigma = param$sigma,
C = param$C, epsilon = param$epsilon,
prob.model = classProbs,
...)
}
SVMrbf$fit <- svmFitrbf
svmPredrbf <- function(modelFit, newdata, preProc = NULL, submodels = NULL)
predict(modelFit, newdata)
SVMrbf$predict <- svmPredrbf
svmProb <- function(modelFit, newdata, preProc = NULL, submodels = NULL)
predict(modelFit, newdata, type="probabilities")
SVMrbf$prob <- svmProb
svmSortrbf <- function(x) x[order(x$C), ]
SVMrbf$sort <- svmSortrbf
svmRbfFit <- train(x = train.predictors1, y = train.response1, method = SVMrbf,
tuneLength = 10)
svmRbfFit
I could not find anyone who had the same error, and I have no clue what is actually wrong. This code is pretty much just something I found online and slightly altered.
BTW this is my first post, so hopefully it's understandable, if not I can add additional info.
The solution is to add a search argument to your grid function, for example:
svmGridrbf <- function(x, y, len = NULL, search = "grid") {
library(kernlab)
sigmas <- sigest(as.matrix(x), na.action = na.omit, scaled = TRUE, frac = 1)
expand.grid(sigma = mean(sigmas[-2]), epsilon = 10^(-5:0), C = 2 ^(-5:len)) # len = tuneLength in train
}
If you look at the caret documentation for custom functions carefully, you'll see that caret wants you to specify how to select default parameters in case the user wants to do grid search and in case she wants to do random search (see "the grid element").
The error message tells you that caret passes an argument to the function which is not actually defined as an argument for that function.
This is probably easier to see here:
sd(x = c(1,2,3), a = 2)
# Error in sd(x = c(1, 2, 3), a = 2) : unused argument (a = 2)

GBM and Caret package: invalid number of intervals

Even though I define target <- factor(train$target, levels = c(0, 1)), the code below produces this error:
Error in cut.default(y, unique(quantile(y, probs = seq(0, 1, length = cuts))), :
invalid number of intervals
In addition: Warning messages:
1: In train.default(x, y, weights = w, ...) :
cannnot compute class probabilities for regression
What does this mean, and how can I fix it?
gbmGrid <- expand.grid(n.trees = (1:30)*10,
interaction.depth = c(1, 5, 9),
shrinkage = 0.1)
fitControl <- trainControl(method = "repeatedcv",
number = 5,
repeats = 5,
verboseIter = FALSE,
returnResamp = "all",
classProbs = TRUE)
target <- factor(train$target, levels = c(0, 1))
gbm <- caret::train(target ~ .,
data = train,
#distribution="gaussian",
method = "gbm",
trControl = fitControl,
tuneGrid = gbmGrid)
prob = predict(gbm, newdata=testing, type='prob')[,2]
First, don't do this:
target <- factor(train$target, levels = c(0, 1))
You will get a warning:
At least one of the class levels are not valid R variables names; This may cause errors if class probabilities are generated because the variables names will be converted to: X0, X1
Second, you created a separate object called target. With the formula method, train uses the column named target in the data frame train, not that object, and the two hold different data. Modify the column instead.
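A minimal sketch of modifying the column in place, combining both points of this answer. The data frame train_demo and the labels "no" and "yes" are hypothetical stand-ins for the asker's data, which is not shown in the question.

```r
# Hypothetical stand-in for the asker's data frame
train_demo <- data.frame(x1 = c(0.2, 1.5, 0.7, 2.1),
                         target = c(0, 1, 0, 1))

# Recode the column itself, with labels that are valid R variable names
train_demo$target <- factor(train_demo$target,
                            levels = c(0, 1),
                            labels = c("no", "yes"))

levels(train_demo$target)   # "no" "yes"
```

Because the outcome column is now a factor, a call such as train(target ~ ., data = train_demo, ...) should treat the problem as classification rather than regression, and the class-level warning no longer applies.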

Error with custom SVM model for tuning in caret

I'm trying to follow this link to create a custom SVM and run it through some cross-validations. My primary reason for this is to run Sigma, Cost and Epsilon parameters in my grid-search and the closest caret model (svmRadial) can only do two of those.
When I run the code below, I get the following error at every iteration of my grid:
Warning in eval(expr, envir, enclos) :
model fit failed for Fold1.: sigma=0.2, C=2, epsilon=0.1 Error in if (!isS4(modelFit) & !(method$label %in% c("Ensemble Partial Least Squares Regression", :
argument is of length zero
Even when I replicate the code from the link verbatim, I get a similar error, and I'm not sure how to solve it. I found this link, which goes through how the custom models are built, and I see where this error is referenced, but I'm still not sure what the issue is. My code is below:
#Generate Tuning Criteria across Parameters
C <- c(1,2)
sigma <- c(0.1,.2)
epsilon <- c(0.1,.2)
grid <- data.frame(C,sigma)
#Parameters
prm <- data.frame(parameter = c("C", "sigma","epsilon"),
class = rep("numeric", 3),
label = c("Cost", "Sigma", "Epsilon"))
#Tuning Grid
svmGrid <- function(x, y, len = NULL) {
expand.grid(sigma = sigma,
C = C,
epsilon = epsilon)
}
#Fit Element Function
svmFit <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
ksvm(x = as.matrix(x), y = y,
type = "eps-svr",
kernel = rbfdot,
kpar = list(sigma = param$sigma),
C = param$C,
epsilon = param$epsilon,
prob.model = classProbs,
...)
}
#Predict Element Function
svmPred <- function(modelFit, newdata, preProc = NULL, submodels = NULL)
predict(modelFit, newdata)
#Sort Element Function
svmSort <- function(x) x[order(x$C),]
#Model
newSVM <- list(type="Regression",
library="kernlab",
loop = NULL,
parameters = prm,
grid = svmGrid,
fit = svmFit,
predict = svmPred,
prob = NULL,
sort = svmSort,
levels = NULL)
#Train
tc<-trainControl("repeatedcv",number=2, repeats = 0,
verboseIter = T,savePredictions=T)
svmCV <- train(
Y~ 1
+ X1
+ X2
,data = data_nn,
method=newSVM,
trControl=tc
,preProc = c("center","scale"))
svmCV
After viewing the second link provided, I decided to try adding a label element to the model list, and that solved the issue! It's funny that it worked, because the caret documentation says that value is optional, but if it works I can't complain.
#Model
newSVM <- list(label="My Model",
type="Regression",
library="kernlab",
loop = NULL,
parameters = prm,
grid = svmGrid,
fit = svmFit,
predict = svmPred,
prob = NULL,
sort = svmSort,
levels = NULL)

Resources