I got an error message when attempting to plot a neural network. The code ran fine at first, then it stopped working. I do not get an error message when the neuralnet() function is run. Any help would be appreciated. I am predicting loan default.
library(neuralnet)
library(plyr)
CreditCardnn <- read.csv("https://raw.githubusercontent.com/621-Group2/Final-Project/master/UCI_Credit_Card.csv")
#Normalize dataset
maxValue <- apply(CreditCardnn, 2, max)
minValue <- apply(CreditCardnn, 2, min)
CreditCardnn <- as.data.frame(scale(CreditCardnn, center = minValue, scale = maxValue - minValue))
# Rename the response column to 'target'
colnames(CreditCardnn)[25] <- "target"
smp <- floor(0.70 * nrow(CreditCardnn))
set.seed(4784)
CreditCardnn$ID <- NULL
train_index <- sample(seq_len(nrow(CreditCardnn)), size = smp, replace = FALSE)
train_nn <- CreditCardnn[train_index, ]
test_nn <- CreditCardnn[-train_index, ]
allVars <- colnames(CreditCardnn)
predictorVars <- allVars[!allVars%in%'target']
predictorVars <- paste(predictorVars, collapse = "+")
f <- as.formula(paste("target~", predictorVars, collapse = "+"))
nueralModel <- neuralnet(formula = f, hidden = c(4,2), linear.output = T, data = train_nn)
plot(nueralModel)
Which gives the following error:
Error in plot.nn(nueralModel) : weights were not calculated
Before the error you report, most probably you also got a warning:
# your data preparation code verbatim here
> nueralModel <- neuralnet(formula = f, hidden = c(4,2), linear.output = T, data = train_nn)
Warning message:
algorithm did not converge in 1 of 1 repetition(s) within the stepmax
This message is important, effectively warning you that your neural network did not converge. Given this message, the error further downstream, when you try to plot the network, is actually expected:
> plot(nueralModel)
Error in plot.nn(nueralModel) : weights were not calculated
Looking more closely into your code & data, it turns out that the problem lies in your choice for linear.output = T in fitting your neural network; from the docs:
linear.output logical. If act.fct should not be applied to the output neurons set linear output to TRUE, otherwise to FALSE.
Keeping a linear output in the final layer of a neural network is normally used in regression settings only; in classification settings, such as yours, the correct choice is to apply the activation function to the output neuron(s) as well. Hence, trying the same code as yours but with linear.output = F, we get:
> nueralModel <- neuralnet(formula = f, hidden = c(4,2), linear.output = F, data = train_nn) # no warning this time
> plot(nueralModel)
And here is the result: this time the network plots successfully.
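If you then want to evaluate the fitted classifier, here is a minimal sketch (assuming, as in the preparation code above, that the response column is named target and every other column is a predictor); neuralnet's compute() returns the network output in $net.result:
test_predictors <- test_nn[, setdiff(colnames(test_nn), "target")]
pred <- compute(nueralModel, test_predictors)  # forward-pass the test predictors
predicted_class <- ifelse(pred$net.result > 0.5, 1, 0)  # threshold the sigmoid output
mean(predicted_class == test_nn$target)  # rough test-set accuracy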
Try increasing stepmax, e.g. stepmax = 1e6 or higher. A higher stepmax takes longer to run, but you can try:
nueralModel <- neuralnet(formula = f, hidden = c(4,2), linear.output = F, data = train_nn, stepmax = 1e6)
I am trying to do parallel computing using the apply family of functions with the following lines of code.
The objective is to fit each column of the simul_ matrix to my specification, but when I check
dim(simul_[, 1])
I get NULL, which causes the problem with the apply function.
The complete code is the following:
## Library
lib_vec = c("MSGARCH", "matrixStats", "parallel")
invisible(lapply(lib_vec, f_install_load_lib))
## seed
set.seed(1234)
MSGARCH model specification from the MSGARCH package:
MSGARCH_spec <- CreateSpec(variance.spec = list(model = c("sGARCH", "sGARCH")),
distribution.spec = list(distribution = c("norm",
"norm")),
switch.spec = list(do.mix = FALSE, K = NULL),
constraint.spec = list(fixed = list(),
regime.const = NULL),
prior = list(mean = list(), sd = list()))
MSGARCH fitting (sp500_logrets are just log-returns):
MSgarch_fit <- FitML(data = sp500_logrets, spec = MSGARCH_spec)
Simulating MSGARCH log-returns:
nsim <- 100 # number of simulations
nahead <- 1000 # size of each simulation
MS_simul <- simulate(MSgarch_fit, nsim = nsim, nahead = nahead, n.start = 500,
nburn = 100)
simul_ <- MS_simul$draw # retrieving the simulated data
Parallel computing settings:
n_cores <- detectCores()
cl <- makeCluster(n_cores[1] - 1)
Fitting each simulation by parallel computing with the apply functions:
fitt_ <- parSapply(cl, X = simul_, MARGIN = 2, FUN = FitML, spec = MSGARCH_spec)
stopCluster(cl)
The error I get is:
7 nodes produced errors; first error: unused argument (MARGIN = base::quote(2))
I am quite lost and would very much appreciate any help :)
The error is quite explicit:
7 nodes produced errors; first error: unused argument (MARGIN = base::quote(2))
There is no MARGIN parameter in parSapply() / sapply(). You may be confusing them with apply().
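For illustration, two hedged sketches of the fix (assuming simul_ is a matrix holding one simulated path per column, and that MSGARCH is installed on the worker nodes):
clusterEvalQ(cl, library(MSGARCH))  # load the package on every worker
# Option 1: parApply() does take MARGIN; spec is matched by name,
# so each column fills FitML()'s data argument.
fitt_ <- parApply(cl, X = simul_, MARGIN = 2, FUN = FitML, spec = MSGARCH_spec)
# Option 2: keep parSapply(), but iterate over column indices and pass the
# matrix and the spec as extra arguments so the workers can see them.
fitt_ <- parSapply(cl, X = seq_len(ncol(simul_)),
                   FUN = function(i, sims, spec) FitML(data = sims[, i], spec = spec),
                   sims = simul_, spec = MSGARCH_spec)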
Trying to run a cross-validation of a zero-inflated Poisson model using cv.zipath from the mpath package.
Fitting the LASSO
fit.lasso = zipath(estimation_sample_nomiss ~ .| .,
data = missings,
nlambda = 100,
family = "poisson",
link = "logit")
Cross validation
n <- dim(docvisits)[1]
K <- 10
set.seed(197)
foldid <- split(sample(1:n), rep(1:K, length = n))
fitcv <- cv.zipath(F_time_unemployed~ . | .,
data = estimation_sample_nomiss, family = "poisson",
nlambda = 100, lambda.count = fit.lasso$lambda.count[1:30],
lambda.zero = fit.lasso$lambda.zero[1:30], maxit.em = 300,
maxit.theta = 1, theta.fixed = FALSE, penalty = "enet",
rescale = FALSE, foldid = foldid)
I encounter the following error:
Error in model.frame.default(formula = F_time_unemployed ~ . + ., data = list(: variable lengths differ (found for '(weights)')
I have cleaned the sample of all NAs but still encounter the error message.
The solution turns out to be that the cv.zipath() command does not accept tibbles, at least in this instance (no guarantee as to how far that statement generalises). Having used dplyr commands, one needs to convert back to a plain data frame, so the fix is as simple as as.data.frame().
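For illustration, a minimal sketch of that fix (assuming estimation_sample_nomiss became a tibble somewhere in the dplyr pipeline):
# Convert the tibble back to a plain data frame before calling cv.zipath().
estimation_sample_nomiss <- as.data.frame(estimation_sample_nomiss)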
When I try to run the following code:
reg <- randomForest(max_orders ~ ., data = df[-c(1:3)], ntree = 100, importance = T)
varImpPlot(reg, sort = T)
I get the error:
Error in plot.window(xlim = xlim, ylim = ylim, log = "") :
need finite 'xlim' values
But if I run:
reg <- randomForest(max_orders ~ ., data = df[-c(1:3)], ntree = 100)
varImpPlot(reg, sort = T)
Everything's fine and dandy!
I'm legitimately about to lose my sanity. I've made these MSE variable-importance plots countless times; I don't know what the issue is here. Here's my original regression data (df[-c(1:3)]):
EDIT: R has officially gone full-blown schizophrenic on me:
> # Test Variables
> reg <- randomForest(max_orders ~ release_age, data = df[-c(1:3)], ntree = 100, importance = T)
> varImpPlot(reg, sort = T)
> # Test Variables
> reg <- randomForest(max_orders ~ release_age, data = df[-c(1:3)], ntree = 100, importance = T)
> varImpPlot(reg, sort = T)
Error in plot.window(xlim = xlim, ylim = ylim, log = "") :
  need finite 'xlim' values
HOW DOES R RUN THE EXACT SAME CODE WORD FOR WORD AND HAVE AN ERROR ONE TIME AND NOT ANOTHER?! Well, I guess I fixed the problem; I just kept rerunning the code until a plot finally showed up. I would still like to know the reason behind this enigma, though.
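A hedged note on the enigma: randomForest() is stochastic, so rerunning identical code fits a different forest each time unless the RNG seed is fixed. One plausible explanation for the intermittent error is that varImpPlot() by default scales the permutation importance by its standard deviation; on a run where that standard deviation comes out zero, the scaled importance is NaN and plot.window() receives non-finite xlim values. A sketch of both mitigations:
set.seed(123)  # hypothetical seed: makes the fitted forest, and its importance values, reproducible
reg <- randomForest(max_orders ~ release_age, data = df[-c(1:3)], ntree = 100,
                    importance = TRUE)
varImpPlot(reg, sort = TRUE, scale = FALSE)  # unscaled importance avoids dividing by a zero SD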
I'm having an error while trying to train a dataset with the caret package. The error is the following: Error in train.default(x, y, weights = w, ...) : Stopping. I also have warnings(), all of them identical, because I'm creating an object for the tuneGrid with the following code: grid <- expand.grid(cp = seq(0, 0.05, 0.005)). This code creates a data frame with 11 rows, which correspond to the 11 warnings I'm getting. Here is the warning: In eval(expr, envir, enclos) :
model fit failed for Fold01: cp=0 Error in `[.data.frame`(m, labs) : undefined columns selected. It looks like the cp doesn't have anything. I can go to my environment and see the grid object and all 11 rows. I have searched Stack Overflow and found similar questions, but since these functions have so many ways to tweak them, I haven't found a question that fixes my problem.
Here is my code...
require(rpart)
require(rattle)
require(rpart.plot)
require(caret)
setwd('~/Documents/Lipscomb/predictive_analytics/class4/')
data <- read.csv(file = 'data.csv', header = FALSE)
data <- subset(data, select = -V1)
colnames(data) <- c('diagnostic', 'm.radius', 'm.texture', 'm. perimeter', 'm.area', 'm.smoothness', 'm.compactness', 'm.concavity', 'm.concave.points', 'm.symmetry', 'm.fractal.dimension',
'se.radius', 'se.texture', 'se. perimeter', 'se.area', 'se.smoothness', 'se.copactness', 'se.concavity', 'se.concave.points', 'se.symmetry', 'se.fractal.dimension',
'w.radius', 'w.texture', 'w. perimeter', 'w.area', 'w.smoothness', 'w.copactness', 'w.concavity', 'w.concave.points', 'w.symmetry', 'w.fractal.dimension')
str(data)
set.seed(7)
sample.train <- sample(1:nrow(data), nrow(data) * .8)
sample.test <- setdiff(1:nrow(data), sample.train)
data.train <- data[sample.train, ]
data.test <- subset(data[sample.test, ], select = -diagnostic)
rpart.tree <- rpart(diagnostic ~ ., data = data.train)
out <- predict(rpart.tree, data.test, type = 'class')
table(out, data[sample.test, ]$diagnostic)
fancyRpartPlot(rpart.tree)
temp <- rpart.control(xval = 10, minbucket = 2, minsplit = 4, cp = 0)
dfit <- rpart(diagnostic ~ ., data = data.train, control = temp)
fancyRpartPlot(dfit)
fit.control <- trainControl(method = 'cv', number = 10)
grid <- expand.grid(cp = seq(0, 0.05, 0.005))
trained.tree <- train(diagnostic ~ ., method = 'rpart', data = data.train,
metric = 'Accuracy', maximize = TRUE,
trControl = fit.control, tuneGrid = grid)
I have found a solution to this problem: I changed the way I was naming my columns. The original colnames code was causing the error in the train function, most likely because names such as 'm. perimeter' contain spaces and are therefore not syntactically valid R names. This code fixed the problem.
colnames(data) <- c('diagnostic', 'radius', 'texture', 'perimeter', 'area', 'smoothness', 'compactness', 'concavity', 'concavePoints', 'symmetry', 'fractalDimension',
'SeRadius', 'SeTexture', 'SePerimeter', 'SeArea', 'SeSmoothness', 'SeCopactness', 'SeConcavity', 'SeConcavePoints', 'SeSymmetry', 'SeFractalDimension',
'Wradius', 'Wtexture', 'Wperimeter', 'Warea', 'Wsmoothness', 'Wcopactness', 'Wconcavity', 'WconcavePoints', 'Wsymmetry', 'WfractalDimension')
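If retyping all thirty names feels error-prone, a shorter sketch of the same idea is to sanitize the original names mechanically; make.names() replaces the offending spaces with dots:
# Make every column name syntactically valid so train()'s formula interface works.
colnames(data) <- make.names(colnames(data), unique = TRUE)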
I wrote a function within lapply to fit a GAM (with splines) for each element in a vector of response variables within a data frame. I opted to use caret to fit the models instead of directly using mgcv or the gam package because I would like to eventually split my data into a train/test set for validation and use various resampling techniques. For now, I simply have the trainControl method set to 'none' like so:
# Set resampling method
# tc <- trainControl(method = "boot", number = 100)
# tc <- trainControl(method = "repeatedcv", number = 10, repeats = 1)
tc <- trainControl(method = "none")
fm <- lapply(group, function(x) {
printFormula <- paste(x, "~", inf.factors)
inputFormula <- as.formula(printFormula)
# Partition input data for model training and testing
# dpart <- createDataPartition(mdata[,x], times = 1, p = 0.7, list = FALSE)
# train <- mdata[ data.partition, ]
# test <- mdata[ -data.partition, ]
cat("Fitting:", printFormula, "\n")
# gam(inputFormula, family = binomial(link = "logit"), data = mdata)
train(inputFormula, family = binomial(link = "logit"), data = mdata, method = "gam",
trControl = tc)
})
When I execute this code, I receive the following error:
Error in train.default(x, y, weights = w, ...) :
Only one model should be specified in tuneGrid with no resampling
If I re-run the code in debugging mode, I can find where caret stops the training process:
if (trControl$method == "none" && nrow(tuneGrid) != 1)
stop("Only one model should be specified in tuneGrid with no resampling")
Clearly the train function fails because of the second condition, but when I look up the tuning parameters for a GAM (with splines) there are only options for feature selection (which I'm not interested in, as I want to keep all the predictors in the model) and the method. Consequently, I do not include a tuneGrid data frame when I call train. Is this the reason why the model fails in this way? What parameter would I provide, and what would the tuneGrid look like?
I should add that the model trains successfully when I use bootstrapping or k-fold CV; however, these resampling methods take much longer to run, and I do not need them yet.
Any help on this issue would be appreciated!
For that model, the tuning grid looks over two values of the select parameter:
> getModelInfo("gam", regex = FALSE)[[1]]$grid
function(x, y, len = NULL, search = "grid") {
if(search == "grid") {
out <- expand.grid(select = c(TRUE, FALSE), method = "GCV.Cp")
} else {
out <- data.frame(select = sample(c(TRUE, FALSE), size = len, replace = TRUE),
method = sample(c("GCV.Cp", "ML"), size = len, replace = TRUE))
}
out[!duplicated(out),]
}
You should use something like tuneGrid = data.frame(select = FALSE, method = "GCV.Cp") to evaluate only a single model (as the error message says).
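Plugged into the train() call from the question, that looks like:
# A single candidate model, so trControl = trainControl(method = "none") is allowed.
train(inputFormula, family = binomial(link = "logit"), data = mdata,
      method = "gam", trControl = tc,
      tuneGrid = data.frame(select = FALSE, method = "GCV.Cp"))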