I'm attempting to run a neural network model for the first time using nnet in R. When I supply a range of values to the "size" argument, I get the following error:
Error in nnet.default(x, y, w, entropy = TRUE, ...) :
initial value in 'vmmin' is not finite
However, when I pass a single value to the "size" argument, the function works without any problem. Why is this error occurring, and how can I get around it?
Here is a reproducible example:
library(nnet)
Var1 <- rnorm(100, 1, 2)
Var2 <- rnorm(100, 1, 2)
Var3 <- rnorm(100, 1, 2)
Var4 <- rnorm(100, 1, 2)
Var5 <- as.factor(runif(100)<=.50)
outcome <- as.factor(runif(100)<=.90)
data <- data.frame(outcome, Var1, Var2, Var3, Var4, Var5)
neural_net <- nnet(outcome ~ ., data = data, decay=5e-4, maxit=200, size = seq(from = 2, to = 30, by = 1))
And this is my R version info:
> version
_
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 4.0
year 2017
month 04
day 21
svn rev 72570
language R
version.string R version 3.4.0 (2017-04-21)
nickname You Stupid Darkness
Thanks!
nnet only fits single-hidden-layer neural networks, so size (the number of neurons in the hidden layer) can only be a scalar.
The nnet package is quite dated now; it goes back to the 1990s, before all the current advances in deep learning were made. If you want to learn about neural nets in R, consider using a more modern package, such as RStudio's tensorflow or Microsoft R's MicrosoftML. The latter is actually a toolkit of machine learning algorithms that includes random forests, boosted trees, and others in addition to neural networks.
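If the intent was to compare several hidden-layer sizes, one workaround is to fit one model per size and pick the best one afterwards. A minimal sketch (the range of sizes and the selection by the fitted criterion are just illustrative assumptions; cross-validated error would be a sounder criterion):
library(nnet)
sizes <- 2:30
fits <- lapply(sizes, function(s) {
  nnet(outcome ~ ., data = data, decay = 5e-4, maxit = 200,
       size = s, trace = FALSE)
})
# pick the network with the smallest value of the fitting criterion
best <- fits[[which.min(sapply(fits, function(f) f$value))]]
best$n  # layout of the chosen network: inputs / hidden units / outputs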
I've run into the following error that only occurs when I pass a model with more than 30 predictors to pdredge():
Error in sprintf(gettext(fmt, domain = domain), ...) :
invalid format '%d'; use format %f, %e, %g or %a for numeric objects
I'm on a windows machine running Microsoft R Open through RStudio:
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
RStudio Version 1.0.153
MuMIn_1.43.6
Reproducible example:
library(MuMIn)
library(parallel)
#Random data: X1 as response, X2-X31 (30) predictors
var.30 <- data.frame(replicate(31,sample(0:100,75,rep=TRUE)))
#Random data: X1 as response, X2-X32 (31) predictors
var.31 <- data.frame(replicate(32,sample(0:100,75,rep=TRUE)))
#prepare cluster for pdredge
clust <- try(makeCluster(detectCores()-1))
#working model (30 or less predictors)
mod <- lm(X1 ~ ., data=var.30, na.action = "na.fail")
sub.dredge <- pdredge(mod, cluster=clust, eval=FALSE)
#Non-working model (31 or more predictors)
mod <- lm(X1 ~ ., data=var.31, na.action = "na.fail")
sub.dredge <- pdredge(mod, cluster=clust, eval=FALSE)
I know that in 2016 this was an issue with integer bit restrictions. However, from this question and the comments it received, I was under the impression that the issue had been resolved and the maximum raised?
The 31-term limit in dredge is pretty much final. It will not be extended unless R implements native support for 64-bit integers.
(Also, update your MuMIn - this 'sprintf' error was fixed some time ago.)
There are actually only 16 parameters in the second question you reference, but some are used multiple times to represent interaction terms (though whether that OP really wanted them to represent interactions, or intended I(parameter^2), is unclear; if the latter, their code would have failed because there would have been too many unique parameters). So, even though there are many (~41) terms in that question, there are only 16 unique parameters.
As far as I can tell, @Kamil Bartoń has not yet updated dredge to accept more than 30 unique parameters.
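To see why that model stayed under the limit, it helps to count unique variables versus model terms; a quick illustrative check with made-up variable names:
# a formula with interactions reuses the same predictors, so the number of
# model terms can be much larger than the number of unique variables
f <- y ~ a * b + a * c + b * c         # hypothetical formula
length(attr(terms(f), "term.labels"))  # 6 terms: a, b, c, a:b, a:c, b:c
length(all.vars(f)) - 1                # 3 unique predictors (excluding the response)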
Link to data (1170 obs, 9 variables, .Rd file)
Simply read it in using readRDS(file).
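For example, assuming the linked file has been downloaded locally (the file name below is just a placeholder):
# hypothetical local path to the downloaded file
d <- readRDS("hail_data.Rd")
str(d)  # should show 1170 obs. of 9 variables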
I'm trying to set up a GLMM using the glmmPQL function from the MASS package, including a random-effects part and accounting for spatial autocorrelation. However, R (version 3.3.1) crashes upon execution.
library(MASS)   # provides glmmPQL
library(nlme)   # provides corSpatial and Initialize
# setup model formula
fo <- hail ~ prec_nov_apr + t_min_nov_apr + srad_nov_apr + age
# setup corSpatial object
correl = corSpatial(value = c(10000, 0.1), form = ~ry + rx, nugget = TRUE,
fixed = FALSE, type = "exponential")
correl = Initialize(correl, data = d)
# fit model
fit5 <- glmmPQL(fo, random = ~1 | date, data = d,
correl = correl, family = binomial)
What I tried so far:
reduce the number of observations
play with the corSpatial parameters (range and nugget)
reduce the number of fixed predictors
execute the code on Windows, Linux (Debian) and Mac R installations
While I get no error message on my local pc (RStudio just crashes), running the script on a server returns the following error message:
R: malloc.c:3540: _int_malloc: Assertion `(fwd->size & 0x4) == 0' failed. Aborted
I'd use the INLA package to model this, as it allows you to use spatially correlated random effects. The required code is a bit too long to place here, so I've placed it in a document at http://rpubs.com/INBOstats/spde
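For orientation only, a rough SPDE sketch in the spirit of that document might look like the following. The variable names come from the question, while the mesh settings and model structure are assumptions on my part; refer to the rpubs document for the actual workflow.
library(INLA)
coords <- cbind(d$rx, d$ry)
# triangulated mesh over the observation locations (max.edge values are guesses)
mesh <- inla.mesh.2d(loc = coords, max.edge = c(10000, 50000))
spde <- inla.spde2.matern(mesh)
A    <- inla.spde.make.A(mesh, loc = coords)
idx  <- inla.spde.make.index("spatial", n.spde = spde$n.spde)
# stack the response, the covariates and the spatial effect
stk <- inla.stack(data    = list(hail = d$hail),
                  A       = list(A, 1),
                  effects = list(idx,
                                 data.frame(intercept = 1,
                                            d[, c("prec_nov_apr", "t_min_nov_apr",
                                                  "srad_nov_apr", "age")],
                                            date_id = as.integer(factor(d$date)))))
fit <- inla(hail ~ 0 + intercept + prec_nov_apr + t_min_nov_apr + srad_nov_apr +
              age + f(date_id, model = "iid") + f(spatial, model = spde),
            family = "binomial",
            data = inla.stack.data(stk),
            control.predictor = list(A = inla.stack.A(stk)))
summary(fit)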
While trying to run the H2OEnsemble example found at http://learn.h2o.ai/content/tutorials/ensembles-stacking/index.html from within RStudio, I encounter the following error:
Error in value[3L] :
argument "training_frame" must be a valid H2O H2OFrame or id
after defining the ensemble
fit <- h2o.ensemble(x = x, y = y,
training_frame = train,
family = family,
learner = learner,
metalearner = metalearner,
cvControl = list(V = 5, shuffle = TRUE))
I installed the latest versions of both h2o and h2oEnsemble but the issue remains. I have read here ("`h2o.cbind` accepts only of H2OFrame objects - R") that the naming convention in h2o changed over time, but I assume that by installing the latest version of both, this should no longer be the issue.
Any suggestions?
library(readr)
library(h2oEnsemble) # Requires version >=0.0.4 of h2oEnsemble
library(cvAUC) # Used to calculate test set AUC (requires version >=1.0.1 of cvAUC)
localH2O <- h2o.init(nthreads = -1) # Start an H2O cluster with nthreads = num cores on your machine
# Import a sample binary outcome train/test set into R
train <- h2o.importFile("http://www.stat.berkeley.edu/~ledell/data/higgs_10k.csv")
test <- h2o.importFile("http://www.stat.berkeley.edu/~ledell/data/higgs_test_5k.csv")
y <- "C1"
x <- setdiff(names(train), y)
family <- "binomial"
#For binary classification, response should be a factor
train[,y] <- as.factor(train[,y])
test[,y] <- as.factor(test[,y])
# Specify the base learner library & the metalearner
learner <- c("h2o.glm.wrapper", "h2o.randomForest.wrapper",
"h2o.gbm.wrapper", "h2o.deeplearning.wrapper")
metalearner <- "h2o.deeplearning.wrapper"
# Train the ensemble using 5-fold CV to generate level-one data
# More CV folds will take longer to train, but should increase performance
fit <- h2o.ensemble(x = x, y = y,
training_frame = train,
family = family,
learner = learner,
metalearner = metalearner,
cvControl = list(V = 5, shuffle = TRUE))
This bug was recently introduced by a bulk find/replace change of a class name made to the h2o R code. The change was inadvertently applied to the ensemble code folder as well (where we currently have manual instead of automatic tests -- soon to be automatic to prevent this sort of thing). I've fixed the bug.
To fix, reinstall the h2oEnsemble package from GitHub:
library(devtools)
install_github("h2oai/h2o-3/h2o-r/ensemble/h2oEnsemble-package")
Thanks for the report! For a quicker response, post bugs and questions here: https://groups.google.com/forum/#!forum/h2ostream
I am trying to calculate variable importance for a random forest built using the cforest function in the party package. I would like to run varimp with conditional set to TRUE, but I get an error message when I do so. The error reads:
Error in if (node[[5]][1] == variableID) cp <- node[[5]][[3]] :
argument is of length zero
Varimp run with the default setting conditional = FALSE works just fine.
Regarding the data set, all variables are categorical. The response variable is Glottal (yes/no), and there are seven predictors. Here is a link to the data, and here is the code I am using:
library(party)
glottal.df <- read.csv("~glottal_data.csv", header=T)
glottal.df$Instance <- factor(glottal.df$Instance)
data.controls <- cforest_unbiased(ntree = 500, mtry = 2)
set.seed(45)
glottal.cf <- cforest(Glottal ~ Stress + Boundary + Context + Instance + Region + Target + Speaker, data = glottal.df, controls = data.controls)
# this gives me an error
glottal.cf.varimp.true <- varimp(glottal.cf, conditional = TRUE)
# this works
glottal.cf.varimp.false <- varimp(glottal.cf)
Can anyone tell me why I am getting this error? It is not a problem with any specific variable, as the problem persists even if I remove a variable, create a new forest, and recalculate varimp; there are also no missing values in the data set. Many thanks in advance for your help!
Appears to be working with party 1.2.4:
> glottal.cf.varimp.true
Stress Boundary Context
0.0003412322 0.2405971564 0.0122369668
Instance Region Target
-0.0043507109 0.0044360190 -0.0011469194
Speaker
0.0384834123
> packageVersion('party')
[1] ‘1.2.4’
> R.version
_
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 3
minor 4.3
year 2017
month 11
day 30
svn rev 73796
language R
version.string R version 3.4.3 (2017-11-30)
nickname Kite-Eating Tree
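If you are on an older party release, updating the package and re-running the conditional importance may be all that is needed; a minimal sketch:
# update party, then check the installed version
install.packages("party")
packageVersion("party")
# restart R, re-fit glottal.cf exactly as above, then retry:
varimp(glottal.cf, conditional = TRUE)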
I'm trying to build some models using zero-inflated Poisson regression with the pscl package, and after manipulating the output object (which turns out to be of class zeroinfl), I find that residuals(fm_zip) is not equal to fm_zip$residuals.
The following is an example of what I'm talking about:
library("pscl")
data("bioChemists", package = "pscl")
fm_zip <- zeroinfl(art ~ . | 1, data = bioChemists)
names(fm_zip)
fm_zip$residuals
residuals(fm_zip)
all.equal(fm_zip$residuals,residuals(fm_zip))
qplot(fm_zip$residuals,residuals(fm_zip))
As you might realize, the results are not equal. I would have said that the two ways are equivalent, but it seems they're not. Could you explain to me what is wrong here? According to the R help for residuals, both alternatives are supposed to return the difference (observed - fitted). By contrast, when I did the same with a plain-vanilla linear regression, they were equal.
My R version is:
sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)...
and the package version is pscl_1.04.4
Any help is appreciated.
To get an equal result you should set type to "response" (it is "pearson" by default):
all.equal(fm_zip$residuals,residuals(fm_zip,'response'))
[1] TRUE
From the ?residuals.zeroinfl:
The residuals method can compute raw residuals (observed - fitted) and
Pearson residuals (raw residuals scaled by square root of variance
function).
The Pearson variance is defined as:
mu <- predict(fm_zip, type = "count")
phi <- predict(fm_zip, type = "zero")
theta1 <- switch(fm_zip$dist, poisson = 0,
                 geometric = 1,
                 negbin = 1/fm_zip$theta)
variance <- fm_zip$fitted.values * (1 + (phi + theta1) * mu)
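With that variance in hand you can check the relationship between the two residual types yourself (a quick sanity check, not part of the original answer):
# raw residuals scaled by the square root of the Pearson variance
# should reproduce the default Pearson residuals
raw <- residuals(fm_zip, type = "response")
all.equal(raw / sqrt(variance), residuals(fm_zip, type = "pearson"))  # should be TRUE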
EDIT
Don't hesitate to read the underlying code; it is generally a good source of learning and can also spare you a lot of confusion. To get the code behind the S3 method residuals.zeroinfl, you can use something like this:
getS3method('residuals','zeroinfl')