h2o deeplearning error when specifying nfolds for cross validation - r

Has this issue been resolved by now? I encounter the same error message.
Use case: I am doing binary classification using h2o's deeplearning() function. Below, I provide randomly generated data of the same size as my actual use case. System specs:
# R version 3.3.2 (2016-10-31)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows >= 8 x64 (build 9200)
# h2o version h2o_3.20.0.2
I am currently learning how to use h2o, so I have played with that function quite a bit. Everything runs smoothly until I specify parameters for cross-validation.
The problem occurs when specifying the nfolds parameter. Interestingly, low values of nfolds work fine. In my actual use case, any nfolds > 3 produced the error message shown below. In the example below I could usually go up to nfolds = 6, though not consistently (sometimes only up to nfolds = 3). Above those values, the REST API returns the error mentioned above: object not found for argument: key.
# It does not matter whether this runs on Debian or Windows, or how many threads are used.
# The error occurs only with cross-validation options; otherwise everything works fine.
# No error occurs with a low nfolds value (in my actual use case, at most 3 folds work).
library(h2o)
h2o.init(nthreads = -1)
# Random data: 900 rows, 96 numeric predictors, 1 binary response
x <- matrix(rnorm(900 * 96, mean = 10, sd = 2), nrow = 900, ncol = 96)
y <- sample(c(0, 1), size = 900, replace = TRUE)
sampleData <- data.frame(cbind(x, y))
sampleData[, 97] <- as.factor(sampleData[, 97])
m <- h2o.deeplearning(x = 1:96, y = 97,
  training_frame = as.h2o(sampleData), reproducible = TRUE,
  activation = "Tanh", hidden = c(64, 16), epochs = 10000, verbose = TRUE,
  nfolds = 4, balance_classes = TRUE, # cross-validation
  standardize = TRUE, variable_importances = TRUE, seed = 123,
  stopping_rounds = 2, stopping_metric = "misclassification", stopping_tolerance = 0.01 # early stopping
)
performance <- h2o.performance(model = m)
print(performance)
######### gives error message
# ERROR: Unexpected HTTP Status code: 404 Not Found (url = http://localhost:xxxxx/3/Models/DeepLearning_model_R_1535036938222_489)
#
# water.exceptions.H2OKeyNotFoundArgumentException
# [1] "water.exceptions.H2OKeyNotFoundArgumentException: Object 'DeepLearning_model_R_1535036938222_489' not found for argument: key"
I cannot understand why it works only for low values of nfolds. Any suggestions? What am I missing here? I've searched most remotely related threads on Google Groups and also here on Stack Overflow, but without success. If this has to do with a changed API for h2o 3.x as suggested above (though that post was 18 months ago...), I would highly appreciate some documentation on the correct syntax for doing CV with h2o.deeplearning(). Thanks in advance!

This is a bug caused by setting the verbose parameter to TRUE. The workaround is to leave verbose at its default, which is FALSE. I've created a JIRA ticket to track the issue.
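As a minimal sketch of that workaround, the call from the question can be repeated with the verbose argument simply omitted. The wrapper name run_cv_without_verbose is hypothetical (it is not part of h2o), and the data is the same random data as in the question; whether this avoids the 404 on a given h2o version is something to check locally.

```r
# Same shape as the data in the question: 96 numeric predictors, 1 binary response.
set.seed(123)
x <- matrix(rnorm(900 * 96, mean = 10, sd = 2), nrow = 900, ncol = 96)
y <- sample(c(0, 1), size = 900, replace = TRUE)
sampleData <- data.frame(x, y = as.factor(y))

# Hypothetical wrapper (not an h2o function): the question's call with
# `verbose` left at its default (FALSE), as the answer suggests.
run_cv_without_verbose <- function(df, nfolds = 4) {
  h2o::h2o.init(nthreads = -1)
  m <- h2o::h2o.deeplearning(
    x = 1:96, y = 97,
    training_frame = h2o::as.h2o(df), reproducible = TRUE,
    activation = "Tanh", hidden = c(64, 16), epochs = 10000,
    nfolds = nfolds, balance_classes = TRUE,
    standardize = TRUE, variable_importances = TRUE, seed = 123,
    stopping_rounds = 2, stopping_metric = "misclassification",
    stopping_tolerance = 0.01
  )
  # xval = TRUE asks for the cross-validation metrics rather than training metrics
  h2o::h2o.performance(model = m, xval = TRUE)
}

# Usage (needs the h2o package and a Java runtime):
# print(run_cv_without_verbose(sampleData, nfolds = 10))
```

Progress can still be followed in the H2O Flow UI or the server log while verbose is off.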

Related

response.plot3() crashes RStudio

0. Session information
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
1. Summary of my issue
RStudio crashes when I use a modified version of response.plot2(), named response.plot3(). The problem is not the modified function itself, as the same crash occurs with response.plot2().
2. Code
library(biomod2)
library(raster)
library(reshape)
library(ggplot2)
setwd("xxx")
# I load the modified version of response.plot2()
source("/response.plot_modified.R", local = TRUE)
sp <- "NAME"
baseline_EU <- readRDS("./data/baseline_EU.rds")
initial.wd <- getwd()
setwd("models")
# Loading of formatted data and models calibrated by biomod
load(paste0(sp, "/run.data"))
load(paste0(sp, "/model.runs"))
# Variables used for calibration
cur.vars <- model.runs@expl.var.names
# Loading model names into R memory
models.to.plot <- BIOMOD_LoadModels(model.runs)
# Calculation of response curves with all models (stored in the object resp which is an array)
resp <- response.plot3(models = models.to.plot,
Data = baseline_EU[[cur.vars]],
fixed.var.metric = "sp.mean",
show.variables = cur.vars,
run.data = run.data)
I have 60 models, and the code plots the first curve before aborting the session, with no further explanation.
3. What I unsuccessfully tried
(1) check that it was not a ram issue
(2) uninstall-reinstall all the packages and their dependencies
(3) update to the latest R version
(4) go back to response.plot2() to see if the issue could come from response.plot3()
4. I found some similar errors which lead me to think that it might be a package issue
https://github.com/rstudio/rstudio/issues/9373
Call to library(raster) or require(raster) causes Rstudio to abort session
Now I presume that there is a problem either with the biomod2 or the raster packages, or maybe the R version?
I would greatly appreciate your help if you have any ideas.

Newsmap topic classification: issue with "predict" step of the newsmap process

I am trying out the Newsmap package for topic classification (not geographical classification, but applying it to another task). I am following the instructions from the Quanteda tutorials website.
Everything runs smoothly until I try to predict the topic most strongly associated with each of my texts.
Here is the code:
labels <- types(toks_labelExclu)
dfmt_label <- dfm(toks_labelExclu, tolower = FALSE) # The dfm with the labels of my 35 topics
dfmt_feat <- dfm(toks_sent) %>%
dfm_trim(min_termfreq = 50) # The dfm with the features to be associated with my topics
model_nm <- textmodel_newsmap(dfmt_feat, dfmt_label)
coef(model_nm, n= 20)[labels] # All good so far
pred_nm <- predict(model_nm)
# Here is the snag: the function returns
Error in x[, feature] : Subscript out of bounds
Does anyone have an idea of where the error could come from?
For information, here is the sessionInfo:
R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.4
other attached packages:
[1] newsmap_0.7.1 quanteda_2.0.1

Error in MuMIn pdredge() when using more than 30 predictor variables

I've run into the following error that only occurs when I pass a model with more than 30 predictors to pdredge():
Error in sprintf(gettext(fmt, domain = domain), ...) :
invalid format '%d'; use format %f, %e, %g or %a for numeric objects
I'm on a windows machine running Microsoft R Open through RStudio:
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
RStudio Version 1.0.153
MuMIn_1.43.6
Reproducible example:
library(MuMIn)
library(parallel)
#Random data: X1 as response, X2-X31 (30) predictors
var.30 <- data.frame(replicate(31,sample(0:100,75,rep=TRUE)))
#Random data: X1 as response, X2-X32 (31) predictors
var.31 <- data.frame(replicate(32,sample(0:100,75,rep=TRUE)))
#prepare cluster for pdredge
clust <- try(makeCluster(detectCores()-1))
#working model (30 or less predictors)
mod <- lm(X1 ~ ., data=var.30, na.action = "na.fail")
sub.dredge <- pdredge(mod, cluster=clust, eval=FALSE)
#Non-working model (31 or more predictors)
mod <- lm(X1 ~ ., data=var.31, na.action = "na.fail")
sub.dredge <- pdredge(mod, cluster=clust, eval=FALSE)
I know that in 2016 this was an issue with integer bit restrictions. However, from this question and the comments it received, I was under the impression that the issue had been resolved and the maximum raised?
The 31 terms limit in dredge is pretty much ultimate. It will not be extended unless R implements native support for 64-bit integers.
(Also, update your MuMIn - this 'sprintf' error has been fixed some time ago)
There are actually only 16 parameters in the second question you reference, but some are called multiple times to represent interaction terms (though, whether that OP really wanted them to represent interactions, or intended for I(parameter^2), is unclear; if the latter, their code would have failed as there would have been too many unique parameters). So, even though there are many (~41) terms in that question, there are only 16 unique parameters.
As far as I can tell, @Kamil Bartoń has not updated dredge to accept more than 30 unique parameter calls yet.
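The 32-bit integer limit mentioned in the answers can be checked in base R. This is only an illustrative sketch: it assumes that the number of candidate models (2^31 for 31 predictors) is what overflows, and it does not inspect MuMIn's internals.

```r
# R's native integer type is 32-bit signed, so the largest value is 2^31 - 1.
max_int <- .Machine$integer.max

# With 31 optional predictors there are 2^31 term combinations,
# which is one more than the largest representable integer.
n_models_30 <- 2^30
n_models_31 <- 2^31

fits_30 <- n_models_30 <= max_int   # TRUE
fits_31 <- n_models_31 <= max_int   # FALSE

# Coercion to integer fails (NA with a warning) once the limit is exceeded.
coerced <- suppressWarnings(as.integer(n_models_31))

# Formatting such a value with '%d' raises an error, because the double
# cannot be converted to an integer first.
fmt_error <- tryCatch(sprintf("%d", n_models_31), error = function(e) e)
print(conditionMessage(fmt_error))
```

On the R versions I am aware of, the printed message is the same "invalid format '%d'" text quoted in the question, which is consistent with the error coming from a model count that no longer fits in an integer.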

R h2o.glm error - java.lang.ArrayIndexOutOfBoundsException: 32

I am attempting to run h2o.glm in R but am encountering some strange behaviour. The same line of code sometimes works and sometimes fails with the following error:
h2o.glm(x = Predictors.Revised, y = "NN", model_id = "GLM_FREQ_INITIAL",
offset_column = "Offset.To.Apply", nfolds = 5, family = "poisson",
link = "log", lambda_search = TRUE, training_frame = TrainDS.h2o,
alpha = 1, standardize = TRUE)
java.lang.ArrayIndexOutOfBoundsException: 32
java.lang.ArrayIndexOutOfBoundsException: 32
    at water.util.ArrayUtils.subtract(ArrayUtils.java:1334)
    at hex.glm.GLM$GLMDriver.fitIRLSM(GLM.java:824)
    at hex.glm.GLM$GLMDriver.fitModel(GLM.java:1080)
    at hex.glm.GLM$GLMDriver.computeSubmodel(GLM.java:1169)
    at hex.glm.GLM.cv_computeAndSetOptimalParameters(GLM.java:132)
    at hex.ModelBuilder.cv_buildModels(ModelBuilder.java:595)
    at hex.ModelBuilder.computeCrossValidation(ModelBuilder.java:431)
    at hex.glm.GLM.computeCrossValidation(GLM.java:100)
    at hex.ModelBuilder$1.compute2(ModelBuilder.java:309)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1395)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
R Version: 3.3.1
Platform: x86_64-pc-linux-gnu (64-bit)
h2o Version: 3.22.1.5
Any ideas why? I am stumped.
If you get an "out of bounds" error (e.g. java.lang.ArrayIndexOutOfBoundsException), check whether anything you iterate over changes between runs of your glm call. To help debug the issue, I would remove all but the simplest arguments and then add arguments back one at a time, rerunning the steps that led to the error, until you can identify the parameter that causes the out-of-bounds error.
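The "add arguments back one at a time" approach can be sketched generically in R. The helper find_failing_argument and the toy function toy_fit are hypothetical, written only for illustration; with h2o one would pass h2o.glm as fun and the question's arguments as lists.

```r
# Hypothetical helper: start from a minimal argument list, add optional
# arguments one at a time, and report the first addition that makes the
# call fail.
find_failing_argument <- function(fun, base_args, extra_args) {
  args <- base_args
  for (name in names(extra_args)) {
    args[name] <- extra_args[name]
    result <- tryCatch(do.call(fun, args), error = function(e) e)
    if (inherits(result, "error")) {
      return(list(argument = name, message = conditionMessage(result)))
    }
  }
  NULL  # every combination tried ran without error
}

# Toy stand-in for a model-fitting function: it fails only when
# lambda_search is combined with cross-validation.
toy_fit <- function(x, nfolds = 0, lambda_search = FALSE) {
  if (lambda_search && nfolds > 1) stop("index out of bounds")
  "ok"
}

culprit <- find_failing_argument(
  toy_fit,
  base_args  = list(x = 1),
  extra_args = list(nfolds = 5, lambda_search = TRUE)
)
print(culprit$argument)
```

Because the error in the question is intermittent, each step may need to be repeated several times before an argument can be ruled out.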

Error in varimp (R party package) when conditional = TRUE

I am trying to calculate variable importance for a random forest built using the cforest function in the party package. I would like to run varimp with conditional set to TRUE, but I get an error message when I do so. The error reads:
Error in if (node[[5]][1] == variableID) cp <- node[[5]][[3]] :
argument is of length zero
Varimp run with the default setting conditional = FALSE works just fine.
Regarding the data set, all variables are categorical. The response variable is Glottal (yes/no), and there are seven predictors. Here is a link to the data, and here is the code I am using:
library(party)
glottal.df <-read.csv("~glottal_data.csv", header=T)
glottal.df$Instance <- factor(glottal.df$Instance)
data.controls <- cforest_unbiased(ntree = 500, mtry = 2)
set.seed(45)
glottal.cf <- cforest(Glottal ~ Stress + Boundary + Context + Instance + Region + Target + Speaker, data = glottal.df, controls = data.controls)
# this gives me an error
glottal.cf.varimp.true <- varimp(glottal.cf, conditional = TRUE)
# this works
glottal.cf.varimp.false <- varimp(glottal.cf)
Can anyone tell me why I am getting this error? It is not a problem with any specific variable as the problem persists even if I remove a variable, create a new forest and try to recalculate varimp, and there are no missing values in the data set. Many thanks in advance for your help!
Appears to be working with party 1.2.4:
> glottal.cf.varimp.true
       Stress      Boundary       Context
 0.0003412322  0.2405971564  0.0122369668
     Instance        Region        Target
-0.0043507109  0.0044360190 -0.0011469194
      Speaker
 0.0384834123
> packageVersion('party')
[1] ‘1.2.4’
> R.version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          4.3
year           2017
month          11
day            30
svn rev        73796
language       R
version.string R version 3.4.3 (2017-11-30)
nickname       Kite-Eating Tree
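To compare a local installation against the version reported as working above, a small check along these lines may help. The helper party_is_recent_enough is hypothetical, and treating 1.2.4 as the minimum is an assumption based only on this answer, not on the package changelog.

```r
# Version reported as working in the answer above (assumed minimum).
required <- package_version("1.2.4")

# Hypothetical helper: TRUE if the given version is at least `minimum`.
party_is_recent_enough <- function(installed, minimum = required) {
  as.package_version(installed) >= minimum
}

# Only inspect the installed package if it is actually available.
if (requireNamespace("party", quietly = TRUE)) {
  installed <- packageVersion("party")
  if (!party_is_recent_enough(installed)) {
    message("party ", installed, " is older than ", required,
            "; consider update.packages() or install.packages('party').")
  }
}
```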
