Error while running h2o.deeplearning algorithm in R

I am getting an error when running this h2o.deeplearning command in R:
model <- h2o.deeplearning(x = x, y = y, seed = 1234,
                          training_frame = as.h2o(trainDF),
                          nfolds = 3,
                          stopping_rounds = 7,
                          epochs = 400,
                          overwrite_with_best_model = TRUE,
                          activation = "Tanh",
                          input_dropout_ratio = 0.1,
                          hidden = c(10, 10),
                          l1 = 6e-4,
                          loss = "automatic",
                          distribution = "AUTO",
                          stopping_metric = "MSE")
The error is as follows:
Error in h2o.deeplearning(x = x, y = y, seed = 1234, training_frame = as.h2o(trainDF), :
unused arguments (training_frame = as.h2o(trainDF), stopping_rounds = 7, overwrite_with_best_model = TRUE, distribution = "AUTO", stopping_metric = "MSE")

I was not able to reproduce your specific error, but I was able to get the code to work on my end by changing loss = "automatic" to loss = "Automatic" (note that loss is case-sensitive). As an aside, an "unused arguments" error usually means the version of the function being called does not accept those parameters, so it is also worth checking that your h2o package is up to date.
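For reference, here is the same call with only the case of loss changed; x, y, and trainDF are assumed to be defined as in the question:

library(h2o)
h2o.init()
model <- h2o.deeplearning(x = x, y = y, seed = 1234,
                          training_frame = as.h2o(trainDF),
                          nfolds = 3,
                          stopping_rounds = 7,
                          epochs = 400,
                          overwrite_with_best_model = TRUE,
                          activation = "Tanh",
                          input_dropout_ratio = 0.1,
                          hidden = c(10, 10),
                          l1 = 6e-4,
                          loss = "Automatic",   # was "automatic"
                          distribution = "AUTO",
                          stopping_metric = "MSE")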

Related

ensemble_glmnet: could not find function "predict.cv.glmnet"

I am trying to run the ensemble_glmnet program, but I am receiving an error that it cannot find predict.cv.glmnet. I have loaded the glmnet and glmnetUtils libraries.
I'm running RStudio 1.2.5033 and R version 3.6.2
library(BuenaVista)
library(glmnet)
library(glmnetUtils)
data <- iris[sample(1:150, size = 150, replace = FALSE), ]
data <- derive_variables(dataset = data, type = "dummy", integer = TRUE, return_dataset = TRUE)
data$Species_setosa <- as.factor(data$Species_setosa)
test <- data[101:150, c(1, 2, 3, 4, 6, 7)]
data <- data[, c(5, 1, 2, 3, 4, 6, 7)]
ensemble_glmnet(y_index = 1, train = data, valid_size = 50, n = 10,
                alpha = 1, family = "binomial", type = "class")
Error in predict.cv.glmnet(object = cv.glmnet(x = X, y = Y, nfolds =
nfolds, : could not find function "predict.cv.glmnet"
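A likely cause, hedging somewhat: newer versions of glmnet register predict.cv.glmnet as an S3 method without exporting it, so calling the function by name fails even though the method exists. Calling the generic predict() on a cv.glmnet object dispatches correctly. A minimal sketch on stand-in data:

library(glmnet)
X <- as.matrix(iris[, 1:4])
Y <- as.numeric(iris$Species == "setosa")
fit <- cv.glmnet(x = X, y = Y, family = "binomial", nfolds = 5)
# The generic dispatches to the unexported method
p1 <- predict(fit, newx = X, s = "lambda.min", type = "class")
# If code insists on calling the method by name, the namespace operator reaches it
p2 <- glmnet:::predict.cv.glmnet(fit, newx = X, s = "lambda.min", type = "class")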

How to specify offset_column in h2o.stackedEnsemble()

I am running gbm and glm with an offset_column as base learners in h2o. My response variable is binary and the offset column is a positive constant. The base learners trained fine. Here is the code:
train["offset"]<-train["log_hazard"] # offset column in the training set
my_gbm <- h2o.gbm(x = x, y = y, training_frame = train,
fold_column = "fold_id",
keep_cross_validation_predictions = TRUE,
offset_column = "offset",
seed = 1)
my_glm <- h2o.glm(x = x, y = y, training_frame = train,
fold_column = "fold_id",
keep_cross_validation_predictions = TRUE,
offset_column = "offset",
seed = 1,family = "binomial")
Then I am passing the offset_column to h2o.stackedEnsemble() through metalearner_params. Here is the code:
stack_model <- h2o.stackedEnsemble(x = x,
                                   y = y,
                                   training_frame = train,
                                   base_models = list(my_gbm, my_glm),
                                   metalearner_params = list(offset_column = "offset"))
But I received the following error:
ERRR on field: _offset_column: Offset column 'offset' not found in the training frame
The offset_column is in the training data. I am not sure why I am receiving this error message.
Then I tried running h2o.stackedEnsemble() without the metalearner_params option. Here is the code:
stack_model <- h2o.stackedEnsemble(x = x,
                                   y = y,
                                   training_frame = train,
                                   base_models = list(my_gbm, my_glm))
and received the following warning message:
Warning message:
In .h2o.startModelJob(algo, params, h2oRestApiVersion) :
Dropping bad and constant columns: [offset].
I am not sure whether it ran properly. Can anyone please help me with this issue?
If you carefully read the H2O docs for h2o.stackedEnsemble, you will see that the metalearner does not need the offset parameter: it trains on the cross-validated predictions from the base models, not on the original training frame, which is also why the 'offset' column cannot be found there. The warning about dropping the constant 'offset' column is expected here, since your offset is a positive constant and carries no information as a predictor; the ensemble still trains properly:
my_gbm <- h2o.gbm(x = x, y = y, training_frame = train,
                  fold_column = "fold_id",
                  keep_cross_validation_predictions = TRUE,
                  offset_column = "offset",
                  seed = 1)
my_glm <- h2o.glm(x = x, y = y, training_frame = train,
                  fold_column = "fold_id",
                  keep_cross_validation_predictions = TRUE,
                  offset_column = "offset",
                  seed = 1, family = "binomial")
stack_model <- h2o.stackedEnsemble(x = x,
                                   y = y,
                                   training_frame = train,
                                   base_models = list(my_gbm, my_glm))
h2o.performance(my_gbm, newdata = test)
h2o.performance(my_glm, newdata = test)
h2o.performance(stack_model, newdata = test)

H2O deep learning model results with dropout scaled down

I am getting the following figure when training an H2O Deep Learning model with dropout:
[Figure: predicted vs. actual values, with the dropout model's predictions scaled down relative to the identity line]
The code used to train the net is
m.nn <- h2o.deeplearning(x = 1:(nc-1),
                         y = nc,
                         training_frame = datTra,
                         #validation_frame = datTst,
                         nfolds = 5,
                         activation = 'RectifierWithDropout',
                         #input_dropout_ratio = 0.2,
                         hidden_dropout_ratios = c(dro, dro, dro),
                         hidden = c(120, 30, 8),
                         #hidden = 20,
                         epochs = 999,
                         #mini_batch_size = 100,
                         #variable_importances = TRUE,
                         standardize = TRUE,
                         regression_stop = 1e-3,
                         stopping_metric = "MSE",
                         stopping_tolerance = 1e-6,
                         stopping_rounds = 10)
The figure corresponds to dro = 0.1.
Why am I getting that misalignment? Is there an option I am missing?
You can find a piece of code to try below (download 'SampleData.csv' from here):
library(h2o)
library(readr)
library(ggplot2)
df <- as.data.frame(read_delim(file = 'SampleData.csv', delim = ";"))
localH2O <- h2o.init(ip = "localhost", startH2O = TRUE, nthreads = 2, max_mem_size = '4g')
dat_h2o <- as.h2o(x = df)
model.ref <- h2o.deeplearning(x = 1:(ncol(df)-1), y = ncol(df),
                              training_frame = dat_h2o,
                              hidden = c(120, 30, 8),
                              activation = 'Rectifier',
                              epochs = 199,
                              mini_batch_size = 10,
                              regression_stop = 0.1,
                              stopping_metric = "MSE",
                              stopping_tolerance = 1e-6,
                              stopping_rounds = 10)
model.dro <- h2o.deeplearning(x = 1:(ncol(df)-1), y = ncol(df),
                              training_frame = dat_h2o,
                              hidden = c(120, 30, 8),
                              activation = 'RectifierWithDropout',
                              hidden_dropout_ratios = c(0.2, 0.2, 0.2),
                              epochs = 199,
                              mini_batch_size = 10,
                              regression_stop = 0.1,
                              stopping_metric = "MSE",
                              stopping_tolerance = 1e-6,
                              stopping_rounds = 10)
pred.ref <- as.data.frame(h2o.predict(object = model.ref, newdata = dat_h2o))
pred.dro <- as.data.frame(h2o.predict(object = model.dro, newdata = dat_h2o))
dfRes <- data.frame(cbind(df$SeqF, pred.ref$predict, pred.dro$predict))
colnames(dfRes) <- c('act', 'pred', 'pred2')
ggplot(data = dfRes) +
  geom_point(aes(x = act, y = pred), color = 'blue') +
  geom_point(aes(x = act, y = pred2), color = 'red') +
  geom_abline()

Error in running h2o.ensemble

I am getting an error while running h2o.ensemble in R. This is the error output:
[1] "Cross-validating and training base learner 1: h2o.glm.wrapper"
|======================================================================| 100%
[1] "Cross-validating and training base learner 2: h2o.randomForest.1"
|============== | 19%
Got exception 'class java.lang.AssertionError', with msg 'null'
java.lang.AssertionError
at hex.tree.DHistogram.scoreMSE(DHistogram.java:323)
at hex.tree.DTree$DecidedNode$FindSplits.compute2(DTree.java:441)
at hex.tree.DTree$DecidedNode.bestCol(DTree.java:421)
at hex.tree.DTree$DecidedNode.<init>(DTree.java:449)
at hex.tree.SharedTree.makeDecided(SharedTree.java:489)
at hex.tree.SharedTree$ScoreBuildOneTree.onCompletion(SharedTree.java:436)
at jsr166y.CountedCompleter.__tryComplete(CountedCompleter.java:425)
at jsr166y.CountedCompleter.tryComplete(CountedCompleter.java:383)
at water.MRTask.compute2(MRTask.java:683)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1069)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Error: 'null'
This is the code I am using, for a regression problem. The "Sales" column is the prediction target; the rest of the columns are used for training.
response <- "Sales"
predictors <- setdiff(names(train), response)
h2o.glm.1 <- function(..., alpha = 0.0) h2o.glm.wrapper(..., alpha = alpha)
h2o.glm.2 <- function(..., alpha = 0.5) h2o.glm.wrapper(..., alpha = alpha)
h2o.glm.3 <- function(..., alpha = 1.0) h2o.glm.wrapper(..., alpha = alpha)
h2o.randomForest.1 <- function(..., ntrees = 200, nbins = 50, seed = 1) h2o.randomForest.wrapper(..., ntrees = ntrees, nbins = nbins, seed = seed)
h2o.randomForest.2 <- function(..., ntrees = 200, sample_rate = 0.75, seed = 1) h2o.randomForest.wrapper(..., ntrees = ntrees, sample_rate = sample_rate, seed = seed)
h2o.gbm.1 <- function(..., ntrees = 100, seed = 1) h2o.gbm.wrapper(..., ntrees = ntrees, seed = seed)
h2o.gbm.6 <- function(..., ntrees = 100, col_sample_rate = 0.6, seed = 1) h2o.gbm.wrapper(..., ntrees = ntrees, col_sample_rate = col_sample_rate, seed = seed)
h2o.gbm.8 <- function(..., ntrees = 100, max_depth = 3, seed = 1) h2o.gbm.wrapper(..., ntrees = ntrees, max_depth = max_depth, seed = seed)
h2o.deeplearning.1 <- function(..., hidden = c(500,500), activation = "Rectifier", epochs = 50, seed = 1) h2o.deeplearning.wrapper(..., hidden = hidden, activation = activation, epochs = epochs, seed = seed)
h2o.deeplearning.6 <- function(..., hidden = c(50,50), activation = "Rectifier", epochs = 50, seed = 1) h2o.deeplearning.wrapper(..., hidden = hidden, activation = activation, epochs = epochs, seed = seed)
h2o.deeplearning.7 <- function(..., hidden = c(100,100), activation = "Rectifier", epochs = 50, seed = 1) h2o.deeplearning.wrapper(..., hidden = hidden, activation = activation, epochs = epochs, seed = seed)
print("learning starts")
# Customized base learner library
learner <- c("h2o.glm.wrapper",
             "h2o.randomForest.1", "h2o.randomForest.2",
             "h2o.gbm.1", "h2o.gbm.6", "h2o.gbm.8",
             "h2o.deeplearning.1", "h2o.deeplearning.6", "h2o.deeplearning.7")
metalearner <- "h2o.glm.wrapper"
# Train with the new library:
fit <- h2o.ensemble(x = predictors,
                    y = response,
                    training_frame = train,
                    family = "gaussian",
                    learner = learner,
                    metalearner = metalearner,
                    cvControl = list(V = 5))
All columns of the training data are numeric. I am using R version 3.2.2.
The updated way to do this is:
h2o.init(nthreads = -1, enable_assertions = FALSE)
As originally suggested by Spencer Aiello, disabling assertions in the h2o initialisation might do the trick:
h2o.init(nthreads = -1, assertion = FALSE)
Make sure that you properly shut down and restart h2o before applying the change:
h2o.shutdown()
h2o.init(nthreads = -1, enable_assertions = FALSE)

Object 'w' not found error in factor analysis with package 'psych'

There are a lot of questions about factor analysis on these pages. I have browsed through them, but nothing seems similar, so hopefully someone can help.
I am running a factor analysis on some survey questions where I expect some latent constructs to emerge. I am running either principal axes or minres and get the same problem, as detailed below.
My dataset contains many discrete variables and a fair number of missing values coded as NA, but even after removing all NAs the problem persists:
minres.out <- factor.minres(r = res, nfactors = 5, residuals = F, rotate = "varimax",
                            n.obs = NA, scores = F, SMC = T, missing = F,
                            min.err = 0.001, max.iter = 50, symmetric = T,
                            warnings = T, fm = "minres")
minres.out
minres.out2 <- fa(r = res, nfactors = 5, residuals = F, rotate = "oblimin",
                  n.obs = NA, scores = F, SMC = T, missing = F, impute = "median",
                  min.err = 0.001, max.iter = 50, symmetric = T, warnings = T,
                  fm = "minres", alpha = 0.1, p = 0.05, oblique.scores = F,
                  use = "pairwise")
minres.out2
The first one uses the deprecated version and gives me a warning, but it works. The second one gives me the following error:
Error in factor.scores(x.matrix, f = Structure, method = scores) :
object 'w' not found
I have no object w in my data, but I do not really understand what this object is meant to be in the first place.
Running traceback() gives me:
3: factor.scores(x.matrix, f = Structure, method = scores)
2: fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate,
scores = scores, residuals = residuals, SMC = SMC, covar = covar,
missing = FALSE, impute = impute, min.err = min.err, max.iter = max.iter,
symmetric = symmetric, warnings = warnings, fm = fm, alpha = alpha,
oblique.scores = oblique.scores, np.obs = np.obs, use = use,
...)
1: fa(r = res, nfactors = 5, residuals = F, rotate = "oblimin",
n.obs = NA, scores = F, SMC = T, missing = F, impute = "median",
min.err = 0.001, , max.iter = 50, symmetric = T, warnings = T,
fm = "minres", alpha = 0.1, p = 0.05, oblique.scores = F,
use = "pairwise")
Not very enlightening to me.
Any suggestions regarding this w?
I went through the code line by line. The value you pass for scores is handed on to the factor.scores function, where it goes through a switch() statement; because scores = F matches none of the branches, w is never assigned, and the call fails with object 'w' not found. You could try copying and pasting the following fix into your R session and then running your code again:
fa <- function(r, nfactors = 1, n.obs = NA, n.iter = 1, rotate = "oblimin",
               scores = "regression", residuals = FALSE, SMC = TRUE, covar = FALSE,
               missing = FALSE, impute = "median", min.err = 0.001, max.iter = 50,
               symmetric = TRUE, warnings = TRUE, fm = "minres", alpha = 0.1,
               p = 0.05, oblique.scores = FALSE, np.obs = NULL, use = "pairwise",
               ...) {
  # Force a value that the switch() in factor.scores() recognises
  scores <- "regression"
  psych::fa(r, nfactors = nfactors, n.obs = n.obs, n.iter = n.iter, rotate = rotate,
            scores = scores, residuals = residuals, SMC = SMC, covar = covar,
            missing = missing, impute = impute, min.err = min.err, max.iter = max.iter,
            symmetric = symmetric, warnings = warnings, fm = fm, alpha = alpha,
            p = p, oblique.scores = oblique.scores, np.obs = np.obs, use = use,
            ...)
}
I had this same error. Mine was caused by passing "Regression" to scores instead of "regression". So make sure that what you are passing to scores is an acceptable option.
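As a quick check, a minimal fa() call with a valid scores value runs cleanly; this sketch uses the numeric iris columns as stand-in data:

library(psych)
library(GPArotation)  # needed for the oblimin rotation
minres.out <- fa(iris[, 1:4], nfactors = 2, rotate = "oblimin",
                 scores = "regression",  # must be lower-case
                 fm = "minres")
minres.out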
