ensemble_glmnet: could not find function "predict.cv.glmnet"

I am trying to run ensemble_glmnet, but I get an error saying that it cannot find predict.cv.glmnet. I have loaded the glmnet and glmnetUtils libraries.
I'm running RStudio 1.2.5033 and R version 3.6.2.
library(BuenaVista)
library(glmnet)
library(glmnetUtils)
data<-iris[sample(1:150, size = 150, replace = FALSE),]
data <- derive_variables(dataset=data, type = "dummy", integer = TRUE, return_dataset=TRUE)
data$Species_setosa<-as.factor(data$Species_setosa)
test <- data[101:150, c(1,2,3,4,6,7)]
data<-data[,c(5,1,2,3,4,6,7)]
ensemble_glmnet(y_index = 1, train = data, valid_size = 50, n = 10, alpha = 1, family = "binomial", type = "class")
Error in predict.cv.glmnet(object = cv.glmnet(x = X, y = Y, nfolds =
nfolds, : could not find function "predict.cv.glmnet"
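
One possible cause, stated here as an assumption rather than a confirmed diagnosis: in recent glmnet releases the S3 methods such as predict.cv.glmnet appear to be registered for dispatch but no longer exported, so code that calls predict.cv.glmnet() by name fails while the generic predict() still works. A minimal sketch of the generic call on a cv.glmnet fit (the iris-based data and the "versicolor" response are made up for illustration and are not the data used by ensemble_glmnet):

```r
library(glmnet)

set.seed(42)
x <- as.matrix(iris[, 1:4])                  # glmnet expects a numeric matrix
y <- factor(iris$Species == "versicolor")    # illustrative binary response

# Cross-validated lasso (alpha = 1) logistic regression
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 5)

# Call the generic; R dispatches to the cv.glmnet method internally,
# so the method does not have to be visible by name.
pred <- predict(cv_fit, newx = x, s = "lambda.min", type = "class")

table(predicted = as.vector(pred), actual = y)
```

If this works while ensemble_glmnet still fails, the package code itself probably calls the method by name and would need to be changed to use predict(), or run against an older glmnet.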

Related

Error when running multiscale GWR: Error in gw_weight_vec: Not compatible with requested type: [type=NULL; target=double]

I am trying to run multiscale geographically weighted regression (MGWR) using the GWmodel package in R. When running the function gwr.multiscale this error is shown:
Error in gw_weight_vec(vdist, bw, kernel, adaptive): Not compatible
with requested type: [type=NULL; target=double].
An example:
library(GWmodel)
data(LondonHP)
dist <- gw.dist(coordinates(londonhp))
ab_gwr <- gwr.multiscale(PURCHASE ~ FLOORSZ + PROF,
data = londonhp,
criterion = "dCVR",
kernel = "gaussian",
adaptive = FALSE,
var.dMat.indx = 2,
bws0 = c(100,
100,
100),
bw.seled = rep(T, 3),
dMats = list(dist,
dist,
dist),
parallel.method = "omp",
parallel.arg = "omp")
I have also tried other settings, such as an adaptive bandwidth, fewer covariates, and different bws0 values. Depending on what I change, I get other kinds of errors.
I am following the example from the package's PDF.
The parameter var.dMat.indx specifies which distance matrix in dMats is used for each variable, and I had set it incorrectly in my code. With three distance matrices it needs one index per matrix. The solution:
library(GWmodel)
data(LondonHP)
dist <- gw.dist(coordinates(londonhp))
ab_gwr <- gwr.multiscale(PURCHASE ~ FLOORSZ + PROF,
data = londonhp,
criterion = "dCVR",
kernel = "gaussian",
adaptive = FALSE,
var.dMat.indx = 1:3,
bws0 = c(100,
100,
100),
bw.seled = rep(TRUE, 3),
dMats = list(dist,
dist,
dist),
parallel.method = "omp",
parallel.arg = "omp")

How to prepare variables for nnet classification/predict in R?

For classification I pass the predictors as x and the labels as y, as in this randomForest example:
iris_train_values <- iris[,c(1:4)]
iris_train_labels <- iris[,5]
model_RF <- randomForest(x = iris_train_values, y = iris_train_labels, importance = TRUE,
replace = TRUE, mtry = 4, ntree = 500, na.action=na.omit,
do.trace = 100, type = "classification")
This approach works for many classifiers, but when I try it with nnet I get an error:
model_nnet <- nnet(x = iris_train_values, y = iris_train_labels, size = 1, decay = 0.1)
Error in nnet.default(x = iris_train_values, y = iris_train_labels, size = 1, :
NA/NaN/Inf in foreign function call (arg 2)
In addition: Warning message:
In nnet.default(x = iris_train_values, y = iris_train_labels, size = 1, :
NAs introduced by coercion
On another data set I get a different error:
Error in y - tmp : non-numeric argument to binary operator
How should I change the variables to classify?
The formula syntax works:
library(nnet)
model_nnet <- nnet(Species ~ ., data = iris, size = 1)
But the matrix syntax does not:
nnet::nnet(x = iris_train_values, y = as.matrix(iris_train_labels), size = 1)
I don't understand why this doesn't work, but at least there is a workaround.
predict works fine with the formula syntax:
?predict.nnet
predict(model_nnet,
iris[c(1,51,101), 1:4],
type = "class") # true classes are ['setosa', 'versicolor', 'virginica']
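
A likely reason the x/y call fails is that the matrix interface of nnet expects numeric targets, so a factor or a character matrix of labels cannot be used directly. A minimal sketch, assuming the labels are first converted to a 0/1 indicator matrix with nnet::class.ind and the network is fit with softmax = TRUE (size, decay and maxit are arbitrary illustration values):

```r
library(nnet)

set.seed(1)
iris_train_values <- as.matrix(iris[, 1:4])   # numeric predictors
iris_train_labels <- iris[, 5]                # factor with 3 levels

# One indicator column per class: rows are 0/1 and sum to 1
targets <- class.ind(iris_train_labels)

model_xy <- nnet(x = iris_train_values, y = targets,
                 size = 2, decay = 0.1, maxit = 200,
                 softmax = TRUE, trace = FALSE)

# The raw output holds one column per class; pick the largest per row
raw_out <- predict(model_xy, iris_train_values)
pred_class <- colnames(targets)[max.col(raw_out)]

table(predicted = pred_class, actual = iris_train_labels)
```

The formula interface does this conversion internally for a factor response, which would explain why Species ~ . works without any extra preparation.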

How to specify offset_column in h2o.stackedEnsemble()

I am running gbm and glm with offset_column as base learners in h2o. My response variable is binary and the offset_column is a positive constant. Base learners worked. Here is the code:
train["offset"]<-train["log_hazard"] # offset column in the training set
my_gbm <- h2o.gbm(x = x, y = y, training_frame = train,
fold_column = "fold_id",
keep_cross_validation_predictions = TRUE,
offset_column = "offset",
seed = 1)
my_glm <- h2o.glm(x = x, y = y, training_frame = train,
fold_column = "fold_id",
keep_cross_validation_predictions = TRUE,
offset_column = "offset",
seed = 1,family = "binomial")
Then I pass the offset_column to h2o.stackedEnsemble() through metalearner_params. Here is the code:
stack_model <- h2o.stackedEnsemble(x = x,
y = y,
training_frame = train,
base_models = list(my_gbm, my_glm),
metalearner_params = list(offset_column = "offset"))
But I received the following error:
ERRR on field: _offset_column: Offset column 'offset' not found in the training frame
The offset_column is in the training data. I am not sure why I am receiving this error message.
Then I tried running h2o.stackedEnsemble() without the metalearner_params option. Here is the code:
stack_model <- h2o.stackedEnsemble(x = x,
y = y,
training_frame = train,
base_models = list(my_gbm, my_glm))
and received the following warning message:
Warning message:
In .h2o.startModelJob(algo, params, h2oRestApiVersion) :
Dropping bad and constant columns: [offset].
I am not sure whether it ran properly. Can anyone please help me with this issue?
According to the H2O documentation for h2o.stackedEnsemble, the metalearner is trained on the cross-validated predictions of the base models, so it does not need an offset parameter of its own. Keep the offset in the base models only:
my_gbm <- h2o.gbm(x = x, y = y, training_frame = train,
fold_column = "fold_id",
keep_cross_validation_predictions = TRUE,
offset_column = "offset",
seed = 1)
my_glm <- h2o.glm(x = x, y = y, training_frame = train,
fold_column = "fold_id",
keep_cross_validation_predictions = TRUE,
offset_column = "offset",
seed = 1,family = "binomial")
stack_model <- h2o.stackedEnsemble(x = x,
y = y,
training_frame = train,
base_models = list(my_gbm, my_glm))
h2o.performance(my_gbm, newdata = test)
h2o.performance(my_glm, newdata = test)
h2o.performance(stack_model, newdata = test)

Error while running h2o.deeplearning algorithm in R

I am facing an error while running this command in H2O Deep Learning in R:
model <- h2o.deeplearning(x = x, y = y, seed = 1234,
training_frame = as.h2o(trainDF),
nfolds = 3,
stopping_rounds = 7,
epochs = 400,
overwrite_with_best_model = TRUE,
activation = "Tanh",
input_dropout_ratio = .1,
hidden = c(10,10),
l1 = 6e-4,
loss = "automatic",
distribution = 'AUTO',
stopping_metric = "MSE")
ERROR as below:
Error in h2o.deeplearning(x = x, y = y, seed = 1234, training_frame = as.h2o(trainDF), :
unused arguments (training_frame = as.h2o(trainDF), stopping_rounds = 7, overwrite_with_best_model = TRUE, distribution = "AUTO", stopping_metric = "MSE")
I was not able to reproduce your specific error, but I was able to get the code to work on my end by changing loss="automatic" to loss="Automatic" (note that loss is case-sensitive).
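
As a general note on the error itself: R reports "unused arguments" when a call passes named arguments that the function does not define, which can happen when the installed package version uses different parameter names. The sketch below shows one way to check this in base R. The helper unsupported_args and the stand-in function old_style_fit are hypothetical names made up for illustration; they are not part of h2o.

```r
# Hypothetical helper: which of the given argument names does `fn` not accept?
unsupported_args <- function(fn, arg_names) {
  accepted <- names(formals(fn))
  # A function with `...` accepts any named argument
  if ("..." %in% accepted) return(character(0))
  setdiff(arg_names, accepted)
}

# Stand-in for an API that takes `data` rather than `training_frame`
old_style_fit <- function(x, y, data, epochs = 10) NULL

missing_args <- unsupported_args(
  old_style_fit,
  c("x", "y", "training_frame", "stopping_rounds")
)
print(missing_args)
```

With h2o loaded, the same helper could be called as unsupported_args(h2o.deeplearning, c("training_frame", "stopping_rounds", "stopping_metric")), and packageVersion("h2o") shows which release is installed. If the helper lists any names, the installed version does not define those parameters.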

R deep learning, multiple outputs

Is it possible to create a deep learning net that gives multiple outputs?
The reason for doing this is to also try to capture the relationships between outputs.
In the examples given I can only create one output.
library(h2o)
localH2O = h2o.init()
irisPath = system.file("extdata", "iris.csv", package = "h2o")
iris.hex = h2o.importFile(localH2O, path = irisPath)
h2o.deeplearning(x = 1:4, y = 5, data = iris.hex, activation = "Tanh",
hidden = c(10, 10), epochs = 5)
It doesn't look like multiple response columns are currently supported in H2O (H2O FAQ and H2O Google Group topic). Their suggestion is to train a new model for each response.
(Nonsensical) example:
library(h2o)
localH2O <- h2o.init()
irisPath <- system.file("extdata", "iris.csv", package = "h2o")
iris.hex <- h2o.importFile(localH2O, path = irisPath)
m1 <- h2o.deeplearning(x = 1:2, y = 3, data = iris.hex, activation = "Tanh",
hidden = c(10, 10), epochs = 5, classification = FALSE)
m2 <- h2o.deeplearning(x = 1:2, y = 4, data = iris.hex, activation = "Tanh",
hidden = c(10, 10), epochs = 5, classification = FALSE)
However, it appears that multiple responses are available through the deepnet package (check library(sos); findFn("deep learning")).
library(deepnet)
x <- as.matrix(iris[,1:2])
y <- as.matrix(iris[,3:4])
m3 <- dbn.dnn.train(x = x, y = y, hidden = c(5,5))
