R deep learning, multiple outputs

Is it possible to create a deep learning net that gives multiple outputs?
The reason for doing this is to also try to capture the relationships between outputs.
In the examples given I can only create one output.
library(h2o)
localH2O = h2o.init()
irisPath = system.file("extdata", "iris.csv", package = "h2o")
iris.hex = h2o.importFile(localH2O, path = irisPath)
h2o.deeplearning(x = 1:4, y = 5, data = iris.hex, activation = "Tanh",
                 hidden = c(10, 10), epochs = 5)

It doesn't look like multiple response columns are currently supported in H2O (H2O FAQ and H2O Google Group topic). Their suggestion is to train a new model for each response.
(Nonsensical) example:
library(h2o)
localH2O <- h2o.init()
irisPath <- system.file("extdata", "iris.csv", package = "h2o")
iris.hex <- h2o.importFile(localH2O, path = irisPath)
m1 <- h2o.deeplearning(x = 1:2, y = 3, data = iris.hex, activation = "Tanh",
                       hidden = c(10, 10), epochs = 5, classification = FALSE)
m2 <- h2o.deeplearning(x = 1:2, y = 4, data = iris.hex, activation = "Tanh",
                       hidden = c(10, 10), epochs = 5, classification = FALSE)
However, it appears that multiple responses are available through the deepnet package (check library(sos); findFn("deep learning")).
library(deepnet)
x <- as.matrix(iris[,1:2])
y <- as.matrix(iris[,3:4])
m3 <- dbn.dnn.train(x = x, y = y, hidden = c(5,5))
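If you go the deepnet route, nn.predict() from the same package returns one column per output unit, so you can check that both responses are predicted jointly. A minimal sketch, continuing from the m3 and x objects above:
# Predictions have one column per response (here Petal.Length and Petal.Width)
pred <- nn.predict(m3, x)
dim(pred)   # 150 rows, 2 columns
head(pred)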

Related

Error using glmulti with furrr::future_map

I am getting an error when I try to run glmulti on different datasets in parallel using furrr::future_map. It works when future_map is called sequentially.
# Load packages
library(furrr)
library(future)
library(tidyverse)
library(glmulti)

# This doesn't work
plan(multisession, workers = 2) # Set number of parallel sessions (can't do multicore on Windows)
mods <- list(tibble(exposure = rnorm(100, 0:100), outcome = rbinom(100, 1, 0.5)),
             tibble(exposure = rnorm(100, 0:100), outcome = rbinom(100, 1, 0.5))) %>%
  future_map(~glmulti(
    y = "outcome",
    xr = "exposure",
    data = .x,
    level = 1,
    method = "g",        # Genetic algorithm
    fitfunction = "glm",
    family = binomial,
    confsetsize = 2,     # Maximum number of models, so it doesn't run indefinitely
    plotty = F, report = F # To simplify the outputs
  ))
Here is the error message this gives:
Error in get(as.character(FUN), mode = "function", envir = envir) : object 'aic' of mode 'function' was not found
It runs fine when done sequentially:
# This works
plan(multiprocess, workers = 1) # 1 worker, so normal map behaviour
mods <- list(tibble(exposure = rnorm(1000, 0:1000), outcome = rbinom(1000, 1, 0.5)),
             tibble(exposure = rnorm(1000, 0:1000), outcome = rbinom(1000, 1, 0.5))) %>%
  future_map(~glmulti(
    y = "outcome",
    xr = "exposure",
    data = .x,
    level = 1,
    method = "g",        # Genetic algorithm
    fitfunction = "glm",
    family = binomial,
    confsetsize = 2,
    plotty = F, report = F
  ))
Is there any way to fix this, or is it a problem with one of the two packages? If so, is the issue more likely to be in furrr or in glmulti?
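The error suggests that something glmulti looks up by name (its aic criterion function) is not visible inside the worker sessions, which usually means the package is not attached on the workers. This is an assumption rather than a confirmed fix, but one thing worth trying is telling furrr to attach glmulti (and rJava, which glmulti uses for its genetic algorithm) on each worker via furrr_options(). A minimal sketch, where df1 and df2 stand in for the two tibbles built above:
library(furrr)
library(future)
library(glmulti)

plan(multisession, workers = 2)

# furrr_options(packages = ...) attaches the listed packages in every worker
# before the mapped function runs.
mods <- future_map(
  list(df1, df2),   # placeholders for the tibbles from the question
  ~glmulti(y = "outcome", xr = "exposure", data = .x, level = 1,
           method = "g", fitfunction = "glm", family = binomial,
           confsetsize = 2, plotty = FALSE, report = FALSE),
  .options = furrr_options(packages = c("glmulti", "rJava"))
)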

Xgboost Hyperparameter Tuning In R for binary classification

I am new to R and trying to do hyperparameter tuning for XGBoost (binary classification), but I am getting the following error. I would appreciate it if someone could help me.
Error in as.matrix(cv.res)[, 3] : subscript out of bounds In addition: Warning message: 'early.stop.round' is deprecated. Use 'early_stopping_rounds' instead. See help("Deprecated") and help("xgboost-deprecated").
Please find the code snippet below. I would also appreciate another alternative in R apart from this approach.
X_Train <- as(X_train, "dgCMatrix")

GS_LogLoss = data.frame("Rounds" = numeric(),
                        "Depth" = numeric(),
                        "r_sample" = numeric(),
                        "c_sample" = numeric(),
                        "minLogLoss" = numeric(),
                        "best_round" = numeric())

for (rounds in seq(50, 100, 25)) {
  for (depth in c(4, 6, 8, 10)) {
    for (r_sample in c(0.5, 0.75, 1)) {
      for (c_sample in c(0.4, 0.6, 0.8, 1)) {
        for (imb_scale_pos_weight in c(5, 10, 15, 20, 25)) {
          for (wt_gamma in c(5, 7, 10)) {
            for (wt_max_delta_step in c(5, 7, 10)) {
              for (wt_min_child_weight in c(5, 7, 10, 15)) {
                set.seed(1024)
                eta_val = 2 / rounds
                cv.res = xgb.cv(data = X_Train, nfold = 2, label = y_train,
                                nrounds = rounds,
                                eta = eta_val,
                                max_depth = depth,
                                subsample = r_sample,
                                colsample_bytree = c_sample,
                                early.stop.round = 0.5 * rounds,
                                scale_pos_weight = imb_scale_pos_weight,
                                max_delta_step = wt_max_delta_step,
                                gamma = wt_gamma,
                                objective = 'binary:logistic',
                                eval_metric = 'auc',
                                verbose = FALSE)
                print(paste(rounds, depth, r_sample, c_sample, min(as.matrix(cv.res)[, 3])))
                GS_LogLoss[nrow(GS_LogLoss) + 1, ] = c(rounds,
                                                       depth,
                                                       r_sample,
                                                       c_sample,
                                                       min(as.matrix(cv.res)[, 3]),
                                                       which.min(as.matrix(cv.res)[, 3]))
              }
            }
          }
        }
      }
    }
  }
}
To do your hyperparameter selection, you could use the metapackage tidymodels, especially the packages parsnip, rsample, yardstick and tune.
A workflow like this would work:
library(tidyverse)
library(tidymodels)

# Specify the model and the parameters to tune (parsnip)
model <-
  boost_tree(tree_depth = tune(), mtry = tune()) %>%
  set_mode("classification") %>%
  set_engine("xgboost")

# Specify the resampling method (rsample)
splits <- vfold_cv(X_train, v = 2)

# Specify the metrics to optimize (yardstick)
metrics <- metric_set(roc_auc)

# Specify the parameter grid (or you can use dials to automate your grid search; see the sketch below)
grid <- expand_grid(tree_depth = c(4, 6, 8, 10),
                    mtry = c(2, 10, 50)) # You can add others

# Run each model (tune)
tuned <- tune_grid(formula = Y ~ .,
                   model = model,
                   resamples = splits,
                   grid = grid,
                   metrics = metrics,
                   control = control_grid(verbose = TRUE))

# Check results
show_best(tuned)
autoplot(tuned)
select_best(tuned)

# Update the model
tuned_model <-
  model %>%
  finalize_model(select_best(tuned)) %>%
  fit(Y ~ ., data = X_train)

# Make predictions
predict(tuned_model, X_train)
predict(tuned_model, X_test)
Please note that the parameter names used in the model specification differ from the original xgboost names, because parsnip is a unified interface with consistent names across several models. See the parsnip documentation for the mapping.
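Regarding the comment in the grid step above, dials can generate the grid for you instead of spelling it out with expand_grid. A small sketch, assuming you supply explicit ranges for both parameters (dials cannot finalize an upper bound for mtry without seeing the data):
library(dials)

# Regular grid over the two tuned parameters; `levels` controls how many
# values are taken per parameter within each range.
grid <- grid_regular(
  tree_depth(range = c(4L, 10L)),
  mtry(range = c(2L, 50L)),
  levels = 3
)
grid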

ensemble_glmnet: could not find function "predict.cv.glmnet"

I am trying to run the ensemble_glmnet program, but receiving an error that it cannot find predict.cv.glmnet. I have loaded the glmnet and glmnetUtils libraries.
I'm running RStudio 1.2.5033 and R version 3.6.2
library(BuenaVista)
library(glmnet)
library(glmnetUtils)
data <- iris[sample(1:150, size = 150, replace = FALSE), ]
data <- derive_variables(dataset = data, type = "dummy", integer = TRUE, return_dataset = TRUE)
data$Species_setosa <- as.factor(data$Species_setosa)
test <- data[101:50, c(1, 2, 3, 4, 6, 7)]
data <- data[, c(5, 1, 2, 3, 4, 6, 7)]
ensemble_glmnet(y_index = 1, train = data, valid_size = 50, n = 10, alpha = 1, family = "binomial", type = "class")
Error in predict.cv.glmnet(object = cv.glmnet(x = X, y = Y, nfolds = nfolds, : could not find function "predict.cv.glmnet"
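As an assumption rather than a confirmed diagnosis: recent glmnet versions register predict.cv.glmnet as an S3 method without exporting it, so calling it by its full name fails while the generic predict() still dispatches correctly. A minimal sketch illustrating the difference on toy data:
library(glmnet)

# Toy data
X <- matrix(rnorm(100 * 4), ncol = 4)
Y <- rbinom(100, 1, 0.5)

cvfit <- cv.glmnet(x = X, y = Y, family = "binomial", nfolds = 5)

# Works: the generic dispatches to the (unexported) cv.glmnet method
predict(cvfit, newx = X, type = "class", s = "lambda.min")

# Fails if the method is not exported, as in the error above:
# predict.cv.glmnet(cvfit, newx = X, type = "class", s = "lambda.min")

# The namespaced form glmnet:::predict.cv.glmnet would reach the method by
# name, but relying on it inside a package is not recommended.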

Retrain mxnet model in R

I have created a neural network with mxnet. Now I want to train this model iteratively on new data points. After I simulated a new data point I want to make a new gradient descent update on this model. I do not want to save the model to an external file and load it again.
I have written the following code, but the weights do not change after a new training step. I also get NaN as a training error.
library(mxnet)

data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, num_hidden = 2, no.bias = TRUE)
lro <- mx.symbol.LinearRegressionOutput(fc1)

# first data observation
train.x = matrix(0, ncol = 3)
train.y = matrix(0, nrow = 2)

# first training step
model = mx.model.FeedForward.create(lro,
    X = train.x, y = train.y, initializer = mx.init.uniform(0.001),
    num.round = 1, array.batch.size = 1, array.layout = "rowmajor",
    learning.rate = 0.1, eval.metric = mx.metric.mae)
print(model$arg.params)

# second data observation
train.x = matrix(0, ncol = 3)
train.x[1] = 1
train.y = matrix(0, nrow = 2)
train.y[1] = -33

# retrain model on new data, passing on the params of the old model
model = mx.model.FeedForward.create(symbol = model$symbol,
    arg.params = model$arg.params, aux.params = model$aux.params,
    X = train.x, y = train.y, num.round = 1,
    array.batch.size = 1, array.layout = "rowmajor",
    learning.rate = 0.1, eval.metric = mx.metric.mae)

# weights do not change
print(model$arg.params)
I found a solution. begin.round in the second training step must be greater than num.round in the first training step, so that the model continues to train.
library(mxnet)

data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, num_hidden = 2, no.bias = TRUE)
lro <- mx.symbol.LinearRegressionOutput(fc1)

# first data observation
train.x = matrix(0, ncol = 3)
train.y = matrix(0, nrow = 2)

# first training step
model = mx.model.FeedForward.create(lro,
    X = train.x, y = train.y, initializer = mx.init.uniform(0.001),
    num.round = 1, array.batch.size = 1, array.layout = "rowmajor",
    learning.rate = 0.1, eval.metric = mx.metric.mae)
print(model$arg.params)

# second data observation
train.x = matrix(0, ncol = 3)
train.x[1] = 1
train.y = matrix(0, nrow = 2)
train.y[1] = -33

# retrain model on new data, passing on the params of the old model;
# begin.round = 2 continues where the first run (num.round = 1) stopped
model = mx.model.FeedForward.create(symbol = model$symbol,
    arg.params = model$arg.params, aux.params = model$aux.params,
    X = train.x, y = train.y, begin.round = 2, num.round = 3,
    array.batch.size = 1, array.layout = "rowmajor",
    learning.rate = 0.1, eval.metric = mx.metric.mae)
print(model$arg.params)
Did you try to call mx.model.FeedForward.create only once and then use the fit function for incremental training?

How to get each parameter's percentage contribution in the R h2o deeplearning package?

How can I get each parameter's percentage contribution in the R h2o deeplearning package?
library(h2o)
localH2O = h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
irisPath = system.file("extdata", "iris.csv", package = "h2o")
iris.hex = h2o.importFile(localH2O, path = irisPath)
h2o.deeplearning(x = 1:4, y = 5, data = iris.hex, activation = "Tanh")
h2o.shutdown(localH2O)
When you are building your model, add the following argument: variable_importances = T
This will ensure that when your model is built, it will return your variable importances.
In the Deep Learning demo for R, this requires that you modify the model building process. First, launch the demo by running the following code:
library(h2o)
conn <- h2o.init(nthreads = -1)
demo(h2o.deeplearning)
Then, adjust the code that initiates your model build by adding the argument mentioned earlier:
model = h2o.deeplearning(x = setdiff(colnames(prostate.hex), c("ID", "CAPSULE")),
                         y = "CAPSULE", training_frame = prostate.hex,
                         activation = "Tanh", hidden = c(10, 10, 10),
                         epochs = 10000, variable_importances = T)
Finally, you can do the following to get your variable importances:
> h2o.varimp(model)
Variable Importances:
variable relative_importance scaled_importance percentage
1 PSA 1.000000 1.000000 0.175660
2 VOL 0.937293 0.937293 0.164645
3 GLEASON 0.930565 0.930565 0.163463
4 AGE 0.799607 0.799607 0.140459
5 DCAPS 0.793741 0.793741 0.139429
6 DPROS 0.703781 0.703781 0.123626
7 RACE 0.527824 0.527824 0.092718
Hope this helps!
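If you want the percentages as plain numbers rather than the printed table, the output of h2o.varimp() can be coerced to a data frame. A small sketch, assuming the model built above:
# Coerce the H2O variable-importance table to a data.frame and pull the
# percentage column (the values sum to 1; multiply by 100 for percent)
vi <- as.data.frame(h2o.varimp(model))
vi$percentage * 100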
