Understanding R-syntax in the code for Bayesian Optimization - r

This is with reference to this answer on implementation of Bayesian Optimization. I am unable to understand the following R-code that defines a function xgb.cv.bayes(). The code is as follows:
xgb.cv.bayes <- function(max.depth, min_child_weight, subsample, colsample_bytree, gamma){
  cv <- xgb.cv(params = list(booster = 'gbtree', eta = 0.05,
                             max_depth = max.depth,
                             min_child_weight = min_child_weight,
                             subsample = subsample,
                             colsample_bytree = colsample_bytree,
                             gamma = gamma,
                             lambda = 1, alpha = 0,
                             objective = 'binary:logistic',
                             eval_metric = 'auc'),
               data = data.matrix(df.train[,-target.var]),
               label = as.matrix(df.train[, target.var]),
               nround = 500, folds = cv_folds, prediction = TRUE,
               showsd = TRUE, early.stop.round = 5, maximize = TRUE,
               verbose = 0
  )
  list(Score = cv$dt[, max(test.auc.mean)],
       Pred = cv$pred)
}
I am unable to understand the following part of the code, which comes after the closing parenthesis of xgb.cv():
list(Score = cv$dt[, max(test.auc.mean)],
Pred = cv$pred)
Or, more briefly, I do not understand the following syntax:
xgb.cv.bayes <- function(max.depth, min_child_weight, subsample, colsample_bytree, gamma){
  cv <- xgb.cv(...)
  list(...)
}
I would be grateful for an explanation of this R syntax and for pointers to where I can find more examples of it.

In R, the value of the last expression evaluated in a function is automatically the return value of that function. So the function you presented has exactly two steps:
1. compute the result of xgb.cv(...) and store it in the variable cv;
2. create a list with two entries (Score and Pred) whose values are extracted from cv.
Since the expression that creates the list is the last expression in the function, the list is automatically the return value. So if you execute test <- xgb.cv.bayes(...), you can then access test$Score and test$Pred.
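A minimal base-R sketch of the same pattern, runnable without xgboost. fake_cv() is a hypothetical stand-in that returns a list with dt and pred elements, shaped like the cv object above. Because it holds a plain data frame rather than a data.table, max(cv$dt$test.auc.mean) replaces the data.table expression cv$dt[, max(test.auc.mean)], which picks the largest value of the test.auc.mean column.

```r
# Hypothetical stand-in for xgb.cv(): returns a list shaped like the cv object
fake_cv <- function(x) {
  list(dt   = data.frame(test.auc.mean = c(0.71, 0.75, 0.74)),
       pred = x * 2)
}

toy_bayes <- function(x) {
  cv <- fake_cv(x)                        # step 1: store the result in cv
  list(Score = max(cv$dt$test.auc.mean),  # step 2: this is the last expression,
       Pred  = cv$pred)                   # so this list is the return value
}

# Equivalent version with an explicit return()
toy_bayes_explicit <- function(x) {
  cv <- fake_cv(x)
  return(list(Score = max(cv$dt$test.auc.mean), Pred = cv$pred))
}

test <- toy_bayes(1:3)
test$Score                                # 0.75
test$Pred                                 # 2 4 6
identical(test, toy_bayes_explicit(1:3))  # TRUE
```

If the assignment cv <- ... were the last line instead, the function would return the value of cv invisibly, which is rarely what you want.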
Does this answer your question?

Related

MXNET softmax output: label shape confusion

I don't have a clear idea of how labels for the softmax classifier should be shaped.
From my experiments, I understand that one option is a scalar label giving the index of the class in the probability output, while another is a 2D label whose rows are class probabilities or a one-hot encoded vector, like c(1, 0, 0).
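The two label shapes described above can be compared in a small base-R sketch that does not use MXNet. It assumes 0-based class indices, as the question's scalar label 0 suggests; softmax(), ce_index() and ce_onehot() are hypothetical helpers written for illustration. For a valid index, the scalar form and the one-hot form give the same cross-entropy.

```r
# Numerically stable softmax
softmax <- function(z) {
  e <- exp(z - max(z))
  e / sum(e)
}

# Cross-entropy for a scalar, 0-based class index label
ce_index <- function(label, p) {
  # valid labels are 0, 1, ..., length(p) - 1
  if (label < 0 || label >= length(p) || label != round(label))
    stop("label must be an integer in [0, ", length(p), ")")
  -log(p[label + 1])
}

# Cross-entropy for a one-hot (or probability-vector) label
ce_onehot <- function(y, p) -sum(y * log(p))

p <- softmax(c(0.2, 1.0, -0.5))
ce_index(1, p)            # class index 1 ...
ce_onehot(c(0, 1, 0), p)  # ... gives the same loss as one-hot c(0, 1, 0)
```

With this explicit check, labels such as 4 or -1 are rejected; whether and how MXNet itself validates them is not shown here.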
What puzzles me, though, is the following:
1. I can use scalar label values that go beyond the valid indices, like 4 in my example below, without any warning or error. Why is that?
2. When my label is a negative scalar or an array with a negative value, the model converges to a uniform probability distribution over the classes. For example, is it expected that actor_train.y = matrix(c(0, -1, 0), ncol = 1) results in equal probabilities in the softmax output?
I'm trying to use the MXNet softmax classifier to produce the policy gradient for reinforcement learning, and my negative rewards lead to the issue above: uniform probabilities. Is that expected?
require(mxnet)
actor_initializer <- mx.init.Xavier(rnd_type = "gaussian",
                                    factor_type = "avg",
                                    magnitude = 0.0001)
actor_nn_data <- mx.symbol.Variable('data')
actor_nn_label <- mx.symbol.Variable('label')
device.cpu <- mx.cpu()
# NN architecture
actor_fc3 <- mx.symbol.FullyConnected(
  data = actor_nn_data
  , num_hidden = 3 )
actor_output <- mx.symbol.SoftmaxOutput(
  data = actor_fc3
  , label = actor_nn_label
  , name = 'actor' )
# custom cross-entropy metric
crossentfunc <- function(label, pred)
{
  - sum(label * log(pred))
}
actor_loss <- mx.metric.custom(
  feval = crossentfunc
  , name = "log-loss"
)
# initialize NN
actor_train.x <- matrix(rnorm(11), nrow = 1)
actor_train.y = 0 #1 #2 #3 #-3 # matrix(c(0, 0, -1), ncol = 1)
if (exists("actor_model")) rm(actor_model)
actor_model <- mx.model.FeedForward.create(
  symbol = actor_output,
  X = actor_train.x,
  y = actor_train.y,
  ctx = device.cpu,
  num.round = 100,
  array.batch.size = 1,
  optimizer = 'adam',
  eval.metric = actor_loss,
  clip_gradient = 1,
  wd = 0.01,
  initializer = actor_initializer,
  array.layout = "rowmajor" )
predict(actor_model, actor_train.x, array.layout = "rowmajor")
It is quite strange to me, but I found a solution.
I changed the optimizer from optimizer = 'adam' to optimizer = 'rmsprop', and the NN started to converge as expected in the case of negative targets. I ran simulations in R using a simple NN and the optim function and got the same result.
It looks like adam or SGD may be buggy, or otherwise misbehave, in the case of multinomial classification. I also used to get stuck because those optimizers did not converge to a perfect solution on just one example, while rmsprop did. Be aware!

Using XGBoost in R for regression based model

I'm trying to use XGBoost as a replacement for gbm.
The scores I'm getting are rather odd, so I'm thinking maybe I'm doing something wrong in my code.
My data contains several factor variables; all the others are numeric.
The response variable is continuous and indicates a house price.
I understand that in order to use XGBoost, I need to one-hot encode those factors. I'm doing so with the following code:
Xtest <- test.data
Xtrain <- train.data
XSalePrice <- Xtrain$SalePrice
Xtrain$SalePrice <- NULL
# Combine data
Xall <- data.frame(rbind(Xtrain, Xtest))
# Get categorical features names
ohe_vars <- names(Xall)[which(sapply(Xall, is.factor))]
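# Convert them (dummyVars() comes from caret)
library(caret)
dummies <- dummyVars(~., data = Xall)
Xall_ohe <- as.data.frame(predict(dummies, newdata = Xall))
# predict() on dummyVars(~ .) already returns the numeric columns as well as
# the dummy columns, so use it directly instead of cbind()-ing the numeric
# columns a second time
Xall <- Xall_ohe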
After that, I'm splitting the data back to the test & train set:
Xtrain <- Xall[1:nrow(train.data), ]
Xtest <- Xall[-(1:nrow(train.data)), ]
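For comparison, full one-hot encoding can also be done in base R with model.matrix(), without an extra package. This is a sketch on a hypothetical toy data frame: contrasts = FALSE keeps one indicator column per level for every factor (like dummyVars), the "- 1" drops the intercept, and numeric columns appear only once. As in the question, encoding train and test together keeps the dummy columns identical in both.

```r
# Hypothetical toy data: one numeric and two factor columns
df <- data.frame(Area  = c(50, 70, 60),
                 Zone  = factor(c("A", "B", "A")),
                 Style = factor(c("x", "x", "y")))

# contrasts = FALSE gives one indicator column per level for every factor;
# "- 1" drops the intercept column
X <- model.matrix(~ . - 1, data = df,
                  contrasts.arg = lapply(Filter(is.factor, df),
                                         contrasts, contrasts = FALSE))
colnames(X)  # "Area" "ZoneA" "ZoneB" "Stylex" "Styley"
X
```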
And then building a model, and printing the RMSE & Rsquared:
# Model
xgb.fit <- xgboost(data = data.matrix(Xtrain), label = XSalePrice,
booster = "gbtree", objective = "reg:linear",
colsample_bytree = 0.2, gamma = 0.0,
learning_rate = 0.05, max_depth = 6,
min_child_weight = 1.5, n_estimators = 7300,
reg_alpha = 0.9, reg_lambda = 0.5,
subsample = 0.2, seed = 42,
silent = 1, nrounds = 25)
xgb.pred <- predict(xgb.fit, data.matrix(Xtrain))
postResample(xgb.pred, XSalePrice)
The problem is that I'm getting a very poor RMSE and R-squared:
RMSE Rsquared
1.877639e+05 5.308910e-01
These are VERY far from the results I get when using GBM.
I think I'm doing something wrong. My best guess is that the problem is in the one-hot encoding step, which I'm unfamiliar with, so I used code I found online and adapted it to my data.
Can someone point out what I am doing wrong and how to fix it?
UPDATE:
After reviewing @Codutie's answer, I changed my code as follows, but it produces an error:
Xtrain <- sparse.model.matrix(SalePrice ~. , data = train.data)
XDtrain <- xgb.DMatrix(data = Xtrain, label = "SalePrice")
xgb.DMatrix produces:
Error in setinfo.xgb.DMatrix(dmat, names(p), p[[1]]) :
The length of labels must equal to the number of rows in the input data
train.data is a data frame with 1453 rows. The label SalePrice also contains 1453 values (no missing values).
Thanks
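Two things can make the label length differ from the number of rows of the feature matrix: label = "SalePrice" passes a single string rather than the column's values, and model.matrix() drops rows that have NA in any predictor by default (as far as I know, Matrix::sparse.model.matrix() builds its model frame the same way, so the same check applies there). A base-R sketch on hypothetical toy data; in the real code the label would be train.data$SalePrice, taken from the same rows as Xtrain. XGBoost can treat the remaining NA values as missing.

```r
# Hypothetical toy data with one missing predictor value
df <- data.frame(SalePrice = c(100, 200, 150, 300),
                 Area      = c(50, NA, 60, 80),
                 Zone      = factor(c("A", "B", "A", "C")))

# A label must be one number per row, not a column name
length("SalePrice")  # 1, which cannot match 4 rows
y <- df$SalePrice    # 4 values

# By default (na.action = na.omit) rows with NA in any predictor are dropped...
X_default <- model.matrix(SalePrice ~ ., data = df)
nrow(X_default)      # 3, so it no longer lines up with y

# ...so build the model frame with na.pass to keep every row
mf <- model.frame(SalePrice ~ ., data = df, na.action = na.pass)
X_keep <- model.matrix(SalePrice ~ ., data = mf)
nrow(X_keep)         # 4, aligned with y; the NA stays in the Area column
```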
train <- dat[train_ind, ]
train.y <- train[, ncol(train)]   # response assumed to be the last column
xgboost(data = data.matrix(train[, -ncol(train)]),
        label = train.y,
        objective = "reg:linear",
        eval_metric = "rmse",
        max.depth = 15,
        eta = 0.1,
        nrounds = 15,
        subsample = 0.5,
        colsample_bytree = 0.5,
        nthread = 3
)
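Two tips for controlling XGBoost in regression:
1) eta: a smaller eta makes each boosting step more conservative and therefore less prone to overfitting, but the model then needs more rounds (nrounds) to fit the data; a large eta tends to overfit.
2) eval_metric: XGBoost does let you supply your own evaluation metric (the feval argument of xgb.train() in the R interface). Note that RMSE is not a useful metric when the quantitative dependent variable contains outliers; check whether your XGBoost version supports a Huber-type loss (newer releases provide reg:pseudohubererror).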
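A base-R toy, not XGBoost itself, that shows why eta and the number of rounds have to be chosen together. It assumes a deliberately simplified booster whose every "tree" is a single leaf predicting the mean residual exactly, starting from base_score = 0.5 (the default in older XGBoost releases). Under that assumption, the prediction closes only a fraction 1 - (1 - eta)^nrounds of the gap to the mean, which is about 0.72 for the question's learning_rate = 0.05 and nrounds = 25; that may be part of why the RMSE looks so large. Also note that in the R interface the number of trees is set by nrounds; n_estimators belongs to the Python scikit-learn wrapper.

```r
# Simplified booster: each round adds eta * (mean residual), i.e. a "tree"
# that is a single leaf fitting the residual mean exactly
boost_constant <- function(y, eta, nrounds, base_score = 0.5) {
  pred <- base_score
  for (i in seq_len(nrounds)) {
    pred <- pred + eta * mean(y - pred)
  }
  pred
}

y <- c(150000, 180000, 210000)   # hypothetical house prices, mean 180000

boost_constant(y, eta = 0.05, nrounds = 25)    # still far below 180000
boost_constant(y, eta = 0.05, nrounds = 300)   # essentially 180000
boost_constant(y, eta = 0.5,  nrounds = 25)    # a large eta gets there quickly

# Closed form for this toy: fraction of the gap closed after 25 rounds
1 - (1 - 0.05)^25                              # about 0.72
```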

Parameter optimization in R and H2O

I need to perform parameter optimization on a gbm model in R with H2O. I am relatively new to H2O, and I think I need to convert ntrees and learn_rate (below) into an H2O vector before running the code below.
How do I perform this operation?
Thanks!
ntrees <- c(100,200,300,400)
learn_rate <- c(1,0.5,0.1)
for (i in ntrees){
  for j in learn_rate{
    n = ntrees[i]
    l= learn_rate[j]
    gbm_model <- h2o.gbm(features, label, training_frame = train, validation_frame = valid, ntrees=ntrees[[i]],max_depth = 5,learn_rate=learn_rate[j])
    print(c(ntrees[i],learn_rate[j],h2o.mse(h2o.performance(gbm_model, valid = TRUE))))
  }
}
You can use h2o.grid() to do your grid search:
# specify your hyper parameters
hyper_params = list( ntrees = c(100,200,300,400), learn_rate = c(1,0.5,0.1) )
# then build your grid
grid <- h2o.grid(
## hyper parameters
hyper_params = hyper_params,
## which algorithm to run
algorithm = "gbm",
## identifier for the grid, to later retrieve it
grid_id = "my_grid",
## standard model parameters
x = features,
y = label,
training_frame = train,
validation_frame = valid,
## set a seed for reproducibility
seed = 1234)
You can read more about how h2o.grid() works in the R documentation: http://docs.h2o.ai/h2o/latest-stable/h2o-r/h2o_package.pdf
Lauren's answer, to use grids, is the best one here. I'll just quickly point out that what you have written is a usable approach, and one you can fall back on when grids don't do something you need.
Your example didn't include any data (see https://stackoverflow.com/help/mcve), so I couldn't run it, but I corrected the couple of syntax issues I noticed: R's for-in loop gives you the values directly, not the indices, and the second for loop was missing its parentheses:
ntrees <- c(100,200,300,400)
learn_rate <- c(1,0.5,0.1)
for (n in ntrees){
  for (l in learn_rate){
    gbm_model <- h2o.gbm(
      features, label, training_frame = train, validation_frame = valid,
      ntrees = n, max_depth = 5, learn_rate = l
    )
    print(c(n, l, h2o.mse(h2o.performance(gbm_model, valid = TRUE))))
  }
}
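A small base-R check of the first syntax point, runnable without H2O: for-in hands you the elements themselves, so the original ntrees[i] asked for position 100 on the first pass. Looping over seq_along() is the idiomatic choice when you really need positions.

```r
ntrees <- c(100, 200, 300, 400)

# for-in yields the elements themselves
vals <- c()
for (n in ntrees) vals <- c(vals, n)

# loop over seq_along() when you actually need positions
idx <- c()
for (i in seq_along(ntrees)) idx <- c(idx, ntrees[i])

vals           # 100 200 300 400
idx            # 100 200 300 400 (same values, reached by position)
ntrees[100]    # NA: what ntrees[i] gave on the first pass with i = 100
```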
An example of when you'd use nested loops like this is when you want to skip certain combinations. E.g., you might decide to test a learn rate of 0.1 only with ntrees of 100, which would then look like this:
ntrees <- c(100,200,300,400)
learn_rate <- c(1,0.5,0.1)
for (n in ntrees){
  for (l in learn_rate){
    if (l == 0.1 && n > 100) next  # Skip when n is 200, 300, 400
    gbm_model <- h2o.gbm(
      features, label, training_frame = train, validation_frame = valid,
      ntrees = n, max_depth = 5, learn_rate = l
    )
    print(c(n, l, h2o.mse(h2o.performance(gbm_model, valid = TRUE))))
  }
}
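If you prefer a flat loop, the same combinations, including the skip rule, can be built up front with base R's expand.grid(). This is a sketch: fit_and_score() is a hypothetical placeholder for the h2o.gbm() fit and the h2o.mse() lookup, so the code runs without an H2O cluster.

```r
ntrees <- c(100, 200, 300, 400)
learn_rate <- c(1, 0.5, 0.1)

# One row per combination: 4 * 3 = 12 rows
combos <- expand.grid(ntrees = ntrees, learn_rate = learn_rate)

# Same rule as above: learn_rate 0.1 is only tried with ntrees = 100
combos <- combos[!(combos$learn_rate == 0.1 & combos$ntrees > 100), ]

# Hypothetical placeholder for fitting a model and returning its validation MSE
fit_and_score <- function(n, l) n * l

combos$score <- mapply(fit_and_score, combos$ntrees, combos$learn_rate)
nrow(combos)   # 9
combos
```

The nested loop stays handy when the skip logic depends on earlier results; the grid version just makes the tested combinations easy to inspect before anything is trained.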

How to use xgboost R tree dump to compute or do predictions?

Taking my cue from the xgboost xgb.dump tree coefficient question:
Specifically, I want to know how the probability calculation would differ from the answer provided there if eta = 0.1 or 0.01.
I want to do predictions using the tree dump.
My code is
#Define train label and feature frames/matrix
y <- train_data$esc_ind
train_data = as.matrix(train_data)
trainX <- as.matrix(train_data[,-1])
param <- list("objective" = "binary:logistic",
"eval_metric" = "logloss",
"eta" = 0.5,
"max_depth" = 2,
"colsample_bytree" = .8,
"subsample" = 0.8, #0.75
"alpha" = 1
)
#Train XGBoost
bst = xgboost(param=param, data = trainX, label = y, nrounds=2)
trainX1 = data.frame(trainX)
mpg.fmap = genFMap(trainX1, "xgboost.fmap")
xgb.save(bst, "xgboost.model")
xgb.dump(bst, "xgboost.model_6.txt",with.stats = TRUE, fmap = "xgboost.fmap")
The tree looks like:
booster[0]
0:[order.1<12.2496] yes=1,no=2,missing=2,gain=1359.61,cover=7215.25
1:[access.1<0.196687] yes=3,no=4,missing=4,gain=3.19685,cover=103.25
3:leaf=-0,cover=1
4:leaf=0.898305,cover=102.25
2:[team<6.46722] yes=5,no=6,missing=6,gain=753.317,cover=7112
5:leaf=0.893333,cover=55.25
6:leaf=-0.943396,cover=7056.75
booster[1]
0:[issu.1<6.4512] yes=1,no=2,missing=2,gain=794.308,cover=5836.81
1:[team<3.23361] yes=3,no=4,missing=4,gain=18.6294,cover=67.9586
3:leaf=0.609363,cover=21.4575
4:leaf=1.28181,cover=46.5012
2:[case<6.74709] yes=5,no=6,missing=6,gain=508.34,cover=5768.85
5:leaf=1.15253,cover=39.2126
6:leaf=-0.629773,cover=5729.64
Will the coefficient for all tree leaf scores for xgboost be 1 when eta is chosen less than 1?
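Actually, this was a practical matter that I had overlooked earlier: the leaf values in the dump already include the eta shrinkage, so every leaf score enters the sum with coefficient 1, whatever eta was used.
Using the above tree structure, one can find the probability for each training example.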
The parameter list was:
param <- list("objective" = "binary:logistic",
"eval_metric" = "logloss",
"eta" = 0.5,
"max_depth" = 2,
"colsample_bytree" = .8,
"subsample" = 0.8,
"alpha" = 1)
For an instance that falls in leaf 3 of booster[0] (path 0-1-3), the probability after the first round will be exp(-0)/(1+exp(-0)).
If the same instance also falls in leaf 3 of booster[1], the probability after both rounds will be exp(0 + 0.609363)/(1 + exp(0 + 0.609363)).
And so on as the number of iterations increases.
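A minimal base-R sketch of that calculation, with the two trees above transcribed by hand. It assumes the default base_score of 0.5 (its logit, the starting margin, is 0), which the parameter list does not override. The leaf values are added as they appear in the dump. goes_yes() and the instance x are hypothetical helpers for illustration; every missing= entry in this dump points to the same child as no=, so a missing value is routed to the no branch.

```r
# TRUE -> "yes" child; NA or failing the split -> "no" child (as in this dump)
goes_yes <- function(v, threshold) !is.na(v) && v < threshold

tree0 <- function(x) {                       # booster[0]
  if (goes_yes(x[["order.1"]], 12.2496)) {
    if (goes_yes(x[["access.1"]], 0.196687)) -0 else 0.898305
  } else {
    if (goes_yes(x[["team"]], 6.46722)) 0.893333 else -0.943396
  }
}

tree1 <- function(x) {                       # booster[1]
  if (goes_yes(x[["issu.1"]], 6.4512)) {
    if (goes_yes(x[["team"]], 3.23361)) 0.609363 else 1.28181
  } else {
    if (goes_yes(x[["case"]], 6.74709)) 1.15253 else -0.629773
  }
}

predict_prob <- function(x, base_score = 0.5) {
  margin <- log(base_score / (1 - base_score)) + tree0(x) + tree1(x)
  exp(margin) / (1 + exp(margin))            # logistic transform of the margin
}

# Hypothetical instance that ends in leaf 3 of both trees
x <- c(order.1 = 10, access.1 = 0.1, team = 2, issu.1 = 5, case = 1)
predict_prob(x)   # exp(0.609363) / (1 + exp(0.609363)), about 0.648
```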
I compared these values with R's predicted probabilities; they differ by about 10^(-7), probably because the leaf scores are rounded in the text dump.
This approach can serve as a production-level solution when boosted trees trained in R are used for prediction in a different environment.
Any comment on this will be highly appreciated.

Understanding num_classes for xgboost in R

I'm having a lot of trouble figuring out how to correctly set the num_classes for xgboost.
I've got an example using the iris data:
df <- iris
y <- df$Species
num.class = length(levels(y))
levels(y) = 1:num.class
head(y)
df <- df[,1:4]
y <- as.matrix(y)
df <- as.matrix(df)
param <- list("objective" = "multi:softprob",
"num_class" = 3,
"eval_metric" = "mlogloss",
"nthread" = 8,
"max_depth" = 16,
"eta" = 0.3,
"gamma" = 0,
"subsample" = 1,
"colsample_bytree" = 1,
"min_child_weight" = 12)
model <- xgboost(param=param, data=df, label=y, nrounds=20)
This returns an error
Error in xgb.iter.update(bst$handle, dtrain, i - 1, obj) :
SoftmaxMultiClassObj: label must be in [0, num_class), num_class=3 but found 3 in label
If I change num_class to 2, I get the same error. If I increase num_class to 4, the model runs, but I get 600 predicted probabilities back, which makes sense for 4 classes.
I'm not sure if I'm making an error or whether I'm failing to understand how xgboost works. Any help would be appreciated.
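The error message says it: label must be in [0, num_class).
In your script, add y <- as.integer(y) - 1 before model <- ... so that the labels become 0, 1, 2. (After as.matrix(), y holds the level labels as character strings, so a plain y <- y - 1 would fail.)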
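A base-R sketch of the label preparation alone (no xgboost call), starting from iris$Species directly rather than from the relabelled factor in the question. as.integer() returns the factor codes 1, 2, 3, and subtracting 1 maps them to 0, 1, 2, which is exactly the range the error message asks for.

```r
y <- as.integer(iris$Species) - 1L   # setosa = 0, versicolor = 1, virginica = 2
num_class <- nlevels(iris$Species)   # 3

range(y)                             # 0 2
all(y >= 0 & y < num_class)          # TRUE: every label is in [0, num_class)
table(y)                             # 50 rows per class
```

With num_class = 3, multi:softprob then returns 3 probabilities per row (450 for the 150 iris rows) rather than the 600 seen with num_class = 4.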
I ran into this rather weird problem as well. In my case, it seemed to be the result of not encoding the labels properly.
First, using a string vector with N classes as the labels, I could only get the algorithm to run by setting num_class = N + 1. However, this result was useless, because I only had N actual classes and N+1 buckets of predicted probabilities.
I re-encoded the labels as integers and then num_class worked fine when set to N.
# Convert classes to integers for xgboost
library(data.table)
class_map <- data.table(interest_level=c("low", "medium", "high"), class=c(0,1,2))
t1 <- merge(t1, class_map, by="interest_level", all.x=TRUE, sort=FALSE)
and
param <- list(booster="gbtree",
objective="multi:softprob",
eval_metric="mlogloss",
#nthread=13,
num_class=3,
eta_decay = .99,
eta = .005,
gamma = 1,
max_depth = 4,
min_child_weight = .9,#1,
subsample = .7,
colsample_bytree = .5
)
For example.
I was seeing the same error. My issue was that I was using an eval_metric meant only for multiclass labels while my data had binary labels. See eval_metric in the Learning Task Parameters section of the XGBoost docs for a list of all the options.
I had this problem, and it turned out that I was trying to subtract 1 from my response variable, which was already coded as 0 and 1. Probably a novice mistake, but if anyone else runs into this with a binary response variable that is already 0/1, it is worth noting.
The tutorial said:
label = as.integer(iris$Species)-1
What worked for me (response is high_end):
label = as.integer(high_end)
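A short base-R check of why the two versions differ, using hypothetical data: as.integer() on a numeric 0/1 vector keeps 0 and 1, so subtracting 1 produces the invalid label -1, whereas as.integer() on a factor returns the level codes, which start at 1. Converting through as.character() recovers the original 0/1 labels from a factor.

```r
high_end_num <- c(0, 1, 1, 0)           # numeric 0/1 response (hypothetical data)
high_end_fac <- factor(c(0, 1, 1, 0))   # the same response stored as a factor

as.integer(high_end_num)                # 0 1 1 0: already valid, do not subtract 1
as.integer(high_end_num) - 1L           # -1 0 0 -1: invalid labels
as.integer(high_end_fac)                # 1 2 2 1: factor codes start at 1
as.integer(high_end_fac) - 1L           # 0 1 1 0: valid
as.integer(as.character(high_end_fac))  # 0 1 1 0: uses the level labels
```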
