Questions about xgboost with R

I used xgboost to do logistic regression. I followed the steps from, but I ran into two problems. The datasets are found here.
First, when I run the following code:
bst <- xgboost(data = sparse_matrix, label = output_vector,nrounds = 39,param)
Then, I got
[0]train-rmse:0.350006
[1]train-rmse:0.245008
[2]train-rmse:0.171518
[3]train-rmse:0.120065
[4]train-rmse:0.084049
[5]train-rmse:0.058835
[6]train-rmse:0.041185
[7]train-rmse:0.028830
[8]train-rmse:0.020182
[9]train-rmse:0.014128
[10]train-rmse:0.009890
[11]train-rmse:0.006923
[12]train-rmse:0.004846
[13]train-rmse:0.003392
[14]train-rmse:0.002375
[15]train-rmse:0.001662
[16]train-rmse:0.001164
[17]train-rmse:0.000815
[18]train-rmse:0.000570
[19]train-rmse:0.000399
[20]train-rmse:0.000279
[21]train-rmse:0.000196
[22]train-rmse:0.000137
[23]train-rmse:0.000096
[24]train-rmse:0.000067
[25]train-rmse:0.000047
[26]train-rmse:0.000033
[27]train-rmse:0.000023
[28]train-rmse:0.000016
[29]train-rmse:0.000011
[30]train-rmse:0.000008
[31]train-rmse:0.000006
[32]train-rmse:0.000004
[33]train-rmse:0.000003
[34]train-rmse:0.000002
[35]train-rmse:0.000001
[36]train-rmse:0.000001
[37]train-rmse:0.000001
[38]train-rmse:0.000000
train-rmse eventually reaches 0! Is that normal? As far as I know, train-rmse usually can't reach 0. So where is my problem?
Second, when I run
importance <- xgb.importance(sparse_matrix@Dimnames[[2]], model = bst)
Then I got an error:
Error in eval(expr, envir, enclos) : object 'Yes' not found.
I don't know what it means; maybe the first problem leads to the second one. Here is my full code:
library(data.table)
train_x<-fread("train_x.csv")
str(train_x)
train_y<-fread("train_y.csv")
str(train_y)
train<-merge(train_y,train_x,by="uid")
train$uid<-NULL
test<-fread("test_x.csv")
require(xgboost)
require(Matrix)
sparse_matrix <- sparse.model.matrix(y~.-1, data = train)
head(sparse_matrix)
output_vector = train[,y] == "Marked"
param <- list(objective = "binary:logistic", booster = "gblinear",
nthread = 2, alpha = 0.0001,max.depth = 4,eta=1,lambda = 1)
bst <- xgboost(data = sparse_matrix, label = output_vector,nrounds = 39,param)
importance <- xgb.importance(sparse_matrix@Dimnames[[2]], model = bst)

I ran into the same problem (Error in eval(expr, envir, enclos) : object 'Yes' not found.), and the reason was the following.
I tried this:
dt = data.table(x = runif(10), y = 1:10, z = 1:10)
label = as.logical(dt$z)
train = dt[, z := NULL]
trainAsMatrix = as.matrix(train)
label = as.matrix(label)
bst <- xgboost(data = trainAsMatrix, label = label, max.depth = 8,
eta = 0.3, nthread = 2, nround = 50, objective = "reg:linear")
bst$featureNames = names(train)
xgb.importance(model = bst)
The problem comes from the line
label = as.logical(dt$z)
I had this line in there because the last time I used xgboost, I wanted to predict a categorical variable. Now, since I want to do regression, it should read:
label = dt$z
Maybe something similar causes the problem in your case?
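A quick base-R check of the cause described above: as.logical() maps every non-zero number to TRUE, so turning z = 1:10 into a logical label collapses it to a single constant value, leaving nothing to learn. This sketch only reproduces the label conversion; it does not call xgboost.

```r
# Same toy column as in the answer above
z <- 1:10

# as.logical() maps every non-zero number to TRUE, so the label becomes constant
label_bad <- as.logical(z)
print(unique(label_bad))            # TRUE

# Keeping the numeric values preserves the variation needed for regression
label_good <- z
print(length(unique(label_good)))   # 10
```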

Perhaps this is of some help. I often get the same error when the labels have zero variation. I'm using the current CRAN version of xgboost, which is already somewhat old (0.4.4). xgb.train happily accepts such labels (showing an AUC of .50), but the error then appears when calling xgb.importance.
Cheers
Otto
[0] train-auc:0.500000 validate-auc:0.500000
[1] train-auc:0.500000 validate-auc:0.500000
[2] train-auc:0.500000 validate-auc:0.500000
[3] train-auc:0.500000 validate-auc:0.500000
[4] train-auc:0.500000 validate-auc:0.500000
[1] "XGB error: Error in eval(expr, envir, enclos): object 'Yes' not found\n"
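Since a constant label is the common thread in these answers, it may be worth checking the label before training. The sketch below is plain base R; check_label_variation() is a hypothetical helper, and the sample values of y are made up to show how a comparison like train[,y] == "Marked" can silently return all FALSE if the level is spelled differently in the data.

```r
# Hypothetical helper: stop early if a label vector has fewer than two distinct values
check_label_variation <- function(label) {
  n_distinct <- length(unique(label[!is.na(label)]))
  if (n_distinct < 2) {
    stop("label has only ", n_distinct, " distinct value(s); the model has nothing to learn")
  }
  invisible(TRUE)
}

# Made-up labels: the data say "marked", the code compares against "Marked"
y <- c("marked", "unmarked", "marked", "unmarked")
output_vector <- y == "Marked"
print(table(output_vector))   # every entry is FALSE

ok <- tryCatch(check_label_variation(output_vector),
               error = function(e) conditionMessage(e))
print(ok)
```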

Related

Error in parse(text = x, keep.source = FALSE) : <text>:1:15: unexpected symbol 1: ID ~ 0+Offset Length

I would like to ask you for help. I am trying to perform MICE imputation of missing values in my dataset. Here is part of the code:
imputed_Data <- mice(data, m=5, maxit = 10, method = "PMM", seed = 200)
Unfortunately, this code returns the following error:
Error in parse(text = x, keep.source = FALSE) :
<text>:1:15: unexpected symbol
1: ID ~ 0+Offset Length
^
Does anybody know where the mistake is? "ID" and "Offset Length" are variables in my dataset.
Thank you,
Blanka
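The caret in the error message points at column 15 of the formula text ID ~ 0+Offset Length, which is where Length begins: the space inside the column name "Offset Length" splits one name into two symbols. The base-R sketch below reproduces the parse error and shows one possible workaround, assuming that renaming the columns with make.names() before calling mice() is acceptable for the analysis. The toy data frame is hypothetical.

```r
# Formula text copied from the error message
bad <- "ID ~ 0+Offset Length"
err <- tryCatch(parse(text = bad, keep.source = FALSE),
                error = function(e) conditionMessage(e))
print(err)   # starts with "<text>:1:15: unexpected symbol"

# Hypothetical toy data with a space in one column name
df <- data.frame(ID = c(1, 2, NA),
                 `Offset Length` = c(0.5, NA, 1.5),
                 check.names = FALSE)

# make.names() turns the space into a dot, giving syntactically valid names
names(df) <- make.names(names(df))
print(names(df))   # "ID" "Offset.Length"

# A formula built from the new names now parses
good <- paste0("ID ~ 0+", names(df)[2])
f <- parse(text = good, keep.source = FALSE)[[1]]
print(f)
```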

Trouble with running h2o.hit_ratio_table

I ran into this issue when running H2O for xgboost. How can I solve it? Thank you.
I ran this code:
h2o.hit_ratio_table(gbm2,valid =T)
and encountered this error:
Error in names(v) <- v_names :
'names' attribute [1] must be the same length as the vector [0]
Then I ran
mean(finalRF_prediction$predict==test_gb$Cover_Type)
and I got the error:
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page, :
ERROR MESSAGE:
Name lookup of 'NULL' failed
My model is:
gbm2=h2o.gbm(training_frame = train_gb,validation_frame = valid_gb,x=1:51,y=52,
model_id="gbm2_covType_v2",
ntrees=200,
max_depth = 30,
sample_rate = .7,
col_sample_rate = .7,
learn_rate=.3,
stopping_rounds=2,
stopping_tolerance = .01,
score_each_iteration = T,seed=2000000)
finalRF_prediction=h2o.predict(object=gbm2,newdata = test_gb)
summary(gbm2)
h2o.hit_ratio_table(gbm2,valid=T)[1,2]
mean(finalRF_prediction$predict==test_gb$Cover_Type)
Without a dataset to rerun your code on, it's hard to say what caused the error. For your second error, check whether the column Cover_Type exists in your test_gb data frame.
The code you have seems fine, so I would just double-check your column names.
In addition, here is a code snippet with xgboost showing that h2o.hit_ratio_table() can be used successfully:
library(h2o)
h2o.init()
iris.hex <- h2o.importFile( "http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
i.sid <- h2o.runif(iris.hex)
iris.train <- h2o.assign(iris.hex[i.sid > .2, ], "iris.train")
iris.test <- h2o.assign(iris.hex[i.sid <= .2, ], "iris.test")
iris.xgboost.valid <- h2o.xgboost(x = 1:4, y = 5, training_frame = iris.train, validation_frame = iris.test)
# Hit ratio
hrt.valid.T <- h2o.hit_ratio_table(iris.xgboost.valid,valid = TRUE)
print(hrt.valid.T)
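The column check suggested in the answer can be made explicit before computing accuracy. This is a base-R sketch on a plain data.frame stand-in; check_accuracy() is a hypothetical helper, and the values are made up. The assumption is that names() on an H2OFrame returns its column names in the same way, so at least the %in% test carries over to test_gb.

```r
# Hypothetical helper: fail with a clear message if the label column is missing
check_accuracy <- function(pred, df, col) {
  if (!col %in% names(df)) {
    stop(sprintf("column '%s' not found; available columns: %s",
                 col, paste(names(df), collapse = ", ")))
  }
  mean(pred == df[[col]])
}

# Made-up stand-in for test_gb and for the predicted classes
test_df <- data.frame(Elevation = c(1, 2, 3, 4),
                      Cover_Type = c(5, 5, 2, 2))
pred <- c(5, 5, 2, 1)

acc <- check_accuracy(pred, test_df, "Cover_Type")
print(acc)   # 0.75

# A misspelled column name now gives a readable error instead of a NULL lookup
msg <- tryCatch(check_accuracy(pred, test_df, "CoverType"),
                error = function(e) conditionMessage(e))
print(msg)
```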

Error (Setting objectives in 'params' and 'obj' at the same time is not allowed) in xgboost() function in R

Below is the code I am executing with xgboost:
data(Glass, package = "mlbench")
levels(Glass$Type) <- c(0:5) #Proper Sequence. Should start with 0
Glass$Type <- as.integer(as.character(Glass$Type))
set.seed(100)
options(scipen = 999)
library(caret)
R_index <- createDataPartition(Glass$Type, p=.7, list = FALSE)
gl_train <- Glass[R_index,]
gl_test <- Glass[-R_index,]
'%ni%' <- Negate('%in%')
library(xgboost)
library(Matrix)
#Creating the matrix for training the model
train_gl <- xgb.DMatrix(data.matrix(gl_train[ ,colnames(gl_train) %ni% 'Type']),
label = as.numeric(gl_train$Type))
test_gl <- xgb.DMatrix(data.matrix(gl_test[ ,colnames(gl_test) %ni% 'Type']))
watchlist <- list(train = gl_train, test = gl_test)
#Define the parameters and cross validate
param <- list("objective" = "multi:softmax",
"eval_metric" = "mlogloss",
"num_class" = length(unique(gl_train$Type)))
cv.nround <- 5
cv.nfold <- 3
cvMod <- xgb.cv(param = param, data = train_gl,
nfold = cv.nfold,
nrounds = cv.nround,
watchlist=watchlist)
#Build the Model
nrounds = 50
xgMod = xgboost(param = param, data = train_gl, nrounds = nrounds, watchlist = watchlist)
After executing the xgMod line, I get the following error:
Error in check.custom.obj() :
Setting objectives in 'params' and 'obj' at the same time is not allowed
Please let me know what's wrong with my code.
Any help is appreciated.
Regards,
Mohan
The problem is due to the watchlist parameter passed to xgboost.
watchlist is a parameter of xgb.train but not of xgboost, so xgboost treats it as one of the "other parameters" passed through (...).
The following code
xgMod <- xgboost(param = param, data = train_gl, nrounds = nrounds)
works correctly and prints:
[1] train-mlogloss:1.259886
[2] train-mlogloss:0.963367
[3] train-mlogloss:0.755535
[4] train-mlogloss:0.601647
[5] train-mlogloss:0.478923
...
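To see how an argument that xgboost() does not declare can end up among the booster parameters, here is a base-R toy that mimics the behaviour the answer describes. toy_xgboost() is hypothetical and is not the real xgboost() source; it only shows how R's ... collects unmatched named arguments, which a wrapper can then merge into its parameter list.

```r
# Hypothetical stand-in for a wrapper whose ... is merged into the parameter list
toy_xgboost <- function(data = NULL, params = list(), nrounds = 1, ...) {
  extra <- list(...)          # unmatched named arguments, e.g. watchlist
  params <- c(params, extra)  # merged into the booster parameters
  names(params)
}

p <- list(objective = "multi:softmax", eval_metric = "mlogloss", num_class = 6)

with_wl <- toy_xgboost(data = "train_gl", params = p, nrounds = 50,
                       watchlist = list(train = "gl_train"))
without_wl <- toy_xgboost(data = "train_gl", params = p, nrounds = 50)

print(with_wl)      # "objective" "eval_metric" "num_class" "watchlist"
print(without_wl)   # "objective" "eval_metric" "num_class"
```

If a watchlist is actually needed, it belongs in a call to xgb.train(), whose documentation (for the versions of that time) expects a named list of xgb.DMatrix objects such as train_gl, not the data frames gl_train and gl_test.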

Error tuning a custom algorithm with caret in R

I want to tune two parameters of my custom algorithm with caret. One parameter (lambda) is numeric and the other (prior) is character. This parameter can take two values: "known" or "unknown". Tuning the algorithm with just the lambda parameter works fine. But when I add the character parameter (prior), I get the following error:
1: In eval(expr, envir, enclos) : model fit failed for Resample01:
lambda=1, prior=unknown Error in mdp(Class = y, data = x, lambda =
param$lambda, prior = param$prior, : object 'assignment' not found
The error must be related to the way I specify the character parameter (prior). Here is my code:
my_mod$parameters <- data.frame(
parameter = c("lambda","prior"),
class = c("numeric", "character"),
label = c("sample_length", "prior_type"))
## The grid Element
my_mod$grid <- function(x, y, len = NULL){expand.grid(lambda=1:2,prior=c("unknown", "known"))}
mygrid<-expand.grid(lambda=1:2,prior=c('unknown','known'))
## The fit Element
my_mod$fit <- function(x, y, wts, param, lev, last, classProbs, ...){
mdp(Class=y,data=x,lambda=param$lambda,prior=param$prior,info.pred ="yes")
}
## The predict Element
mdpPred <- function(modelFit, newdata, preProc = NULL, submodels = NULL)
predict.mdp(modelFit, newdata)
my_mod$predict <- mdpPred
fitControl <- trainControl(method = "cv",number = 5,repeats = 5)
train(x=data, y = factor(Class),method = my_mod,trControl = fitControl, tuneGrid = mygrid)
That is because you must specify as.character(param$prior) in the fit function.
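A base-R check of why the conversion is needed: expand.grid() defaults to stringsAsFactors = TRUE, so prior arrives in the fit function as a factor rather than a character string. The grid below mirrors mygrid from the question, and the single-row param is only an approximation of what caret passes to fit; building the grid with stringsAsFactors = FALSE is shown as an alternative.

```r
# Same grid as mygrid in the question
mygrid <- expand.grid(lambda = 1:2, prior = c("unknown", "known"))
print(class(mygrid$prior))          # "factor"

# One row of the grid, roughly as it would reach the fit function
param <- mygrid[1, ]
print(as.character(param$prior))    # "unknown"

# Alternative: keep the strings as characters when building the grid
mygrid_chr <- expand.grid(lambda = 1:2, prior = c("unknown", "known"),
                          stringsAsFactors = FALSE)
print(class(mygrid_chr$prior))      # "character"
```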

Error in xgboost() in R

I am trying to use xgboost(), but I am getting the following error:
Error in xgb.DMatrix(data, label = label) : can not open file "0"
If I call traceback(), I get:
traceback()
4: .Call("XGDMatrixCreateFromFile_R", data, as.integer(FALSE), PACKAGE = "xgboost")
3: xgb.DMatrix(data, label = label)
2: xgb.get.DMatrix(data, label)
1: xgboost(data = as.matrix(trainSet[, 1:13]), label = trainSet[,
"count"], max.depth = depth, nround = rounds, objective = "reg:linear",
verbose = 0) at #5
Is there any reason why I am getting the above error? I would appreciate any kind of help.
Thanks in advance!
Check whether your data has character or factor variables, and convert them to numeric.
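A base-R illustration of why this matters: as.matrix() on a data frame with any factor or character column returns a character matrix, and the traceback shows xgb.DMatrix() then going down the read-from-file path (XGDMatrixCreateFromFile_R). The columns below are hypothetical stand-ins for trainSet.

```r
# Hypothetical stand-in for trainSet with one factor column
trainSet <- data.frame(temp = c(9.8, 10.2, 11.0),
                       season = factor(c("spring", "summer", "spring")),
                       count = c(16, 40, 32))

# as.matrix() coerces everything to character when any column is non-numeric
m_bad <- as.matrix(trainSet[, c("temp", "season")])
print(typeof(m_bad))    # "character"

# data.matrix() replaces factors with their integer codes, giving a numeric matrix
m_good <- data.matrix(trainSet[, c("temp", "season")])
print(typeof(m_good))   # "double"
```

Integer codes impose an arbitrary order on the categories; one-hot encoding with sparse.model.matrix(), as in the first question above, is a common alternative.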
