Error in xgboost() in R

I am trying to use xgboost(), but I am getting the following error:
Error in xgb.DMatrix(data, label = label) : can not open file "0"
If I run traceback(), I get:
traceback()
4: .Call("XGDMatrixCreateFromFile_R", data, as.integer(FALSE), PACKAGE = "xgboost")
3: xgb.DMatrix(data, label = label)
2: xgb.get.DMatrix(data, label)
1: xgboost(data = as.matrix(trainSet[, 1:13]), label = trainSet[,
"count"], max.depth = depth, nround = rounds, objective = "reg:linear",
verbose = 0) at #5
Any reason why I am getting the above error? I would appreciate any kind of help.
Thanks in advance!

Check whether your data has character or factor variables and convert them to numeric before passing the matrix to xgboost().
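The error message itself is a clue: the traceback shows XGDMatrixCreateFromFile_R being called, which means xgb.DMatrix received character data and tried to treat it as a file path. Below is a minimal sketch of the conversion, reusing trainSet, depth, and rounds from the question; everything else is illustrative rather than your exact setup:
library(xgboost)
# data.matrix() turns factor/character columns into numeric codes;
# model.matrix() or Matrix::sparse.model.matrix() would one-hot encode them instead.
X <- data.matrix(trainSet[, 1:13])
y <- trainSet[, "count"]
bst <- xgboost(data = X, label = y, max.depth = depth, nround = rounds,
               objective = "reg:linear", verbose = 0)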

Related

Simple regression linReg error in R: argument 'modelTest' must be either TRUE or FALSE

Hello everyone, I am having an issue with my R software. When I try to run a simple regression, I get this error:
model1 <- linReg(data = dat, dep = 'Intent', blocks = list(c('Enjoy')), modelTest = 'f', stdEst = TRUE)
model1
Error: Argument 'modelTest' must be either TRUE or FALSE
I have tried changing the 'f' to a capital letter and typing in FALSE instead, but when I type in FALSE I get this error:
model1 <- linReg(data = dat, dep = 'Intent', blocks = list(c('Enjoy')), modelTest = FALSE, stdEst = TRUE)
Error in eval(predvars, data, env) : object 'XRW5qb3k' not found
Can anyone please help me with this? All of the descriptives, graphs, and plots run just fine. I have the packages jmv, psych, and car loaded.
Thank you for your help!
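The first error message says modelTest must be a logical, so modelTest = FALSE is the right form; the second error (an object not found inside eval()) can happen when the Enjoy column name in dat does not match exactly (a typo, stray space, or different capitalisation). A hedged sketch of the checks, using only the object and column names from the question:
library(jmv)
# Both should return TRUE; a mismatch here would explain the 'object not found' error.
c("Intent", "Enjoy") %in% names(dat)
model1 <- linReg(data = dat, dep = "Intent", blocks = list(c("Enjoy")),
                 modelTest = FALSE, stdEst = TRUE)
model1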

Clustering by M3C package : Error in `[.data.frame`(df, neworder2) : undefined columns selected

I had a similar problem to the one posted here. To resolve the issue, I followed the answer by @Jack Gisby there. Now a new error has shown up.
Working on TCGA data, I am getting the same error (first error):
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
Running duplicated() on each relevant field returned FALSE.
Here is the second error (it appeared just after trimming identifiers so they do not start with a common string like "TCGA-"):
Error in `[.data.frame`(df, neworder2) : undefined columns selected
> traceback()
5: stop("undefined columns selected")
4: `[.data.frame`(df, neworder2)
3: df[neworder2]
2: M3Creal(as.matrix(mydata), maxK = maxK, reps = repsreal, pItem = pItem,
pFeature = 1, clusterAlg = clusteralg, distance = distance,
title = "/home/christopher/Desktop/", des = des, lthick = lthick,
dotsize = dotsize, x1 = pacx1, x2 = pacx2, seed = seed, removeplots = removeplots,
silent = silent, fsize = fsize, method = method, objective = objective)
1: M3C(pro.vst, des = clin, removeplots = FALSE, iters = 25, objective = "PAC",
fsize = 8, lthick = 1, dotsize = 1.25)
I've added this to an open issue on the M3C GitHub repository.
I got the same error as Hamid Ghaedi while running M3C. I managed to track it down to the following line of code (line 476 of the M3C.R file):
df <- data.frame(m_matrix)
Many of my sample names (column names) started with a number, and the data.frame() function added an "X" to the beginning of each such name ("1" becomes "X1"). This caused a mismatch with the names listed in neworder2.
To get around this problem, I changed all of my sample names to start with a letter and M3C is now running correctly.
Edit: This workaround can be easily applied by using the data.frame() function on your input dataset before running M3C.
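One way to apply that workaround up front, sketched with the object names from the traceback above (pro.vst for the matrix, clin for the annotation; the ID column name in the annotation is an assumption):
# make.names() prefixes an "X" to names starting with a digit, which is what
# data.frame() does inside M3C, so the renaming is applied consistently on both sides.
colnames(pro.vst) <- make.names(colnames(pro.vst))
clin$ID <- make.names(clin$ID)  # assumed annotation column holding the sample IDs
res <- M3C::M3C(pro.vst, des = clin, removeplots = FALSE, iters = 25,
                objective = "PAC", fsize = 8, lthick = 1, dotsize = 1.25)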

Error in parse(text = x, keep.source = FALSE) : <text>:1:15: unexpected symbol 1: ID ~ 0+Offset Length

I would like to ask you for help. I am trying to perform MICE imputation of missing values in my dataset. Here is part of the code:
imputed_Data <- mice(data, m=5, maxit = 10, method = "PMM", seed = 200)
Unfortunately, this code returns the following error:
Error in parse(text = x, keep.source = FALSE) :
<text>:1:15: unexpected symbol
1: ID ~ 0+Offset Length
^
Does anybody know where the mistake is? "ID" and "Offset Length" are variables in my dataset.
Thank you,
Blanka
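If the parse failure comes from the space in the variable name (the caret points right after "Offset", where the formula ID ~ 0+Offset Length breaks), one hedged workaround is to make the column names syntactically valid before imputing; a minimal sketch using the data object from the question:
library(mice)
# make.names() turns "Offset Length" into "Offset.Length", so the formulas mice
# builds internally can be parsed; note that mice's built-in method names are
# lowercase ("pmm" rather than "PMM").
names(data) <- make.names(names(data))
imputed_Data <- mice(data, m = 5, maxit = 10, method = "pmm", seed = 200)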

Trouble with running h2o.hit_ratio_table

I got this issue when running H2O for xgboost. May I ask how I can solve it? Thank you.
I ran this code:
h2o.hit_ratio_table(gbm2,valid =T)
and I encountered this error:
" Error in names(v) <- v_names :
'names' attribute [1] must be the same length as the vector [0]"
Then I proceeded to run
mean(finalRF_prediction$predict==test_gb$Cover_Type)
and I got the error:
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page, :
ERROR MESSAGE:
Name lookup of 'NULL' failed
My model is:
gbm2=h2o.gbm(training_frame = train_gb,validation_frame = valid_gb,x=1:51,y=52,
model_id="gbm2_covType_v2",
ntrees=200,
max_depth = 30,
sample_rate = .7,
col_sample_rate = .7,
learn_rate=.3,
stopping_round=2,
stopping_tolerance = .01,
score_each_iteration = T,seed=2000000)
finalRF_prediction=h2o.predict(object=gbm2,newdata = test_gb)
summary(gbm2)
h2o.hit_ratio_table(gbm2,valid=T)[1,2]
mean(finalRF_prediction$predict==test_gb$Cover_Type)
Without having a dataset to rerun your code on, it's hard to say what caused the error. For your second error, check whether the column Cover_Type exists in your test_gb dataframe.
The code you have seems fine, so I would just double-check your column names.
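As a quick sanity check (a sketch built only from the names in your code), you can confirm the response column is present before comparing predictions against it:
# Should return TRUE; a missing or differently named column in test_gb would
# explain the failed name lookup in the second error.
"Cover_Type" %in% colnames(test_gb)
h2o.table(test_gb$Cover_Type)  # optional: inspect the levels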
In addition, here is a code snippet with xgboost that shows you can use h2o.hit_ratio_table() successfully.
library(h2o)
h2o.init()
iris.hex <- h2o.importFile( "http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
i.sid <- h2o.runif(iris.hex)
iris.train <- h2o.assign(iris.hex[i.sid > .2, ], "iris.train")
iris.test <- h2o.assign(iris.hex[i.sid <= .2, ], "iris.test")
iris.xgboost.valid <- h2o.xgboost(x = 1:4, y = 5, training_frame = iris.train, validation_frame = iris.test)
# Hit ratio
hrt.valid.T <- h2o.hit_ratio_table(iris.xgboost.valid,valid = TRUE)
print(hrt.valid.T)

Questions about xgboost with R

I used xgboost to do logistic regression. I followed the steps from a tutorial, but I ran into two problems. The datasets can be found here.
First, when I run the following code:
bst <- xgboost(data = sparse_matrix, label = output_vector,nrounds = 39,param)
Then, I got
[0]train-rmse:0.350006
[1]train-rmse:0.245008
[2]train-rmse:0.171518
[3]train-rmse:0.120065
[4]train-rmse:0.084049
[5]train-rmse:0.058835
[6]train-rmse:0.041185
[7]train-rmse:0.028830
[8]train-rmse:0.020182
[9]train-rmse:0.014128
[10]train-rmse:0.009890
[11]train-rmse:0.006923
[12]train-rmse:0.004846
[13]train-rmse:0.003392
[14]train-rmse:0.002375
[15]train-rmse:0.001662
[16]train-rmse:0.001164
[17]train-rmse:0.000815
[18]train-rmse:0.000570
[19]train-rmse:0.000399
[20]train-rmse:0.000279
[21]train-rmse:0.000196
[22]train-rmse:0.000137
[23]train-rmse:0.000096
[24]train-rmse:0.000067
[25]train-rmse:0.000047
[26]train-rmse:0.000033
[27]train-rmse:0.000023
[28]train-rmse:0.000016
[29]train-rmse:0.000011
[30]train-rmse:0.000008
[31]train-rmse:0.000006
[32]train-rmse:0.000004
[33]train-rmse:0.000003
[34]train-rmse:0.000002
[35]train-rmse:0.000001
[36]train-rmse:0.000001
[37]train-rmse:0.000001
[38]train-rmse:0.000000
The train-rmse finally equals 0! Is that normal? As far as I know, train-rmse usually can't be equal to 0. So, where is my problem?
Second, when I run
importance <- xgb.importance(sparse_matrix@Dimnames[[2]], model = bst)
Then, I got an error:
Error in eval(expr, envir, enclos) : object 'Yes' not found.
I don't know what it means; maybe the first problem leads to the second one.
library(data.table)
train_x<-fread("train_x.csv")
str(train_x)
train_y<-fread("train_y.csv")
str(train_y)
train<-merge(train_y,train_x,by="uid")
train$uid<-NULL
test<-fread("test_x.csv")
require(xgboost)
require(Matrix)
sparse_matrix <- sparse.model.matrix(y~.-1, data = train)
head(sparse_matrix)
output_vector = train[,y] == "Marked"
param <- list(objective = "binary:logistic", booster = "gblinear",
nthread = 2, alpha = 0.0001,max.depth = 4,eta=1,lambda = 1)
bst <- xgboost(data = sparse_matrix, label = output_vector,nrounds = 39,param)
importance <- xgb.importance(sparse_matrix@Dimnames[[2]], model = bst)
I ran into the same problem (Error in eval(expr, envir, enclos) : object 'Yes' not found.) and the reason was the following:
I tried to do
dt = data.table(x = runif(10), y = 1:10, z = 1:10)
label = as.logical(dt$z)
train = dt[, z := NULL]
trainAsMatrix = as.matrix(train)
label = as.matrix(label)
bst <- xgboost(data = trainAsMatrix, label = label, max.depth = 8,
eta = 0.3, nthread = 2, nround = 50, objective = "reg:linear")
bst$featureNames = names(train)
xgb.importance(model = bst)
The problem comes from the line
label = as.logical(dt$z)
I got this line in there because the last time I used xgboost, I wanted to predict a categorical variable. Now, since I want to do regression, it should read:
label = dt$z
Maybe something similar causes the problem in your case?
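For completeness, a sketch of the corrected toy example under the same assumptions (numeric label, regression objective); with a reasonably recent xgboost, the column names of the training matrix are picked up automatically, so the featureNames workaround is not needed:
library(data.table)
library(xgboost)
dt <- data.table(x = runif(10), y = 1:10, z = 1:10)
label <- dt$z                  # keep the label numeric for regression
train <- dt[, z := NULL]
bst <- xgboost(data = as.matrix(train), label = label, max_depth = 8, eta = 0.3,
               nthread = 2, nrounds = 50, objective = "reg:linear")
xgb.importance(model = bst)    # feature names come from colnames(as.matrix(train))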
Perhaps this is of some help. I often get the same error when the labels have zero variation, using the current CRAN version of xgboost, which is already somewhat old (0.4.4). xgb.train happily accepts such labels (showing a 0.50 AUC), but the error then shows up when calling xgb.importance.
Cheers
Otto
[0] train-auc:0.500000 validate-auc:0.500000
[1] train-auc:0.500000 validate-auc:0.500000
[2] train-auc:0.500000 validate-auc:0.500000
[3] train-auc:0.500000 validate-auc:0.500000
[4] train-auc:0.500000 validate-auc:0.500000
[1] "XGB error: Error in eval(expr, envir, enclos): object 'Yes' not found\n"
