Trouble with running h2o.hit_ratio_table - r

I got this issue when I run H2o for xgboost. May I ask how can I solve this issue? Thank you.
I run this code
h2o.hit_ratio_table(gbm2,valid =T)
And I encounter this error
" Error in names(v) <- v_names :
'names' attribute [1] must be the same length as the vector [0]"
Then I proceed run
mean(finalRF_prediction$predict==test_gb$Cover_Type)
and I got the error:
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page, :
ERROR MESSAGE:
Name lookup of 'NULL' failed
My model is:
gbm2=h2o.gbm(training_frame = train_gb,validation_frame = valid_gb,x=1:51,y=52,
model_id="gbm2_covType_v2",
ntrees=200,
max_depth = 30,
sample_rate = .7,
col_sample_rate = .7,
learn_rate=.3,
stopping_round=2,
stopping_tolerance = .01,
score_each_iteration = T,seed=2000000)
finalRF_prediction=h2o.predict(object=gbm2,newdata = test_gb)
summary(gbm2)
h2o.hit_ratio_table(gbm2,valid=T)[1,2]
mean(finalRF_prediction$predict==test_gb$Cover_Type)

Without having a dataset to rerun your code on it's hard to say what caused the error. For your second error, check if the column Cover_Type exists in your test_gb dataframe.
The code you have seems to be fine, so I would just double check your column names.
In addition here is a code snippet with xgboost that shows you, you can use the hit_ratio_table() successfully.
library(h2o)
h2o.init()
iris.hex <- h2o.importFile( "http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
i.sid <- h2o.runif(iris.hex)
iris.train <- h2o.assign(iris.hex[i.sid > .2, ], "iris.train")
iris.test <- h2o.assign(iris.hex[i.sid <= .2, ], "iris.test")
iris.xgboost.valid <- h2o.xgboost(x = 1:4, y = 5, training_frame = iris.train, validation_frame = iris.test)
# Hit ratio
hrt.valid.T <- h2o.hit_ratio_table(iris.xgboost.valid,valid = TRUE)
print(hrt.valid.T)

Related

R: Parallel Processing Error: "Errorr in checkForRemoteErrors(val) : 5 nodes produced errors; first error: Objekt 'RAD' not found"

I am currently conducting an CRQA on heart rate and respiration data. I try to parallelize the computation in the following way, because I want to run them on an external server:
RAD = round(median(Param_list$radius), digits = 2); RAD # defining the final parameter settings
DEL = max(Param_list$delay); DEL
EMB = max(Param_list$emddim); EMB
my_crqa = function(x, y){
crqa::crqa(x, y,
radius = RAD, embed = EMB, delay = DEL,
rescale = 2, normalize = 2,
mindiagline = 25, minvertline = 25, tw = 0, whiteline = FALSE,
side = "both", method = "crqa",
metric = "euclidean", datatype = "continuous")
}
memory.size(max = TRUE)
core = round(detectCores()*0.8, digits = 0)
cl<-makeCluster(core, type="SOCK")
clusterEvalQ(cl, c("my_crqa", "EMB", "DEL", "RAD"))
CRQA_list = clusterMap(cl, my_crqa, HR_clean, HR_clean)
stopCluster(cl) # Stop cores properly.
Unfortunately, I get the following error:
"Errorr in checkForRemoteErrors(val) : 5 nodes produced errors;
first error: Objekt 'RAD' not found"
Why does the cluster not find the RAD, but the the other variables? Also, when I ran the code earlier it worked.
I appreciate any tip!
Best,
Johnson

R -> Error in `row.names<-.data.frame`

Following this other question (Get p-value about contrast hypothesis for rectangular matrix) I am trying to run the following code in R, but the line:
colnames(posmat) <- "pos_c1"
causes an error when calling the function summary().
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘Pos’
Does anybody knows why this error comes up?
Here the MWE:
library(lme4)
library(lmerTest)
library(corpcor)
database <- data.frame(
Clos=factor(c(4,4,1,4,4,3,2,1,2,1,2,2,4,3,1,2,1,4,1,3,2,2,4,4,4,4,2,1,4,2,2,1,4,2,4,2,1,4,4,3)),
Pos=factor(c(2,4,1,2,5,6,7,2,2,2,5,6,3,3,3,8,5,3,4,2,1,4,3,3,2,6,1,8,3,7,5,7,8,3,6,6,1,6,3,7)),
RF=c(8,6,2,9,7,1,7,6,3,4,6,4,5,2,5,5,3,4,1,3,1,2,3,1,2,2,3,1,8,5,2,2,7,1,9,4,5,6,4,2),
Score=c(4,3,3,5,4,3,2,4,5,2,2,3,3,4,4,4,3,2,3,3,5,4,3,4,4,2,3,4,3,4,1,2,2,2,3,4,5,3,1,2)
)
clos_c1 = c(0,0,-1,1)
clos_c2 = c(0,-1,0,1)
clos_c3 = c(-1,0,0,1)
closmat.temp = rbind(constant = 1/4,clos_c1,clos_c2,clos_c3)
closmat = solve(closmat.temp)
closmat = closmat[, -1]
closmat
pos_c1 = c(1/2,1/2,-1/6,-1/6,-1/6,-1/6,-1/6,-1/6)
posmat.temp = rbind(pos_c1)
posmat = pseudoinverse(posmat.temp)
colnames(posmat) <- "pos_c1"
contrasts(database$Clos) = closmat
contrasts(database$Pos) = posmat
model = lmer(Score~Clos+Pos+(1|RF), data = database, REML = TRUE)
summary(model)
The problem is that when you run the model, you have the contrasts(database$Pos) without colnames but just one.
You can see that by running your model variable and you will see 6 variables with the name "Pos". This causes trouble in reading the summary() command. Just by adding the line
colnames(contrasts(database$Pos))<-c("pos1","pos2","pos3","pos4","pos5","pos6","pos7")
after the creation of your contrasts(database$Pos) <- posmat
your code will work. Feel free to put the colnames you require.
The whole code is as follows then:
library(lme4)
library(lmerTest)
library(corpcor)
database <- data.frame(
Clos=factor(c(4,4,1,4,4,3,2,1,2,1,2,2,4,3,1,2,1,4,1,3,2,2,4,4,4,4,2,1,4,2,2,1,4,2,4,2,1,4,4,3)),
Pos=factor(c(2,4,1,2,5,6,7,2,2,2,5,6,3,3,3,8,5,3,4,2,1,4,3,3,2,6,1,8,3,7,5,7,8,3,6,6,1,6,3,7)),
RF=c(8,6,2,9,7,1,7,6,3,4,6,4,5,2,5,5,3,4,1,3,1,2,3,1,2,2,3,1,8,5,2,2,7,1,9,4,5,6,4,2),
Score=c(4,3,3,5,4,3,2,4,5,2,2,3,3,4,4,4,3,2,3,3,5,4,3,4,4,2,3,4,3,4,1,2,2,2,3,4,5,3,1,2)
)
clos_c1 = c(0,0,-1,1)
clos_c2 = c(0,-1,0,1)
clos_c3 = c(-1,0,0,1)
closmat.temp = rbind(constant = 1/4,clos_c1,clos_c2,clos_c3)
closmat = solve(closmat.temp)
closmat = closmat[, -1]
closmat
pos_c1 = c(1/2,1/2,-1/6,-1/6,-1/6,-1/6,-1/6,-1/6)
posmat.temp = rbind(pos_c1)
posmat <- pseudoinverse(posmat.temp)
colnames(posmat) <- "pos_c1"
contrasts(database$Clos) <- closmat
contrasts(database$Pos) <- posmat
##NEW LINE
colnames(contrasts(database$Pos))<-c("pos1","pos2","pos3","pos4","pos5","pos6","pos7")
model <- lmer(Score~Clos+Pos+(1|RF), data = database, REML = TRUE)
summary(model)
I hope it helps. Cheers!

R error : "attempt to select less than one element in get1index"

I'm a beginner in R and I'm trying to use the package ClonEvol, however the documentation on the github webpage is very limited. So for now I'm using their example code and trying to adapt it to my data called ce.
ce <- data.frame(
cluster = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,7),
gene = c("geneA","geneB","geneC","geneD","geneA","geneB","geneC","geneD","geneA","geneB","geneC","geneD","geneA","geneB","geneC",
"geneD","geneA","geneB","geneC","geneD","geneA","geneB","geneC","geneD","geneA","geneB","geneC","geneD"),
prim.vaf = c(0.5,0,0,0,0.5,0.5,0,0,1,0.5,0,0,1,0.5,0,0.5,0.5,0.5,0,0.5,0.5,0.5,0,1,0.5,0.5,0.5,0)
)
cluster <- ce$cluster
gene <- ce$gene
prim.vaf <- ce$prim.vaf
x <- ce
vaf.col.names <- grep('prim.vaf', colnames(x), value=T)
sample.names <- gsub('prim.vaf', '', vaf.col.names)
x[, sample.names] <- x[, vaf.col.names]
vaf.col.names <- sample.names
sample.groups <- c('P', 'R');
names(sample.groups) <- vaf.col.names
x <- x[order(x$cluster),]
pdf('box.pdf', width = 3, height = 5, useDingbats = FALSE, title='')
pp <- variant.box.plot(x,
cluster.col.name = ce$cluster,
show.cluster.size = FALSE,
cluster.size.text.color = 'blue',
vaf.col.names = vaf.col.names,
vaf.limits = 70,
sample.title.size = 20,
violin = FALSE,
box = FALSE,
jitter = TRUE,
jitter.shape = 1,
jitter.color = clone.colors,
jitter.size = 3,
jitter.alpha = 1,
jitter.center.method = 'median',
jitter.center.size = 1,
jitter.center.color = 'darkgray',
jitter.center.display.value = 'none',
highlight = 'is.driver',
highlight.note.col.name = 'gene',
highlight.note.size = 2,
highlight.shape =16,
order.by.total.vaf = FALSE
)
dev.off()
However, I get the following error :
Error in .subset2(x, i, exact = exact) : recursive indexing failed at level 2
And if I delete cluster.col.name=ce$cluster and vaf.col.names=vaf.col.names, the error becomes the following :
Error in .subset2(x, i, exact = exact) : attempt to select less than one
element in get1index
Has someone any idea of what went wrong ?
I met this error message today. I know little about the R package in this question, but I think I can show what the error message means here. And maybe it is useful for you to find out the problem.
The error occurs when we use NULL as an index to subset a list.
Here are the expressions invoking this error message:
any.list <- list(1, 2, 3)
# If single brackets, no error occurs:
any.list[NULL]
## list()
# If double brackets, the error occurs:
any.list[[NULL]]
## Error in any.list[[NULL]] :
## attempt to select less than one element in get1index
The list above can be any list, or even a vector.
a.vector <- c(1, 2, 3)
a.vector[[NULL]]
## Error in a.vector[[NULL]] :
## attempt to select less than one element in get1index
Here is a similar error message:
any.list[[0]]
## Error in any.list[[0]] :
## attempt to select less than one element in get1index <real>
END

Questions of xgboost with R

I used xgboost to do logistic regression. I followed the steps from, but I got two problems.The datasets are found here.
First, when I run the follow code:
bst <- xgboost(data = sparse_matrix, label = output_vector,nrounds = 39,param)
Then, I got
[0]train-rmse:0.350006
[1]train-rmse:0.245008
[2]train-rmse:0.171518
[3]train-rmse:0.120065
[4]train-rmse:0.084049
[5]train-rmse:0.058835
[6]train-rmse:0.041185
[7]train-rmse:0.028830
[8]train-rmse:0.020182
[9]train-rmse:0.014128
[10]train-rmse:0.009890
[11]train-rmse:0.006923
[12]train-rmse:0.004846
[13]train-rmse:0.003392
[14]train-rmse:0.002375
[15]train-rmse:0.001662
[16]train-rmse:0.001164
[17]train-rmse:0.000815
[18]train-rmse:0.000570
[19]train-rmse:0.000399
[20]train-rmse:0.000279
[21]train-rmse:0.000196
[22]train-rmse:0.000137
[23]train-rmse:0.000096
[24]train-rmse:0.000067
[25]train-rmse:0.000047
[26]train-rmse:0.000033
[27]train-rmse:0.000023
[28]train-rmse:0.000016
[29]train-rmse:0.000011
[30]train-rmse:0.000008
[31]train-rmse:0.000006
[32]train-rmse:0.000004
[33]train-rmse:0.000003
[34]train-rmse:0.000002
[35]train-rmse:0.000001
[36]train-rmse:0.000001
[37]train-rmse:0.000001
[38]train-rmse:0.000000
train-rmse is finally equal to 0! Is that normal? Usually,I know train-rmse can't be equal to 0. So,where is my problem?
Second, when I run
importance <- xgb.importance(sparse_matrix#Dimnames[[2]], model = bst)
Then, I got a Error:
Error in eval(expr, envir, enclos) : object 'Yes' not found.
I don't know what does it mean, maybe the first question leads to the second one.
library(data.table)
train_x<-fread("train_x.csv")
str(train_x)
train_y<-fread("train_y.csv")
str(train_y)
train<-merge(train_y,train_x,by="uid")
train$uid<-NULL
test<-fread("test_x.csv")
require(xgboost)
require(Matrix)
sparse_matrix <- sparse.model.matrix(y~.-1, data = train)
head(sparse_matrix)
output_vector = train[,y] == "Marked"
param <- list(objective = "binary:logistic", booster = "gblinear",
nthread = 2, alpha = 0.0001,max.depth = 4,eta=1,lambda = 1)
bst <- xgboost(data = sparse_matrix, label = output_vector,nrounds = 39,param)
importance <- xgb.importance(sparse_matrix#Dimnames[[2]], model = bst)
I ran into the same problem (Error in eval(expr, envir, enclos) : object 'Yes' not found.) and the reason was the following:
I tried to do
dt = data.table(x = runif(10), y = 1:10, z = 1:10)
label = as.logical(dt$z)
train = dt[, z := NULL]
trainAsMatrix = as.matrix(train)
label = as.matrix(label)
bst <- xgboost(data = trainAsMatrix, label = label, max.depth = 8,
eta = 0.3, nthread = 2, nround = 50, objective = "reg:linear")
bst$featureNames = names(train)
xgb.importance(model = bst)
The problem comes from the line
label = as.logical(dt$z)
I got this line in there because the last time I used xgboost, I wanted to predict a categorial variable. Now since I want to do regression it should read:
label = dt$z
Maybe something similar causes the problem in your case?
Perhaps this is of any help. I'm often getting the same error when the labels have zero variation. Using the current CRAN version of xgboost, which is somewhat old already (0.4.4). xgb.train happily accepts this (showing a .50 AUC) but the error then shows when calling xgb.importance.
Cheers
Otto
[0] train-auc:0.500000 validate-auc:0.500000
[1] train-auc:0.500000 validate-auc:0.500000
[2] train-auc:0.500000 validate-auc:0.500000
[3] train-auc:0.500000 validate-auc:0.500000
[4] train-auc:0.500000 validate-auc:0.500000
[1] "XGB error: Error in eval(expr, envir, enclos): object 'Yes' not found\n"

R2WinBUGS - Warning messages

I am trying to use R2WinBUGS using this example:
code
(Please only consider the part: ### 5.4. Analysis using WinBUGS)
I am getting this error message:
Error in file(con, "wb") : cannot open the connection
In addition: Warning messages:
1: In file.create(to[okay]) :
cannot create file 'c:/Program Files/WinBUGS14//System/Rsrc/Registry_Rsave.odc', reason 'Permission denied'
2: In file(con, "wb") :
cannot open file 'c:/Program Files/WinBUGS14//System/Rsrc/Registry.odc': Permission denied
Warning message:
running command '"c:/Program Files/WinBUGS14//WinBUGS14.exe" /par "D:/R2WinBUGS/normal/script.txt"' had status 1
>
I am not sure whether this is crucial for correct functionality (everything else seems to look ok). Is there a way to get rid of this?
Thanks.
Christian
PS:
This is the R code:
library(R2WinBUGS)
setwd("D:/R2WinBUGS/normal")
y10 <- rnorm(n = 10, mean = 600, sd = 30) # Sample of 10 birds
y1000 <- rnorm(n = 1000, mean = 600, sd = 30) # Sample of 1000 birds
# Save BUGS description of the model to working directory
sink("model.txt")
cat("
model {
# Priors
population.mean ~ dunif(0,5000) # Normal parameterized by precision
precision <- 1 / population.variance # Precision = 1/variance
population.variance <- population.sd * population.sd
population.sd ~ dunif(0,100)
# Likelihood
for(i in 1:nobs){
mass[i] ~ dnorm(population.mean, precision)
}
}
",fill=TRUE)
sink()
# Package all the stuff to be handed over to WinBUGS
# Bundle data
win.data <- list(mass = y1000, nobs = length(y1000))
# Function to generate starting values
inits <- function()
list (population.mean = rnorm(1,600), population.sd = runif(1, 1, 30))
# Parameters to be monitored (= to estimate)
params <- c("population.mean", "population.sd", "population.variance")
# MCMC settings
nc <- 3 # Number of chains
ni <- 1000 # Number of draws from posterior (for each chain)
nb <- 1 # Number of draws to discard as burn-in
nt <- 1 # Thinning rate
# Start Gibbs sampler: Run model in WinBUGS and save results in object called out
out <- bugs(data = win.data, inits = inits, parameters.to.save = params, model.file = "model.txt",
n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE, DIC = TRUE, working.directory = getwd())
ls()
out # Produces a summary of the object
names(out)
str(out)
hist(out$summary[,8]) # Rhat values in the eighth column of the summary
which(out$summary[,8] > 1.1) # None in this case
par(mfrow = c(3,1))
matplot(out$sims.array[1:999,1:3,1], type = "l")
matplot(out$sims.array[,,2] , type = "l")
matplot(out$sims.array[,,3] , type = "l")
par(mfrow = c(3,1))
matplot(out$sims.array[1:20,1:3,1], type = "l")
matplot(out$sims.array[1:20,,2] , type = "l")
matplot(out$sims.array[1:20,,3] , type = "l")
par(mfrow = c(3,1))
hist(out$sims.list$population.mean, col = "grey")
hist(out$sims.list$population.sd, col = "blue")
hist(out$sims.list$population.variance, col = "green")
par(mfrow = c(1,1))
plot(out$sims.list$population.mean, out$sims.list$population.sd)
pairs(cbind(out$sims.list$population.mean, out$sims.list$population.sd, out$sims.list$population.variance))
summary(out$sims.list$population.mean)
summary(out$sims.list$population.sd)
sd(out$sims.list$population.mean)
sd(out$sims.list$population.sd)
summary(lm(y1000 ~ 1))
Probably it is windows UAC fault. By default UAC doesn't allow programms to write in almost anything except the user's folder. You can change that by running R as administrator. But I think that will change the library folder unless it is hardcoded in Renviron.site (inside R\etc folder), but I'm not 100% sure about that.
I was able to fix the problem by defining "bugs.directory".
out <- bugs(data = win.data, inits = inits, parameters.to.save = params, model.file = "model.txt", n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni, debug = FALSE, DIC = TRUE, working.directory = getwd(), bugs.directory = 'c:/WinBUGS14')
Your link goes out to a huge file that spans many chapters of a book. In the comments section it says:
# You may have to add a 'working.directory' argument to calls to
# the function bugs().
Have you done that yet? There's also a bunch of user-specific stuff like:
setwd("C:/_Marc Kery/_WinBUGS book/Naked code") # May have to adapt that
Have you appropriately modified those items?

Resources