Using your own model in train (caret package)?

I am trying to use train from caret with a model from a package that caret does not include (BMS), and I get an error I can't figure out. Any ideas? I used the following link to get started:
bmsMeth<-list(type="Regression",library="BMS",loop=NULL,prob=NULL)
prm<-data.frame(parameter="mprior.size",class="numeric",label="mprior.size")
bmsMeth$parameters<-prm
bmsGrid<-function(x,y,len=NULL){
out<-expand.grid(mprior.size=seq(2,3,by=len))
out
}
bmsMeth$grid<-bmsGrid
bmsFit <- function(x, y, param, lev = NULL) {
  bms(cbind(y, x), burn = 5000, iter = 100000, nmodel = 1000,
      mcmc = "bd", g = "UIP", mprior.size = param$mprior.size)
}
bmsMeth$fit <- bmsFit
bmsPred <- function(modelFit, newdata, preProcess = NULL, submodels = NULL) {
  predict(modelFit, newdata)
}
bmsMeth$predict <- bmsPred
library(caret)
data.train<-data.frame(runif(100),runif(100),runif(100),runif(100),runif(100))#synthetic data for testing
bms(cbind(data.train[,1],data.train[,-1]),burn=5000,iter=100000,nmodel=1000,mcmc="bd",g="UIP",mprior.size=2)#function out of caret is working
preProcess=c('center','scale')
myTimeControl <- trainControl(method = "timeslice",initialWindow = 0.99*nrow(data.train), horizon = 1, fixedWindow = FALSE)
tune <- train(data.train[,-1],data.train[,1],preProcess=preProcess,method = bmsMeth,tuneLength=2,metric= "RMSE",trControl =myTimeControl,type="Regression")
The error I get:
Error in train.default(data.train[, -1], data.train[, 1], preProcess = preProcess,  :
  Stopping
In addition: Warning messages:
1: In eval(expr, envir, enclos) :
  model fit failed for Training1: mprior.size=2 Error in method$fit(x = x, y = y, wts = wts, param = tuneValue, lev = obsLevels,  :
  unused arguments (wts = wts, last = last, classProbs = classProbs, type = "Regression")
2: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
  There were missing values in resampled performance measures.

Apparently, I just had to add the arguments to the function signature, even though I never use them:
bmsFit <- function(x, y, param, lev = NULL, last, weights, classProbs, ...) {
  bms(data.frame(y, x), burn = 5000, iter = 100000, nmodel = 1000,
      mcmc = "bd", g = "UIP", mprior.size = param$mprior.size)
}
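For reference, here is a self-contained sketch of the same custom-model list that runs without BMS or caret. Assumptions: `lm()` stands in for `bms()` (the `mprior.size` value is carried along but ignored), `lmMeth` is a hypothetical name, and the fit signature follows the argument list that, as far as I know, caret's custom-model documentation uses (`x, y, wts, param, lev, last, weights, classProbs, ...`). The grid also uses `length.out = len` instead of `by = len`, because `seq(2, 3, by = 2)` returns only `2`, so `tuneLength = 2` would try a single value.

```r
# Sketch of a caret-style custom model list.
# Assumption: lm() is a stand-in for BMS::bms() so this runs without BMS.
lmMeth <- list(type = "Regression", library = NULL, loop = NULL, prob = NULL)

lmMeth$parameters <- data.frame(parameter = "mprior.size",
                                class = "numeric",
                                label = "mprior.size")

# Grid: `len` evenly spaced values between 2 and 3
lmMeth$grid <- function(x, y, len = NULL, search = "grid") {
  data.frame(mprior.size = seq(2, 3, length.out = len))
}

# Fit: accepts every argument caret passes; anything extra
# (e.g. `type` forwarded from train()) is absorbed by `...`
lmMeth$fit <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
  dat <- data.frame(y = y, x)
  lm(y ~ ., data = dat)  # param$mprior.size is ignored by this stand-in
}

lmMeth$predict <- function(modelFit, newdata, preProc = NULL, submodels = NULL) {
  predict(modelFit, newdata = as.data.frame(newdata))
}

# Mimic the call caret makes internally, including the extra arguments
set.seed(1)
x <- data.frame(a = runif(20), b = runif(20))
y <- 1 + 2 * x$a + rnorm(20, sd = 0.1)
grid <- lmMeth$grid(x, y, len = 2)
fit <- lmMeth$fit(x = x, y = y, wts = NULL, param = grid[1, , drop = FALSE],
                  lev = NULL, last = TRUE, classProbs = FALSE,
                  type = "Regression")
pred <- lmMeth$predict(fit, x)
```

The last `fit` call passes `wts`, `last`, `classProbs` and `type`, the same arguments listed as "unused" in the error above; with the full signature plus `...` the call goes through.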

Your function bms() does not seem to exist ...

Related

Got a 'Model is empty!' error when using CIBERSORT in R

When I used CIBERSORT in R to evaluate immune infiltration in my data, I got the following error:
> CIBERSORT_all <- CIBERSORT(LM22.file,exp.file, perm = 1000, QN = T)
Error in predict.svm(ret, xhold, decision.values = TRUE) :
Model is empty!
Called from: predict.svm(ret, xhold, decision.values = TRUE)
The error is raised in this function:
function (object, newdata, decision.values = FALSE, probability = FALSE, ..., na.action = na.omit)
{
    ...
    if (object$tot.nSV < 1)
        stop("Model is empty!")
    ...
}
But I can't figure out where my problem is, since I used this function according to the instructions.
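The guard above fires when the fitted SVM has no support vectors at all (`tot.nSV < 1`), which points at the data reaching the SVM rather than at the call itself. A minimal pre-flight check, assuming (not confirmed in this thread) that the signature and mixture matrices are matched on gene row names, is to count the shared genes and look for missing values. `check_inputs()` and the toy matrices are hypothetical.

```r
# Hypothetical pre-flight check (not part of CIBERSORT):
# count genes shared by the signature and mixture matrices
check_inputs <- function(sig, mix) {
  shared <- intersect(rownames(sig), rownames(mix))  # case-sensitive match
  list(n_shared = length(shared),
       shared = shared,
       any_na = anyNA(mix[shared, , drop = FALSE]))
}

# Toy matrices standing in for LM22 and an expression file
set.seed(42)
sig <- matrix(runif(6), nrow = 3,
              dimnames = list(c("CD4", "CD8A", "MS4A1"), c("T_cells", "B_cells")))
mix <- matrix(runif(4), nrow = 2,
              dimnames = list(c("cd4", "CD8A"), c("sample1", "sample2")))

res <- check_inputs(sig, mix)
res$n_shared  # 1: "cd4" does not match "CD4"
```

If `n_shared` is very small on the real files, the identifiers (case, symbol vs. Ensembl ID, etc.) are worth checking before looking elsewhere.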

Error in parse(text = x, keep.source = FALSE) : <text>:1:15: unexpected symbol 1: ID ~ 0+Offset Length

I would like to ask you for help. I am trying to perform MICE imputation of missing values in my dataset. Here is part of the code:
imputed_Data <- mice(data, m=5, maxit = 10, method = "PMM", seed = 200)
Unfortunately, this code returns the following error:
Error in parse(text = x, keep.source = FALSE) :
  <text>:1:15: unexpected symbol
1: ID ~ 0+Offset Length
                  ^
Does anybody know where the mistake is? "ID" and "Offset Length" are variables in my dataset.
Thank you,
Blanka
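The caret in the error points at the space inside `Offset Length`: the formula built from the column names is not valid R syntax, so `parse()` stops at the second word. A minimal base-R sketch reproducing this on hypothetical toy data, and fixing it by making the names syntactic with `make.names()`:

```r
# Toy data with a column name containing a space (hypothetical values)
dat <- data.frame(ID = 1:3, `Offset Length` = c(1.5, NA, 2.0),
                  check.names = FALSE)

# A formula pasted from such names cannot be parsed
bad <- tryCatch(parse(text = "ID ~ 0+Offset Length"),
                error = function(e) conditionMessage(e))
print(bad)

# Make the names syntactically valid: "Offset Length" -> "Offset.Length"
names(dat) <- make.names(names(dat))
print(names(dat))

# The same kind of formula now parses
good <- parse(text = paste("ID ~ 0 +", names(dat)[2]))
```

After renaming, `mice()` can be rerun on the renamed data. Separately, mice's built-in method names are lowercase (`"pmm"`), so `method = "PMM"` would likely fail as the next error.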

Retrieving Variable Importance from Caret trained model with "lda2", "qda", "lda"

I can get variable importance out from "nnet" and "knn" models, but not from "lda", "lda2", and "qda".
I am using varImp(). I've tried everything I can think of and just can't get a proper idea of what the variable importance is.
Here is my code for training the model:
lda_model <- train(quality2 ~ .,
data = train_data,
method = "lda",
preProcess = c("center", "scale"),
trControl = trainControl(method = "repeatedcv",
number = 10,
repeats = 2),
importance = TRUE)
and here is the error I get when I try to check importance:
> varImp(lda_model)
Error in model.frame.default(formula = y ~ x, na.action = na.omit, drop.unused.levels = TRUE) :
invalid type (list) for variable 'y'
In addition: Warning messages:
1: In mean.default(y, rm.na = TRUE) :
argument is not numeric or logical: returning NA
2: In Ops.factor(left, right) : ‘-’ not meaningful for factors
I know this means it's treating it as an object of class list instead of a trained model, and I've tried it on lda_model$finalmodel and others, but it's still not working.
How can I get proper feedback when using lda/qda on how my model is performing and which variables are performing best?
I had the same problem, and it seems to come from the way the dataset is imported into R. I first imported it with the {readxl} package and varImp() didn't work. Then I imported it through the clipboard, and now varImp() works on my lda model built with {caret}.
My code with {readxl} :
library(readxl)
glauc <- read_excel("Glaucome.xlsx", sheet="GlaucomaM")
rownames(glauc) <- glauc$IDENT
glauc$IDENT <- NULL
glauc$Class <- as.factor(glauc$Class)
library(caret)
numappr <- createDataPartition(glauc$Class, p=0.7)
appr <- glauc[numappr$Resample1,]
test <- glauc[-numappr$Resample1,]
Ctrl <- trainControl(summaryFunction=twoClassSummary,
classProbs=TRUE)
appr.lda <- train(Class~., data=appr, method="lda",
trControl=Ctrl, preProc = c("center","scale"),
metric="ROC")
varImp(appr.lda)
This leads to essentially the same error as yours (the warnings are identical):
Error: $ operator is invalid for atomic vectors
In addition: Warning messages:
1: In mean.default(y, rm.na = TRUE) :
argument is not numeric or logical: returning NA
2: In Ops.factor(left, right) : ‘-’ not meaningful for factors
And my code with read.table() and the clipboard:
glauc <- read.table("clipboard", header=T, sep="\t", dec=".")
rownames(glauc) <- glauc$IDENT
glauc$IDENT <- NULL
library(caret)
numappr <- createDataPartition(glauc$Class, p=0.7)
appr <- glauc[numappr$Resample1,]
test <- glauc[-numappr$Resample1,]
Ctrl <- trainControl(summaryFunction=twoClassSummary,
classProbs=TRUE)
appr.lda <- train(Class~., data=appr, method="lda",
trControl=Ctrl, preProc = c("center","scale"),
metric="ROC")
varImp(appr.lda)
This one works (only the first rows of the output are shown here):
varImp(appr.lda)
ROC curve variable importance
only 20 most important variables shown (out of 62)
Importance
vari 100.00
varg 97.14
vars 94.52
phci 93.69
hic 92.02
phcg 90.55
tms 89.96
Hope it helps.
Sophie
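One plausible mechanism behind the readxl/clipboard difference, assuming it is that `read_excel()` returns a tibble: subsetting a single column of a tibble with `x[, i]` returns a one-column data frame (like `drop = FALSE`), whereas a plain data.frame drops to a vector. `model.frame()` rejects a data frame as a variable with the same "invalid type (list)" error seen in the question. A base-R sketch on hypothetical toy data:

```r
# Toy data standing in for the glaucoma table
df <- data.frame(a = c(1, 2, 3, 4), b = c(2, 4, 5, 8))

y_vec <- df[, 1]                # data.frame: drops to a numeric vector
y_df  <- df[, 1, drop = FALSE]  # tibble-like: stays a one-column data frame
x <- df$b

# Works: both variables are plain vectors
ok <- model.frame(y_vec ~ x)

# Fails: a data frame is a list as far as model.frame() is concerned
err <- tryCatch(model.frame(y_df ~ x),
                error = function(e) conditionMessage(e))
print(err)

# Suggested workaround for the readxl import (requires readxl; not run here):
# glauc <- as.data.frame(read_excel("Glaucome.xlsx", sheet = "GlaucomaM"))
```

Converting the imported tibble with `as.data.frame()` keeps the readxl workflow instead of going through the clipboard.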

error tuning custom algorithm with caret r

I want to tune two parameters of my custom algorithm with caret. One parameter (lambda) is numeric and the other (prior) is character; it can take two values, "known" or "unknown". Tuning the algorithm with just the lambda parameter works fine. But when I add the character parameter (prior), I get the following error:
1: In eval(expr, envir, enclos) :
  model fit failed for Resample01: lambda=1, prior=unknown Error in mdp(Class = y, data = x, lambda = param$lambda, prior = param$prior,  :
  object 'assignment' not found
The error must be related to the way I specify the character parameter (prior). Here is my code:
my_mod$parameters <- data.frame(
parameter = c("lambda","prior"),
class = c("numeric", "character"),
label = c("sample_length", "prior_type"))
## The grid Element
my_mod$grid <- function(x, y, len = NULL){expand.grid(lambda=1:2,prior=c("unknown", "known"))}
mygrid<-expand.grid(lambda=1:2,prior=c('unknown','known'))
## The fit Element
my_mod$fit <- function(x, y, wts, param, lev, last, classProbs, ...){
mdp(Class=y,data=x,lambda=param$lambda,prior=param$prior,info.pred ="yes")
}
## The predict Element
mdpPred <- function(modelFit, newdata, preProc = NULL, submodels = NULL)
predict.mdp(modelFit, newdata)
my_mod$predict <- mdpPred
fitControl <- trainControl(method = "cv",number = 5,repeats = 5)
train(x=data, y = factor(Class),method = my_mod,trControl = fitControl, tuneGrid = mygrid)
That is because you must use as.character(param$prior) in the fit function: the grid built with expand.grid() stores prior as a factor, not as a character string.
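`expand.grid()` defaults to `stringsAsFactors = TRUE`, so a character column in the tuning grid comes back as a factor. A short base-R sketch of that behavior and of two ways around it (converting inside the fit function, or building the grid with `stringsAsFactors = FALSE`); `mdp()` itself is not called here.

```r
# Default: the character column becomes a factor
grid_factor <- expand.grid(lambda = 1:2, prior = c("unknown", "known"))
class(grid_factor$prior)  # "factor"

# Option 1: convert inside the fit function
param <- grid_factor[1, ]
prior_chr <- as.character(param$prior)  # "unknown"

# Option 2: keep plain character columns when building the grid
grid_chr <- expand.grid(lambda = 1:2, prior = c("unknown", "known"),
                        stringsAsFactors = FALSE)
class(grid_chr$prior)  # "character"
```

Option 1 is the more robust choice, since it still works if the grid is later rebuilt elsewhere with the default settings.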

Questions of xgboost with R

I used xgboost to do logistic regression. I followed the steps from, but I ran into two problems. The datasets are found here.
First, when I run the following code:
bst <- xgboost(data = sparse_matrix, label = output_vector,nrounds = 39,param)
I get:
[0]train-rmse:0.350006
[1]train-rmse:0.245008
[2]train-rmse:0.171518
[3]train-rmse:0.120065
[4]train-rmse:0.084049
[5]train-rmse:0.058835
[6]train-rmse:0.041185
[7]train-rmse:0.028830
[8]train-rmse:0.020182
[9]train-rmse:0.014128
[10]train-rmse:0.009890
[11]train-rmse:0.006923
[12]train-rmse:0.004846
[13]train-rmse:0.003392
[14]train-rmse:0.002375
[15]train-rmse:0.001662
[16]train-rmse:0.001164
[17]train-rmse:0.000815
[18]train-rmse:0.000570
[19]train-rmse:0.000399
[20]train-rmse:0.000279
[21]train-rmse:0.000196
[22]train-rmse:0.000137
[23]train-rmse:0.000096
[24]train-rmse:0.000067
[25]train-rmse:0.000047
[26]train-rmse:0.000033
[27]train-rmse:0.000023
[28]train-rmse:0.000016
[29]train-rmse:0.000011
[30]train-rmse:0.000008
[31]train-rmse:0.000006
[32]train-rmse:0.000004
[33]train-rmse:0.000003
[34]train-rmse:0.000002
[35]train-rmse:0.000001
[36]train-rmse:0.000001
[37]train-rmse:0.000001
[38]train-rmse:0.000000
train-rmse ends up exactly 0! Is that normal? As far as I know, train-rmse usually can't reach 0. So where is my problem?
Second, when I run
importance <- xgb.importance(sparse_matrix@Dimnames[[2]], model = bst)
I get an error:
Error in eval(expr, envir, enclos) : object 'Yes' not found.
I don't know what it means; maybe the first problem causes the second one.
library(data.table)
train_x<-fread("train_x.csv")
str(train_x)
train_y<-fread("train_y.csv")
str(train_y)
train<-merge(train_y,train_x,by="uid")
train$uid<-NULL
test<-fread("test_x.csv")
require(xgboost)
require(Matrix)
sparse_matrix <- sparse.model.matrix(y~.-1, data = train)
head(sparse_matrix)
output_vector = train[,y] == "Marked"
param <- list(objective = "binary:logistic", booster = "gblinear",
nthread = 2, alpha = 0.0001,max.depth = 4,eta=1,lambda = 1)
bst <- xgboost(data = sparse_matrix, label = output_vector,nrounds = 39,param)
importance <- xgb.importance(sparse_matrix@Dimnames[[2]], model = bst)
I ran into the same problem (Error in eval(expr, envir, enclos) : object 'Yes' not found.) and the reason was the following:
I tried to do
dt = data.table(x = runif(10), y = 1:10, z = 1:10)
label = as.logical(dt$z)
train = dt[, z := NULL]
trainAsMatrix = as.matrix(train)
label = as.matrix(label)
bst <- xgboost(data = trainAsMatrix, label = label, max.depth = 8,
eta = 0.3, nthread = 2, nround = 50, objective = "reg:linear")
bst$featureNames = names(train)
xgb.importance(model = bst)
The problem comes from the line
label = as.logical(dt$z)
I had this line in there because the last time I used xgboost, I wanted to predict a categorical variable. Now, since I want to do regression, it should read:
label = dt$z
Maybe something similar causes the problem in your case?
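It is worth seeing what `as.logical(dt$z)` actually produced in that example: every non-zero number converts to `TRUE`, so `z = 1:10` turns into a label with a single value and no variation at all, which ties in with the zero-variation observation in the next answer. A base-R check:

```r
z <- 1:10

label_bad <- as.logical(z)  # every non-zero number becomes TRUE
label_ok  <- z              # keep the numeric target for regression

length(unique(label_bad))  # 1: the label is constant
length(unique(label_ok))   # 10
```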
Perhaps this is of some help. I often get the same error when the labels have zero variation. I'm using the current CRAN version of xgboost, which is already somewhat old (0.4.4). xgb.train happily accepts such labels (showing an AUC of 0.50), but the error then appears when calling xgb.importance.
Cheers
Otto
[0] train-auc:0.500000 validate-auc:0.500000
[1] train-auc:0.500000 validate-auc:0.500000
[2] train-auc:0.500000 validate-auc:0.500000
[3] train-auc:0.500000 validate-auc:0.500000
[4] train-auc:0.500000 validate-auc:0.500000
[1] "XGB error: Error in eval(expr, envir, enclos): object 'Yes' not found\n"
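Since training accepts a constant label and only `xgb.importance` fails later, a cheap guard before training can surface the problem earlier. A minimal sketch; `check_label_variation()` is a hypothetical helper, not part of xgboost:

```r
# Hypothetical helper (not part of xgboost): stop early on a constant label
check_label_variation <- function(label) {
  n_distinct <- length(unique(label[!is.na(label)]))
  if (n_distinct < 2) {
    stop("label has ", n_distinct, " distinct value(s); nothing to learn")
  }
  invisible(TRUE)
}

check_label_variation(c(0, 1, 1, 0))  # passes silently

res <- tryCatch(check_label_variation(rep(1, 5)),
                error = function(e) conditionMessage(e))
print(res)
```

Calling it on the label vector right before `xgboost()`/`xgb.train()` turns the later, cryptic "object 'Yes' not found" into an explicit message.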
