Max-min Markov blanket feature selection: R code error - r

I am using the max-min Markov blanket algorithm from the MXM package for variable selection in R. My code is as follows:
library(MXM)
dataset = read.table('data.txt', na.strings = c("", "NA"), sep = '\t', header = FALSE)
dataset = dataset[, colSums(is.na(dataset)) == 0]
D = as.matrix(as.data.frame(lapply(dataset, as.numeric)))
target = read.table('class_num.txt')
target = c(target)
aa = mmmb(target, D, max_k = 3, threshold = 0.05, test = "testIndFisher", user_test = NULL, robust = FALSE, ncores = 2)
I am getting the following error:
Error in unique(as.numeric(target)) :
(list) object cannot be coerced to type 'double'
As the mmmb manual page requires, my dataset D is a matrix of continuous values with dimensions 95933 x 85, and my target is a vector of 0/1 values of length 95933.
Can someone help me understand the error?

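I found the solution:
The target was a list rather than a numeric vector: read.table returns a data frame, and c() applied to a data frame yields a list, which as.numeric cannot coerce. The following line solved the issue: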
target = array(as.numeric(unlist(target)))
Thanks!
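The coercion problem described above can be reproduced without the original files. The sketch below uses a small inline stand-in for class_num.txt (the four labels are made up for illustration) and only base R; it is meant to show why c() on the result of read.table gives a list, not to reproduce the mmmb call itself.

```r
# Hypothetical stand-in for class_num.txt: one 0/1 label per line
target_df <- read.table(text = "0\n1\n1\n0")

# c() on a data frame returns a list of its columns, not a vector
target_list <- c(target_df)
is.list(target_list)

# as.numeric() cannot coerce such a list, which is the reported error
failed <- inherits(try(as.numeric(target_list), silent = TRUE), "try-error")

# Flattening the data frame first gives a plain numeric vector
target_vec <- as.numeric(unlist(target_df))
is.numeric(target_vec)
```

Extracting the single column directly, for example with target_df[[1]], should give the same values.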

Related

Error in Sensobol function, object 'Y_A' not found

I am trying to calculate Sobol indices using the sensobol package in R.
First I build a Sobol matrix:
input.df <- sobol_matrices(N = 50, params = c('x1','x2','x3','x4','x5'))
Then I calculate 'y' from this matrix:
y <- output.df$MY
Then I try to calculate the Sobol indices as follows:
MY.sobol <- sobol_indices(matrices = input.df, Y = y, N = 50, params = c('x1','x2','x3','x4','x5'))
But RStudio returned the following error:
Error in sobol_boot(d = d, N = N, params = params, first = first, total = total, :
object 'Y_A' not found
I am wondering what is causing this error.
Thank you.

ParBayesianOptimization suddenly fails while logging epoch results

I am currently using the R package ParBayesianOptimization to tune parameters for ML methods. While searching for an optimal cost parameter for the svmLinear2 model (contained in caret), the optimization stopped with a sudden error after successfully completing 15 iterations.
Here is the error traceback:
Error in rbindlist(l, use.names, fill, idcol) :
Item 2 has 9 columns, inconsistent with item 1 which has 10 columns. To fill missing columns use fill=TRUE.
7. rbindlist(l, use.names, fill, idcol)
6. rbind(deparse.level, ...)
5. rbind(scoreSummary, data.table(Epoch = rep(Epoch, nrow(NewResults)),
   Iteration = 1:nrow(NewResults) + nrow(scoreSummary), inBounds = rep(TRUE,
   nrow(NewResults)), NewResults))
4. addIterations(optObj, otherHalting = otherHalting, iters.n = iters.n,
   iters.k = iters.k, parallel = parallel, plotProgress = plotProgress,
   errorHandling = errorHandling, saveFile = saveFile, verbose = verbose,
   ...)
3. ParBayesianOptimization::bayesOpt(FUN = ...
So the data tables that store each iteration's summary information suddenly differ in their number of columns. Is this a known bug in the ParBayesianOptimization package? Has anyone else encountered a similar problem? Did you find a fix other than rewriting the addIterations function to fill the missing columns?
EDIT: I don't have an explanation for why the error suddenly occurs after a number of successful iterations. However, the issue has recurred with svmLinear and svmRadial. I was able to reproduce a similar case with the same error on the iris dataset:
library(data.table)
library(caret)
library(ParBayesianOptimization)
set.seed(1234)
bayes.opt.bounds = list()
bayes.opt.bounds[["svmRadial"]] = list(C = c(0,1000),
                                       sigma = c(0,500))
svmRadScore = function(...){
  grid = data.frame(...)
  mod = caret::train(Species~., data=iris, method = "svmRadial",
                     trControl = trainControl(method = "repeatedcv",
                                              number = 7, repeats = 5),
                     tuneGrid = grid)
  return(list(Score = caret::getTrainPerf(mod)[, "TrainAccuracy"], Pred = 0))
}
bayes.create.grid.par = function(bounds, n = 10){
  grid = data.table()
  params = names(bounds)
  grid[, c(params) := lapply(bounds, FUN = function(minMax){
    return(runif(n, minMax[1], minMax[2]))}
  )]
  return(grid)
}
prior.grid.rad = bayes.create.grid.par(bayes.opt.bounds[["svmRadial"]])
svmRadOpt = ParBayesianOptimization::bayesOpt(FUN = svmRadScore,
                                              bounds = bayes.opt.bounds[["svmRadial"]],
                                              initGrid = prior.grid.rad,
                                              iters.n = 100,
                                              acq = "ucb", kappa = 1, parallel = FALSE, plotProgress = TRUE)
Using this example, the error occurred on the 9th epoch.
Thanks!
It appears that the scoring function returned NA in place of an accuracy measure, which caused the error further downstream. The library's creator describes this at
https://github.com/AnotherSamWilson/ParBayesianOptimization/issues/33.
It looks like the SVM tried a cost of 0 during the 9th epoch. Given the optimization problem the SVM solves, the cost parameter should be positive.
According to AnotherSamWilson, this error can commonly occur when the scoring function "returns something unexpected".
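One way to guard against this, sketched here in base R, is to wrap the scoring function so that it always returns a single finite Score, and to keep the lower bounds strictly positive. The helper make_safe_score, the fallback value of 0, the toy scorer, and the 1e-3 lower bounds are all assumptions made for illustration; they are not part of ParBayesianOptimization or caret.

```r
# Hypothetical helper: wrap a scoring function so Score is always one finite number
make_safe_score <- function(score_fun, fallback = 0) {
  function(...) {
    # Treat an error inside the scorer like a missing score
    res <- tryCatch(score_fun(...), error = function(e) list(Score = NA_real_))
    if (length(res$Score) != 1 || !is.finite(res$Score)) {
      res$Score <- fallback  # assumed "worst" value for an accuracy-like score
    }
    res
  }
}

# Toy scorer standing in for svmRadScore: yields NA when the cost is 0
toy_score <- function(C, sigma) {
  list(Score = if (C > 0) 1 / (1 + sigma) else NA_real_, Pred = 0)
}

safe_toy <- make_safe_score(toy_score)
bad_case <- safe_toy(C = 0, sigma = 1)$Score    # NA replaced by the fallback
good_case <- safe_toy(C = 10, sigma = 1)$Score  # regular score passes through

# Strictly positive lower bounds keep C = 0 out of the search space entirely
bounds <- list(C = c(1e-3, 1000), sigma = c(1e-3, 500))
```

Whether a fallback score or tighter bounds is the better choice depends on the model; tightening the bounds removes the invalid region, while the wrapper only hides failures from the optimizer.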

Problem with Over- and Under-Sampling with ROSE in R

I have a dataset for classifying won cases (14399) versus lost cases (8677). The dataset has 912 predictor variables.
I am trying to oversample the lost cases in order to reach almost the same number as the won cases (so having 14399 cases for each of the won and lost cases).
TARGET is the column with lost (0) and won (1) cases:
table(dat_train$TARGET)
0 1
8677 14399
Now I am trying to balance them using ovun.sample from the ROSE package:
dat_train_bal <- ovun.sample(dat_train$TARGET~., data = dat_train, p=0.5, seed = 1, method = "over")
I get this error:
Error in parse(text = x, keep.source = FALSE) :
<text>:1:17538: unexpected symbol
1: PPER_409030143+BP_RESPPER_9639064007+BP_RESPPER_7459058285+BP_RESPPER_9339059882+BP_RESPPER_9339058664+BP_RESPPER_5209073603+BP_RESPPER_5209061378+CRM_CURRPH_Initiation+Quotation+CRM_CURRPH_Ne
Can anyone help?
Thanks :-)
I reproduced your code on a mock example and found an error in your formula: dat_train$TARGET ~ . should be written as TARGET ~ . because the data argument already supplies the data frame.
dframe <- tibble::tibble(
  val = sample(c("a", "b"), size = 100, replace = TRUE, prob = c(.1, .9)),
  xvar = rnorm(100)
)
# Use oversampling
dframe_os <- ROSE::ovun.sample(formula = val ~ ., data = dframe, p=0.5, seed = 1, method = "over")
table(dframe_os$data$val)

What to use for ancestry adjustment using eigenstrat

I'm confused by the eigenstrat function. Should I use the output top k eigenvectors (an n x k matrix) in the subsequent regressions for ancestry adjustment? The papers describe the adjustment as using 4 principal components, but solving for these makes the matrices non-conformable (assuming I follow the input genofile layout, where the rows are the p SNPs and the columns are the n subjects). Basically, I'm confused about how "eigenvectors" and "principal components" are used in papers and documentation; I think people use the terms interchangeably.
I tried following the documentation, so I used the output eigenvectors in the subsequent regressions.
(basically what's in the documentation)
write.table(t(genotype), file = "riskalleles.txt", quote = FALSE, sep = "", row.names = FALSE, col.names = FALSE)
pca.eg <- eigenstrat(genoFile = "riskalleles.txt", outFile.Robj = "eigenstrat.result.list", outFile.txt = "riskalleles.result.txt", rm.marker.index = NULL, rm.subject.index = NULL, miss.val = 9, num.splits = 10, topK = NULL, signt.eigen.level = 0.01, signal.outlier = FALSE, iter.outlier = 5, sigma.thresh = 6)
ev <- pca.eg$topK.eigenvectors
gamma <- t(ev)%*%genotype
adjusted <- t(genotype - ev%*%gamma)
(solving for PCs using formula from my multivariate class)
pc <- t(genotype)%*%ev

(list) object cannot be coerced to type 'double'

I just started using the SIS package in R. I used its test data set and got an error. I am quite sure there is a problem.
install.packages("SIS", dependencies = TRUE)
library(SIS)
data(prostate.test)
I then try to use the function SIS, which has the following signature:
SIS(x, y, family = c("gaussian","binomial","poisson","cox"),
penalty=c("SCAD","MCP","lasso"), concavity.parameter =
switch(penalty, SCAD=3.7, 3), tune = c("cv","aic","bic","ebic"),
nfolds = 10, type.measure = c("deviance","class","auc","mse",
"mae"), gamma.ebic = 1, nsis = NULL, iter = TRUE, iter.max =
ifelse(greedy==FALSE,10,floor(nrow(x)/log(nrow(x)))), varISIS =
c("vanilla","aggr","cons"), perm = FALSE, q = 1, greedy = FALSE,
greedy.size = 1, seed = 0, standardize = TRUE)
where x is the n x p design matrix without an intercept (each row is an observation vector) and y is the response vector of dimension n x 1. I format the test data as follows (the last column is the response):
prostate.test->k
k[,-dim(k)[2]]->k1
k[,dim(k)[2]]->k11
SIS(k1,k11)
Then I get:
Error in storage.mode(x) = "numeric" :
(list) object cannot be coerced to type 'double'
Could somebody tell me how I can avoid that error?
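Since x is described above as a design matrix, one thing worth checking is whether k1 is still a data frame, because a data frame is a list internally and cannot have its storage mode set to numeric. The sketch below uses a made-up three-row data frame as a stand-in for prostate.test (assuming that object is loaded as a data frame) and converts it with base R before any call to SIS.

```r
# Toy stand-in for prostate.test: a data frame whose last column is the response
k <- data.frame(g1 = c(0.1, 0.5, 0.9), g2 = c(1.2, 0.7, 0.3), y = c(0, 1, 1))

p <- ncol(k)
x <- as.matrix(k[, -p])  # numeric design matrix, n x (p - 1)
y <- as.numeric(k[, p])  # numeric response vector of length n

# The assignment that fails on a data frame works on the matrix
storage.mode(x) <- "numeric"
```

With the real data, the converted x and y would then be passed as SIS(x, y).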
