How to evaluate this scheme in RWeka?

The scheme I am trying to evaluate is:
weka.classifiers.meta.AttributeSelectedClassifier -E "weka.attributeSelection.CfsSubsetEval " -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W weka.classifiers.functions.SMOreg -- -C 1.0 -N 0 -I "weka.classifiers.functions.supportVector.RegSMOImproved -L 0.0010 -W 1 -P 1.0E-12 -T 0.0010 -V" -K "weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0"
i.e. I am trying to run an AttributeSelectedClassifier with an SMOreg classifier inside. Every other parameter is the default value of the respective classifier.
So the R code is:
optns <- Weka_control(W = "weka.classifiers.functions.SMOreg")
ASC <- make_Weka_classifier("weka/classifiers/meta/AttributeSelectedClassifier")
model <- ASC(class ~ ., data = as.data.frame(dat), control = optns)
evaluation <- evaluate_Weka_classifier(model, numFolds = 10)
evaluation
When I run the above R code I get this error:
Error in .jcall(evaluation, "D", x, ...) : java.lang.NullPointerException
The error occurs in RWeka's evaluate.R, where it calls the Weka methods "pctCorrect", "pctIncorrect", "pctUnclassified", "kappa", "meanAbsoluteError", "rootMeanSquaredError", "relativeAbsoluteError" and "rootRelativeSquaredError".
I also tried manually specifying the default values using the Weka_control object like so:
optns <- Weka_control(E = "weka.attributeSelection.CfsSubsetEval ",
S = list("weka.attributeSelection.BestFirst", D = 1,N = 5),
W = list("weka.classifiers.functions.SMOreg", "--",
C=1.0, N=0,
I = list("weka.classifiers.functions.supportVector.RegSMOImproved",
L = 0.0010, W=1,P=1.0E-12,T=0.0010,V=TRUE),
K = list("weka.classifiers.functions.supportVector.PolyKernel",
C=250007, E=1.0)))
ASC <- make_Weka_classifier("weka/classifiers/meta/AttributeSelectedClassifier")
model <- ASC(class ~ ., data = as.data.frame(dat), control = optns)
evaluation <- evaluate_Weka_classifier(model, numFolds = 10)
evaluation
and I get this error:
Error in .jcall(classifier, "V", "buildClassifier", instances) :
java.lang.Exception: Can't find class called: weka.classifiers.functions.SMOreg -- -C 1 -N 0 -I weka.classifiers.functions.supportVector.RegSMOImproved -L 0.001 -W 1 -P 1e-12 -T 0.001 -V -K weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1

I tried your example but got a different error (where dat is my own data frame)
Error in model.frame.default(formula = class ~ ., data = dat) :
object is not a matrix
Your error may not be directly related to the syntax of the Weka call; it may instead come from an issue with your path setup.

Related

makeClusterPSOCK in the parallelly package

cl <- parallelly::makeClusterPSOCK(2, autoStop = TRUE)
Error:
Error in system(test_cmd, intern = TRUE, input = input) :
'CreateProcess' failed to run 'C:\Users\xxx~1\ONEDRI~1\DOCUME~1\R\R-40~1.3\bin\x64\Rscript.exe -e "try(suppressWarnings(cat(Sys.getpid(),file=\"C:/Users/LOCAL_~1/Temp/RtmpIP7vSI/worker.rank=1.parallelly.parent=19988.4e147c0a5082.pid\")), silent = TRUE)" -e "file.exists(\"C:/Users/LOCAL_~1/Temp/RtmpIP7vSI/worker.rank=1.parallelly.parent=19988.4e147c0a5082.pid\")"'
I was trying to create a cluster for parallel execution, but the call fails with the error above.

Parsing error when space is present in test event for Custom R Lambda Environment

I've followed this tutorial in setting up R as a custom environment for AWS Lambda:
https://www.r-bloggers.com/2019/07/how-to-use-r-in-aws-lambda/
I've found that I can successfully run code when the configured test event does not include a space. To demonstrate this, here is an example handler function:
handler <- function(key1) {
  print('test')
}
This will run fine when the test event is as follows:
{
"key1": "value1"
}
However, if you were to change the test event to:
{
"key1": "value 1"
}
It returns:
"parse error: premature EOF\\n {\\\"key1\\\":\\\"value\\n (right here) ------^\\n\""
As R runs fine with the first test event, I suspect the issue is associated with the bootstrap and runtime.R files within the layer.
This is the bootstrap file:
#!/bin/sh
while true
do
HEADERS="$(mktemp)"
EVENT_DATA=$(curl -sS -LD "$HEADERS" -X GET "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next")
REQUEST_ID=$(grep -Fi Lambda-Runtime-Aws-Request-Id "$HEADERS" | tr -d '[:space:]' | cut -d: -f2)
RESPONSE=$(/opt/R/bin/Rscript /opt/runtime.R $EVENT_DATA)
RESPONSE_CODE=$?
if [ $RESPONSE_CODE = 0 ]; then
OUT="response"
elif [ $RESPONSE_CODE = 100 ]; then
OUT="error"
fi
curl -X POST "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/$REQUEST_ID/$OUT" -d "$RESPONSE"
done
This is the runtime.R file:
output <- tryCatch(
{
library(jsonlite)
HANDLER <- Sys.getenv("_HANDLER")
args <- commandArgs(trailingOnly = TRUE)
EVENT_DATA <- args[1]
HANDLER_split <- strsplit(HANDLER, ".", fixed = TRUE)[[1]]
file_name <- paste0(HANDLER_split[1], ".R")
function_name <- HANDLER_split[2]
source(file_name)
params <- fromJSON(EVENT_DATA)
output <- tryCatch(
list(out = do.call(function_name, params), quit_status = 0),
error = function(e) {
list(out = e$message, quit_status = 100)
}
)
list(out = output$out, quit_status = output$quit_status)
},
error = function(e) {
list(out = e$message, quit_status = 100)
}
)
output$out
quit(status = output$quit_status)
Any help would be much appreciated, even if it's just tips on how to debug this issue. Many thanks.
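One thing that may be worth checking, given that the error text stops exactly at the space: the bootstrap script passes $EVENT_DATA to Rscript without quotes, so the shell would split the JSON on whitespace and runtime.R would only see the first fragment in args[1]. The sketch below imitates that split in base R. It assumes the event arrives as compact JSON, and the rejoin step is a hypothetical workaround, not part of the tutorial.

```r
# Event payload as the runtime API might return it (compact JSON, assumed here).
event <- '{"key1":"value 1"}'

# An unquoted $EVENT_DATA in a shell script is split on whitespace, so Rscript
# would receive several arguments instead of one. strsplit() imitates that.
args <- strsplit(event, "[[:space:]]+")[[1]]

# What runtime.R currently reads: only the first fragment, which is not valid JSON.
first_arg <- args[1]

# Hypothetical workaround inside runtime.R: glue the fragments back together
# before handing the string to fromJSON().
rejoined <- paste(args, collapse = " ")

print(first_arg)
print(rejoined)
```

Quoting the variable in the bootstrap script ("$EVENT_DATA") would avoid the split in the first place. The rejoin above cannot restore repeated spaces, tabs or newlines inside values, so it is only a fallback.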

How to get an R script to run in parallel on a mosix cluster?

I am trying to re-create the example given in part 3 of this paper, which performs a simple calculation across several instances managed by a cluster. The main calculation happens in this script, "sim.R":
# sim.R
# If the "batch" package has not been installed, run the line below:
# install.packages("batch", repos = "http://cran.cnr.Berkeley.edu")
seed <- 1000
n <- 50
nsim <- 10000
mu <- c(0, 0.5)
sd <- c(1, 1)
library("batch")
parseCommandArgs()
set.seed(seed)
pvalue <- rep(0,nsim)
for(i in 1:nsim) {
X <- rnorm(n = n, mean = mu[1], sd = sd[1])
Y <- rnorm(n = n, mean = mu[2], sd = sd[2])
pvalue[i] <- t.test(X, Y)$p.value
}
power <- mean(pvalue <= 0.05)
out <- data.frame(seed = seed, nsim = nsim, n = n,
mu = paste(mu, collapse = ","),
sd = paste(sd, collapse = ","), power = power)
outfilename <- paste("res", seed, ".csv", sep = "")
print(out)
write.csv(out, outfilename, row.names = FALSE)
To run multiple, parallel instances of sim.R, there is another script "param-sim.R"
library("batch")
seed <- 1000
for(i in 1:10) {
seed <- rbatch("sim.R", seed = seed, n = 25, mu = c(0, i / 10))
rbatch.local.run() # My understanding from the linked paper is that this line will do nothing if the script is run on a mosix cluster and not locally.
}
To run this on a mosix cluster, I use the following command from the terminal:
R --vanilla --args RBATCH mosix < param-sim.R
I would expect this to generate 10 .csv files, named res1000.csv through res1009.csv. Instead, here's what I get (I am running this command in an Ubuntu environment):
$ R --vanilla --args RBATCH mosix < param-sim.R
R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library("batch")
> seed <- 1000
> for(i in 1:10) {
+ seed <- rbatch("sim.R", seed = seed, n = 25, mu = c(0, i / 10))
+ rbatch.local.run()
+ }
nohup mosrun -e -b -q R --vanilla --args seed 1000 n 25 mu "c(0,0.1)" < sim.R > sim.Rout1000 &
rbatch.local.run: no commands have been batched.
nohup: redirecting stderr to stdout
nohup mosrun -e -b -q R --vanilla --args seed 1001 n 25 mu "c(0,0.2)" < sim.R > sim.Rout1001 &
nohup: redirecting stderr to stdout
rbatch.local.run: no commands have been batched.
nohup mosrun -e -b -q R --vanilla --args seed 1002 n 25 mu "c(0,0.3)" < sim.R > sim.Rout1002 &
rbatch.local.run: no commands have been batched.
nohup mosrun -e -b -q R --vanilla --args seed 1003 n 25 mu "c(0,0.4)" < sim.R > sim.Rout1003 & nohup:
redirecting stderr to stdout
rbatch.local.run: no commands have been batched.
nohup mosrun -e -b -q R --vanilla --args seed 1004 n 25 mu "c(0,0.5)" < sim.R > sim.Rout1004 &
nohup: redirecting stderr to stdout
rbatch.local.run: no commands have been batched.
nohup mosrun -e -b -q R --vanilla --args seed 1005 n 25 mu "c(0,0.6)" < sim.R > sim.Rout1005 &
nohup: redirecting stderr to stdout
rbatch.local.run: no commands have been batched.
nohup mosrun -e -b -q R --vanilla --args seed 1006 n 25 mu "c(0,0.7)" < sim.R > sim.Rout1006 &
nohup: redirecting stderr to stdout
rbatch.local.run: no commands have been batched.
nohup mosrun -e -b -q R --vanilla --args seed 1007 n 25 mu "c(0,0.8)" < sim.R > sim.Rout1007 &
nohup: redirecting stderr to stdout
rbatch.local.run: no commands have been batched.
nohup mosrun -e -b -q R --vanilla --args seed 1008 n 25 mu "c(0,0.9)" < sim.R > sim.Rout1008 &
nohup: redirecting stderr to stdout
rbatch.local.run: no commands have been batched.
nohup mosrun -e -b -q R --vanilla --args seed 1009 n 25 mu "c(0,1)" < sim.R > sim.Rout1009 &
nohup: redirecting stderr to stdout
rbatch.local.run: no commands have been batched.
>
nohup: redirecting stderr to stdout
No .csv files are generated, and each of the output files (i.e. sim.Rout1000) contains identical information:
mosrun - MOSIX Version 4.3.4
Usage: mosrun [location-options] [program-options] {program} [args]...
mosrun -S{maxjobs} [location-options] [program-options]
{commands-file}[,{failed-file}]
mosrun -R{filename} [-O{fd=filename}][,{fd2=fn2}]... [location-options]
mosrun -I{filename}
Location options - Node specification:
-b try to start on 'best' available node
-r{hostname} start on given host
-{a.b.c.d} start on the node of given IP address
-{n} start on given logical node number
-h start on home node
Other location options:
-F do not fail if requested node is not available
-L lock, disallow automatic migration
-l unlock, allowing automatic migration
-g disallow automatic freezing
-G allow automatic freezing
-m{mb} try to run only on nodes with >= mb free memory
-A {minutes} auto checkpoint interval in minutes (0-10000000)
-N {max} max. # of checkpoints before cycle (0-10000000)
Program options:
-e unsupported system calls produce -1/errno=ENOSYS
-w as -e, but print warnings for unsupported calls
-u unsupported system calls kill mosrun (default)
-d {0-10000} specify decay rate per second in parts of 10000
-c consider program as a pure CPU job (ignore I/O)
-n reverse '-c', so to include I/O considerations
-C{filename} test given checkpoint file
-X{/directory} declare private directory
-z program arguments start at argument #0 (not #1)
This leads me to think that the program never ran or entered a cluster queue. I have also checked the system processes with the "top" command and found nothing. For the record, I have been able to run simple C++ programs successfully on a mosix cluster.
Have I missed a key detail to allow this program to work?

R plotting has issues with producing text/number labels

I am having issues with my general R plotting functions. From a fresh R restart, and without loading any packages, I run this:
png(file ="mtcars.png", width = 1600, height = 1600, units = "px", res = 300)
hist(mtcars$mpg, breaks = 'FD', col = 'blue', xlab = 'MPG', main = 'MPG of mtcars Dataset')
dev.off()
And the result is what you see in this post. There are weird squares where the labels should be. I am running R in a new docker image and have no idea what I forgot to install. Somebody help!
My dockerfile contains:
RUN conda install -y r-base && conda install -c bioconda -y r-devtools
RUN \
/opt/conda/lib/R/bin/R -e "getOption(\"unzip\") ; options(repos = list(CRAN=\"http://cran.rstudio.com/\")) ; Sys.getenv(\"TAR\") ; options(unzip = \"/opt/conda/bin/unzip\") ; \
Sys.setenv(TAR = \"/bin/tar\"); library(devtools) ; install.packages(c('Seurat','XML','ggplot2')) ; \
devtools::install_github(repo = 'satijalab/seurat-wrappers',quiet=T) ; \
if (!requireNamespace(\"BiocManager\", quietly = TRUE)) { \
install.packages(c(\"BiocManager\"), quiet = T) \
} ; \
BiocManager::install('monocle') ; devtools::install_github('hms-dbmi/conos')"

R suppress console output of a system or shell command

I have a Windows batch file that I call from R using the shell() command. The batch file does some calculations and writes them to disk, but also to the screen. I'm only interested in the disk output. I cannot change the batch file.
The batchfile might be something silly like:
@echo off
echo 1 + 2
@echo 1 + 2 > C:\TEMP\batchoutput.txt
exit
I tried
shell("batchfile.bat", invisible = TRUE)
1 + 2
shell("batchfile.bat", show.output.on.console = FALSE)
Error in system(cmd, intern = intern, wait = wait | intern, show.output.on.console = wait, :
formal argument "show.output.on.console" matched by multiple actual arguments
system("batchfile.bat", invisible = T)
1 + 2
system("batchfile.bat", show.output.on.console = F)
Warning message:
running command 'C:\TEMP\batchfile.bat' had status 1
Is there a way of suppressing the console output in R?
options(warn = -1)
shell("Your command")
options(warn = 0)
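Note that options(warn = -1) only hides R warnings; it does not change what the command itself prints. A minimal sketch of another route, assuming the goal is to discard everything the command writes to the screen, is base R's system2() with stdout and stderr set to FALSE. The helper name run_quietly is made up for illustration, and the demonstration uses a child Rscript process so that it runs on any platform.

```r
# Hypothetical helper: run a command and discard whatever it prints.
run_quietly <- function(command, args = character()) {
  # stdout = FALSE and stderr = FALSE tell system2() to discard both streams.
  # The exit status of the command is returned.
  system2(command, args, stdout = FALSE, stderr = FALSE)
}

# Portable demonstration: a child R process that would normally print "1 + 2".
rscript <- file.path(R.home("bin"), "Rscript")
status <- run_quietly(rscript, c("-e", shQuote("cat('1 + 2')")))

# Only the exit status reaches the calling session.
print(status)
```

For the batch file from the question, the call would presumably look like run_quietly("cmd", c("/c", "batchfile.bat")) on Windows. The file on disk is still written, since only the console streams are discarded.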
