I am asking for help designing a specific R6 class, and in particular a run method that would run processes in parallel. Note that none of the code examples below has been tested and they most likely contain errors. They are only here to convey how I am thinking about implementing the parallelization of the jobs within my R6 class.
I built an R6 class called Input that is a wrapper for a simulation platform. The goal of the class is to ease the writing of individual sets of parameters for inputs to the simulation platform. It might look like
input = Input$new()
input$set_parameter_x(...)
input$set_parameter_y(...)
I would like the class to be able to run the simulations directly (with the run method), using a defined number of threads, but I am not sure how best to achieve that. Note that each process started by run is single-threaded. However, I would like each process started by run to be able to run in parallel with the processes started by calls to run made from other instances of the class Input. Something like
input_1$run(executable = "/path/to/executable", maxNbThreads = 4)
input_2$run(executable = "/path/to/executable", maxNbThreads = 4)
input_3$run(executable = "/path/to/executable", maxNbThreads = 4)
input_4$run(executable = "/path/to/executable", maxNbThreads = 4)
would all run in parallel. I don't know much about parallelization in R (hence my question), but of course one could do
foreach (input_index = 1:nbInputs) %dopar%
{
require(myPackage)
input = Input$new()
input$set_parameter_x(...)
input$set_parameter_y(...)
input$run(executable = "/path/to/executable", maxNbThreads = 1)
}
instead, but I would like the work of parallelizing the processes to be handled by the R6 class Input and not by the user of this class.
I am thinking about having a vector called runningThreads shared among all instances of the class (a static attribute of the class) using an environment (as explained here). runningThreads would contain the pids of the currently running jobs. Then, every time the run method is called, the user would specify the maximum number of threads (maxNbThreads) and, in a while loop, run would remove from runningThreads the pids of jobs that are no longer active until the length of runningThreads is smaller than the maxNbThreads argument. run would then start the job and add its pid to runningThreads. The public method run (and the private methods that run would call) might look something like
isAThreadAvailable = function(maxNbThreads)
{
# Keep only the pids of jobs that are still running
# (also safe when runningThreads is empty)
stillRunning = vapply(private$runningThreads, isThreadRunning, logical(1))
private$runningThreads = private$runningThreads[stillRunning]
return (length(private$runningThreads) < maxNbThreads)
}
isThreadRunning = function(thread)
{
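# "kill -0" sends no signal; an exit status of 0 means the pid exists
return (system(paste("kill -0", thread), ignore.stdout=TRUE, ignore.stderr=TRUE) == 0)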
}
run = function(exec = defaultExecutable, maxNbThreads = 1, sleepTimeInSec = 2)
{
stopifnot(maxNbThreads > 0)
if (maxNbThreads == 1)
{
# Then just run it and wait for it to end
system(paste(exec, paste(private$data, collapse=" ")))
} else
{
while (!isAThreadAvailable(maxNbThreads))
{
Sys.sleep(sleepTimeInSec)
}
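# Start the job in the background and capture its pid ($!); the job's output
# is discarded so that system() does not wait for the job to finish
newThread = system(paste(exec, paste(private$data, collapse=" "), "> /dev/null 2>&1 & echo $!"), intern=TRUE)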
private$runningThreads = c(private$runningThreads, newThread)
}
}
Does this sound like a good approach? There are probably packages that could ease building my R6 class. Would you be so kind as to point me to these packages and maybe show a small example of how one could be used for the run method in my R6 class?
Thanks to the processx package that @HongOoi mentioned in a comment, I could implement what I needed. I did it in a style very similar to what I had designed before, but processx made everything much easier.
Here is the code for the run method:
isAThreadAvailable = function(maxNbThreads)
{
while (private$shared$isOtherProcessCheckingThreads)
{
Sys.sleep(0.001)
}
private$shared$isOtherProcessCheckingThreads = TRUE
thread_index = 1
while (thread_index <= length(private$shared$runningThreads))
{
thread = private$shared$runningThreads[[thread_index]]
if (!thread$is_alive())
{
private$shared$runningThreads = private$shared$runningThreads[-thread_index]
} else
{
thread_index = thread_index + 1
}
}
r = length(private$shared$runningThreads) < maxNbThreads
private$shared$isOtherProcessCheckingThreads = FALSE
return(r)
},
run = function(exec = "SimBit", maxNbThreads = 1, sleepTimeInSec = 1, waitEndOfThread = FALSE)
{
stopifnot(maxNbThreads > 0)
stopifnot(sleepTimeInSec >= 0)
while (!self$isAThreadAvailable(maxNbThreads))
{
Sys.sleep(sleepTimeInSec)
}
newThread = processx::process$new(exec, paste(private$data, collapse=" "))
private$shared$runningThreads = c(private$shared$runningThreads, newThread)
if (waitEndOfThread)
{
while (newThread$is_alive())
{
Sys.sleep(sleepTimeInSec)
}
}
}
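The run method above relies on private$shared being the same object in every instance, but its declaration is not shown. A minimal sketch of one way to get that, assuming the usual R6 behaviour that a reference object (here an environment) placed directly in the class definition is shared by all instances. The class name JobRegistry and its methods are made up for illustration:

```r
library(R6)

# Hypothetical class: `shared` is an environment created once, when the
# class is defined, so every instance refers to the same object.
JobRegistry <- R6Class("JobRegistry",
  public = list(
    register = function(job) {
      private$shared$running <- c(private$shared$running, job)
      invisible(self)
    },
    count = function() {
      length(private$shared$running)
    }
  ),
  private = list(
    shared = {
      e <- new.env()
      e$running <- list()
      e
    }
  )
)

a <- JobRegistry$new()
b <- JobRegistry$new()
a$register("job-1")
b$count()  # b sees the job registered through a
```

In the Input class, shared$runningThreads would hold the processx process objects instead of strings.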
I can't find in the docs how to write a negated condition like the one below:
if !($some_var) {
... enter here if $some_var doesn't exist or is empty
}
}
I know that I can check that the variable exists using an if statement:
if ($some_var) {
...
}
But I can't find how to write an "if not" statement.
Is it possible in NGINX?
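nginx has no negation operator for a bare variable, but you can compare the variable with an empty string (note the space after if, which nginx requires):
if ($var = "") {
....
}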
Is there something in R (either a package or a base idiom) that is like an Option as found in Scala and other languages (see tag optional for details)? Specifically, I'm looking for an object that can:
signify the absence of a value but easily
hold attributes
return a default value in the face of having no contained value without requiring that the result of the default value be calculated unless it is actually needed
I'm sure there are a lot of other nice characteristics of Options that I haven't fully recognized as I'm relatively new to the idiom. Any answer that can provide more than the above listed features gets bonus points, especially if the additional features can be described well.
I tried writing a poor substitute using an R6 class (below). Anything that works better or is more idiomatically aligned with R would be greatly appreciated.
library(R6)
Option <- R6Class("Option",
public = list(
initialize = function(value=NULL) {
self$value <- value
}
,get = function() {
return(self$value)
}
,set = function(value) {
self$value <- value
return(value)
}
,getOrElse = function(...) {
if(self$isDefined()) {
return(self$value)
} else {
return(eval(...))
}
}
,isDefined = function() {
return(!all(is.null(self$value)) && !all(is.na(self$value)))
}
, value = NULL
)
,private = list()
,active = list()
) #end Option
Example:
bob <- Option$new()
bob$isDefined() == FALSE
bob$getOrElse("a") == "a"
bob$getOrElse({Sys.sleep(2);"b"})=="b"
bob$set(value = "a")
bob$isDefined() == TRUE
bob$getOrElse({Sys.sleep(2);"b"})=="a"
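For comparison, here is a base-R sketch of the same idea that relies on R's lazy evaluation of function arguments, so the default is only computed when it is needed. The names some, none and get_or_else are hypothetical and not taken from any package:

```r
# Hypothetical constructors (not from any package).
some <- function(value) structure(list(defined = TRUE, value = value), class = "option")
none <- function() structure(list(defined = FALSE, value = NULL), class = "option")

# `default` is a promise, so it is evaluated only when the option is empty.
get_or_else <- function(opt, default) {
  if (opt$defined) opt$value else default
}

x <- none()
attr(x, "reason") <- "not measured"  # an empty option can still carry attributes

get_or_else(some("a"), stop("never evaluated"))  # the default is not touched
get_or_else(x, "fallback")                       # the default is used
```

Unlike NULL, the empty option here is an ordinary list, which is why it can hold attributes.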
The following is the data to be passed:
get.member.x = function() {
return( list(info.file='x') )
}
get.member.y = function() {
return( list(info.file='y') )
}
I want to use the data in a function. I can pass these data in two ways: 1. Global variables 2. Argument passing.
This is how to pass the data using global variables:
test = function() {
print(get()$info.file)
}
main3 = function(){
get <<- get.member.x
test()
get <<- get.member.y
test()
}
This is how to pass the data using argument passing:
test2 = function(get) {
print(get()$info.file)
}
main2 = function(){
get = get.member.x
test2(get)
get = get.member.y
test2(get)
}
The result of executing the code is here:
> main2()
[1] "x"
[1] "y"
> main3()
[1] "x"
[1] "y"
Now, I wonder if there is any better alternative than these two ways to pass data. Both methods have some disadvantages.
Using global variables is dangerous, because it has side effects on other parts of the code.
Using argument passing is safe, but it clutters the function arguments throughout the code. In every function where I need these data members, I will have to pass the data as arguments through both the calling and the called functions.
If there were a way to encapsulate data members in objects, we could overcome these two issues.
Is there any better alternative way of solving this problem? Or do you think that the current way is better than encapsulating the data in objects?
Update:
What do I try to do with this?
My intention is to change the behaviour of functions without changing the code of the functions such that common code is reused. Consider the following code:
test.duplicate = function() {
result = common.fun.1()
result = common.fun.2()
}
common.fun.1 = function() {
# common steps
result = different.fun.1()
# common steps
}
common.fun.2 = function() {
# common steps
result = different.fun.2()
# common steps
}
Above, common.fun.1 and common.fun.2 have a lot of common code. But since one line is different, I have duplicated all the common code.
To prevent the duplication of common code, I can encapsulate the changing part into an external parameter injected into the function:
test.reuse = function() {
result = common.fun()
}
common.fun = function() {
# common steps
result = get()$different.fun()
# common steps
}
Now, in order to change the behaviour we need to change the injected parameter:
get.member.x = function() {
return( list(different.fun=different.fun.1) )
}
get.member.y = function() {
return( list(different.fun=different.fun.2) )
}
get <<- get.member.x
test.reuse()
get <<- get.member.y
test.reuse()
So, we eliminated the duplicated code while changing the behaviour of the function.
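One possible way to inject the changing part without a global variable is a function factory (a closure). This is only a sketch: make.common.fun is a hypothetical name, and different.fun.1 and different.fun.2 are given trivial stand-in bodies here:

```r
# Hypothetical stand-ins for the two varying steps.
different.fun.1 <- function() "x"
different.fun.2 <- function() "y"

# Factory: returns a version of common.fun bound to one varying step.
make.common.fun <- function(different.fun) {
  force(different.fun)  # fix the injected function at creation time
  function() {
    # common steps
    result <- different.fun()
    # common steps
    result
  }
}

common.fun.x <- make.common.fun(different.fun.1)
common.fun.y <- make.common.fun(different.fun.2)
common.fun.x()
common.fun.y()
```

The injected function is stored in the enclosing environment of each returned function, so nothing is assigned globally and callers do not have to pass it along.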
I have some particularly finicky code that behaves differently on different platforms, but also behaves differently if run under valgrind ... right now I know that it
gives a warning if run on 32-bit Linux not under valgrind
gives an error if run elsewhere or on 32-bit Linux with R -d valgrind
The code below works (sorry for the lack of reproducible example, you can probably see that it would be pretty hard to write one) if I'm not running under valgrind, but under valgrind it fails because we get an error rather than a warning.
if (sessionInfo()$platform=="i686-pc-linux-gnu (32-bit)") {
expect_warning(update(g0, .~. +year), "failed to converge")
} else {
expect_error(update(g0, .~. +year), "pwrssUpdate did not converge in")
}
I would like an expect_warning_or_error() function; I suppose I could make one by hacking together the guts of expect_error and expect_warning, which don't look too complicated, but I welcome other suggestions.
Alternatively, I could figure out how to detect whether I am running under valgrind or not (seems harder).
A sort-of reproducible example:
library(testthat)
for (i in c("warning","stop")) {
expect_warning(get(i)("foo"))
expect_error(get(i)("foo"))
}
My solution, hacked together from gives_warning() and throws_error(). I'm not sure it's completely idiomatic/robust ...
gives_error_or_warning <- function (regexp = NULL, all = FALSE, ...)
{
function(expr) {
res <- try(evaluate_promise(expr),silent=TRUE)
no_error <- !inherits(res, "try-error")
if (no_error) {
warnings <- res$warnings
if (!is.null(regexp) && length(warnings) > 0) {
return(matches(regexp, all = FALSE, ...)(warnings))
} else {
return(expectation(length(warnings) > 0, "no warnings or errors given",
paste0(length(warnings), " warnings created")))
}
}
if (!is.null(regexp)) {
return(matches(regexp, ...)(res))
}
else {
expectation(TRUE, "no error thrown", "threw an error")
}
}
}
@Ben I may be misunderstanding, but it comes to mind that if you want to know whether something errored/warned or not, you could use tryCatch. If this is not what you want, or you were hoping for a more testthat approach, feel free to say "You're way off the mark", but add an emoticon like :-) and it will make everything better.
First I make a temperamental function to mimic what you describe. Then I make an is.bad function and just look for errors or warnings (don't worry about OS as this behavior is hard to predict). Then I wrap with expect_true or expect_false:
temperamental <- function(x) {
if (missing(x)){
ifelse(sample(c(TRUE, FALSE), 1), stop("Robot attack"), warning("Beware of bots!"))
} else {
x
}
}
temperamental()
temperamental(5)
is.bad <- function(code) {
isTRUE(tryCatch(code,
error = function(c) TRUE,
warning = function(c) TRUE
))
}
expect_true(is.bad(temperamental()))
expect_false(is.bad(temperamental(5)))
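If the message should be checked as well, the same tryCatch idea can return the condition itself rather than TRUE. A minimal base-R sketch, where first_condition and signals_matching are hypothetical helper names:

```r
# Hypothetical helpers (base R only).
# Returns the first error or warning signalled by `code`, or NULL if none.
first_condition <- function(code) {
  tryCatch({
    code
    NULL
  }, error = function(e) e, warning = function(w) w)
}

# TRUE if `code` signals an error or a warning whose message matches `regexp`.
signals_matching <- function(code, regexp) {
  cond <- first_condition(code)
  !is.null(cond) && grepl(regexp, conditionMessage(cond))
}

signals_matching(warning("failed to converge"), "converge")
signals_matching(stop("did not converge in 10 steps"), "converge")
signals_matching(1 + 1, "converge")
```

In a test this could be wrapped as expect_true(signals_matching(update(g0, .~. +year), "converge")), with the pattern chosen so that it matches both the warning and the error message.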
I had the same problem, and after reading the source of both functions I found a good solution. It is actually very simple: you only need to add a small if statement to the code from expect_error.
This is the code from expect_error:
function (object, regexp = NULL, ..., info = NULL, label = NULL)
{
lab <- make_label(object, label)
error <- tryCatch({
object
NULL
}, error = function(e) {
e
})
if (identical(regexp, NA)) {
expect(is.null(error), sprintf("%s threw an error.\n%s",
lab, error$message), info = info)
}
else if (is.null(regexp) || is.null(error)) {
expect(!is.null(error), sprintf("%s did not throw an error.",
lab), info = info)
}
else {
expect_match(error$message, regexp, ..., info = info)
}
invisible(NULL)
}
By adding an if statement before the return value, you check whether no error was thrown and, in that case, check for warnings (remember to add the all argument to the new function). The new function code is this:
expect_error_or_warning <- function (object, regexp = NULL, ..., info = NULL, label = NULL, all = FALSE)
{
lab <- testthat:::make_label(object, label)
error <- tryCatch({
object
NULL
}, error = function(e) {
e
})
if (identical(regexp, NA)) {
expect(is.null(error), sprintf("%s threw an error.\n%s",
lab, error$message), info = info)
} else if (is.null(regexp) || is.null(error)) {
expect(!is.null(error), sprintf("%s did not throw an error.",
lab), info = info)
} else {
expect_match(error$message, regexp, ..., info = info)
}
if(is.null(error)){
expect_warning(object = object, regexp = regexp, ..., all = all, info = info, label = label)
}
invisible(NULL)
}
This code is very robust and easy to maintain. If you are writing a package and can't use functions that aren't exported (:::), you can copy the code of make_label into the function; it is only one line.