foreach behaviour in function, in package, with custom iterator

foreach behaviour in function, in package, with custom iterator - r

I have created a custom iterator, inheriting from there iter class in the iterators package. The iterator and its methods are not exported from the package. Here is the iterator and a test function in a script that is reproducible and runnable, the iterator is called pairsRef:
library(Biostrings)
library(iterators)
library(foreach)
setGeneric("maskSequences", function(object, seqnames, invert = TRUE, append = "union"){
standardGeneric("maskSequences")
})
setMethod("maskSequences",
signature("DNAMultipleAlignment", "character", "logical", "character"),
function(object, seqnames, invert, append) {
sequenceNames <- rownames(object)
rowmask(object, append = append, invert = invert) <-
which(sequenceNames %in% seqnames)
return(object)
})
pairsRef <- function(obj, ...){
UseMethod('pairsRef')
}
pairsRef.DNAMultipleAlignment <-
function(obj, ref = NULL, checkFun = function(x, ...) TRUE){
state <- new.env()
state$i <- 0L
state$obj <- obj
if(is.null(ref)){
state$ref <- rownames(obj)[1]
} else {
state$ref <- ref
}
state$nonRefs <- rownames(obj)
state$nonRefs <- state$nonRefs[state$nonRefs != state$ref]
it <- list(state=state, checkFun = checkFun)
class(it) <- c("pairsRef", "abstractiter", "iter")
return(it)
}
nextElem.pairsRef <- function(obj, ...){
repeat {
obj$state$i <- obj$state$i + 1L
if(obj$state$i > length(obj$state$nonRefs))
stop('StopIteration', call.=FALSE)
pair <- maskSequences(obj$state$obj,
c(obj$state$ref, obj$state$nonRefs[obj$state$i]),
invert = TRUE,
append = "replace"
)
if(obj$checkFun(pair)){
return(pair)
}
}
}
Test2 <- function(dna, ref){
pit <- pairsRef(dna, ref = ref, checkFun = function(x) TRUE)
results <- foreach(x = pit, .combine = c, .multicombine = TRUE) %do% {x}
return(results)
}
dna <-
readDNAMultipleAlignment(filepath =
system.file("extdata",
"msx2_mRNA.aln",
package="Biostrings"),
format="clustal")
Test2(dna, rownames(dna)[1])
However, I want to use this iterator to do foreach loops in a package of mine.
If I put Test2 in a package (exported), and I have all the other functions in the package (unexported), and I have the package namespace import Biostrings, iterators, and foreach. It does not work. With a fresh R session, loading the package, and the running:
dna <-
Biostrings::readDNAMultipleAlignment(filepath =
system.file("extdata",
"msx2_mRNA.aln",
package="Biostrings"),
format="clustal")
Test2(dna, rownames(dna)[1])
Results in an error: "Error in { : attempt to apply non-function"
Is this because the custom iterator is internal to the package? Any help or suggestions are greatly appreciated.
[EDIT] - If I export the iterator and it's functions from the package. Then all works fine. But I don't necessarily want to export iterators of the package.
Thanks,
Ben.

In order for it to work inside the package, the method nextElem must be imported from iterators, and then the additional method unique to the package, exported, such that it is visible to the functions in the foreach package namespace.

Related

How to Write R Package Documentation for a Function with Parallel Backend

I want to write this function as an R package
Edit
#' create suns package
#''
#' More detailed Description
#'
#' #describeIn This sums helps to
#'
#' #importFrom foreach foreach
#'
#' #importFrom doParallel registerDoParallel
#'
#' #param x Numeric Vector
#'
#' #importFrom doParallel `%dopar%`
#'
#' #importFrom parallel parallel
#'
#' #export
sums <- function(x){
plan(multisession)
n_cores <- detectCores()# check for howmany cores present in the Operating System
cl <- parallel::makeCluster(n_cores)# use all the cores pdectected
doParallel::registerDoParallel(cores = detectCores())
ss <- function(x){
`%dopar%` <- foreach::`%dopar%`
foreach::foreach(i = x, .combine = "+") %dopar% {i}
}
sss <- function(x){
`%dopar%` <- foreach::`%dopar%`
foreach::foreach(i = x, .combine = "+") %dopar% {i^2}
}
ssq <- function(x){
`%dopar%` <- foreach::`%dopar%`
foreach::foreach(i = x, .combine = "+") %dopar% {i^3}
}
sums <- function(x, methods = c("sum", "squaredsum", "cubedsum")){
output <- c()
if("sum" %in% methods){
output <- c(output, ss = ss(x))
}
if("squaredsum" %in% methods){
output <- c(output, sss = sss(x))
}
if("cubedsum" %in% methods){
output <- c(output, ssq = ssq(x))
}
return(output)
}
parallel::stopCluster(cl = cl)
x <- 1:10
sums(x)
.
What I Need
Assuming my vector x is such large that it will take a serial processing about 5 hours to complete the task like x <- 1:9e9 where parallel processing can help.
How do I include:
n_cores <- detectCores()
#cl <- makeCluster(n_cores)
#registerDoParallel(cores = detectCores())
in my .R file and DESCRIPTION file such that it will be worthy of R package documentation?

Even if it is not very easy to see the scope of the question, I'll try to make relevent suggestions. I understand that you have problems running check on your package with examples/tests that use parallel computation.
First of all, remember that check uses CRAN standards and it is impossible in a CRAN package to run examples or tests that use more than 2 cores for compatibility reasons. So your examples must be simple enough to be dealt with by 2 cores.
Then there is a problem in your code as your create a cluster but don't use it in the doParallel
Next you are using in your piece of code parallel package and doParallel package, therefore they must be included in the DESCRIPTION file running in your console:
usethis::use_package("parallel")
usethis::use_package("doParallel")
This will add both packages in the "Imports" section of your description. And then your won't load these libraries explicitely in your package.
Then you should also clarify your function in your example using "::" after the name of the relevant package which would make your example look like:
n_cores <- 2
cl <- parallel::makeCluster(n_cores)
doParallel::registerDoParallel(cl = cl)
...
parallel::stopCluster(cl = cl)
You can also refer to the registerDoParallel documentation to get a similar piece of code, you will also find that it is limited to 2 cores.
To be complete, I do not think your really need foreach package since default parallelization in R is very powerful. If you want to be able to use your function with detectCores, I would suggest you add a limitint parameter. This function should do what you want in a more "R like" manner:
sums <- function(x, methods, maxcores) {
n_cores <- min(maxcores,
parallel::detectCores())# check for howmany cores present in the Operating System
cl <- parallel::makeCluster(n_cores)# use all the cores pdectected
outputs <- sapply(
X = methods,
FUN = function(method) {
if ("sum" == method) {
output <- parallel::parSapply(
cl = cl,
X = x,
FUN = function(i)
i
)
}
if ("squaredsum" == method) {
output <-
parallel::parSapply(
cl = cl,
X = x,
FUN = function(i)
i ** 2
)
}
if ("cubedsum" == method) {
output <-
parallel::parSapply(
cl = cl,
X = x,
FUN = function(i)
i ** 3
)
}
return(sum(output))
}
)
parallel::stopCluster(cl = cl)
return(outputs)
}
x <- 1:10000000
sums(x = x, c("sum", "squaredsum"), 2)

parallel::clusterExport how to pass nested functions from global environment?

I'm making a function (myFUN) that calls parallel::parApply at one point, with a function yourFUN that is supplied as an argument.
In many situations, yourFUN will contain custom functions from the global environment.
So, while I can pass "yourFUN" to parallel::clusterExport, I cannot know the names of functions inside it beforehand, and clusterExport returns me an error because it cannot find them.
I don't want to export the whole enclosing environment of yourFUN, since it might be very big.
Is there a way for me to export only the variables necessary for running yourFUN?
The actual function is very long, here is a minimized example of the error:
mydata <- matrix(data = 1:9, 3, 3)
perfFUN <- function(x) 2*x
opt_perfFUN <- function(y) max(perfFUN(y))
avg_perfFUN <- function(w) perfFUN(mean(w))
myFUN <- function(data, yourFUN, n_cores = 1){
cl <- parallel::makeCluster(n_cores)
parallel::clusterExport(cl, varlist = c("yourFUN"), envir = environment())
parallel::parApply(cl, data, 1, yourFUN)
}
myFUN(data = mydata, yourFUN = opt_perfFUN)
myFUN(data = mydata, yourFUN = avg_perfFUN)
Error in checkForRemoteErrors(val) : one node produced an error: could not find function "perfFUN"
Thank you very much!

A possible solution, use:
myFUN <- function(data, yourFUN, n_cores = 1) {
cl <- parallel::makeCluster(n_cores)
on.exit(parallel::stopCluster(cl), add = TRUE)
envir <- environment(yourFUN)
parallel::clusterExport(cl, varlist = ls(envir), envir = envir)
parallel::parApply(cl, data, 1, yourFUN)
}

How to get Vectorize return the results invisibly?

I have a drawing function f that should not return any output.
f <- function(a=0) invisible(NULL)
f(10)
After vectorizing f, it does return NULL.
f_vec <- Vectorize(f)
f_vec(10)
[[1]]
NULL
How can I prevent this, i.e. make the output invisible here as well.
I could of course use a wrapper to suppress it.
f_wrapper <- function(a=0) {
dummy <- f_vec(a)
}
f_wrapper(10)
Is there a way to avoid the wrapper and get what I want straight away?

Yeah there is. This new version of Vectorize will do it:
Vectorize_2 <- function (FUN, vectorize.args = arg.names, SIMPLIFY = TRUE, USE.NAMES = TRUE) {
arg.names <- as.list(formals(FUN))
arg.names[["..."]] <- NULL
arg.names <- names(arg.names)
vectorize.args <- as.character(vectorize.args)
if (!length(vectorize.args))
return(FUN)
if (!all(vectorize.args %in% arg.names))
stop("must specify names of formal arguments for 'vectorize'")
FUNV <- function() {
args <- lapply(as.list(match.call())[-1L], eval, parent.frame())
names <- if (is.null(names(args)))
character(length(args))
else names(args)
dovec <- names %in% vectorize.args
invisible(do.call("mapply", c(FUN = FUN, args[dovec], MoreArgs = list(args[!dovec]),
SIMPLIFY = SIMPLIFY, USE.NAMES = USE.NAMES)))
}
formals(FUNV) <- formals(FUN)
FUNV
}
But, how did I know to do this? Did I spend 20 minutes writing a brand new version of Vectorize? NOPE! I just ran dput(Vectorize) to see the R code behind Vectorize and added the invisible where necessary! You can do this with all R functions. You don't even need the dput! Just run Vectorize!

R iGraph graph.bfs and environment

Having trouble with running Breadth-first search algorithm on a graph, my current concern is with the environment parameter, where the callback function is evaluated.
This is the callback function
f.in <- function(graph, data, extra) {
time <- get.vertex.attribute(graph, "time", index=data["vid"])
root_time <- get.vertex.attribute(graph, "time", index=extra)
print(ls(environment()))
if (time != 0){
time_difference <- time - root_time
result_list <- c(list(), time_difference)
}
}
This is the context where the functions is called
graphs <- decompose.graph(network_graph, max.comps = NA, min.vertices = 0)
lapply(graphs, function(g){
v0 <- which(degree(g, mode="out") == 0)
t0 <- get.vertex.attribute(g, "time", index=v0)
if (t0 != 0) {
bfs_environment <- new.env()
assign("result_list", list(), envir=bfs_environment)
graph.bfs(g, v0, neimode="in", callback=f.in, extra=v0, rho=bfs_environment)
}
})
Now the print of environment shows me the following variables
"data" "extra" "graph" "root_time" "time"
The question is that when I have passed the environment where the callback function is evaluated, then why the "result_list" is not available? Is there something wrong with passing this environment?
Btw using R 2.15.3 and iGraph 0.7.0

When the documentation states:
rho: The environment in which the callback function is evaluated
it means that the passed environment is the parent of the callback environment.
So you can retrieve your variable using get() and parent.frame() functions, as shown in the following example:
myCallBack <- function(graph, data, extra) {
print(ls(parent.frame()))
print(get('result_list', envir=parent.frame()))
stop('just to stop at the first call...')
}
# a simple tree with 3 nodes
g <- graph.tree(3, children = 2, mode='out')
bfs_environment <- new.env()
assign("result_list", list(A=3), envir=bfs_environment)
graph.bfs(g, 1, callback=myCallBack, extra=NULL, rho=bfs_environment)
Output:
[1] "result_list"
$A
[1] 3

Is it possible to see source code of a value of function

I am using a function from a package. this function returns some values. For example:
k<-dtw(v1,v2, keep.internals=TRUE)
and I can get this value:
k$costMatrix
Does it possible to see the source code of costMatrix? if yes how can I do that?
UPDATE
this is the source code of the function:
function (x, y = NULL, dist.method = "Euclidean", step.pattern = symmetric2,
window.type = "none", keep.internals = FALSE, distance.only = FALSE,
open.end = FALSE, open.begin = FALSE, ...)
{
lm <- NULL
if (is.null(y)) {
if (!is.matrix(x))
stop("Single argument requires a global cost matrix")
lm <- x
}
else if (is.character(dist.method)) {
x <- as.matrix(x)
y <- as.matrix(y)
lm <- proxy::dist(x, y, method = dist.method)
}
else if (is.function(dist.method)) {
stop("Unimplemented")
}
else {
stop("dist.method should be a character method supported by proxy::dist()")
}
wfun <- .canonicalizeWindowFunction(window.type)
dir <- step.pattern
norm <- attr(dir, "norm")
if (!is.null(list(...)$partial)) {
warning("Argument `partial' is obsolete. Use `open.end' instead")
open.end <- TRUE
}
n <- nrow(lm)
m <- ncol(lm)
if (open.begin) {
if (is.na(norm) || norm != "N") {
stop("Open-begin requires step patterns with 'N' normalization (e.g. asymmetric, or R-J types (c)). See papers in citation().")
}
lm <- rbind(0, lm)
np <- n + 1
precm <- matrix(NA, nrow = np, ncol = m)
precm[1, ] <- 0
}
else {
precm <- NULL
np <- n
}
gcm <- globalCostMatrix(lm, step.matrix = dir, window.function = wfun,
seed = precm, ...)
gcm$N <- n
gcm$M <- m
gcm$call <- match.call()
gcm$openEnd <- open.end
gcm$openBegin <- open.begin
gcm$windowFunction <- wfun
lastcol <- gcm$costMatrix[np, ]
if (is.na(norm)) {
}
else if (norm == "N+M") {
lastcol <- lastcol/(n + (1:m))
}
else if (norm == "N") {
lastcol <- lastcol/n
}
else if (norm == "M") {
lastcol <- lastcol/(1:m)
}
gcm$jmin <- m
if (open.end) {
if (is.na(norm)) {
stop("Open-end alignments require normalizable step patterns")
}
gcm$jmin <- which.min(lastcol)
}
gcm$distance <- gcm$costMatrix[np, gcm$jmin]
if (is.na(gcm$distance)) {
stop("No warping path exists that is allowed by costraints")
}
if (!is.na(norm)) {
gcm$normalizedDistance <- lastcol[gcm$jmin]
}
else {
gcm$normalizedDistance <- NA
}
if (!distance.only) {
mapping <- backtrack(gcm)
gcm <- c(gcm, mapping)
}
if (open.begin) {
gcm$index1 <- gcm$index1[-1] - 1
gcm$index2 <- gcm$index2[-1]
lm <- lm[-1, ]
gcm$costMatrix <- gcm$costMatrix[-1, ]
gcm$directionMatrix <- gcm$directionMatrix[-1, ]
}
if (!keep.internals) {
gcm$costMatrix <- NULL
gcm$directionMatrix <- NULL
}
else {
gcm$localCostMatrix <- lm
if (!is.null(y)) {
gcm$query <- x
gcm$reference <- y
}
}
class(gcm) <- "dtw"
return(gcm)
}
but if I write globalCostMatrix I dont get the source code of this function

The easiest way to find how functions work is by looking at the source. You have a good chance that by typing function name in the R console, you will get the function definitions (although not always with good layout, so seeking the source where brackets are present, is a viable option).
In your case, you have a function dtw from the same name package. This function uses a function called globalCostMatrix. If you type that name into R, you will get an error that object was not found. This happens because the function was not exported when the package was created, probably because the author thinks this is not something a regular user would use (but not see!) or to prevent clashes with other packages who may use the same function name.
However, for an interested reader, there are at least two ways to access the code in this function. One is by going to CRAN, downloading the source tarballs and finding the function in the R folder of the tar ball. The other one, easier, is by using getAnywhere function. This will give you the definition of the function just like you're used for other, user accessible functions like dtw.
> library(dtw)
> getAnywhere("globalCostMatrix")
A single object matching ‘globalCostMatrix’ was found
It was found in the following places
namespace:dtw
with value
function (lm, step.matrix = symmetric1, window.function = noWindow,
native = TRUE, seed = NULL, ...)
{
if (!is.stepPattern(step.matrix))
stop("step.matrix is no stepMatrix object")
n <- nrow(lm)
... omitted for brevity

I think you want to see what the function dtw() does with your data. I seems that it creates a data.frame containing a column named costMatrix.
To find out how the data in the column costMatrix was generated, just type and execute dtw (without brackets!). R will show you the source of the function dtw() afterwards.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

foreach behaviour in function, in package, with custom iterator - r

In order for it to work inside the package, the method nextElem must be imported from iterators, and then the additional method unique to the package, exported, such that it is visible to the functions in the foreach package namespace.

Related

How to Write R Package Documentation for a Function with Parallel Backend

parallel::clusterExport how to pass nested functions from global environment?

How to get Vectorize return the results invisibly?

R iGraph graph.bfs and environment

Is it possible to see source code of a value of function

Categories

Resources