Importing a library inside a function in R

I am using tensorflow with RStudio and am trying to make my code as simple and as functionalized as possible. I was wondering if there is a way to load a library inside a function, without having to do this:
library(tensorflow)
myFunction(args)
Is there a way to embed the first command in the function, so that I don't have to call it each time before using the function?
I tried something like this:
Lamdadou <- function(R) {
  library(tensorflow)
  sess <- tf$Session()
  K <- sess$run(R)
  print(K)
}
But an error is raised when I call it:
Error: Python module tensorflow was not found.

Within functions you should use require() rather than library() to load packages: require() returns FALSE instead of throwing an error when the package cannot be loaded, so the function can handle the failure itself.
So your function should look more like this:
Lamdadou <- function(R) {
  if (!require(tensorflow)) {
    stop("tensorflow not installed")
  } else {
    sess <- tf$Session()
    K <- sess$run(R)
    print(K)
  }
}
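If you prefer not to attach the package, a variant using requireNamespace() achieves the same guard (a sketch, not part of the original answer):
Lamdadou <- function(R) {
  # requireNamespace() checks that the package is installed without attaching it
  if (!requireNamespace("tensorflow", quietly = TRUE)) {
    stop("tensorflow not installed")
  }
  sess <- tensorflow::tf$Session()
  K <- sess$run(R)
  print(K)
}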

Related

Use Rcpp function in foreach %dopar%

I have created an Rcpp test package called "test" using the Rcpp package skeleton to try to run C++ code in parallel, but I keep running into errors. I'm running R 4.1.2 on macOS and have updated all parallel-computing packages. I added to the package skeleton an R script containing:
# wrap C++ function in an R function
test_func <- function() {
  return(rcpp_hello_world())
}

# attempt to parallelize
parallelize <- function() {
  # create cluster
  cl <- parallel::makeCluster(parallel::detectCores() - 1)
  parallel::clusterExport(cl, varlist = c("test_func", "rcpp_hello_world"), envir = environment())
  doParallel::registerDoParallel(cl)
  # call test_func in parallel
  res <- foreach::`%dopar%`(foreach::foreach(i = 1:5, .combine = c), ex = { test_func() })
  # clean up
  parallel::stopCluster(cl)
  return(res)
}
I loaded my package using devtools::load_all(), but when I type parallelize() in my console I get the error "Error in { : task 1 failed - "object '_test_rcpp_hello_world' not found"". When I add "_test_rcpp_hello_world" to the clusterExport call, I get the error "Error in { : task 1 failed - "NULL value passed as symbol address"".
Everything works fine when I switch %dopar% to %do%, but I'm hoping to still be able to parallelize.
I know that similar questions have been asked here, but I can't use a solution that calls sourceCpp on each worker (the C++ code in my actual R package is huge, and that would defeat the purpose of parallelizing).
Any help would be greatly appreciated!!
(Continuing from the comments)
The key is that to execute 'local' code on a node, you cannot just send a (compiled) function to the node. The node needs to have it, and the best way is for the node(s) to have access to the same package(s), load them, and thus be ready to run code using them. I just glanced at some old slide decks from presentations I gave and didn't find a perfect example, but here is a pointer to a (thirteen-plus (!!) year old) directory of example scripts, including this one for running (CPU-expensive) DieHarder tests on nodes via Rmpi:
#!/usr/bin/env r
suppressMessages(library(Rmpi))
suppressMessages(library(snow))

cl <- NULL
mpirank <- mpi.comm.rank(0)
if (mpirank == 0) {          # are we the master?
  cl <- makeMPIcluster()
} else {                     # or are we a slave?
  sink(file = "/dev/null")
  slaveLoop(makeMPImaster())
  mpi.finalize()
  q()
}

clusterEvalQ(cl, library(RDieHarder))
res <- parLapply(cl, c("mt19937", "mt19937_1999",
                       "mt19937_1998", "R_mersenne_twister"),
                 function(x) {
                   dieharder(rng = x, test = "operm5",
                             psamples = 100, seed = 12345)
                 })
stopCluster(cl)
print(do.call(rbind, lapply(res, function(x) x[[1]])))
mpi.quit()
The key is in the middle: clusterEvalQ(cl, library(RDieHarder)). All worker nodes are asked to load the RDieHarder package. Conceptually, you want to do the same here, and the foreach family lets you do it too.
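For instance, here is a sketch of the same idea with foreach (this assumes the "test" package is actually installed on the machine, not just loaded via devtools::load_all(), so that each worker can load it like any other package):
`%dopar%` <- foreach::`%dopar%`

parallelize <- function() {
  cl <- parallel::makeCluster(parallel::detectCores() - 1)
  doParallel::registerDoParallel(cl)
  on.exit(parallel::stopCluster(cl))
  # .packages loads "test" on every worker, so rcpp_hello_world() and its
  # compiled code are available without any clusterExport() of functions
  foreach::foreach(i = 1:5, .combine = c, .packages = "test") %dopar% {
    test_func()
  }
}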

R: Understand the ".Call" function in R

I am using R and working with a library called "mco": https://cran.r-project.org/web/packages/mco/index.html
I was looking over some of the function definitions used in this library at the GitHub repository, for example: https://github.com/olafmersmann/mco/blob/master/R/nsga2.R
Over here, I came across the following lines of code:
res <- .Call(do_nsga2,
             ff, cf, sys.frame(),
             as.integer(odim),
             as.integer(cdim),
             as.integer(idim),
             lower.bounds, upper.bounds,
             as.integer(popsize), as.integer(generations),
             cprob, as.integer(cdist),
             mprob, as.integer(mdist))
if (1 == length(res)) {
  res <- res[[1]]
  names(res) <- c("par", "value", "pareto.optimal")
  class(res) <- c("nsga2", "mco")
} else {
  for (i in 1:length(res)) {
    names(res[[i]]) <- c("par", "value", "pareto.optimal")
    class(res[[i]]) <- c("nsga2", "mco")
  }
  class(res) <- "nsga2.collection"
}
return(res)
}
The very first line of this code references some object called "do_nsga2", but apart from this call, I can't find any reference to "do_nsga2" within the entire package.
Does anyone know what exactly is being "called"?
Thanks
Note: I am trying to copy/paste all the functions from the GitHub repository into my R session, since I am working on an older computer on which directly installing libraries from CRAN is not possible. When I tried to copy/paste all these functions, I got the following error:
Error in nsga2....
object 'do_nsga2' not found
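(A hedged note, since the mco sources may differ in detail: do_nsga2 is most likely not an R function at all but a native-symbol object created by a useDynLib() directive in the package's NAMESPACE file, pointing at a C routine compiled from the package's src/ directory. That is why copying only the R files into a session cannot supply it. Such a directive looks roughly like this:)
# In the package NAMESPACE (not runnable R code): loads the compiled shared
# library and binds the C routine "do_nsga2" to an R-level symbol that
# .Call() can use unquoted
useDynLib(mco, do_nsga2)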

Why is using %dopar% with foreach causing R to not recognize package?

I was trying to get my code to run in parallel in R by using the doParallel package with the foreach package. I am also using the sf package to manipulate shapefiles. I first made sure all my code worked in the foreach loop using %do%, so that if there was an error I could better track it down. My code worked fine using foreach and %do%, but when I changed it to %dopar%, R kept giving me the following error:
Error in { : task 1 failed - "could not find function "st_geometry_type""
This happens even though I clearly use require(sf) at the top of the R script. I made a small function that just prints "check" when the condition is true, to replicate the error:
require(sf)
require(doParallel)
doParallel::registerDoParallel(cores = 2)

testforeach <- function(sfObject) {
  foreach(i = 1:10) %dopar% {
    if (st_geometry_type(sfObject[i, ]) == "LINESTRING") {
      print("check")
    }
  }
}
When I run this code it throws the same exact error:
Error in { : task 1 failed - "could not find function "st_geometry_type""
However when I replace %dopar% with %do% it prints out all of the expected "check" messages.
Is this a bug in R, or am I missing something? I tried reinstalling my packages, but that didn't seem to have any effect; I continued to get the same error. Any help would be greatly appreciated.
You need to list the packages you will use inside the loop in the .packages argument of the foreach() call; with %dopar%, each task runs in a worker process that does not inherit the packages attached in your main session:
foreach(i = 1:10, .packages = "sf") %dopar% {
  if (st_geometry_type(sfObject[i, ]) == "LINESTRING") {
    print("check")
  }
}
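(An alternative, not in the original answer: when the cluster is created explicitly, the package can also be loaded on the workers directly, mirroring the clusterEvalQ() call in the Rmpi example above.)
# hedged alternative: load sf on each worker of an explicit cluster
cl <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl)
parallel::clusterEvalQ(cl, library(sf))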

TryCatch with parLapply (Parallel package) in R

I am trying to run something on a very large dataset. Basically, I want to loop through all files in a folder and run the function fromJSON on each one. However, I want it to skip over files that produce an error. I have built a function using tryCatch; however, it only works when I use lapply, not parLapply.
Here is my code for my exception handling function:
readJson <- function(file) {
  require(jsonlite)
  dat <- tryCatch(
    {
      fromJSON(file, flatten = TRUE)
    },
    error = function(cond) {
      message(cond)
      return(NA)
    },
    warning = function(cond) {
      message(cond)
      return(NULL)
    }
  )
  return(dat)
}
and then I call parLapply on a character vector, files, which contains the full paths to the JSON files:
dat <- parLapply(cl, files, readJson)
This produces an error when it reaches a file that doesn't end properly, and it does not create the list dat by skipping over the problematic file, which is exactly what the readJson function was supposed to mitigate.
When I use regular lapply, however, it works perfectly fine: it prints the errors but still creates the list by skipping over the erroneous files.
Any ideas on how I could use exception handling with parLapply (parallel package) so that it skips over the problematic files and generates the list?
In your error handler, cond is an error condition. message(cond) signals this condition, which is caught on the workers and transmitted as an error to the master. Either remove the message() calls or replace them with something like
message(conditionMessage(cond))
You won't see anything on the master either way, though, so removing them is probably best.
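Applied to the function above, a minimal fix might look like this (a sketch; as noted, the message output still won't surface on the master under parLapply):
readJson <- function(file) {
  require(jsonlite)
  tryCatch(
    fromJSON(file, flatten = TRUE),
    error = function(cond) {
      # extract the text instead of re-signalling the condition itself
      message(conditionMessage(cond))
      NA
    },
    warning = function(cond) {
      message(conditionMessage(cond))
      NULL
    }
  )
}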
What you could do instead is something like this (a different, reproducible example):
test1 <- function(i) {
  dat <- NA
  try({
    if (runif(1) < 0.8) {
      dat <- rnorm(i)
    } else {
      stop("Error!")
    }
  })
  return(dat)
}
cl <- parallel::makeCluster(3)
dat <- parallel::parLapply(cl, 1:100, test1)
See this related question for other solutions. I think using foreach with .errorhandling = "pass" would be another good solution.
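A sketch of that foreach variant (assuming the same files vector as above; .errorhandling = "pass" stores the error condition as the task's result instead of aborting the whole job):
library(foreach)
library(doParallel)

cl <- parallel::makeCluster(3)
registerDoParallel(cl)

# a failed task yields its error object as the list element, so one bad
# file no longer stops the entire run
dat <- foreach(f = files, .errorhandling = "pass", .packages = "jsonlite") %dopar% {
  fromJSON(f, flatten = TRUE)
}

parallel::stopCluster(cl)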

Access the code of the R_igraph_minimum_size_separators in R

I'm trying to access the code of the minimum.size.separators function from the igraph package in R. I tried everything I could using what is suggested here, and only the following appears for me:
function (graph)
{
  if (!is_igraph(graph)) {
    stop("Not a graph object")
  }
  on.exit(.Call("R_igraph_finalizer", PACKAGE = "igraph"))
  res <- .Call("R_igraph_minimum_size_separators", graph, PACKAGE = "igraph")
  if (igraph_opt("return.vs.es")) {
    for (i_ in seq_along(res)) {
      res[[i_]] <- create_vs(graph, res[[i_]])
    }
  }
  res
}
<environment: namespace:igraph>
In fact, this is not the code that performs the computation; it is only the R wrapper that calls R_igraph_minimum_size_separators, whose code I could not find. Does anyone know how to find the R_igraph_minimum_size_separators code?
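(A hedged pointer, not part of the original post: "R_igraph_minimum_size_separators" names a C routine compiled into the package's shared library, so its source lives in the C code under the package's src/ directory rather than in any .R file. From R you can at least confirm the registered routine:)
# list the .Call routines registered by igraph's shared library
library(igraph)
routines <- getDLLRegisteredRoutines("igraph")
routines$.Call[["R_igraph_minimum_size_separators"]]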
