How to initialize libraries by their string names in a cluster?

I want to initialize libraries in a cluster by their names represented as strings.
This code works fine:
library(snowfall)
library(rlecuyer)
library(rsprng)
sfInit(parallel = TRUE, cpus = 4, type = "SOCK")
sfClusterEval(library(e1071))
And this code produces an error: 4 nodes produced errors; first error: object 'expr' not found
library(snowfall)
library(rlecuyer)
library(rsprng)
sfInit(parallel = TRUE, cpus = 4, type = "SOCK")
lib <- "e1071"
expr <- parse(text=paste("library(", lib, ")", sep=""))
sfClusterEval(expr)
So sfClusterEval tries to evaluate the symbol expr itself, not the expression that expr contains. I cannot understand what kind of expression should be passed to the sfClusterEval function, which uses substitute in its body:
> sfClusterEval
function (expr, stopOnError = TRUE)
{
    sfCheck()
    if (sfParallel()) {
        return(sfClusterCall(eval, substitute(expr), env = globalenv(),
            stopOnError = stopOnError))
    }
    else {
        return(eval(expr, envir = globalenv(), enclos = parent.frame()))
    }
}
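To illustrate what substitute() captures, here is a minimal experiment on the master (no cluster involved; the helper f is only for demonstration):
f <- function(expr) substitute(expr)
f(library("e1071"))          # captures the call library("e1071")
e <- quote(library("e1071"))
f(e)                         # captures only the symbol e, not the call it holds
So the workers receive the bare symbol and try to eval it in their own global environment, where it does not exist.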
This question seems simple, but I could not solve it and need someone's advice.
UPDATE:
Further investigation with simpler examples. I feel that the truth is near.
This code works fine
sfClusterEval(library("e1071"))
But this call produces an error: 4 nodes produced errors; first error: object 'lib' not found
lib <- "e1071"
sfClusterEval(library(lib, character.only=TRUE))
ANSWER:
The variable lib has to be exported to the cluster first. After the library call it can be removed.
lib <- "e1071"
sfExport("lib")
sfClusterEval(library(lib, character.only=TRUE))
sfRemove("lib")
Thanks to Richie for giving the starting idea!

You can use sfLibrary to load extra packages on workers. See ?snowfall and click snowfall-tools.
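For example (a minimal sketch; if I read the snowfall API correctly, sfLibrary also accepts the package name held in a variable when character.only = TRUE is given):
sfInit(parallel = TRUE, cpus = 4, type = "SOCK")
sfLibrary(e1071)                         # load e1071 on all workers
lib <- "e1071"
sfLibrary(lib, character.only = TRUE)    # same thing, with the name as a string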

Whether in a cluster or not, you simply use the character.only argument to library.
library("e1071", character.only = TRUE)
If your nodes report an error stating that they can't find the package, double check that the package is installed on that machine, in a location that is one of .libPaths(). If all else fails, explicitly provide the location of the package in the lib.loc argument to library.
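For example (the lib.loc path below is only a placeholder for wherever the package actually lives):
.libPaths()                                    # locations R will search for packages
"e1071" %in% rownames(installed.packages())    # is the package installed at all?
library("e1071", character.only = TRUE, lib.loc = "/path/to/extra/library")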

Related

Redefine arguments within an R core package function

My fingers are starting to tire of typing update.packages(checkBuilt = T, ask = F). I was wondering whether it's possible to redefine the default parameters within the update.packages() function. So far, I've tried adding the following to my .Rprofile file:
utils::assignInNamespace(
    "update.packages",
    function(checkBuilt = TRUE, ask = FALSE, ...) {
        update.packages(checkBuilt = checkBuilt, ask = ask, ...)
    },
    "utils"
)
But when I try to use the function in R I get the following error:
update.packages()
Error: C stack usage 7976404 is too close to the limit
I've also just tried using formals() with the following in the .Rprofile:
local({
    args_new <- alist(lib.loc = .libPaths(), ask = FALSE, checkBuilt = TRUE)
    ind <- which(methods::formalArgs(update.packages) %in% names(args_new))
    formals(update.packages)[ind] <- args_new
})
But that results in the following error upon launching R:
Error in formals(update.packages) : object 'update.packages' not found
As #Roland said in the comments, your definition is recursive. You shouldn't bother with the assignInNamespace: keeping the new function in your workspace is good enough. Then you can use utils::update.packages in its definition, e.g.
update.packages <- function(checkBuilt = TRUE, ask = FALSE, ...) {
    utils::update.packages(checkBuilt = checkBuilt, ask = ask, ...)
}
You should avoid using assignInNamespace for the reasons listed in its help page.
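Incidentally, the formals() attempt fails because .Rprofile runs before utils is attached, so update.packages cannot be found by name at that point; fully qualifying it avoids the lookup error. An untested sketch along those lines, which likewise just keeps a modified copy in your workspace:
local({
    up <- utils::update.packages                        # loads the utils namespace without attaching it
    formals(up)[c("checkBuilt", "ask")] <- list(TRUE, FALSE)
    assign("update.packages", up, envir = globalenv())  # keep the modified copy in the workspace
})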

Parallel processing stopped working with error: object 'mcinteractive' not found

For a long time I've been successfully running a program which uses parallel processing. A couple of days ago the code stopped working with the error message:
"Error in get("mcinteractive", pkg) : object 'mcinteractive' not
found
traceback()
8: get("mcinteractive", pkg)
7: .customized_mcparallel({
       result <- mclapply(X, function(...) {
           res <- FUN(...)
           writeBin(1L, progressFifo)
           return(res)
       }, ..., mc.cores = mc.cores, mc.preschedule = mc.preschedule,
           mc.set.seed = mc.set.seed, mc.cleanup = mc.cleanup,
           mc.allow.recursive = mc.allow.recursive)
       if ("try-error" %in% sapply(result, class)) {
           writeBin(-1L, progressFifo)
       }
       close(progressFifo)
       result
   })
6: pbmclapply(1:N, FUN = function(i) {
       max_score = max(scores[i, ])
       topLabels = names(scores[i, scores[i, ] >= max_score - fine.tune.thres])
       if (length(topLabels) == 0) {
           return(names(which.max(scores[i, ])))
       }
(I have more traceback if you are interested, but I think it mainly belongs to the "surrounding" code and is not so interesting for the error per se. Tell me if you need it and I'll make an edit!)
I do not know anything about parallel processing, and I haven't been able to understand the issue by digging into the code. From what I've understood, parallel::mcparallel is a function with an argument mcinteractive, for which you can choose TRUE or FALSE. Earlier I got the tip to decrease the number of cores used in the processing. Before, I used 16 cores without any issues. After the error started occurring I tried setting the number of cores to both 8 and 1, with the same result. If it is some memory problem I guess I'm in the wrong forum, sorry! But I only experience problems when using RStudio, which is why I'm writing here. The only other thing I can think of that might be related is that my processing (through RStudio) sometimes gets stuck, and the only thing I've found is that the RAM is full and I have to restart the computer. Then the processing works as usual again. However, this does not help with the new error when using parallel computation.
Does anyone recognize this issue and have any lead on what could be the cause? Is it the code, the package, RStudio, or my computer? Any checks I can run? :)
Edit:
Here is a shorter version of the error from digging through the code after changing pbmclapply to mclapply.
> packageVersion("parallel")
[1] ‘3.4.4’
> labels = parallel::pbmclapply(1:N, FUN = function(i) {
. . .
+ }, mc.cores = numCores)
Error: 'pbmclapply' is not an exported object from 'namespace:parallel'
> labels = pbmcapply::pbmclapply(1:N, FUN = function(i) {
. . .
+ }, mc.cores = numCores)
Error in get("mcinteractive", pkg) : object 'mcinteractive' not found
> labels = parallel::mclapply(1:N, FUN = function(i) {
. . .
+ }, mc.cores = numCores)
Warning message:
In parallel::mclapply(1:N, FUN = function(i) { :
all scheduled cores encountered errors in user code
#inside mclapply
> job.res <- lapply(seq_len(cores), inner.do)
Error in mcfork() : could not find function "mcfork"
#inside inner.do
> f <- parallel::mcfork()
Error: 'mcfork' is not an exported object from 'namespace:parallel'
Edit 2: I came a bit further in my error searching.
I had to add a triple colon before a lot of parallel functions, meaning that I'm calling internal functions(?), which in turn should mean that parallel is no longer part of my search path(?)
parallel:::mcfork()
parallel:::mc.advance.stream()
parallel:::selectChildren()
parallel:::isChild()
#Had to change .check_ncores(cores) to
parallel::detectCores()
This problem occurs because the pbmcapply package (which provides pbmclapply) was updated and now only works with R > 3.5; updating R solved my problem.
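A quick way to confirm the versions involved after updating (a small smoke test; mc.cores = 2 assumes a Unix-alike, since forking is not available on Windows):
getRversion()                  # should report 3.5.0 or later
packageVersion("pbmcapply")
library(pbmcapply)
res <- pbmclapply(1:20, function(i) i^2, mc.cores = 2)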

R - Parallel Processing and ldply error

I am trying to use the below code to make API calls in a parallel process to speed up the API calls. (I know this isn't the best way to speed up API calls but it works)
It only fails when I try to run it in parallel; otherwise it works. In the ldply function I am getting the below error:
Error in do.ply(i) :
task 1 failed - "object of type 'closure' is not subsettable"
In addition:
Warning messages:
1: : ... may be used in an incorrect context: ‘.fun(piece, ...)’
2: : ... may be used in an incorrect context: ‘.fun(piece, ...)’
Any help would be appreciated!
One <- 26
cl<-makeCluster(4)
registerDoSNOW(cl)
func.time <- Sys.time()
## API CALL ONE FOR "kline"
url <- "https://api.binance.com"
path <- paste("/api/v1/klines?symbol=",pairs[1],"&interval=1m&limit=1", sep = "")
raw.results <- GET(url = url, path = path)
text_content <- content(raw.results, as = "text", encoding = "UTF-8")
kline <- data.frame(text_content %>% fromJSON())
kline$symbol <- pairs[1]
## API FUNCTION TO BE APPLIED FOR REST
loopfunction <- function(i){
url <- "https://api.binance.com"
path <- paste("/api/v1/klines?symbol=",pairs[i],"&interval=1m&limit=1", sep = "")
raw.results <- GET(url = url, path = path)
text_content <- content(raw.results, as = "text", encoding = "UTF-8")
kline_temp <- data.frame(text_content %>% fromJSON())
kline_temp$symbol <- pairs[i]
kline <- rbind(kline,kline_temp)
return(kline)
}
## DPLY PARALLEL FUNCTION
kline2 <- data.frame(ldply(2:(One - 1), .fun = loopfunction, .parallel = T, .paropts = c("httr", "jsonlite", "dplyr"))) ## "One" is a variable created earlier
stopCluster(cl)
func.end.time <- Sys.time()
func.tot.time <- func.end.time - func.time
Your question isn't fully reproducible, so the following is an educated guess.
Your loopfunction() references an object called pairs. It seems from your script that a variable called pairs is defined somewhere in your local environment. However, when loopfunction() is passed to ldply(), it no longer has access to that variable (ordinarily it would, but parallelization requires fresh R environments to be created). Having failed to find an object called pairs in the environment, R continues searching, and finds a match in graphics::pairs(). This is a plotting function, not a subsettable object like a vector or data frame. Hence the error message, "object of type 'closure' is not subsettable".
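You can reproduce the symptom in a fresh session, before any data frame called pairs has been created (illustrative only):
exists("pairs")    # TRUE -- the plotting function graphics::pairs is found
pairs[1]
# Error in pairs[1] : object of type 'closure' is not subsettable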
I'm not especially familiar with how ldply implements parallel processing, but you could probably modify your function definition like this:
loopfunction <- function(i, pairs) {
...[body of function]...
}
And pass pairs as an extra parameter in your ldply call:
kline2 <- data.frame(ldply(2:(One - 1), .fun = loopfunction, pairs = pairs, .parallel = T, .paropts = list(.packages = c("httr", "jsonlite", "dplyr"))))
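Putting the two changes together, the call might look roughly like this (the body is reconstructed from your question, so treat it as an untested sketch):
loopfunction <- function(i, pairs) {
    url <- "https://api.binance.com"
    path <- paste("/api/v1/klines?symbol=", pairs[i], "&interval=1m&limit=1", sep = "")
    raw.results <- GET(url = url, path = path)
    text_content <- content(raw.results, as = "text", encoding = "UTF-8")
    kline_temp <- data.frame(text_content %>% fromJSON())
    kline_temp$symbol <- pairs[i]
    kline_temp    # return one row per call and let ldply do the row-binding
}
kline2 <- ldply(2:(One - 1), .fun = loopfunction, pairs = pairs, .parallel = TRUE,
                .paropts = list(.packages = c("httr", "jsonlite", "dplyr")))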

dplyr clashes with testthat package when matches is used

I am getting an error because testthat::matches clashes with dplyr::matches, and I want to know how to use testthat::test_file to check functions which contain calls to matches(), without having to specify dplyr::matches in the function body.
E.g.:
> testthat::test_file('tmp_fn_testthat_test.R')
Attaching package: ‘testthat’
The following object is masked from ‘package:dplyr’:
matches
The following object is masked from ‘package:purrr’:
is_null
Show Traceback
Rerun with Debug
Error in -matches("tmp") : invalid argument to unary operator
In addition: Warning message:
package ‘testthat’ was built under R version 3.2.5
DONE =========================================================================================================================================
This error can be reproduced by saving the following code in a file called tmp_fn_testthat_test.R in your working directory, and running the command testthat::test_file('tmp_fn_testthat_test.R'). Note that sourcing the file or running the expect_equal command while testthat is not loaded makes the test pass.
tmp_fn <- function() {
    tmp_df <- data.frame(tmp_a = 1, tmp_b = 2)
    tmp_df %>%
        select(-matches('tmp')) %>%
        ncol
}
testthat::expect_equal(tmp_fn(), 0)
This is a known issue with dplyr 0.5. The recommended solution is to use an explicit namespace prefix: dplyr::matches.
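Applied to the function above, that would look something like this (select is prefixed as well for clarity):
tmp_fn <- function() {
    tmp_df <- data.frame(tmp_a = 1, tmp_b = 2)
    tmp_df %>%
        dplyr::select(-dplyr::matches('tmp')) %>%
        ncol
}
testthat::expect_equal(tmp_fn(), 0)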
A workaround appears to be commenting out library(testthat) in the definition of testthat::test_file and making the function calls explicit (I'm not sure whether this will have bad side effects):
my_test_that_file <- function(path, reporter = "summary", env = testthat::test_env(),
                              start_end_reporter = TRUE, load_helpers = TRUE)
{
    # library(testthat)
    reporter <- testthat:::find_reporter(reporter)
    if (load_helpers) {
        testthat:::source_test_helpers(dirname(path), env = env)
    }
    lister <- testthat:::ListReporter$new()
    if (!is.null(reporter)) {
        reporter <- testthat:::MultiReporter$new(reporters = list(reporter, lister))
    }
    else {
        reporter <- lister
    }
    testthat::with_reporter(reporter = reporter, start_end_reporter = start_end_reporter, {
        lister$start_file(basename(path))
        testthat::source_file(path, new.env(parent = env), chdir = TRUE)
        testthat:::end_context()
    })
    invisible(lister$get_results())
}
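Usage would then mirror testthat::test_file, e.g.:
my_test_that_file('tmp_fn_testthat_test.R', reporter = 'summary')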

Why can't I vectorize source_url in knitr?

I am trying to vectorize this call to source_url, in order to load some functions from GitHub:
library(devtools)
# Find ggnet functions.
fun = c("ggnet.R", "functions.R")
fun = paste0("https://raw.github.com/briatte/ggnet/master/", fun)
# Load ggnet functions.
source_url(fun[1], prompt = FALSE)
source_url(fun[2], prompt = FALSE)
The last two lines should be able to work in an lapply call, but for some reason this won't work from knitr: to have this code work when I process an Rmd document to HTML, I have to call source_url twice.
The same error shows up with source_url from devtools and with the one from downloader: somewhere in my code, an object of type closure is not subsettable.
I suspect that this has something to do with SHA; any explanation would be most welcome.
It has nothing to do with knitr or devtools or vectorization. It is just an error in your(?) code, and it is fairly easy to track down using traceback().
> library(devtools)
> # Find ggnet functions.
> fun = c("ggnet.R", "functions.R")
> fun = paste0("https://raw.github.com/briatte/ggnet/master/", fun)
> # Load ggnet functions.
> source_url(fun[1], prompt = FALSE)
SHA-1 hash of file is 2c731cbdf4a670170fb5298f7870c93677e95c7b
> source_url(fun[2], prompt = FALSE)
SHA-1 hash of file is d7d466413f9ddddc1d71982dada34e291454efcb
Error in df$Source : object of type 'closure' is not subsettable
> traceback()
7: which(df$Source == x) at file34af6f0b0be5#14
6: who.is.followed.by(df, "JacquesBompard") at file34af6f0b0be5#19
5: eval(expr, envir, enclos)
4: eval(ei, envir)
3: withVisible(eval(ei, envir))
2: source(temp_file, ...)
1: source_url(fun[2], prompt = FALSE)
You used df in the code, and df is a function in the stats package (density of the F distribution). I know you probably mean a data frame, but you did not declare that in the code.
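You can see the same lookup behaviour in a fresh session, before any data frame named df has been defined (illustrative only):
exists("df")    # TRUE -- it finds the F-distribution density stats::df
class(df)       # "function"
df$Source
# Error in df$Source : object of type 'closure' is not subsettable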
