Error in running sfLapply in R

My piece of code looks like:
x <- c(1, 2, 3, 4, 5)
library(snowfall)
f1 <- function(a, list) {
  f2 <- function(b, num) { return(abs(num - b)) }
  l1 <- sfLapply(list, f2, num = a)
  l1 <- sum(unlist(l1))
  return(l1)
}
sfInit(parallel = TRUE, cpus = 4)
l2 <- (sfLapply(x, f1, list = x))
sfStop()
l2
When I run the last four lines, I get this error:
l2<-(sfLapply(x, f1, list=x))
Error in checkForRemoteErrors(val) :
4 nodes produced errors; first error: could not find function "sfLapply"
When I switch to sequential processing, using lapply, it runs perfectly.
> l2<-(lapply(x, f1, list=x))
> l2
[[1]]
[1] 10
[[2]]
[1] 7
[[3]]
[1] 6
[[4]]
[1] 7
[[5]]
[1] 10
Why is sfLapply throwing an error?

You need to load the snowfall package on the cluster nodes, so insert
sfLibrary(snowfall)
right after sfInit().
EDIT: For clarification:
Your function f1 contains the function sfLapply, which is found in the snowfall package. When you initialize the cluster using sfInit as above, the snow package is loaded on each node of the cluster, but not the snowfall package. Without the latter, there is no object (function or otherwise) called sfLapply on the nodes, and you get the error.
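Another option, assuming the inner loop does not itself need to run in parallel: f1 already executes on a worker, so it can use base lapply() internally, and then nothing beyond base R has to exist on the nodes. The minimal sketch below swaps snowfall for the base parallel package (parLapply()) only so that it runs without extra installs. The cluster size of 2 is an arbitrary choice.

```r
library(parallel)

x <- c(1, 2, 3, 4, 5)

# f1 runs on a worker; its inner loop uses base lapply(), which every
# worker already has, instead of a second (nested) parallel call
f1 <- function(a, list) {
  f2 <- function(b, num) abs(num - b)
  sum(unlist(lapply(list, f2, num = a)))
}

cl <- makeCluster(2)                   # PSOCK cluster with 2 workers
l2 <- parLapply(cl, x, f1, list = x)   # extra named args are passed on to f1
stopCluster(cl)

print(unlist(l2))
```

This should reproduce the 10, 7, 6, 7, 10 of the sequential lapply() run in the question. With snowfall, the analogous change is to keep sfLapply() for the outer call and use lapply() inside f1.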


Send messages from within parallel function (parallel or future framework)

I would like to send messages from within a function to the R console during a parallel process using the parallel package or the future framework and pblapply::pblapply().
Here is a reprex that does not send any messages to the R console:
# library
library(pbapply)
library(stringi)
library(parallel)
# make fun
fun_func <- function(x){
cat(paste0("hello world ",x))
return(paste0("hello world ",x))}
# get data
set.seed(23)
d <- stri_rand_strings(100, 2, '[a-z]')
names(d) <- d
# make cluster
cl <- parallel::makeCluster(3)
# load func
clusterExport(cl, c("fun_func"))
# run function
pblapply(cl=cl,X=d,FUN=fun_func) -> res
# stop cluster
parallel::stopCluster(cl)
# show res
head(res)
#> $of
#> [1] "hello world of"
#>
#> $is
#> [1] "hello world is"
#>
#> $vl
#> [1] "hello world vl"
#>
#> $zz
#> [1] "hello world zz"
#>
#> $vz
#> [1] "hello world vz"
#>
#> $ws
#> [1] "hello world ws"
Created on 2022-12-07 with reprex v2.0.2
Update:
I just learned that it might be hard to get console messages in Windows/RStudio. However, logging the messages with ParallelLogger might be an option.
Unfortunately, I could not implement it.
So I would be happy to have a solution that sends the messages either to the console or to a file.
If you use a parallelization method on top of the future framework, e.g. future_lapply, furrr, foreach with doFuture, and soon also pbapply (https://github.com/psolymos/pbapply/issues/54), output from cat(), print(), message(), warning(), and so on in the parallel workers is captured and truly re-outputted ("relayed") in your main R session as the parallel tasks complete. You can read about this in https://future.futureverse.org/articles/future-2-output.html.
If you want to see near-live output, that is, output produced while the parallel tasks are still running, use the progressr package. It is designed to send near-live progress updates when using the Futureverse, and those updates can also include custom messages. For examples, see https://progressr.futureverse.org/#parallel-processing-and-progress-updates.
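For the "to a file" option, and independent of the future framework, makeCluster() in the base parallel package has an outfile argument that redirects each worker's stdout/stderr. The sketch below is an alternative not taken from the answer above. It uses a small hypothetical input vector instead of the stri_rand_strings() data and writes to a temporary file. With outfile = "" the output is not redirected, which may reach the console in a terminal but, as the update notes, often not in RStudio on Windows.

```r
library(parallel)

fun_func <- function(x) {
  # Written to the worker's stdout, which outfile redirects below
  cat(paste0("hello world ", x, "\n"))
  paste0("hello world ", x)
}

d <- c(ab = "ab", cd = "cd", ef = "ef")   # hypothetical small input
log_file <- file.path(tempdir(), "workers.log")

# outfile: where the workers' stdout/stderr go ("" = no redirection)
cl <- makeCluster(3, outfile = log_file)
res <- parLapply(cl, d, fun_func)
stopCluster(cl)

head(res)
```

After the workers have shut down, readLines(log_file) should show the cat() lines, possibly alongside start-up lines that parallel itself writes for each worker.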

Log all warnings with futile.logger

Trying to log all errors and warnings with futile.logger.
Somewhat satisfied with this for dealing with errors:
library(futile.logger)
options(error = function() { flog.error(geterrmessage()) ; traceback() ; stop() })
log("a")
# Error in log("a") : non-numeric argument to mathematical function
# ERROR [2016-12-01 21:12:07] Error in log("a") : non-numeric argument to mathematical function
#
# No traceback available
# Error during wrapup:
There is redundancy, but I can easily separate stderr, stdout and the log file, so that is not a problem. It's certainly not pretty, though: there is an additional "wrapup" error message, somehow caused by the final stop(), that I don't understand, so I'm open to suggestions.
I cannot find a similar solution for warnings. What I tried:
options(warn = 1L)
options(warning.expression = expression(flog.warn(last.warning)))
log(- 1)
# [1] NaN
But to no avail.
Follow-up question: Are there best practices that I am unknowingly ignoring?
How about:
options(warning.expression =
quote({
if(exists("last.warning",baseenv()) && !is.null(last.warning)){
txt = paste0(names(last.warning),collapse=" ")
try(suppressWarnings(flog.warn(txt)))
cat("Warning message:\n",txt,'\n',sep = "")
}
}))
I can contribute two options to log R conditions such as warnings with futile.logger and to catch all warnings, no matter how deep the function call stack is:
Wrap your code with withCallingHandlers for a basic solution
Use my package tryCatchLog for an advanced solution (for compliance reasons: I am the author)
To explain the solutions I have created a simple R script that produces warnings and errors:
# Store this using the file name "your_code_file.R"
# This could be your code...
f1 <- function(value) {
print("f1() called")
f2(value) # call another function to show what happens
print("f1() returns")
}
f2 <- function(value) {
print("f2() called")
a <- log(-1) # This throws a warning: "NaNs produced"
print(paste("log(-1) =", a))
b <- log(value) # This throws an error if you pass a string as value
print("f2() returns")
}
f1(1) # produces a warning
f1("not a number") # produces a warning and an error
Executed "as is" (without logging) this code produces this output:
[1] "f1() called"
[1] "f2() called"
[1] "log(-1) = NaN"
[1] "f2() returns"
[1] "f1() returns"
[1] "f1() called"
[1] "f2() called"
[1] "log(-1) = NaN"
Error in log(value) : non-numeric argument to mathematical function
Calls: source -> withVisible -> eval -> eval -> f1 -> f2
In addition: Warning messages:
1: In log(-1) : NaNs produced
2: In log(-1) : NaNs produced
Solution 1 (withCallingHandlers)
Create a new R file that is called by R and sources your unchanged (!) original R script:
# Store this using the file name "logging_injector_withCallingHandlers.R"
# Main function to inject logging of warnings without changing your original source code
library(futile.logger)
flog.threshold(INFO)
# Injecting the logging of errors and warnings:
tryCatch(withCallingHandlers({
source("your_code_file.R") # call your unchanged code by sourcing it!
}, error = function(e) {
call.stack <- sys.calls() # "sys.calls" within "withCallingHandlers" is like a traceback!
log.message <- e$message
flog.error(log.message) # let's ignore the call.stack for now since it blows-up the output
}, warning = function(w) {
call.stack <- sys.calls() # "sys.calls" within "withCallingHandlers" is like a traceback!
log.message <- w$message
flog.warn(log.message) # let's ignore the call.stack for now since it blows-up the output
invokeRestart("muffleWarning") # avoid warnings to "bubble up" to being printed at the end by the R runtime
})
, error = function(e) {
flog.info("Logging injector: The called user code had errors...")
})
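To try the pattern without installing anything, here is a base-R sketch of Solution 1. It swaps futile.logger for a hypothetical log_it() helper that appends to a character vector, inlines toy versions of f1()/f2() instead of sourcing your_code_file.R, and additionally records where each condition was raised via conditionCall().

```r
# Toy functions in the spirit of "your_code_file.R"
f1 <- function(value) f2(value)
f2 <- function(value) {
  a <- log(-1)   # warning: "NaNs produced"
  log(value)     # error when value is not numeric
}

# Hypothetical stand-in for futile.logger: collect log lines in a vector
log_lines <- character(0)
log_it <- function(level, msg, call = NULL) {
  where <- if (is.null(call)) "" else paste0(" in ", deparse(call, nlines = 1L))
  log_lines <<- c(log_lines, sprintf("%s [%s]%s: %s", level,
                                     format(Sys.time(), "%H:%M:%S"), where, msg))
}

invisible(tryCatch(
  withCallingHandlers(
    {
      f1(1)
      f1("not a number")
    },
    warning = function(w) {
      log_it("WARN", conditionMessage(w), conditionCall(w))
      invokeRestart("muffleWarning")  # do not also print it at the end
    },
    error = function(e) log_it("ERROR", conditionMessage(e), conditionCall(e))
  ),
  error = function(e) log_it("INFO", "the called code had errors")
))

writeLines(log_lines)
```

The calling handlers run at the point where each condition is signalled, so both warnings are logged before the error; the outer tryCatch() then only has to stop the error from aborting the script.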
If you execute this wrapper code, the R output is:
$ Rscript logging_injector_withCallingHandlers.R
NULL
[1] "f1() called"
[1] "f2() called"
WARN [2017-06-08 22:35:53] NaNs produced
[1] "log(-1) = NaN"
[1] "f2() returns"
[1] "f1() returns"
[1] "f1() called"
[1] "f2() called"
WARN [2017-06-08 22:35:53] NaNs produced
[1] "log(-1) = NaN"
ERROR [2017-06-08 22:35:53] non-numeric argument to mathematical function
INFO [2017-06-08 22:35:53] Logging injector: The called user code had errors...
As you can see
warnings are logged now
the call stack could be output too (I have disabled this to avoid flooding this answer)
References: https://stackoverflow.com/a/19446931/4468078
Solution 2 - package tryCatchLog (I am the author)
Solution 1 has some drawbacks, mainly:
The stack trace ("traceback") does not contain file names and line numbers
The stack trace is flooded with internal function calls you don't want to see (believe me, or try it with your non-trivial R scripts ;-)
Instead of copying and pasting the above code snippet again and again, I have developed a package that encapsulates the above withCallingHandlers logic in a function and adds features such as
logging of errors, warnings and messages
identifying the origin of errors and warnings by logging a stack trace with a reference to the source file name and line number
supporting post-mortem analysis after errors by creating a dump file with all variables of the global environment (workspace) and of each function called (via dump.frames), which is very helpful for batch jobs that you cannot debug directly on the server to reproduce the error!
To wrap the above R script file using tryCatchLog, create a wrapper file:
# Store this using the file name "logging_injector_tryCatchLog.R"
# Main function to inject logging of warnings without changing your original source code
# install.packages("devtools")
# library(devtools)
# install_github("aryoda/tryCatchLog")
library(tryCatchLog)
library(futile.logger)
flog.threshold(INFO)
tryCatchLog({
  source("your_code_file.R") # call your unchanged code by sourcing it!
}
# , dump.errors.to.file = TRUE # saves a dump of the workspace and the call stack named dump_<YYYYMMDD_HHMMSS>.rda
)
and execute it via Rscript to get this (shortened!) result:
# $ Rscript -e "options(keep.source = TRUE); source('logging_injector_tryCatchLog.R')" > log.txt
[1] "f1() called"
[1] "f2() called"
WARN [2017-06-08 23:13:31] NaNs produced
Compact call stack:
1 source("logging_injector_tryCatchLog.R")
2 logging_injector_tryCatchLog.R#12: tryCatchLog({
3 logging_injector_tryCatchLog.R#13: source("your_code_file.R")
4 your_code_file.R#18: f1(1)
5 your_code_file.R#6: f2(value)
6 your_code_file.R#12: .signalSimpleWarning("NaNs produced", quote(log(-1)))
Full call stack:
1 source("logging_injector_tryCatchLog.R")
2 withVisible(eval(ei, envir))
...
<a lot of logging output omitted here...>
As you can clearly see at call stack level 6, the source code file name and line number (#12) are logged as the source of the warning, together with the source code snippet that threw the warning:
6 your_code_file.R#12 .signalSimpleWarning("NaNs produced", quote(log(-1)))
The way you should use futile.logger is shown in its documentation. Here is a quick example of how it is typically used and how to set the threshold:
library(futile.logger)
# set the log file to write to
flog.appender(appender.file("mylog.log"))
# set the log threshold; only messages at or above this level are written
flog.threshold(WARN)
flog.info("DOES NOT LOG")
flog.warn("Logged!")

Possible reasons for "Error in checkForRemoteErrors(lapply(cl, recvResult)) : ... object '.doSnowGlobals' not found" error?

I have been executing a function script repeatedly in R for many years. Within the function definition, I set up a parallel cluster on my multi-core Windows workstation using:
# cores0 <- 20 (cores set to 20 outside of function definition)
cl.spec <- rep("localhost", cores0)
cl <- makeCluster(cl.spec, type="SOCK", outfile="")
registerDoParallel(cl, cores=cores0)
As of yesterday, my function no longer works: it hangs for hours. (Additionally, in Resource Monitor I could see that none of my CPUs were active, even though my script specifies 20 cores.) When I went back into the function and tested it line by line, I discovered that the following lines do not complete (i.e., they hang, when they would usually execute in a few seconds):
cl.spec <- rep("localhost", cores0)
cl <- makeCluster(cl.spec, type="SOCK", outfile="")
I tried looking up the problem and found several references to the "PSOCK" type, but could not determine when to use PSOCK versus SOCK. Nonetheless, I tried the same script using "PSOCK" instead of "SOCK":
cl <- makeCluster(cl.spec, type="PSOCK", outfile="")
registerDoParallel(cl, cores=cores0)
With the PSOCK modification, it no longer hung, and it appeared to execute this line as well as the registerDoParallel() call.
However, when I then executed the complete function containing the above two lines and then called the function, as below, I got an error I had never seen:
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
20 nodes produced errors; first error: object '.doSnowGlobals' not found
I also tried not specifying the type or outfile, but this produced the same error as type="PSOCK":
cl <- makeCluster(cl.spec)
registerDoParallel(cl, cores=cores0)
My questions:
1. Why might the makeCluster() line be hanging when it never did before?
cl <- makeCluster(cl.spec, type="SOCK", outfile="")
The problem happens both when I have only the parallel and doParallel packages loaded and when I also have the snow and doSNOW packages loaded. Are all 4 packages required to execute foreach() commands?
Here is the function definition and function call containing the makeCluster() and registerDoParallel() calls, as above:
# FUNCTION DEFINITION
FX_RFprocessingSNPruns <- function(path, CurrentRoundSNPlist, colSAMP, Nruns, ntreeIN, coresIN,CurrentRoundGTframeRDA){
...do a bunch of steps ...
#&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
# SET UP INTERNAL FUNCTION
#&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
ImpOOBerr<-function(x,y,d) {
create function
}
#################################################################
# SET UP THE CLUSTER
#################################################################
#Setup clusters via parallel/DoParallel
cl.spec <- rep("localhost", cores0)
cl <- makeCluster(cl.spec, type="PSOCK", outfile="")
registerDoParallel(cl, cores=cores0)
#################################################################
# *** EMPLOY foreach TO CARRY OUT randomForest IN PARALLEL
#################################################################
system.time(RFoutput_runs <- foreach(i=1:Nruns0, .combine='cbind', .packages= 'randomForest', .inorder=FALSE, .multicombine=TRUE, .errorhandling="remove")
%dopar% {
...do a bunch of steps ...
ImpOOBerr(x,y,d)
})
#################################################################
# STOP THE CLUSTER
#################################################################
stopCluster(cl)
return(RFoutput_runs)
}
# CALL FUNCTION
path0="C:/USERS/KDA/WORKING/"
system.time(GTtest_5runs <- FX_RFprocessingSNPruns(
path=path0,
CurrentRoundSNPlist="SNPlist.rda",
colSAMP=20,
Nruns=5,
ntreeIN=150,
coresIN=5,
CurrentRoundGTframeRDA="GT.rda"))
#Error in checkForRemoteErrors(lapply(cl, recvResult)) :
# 20 nodes produced errors; first error: object '.doSnowGlobals' not found.
I found these posts that reference the error, but the solutions are not working for me:
error: object '.doSnowGlobals' not found?
http://grokbase.com/t/r/r-sig-hpc/148880dpsm/error-object-dosnowglobals-not-found
I'm working on Windows 8 machine, 64-bit with 40 cores.
R.Version()
$platform
[1] "x86_64-w64-mingw32"
$arch
[1] "x86_64"
$os
[1] "mingw32"
$system
[1] "x86_64, mingw32"
$status
[1] ""
$major
[1] "3"
$minor
[1] "3.0"
$year
[1] "2016"
$month
[1] "05"
$day
[1] "03"
$`svn rev`
[1] "70573"
$language
[1] "R"
$version.string
[1] "R version 3.3.0 (2016-05-03)"
$nickname
[1] "Supposedly Educational"
R version 3.3.0 (2016-05-03) -- "Supposedly Educational"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
It was institutional anti-virus software preventing access to the cores.
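A quick way to check whether worker processes can start and answer at all, independent of foreach and doParallel, is a sketch like the one below (2 workers instead of 20, as an arbitrary test size). For reference, in parallel::makeCluster() the "PSOCK" and "FORK" types are implemented by parallel itself, while other types such as "SOCK" are passed on to the snow package. If even this small test hangs, something outside R, such as the security software in this case, is a likely suspect.

```r
library(parallel)

# Start a small PSOCK cluster; if this step hangs, the workers are not
# managing to connect back to the master R session
cl <- makeCluster(2, type = "PSOCK")

# Each worker reports its own process id; ids that differ from the
# master's confirm that separate R processes started and responded
pids <- unlist(clusterEvalQ(cl, Sys.getpid()))
stopCluster(cl)

print(pids)
print(Sys.getpid())
```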

Error when using cv.tree

Hi, I tried using the function cv.tree from the tree package. I have a binary categorical response (called Label) and 30 predictors. I fit a tree object using all predictors.
I got the following error message that I don't understand:
Error in as.data.frame.default(data, optional = TRUE) :
cannot coerce class '"function"' to a data.frame
The data is the file 'training' taken from this site.
This is what I did:
x <- read.csv("training.csv")
attach(x)
library(tree)
Tree <- tree(Label~., x, subset=sample(1:nrow(x), nrow(x)/2))
CV <- cv.tree(Tree,FUN=prune.misclass)
The error occurs once cv.tree calls model.frame. The 'call' element of the tree object must contain a reference to a data frame whose name is also not the name of a loaded function.
Thus, not only will subsetting in the call to tree generate the error when cv.tree later uses the 'call' element of the tree object; using a data frame with a name like "df" would give an error as well, because model.frame will take this to be the name of an existing function (i.e. the 'density of the F distribution' from the stats package).
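The name-clash mechanism described above can be reproduced without the tree package. This sketch uses lm() instead of tree(), assuming (as this answer states) that cv.tree() re-evaluates the stored call in a similar way. Only the name of the data frame is stored in the model's call, and re-evaluating that name where no such data frame exists finds stats::df(), a function:

```r
# Fit inside a function, so the data frame called `df` exists only in
# that function's frame
make_fit <- function() {
  df <- data.frame(x = 1:10, y = (1:10)^2)
  lm(y ~ x, data = df)
}

fit <- make_fit()
print(fit$call$data)  # only the symbol `df` is stored, not the data

# Re-evaluating that symbol at top level finds stats::df(), a function
found <- eval(fit$call$data)
msg <- tryCatch(as.data.frame(found), error = conditionMessage)
print(msg)
```

Defining the data frame at top level, under a name that is not also the name of a function, avoids the clash.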
I think the problem is in the list of predictor variables. The following works, but I think you need to read the problem description more carefully. First, set up the formula without Weight.
x <- read.csv("training.csv")
vars<-setdiff(names(x),c("EventId","Label","Weight"))
fmla <- paste("Label", "~", vars[1], "+",
paste(vars[-c(1)], collapse=" + "))
Here's what you've been running
Tree <- tree(fmla, x, subset=sample(1:nrow(x), nrow(x)/2))
plot(Tree)
$size
[1] 6 5 4 3 1
$dev
[1] 25859 25859 27510 30075 42725
$k
[1] -Inf 0.0 1929.0 2791.0 6188.5
$method
[1] "misclass"
attr(,"class")
[1] "prune" "tree.sequence"
You may want to consider package rpart also
urows = sample(1:nrow(x), nrow(x)/2)
x_sub <- x[urows,]
Tree <- tree(fmla, x_sub)
plot(Tree)
CV <- cv.tree(Tree,FUN=prune.misclass)
CV
library(rpart)
tr <- rpart(fmla, data=x_sub, method="class")
printcp(tr)
Classification tree:
rpart(formula = fmla, data = x_sub, method = "class")
Variables actually used in tree construction:
[1] DER_mass_MMC DER_mass_transverse_met_lep
[3] DER_mass_vis
Root node error: 42616/125000 = 0.34093
n= 125000
CP nsplit rel error xerror xstd
1 0.153733 0 1.00000 1.00000 0.0039326
2 0.059274 2 0.69253 0.69479 0.0035273
3 0.020016 3 0.63326 0.63582 0.0034184
4 0.010000 5 0.59323 0.59651 0.0033393
If you include Weight as a predictor, it is the only split.
vars<-setdiff(names(x),c("EventId","Label"))

What arguments were passed to the functions in the traceback?

In R, if execution stops because of an error, I can evaluate traceback() to see which function the error occurred in, which function was that function called from, etc. It'll give something like this:
8: ar.yw.default(x, aic = aic, order.max = order.max, na.action = na.action,
series = series, ...)
7: ar.yw(x, aic = aic, order.max = order.max, na.action = na.action,
series = series, ...)
6: ar(x[, i], aic = TRUE)
5: spectrum0.ar(x)
4: effectiveSize(x)
Is there a way to find what arguments were passed to these functions? In this case, I'd like to know what arguments were passed to effectiveSize(), i.e. what is x.
The error does not occur in my own code, but in a package function. Being new to R, I'm a bit lost.
Not knowing how to do this properly, I tried to find the package function's definition and modify it, but where the source file should be I only find an .rdb file. I assume this is something byte-compiled.
I'd suggest setting options(error=recover) and then running the offending code again. This time, when an error is encountered, you'll be thrown into an interactive debugging environment in which you are offered a choice of frames to investigate. It will look much like what traceback() gives you, except that you can type 7 to enter the evaluation environment of call 7 on the call stack. Typing ls() once you've entered a frame will list the objects in that frame, including the function's arguments.
An example (based on that in ?traceback) is probably the best way to show this:
foo <- function(x) { print(1); bar(2) }
bar <- function(x) { x + a.variable.which.does.not.exist }
## First with traceback()
foo(2) # gives a strange error
# [1] 1
# Error in bar(2) : object 'a.variable.which.does.not.exist' not found
traceback()
# 2: bar(2) at #1
# 1: foo(2)
## Then with options(error=recover)
options(error=recover)
foo(2)
# [1] 1
# Error in bar(2) : object 'a.variable.which.does.not.exist' not found
#
# Enter a frame number, or 0 to exit
#
# 1: foo(2)
# 2: #1: bar(2)
Selection: 1
# Called from: top level
Browse[1]> ls()
# [1] "x"
Browse[1]> x
# [1] 2
Browse[1]> ## Just press return here to go back to the numbered list of envts.
#
# Enter a frame number, or 0 to exit
#
# 1: foo(2)
# 2: #1: bar(2)
R has many helpful debugging tools, most of which are discussed in the answers to this SO question from a few years back.
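recover() needs an interactive session. For a script, here is a minimal sketch (an alternative, not taken from this answer) that records, at the moment of the error, every call on the stack together with the values in its evaluation frame. It uses withCallingHandlers(), sys.calls() and sys.frames() on the foo()/bar() example. The names kept ("foo", "bar") are specific to this example; for a package function such as effectiveSize() you would filter on its name instead.

```r
foo <- function(x) { print(1); bar(2) }
bar <- function(x) { x + a.variable.which.does.not.exist }

snapshot <- NULL

invisible(tryCatch(
  withCallingHandlers(foo(2), error = function(e) {
    calls  <- as.list(sys.calls())   # calls on the stack, outermost first
    frames <- as.list(sys.frames())  # the matching evaluation environments
    # Keep only our own functions; the other frames belong to tryCatch(),
    # withCallingHandlers() and this handler
    keep <- vapply(calls, function(cl) {
      is.name(cl[[1]]) && as.character(cl[[1]]) %in% c("foo", "bar")
    }, logical(1))
    snapshot <<- Map(function(cl, fr) list(call = cl, args = as.list(fr)),
                     calls[keep], frames[keep])
  }),
  error = function(e) NULL  # swallow the error once it has been recorded
))

for (s in snapshot) {
  cat(deparse(s$call), "\n")
  str(s$args)
}
```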
You can use trace() to tag or label a function as requiring a "detour" to another function, the logical choice being browser().
?trace
?browser
> trace(mean)
> mean(1:4)
trace: mean(1:4)
[1] 2.5
So that just displayed the call. This next mini-session shows trace actually detouring into the browser:
> trace(mean, browser)
Tracing function "mean" in package "base"
[1] "mean"
> mean(1:4)
Tracing mean(1:4) on entry
Called from: eval(expr, envir, enclos)
Browse[1]> x #once in the browser you can see what values are there
[1] 1 2 3 4
Browse[1]>
[1] 2.5
> untrace(mean)
Untracing function "mean" in package "base"
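trace() also accepts an unevaluated expression as the tracer, so the arguments can be printed without stopping in the browser. A small sketch on mean(), where x is mean()'s first argument; print = FALSE suppresses the "Tracing ... on entry" line:

```r
# Print the argument every time mean() is entered
invisible(trace("mean", tracer = quote(cat("mean() called with x =", x, "\n")),
                print = FALSE))
out <- capture.output(res <- mean(1:4))
untrace("mean")

print(out)  # the line written by the tracer
print(res)  # mean() itself still returns its normal result
```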
As far as seeing what is in a function: if it is exported, you can simply type its name at the console. If it is not exported, use getAnywhere(fn_name).
