I am new to R parallel processing, I'm trying to move this from a very slow working for-loop to multithread and this I cant figure out Error in checkForRemoteErrors(val) : one node produced an error: promise already under evaluation: recursive default argument reference or earlier problems?
my xml wrangling is poor but this serves the purpose. The error comes from
process_entry <- function(ent,component_entry, flattenXMLx=flattenXMLx, EntryComponent=EntryComponent, componentv=componentv){
library(data.table)
flat<-NULL
path <- character(0)
df <- data.frame(elem. = character(0), templateID = character(0), elemid. = integer(0), attr. = character(0), value. = character(0), stringsAsFactors = FALSE)
component<-list()
component[["Document"]][["block"]][["Body"]][['component']][['section']]<-component_entry[ent]
entryc<-component[["Document"]][["block"]][["Body"]][['component']][['section']][['entry']][['organ']]
y <- list()
for (a in 1:length(entryc)) {
if (any(names(entryc[a]) != "component")) {
y = c(y, a)
}
}
if (length(y)>0){
entry_component <- component_entry[[ent]]$organizer
} else{
entry_component<-NULL
}
if (any(length(entry_component)>0)){
entcom<- EntryComponent(entry_component)
} else {
entcom<-NULL
}
componente <- flattenXMLx(xml2::xml_root(xml2::as_xml_document(component)), df, path)
if(any(!is.na(entcom))){
componente <<- dplyr::bind_rows(componente,entcom)}
componentv <<- dplyr::bind_rows(componentv, componente)
return(componentv)
}
this calls the function
componentv <- do.call(cbind, parLapply(cl, 1:length(component_entry), process_entry, component_entry=component_entry) )
the issue seems to come from componentv <<- dplyr::bind_rows(componentv, componente)
Related
I want to create a custom log function, that would get used in other functions. I am having issues with the custom function where arguments don't seem to flow through to the inner log function. My custom log function is inspired by the logger package but I am planning to expand this usage a bit further (so logger doesn't quite meet my needs)
log_fc <- function(type = c("INFO", "ERROR"), ...) {
print(
glue::glue("[{type} {Sys.time()}] ", ...)
)
}
Next I am planning to use log_fc in various other custom functions, one example:
test_fc <- function(forecast) {
log_fc(type = "INFO", "{forecast} is here")
#print(forecast)
}
If I test this, I get the following error:
> test_fc(forecast = "d")
Error in eval(parse(text = text, keep.source = FALSE), envir) :
object 'forecast' not found
I am not sure why argument forecast is not being picked up by the inner test_fc function. TIA
You could use the .envir argument:
log_fc <- function(type = c("INFO", "ERROR"), ...) {
env <- new.env(parent=parent.frame())
assign("type",type,env)
print(
glue::glue("[{type} {Sys.time()}] ", ...,.envir = env)
)
}
test_fc <- function(forecast) {
log_fc(type = "INFO", "{forecast} is here")
}
test_fc("My forecast")
#> [INFO 2022-12-18 12:44:11] My forecast is here
There are two things going on.
First, the name forecast is never passed to log_fc. The paste solution never needs the name, it just needs the value, so it still works. You'd need something like
log_fc(type = "INFO", "{forecast} is here", forecast = forecast)
to get the name into log_fc.
The second issue is more complicated. It's a design decision in many tidyverse functions. They want to be able to have code like f(x = 3, y = x + 1) where the x in the second argument gets the value that was bound to it in the first argument.
Standard R evaluation rules would not do that; they would look for x in the environment where f was called, so f(y = x + 1, x = 3) would bind the same values in the function as putting the arguments in the other order.
The tidyverse implementation of this non-standard evaluation messes up R's internal handling of .... The workaround (described here: https://github.com/tidyverse/glue/issues/231) is to tell glue() to evaluate the arguments in a particular location. You need to change your log function to fix this.
One possible change is shown below. I think #Waldi's change is actually better, but I'll leave this one to show a different approach.
log_fc <- function(type = c("INFO", "ERROR"), ...) {
# Get all the arguments from ...
args <- list(...)
# The unnamed ones are messages, the named ones are substitutions
named <- which(names(args) != "")
# Put the named ones in their own environment
e <- list2env(args[named])
# Evaluate the substitutions in the new env
print(
glue::glue("[{type} {Sys.time()}] ", ..., .envir = e)
)
}
test_fc <- function(forecast) {
log_fc(type = "INFO", "{forecast} is here", forecast = forecast)
}
test_fc(forecast = "d")
#> [INFO 2022-12-18 06:25:29] d is here
Created on 2022-12-18 with reprex v2.0.2
The reason for this is that when your test_fc function connects to the log_fc function, the forecats variable wouldn't be able to be found, because it's not a global function; thus, you can't access it from the other function.
The way to fix this is by defining a global variable:
log_fc <- function(type = c("INFO", "ERROR"), ...) {
print(
glue::glue("[{type} {Sys.time()}] ", ...)
)
}
test_fc <- function(forecast) {
forecast <<- forecast
log_fc(type = "INFO", "{forecast} is here")
}
print(test_fc(forecast = "d"))
Output:
d is here
Since you're already using glue you could use another glue::glue in test_fc to accomplish the pass-through, such as:
log_fc <- function(type = c("INFO", "ERROR"), ...) {
print(
glue::glue("[{type} {Sys.time()}] ", ...)
)
}
test_fc <- function(forecast) {
log_fc(type = "INFO", glue::glue("{forecast} is here"))
}
which yields
> test_fc('arctic blast')
[INFO 2022-12-21 15:56:18] arctic blast is here
>
I want to create a large lookup table of key value pairs, attempting it like this:
# actual use case is length ~5 million
key <- do.call(paste0, Map(stringi::stri_rand_strings, n=2e5, length = 16))
val <- sample.int(750, size = 2e5, replace = T)
make_dict <- function(keys, values){
require(rlang)
e <- new.env(size = length(keys))
l <- list2(!!!setNames(values, keys))
list2env(l, envir = e, hash = T) # problem in here...?
}
d <- make_dict(key, val)
Problem
When make_dict is run it throws Error: protect(): protection stack overflow. Specifically in RStudio when the input is a vector of length is greater than 49991, which seems very similar to this stackoverflow post.
However, when I run accessor functions to grab some of the values, it seems that make_dict ran fine after all, as I can't find any oddities in its result:
`%||%` <- function(x,y) if(is.null(x)) y else x
grab <- function(...){
vector("integer", length(..2)) |>
(\(.){. = Vectorize(\(e, x) e[[x]] %||% NA_integer_, list("x"), T, F)(..1, ..2); .})()
}
out <- vector("integer", length(key))
out <- grab(d, sample(key)) # using sample to scramble the keys
anyNA(out) | !lobstr::obj_size(out) == lobstr::obj_size(val)
[1] FALSE
Running the same code in RGui does not throw the error.
Oddities
The d environment object does not appear in environment pane in RStudio for size > 5e4.
The R console returns swiftly back to > (signaling the function has finished), but is unresponsive until the error is thrown
Error is thrown if manually setting options(expressions = 5e5), or retaining the default value of 5000
When the error is thrown is proportional to the size of the input vector
tryCatch(make_dict(key, val), error = function(e) e) doesn't catch an error
The error also occurs if code is run from package (Packaged version available through remotes::install_github("D-Se/minimal"))
Question
What's going on here? How to troubleshoot such an error?
options(error = traceback) as advised here didn't give any results. Inserting a browser() after list2env in the make_dict function throws an error long after the browser has opened. A traceback() gives the function .rs.describeObject, which is used to generate the summary in the Environment pane, and can be found here.
traceback()
# .rs.describeObject
(function (env, objName, computeSize = TRUE)
{
obj <- get(objName, env)
hasNullPtr <- .Call("rs_hasExternalPointer", obj, TRUE, PACKAGE = "(embedding)")
if (hasNullPtr) {
val <- "<Object with null pointer>"
desc <- "An R object containing a null external pointer"
size <- 0
len <- 0
}
else {
val <- "(unknown)"
desc <- ""
size <- if (computeSize)
object.size(obj)
else 0
len <- length(obj)
}
class <- .rs.getSingleClass(obj)
contents <- list()
contents_deferred <- FALSE
if (is.language(obj) || is.symbol(obj)) {
val <- deparse(obj)
}
else if (!hasNullPtr) {
if (size > 524288) {
len_desc <- if (len > 1)
paste(len, " elements, ", sep = "")
else ""
if (is.data.frame(obj)) {
val <- "NO_VALUE"
desc <- .rs.valueDescription(obj)
}
else {
val <- paste("Large ", class, " (", len_desc,
format(size, units = "auto", standard = "SI"),
")", sep = "")
}
contents_deferred <- TRUE
}
else {
val <- .rs.valueAsString(obj)
desc <- .rs.valueDescription(obj)
if (class == "data.table" || class == "ore.frame" ||
class == "cast_df" || class == "xts" || class ==
"DataFrame" || is.list(obj) || is.data.frame(obj) ||
isS4(obj)) {
if (computeSize) {
contents <- .rs.valueContents(obj)
}
else {
val <- "NO_VALUE"
contents_deferred <- TRUE
}
}
}
}
list(name = .rs.scalar(objName), type = .rs.scalar(class),
clazz = c(class(obj), typeof(obj)), is_data = .rs.scalar(is.data.frame(obj)),
value = .rs.scalar(val), description = .rs.scalar(desc),
size = .rs.scalar(size), length = .rs.scalar(len), contents = contents,
contents_deferred = .rs.scalar(contents_deferred))
})(<environment>, "d", TRUE)
This github issue pointed out by #technocrat talks about a known bug in earlier versions of RStudio of disabling null external pointer checks, and has since been solved by adding an additional preference check in .rs.describeObject() of
.rs.readUiPref("check_null_external_pointers")
To check if code is run from within RStudio, and if that version is lower than that of before a certain version number (here I use the current official release), a check can be included in the function, or in the .OnAttach of a package:
if(!is.na(Sys.getenv("RSTUDIO", unset = NA)) && .rs.api.versionInfo()$version < "2021.9.1.372")){
# warning or action
}
edit: added full code
I made an S4 method for 'plot' that appears to be working, except it outputs some stray NULL to the console and I can't figure out where it's coming from. Here's the top level code:
print(plot(x = flux, y = 1, fastplot = TRUE, quietly = TRUE))
And the class:
flux <- setClass(
# Set the class name
"flux",
slots = c(
raw.data = "list",
source.files = "character",
data = "matrix",
time = "POSIXct",
datatype = "character",
metadata = "data.frame"
)
)
And the method:
setMethod("plot",
signature(x = "flux"),
function (x, y, ...) {
CheckFluxObject(x)
params <- LoadDefaults(flux = x)
# Interpret 'plot' arguments
par.restore <- par(no.readonly = TRUE)
on.exit(expr = par(par.restore), add = TRUE)
arguments <- list(...)
if (!("fastplot" %in% names(arguments))) {
fastplot <- FALSE
} else {
fastplot <- arguments$fastplot
arguments$fastplot <- NULL
}
if (!("quietly" %in% names(arguments))) {
quietly <- FALSE
} else {
quietly <- arguments$quietly
arguments$quietly <- NULL
}
par(ask=!(fastplot))
if (!("ylab" %in% arguments)) {
ylab <- params["units"]
} else {
ylab <- arguments$ylab
arguments$ylab <- NULL
}
# Pull relevant 'flux' class object data
data <- slot(x, "data")
if (missing("y")) {
y <- 1:ncol(data)
} else {
stopifnot(
is.integer(y),
all(y %in% 1:ncol(data))
)
}
# Bulk function execution
if (quietly == FALSE) {
message("Plotting data traces:")
}
plot.obj <- plot.new()
print("NULL is in the 'for' loop...")
for (i in y){
main <- colnames(data)[i]
plot.obj <- plot(slot(x, "time"), data[, i], main = main,
xlab = "Time", ylab = ylab, unlist(arguments))
print(plot.obj)
}
print("but is it also here??")
# Clean-up and exit
if (quietly == FALSE) {
message("Done plotting.")
}
if (length(y) == 1) {
invisible(plot.obj)
}
print("or here??")
invisible(NULL)
}
)
The output for that is:
[1] "NULL is in the 'for' loop..."
NULL
[1] "but is it also here??"
[1] "or here??"
NULL
If I throw in another print("what about here??") after the invisible(NULL),
then it does this:
[1] "NULL is in the 'for' loop..."
NULL
[1] "but is it also here??"
[1] "or here??"
[1] "what about here??"
[1] "what about here??"
Is there some behavior of the function return or print commands that I'm not anticipating? The CheckFluxObject function just checks to make sure all the slots are filled.
I'll leave this here till a better answer pops up, if ever:
Apparently the print method for plot objects returns a NULL, and if you're trying to generate a plot within a function it seems like the best way to do that is using invisible(plot.object) or invisible(plot(x, y, ...)), NOT print.
I'm still not sure where the 2nd NULL is coming from...
edit: Found the second one! Just like the print(plot.obj) in the method itself, the print in the top-level code was throwing a NULL. Dropping all of the print commands killed all of the ghosts.
Context
I am using in R, the "elipsis" or "dots" that wrap function calls
main_function <- function(...)
If I want to evaluate once, I do
main_function <- function(...) {
res = list(...)}
It works fine
Problem
fun_A <- function(arg_A){
print(paste("I am A", paste0(round(runif(arg_A, 0,1), 2),collapse = ", ")))
}
fun_B <- function(arg_B){
print(paste("I am B", paste0(round(runif(arg_B, 1,2), 2),collapse = ", ")))
}
Here the result is evaluated once and replicate 3 times :
main_fun_wrong <- function(..., times) {
res = list(...)
replicate(times, eval(res))
}
main_fun_wrong(fun_A(1), fun_B(2), times = 3)
Here it works :
main_fun <- function(..., times) {
calls = match.call(expand.dots = FALSE)$`...`
replicate(times, lapply(1:length(calls), function(num) eval(calls[[num]])), simplify = F)
}
main_fun(fun_A(1),fun_B(2), times = 3)
But now if arg_A is an object rather than a value, it will fail finding the arg_A and arg_B in the environment.
main_fun_problem <- function(arg_A, arg_B) {
main_fun(fun_A(arg_A),fun_B(arg_B), times = 3)
}
main_fun_problem(1,2)
I got an error :
Error in fun_A(arg_A) : object 'arg_A' not found
I do not know what R do when it find list(...) the first time in first example but I just want to repeat it multiple times.
Here is my solution, any alternative will be enjoyed.
The things is to substitute the variable by it's value at the moment we call the function.
main_fun_solution <- function(arg_A, arg_B) {
eval(substitute(main_fun(fun_A(arg_A),fun_B(arg_B), times = 3), list("arg_A" = arg_A, "arg_B" = arg_B)))
}
main_fun_solution(1,2)
NB: list("arg_A" = arg_A, "arg_B" = arg_B)` makes my heart bleed (the overall solution actually)
I am using a function from a package. this function returns some values. For example:
k<-dtw(v1,v2, keep.internals=TRUE)
and I can get this value:
k$costMatrix
Does it possible to see the source code of costMatrix? if yes how can I do that?
UPDATE
this is the source code of the function:
function (x, y = NULL, dist.method = "Euclidean", step.pattern = symmetric2,
window.type = "none", keep.internals = FALSE, distance.only = FALSE,
open.end = FALSE, open.begin = FALSE, ...)
{
lm <- NULL
if (is.null(y)) {
if (!is.matrix(x))
stop("Single argument requires a global cost matrix")
lm <- x
}
else if (is.character(dist.method)) {
x <- as.matrix(x)
y <- as.matrix(y)
lm <- proxy::dist(x, y, method = dist.method)
}
else if (is.function(dist.method)) {
stop("Unimplemented")
}
else {
stop("dist.method should be a character method supported by proxy::dist()")
}
wfun <- .canonicalizeWindowFunction(window.type)
dir <- step.pattern
norm <- attr(dir, "norm")
if (!is.null(list(...)$partial)) {
warning("Argument `partial' is obsolete. Use `open.end' instead")
open.end <- TRUE
}
n <- nrow(lm)
m <- ncol(lm)
if (open.begin) {
if (is.na(norm) || norm != "N") {
stop("Open-begin requires step patterns with 'N' normalization (e.g. asymmetric, or R-J types (c)). See papers in citation().")
}
lm <- rbind(0, lm)
np <- n + 1
precm <- matrix(NA, nrow = np, ncol = m)
precm[1, ] <- 0
}
else {
precm <- NULL
np <- n
}
gcm <- globalCostMatrix(lm, step.matrix = dir, window.function = wfun,
seed = precm, ...)
gcm$N <- n
gcm$M <- m
gcm$call <- match.call()
gcm$openEnd <- open.end
gcm$openBegin <- open.begin
gcm$windowFunction <- wfun
lastcol <- gcm$costMatrix[np, ]
if (is.na(norm)) {
}
else if (norm == "N+M") {
lastcol <- lastcol/(n + (1:m))
}
else if (norm == "N") {
lastcol <- lastcol/n
}
else if (norm == "M") {
lastcol <- lastcol/(1:m)
}
gcm$jmin <- m
if (open.end) {
if (is.na(norm)) {
stop("Open-end alignments require normalizable step patterns")
}
gcm$jmin <- which.min(lastcol)
}
gcm$distance <- gcm$costMatrix[np, gcm$jmin]
if (is.na(gcm$distance)) {
stop("No warping path exists that is allowed by costraints")
}
if (!is.na(norm)) {
gcm$normalizedDistance <- lastcol[gcm$jmin]
}
else {
gcm$normalizedDistance <- NA
}
if (!distance.only) {
mapping <- backtrack(gcm)
gcm <- c(gcm, mapping)
}
if (open.begin) {
gcm$index1 <- gcm$index1[-1] - 1
gcm$index2 <- gcm$index2[-1]
lm <- lm[-1, ]
gcm$costMatrix <- gcm$costMatrix[-1, ]
gcm$directionMatrix <- gcm$directionMatrix[-1, ]
}
if (!keep.internals) {
gcm$costMatrix <- NULL
gcm$directionMatrix <- NULL
}
else {
gcm$localCostMatrix <- lm
if (!is.null(y)) {
gcm$query <- x
gcm$reference <- y
}
}
class(gcm) <- "dtw"
return(gcm)
}
but if I write globalCostMatrix I dont get the source code of this function
The easiest way to find how functions work is by looking at the source. You have a good chance that by typing function name in the R console, you will get the function definitions (although not always with good layout, so seeking the source where brackets are present, is a viable option).
In your case, you have a function dtw from the same name package. This function uses a function called globalCostMatrix. If you type that name into R, you will get an error that object was not found. This happens because the function was not exported when the package was created, probably because the author thinks this is not something a regular user would use (but not see!) or to prevent clashes with other packages who may use the same function name.
However, for an interested reader, there are at least two ways to access the code in this function. One is by going to CRAN, downloading the source tarballs and finding the function in the R folder of the tar ball. The other one, easier, is by using getAnywhere function. This will give you the definition of the function just like you're used for other, user accessible functions like dtw.
> library(dtw)
> getAnywhere("globalCostMatrix")
A single object matching ‘globalCostMatrix’ was found
It was found in the following places
namespace:dtw
with value
function (lm, step.matrix = symmetric1, window.function = noWindow,
native = TRUE, seed = NULL, ...)
{
if (!is.stepPattern(step.matrix))
stop("step.matrix is no stepMatrix object")
n <- nrow(lm)
... omitted for brevity
I think you want to see what the function dtw() does with your data. I seems that it creates a data.frame containing a column named costMatrix.
To find out how the data in the column costMatrix was generated, just type and execute dtw (without brackets!). R will show you the source of the function dtw() afterwards.