force the evaluation of .SD in data.table

force the evaluation of .SD in data.table - r

In order to debug j in data.table I prefer to interactively inspect the resulting -by- dt´s with browser(). SO 2013 adressed this issue and I understand that .SD must be invoked in j in order for all columns to be evaluated. I use Rstudio and using the SO 2013 method, there are two problems:
The environment pane is not updated reflecting the browser environment
I often encounter the following error msg
Error: option error has NULL value
In addition: Warning message:
In get(object, envir = currentEnv, inherits = TRUE) :
restarting interrupted promise evaluation
I can get around this by doing:
f <- function(sd=force(.SD),.env = parent.frame(n = 1)) {
by = .env$.BY;
i = .env$.I;
sd = .env$.SD;
grp = .env$.GRP;
N = .env$.N;
browser()
}
library (data.table)
setDT(copy(mtcars))[,f(.SD),by=.(gear)]
But - in the data.table spirit of keeping things short and sweet- can I somehow force (the force in f does not work) the evaluation of .SD in the call to f so that the final code could run:
setDT(copy(mtcars))[,f(),by=.(gear)]

As far as I know,
data.table needs to explicitly see .SD somewhere in the code passed to j,
otherwise it won't even expose it in the environment it creates for the execution.
See for example this question and its comments.
Why don't you create a different helper function that always specifies .SD in j?
Something like:
dt_debugger <- function(dt, ...) {
f <- function(..., .caller_env = parent.frame()) {
by <- .caller_env$.BY;
i <- .caller_env$.I;
sd <- .caller_env$.SD;
grp <- .caller_env$.GRP;
N <- .caller_env$.N;
browser()
}
dt[..., j = f(.SD)]
}
dt_debugger(as.data.table(mtcars), by = .(gear))

Related

R: cannot make use of `data.table`'s environment, base environment and global environment all at once

Consider the dummy example below: I want to run a model on a range of subsets of the data.table in a loop, and want to specify the exact line to iterate as a string (with an iterator i)
library(data.table)
DT <- data.table(X = runif(100), Y = runif(100))
f1 <- function(code) {
for (i in c(20,30,50)) {
eval(parse(text = code))
}
}
f1("lm(X ~ Y, data = DT[sample(.N, i)])")
Obviously this doesn't return any output as lm() is merely evaluated in the background 3 times. The actual use case is more convoluted, but this is meant to be a theoretical simplification of it.
The example above, nonetheless, works fine. The problems begin when the function f1 is included in the package, instead of being defined in the global environment. If I'm not mistaken, in this case f1 is defined in the package's base env. Then, calling f1 from global env gives the error: Error in [.data.frame(x, i) : undefined columns selected. R can correctly access iterator i in its base env and DT in the global env, but cannot access the column by name inside data.table's square brackets.
I tried experimenting by setting envir and enclos arguments to eval() to baseenv(), globalenv(), parent.frame(), but haven't managed to find a combination that works.
For example, setting envir = globalenv() seems to result in accessing DT and i, but not X and Y from the DT inside lm(). Setting envir = baseenv() we lose the global env and cannot access DT (envir = baseenv(), enclos = globalenv() doesn't change it). Using envir = list(baseenv(), globalenv()) results in not being able to access anything inside data.table's square brackets, I think, error message: "Error in [.data.frame(x, i) : undefined columns selected".

The problem is that variables are resolved lexicographically. You could try passing in the expression and the substituting the value of i specifically before evaluating. This would take care of eliminating the need for explicit parsing.
f1 <- function(code) {
code <- substitute(code)
for (i in c(20,30,50)) {
cmd <- do.call("substitute", list(code, list(i=i)))
print(cmd)
result <- eval.parent(cmd)
print(result)
}
}
f1(lm(X ~ Y, data = DT[sample(.N, i)]))

How to use str_glue in data.table (environments question)

In a real life project I am trying to create a function to be used in data.table (j) which relies on string formatting using the str_glue function. I get an error related to environments.
What follows below is a trivial example (converting mpg to l/100km in the mtcars dataset) to highlight the behavior. My example might seem convoluted (e.g. a list holding only 1 element) but this is to better reflect my real-life problem which is of course more complex.
01 - Setting up the data.table and the list holding the conversion factor
conversion_rates = list(mpg_lkm = 282.5)
dt = as.data.table(mtcars, keep.rownames = "rn")
02 - Creating a function to calculate liters per 100km
mpg_to_lkm = function(mpg, cr) {
cri = cr$mpg_lkm
return(cri / mpg)
}
03 - using the function in j, by passing the column as the first argument, and the conversion rates from the global environment as the second
dt[, lkm := mpg_to_lkm(mpg, conversion_rates)] # passing conversion_rates works
head(dt)
So far so good. I seem to have established that you can pass arguments from the data.table scope and global scope without any problem.
04 - defining a function that will create a 'character' column which explains the formula
explain_formula = function(mpg, cr) {
print(ls(envir = environment())) # proves that mpg & cr are available in my envir
cri = cr$mpg_lkm
result = paste("lkm obtained by dividing {cri} by", mpg) # curly brace notation
result = purrr::map_chr(result, stringr::str_glue) # str_glue is not vectorised so I use map
return(result)
}
05 - testing my function shows that it runs smoothly
testcol = dt$mpg
result = explain_formula(testcol, conversion_rates)
result # --> output exactly as expected
06 - now trying it from j in data.table
dt[, cr := explain_formula(mpg, conversion_rates)]
throws error
Error in eval(parse(text = text, keep.source = FALSE), envir) :
object 'cri' not found
str_glue has as a default argument envir = parent.frame(). The parent frame of a function evaluation is the environment in which the function was called. In my case this has to be the environment of "explain_formula" right?
I have found out that 2 things work solve my issue:
creating cri in the global environment -> not usefull for my real life problem
passing .envir = environment() to str_glue -> this is my current solution
Can anyone explain me where my reasoning is bad? Why is '06' not giving the desired result?

Is there a way to use tryCatch (or similar) in R as a loop, or to manipulate the expr in the warning argument?

I have a regression model (lm or glm or lmer ...) and I do fitmodel <- lm(inputs) where inputs changes inside a loop (the formula and the data). Then, if the model function does not produce any warning I want to keep fitmodel, but if I get a warning I want to update the model and I want the warning not printed, so I do fitmodel <- lm(inputs) inside tryCatch. So, if it produces a warning, inside warning = function(w){f(fitmodel)}, f(fitmodel) would be something like
fitmodel <- update(fitmodel, something suitable to do on the model)
In fact, this assignation would be inside an if-else structure in such a way that depending on the warning if(w$message satisfies something) I would adapt the suitable to do on the model inside update.
The problem is that I get Error in ... object 'fitmodel' not found. If I use withCallingHandlers with invokeRestarts, it just finishes the computation of the model with the warning without update it. If I add again fitmodel <- lm(inputs) inside something suitable to do on the model, I get the warning printed; now I think I could try suppresswarnings(fitmodel <- lm(inputs)), but yet I think it is not an elegant solution, since I have to add 2 times the line fitmodel <- lm(inputs), making 2 times all the computation (inside expr and inside warning).
Summarising, what I would like but fails is:
tryCatch(expr = {fitmodel <- lm(inputs)},
warning = function(w) {if (w$message satisfies something) {
fitmodel <- update(fitmodel, something suitable to do on the model)
} else if (w$message satisfies something2){
fitmodel <- update(fitmodel, something2 suitable to do on the model)
}
}
)
What can I do?
The loop part of the question is because I thought it like follows (maybe is another question, but for the moment I leave it here): it can happen that after the update I get another warning, so I would do something like while(get a warning on update){update}; in some way, this update inside warning should be understood also as expr. Is something like this possible?
Thank you very much!
Generic version of the question with minimal example:
Let's say I have a tryCatch(expr = {result <- operations}, warning = function(w){f(...)} and if I get a warning in expr (produced in fact in operations) I want to do something with result, so I would do warning = function(w){f(result)}, but then I get Error in ... object 'result' not found.
A minimal example:
y <- "a"
tryCatch(expr = {x <- as.numeric(y)},
warning = function(w) {print(x)})
Error in ... object 'x' not found
I tried using withCallingHandlers instead of tryCatch without success, and also using invokeRestart but it does the expression part, not what I want to do when I get a warning.
Could you help me?
Thank you!

The problem, fundamentally, is that the handler is called before the assignment happens. And even if that weren’t the case, the handler runs in a different scope than the tryCatch expression, so the handler can’t access the names in the other scope.
We need to separate the handling from the value transformation.
For errors (but not warnings), base R provides the function try, which wraps tryCatch to achieve this effect. However, using try is discouraged, because its return type is unsound.1 As mentioned in the answer by ekoam, ‘purrr’ provides soundly typed functional wrappers (e.g. safely) to achieve a similar effect.
However, we can also build our own, which might be a better fit in this situation:
with_warning = function (expr) {
self = environment()
warning = NULL
result = withCallingHandlers(expr, warning = function (w) {
self$warning = w
tryInvokeRestart('muffleWarning')
})
list(result = result, warning = warning)
}
This gives us a wrapper that distinguishes between the result value and a warning. We can now use it to implement your requirement:
fitmodel = with(with_warning(lm(inputs)), {
if (! is.null(warning)) {
if (conditionMessage(warning) satisfies something) {
update(result, something suitable to do on the model)
} else {
update(result, something2 suitable to do on the model)
}
} else {
result
}
})
1 What this means is that try’s return type doesn’t distinguish between an error and a non-error value of type try-error. This is a real situation that can occur, for example, when nesting multiple try calls.

It seems that you are looking for a functional wrapper that captures both the returned value and side effects of a function call. I think purrr::quietly is a perfect candidate for this kind of task. Consider something like this
quietly <- purrr::quietly
foo <- function(x) {
if (x < 3)
warning(x, " is less than 3")
if (x < 4)
warning(x, " is less than 4")
x
}
update_foo <- function(x, y) {
x <- x + y
foo(x)
}
keep_doing <- function(inputs) {
out <- quietly(foo)(inputs)
repeat {
if (length(out$warnings) < 1L)
return(out$result)
cat(paste0(out$warnings, collapse = ", "), "\n")
# This is for you to see the process. You can delete this line.
if (grepl("less than 3", out$warnings[[1L]])) {
out <- quietly(update_foo)(out$result, 1.5)
} else if (grepl("less than 4", out$warnings[[1L]])) {
out <- quietly(update_foo)(out$result, 1)
}
}
}
Output
> keep_doing(1)
1 is less than 3, 1 is less than 4
2.5 is less than 3, 2.5 is less than 4
[1] 4
> keep_doing(3)
3 is less than 4
[1] 4

Are you looking for something like the following? If it is run with y <- "123", the "OK" message will be printed.
y <- "a"
#y <- "123"
x <- tryCatch(as.numeric(y),
warning = function(w) w
)
if(inherits(x, "warning")){
message(x$message)
} else{
message(paste("OK:", x))
}
It's easier to test several argument values with the code above rewritten as a function.
testWarning <- function(x){
out <- tryCatch(as.numeric(x),
warning = function(w) w
)
if(inherits(out, "warning")){
message(out$message)
} else{
message(paste("OK:", out))
}
invisible(out)
}
testWarning("a")
#NAs introduced by coercion
testWarning("123")
#OK: 123

Maybe you could assign x again in the handling condition?
tryCatch(
warning = function(cnd) {
x <- suppressWarnings(as.numeric(y))
print(x)},
expr = {x <- as.numeric(y)}
)
#> [1] NA
Perhaps not the most elegant answer, but solves your toy example.

Don't put the assignment in the tryCatch call, put it outside. For example,
y <- "a"
x <- tryCatch(expr = {as.numeric(y)},
warning = function(w) {y})
This assigns y to x, but you could put anything in the warning body, and the result will be assigned to x.
Your "what I would like" example is more complicated, because you want access to the expr value, but it hasn't been assigned anywhere at the time the warning is generated. I think you'll have to recalculate it:
fitmodel <- tryCatch(expr = {lm(inputs)},
warning = function(w) {if (w$message satisfies something) {
update(lm(inputs), something suitable to do on the model)
} else if (w$message satisfies something2){
update(lm(inputs), something2 suitable to do on the model)
}
}
)
Edited to add:
To allow the evaluation to proceed to completion before processing the warning, you can't use tryCatch. The evaluate package has a function (also called evaluate) that can do this. For example,
y <- "a"
res <- evaluate::evaluate(quote(x <- as.numeric(y)))
for (i in seq_along(res)) {
if (inherits(res[[i]], "warning") &&
conditionMessage(res[[i]]) == gettext("NAs introduced by coercion",
domain = "R"))
x <- y
}
Some notes: the res list will contain lots of different things, including messages, warnings, errors, etc. My code only looks at the warnings. I used conditionMessage to extract the warning message, but
it will be translated to the local language, so you should use gettext to translate the English version of the message for comparison.

eval: why does enclos = parent.frame() make a difference?

In this question, the following throws an error:
subset2 = function(df, condition) {
condition_call = eval(substitute(condition),df )
df[condition_call,]
}
df = data.frame(a = 1:10, b = 2:11)
condition = 3
subset2(df, a < condition)
## Error in eval(substitute(condition), df) : object 'a' not found
Josh and Jason from the original question did a great job explaining why this is. What I don't get is why supplying the enclos argument to eval apparently fixes it.
subset3 = function(df, condition) {
condition_call = eval(substitute(condition), envir = df, enclos = parent.frame())
df[condition_call, ]
}
subset3(df, a < condition)
## a b
## 1 1 2
## 2 2 3
I understand that skipping the function environment means R is no longer trying to evaluate the promise, and instead grabs the condition object from the global environment.
But I think supplying enclos = parent.frame() should not make a difference. From ?eval on the enclos argument:
Specifies the enclosure, i.e., where R looks for objects not found in envir.
But if not provided, it defaults to:
enclos = if(is.list(envir) || is.pairlist(envir)) parent.frame() else baseenv())
which, in my mind, should resolve to parent.frame() anyway, because surely, df satisfies the is.list() check.
This means that as long as some object data returns TRUE on is.list(), the behavior of eval(expr, envir = data) and eval(expr, envir = data, enclos = parent.frame()) should be identical. But as evidenced by the above, it isn't.
What am I missing?
EDIT: Thanks to SmokeyShakers who pointed out the difference between default and user-supplied arguments regarding the time of evaluation. I think this is actually already expressed here: https://stackoverflow.com/a/15505111/2416535
It might make sense to keep this one alive though, as it touches eval() specifically (the other does not), and it is not trivial to realize what the generalized question should be until one has the answer.

So, different parents. In the example that doesn't work parent.frame is looking up from inside eval into the internal environment of subset2. In the working example, parent.frame is looking up from inside subset3, likely to your global where your df sits.
Example:
tester <- function() {
print(parent.frame())
}
tester() # Global
(function() {
tester()
})() # Anonymous function's internal env

IT is getting confused by the double use of condition. Change one of them to cond:
cond <- 3
subset2(df, a < cond)
## a b
## 1 1 2
## 2 2 3
Note that even though this works it won't work if you put subset2 into a function since it will look into the global environment where subset2 was defined rather than looking inside the f execution frame and subset3 will be needed.
if (exists("cond")) rm(cond)
f <- function() {
cond <- 3
subset2(df, a < cond)
}
f() # error

Disable assignment via = in R

R allows for assignment via <- and =.
Whereas there a subtle differences between both assignment operators, there seems to be a broad consensus that <- is the better choice than =, as = is also used as operator mapping values to arguments and thus its use may lead to ambiguous statements. The following exemplifies this:
> system.time(x <- rnorm(10))
user system elapsed
0 0 0
> system.time(x = rnorm(10))
Error in system.time(x = rnorm(10)) : unused argument(s) (x = rnorm(10))
In fact, the Google style code disallows using = for assignment (see comments to this answer for a converse view).
I also almost exclusively use <- as assignment operator. However, the almost in the previous sentence is the reason for this question. When = acts as assignment operator in my code it is always accidental and if it leads to problems these are usually hard to spot.
I would like to know if there is a way to turn off assignment via = and let R throw an error any time = is used for assignment.
Optimally this behavior would only occur for code in the Global Environment, as there may well be code in attached namespaces that uses = for assignment and should not break.
(This question was inspired by a discussion with Jonathan Nelson)

Here's a candidate:
`=` <- function(...) stop("Assignment by = disabled, use <- instead")
# seems to work
a = 1
Error in a = 1 : Assignment by = disabled, use <- instead
# appears not to break named arguments
sum(1:2,na.rm=TRUE)
[1] 3

I'm not sure, but maybe simply overwriting the assignment of = is enough for you. After all, `=` is a name like any other—almost.
> `=` <- function() { }
> a = 3
Error in a = 3 : unused argument(s) (a, 3)
> a <- 3
> data.frame(a = 3)
a
1 3
So any use of = for assignment will result in an error, whereas using it to name arguments remains valid. Its use in functions might go unnoticed unless the line in question actually gets executed.

The lint package (CRAN) has a style check for that, so assuming you have your code in a file, you can run lint against it and it will warn you about those line numbers with = assignments.
Here is a basic example:
temp <- tempfile()
write("foo = function(...) {
good <- 0
bad = 1
sum(..., na.rm = TRUE)
}", file = temp)
library(lint)
lint(file = temp, style = list(styles.assignment.noeq))
# Lint checking: C:\Users\flodel\AppData\Local\Temp\RtmpwF3pZ6\file19ac3b66b81
# Lint: Equal sign assignemnts: found on lines 1, 3
The lint package comes with a few more tests you may find interesting, including:
warns against right assignments
recommends spaces around =
recommends spaces after commas
recommends spaces between infixes (a.k.a. binary operators)
warns against tabs
possibility to warn against a max line width
warns against assignments inside function calls
You can turn on or off any of the pre-defined style checks and you can write your own. However the package is still in its infancy: it comes with a few bugs (https://github.com/halpo/lint) and the documentation is a bit hard to digest. The author is responsive though and slowly making improvements.

If you don't want to break existing code, something like this (printing a warning not an error) might work - you give the warning then assign to the parent.frame using <- (to avoid any recursion)
`=` <- function(...){
.what <- as.list(match.call())
.call <- sprintf('%s <- %s', deparse(.what[[2]]), deparse(.what[[3]]))
mess <- 'Use <- instead of = for assigment '
if(getOption('warn_assign', default = T)) {
stop (mess) } else {
warning(mess)
eval(parse(text =.call), envir = parent.frame())
}
}
If you set options(warn_assign = F), then = will warn and assign. Anything else will throw an error and not assign.
examples in use
# with no option set
z = 1
## Error in z = 1 : Use <- instead of = for assigment
options(warn_assign = T)
z = 1
## Error in z = 1 : Use <- instead of = for assigment
options(warn_assign = F)
z = 1
## Warning message:
## In z = 1 : Use <- instead of = for assigment
Better options
I think formatR or lint and code formatting are better approaches.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

force the evaluation of .SD in data.table - r

Related

R: cannot make use of `data.table`'s environment, base environment and global environment all at once

How to use str_glue in data.table (environments question)

Is there a way to use tryCatch (or similar) in R as a loop, or to manipulate the expr in the warning argument?

eval: why does enclos = parent.frame() make a difference?

Disable assignment via = in R

Categories

Resources