Actively logging assignments in R

I'm trying to set up a way in R to print details of each assignment while R code is run. So, for example, if the code x <- 1 is run, then "x has been assigned 1" will automatically be printed.
Is this possible?
I have two thoughts on how this might be done but can't figure out if either is possible.
redefine the = primitive so that it also prints a message
have an assignment trigger another function to run (a base R sketch of this idea follows below)
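A minimal sketch of the second idea using base R's makeActiveBinding(): the supplied function runs on every read and every write of that binding, so it can log assignments, though only for variables you register explicitly (the name x and the message wording are just for illustration; run this where x is not already defined):
makeActiveBinding("x", local({
  val <- NULL
  function(value) {
    if (missing(value)) {
      val                                   # read: return the stored value
    } else {
      message("x has been assigned ", deparse(value))
      val <<- value                         # write: log, then store
    }
  }
}), globalenv())
x <- 1
#> x has been assigned 1
x
#> [1] 1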

One possible solution, though it requires editing the code, would be:
# custom assignment function -----------------------------------------------------------------
`%<-%` <- function(lhs, rhs) {
  cl <- match.call()
  lhs <- substitute(lhs)
  env <- parent.frame()
  message("Info: `", lhs, "` defined as `", enquote(cl$rhs)[2], "`")
  invisible(eval(assign(x = paste(lhs),
                        value = rhs,
                        envir = env))
  )
}
# some tests ----------------------------------------------------------------------------------
ad %<-% c(1,2,33)
#> Info: `ad` defined as `c(1, 2, 33)`
ac %<-% 22
#> Info: `ac` defined as `22`
ad %<-% 22
#> Info: `ad` defined as `22`
df <- mtcars
df %<-% mtcars
#> Info: `df` defined as `mtcars`
If you don't want to modify files, you can define a modified source() function to replace the assignments with the newly defined %<-% function.
source_loudly <- function(filePath, ...) {
  file_con <- file(filePath, open = "r")
  txt <- readLines(file_con)
  close(file_con)
  txt_mod <- gsub(pattern = "<-", replace = "%<-%", x = txt)
  source(textConnection(txt_mod), ...)
}
filePath <- "R/bits/example.R" # point to a local file on your pc
source_loudly(filePath = filePath, echo = T)
Created on 2021-03-19 by the reprex package (v1.0.0)

Here's a getter/setter hack that comes close without costing too much. While it does require you to change existing code, it has the benefit that you can change the initial assignment to list instead of tracer and everything continues to work unchanged.
tracer <- local({
  .e <- NULL
  function(..., name = "unk") {
    .e <<- list(...)
    .e$.name <<- name
    `class<-`(.e, c("tracer", "environment"))
  }
})
`[.tracer` <- `[[.tracer` <- `$.tracer` <- function(x, i) {
  cat(sprintf("get: %s\n", deparse(substitute(i))))
  NextMethod()
}
`[<-.tracer` <- `[[<-.tracer` <- `$<-.tracer` <- function(x, i, value) {
  cat(sprintf("set: %s <- %s\n", deparse(substitute(i)),
              substr(paste(deparse(substitute(value)), collapse = " "), 1, 80)))
  NextMethod()
}
Notes:
deparse tends to split long lines into a vector of strings; this is mitigated here with paste(..., collapse=" ");
... but long literal values (e.g., frames) can be a bit annoying in the logs, so I arbitrarily chose substr(., 1, 80) as a reasonable size to log.
this hints at one problem I'll expand on below: this doesn't tell you which columns have been modified, just that the object has been updated.
Demonstration with "simple" objects:
quux <- tracer(a=1, b=2:3, d=list(pi, "a"), mt=mtcars[1:2,])
quux$a
# get: "a"
# [1] 1
quux$a <- 11
# set: "a" <- 11
quux$b
# get: "b"
# [1] 2 3
quux$b <- 2:5
# set: "b" <- 2:5
quux$b
# get: "b"
# [1] 2 3 4 5
So far, so good. Now onto the list:
quux$d
# get: "d"
# [[1]]
# [1] 3.141593
# [[2]]
# [1] "a"
quux$d[[1]]
# get: "d"
# [1] 3.141593
quux$d[[1]] <- pi^2
# get: "d"
# set: "d" <- list(9.86960440108936, "a")
The latter needs some explanation, notably about the order of operations. The assignment is really `[[<-`(quux$d, 1, pi^2), which is not traced. This adjusts the first element of the list, and then assigns this new list back to quux$d, where our $<-.tracer sees that full-list reassignment.
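For reference, the complex assignment desugars roughly as follows (R uses a hidden `*tmp*` variable, as described in the R Language Definition), which is why the trace shows exactly one get and one set on d:
# roughly what R evaluates for quux$d[[1]] <- pi^2
`*tmp*` <- quux
quux <- `$<-`(`*tmp*`, "d", `[[<-`(`*tmp*`$d, 1, pi^2))
rm(`*tmp*`)
# `*tmp*`$d    triggers $.tracer    -> get: "d"
# `[[<-`(...)  updates a plain list -> no tracing
# `$<-`(...)   triggers $<-.tracer  -> set: "d" <- <the whole new list>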
That is not completely unreasonable for small objects, but it becomes a little more annoying with larger objects:
quux$mt$cyl
# get: "mt"
# [1] 6 6
quux$mt$cyl <- quux$mt$cyl + 5
# get: "mt"
# get: "mt"
# set: "mt" <- structure(list(mpg = c(21, 21), cyl = c(11, 11), disp = c(160, 160), hp = c(110, 110), drat = c(3.9, 3.9), wt = c(2.62, 2.875 ), qsec = c(16.46, 17.02), vs = c(0, 0), am = c(1, 1), gear = c(4, 4), c
quux$mt$cyl
# get: "mt"
# [1] 11 11
Similarly, for an assignment we see the "get" steps first (two here: one for the right-hand side and one for the complex assignment itself) and then the whole-object reassignment. (It is cut off because I used substr(., 1, 80).)
Also, note that in both quux$d and quux$mt, the tracer functions never see the sub-element or column being adjusted. Since R orders the operations as it does, our tracer functions cannot reveal what is going on there (easily).
Now, when you're ready to remove this level of activity-logging, just replace your initial call to tracer(.) with list(.), and all operations continue to work but without logging.
quux <- list(a=1, b=2:3, d=list(pi, "a"), mt=mtcars[1:2,])
quux$a
# [1] 1
quux$a <- 11
quux$b
# [1] 2 3
quux$b <- 2:5
quux$b
# [1] 2 3 4 5
quux$d
# [[1]]
# [1] 3.141593
# [[2]]
# [1] "a"
quux$d[[1]]
# [1] 3.141593
quux$d[[1]] <- pi^2
quux$mt$cyl
# [1] 6 6
quux$mt$cyl <- quux$mt$cyl + 5
quux$mt$cyl
# [1] 11 11

Related

Is it possible to move a variable from the global environment into a separate environment?

Is it possible to move variables that reside in the global environment into a separate environment to declutter the global namespace? I understand how to create variables in a separate environment (with(env, ...)), but is there an efficient way to move them after they have been created in the global environment? I suppose it would be possible to copy them into a separate environment and then remove them from the global environment, but I wanted to know if there is a more efficient manner.
Maybe:
library(purrr)
a <- 111
b <- 'hello'
my_envir <- new.env()
names(.GlobalEnv) %>%
  walk(~ assign(.x, get(.x), envir = my_envir))
eapply(my_envir, function(x) x)
#> $my_envir
#> <environment: 0x7fed59e56dc8>
#>
#> $a
#> [1] 111
#>
#> $b
#> [1] "hello"
Or
library(purrr)
a <- 111
b <- 'hello'
my_envir <- new.env()
eapply(.GlobalEnv, function(x) x) %>%
  discard(is.environment) %>%
  {walk2(., names(.), ~ {
    assign(.y, .x, envir = my_envir)
    exec('rm', .y, envir = .GlobalEnv)
  })}
eapply(my_envir, function(x) x)
#> $a
#> [1] 111
#>
#> $b
#> [1] "hello"
Created on 2021-12-31 by the reprex package (v2.0.1)
Not sure if this is a good idea but you can attach them to the search path. Starting with a fresh vanilla R session try this.
a <- 1
b <- 2
attach(as.list(.GlobalEnv), name = "myenv")
rm(a, b)
ls("myenv")
ls()
a
b
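To undo this later, remove the attached copy from the search path again (pulling the objects back out first if you still need them):
saved <- as.list(as.environment("myenv"))  # optional: copy the objects back out
detach("myenv")
search()                                   # "myenv" no longer appears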
Using rlang
library(rlang)
a <- 111
b <- "hello"
my_envir <- env(!!! as.list(.GlobalEnv))
Checking:
> ls(my_envir)
[1] "a" "b"
> my_envir$a
[1] 111
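For completeness, the copy-then-remove idea mentioned in the question can also be written in plain base R with mget() and list2env(); a minimal sketch for a fresh session (objs is just a helper name used here):
a <- 111
b <- "hello"
my_envir <- new.env()
# names of everything in the global environment, excluding environments themselves
objs <- Filter(function(nm) !is.environment(get(nm, envir = .GlobalEnv)),
               ls(.GlobalEnv))
list2env(mget(objs, envir = .GlobalEnv), envir = my_envir)   # copy ...
rm(list = objs, envir = .GlobalEnv)                          # ... then remove
ls(my_envir)
#> [1] "a" "b"
ls(.GlobalEnv)
#> [1] "my_envir" "objs"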
You may use multiple lines in the with() call.
e1 <- new.env()
e2 <- new.env()
with(e1, {
  k <- l <- m <- 0L
  x <- 1
  fo <- y ~ x
  fun <- function(x) x^2
})
The objects are created in e1,
ls(e1)
# [1] "fo" "fun" "k" "l" "m" "x"
e2 stays empty,
ls(e2)
# character(0)
and in .GlobalEnv only the environments exist so far.
ls(.GlobalEnv)
# [1] "e1" "e2"
To work with objects, also use with or $.
with(e1, fun(2))
# [1] 4
e1$fun(2)
# [1] 4

How do you manipulate a list of unknown name in R without using parse?

I am considering keeping the data (vectors, lists, etc.) and code (functions) for my problem in a tree structure (a list of lists of lists of...). I do not want to commit to a name for the root node, nor for the next level of nodes. The lists just below the root node are different versions of each other, and I want to be able to compare them in different ways, and build them in different ways, and give them different, arbitrary names. I am presently using the following to build the overall structure:
foo <- function(ref.txt, val.txt) eval(parse(text=paste0(ref.txt, ' <<- ', val.txt)))
A trivial example might be:
root = list()
foo('root$v1', '42')
foo('root$v2', '43')
root
# $v1
# [1] 42
#
# $v2
# [1] 43
A little less trivial, continuing from the previous example:
v3 <- c(42, 43)
foo('root$v3', 'v3')
root
# $v1
# [1] 42
#
# $v2
# [1] 43
#
# $v3
# [1] 42 43
Again, I can't hard code e.g., root$v3 <- v3, because I won't know the name of the root of the list or the names of the next-level nodes until run time.
I am asking for alternatives in part because of Joris Meys' comment on the Stack Overflow question "Why doesn't assign() values to a list element work in R?", where he is apparently quoting Lumley's post, "Re: [R] RE: Using a number as a name to access a list." These suggest avoiding parse. However, if I do not know the names until runtime, and do not even know the depth of the path (see Lumley), how is avoiding parse possible?
How about an additional argument for your root list? No paste trickery, no eval trickery, and no need to use <<-, which you should usually avoid...
foo <- function(lst, ref, val) { lst[[ref]] <- val; return(lst) }
root <- list()
root <- foo(root, "v1", 42)
root <- foo(root, "v2", 43)
root
v3 <- c(42, 43)
root <- foo(root, "v3", v3)
root
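If even the name of the root list is only known at run time, base get() and assign() with an explicit environment cover that part without parse(); a small sketch reusing foo() from above (root_name and "v4" are only illustrative):
root_name <- "root"                        # the root's name, known only at run time
tmp <- get(root_name, envir = .GlobalEnv)  # fetch the list by name
tmp <- foo(tmp, "v4", 44)                  # modify it with foo() as defined above
assign(root_name, tmp, envir = .GlobalEnv) # write it back under the same name
root$v4
# [1] 44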
Edit based on the comments: Here is a function that assigns values to arbitrary entries of nested lists. The ref argument should be a vector of indices, one for each level:
foo <- function(lst, ref, val) {
  lvl <- length(ref)
  # extract the list at depth lvl - 1 from lst,
  # add val to this list and replace val with it,
  # repeat, now descending one level less deep,
  # and so on, until reaching the top level
  for (i in seq_len(lvl)) {
    res <- lst
    for (j in seq_len(lvl - i)) res <- res[[ref[j]]]
    res[[ref[lvl - i + 1]]] <- val
    val <- res
  }
  return(res)
}
(root <- list(a = list(a = 1, b = list(a = 1, b = 2)),
              b = list(a = 1), c = 3))
## $a
## $a$a
## [1] 1
##
## $a$b
## $a$b$a
## [1] 1
##
## $a$b$b
## [1] 2
##
##
##
## $b
## $b$a
## [1] 1
##
##
## $c
## [1] 3
foo(lst = root, ref = c("a", "b", "c"), val = 3)
## $a
## $a$a
## [1] 1
##
## $a$b
## $a$b$a
## [1] 1
##
## $a$b$b
## [1] 2
##
## $a$b$c
## [1] 3
##
##
##
## $b
## $b$a
## [1] 1
##
##
## $c
## [1] 3
And finally, here is a benchmark that compares my function to parse + eval. With three levels of nesting, my function is significantly faster, but that may change with a different list structure:
bar <- function(lst, ref, val) {
  eval(parse(text = paste(paste(c("lst", ref), collapse = "$"), "<- val")))
  return(lst)
}
library(microbenchmark)
microbenchmark(foo(lst = root, ref = c("a", "b", "c"), val = 3),
               bar(lst = root, ref = c("a", "b", "c"), val = 3))
## Unit: microseconds
##                                               expr     min       lq      mean  median       uq     max neval
##   foo(lst = root, ref = c("a", "b", "c"), val = 3)  47.089  48.6700  55.98703  50.795  53.0640 191.575   100
##   bar(lst = root, ref = c("a", "b", "c"), val = 3) 127.401 128.9505 134.71502 130.325 132.1755 291.400   100
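As an aside, when the intermediate levels of the path already exist, base R's recursive [[ indexing accepts the whole vector of names in a single assignment, so a helper is not always needed (a hedged side note; behaviour when the final name does not yet exist is worth verifying on your R version):
root <- list(a = list(a = 1, b = list(a = 1, b = 2)),
             b = list(a = 1), c = 3)
# a vector given to [[ indexes recursively, i.e. root[["a"]][["b"]][["a"]]
root[[c("a", "b", "a")]]
## [1] 1
# the same recursive index can be used on the left-hand side of an assignment
root[[c("a", "b", "a")]] <- 99
root$a$b$a
## [1] 99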

How use match.call in a nested function

I tried to get the list of names and the expressions in ... in a function composition. Let's suppose a function:
g <- function(...) {
  print(as.list(match.call(expand.dots = FALSE))$...)
}
And if we call:
g(rnorm(5), par = "a", 4 + 4)
We get:
[[1]]
rnorm(5)
$par
[1] "a"
[[3]]
4 + 4
And it's nice: we can get the call expression for every argument and validate it. But I need this inside a function composition:
f <- function(...) g(...)
f(rnorm(5), par = "a", 4 + 4)
But I get:
[[1]]
..1
$par
[1] "a"
[[3]]
..3
I'm reading some chapters of http://adv-r.had.co.nz/Expressions.html but I can't find the solution yet. I know, I need to keep studying.
Any tips? Thanks in advance.
If you just want the parameters, you don't need the entire call. Just use substitute() to access the ... rather than match.call().
g <- function(...) {
  print(substitute(...()))
}
f <- function(...) g(...)
f(rnorm(5), par = "a", 4 + 4)
# [[1]]
# rnorm(5)
#
# $par
# [1] "a"
#
# [[3]]
# 4 + 4
There's also Hadley's recommendation of
g <- function(...) {
  print(eval(substitute(alist(...))))
}
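Called through the same wrapper, this variant recovers the same unevaluated expressions as the substitute(...()) version above:
f <- function(...) g(...)
f(rnorm(5), par = "a", 4 + 4)
# [[1]]
# rnorm(5)
#
# $par
# [1] "a"
#
# [[3]]
# 4 + 4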

Strange bracket assignment call ('[<-') with matrix argument

Recently I've stumbled upon this bit of code:
y <- NULL
y[cbind(1:2, 1:2)] <- list( list(1,2), list(2,3))
From the second answer here.
But it doesn't seem to differ from y <- list(...), as the comparisons below show:
> identical(y, list( list(1,2), list(2,3)))
[1] TRUE
> identical(y, y[cbind(1:2, 1:2)])
[1] FALSE
What is going on in the bracket assignment here? Why doesn't it throw an error? And why is it different from the non-assignment version in the last line of code?
Matrix indexing only applies when y has dim. Combine this with standard R recycling and the fact that all matrices are actually vectors, and this behavior makes sense.
When you initialize y to NULL, you ensure it has no dim. Therefore, when you index y by a matrix, say ind, you get the same results as having called y[as.vector(ind)]
identical(y[ind], y[as.vector(ind)])
# [1] TRUE
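A quick contrast showing why the presence of dim matters here (a small illustrative sketch with an integer vector):
v <- 1:4
ind <- cbind(1:2, 1:2)
v[ind]            # no dim: ind is treated as the plain vector c(1, 2, 1, 2)
# [1] 1 2 1 2
dim(v) <- c(2, 2)
v[ind]            # with dim: each row of ind is a (row, column) pair
# [1] 1 4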
If there are repeated values in ind and you are also assigning, then for each index only the last value assigned to it will remain. For example, let's assume we are executing
y <- NULL; y[cbind(1:2, 2:1)] <- list( list(1,2), list(3,4) )
# y has no dimension, so `y[cbind(1:2, 2:1)]`
# is the equivalent of `y[c(1:2, 2:1)]`
When you assign y[c(1, 2, 2, 1)] <- list("A", "B"), the right-hand side is recycled to list("A", "B", "A", "B"), so in effect what happens is analogous to:
y[[1]] <- "A"
y[[2]] <- "B"
y[[2]] <- "A" # <~~ 'Overwriting' the previous value at position 2
y[[1]] <- "B" # <~~ 'Overwriting' the previous value at position 1
Here is a further look at the indexing that occurs: (Notice how the first two letters are being repeated)
ind <- cbind(1:2, 1:2)
L <- as.list(LETTERS)
L[ind]
# [[1]]
# [1] "A"
#
# [[2]]
# [1] "B"
#
# [[3]]
# [1] "A"
#
# [[4]]
# [1] "B"
Here is the same thing, now with assignment. Notice how only the 3rd and 4th values being assigned have been kept.
L[ind] <- c("FirstWord", "SecondWord", "ThirdWord", "FourthWord")
L[ind]
# [[1]]
# [1] "ThirdWord"
#
# [[2]]
# [1] "FourthWord"
#
# [[3]]
# [1] "ThirdWord"
#
# [[4]]
# [1] "FourthWord"
Try a different index for further clarity:
ind <- cbind(c(3, 2), c(1, 3)) ## will be treated as c(3, 2, 1, 3)
L <- as.list(LETTERS)
L[ind] <- c("FirstWord", "SecondWord", "ThirdWord", "FourthWord")
L[1:5]
# [[1]]
# [1] "ThirdWord"
#
# [[2]]
# [1] "SecondWord"
#
# [[3]]
# [1] "FourthWord"
#
# [[4]]
# [1] "D"
#
# [[5]]
# [1] "E"
L[ind]
# [[1]]
# [1] "FourthWord"
#
# [[2]]
# [1] "SecondWord"
#
# [[3]]
# [1] "ThirdWord"
#
# [[4]]
# [1] "FourthWord"
Edit regarding @agstudy's questions:
Looking at the src for [ we have the following comments:
The special [ subscripting where dim(x) == ncol(subscript matrix)
is handled inside VectorSubset. The subscript matrix is turned
into a subscript vector of the appropriate size and then
VectorSubset continues.
Looking at the function static SEXP VectorSubset(SEXP x, SEXP s, SEXP call) the relevant check is the following:
/* lines omitted */
attrib = getAttrib(x, R_DimSymbol);
/* lines omitted */
if (isMatrix(s) && isArray(x) && ncols(s) == length(attrib)) {
/* lines omitted */
...

Function argument as a part of the output name

Perhaps a silly question, but I can't find any answers to it anywhere (that I've looked :P ). I am trying to create a function with two arguments, which will be vectors (e.g. x = c(a, b, c) and y = c(50, 75, 100)). I want to write a function that calculates all the combinations of these and uses the argument values as part of the output names. E.g.
function(x,y)
df$output_a_50 = a*2+50^2
df$output_a_75 = a*2+75^2
.....
Any suggestions will be appreciated :)
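One wrinkle worth seeing up front, since it motivates the list-based answer that follows: combining ordinary variables with c() does not carry their symbol names along (a small illustration mirroring the values used below):
a <- NA; b <- 1; c <- 2:3
x <- c(a, b, c)
x
# [1] NA  1  2  3
names(x)
# NULL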
As @Spacedman and others discussed, your problem is that if you pass c(a, b, c) to your function, the names will be lost. The best alternative, in my opinion, is to pass a list:
foo <- function(x, y) {
  df <- list()
  for (xx in names(x)) {
    for (yy in y) {
      varname <- paste("output", xx, yy, sep = "_")
      df[[varname]] <- x[[xx]]*2 + yy^2
    }
  }
  df
}
foo(x = list(a = NA, b = 1, c = 2:3),
    y = c(50, 75, 100))
# $output_a_50
# [1] NA
#
# $output_a_75
# [1] NA
#
# $output_a_100
# [1] NA
#
# $output_b_50
# [1] 2502
#
# $output_b_75
# [1] 5627
#
# $output_b_100
# [1] 10002
#
# $output_c_50
# [1] 2504 2506
#
# $output_c_75
# [1] 5629 5631
#
# $output_c_100
# [1] 10004 10006
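For what it's worth, the same combinations can also be built without nested loops using expand.grid() and Map(); a sketch under the same list-input assumption (foo2 is just an illustrative name; it produces the same values, with the names grouped by y instead of by x):
foo2 <- function(x, y) {
  grid <- expand.grid(xx = names(x), yy = y, stringsAsFactors = FALSE)
  out  <- Map(function(xx, yy) x[[xx]] * 2 + yy^2, grid$xx, grid$yy)
  names(out) <- paste("output", grid$xx, grid$yy, sep = "_")
  out
}
foo2(x = list(a = NA, b = 1, c = 2:3), y = c(50, 75, 100))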
