If I can run code before and after a user runs some code, how can I detect which variables were set or changed using base R? I can do this using identical() for non-environment objects. But is there a base-R solution for environments, including R6 classes?
Here's a solution using identical() which fails for envs/R6:
# Copy of initial vars
this_frame = sys.frame()
start_vars = ls()
start_copy = lapply(start_vars, get, envir = this_frame )
names(start_copy) = start_vars
# (user code here)
# Assess what's new and what's changed
end_vars = ls()
new_vars = end_vars[end_vars %in% start_vars == FALSE]
old_vars = end_vars[end_vars %in% start_vars == TRUE]
changed_vars = old_vars[sapply(old_vars, function(x) identical(get(x, envir = this_frame), start_copy[[x]])) == FALSE]
I'm writing a package that lets users run code in a separate session. I'd like to return only objects that were changed.
This solution detects changes in an environment, sub-environments, and R6-classes.
General approach
run start_state = env_as_list() on sys.frame()which stores everything in a list and recursively converts all environments/R6 and sub-environments/R6 to list.
Let the user manipulate stuff
Run end_state = env_as_list() and use identical() to detect changes between start_state and end_state.
env_as_list = function(env) {
rapply(
object = as.list(env, all.names = TRUE),
f = function(x) {
if ("R6" %in% class(x)) {
# R6 to list without recursion
x = as.list(x, all.names = TRUE)
x$.__enclos_env__$self = NULL
x$.__enclos_env__$super = NULL
env_as_list(x)
} else if (is.environment(x)) {
env_as_list(x)
} else {
stop("Impossible to get here")
}
},
classes = c("environment", "R6"),
how = "replace"
)
}
Demonstration
Let's test it: let's fill globalenv() with a some stuff to begin with:
R6_class = R6::R6Class("Testing", list(a = 1))
my_R6 = R6_class$new()
my_env = new.env()
my_env$sub_env = new.env()
my_env$sub_env$some_value = 2
my_regular = rnorm(5)
Snapshot time!
start_state = env_as_list(sys.frame())
Let the user play:
my_R6$a = 99 # Change R6
new_regular = 3 # new var
my_env$sub_env$some_value = 99 # Change sub-environment
Snapshot again!
end_state = env_as_list(sys.frame())
end_state$start_state = NULL # don't include this
Did nothing change?
> identical(start_state, end_state))
# FALSE
Which variables changed?
> is_same = lapply(names(end_state), function(x) identical(start_state[[x]], end_state[[x]]))
> names(end_state)[is_same == FALSE]
# "my_env" "new_regular" "my_R6"
Bonus
You can also use this to compute the size of an environment, including all R6 and sub-environments. Simply:
object.size(env_as_list(globalenv()))
Related
I want to create a large lookup table of key value pairs, attempting it like this:
# actual use case is length ~5 million
key <- do.call(paste0, Map(stringi::stri_rand_strings, n=2e5, length = 16))
val <- sample.int(750, size = 2e5, replace = T)
make_dict <- function(keys, values){
require(rlang)
e <- new.env(size = length(keys))
l <- list2(!!!setNames(values, keys))
list2env(l, envir = e, hash = T) # problem in here...?
}
d <- make_dict(key, val)
Problem
When make_dict is run it throws Error: protect(): protection stack overflow. Specifically in RStudio when the input is a vector of length is greater than 49991, which seems very similar to this stackoverflow post.
However, when I run accessor functions to grab some of the values, it seems that make_dict ran fine after all, as I can't find any oddities in its result:
`%||%` <- function(x,y) if(is.null(x)) y else x
grab <- function(...){
vector("integer", length(..2)) |>
(\(.){. = Vectorize(\(e, x) e[[x]] %||% NA_integer_, list("x"), T, F)(..1, ..2); .})()
}
out <- vector("integer", length(key))
out <- grab(d, sample(key)) # using sample to scramble the keys
anyNA(out) | !lobstr::obj_size(out) == lobstr::obj_size(val)
[1] FALSE
Running the same code in RGui does not throw the error.
Oddities
The d environment object does not appear in environment pane in RStudio for size > 5e4.
The R console returns swiftly back to > (signaling the function has finished), but is unresponsive until the error is thrown
Error is thrown if manually setting options(expressions = 5e5), or retaining the default value of 5000
When the error is thrown is proportional to the size of the input vector
tryCatch(make_dict(key, val), error = function(e) e) doesn't catch an error
The error also occurs if code is run from package (Packaged version available through remotes::install_github("D-Se/minimal"))
Question
What's going on here? How to troubleshoot such an error?
options(error = traceback) as advised here didn't give any results. Inserting a browser() after list2env in the make_dict function throws an error long after the browser has opened. A traceback() gives the function .rs.describeObject, which is used to generate the summary in the Environment pane, and can be found here.
traceback()
# .rs.describeObject
(function (env, objName, computeSize = TRUE)
{
obj <- get(objName, env)
hasNullPtr <- .Call("rs_hasExternalPointer", obj, TRUE, PACKAGE = "(embedding)")
if (hasNullPtr) {
val <- "<Object with null pointer>"
desc <- "An R object containing a null external pointer"
size <- 0
len <- 0
}
else {
val <- "(unknown)"
desc <- ""
size <- if (computeSize)
object.size(obj)
else 0
len <- length(obj)
}
class <- .rs.getSingleClass(obj)
contents <- list()
contents_deferred <- FALSE
if (is.language(obj) || is.symbol(obj)) {
val <- deparse(obj)
}
else if (!hasNullPtr) {
if (size > 524288) {
len_desc <- if (len > 1)
paste(len, " elements, ", sep = "")
else ""
if (is.data.frame(obj)) {
val <- "NO_VALUE"
desc <- .rs.valueDescription(obj)
}
else {
val <- paste("Large ", class, " (", len_desc,
format(size, units = "auto", standard = "SI"),
")", sep = "")
}
contents_deferred <- TRUE
}
else {
val <- .rs.valueAsString(obj)
desc <- .rs.valueDescription(obj)
if (class == "data.table" || class == "ore.frame" ||
class == "cast_df" || class == "xts" || class ==
"DataFrame" || is.list(obj) || is.data.frame(obj) ||
isS4(obj)) {
if (computeSize) {
contents <- .rs.valueContents(obj)
}
else {
val <- "NO_VALUE"
contents_deferred <- TRUE
}
}
}
}
list(name = .rs.scalar(objName), type = .rs.scalar(class),
clazz = c(class(obj), typeof(obj)), is_data = .rs.scalar(is.data.frame(obj)),
value = .rs.scalar(val), description = .rs.scalar(desc),
size = .rs.scalar(size), length = .rs.scalar(len), contents = contents,
contents_deferred = .rs.scalar(contents_deferred))
})(<environment>, "d", TRUE)
This github issue pointed out by #technocrat talks about a known bug in earlier versions of RStudio of disabling null external pointer checks, and has since been solved by adding an additional preference check in .rs.describeObject() of
.rs.readUiPref("check_null_external_pointers")
To check if code is run from within RStudio, and if that version is lower than that of before a certain version number (here I use the current official release), a check can be included in the function, or in the .OnAttach of a package:
if(!is.na(Sys.getenv("RSTUDIO", unset = NA)) && .rs.api.versionInfo()$version < "2021.9.1.372")){
# warning or action
}
This question already has answers here:
Returning multiple objects in an R function [duplicate]
(6 answers)
Closed 3 years ago.
I want to store different output variables that are calculated inside a function.
I coded a toy example:
f = function(number)
{
xx = NULL
savexx = NULL
savexx10 = NULL
for (i in 1:10) {
x = number*i
xx = c(xx,x)
}
save_phrase = "hello"
savexx = xx
savexx10 = xx*10
save = cbind(savexx,savexx10)
}
store = f(1)
store
But with this code it is returning only the variable save = cbind(savexx,savexx10).
I would like to save all the 4 variables that are created inside this function.
Is it possible doing this without using a dataframe or a list?
It is impossible without a list. List would be better than a data.frame because it can store different types of variables (vector, table, plot ect.) Try to do it like here:
f = function(number)
{
xx = NULL
savexx = NULL
savexx10 = NULL
for (i in 1:10) {
x = number*i
xx = c(xx,x)
}
lista <- list()
lista$save_phrase = "hello"
lista$savexx = xx
lista$savexx10 = xx*10
lista$save = cbind(lista$savexx, lista$savexx10)
lista
}
store = f(1)
# whole list:
store
# elements of a list:
store$save_phrase
store$savexx
store$savexx10
store$save
1) list We can return the desired variables in a list.
f2 = function(number) {
xx = NULL
savexx = NULL
savexx10 = NULL
for (i in 1:10) {
x = number*i
xx = c(xx,x)
}
list(save_phrase = "hello",
savexx = xx,
savexx10 = xx*10,
save = cbind(savexx,savexx10))
}
store = f2(1)
2) mget Another way to do this is to use mget if the returned variables have a pattern to their names as in this case:
f3 = function(number) {
xx = NULL
savexx = NULL
savexx10 = NULL
for (i in 1:10) {
x = number*i
xx = c(xx,x)
}
save_phrase = "hello"
savexx = xx
savexx10 = xx*10
save = cbind(savexx,savexx10)
mget(ls(pattern = "save"))
}
store = f3(1)
3) gsubfn gsubfn has a facility for placing the list components into separate variables. After this is run save_phrase, savexx, savexx10 and save will exists as separate variables.
library(gsubfn)
list[save_phrase, savexx, savexx10, save] <- f2(1)
4) attach Although this is not really recommended you can do this:
attach(f2(1), name = "f2")
This will create an entry on the search list with the variables that were returned so we can just refer to save_phrase, savexx, savexx10 and save. We can see the entry using search() and ls("f2") and we can remove the entry using detach("f2") .
5) assign Another possibility which is not really recommended but does work is to assign the components right into a specific environment. Now save_phrase, savexx, savexx10 and save will all exist in the global environment.
list2env(f2(1), .GlobalEnv)
Similarly this will inject those variables into the current environment. This is the same as the prior line if the current environment is the global environment.
list2env(f2(1), environment())
6) Again, I am not so sure this is a good idea but we could modify f to inject the outputs right into the parent frame. After this is run save_phrase, savexx, savexx10 and save will all exist in the current environment.
f4 = function(number, env = parent.frame()) {
xx = NULL
savexx = NULL
savexx10 = NULL
for (i in 1:10) {
x = number*i
xx = c(xx,x)
}
env$save_phrase = "hello"
env$savexx = xx
env$savexx10 = xx*10
env$save = cbind(savexx,savexx10)
invisible(env)
}
f4(1)
R functions only return a SINGLE object. If you want multiple objects returned they have to be combined into a list or some other type of object.
Some languages like python let us do stuff like this:
a, b = mult_return_func()
But R will only return a single object. R programmers typically use lists to return multiple objects.
If there is no return statement, then R will return the value of the last evaluated expression in the function.
This would explain why it is returning save = cbind(savexx,savexx10).
To return multiple values you will need a list or another object because the R return function can only return a single object.
My suggestion would be to add those values to a list, return the list, and then get the variables from the list.
I hope that helps. If you'd like to read more then I suggest going to https://www.datamentor.io/r-programming/return-function/
I have a nested list in the global environment of a R script.
anno <- list()
anno[['100']] <- list(
name = "PLACE",
color = "#a6cee3",
isDocumentAnnotation = T,
sublist = list()
)
person_sublist <- list()
person_sublist[['200']] <- list(
name = "ACTOR",
color = "#7fc97f",
isDocumentAnnotation = T,
sublist = list()
)
person_sublist[['300']] <- list(
name = "DIRECTOR",
color = "#beaed4",
isDocumentAnnotation = T,
sublist = list()
)
anno[['400']] <- list(
name = "PERSON",
color = "#1f78b4",
isDocumentAnnotation = T,
sublist = person_sublist
)
While running my process I interactively select elements via the id (100,200, ...). In return a want to add, delete or move elements in the list.
For this reason I thought of using a recursive function to navigate through the list:
searchListId <- function(parent_id = NULL, annotation_system = NULL)
{
for(id in names(annotation_system))
{
cat(paste(id,"\n"))
if(id == parent_id)
{
return(annotation_system[[id]]$sublist)
}
else
{
if(length(annotation_system[[id]]$sublist) > 0)
{
el <- searchListId(parent_id, annotation_system[[id]]$sublist)
if(!is.null(el))
return(el)
}
}
}
return(NULL)
}
searchListId('100', anno)
This functions returns the list() found in the sublist element of the matching element in the 'anno'-list. My problem is the global environment of R. If I manipulate something (delete, add, move something within the returned sublist) i need to reset the global variable with <<-. But in the case of a recursive function I only hold the current sublist in the context where the parent_id matches. How could one reference a global nested list in R while navigating though it via an recursive function? Is that even possible in R?
The calls I want to carry out in order to delete, add, or move elements in the list 'anno' are:
deleteListId('100', anno) #Should return the list without the element 100
addListId('400', anno) #Should return the list with a new element nested in '400'
switchListId('400','200', anno) #Should return a list where the elements with the according keys are switched.
The tricky part though is that I don't know how deep the recursive structure is. Normally I would use element references to manipulate them directly but how could a solution for manipulation of nested lists in R look like if I want to use recursion?
If possible, have the recursive function take a list, alter that, and return the new version. The reason I suggest this is because it's idiomatic R. R leans toward being a functional language, and part of that means state-based actions are discouraged. In general, functions should only modify state if that's all they do. For example, scale(x) doesn't affect the value stored in the x variable. But x <- scale(x) does, because the <- function (yes, it's a function) is meant to modify state.
Also, don't worry about memory unless you know it will be a problem based on past experience. Behind the scenes, R is pretty good at preventing needless copying, so trust it to do the right thing. This lets you work with simpler mental models.
A skeleton of how to recursively modify a list, without affecting the original:
anno <- list()
anno[['A1']] <- list(
sublist = list(
A3 = list(sublist = NULL),
A4 = list(sublist = list(A6 = list(sublist = NULL))),
A5 = list(sublist = NULL)
)
)
change_list <- function(x) {
for (i in seq_along(x)) {
value <- x[[i]]
if (is.list(value)) {
x[[i]] <- change_list(value)
} else {
if (is.null(value)) {
x[[i]] <- "this ws null"
}
}
}
x
}
change_list(anno)
# $A1
# $A1$sublist
# $A1$sublist$A3
# $A1$sublist$A3$sublist
# [1] "something new"
#
#
# $A1$sublist$A4
# $A1$sublist$A4$sublist
# $A1$sublist$A4$sublist$A6
# $A1$sublist$A4$sublist$A6$sublist
# [1] "something new"
#
#
#
#
# $A1$sublist$A5
# $A1$sublist$A5$sublist
# [1] "something new"
If you absolutely need to modify an item in the global namespace, use environments instead of lists.
anno_env <- new.env()
anno_env[["A1"]] <- new.env()
anno_env[["A1"]][["sublist"]] <- new.env()
anno_env[["A1"]][["sublist"]][["A3"]] <- NULL
anno_env[["A1"]][["sublist"]][["A4"]] <- NULL
change_environment <- function(environ) {
for (varname in ls(envir = environ)) {
value <- environ[[varname]]
if (is.environment(value)) {
change_environment(value)
} else {
environ[[varname]] <- "something new"
}
}
}
change_environment(anno_env)
anno_env[["A1"]][["sublist"]][["A3"]]
# [1] "something new"
I would like to develop a simple function that would enable me to save graphs of specific characteristics. For example, I'm running some analysis producing a set of histograms:
# Data and Libs
data(mtcars); require(ggplot2)
# Graphs
## A
grph_a <- ggplot(data = mtcars) +
geom_histogram(aes(mpg)) +
ggtitle("MPG")
## B
grph_b <- ggplot(data = mtcars) +
geom_histogram(aes(cyl)) +
ggtitle("CYL")
Instead of writing ggsave command for each of those graphs I would like to do it via function. It makes sense as I will be repeating the same steps for a number of graphs across various similar projects. I would like for the function to do one thing:
For all the graphs that have a specific string in name run ggsave with a set parameters and save them to provide path.
Ideally, I would like for the function call to look like that
ExportGraphs(graphNamePhrase = "grph_", filesPath = "Somewhere/GaphsAndStuff/)
I don't want to be specifying more stuff.
Function
My function looks like that:
ExportGraphs <- function(graphNamePhrase = "grph_",
filesPath, objects = ls()) {
# Check if required packages are available
req_pkgs <- c("ggplot2","grid")
## Check if the package is loaded and load if needed
for (i in 1:length(req_pkgs)) {
pkg <- req_pkgs[i]
if (length(grep(pkg, search())) == 0) {
lapply(pkg, require, character.only = TRUE)
}
}
# Create list of objects
save_grphs <- grep(pattern = graphNamePhrase, x = objects,
ignore.case = TRUE, value = TRUE)
# Create save loop
for (i in 1:length(save_grphs)) {
# Create file path
fle_path <- paste0(filesPath, save_grphs[i], ".png")
# Save file
ggsave(filename = fle_path, plot = save_grphs[i],
width = 7, height = 7, units = 'cm', scale = 2, dpi = 600)
}
}
Problems
Obviously, the code:
save_grphs <- grep(pattern = graphNamePhrase, x = objects,
ignore.case = TRUE, value = TRUE)
won't work as what is passed via the objects = ls() will be a string. My question is how can I get around it. Is there a way to use get on the parent frame from which the function is called? Not the easiest solution but I could search objects via string. Or can I run ls with grep in the function call and pass all matching objects?
Comments Follow-up
mget
I tried the solution with mget:
ExportGraphs <- function(graphNamePhrase = "grph_",
filesPath, objects = ls()) {
# Check if required packages are available
req_pkgs <- c("ggplot2","grid")
## Check if the package is loaded and load if needed
for (i in 1:length(req_pkgs)) {
pkg <- req_pkgs[i]
if (length(grep(pkg, search())) == 0) {
lapply(pkg, require, character.only = TRUE)
}
}
# Create list of objects
save_grphs <- grep(pattern = graphNamePhrase, x = objects,
ignore.case = TRUE, value = TRUE)
save_grphs <- mget(objects[save_grphs])
# Create save loop
for (i in 1:length(save_grphs)) {
# Create file path
fle_path <- paste0(filesPath, save_grphs[i], ".png")
# Save file
ggsave(filename = fle_path, plot = save_grphs[[i]],
width = 7, height = 7, units = 'cm', scale = 2, dpi = 600)
}
}
But it seems that I would have to adjust the loop as subscription appears to be out of bounds:
Error in save_grphs[[i]] : subscript out of bounds
Called from: inherits(plot, "ggplot")
This works for me. There are many places to further optimize the function:
ExportGraphs <- function(graphNamePhrase = "grph_",
filesPath, objects = ls()) {
# Check if required packages are available
req_pkgs <- c("ggplot2","grid")
## Check if the package is loaded and load if needed
for (i in 1:length(req_pkgs)) {
pkg <- req_pkgs[i]
if (length(grep(pkg, search())) == 0) {
lapply(pkg, require, character.only = TRUE)
}
}
# Create list of objects
index <- grep(pattern = graphNamePhrase, x = objects,
ignore.case = TRUE)
save_grphs <- mget(objects[index])
# Create save loop
for (i in 1:length(save_grphs)) {
# Create file path
fle_path <- paste0(filesPath, objects[index][i], ".png")
# Save file
ggsave(filename = fle_path, plot = save_grphs[[i]],
width = 7, height = 7, units = 'cm', scale = 2, dpi = 600)
}
}
Try something like this:
ExportGraphs <- function(graphNamePhrase = "grph_",
filesPath = "Somewhere/GaphsAndStuff",
object = ls()) {
lapply(object[substr(names(object), 1, nchar(graphNamePhrase)) == graphNamePhrase],
function(plot.list.el){
ggsave(plot.list.el, filename = paste(filesPath, paste0(names(plot.list.el),
".pdf"),
sep = "/"))
})
}
# testing the function
dat <- data.frame(x = rnorm(100))
object <- list(grph_asd = ggplot(dat, aes(x = x)) + geom_histogram(),
grp_noplot = ggplot(dat, aes(x)) + geom_histogram())
# save the first, but not the second plot to the working directory
ExportGraphs(filesPath = "~", object = object)
G'Day, I am a newbie at R and I have GOOGLED and read books and had lots of play, but I can't seem to figure out if what I am doing is implemented. It compiles (no interpreter spit) and can be called (again no spit), it just doesn't seem to want to do anything.
OK. SYNOPSIS.
I read that lists in R are the OBJECTS of other languages. So just for a Saturday and Sunday play I have been trying to get this to work.
GLOBAL <- list( counter = 1,
locked = FALSE,
important_value = 42,
copy_of_important_value = 42,
lock = function() { GLOBAL$locked = TRUE },
unlock = function() { GLOBAL$locked = FALSE },
is_locked = function() { return(GLOBAL$locked )},
visit = function() { GLOBAL$counter <- GLOBAL$counter + 1 })
> GLOBAL$locked
[1] FALSE
>
This works...
> GLOBAL$locked <- TRUE
> GLOBAL$locked
[1] TRUE
>
This does not
> GLOBAL$unlock()
> GLOBAL$locked
[1] TRUE
>
Has R got a $this or $self construct? None of this generates any errors. Just doesn't seem to want to do anything! (functions that is). I suppose I could set up a function as a routing access table, but I thought the encapsulation would be nifty.
Second question. It has been mentioned to me several times that R MUST keep all data in memory, and that is a limitation. Does that include swp on *NIX systems? I mean, if you had a humungus matrix could you just add some swap to make it fit?
Sorry for dumb newbie questions
This can be done using proto objects:
library(proto) # home page at http://r-proto.googlecode.com
GLOBAL <- proto( counter = 1,
locked = FALSE,
important_value = 42,
copy_of_important_value = 42,
lock = function(.) { .$locked = TRUE },
unlock = function(.) { .$locked = FALSE },
is_locked = function(.) { return(.$locked )},
visit = function(.) { .$counter <- .$counter + 1 })
GLOBAL$locked <- TRUE
GLOBAL$unlock()
GLOBAL$locked
## FALSE
The S3 way of doing things.
GLOBAL <- list(counter=1, locked=FALSE,
important_value=42, copy_of_important_value=42)
class(GLOBAL) <- "foo"
lock <- function(x, ...) UseMethod("lock")
lock.foo <- function(x)
{
x$locked <- TRUE
x
}
unlock <- function(x, ...) UseMethod("unlock")
unlock.foo <- function(x)
{
x$locked <- FALSE
x
}
is_locked <- function(x) x$locked
visit <- function(x)
{
x$counter <- x$counter + 1
x
}
GLOBAL <- lock(GLOBAL) # locked is now TRUE
GLOBAL <- unlock(GLOBAL) # locked is now FALSE
There's also the enclosure method
getGlobal <- function() {
counter <- 1
locked <- FALSE
important_value <- 42
list(
is_locked = function() locked,
lock = function() locked<<-TRUE,
unlock = function() locked<<-FALSE,
visit = function() {counter <<- counter + 1 }
)
}
And then you would use
GLOBAL <- getGlobal()
GLOBAL$is_locked()
# [1] FALSE
GLOBAL$lock()
GLOBAL$is_locked()
# [1] TRUE
So the state is stored in the enclosure and getGlobal returns a list of functions you can use to access those variables not otherwise exposed.
Nothing happened because
R doesn't have any variables to go get because they way you have it, = does not mean assignment has occurred inside the list(). So the only object in the global environment is GLOBAL. The way you're using = right now is assigning the list names to the left-hand side, and they are subsequently accessed with the $ operator.
Your functions are not returning a value as they're written. GLOBAL$locked() will not return a value if GLOBAL$locked <- FALSE is all you have inside the body of the function that calls it. So I wrap it in parentheses, and is returns our desire values.
So we just need to assign locked to the global environment first, then <<- will reassign it.
I shortened your list a bit. Here's a look:
> GLOBAL <- list(locked = assign("locked", FALSE, parent.frame()),
lock = function() { (GLOBAL$locked <<- TRUE) },
unlock = function() { (GLOBAL$locked <<- 'HELLO') },
is_locked = function() { return(NULL) })
> GLOBAL$locked
[1] FALSE
> GLOBAL$lock()
[1] TRUE
> GLOBAL$unlock()
[1] "HELLO"
> GLOBAL$is_locked()
NULL
Yes, a list has its own environment, separate from the global environment. An example of this is
> l <- list(x = 5, y = 10)
> within(l, {
f <- function(x) 2 * x
})
$x
[1] 5
$y
[1] 10
$f
function (x)
2 * x
<environment: 0xb041278>
but we are currently in the global environment
> environment()
<environment: R_GlobalEnv>
It's funny you should ask this question because I just asked a question about the same thing yesterday. MrFlick provided a very good explanation on that question.