Playing with R environments - r

I have strange environment/scoping dynamic that I've been trying to figure out, and looking for the right or recommended method for achieving this.
I've made a toy example of my problem below purely for illustration. (I'm aware this particular problem can be solved much more simply, but it illustrates the dynamic I'm trying to implement).
Current functioning code:
master_function <-
function(x, iter = 100){
x_p1 <- function(){ x <<- x + 1 }
x_m1 <- function(){ x <<- x - 1 }
path <- numeric(iter)
for(i in 1:iter){
next_step <- sample(c('p', 'm'), 1)
if(next_step == 'p'){
x_p1()
} else {
x_m1()
}
path[i] <- x
}
path
}
The issue with this code (especially for an actually difficult problem) is that it makes debugging the x_p1, x_m1 function contents with the RStudio debug utility impossible.
Hoping to restructure the code to look something like:
master_function <-
function(x, iter = 100){
master_env <- environment()
path <- numeric(iter)
for(i in 1:iter){
next_step <- sample(c('p', 'm'), 1)
if(next_step == 'p'){
x_p1(master_env)
} else {
x_m1(master_env)
}
path[i] <- x
}
path
}
x_p1 <- function(env){ assign('x', get('x', envir = env) + 1, envir = env) }
x_m1 <- function(env){ assign('x', get('x', envir = env) - 1, envir = env) }
But this is also quite ugly. Is there a way to augment the search path, for example, such that access to the master_env is cleaner?
Edit: More information as requested by #MrFlick
Essentially I have simulation with a lot of moving pieces. As it progresses, different events (the sub-functions being referenced) are triggered modifying the state of the simulation. These functions currently modify many different state objects for each function call. Since the functions are made within the master function call, I can take advantage of lexical scoping and the <<- operator, but I lose the ability to debug within those functions.
Trying to figure out how to create those functions outside of the master simulation. If I understand correctly, if I make the functions such that they consume the simulation state and return a modified version, it comes at a large memory cost.

1) trace Use trace to insert debug statements after the definitions of x_p1 and x_m1 and then one can step through them when master_function is run.
trace(master_function, at = 4, quote({debug(x_p1); debug(x_m1) }))
untrace(master_function) turns this off. Use body(master_function)[4] to see which line corresponds to 4. See ?trace for more.
2) instrument Another possibility is to instrument your function like this and then call it with master(function(x, DEBUG = TRUE) to turn on debugging.
master_function <-
function(x, iter = 100, DEBUG = FALSE){
x_p1 <- function(){ x <<- x + 1 }
x_m1 <- function(){ x <<- x - 1 }
if (DEBUG) {
debug(x_p1)
debug(x_m1)
}
path <- numeric(iter)
for(i in 1:iter){
next_step <- sample(c('p', 'm'), 1)
if(next_step == 'p'){
x_p1()
} else {
x_m1()
}
path[i] <- x
}
path
}

Why does x need to reside in an alternative environment at all? The following internalizes and avoids the multiple environments entirely.
x_p1 <- function(z){ z + 1 }
x_m1 <- function(z){ z - 1 }
master_function <-
function(x, iter = 100){
new_x <- x
path <- numeric(iter)
for(i in 1:iter){
next_step <- sample(c('p', 'm'), 1)
if(next_step == 'p'){
new_x <- x_p1(new_x)
} else {
new_x <- x_m1(new_x)
}
path[i] <- new_x
}
path
}

Related

Can I access the last computed result before a function 'stop's?

Consider this code:
bad_function <- function() {
# a lot of code
x <- 1
stop("error")
}
tryCatch(bad_function(), error = function(cond) {x})
Obviously, x is not accessible in the error handler. But is there another way to access the value of x without changing bad_function? Alternatively, is there a way to patch bad_function to skip over stop("error") and return x without having to copy all that # a lot of code?
This works if the result you are looking for is named (and the you know the name - here, x):
bad_function <- function() {
# a lot of code
x <- 1
stop("error")
}
.old_stop <- base::stopifnot
.new_stop <- function(...) {
parent.frame()$x
}
assignInNamespace("stop", .new_stop, "base")
bad_function()
assignInNamespace("stop", .old_stop, "base")
I still wonder if there are better solutions.
You could assign the value simultaneously to x in the function environment, as well to another x in an external say debug environment that you defined beforehand.
ev1 <- new.env()
bad_function <- function() {
env <- new.env(parent=baseenv())
# a lot of code
x <- ev1$x <- 1
stop("error")
}
tryCatch(bad_function(), error = function(e) ev1$x)
# [1] 1
The advantage is that .GlobalEnv stays clear (apart from the environment of course).
ls()
# [1] "bad_function" "ev1"

Why is my if statement storing the wrong data types?

Okay, so I have combed the internet for an answer to my problem and I can only put it down to me being a little naive in how R works.
Below is my code for a function that generates public and private keys from the system clock and uses it to attempt to decrypt an encrypted message. This bit works fine, but obviously as it goes through different random generations it comes back with a lot of garbage and NULL data.
I wanted to filter this out by using grep and testing whether the result of that grep was 1, is so, the decoded message would be put into a list.
The problem is that, no matter how I propose the if statement, my list gets cluttered with both the nonsense entries and the NULL entries.
I've tried, !is.null, is.character. test == 1. etc etc but nothing seems to work. Either the list doesn't get populated at all, or it gets populated by every entry that runs through the if statement.
Any advice would be appreciated. Thanks :)
Edit: Okay, forgive me, for these are copy and paste jobs to provide clarity. The first code is the code I'm using to encrypt the message.
require(gmp)
source("convert.R")
p <- nextprime(urand.bigz(size=51, seed=as.bigz(Sys.time())))
q <- nextprime(urand.bigz(size=50))
n <- p*q
finde <- function(phi) {
r <- floor(log(phi, base = 2))
y <- 0 # initialise
while(y != 1) {
e <- urand.bigz(nb = 1, size = r)
y <- gcd.bigz(e, phi)
}
return(e)
}
phi <- (p-1) * (q-1)
e <-finde(phi)
d <- inv.bigz(e, phi)
text1 <- c("I want to eat a baby panda with my bare teeth and hands. Just so I know there's something else in this world suffering more than myself, right now.")
m <- blocks(text1, n) # arguments are text1 (message) and n (public key)
u <- as.bigz((as.bigz(m, n)^e))
dput(u, file="codedmessage.R")
The second is the code contained in the "convert.R" source file:
blocks <- function(txt, n) {
x <- strtoi(charToRaw(txt), 16L)
ll <- length(x)
bl <- floor(log(n, base=256)) # block length (how large the blocks must be)
nb <- floor(ll / bl)
wp <- bl*nb
rem <- ll - wp
s <- as.bigz(vector(mode="numeric", length=0))
u <- 0
while(u < wp) {
total <- as.bigz(0)
for(i in 1:bl) {
total <- 256 * total + x[i+u]
}
u <- u + bl
s <- c(s, total)
}
if(rem > 0) {
total <- as.bigz(0)
for(i in 1:rem) {
total <- 256 * total + x[i + wp]
}
s <- c(s, total)
}
return(s)
}
words <- function(blocknum) {
w <- vector(mode="numeric", length=0)
wl <- blocknum
while(as.bigz(wl) > 0) {
rem <- as.bigz(wl) %% 256
w <- c(rem, w)
wl <- (as.bigz(wl) - as.bigz(rem)) / 256
}
return(w)
}
dectext <- function(listnum) {
len <- length(listnum)
newls <- as.integer(vector(mode="numeric", length=0))
for(i in 1:len) {
temp <- as.integer(words(listnum[i]))
newls <- c(newls, temp)
}
return(rawToChar(as.raw(newls)))
}
And finally the last code is the decrypt and compile list function that I'm having issues with.
finde <- function(phi) {
r <- floor(log(phi, base = 2))
y <- 0 # initialise
while(y != 1) {
e <- urand.bigz(nb = 1, size = r)
y <- gcd.bigz(e, phi)
}
return(e)
}
FindKey <- function(a, y) {
x <<- 1 #initialisation
decodedlist <<- list() #initialisation
while (x<7200) {
print(x)
print(a)
p <- nextprime(urand.bigz(size=51, seed=as.bigz(a)))
q <- nextprime(urand.bigz(size=50))
n <- p*q
phi <- (p-1) * (q-1)
phi
e <-finde(phi)
d <- inv.bigz(e, phi)
recieved<-dget(file=y)
v<-as.bigz(as.bigz(recieved, n)^d)
tryCatch({
decodetext<-dectext(v)
Decrypt<- capture.output(cat(decodetext))
print(Decrypt)
test <- grep("and", Decrypt)
if (!is.null(Decrypt)){
if (is.character(Decrypt)){
decodedlist[[x]] <<- Decrypt
}else{return}}else{return}
}, warning = function(war) {
return()
}, error = function(err){
return()
}, finally = {
x=x+1
a=a-1})
}
}
Sorry it's long.. But I really don't know what to do :(
I found a "sort of" solution to my problem, albeit within a different code I've written.
I'm not very knowledgeable in the reasoning behind why this works but I believe the problem lay in the fact that the list was storing something with a NULL reference (Reps to Acccumulation for the hint ;D) and therefore was not technically NULL itself.
My workaround for this was to avoid using an if statement altogether, instead I found a more efficient method of filtering out NULL list entries in a program I had written for generating large prime numbers.
Extra points for anyone who can figure out what I'm currently studying ;)
#Combine two lists and remove NULL entries therein.
Prime_List2 <<- PrimeList[-which(sapply(PrimeList, is.null))]
Prime_List1 <<- PrimeList[-which(sapply(PrimeList, is.null))]

Assign element to list in parent frame

Assume I have the following function:
f1 <- function()
{
get.var <- function(v)
{
for(n in 1:sys.nframe())
{
varName <- deparse(substitute(v, env = parent.frame(n)))
if(varName != "v")
{
break
}
}
return(list(name = varName, n = n))
}
f2 <- function(v)
{
print(v)
# get original variable name and environment
obj <- get.var(v)
#below doesn't work as expected - this is where q$a and q$b would be updated
assign(obj$name, v + 1, env = parent.frame(obj$n))
}
f3 <- function(v){ f2(v) }
f4 <- function(v){ f3(v) }
q <- list(a = 2, b = 3)
f4(q$a)
f3(q$b)
}
How can I update the value of q$a and q$b from f2? The situation is that a similar routine is called in some of my code to validate a number of arguments in a nested list. If a value is incorrect the list element needs to be updated to reflect the correct value. It's certainly an ugly way to do it but unfortunately I cannot pass the entire list to each and every validation function.
A somewhat similar question was asked here but the user was passing in a list element instead.
Instead of using assign(obj$name, v + 1, env = parent.frame(obj$n)), I replaced this with eval(parse(text = sprintf("%s <- %d", obj$name, v + 1)), envir = parent.frame(obj$n))
It is horrendously ugly, but it works.

Is it possible to see source code of a value of function

I am using a function from a package. this function returns some values. For example:
k<-dtw(v1,v2, keep.internals=TRUE)
and I can get this value:
k$costMatrix
Does it possible to see the source code of costMatrix? if yes how can I do that?
UPDATE
this is the source code of the function:
function (x, y = NULL, dist.method = "Euclidean", step.pattern = symmetric2,
window.type = "none", keep.internals = FALSE, distance.only = FALSE,
open.end = FALSE, open.begin = FALSE, ...)
{
lm <- NULL
if (is.null(y)) {
if (!is.matrix(x))
stop("Single argument requires a global cost matrix")
lm <- x
}
else if (is.character(dist.method)) {
x <- as.matrix(x)
y <- as.matrix(y)
lm <- proxy::dist(x, y, method = dist.method)
}
else if (is.function(dist.method)) {
stop("Unimplemented")
}
else {
stop("dist.method should be a character method supported by proxy::dist()")
}
wfun <- .canonicalizeWindowFunction(window.type)
dir <- step.pattern
norm <- attr(dir, "norm")
if (!is.null(list(...)$partial)) {
warning("Argument `partial' is obsolete. Use `open.end' instead")
open.end <- TRUE
}
n <- nrow(lm)
m <- ncol(lm)
if (open.begin) {
if (is.na(norm) || norm != "N") {
stop("Open-begin requires step patterns with 'N' normalization (e.g. asymmetric, or R-J types (c)). See papers in citation().")
}
lm <- rbind(0, lm)
np <- n + 1
precm <- matrix(NA, nrow = np, ncol = m)
precm[1, ] <- 0
}
else {
precm <- NULL
np <- n
}
gcm <- globalCostMatrix(lm, step.matrix = dir, window.function = wfun,
seed = precm, ...)
gcm$N <- n
gcm$M <- m
gcm$call <- match.call()
gcm$openEnd <- open.end
gcm$openBegin <- open.begin
gcm$windowFunction <- wfun
lastcol <- gcm$costMatrix[np, ]
if (is.na(norm)) {
}
else if (norm == "N+M") {
lastcol <- lastcol/(n + (1:m))
}
else if (norm == "N") {
lastcol <- lastcol/n
}
else if (norm == "M") {
lastcol <- lastcol/(1:m)
}
gcm$jmin <- m
if (open.end) {
if (is.na(norm)) {
stop("Open-end alignments require normalizable step patterns")
}
gcm$jmin <- which.min(lastcol)
}
gcm$distance <- gcm$costMatrix[np, gcm$jmin]
if (is.na(gcm$distance)) {
stop("No warping path exists that is allowed by costraints")
}
if (!is.na(norm)) {
gcm$normalizedDistance <- lastcol[gcm$jmin]
}
else {
gcm$normalizedDistance <- NA
}
if (!distance.only) {
mapping <- backtrack(gcm)
gcm <- c(gcm, mapping)
}
if (open.begin) {
gcm$index1 <- gcm$index1[-1] - 1
gcm$index2 <- gcm$index2[-1]
lm <- lm[-1, ]
gcm$costMatrix <- gcm$costMatrix[-1, ]
gcm$directionMatrix <- gcm$directionMatrix[-1, ]
}
if (!keep.internals) {
gcm$costMatrix <- NULL
gcm$directionMatrix <- NULL
}
else {
gcm$localCostMatrix <- lm
if (!is.null(y)) {
gcm$query <- x
gcm$reference <- y
}
}
class(gcm) <- "dtw"
return(gcm)
}
but if I write globalCostMatrix I dont get the source code of this function
The easiest way to find how functions work is by looking at the source. You have a good chance that by typing function name in the R console, you will get the function definitions (although not always with good layout, so seeking the source where brackets are present, is a viable option).
In your case, you have a function dtw from the same name package. This function uses a function called globalCostMatrix. If you type that name into R, you will get an error that object was not found. This happens because the function was not exported when the package was created, probably because the author thinks this is not something a regular user would use (but not see!) or to prevent clashes with other packages who may use the same function name.
However, for an interested reader, there are at least two ways to access the code in this function. One is by going to CRAN, downloading the source tarballs and finding the function in the R folder of the tar ball. The other one, easier, is by using getAnywhere function. This will give you the definition of the function just like you're used for other, user accessible functions like dtw.
> library(dtw)
> getAnywhere("globalCostMatrix")
A single object matching ‘globalCostMatrix’ was found
It was found in the following places
namespace:dtw
with value
function (lm, step.matrix = symmetric1, window.function = noWindow,
native = TRUE, seed = NULL, ...)
{
if (!is.stepPattern(step.matrix))
stop("step.matrix is no stepMatrix object")
n <- nrow(lm)
... omitted for brevity
I think you want to see what the function dtw() does with your data. I seems that it creates a data.frame containing a column named costMatrix.
To find out how the data in the column costMatrix was generated, just type and execute dtw (without brackets!). R will show you the source of the function dtw() afterwards.

tryCatch - namespace?

I am quite new to R and I am confused by the correct usage of tryCatch. My goal is to make a prediction for a large data set. If the predictions cannot fit into memory, I want to circumvent the problem by splitting my data.
Right now, my code looks roughly as follows:
tryCatch({
large_vector = predict(model, large_data_frame)
}, error = function(e) { # I ran out of memory
for (i in seq(from = 1, to = dim(large_data_frame)[1], by = 1000)) {
small_vector = predict(model, large_data_frame[i:(i+step-1), ])
save(small_vector, tmpfile)
}
rm(large_data_frame) # free memory
large_vector = NULL
for (i in seq(from = 1, to = dim(large_data_frame)[1], by = 1000)) {
load(tmpfile)
unlink(tmpfile)
large_vector = c(large_vector, small_vector)
}
})
The point is that if no error occurs, large_vector is filled with my predictions as expected. If an error occurs, large_vector seems to exist only in the namespace of the error code - which makes sense because I declared it as a function. For the same reason, I get a warning saying that large_data_frame cannot be removed.
Unfortunately, this behavior is not what I want. I would want to assign the variable large_vector from within my error function. I figured that one possibility is to specify the environment and use assign. Thus, I would use the following statements in my error code:
rm(large_data_frame, envir = parent.env(environment()))
[...]
assign('large_vector', large_vector, parent.env(environment()))
However, this solution seems rather dirty to me. I wonder whether there is any possibility to achieve my goal with "clean" code?
[EDIT]
There seems to be some confusion because I put the code above mainly to illustrate the problem, not to give a working example. Here's a minimal example that shows the namespace issue:
# Example 1 : large_vector fits into memory
rm(large_vector)
tryCatch({
large_vector = rep(5, 1000)
}, error = function(e) {
# do stuff to build the vector
large_vector = rep(3, 1000)
})
print(large_vector) # all 5
# Example 2 : pretend large_vector does not fit into memory; solution using parent environment
rm(large_vector)
tryCatch({
stop(); # simulate error
}, error = function(e) {
# do stuff to build the vector
large_vector = rep(3, 1000)
assign('large_vector', large_vector, parent.env(environment()))
})
print(large_vector) # all 3
# Example 3 : pretend large_vector does not fit into memory; namespace issue
rm(large_vector)
tryCatch({
stop(); # simulate error
}, error = function(e) {
# do stuff to build the vector
large_vector = rep(3, 1000)
})
print(large_vector) # does not exist
I would do something like this :
res <- tryCatch({
large_vector = predict(model, large_data_frame)
}, error = function(e) { # I ran out of memory
ll <- lapply(split(data,seq(1,nrow(large_data_frame),1000)),
function(x)
small_vector = predict(model, x))
return(ll)
})
rm(large_data_frame)
if(is.list(ll))
res <- do.call(rbind,res)
The idea is to return a list of predictions results if you run out of the memory.
NOTE, i am not sure of the result here, because we don't have a reproducible example.
EDIT: Let's try again:
You can use finally argument of tryCatch:
step<-1000
n<-dim(large_data_frame)[1]
large_vector <- NULL
tryCatch({
large_vector <- predict(model, large_data_frame)
}, error = function(e) { # ran out of memory
for (i in seq(from = 1, to = n, by = step)) {
small_vector <- predict(model, large_data_frame[i:(i+step-1),]) #predict in pieces
save(small_vector,file=paste0("tmpfile",i)) #same pieces
}
rm(large_data_frame) #free memory
},finally={if(is.null(large_vector)){ #if we run out of memory
large_vector<-numeric(n) #make vector
for (i in seq(from = 1, to = n, by = step)){
#collect pieces
load(paste0("tmpfile",i))
large_vector[i:(i+step-1)] <- small_vector
}
}})
Here's a simplified version to see what is going on:
large_vector<-NULL
rm(y)
tryCatch({
large_vector <- y
}, error = function(e) {# y is not found
print("error")
},finally={if(is.null(large_vector)){
large_vector<-1
}})
> large_vector
[1] 1
EDIT2: Another tip regarding the scope which could be useful for you (although maybe not in this situation as you didn't want to declare large_vector beforehand): The <<- operator, from R-help:
The operators <<- and ->> are normally only used in functions, and
cause a search to made through parent environments for an existing
definition of the variable being assigned...
Therefore you could use above example code like this:
large_vector<-NULL
rm(y)
tryCatch({
large_vector <- y
}, error = function(e) {# y is not found
large_vector <<- 1
print("error")
})
> large_vector
[1] 1
The code below is quite self explanatory. Indeed the problem is that anything inside the error function is not by default applied to the parent environment.
b=0
as explained, this doesn't work:
tryCatch(expr = {stop("error1")}, error=function(e) {b=1})
b
SOLUTION 1: assign to the parent environment
tryCatch(expr = {stop("error2")}, error=function(e) {assign(x = "b", value =
2, envir = parent.env(env = environment()))})
b
SOLUTION 2: the most simple (only works if you are assigning to b in both expr and error)
b = tryCatch(expr = {stop("error3")}, error=function(e) {b=3;return(b)})
b

Resources