add trace/breakpoint while already in R's browser - r

Edit: for the record, the accepted answer has a significant down-fall in that it re-executes the first n lines of code in the function when re-debugged. This might be okay, but when those lines of code include side-effects (e.g., database updates) and/or long-time calculations, it becomes obvious what is happening. I do not believe R provides the ability to do it "properly" (as some other languages do). Bummer.
Some debuggers allow you to dynamically add breakpoints while in the debugger. Is that functionality possible in R? An example:
quux <- function(..)
{ # line 1
"line 2"
"line 3"
"line 4"
"line 5"
"line 6"
}
trace("quux", tracer = browser, at = 3)
# [1] "quux"
quux()
# Tracing quux() step 3
# Called from: eval(expr, envir, enclos)
# Browse[1]>
# debug: [1] "line 3"
While debugging, I believe I want to jump ahead in the code. Imagine the function has a few hundred lines of code, and I'd prefer to not step through them.
I'd like to be able to do this, and jump from the current line to the next interesting line, but unfortunately it just continues out of the function.
# Browse[2]>
trace("quux", tracer = browser, at = 5)
# [1] "quux"
# Browse[2]>
c
# [1] "line 6"
# # (out of the debugger)
The trace call while in the debugger merely added the breakpoint to the original (global) function, as shown if I immediately call the function again:
quux()
# Tracing quux() step 5
# Called from: eval(expr, envir, enclos)
# Browse[1]>
# debug: [1] "line 5"
I tried setting both at once (at=c(3,5)) while inside the browser, but this just sets those lines for when I exit the debugger and call the function again.
I'm guessing this has to do with the function to which trace is attaching the breakpoint. Looking into trace (and .TraceWithMethods), I think I need to be setting where, but I cannot figure out how to get it to set a new breakpoint/trace on the in-debugging function.
(The larger picture is that I'm troubleshooting a function that is dealing with a kafka-led stream of data. My two options are currently (a) restart the function with the more appropriate trace, but this requires me to purge and restart the data stream as well; or (b) go line-by-line in the debugger, tedious when there are many hundreds of lines of code.)

This may be kind of a solution. First do as in your post:
> quux <- function(..)
+ { # line 1
+ x <- 1 # added for illustration
+ "line 3"
+ "line 4"
+ "line 5"
+ print(x) # added for illustration
+ "line 7"
+ "line 8"
+ }
>
> trace("quux", tracer = browser, at = 4)
[1] "quux"
> quux()
Tracing quux() step 4
Called from: eval(expr, p)
Browse[1]> n
debug: [1] "line 4"
Next, we do as follows in the debugger:
Browse[2]> this_func <- eval(match.call()[[1]]) # find out which funcion is called
Browse[2]> formals(this_func) <- list() # remove arguments
Browse[2]> body(this_func) <- body(this_func)[-(2:4)] # remove lines we have evalutaed
Browse[2]> trace("this_func", tracer = browser,
+ at = 8 - 4 + 1) # at new line - old trace point
Tracing function "this_func" in package "base"
[1] "this_func"
Browse[2]> this_func # print for illustration
function ()
{
"line 5"
print(x)
"line 7"
"line 8"
}
Browse[2]> environment(this_func) <- environment() # change enviroment so x is present
Browse[2]> this_func() # call this_func
[1] 1
[1] "line 8"
The downside is that we end back at "line 5" in the original call to quux after we exit from the call to this_func. Further, we have to keep track of the last at value. We may be able to get this from another function?

Related

Calling two functions with the same name from different sources()

in regards to this post: How can I call two functions with same name from two source files in R?
The example provided shows two functions with the same name coming from two different sources:
in "aa.R"
hi <- function(){
print("hi, aa")
}
in "bb.R"
hi <- function(){
print("hi, bb")
}
Now what I want to do is call function hi from aa.R by refernecing the source. I know when working with packages i can use:
packagename::functionname()
But when working with source(filename.R) it doens't work.
One of the provided answers explains to use two different envoirments, which i would not prefer doing, as i feel like this makes it much more accessible to errors.
Also being said, that it's not very smart to name two functions the same name. While I totally agree, that it would make more sense to use different names for functions actually doing something different, i would still prefer calling functions by reference of source, for readability purpose, as i can instantly see the file i am grabbing the function from, while debugging or coding in general.
One more thing: Whats stopping me from creating a package with the functions i am working with? Is there any reason not to create a package from a source-file contianing functions used in my main script?
Thanks for any advice.
Except for (5) these all use the example in the Note at the end.
1) separate environments source each file into a separate environment and then qualify the name using the appropriate environment when calling the function. this seems very close to library(aa); aa::aa.R in spirit.
source("aa.R", local = aa <- new.env())
source("bb.R", local = bb <- new.env())
aa$hi()
## [1] "hi, aa"
bb$hi()
## [1] "hi, bb"
1a) A variation of this is to put only one of the hi's in a separate environment. That might be useful in the case that that one is less used.
source("aa.R", local = aa <- new.env())
source("bb.R")
aa$hi()
## [1] "hi, aa"
hi()
## [1] "hi, bb"
1b) A variation of this is to attach them to the search list.
source("aa.R", local = attach(NULL, name = "aa"))
source("bb.R", local = attach(NULL, name = "bb"))
as.environment("aa")$hi()
## [1] "hi, aa"
as.environment("bb")$hi()
## [1] "hi, bb"
2) box Konrad Rudolph's box package (on CRAN) can be used for this. Again this is not much different than library(aa); aa::aa.R .
box::use(./aa)
box::use(./bb)
aa$hi()
## [1] "hi, aa"
bb$hi()
## [1] "hi, bb"
3) re-source Another approach is to reread the file each time hi is called.
source("aa.R")
hi()
## [1] "hi, aa"
source("bb.R")
hi()
## [1] "hi, bb"
source("aa.R")
hi()
## [1] "hi, aa"
4) rename Yet another approach is to rename hi each time it is read. This won't work if hi itself is used within the source file but is ok otherwise. (This could also be combined with one of the above solutions.)
source("aa.R")
aa_hi <- hi
rm(hi)
source("bb.R")
bb_hi <- hi
rm(hi)
aa_hi()
## [1] "hi, aa"
bb_hi()
## [1] "hi, bb"
5) S3 In the case that the two instances of hi work on different input classes they could be made to be methods of the same generic. Change the example to this:
cat('hi <- function(x, ...) UseMethod("hi")
hi.numeric <- function(x) {
print(paste("hi, aa -", x))
}', file = "aa.R")
cat('hi <- function(x, ...) UseMethod("hi")
hi.character <- function(x) {
print(paste("hi, bb -", x))
}', file = "bb.R")
source("aa.R")
source("bb.R")
hi(1)
## [1] "hi, aa - 1"
hi("z")
## [1] "hi, bb - z"
6) modules The modules package (on CRAN) can be used.
library(modules)
aa <- use("aa.R")
bb <- use("bb.R")
aa$hi()
## [1] "hi, aa"
bb$hi()
## [1] "hi, bb"
7) package Another possibility is to convert the script to a package. Run the following to convert aa.R to a package, build, install and run it. Similarly for bb.R .
library(pkgKitten)
library(devtools)
setwd("...directory containing aa.R...")
kitten("aa", author = "me") # create empty package
file.copy("aa.R", "aa/R") # add script to it
setwd("aa")
build()
install()
setwd("..")
aa::hi()
## [1] "hi, aa"
Note
Generate the input in reproducible form.
cat('hi <- function() {
print("hi, aa")
}', file = "aa.R")
cat('hi <- function() {
print("hi, bb")
}', file = "bb.R")
Here is an approach based on environments :
my_Fun <- function(path_To_File)
{
source(path_To_File)
new_Env <- new.env()
new_Env$hi <- hi
return(new_Env)
}
env1 <- my_Fun("aa.R")
env2 <- my_Fun("bb.R")
> evalq(hi(), env1)
[1] "hi, aa"
> evalq(hi(), env2)
[1] "hi, bb"

Replacement functions in R that don't take input

This seems very related to several other questions that have been asked (this one for example), but I can't quite figure out how to do exactly what I want. Maybe replacement functions are the wrong tool for the job, which would also be a perfectly acceptable answer. I am much more familiar with Python than R and I can easily think of how I want to do it in Python but I can't quite get my head around how to approach it in R.
The problem: I am trying to modify an object in place within a function, without having to return it, but I don't need to pass in the value that modifies it, because this value is the result of a function call that's already contained in the object.
More specifically, I have a list (technically it's an s3 class, but I don't think that's actually relevant to this issue) that contains some things relating to a process started with processx::process$new() call. For reproducibility, here's a toy shell script you can run, and the code to get my res object:
echo '
echo $1
sleep 1s
echo "naw 1"
sleep 1s
echo "naw 2"
sleep 1s
echo "naw 3"
sleep 1s
echo "naw 4"
sleep 1s
echo "naw 5"
echo "All done."
' > naw.sh
Then my wrapper is something like this:
run_sh <- function(.args, ...) {
p <- processx::process$new("sh", .args, ..., stdout = "|", stderr = "2>&1")
return(list(process = p, orig_args = .args, output = NULL))
}
res <- run_sh(c("naw.sh", "hello"))
And res should look like
$process
PROCESS 'sh', running, pid 19882.
$output
NULL
$orig_args
[1] "naw.sh" "hello"
So, the specific issue here is a bit peculiar to process$new but I think the general principle is relevant. I am trying to collect all the output from this process after it is finished, but you can only call process$new$read_all_output_lines() (or it's sibling functions) once because the first time it will return the result from the buffer and the subsequent times it returns nothing. Also, I am going to call a bunch of these and then come back to "check on them" so I can't just call res$process$read_all_output_lines() right away because then it will wait for the process to finish before the function returns, which is not what I want.
So I'm trying to store the output of that call in res$output and then just keep that and return it on subsequent calls. Soooo... I need to have a function to modify res in place with res$output <- res$process$read_all_output_lines().
Here's what I tried, based on guidance like this, but it didn't work.
get_output <- function(.res) {
# check if process is still alive (as of now, can only get output from finished process)
if (.res$process$is_alive()) {
warning(paste0("Process ", .res$process$get_pid(), " is still running. You cannot read the output until it is finished."))
invisible()
} else {
# if output has not been read from buffer, read it
if (is.null(.res$output)) {
output <- .res$process$read_all_output_lines()
update_output(.res) <- output
}
# return output
return(.res$output)
}
}
`update_output<-` <- function(.res, ..., value) {
.res$output <- value
.res
}
Calling get_output(res) works the first time, but it does not store the output in res$output to be accessed later, so subsequent calls return nothing.
I also tried something like this:
`get_output2<-` <- function(.res, value) {
# check if process is still alive (as of now, can only get output from finished process)
if (.res$process$is_alive()) {
warning(paste0("Process ", .res$process$get_pid(), " is still running. You cannot read the output until it is finished."))
.res
} else {
# if output has not been read from buffer, read it
if (is.null(.res$output)) {
output <- .res$process$read_all_output_lines()
update_output(.res) <- output
}
# return output
print(value)
.res
}
}
Which just throws away the value but this feels silly because you have to call it with the assignment like get_output(res) <- "fake" which I hate.
Obviously I could also just return the modified res object, but I don't like that because then the user has to know to do res <- get_output(res) and if they forget to do that (the first time) then the output is lost to the ether and can never be recovered. Not good.
Any help is much appreciated!
After further information from the OP, it looks as if what is needed is a way to write to the existing variable in the environment that calls the function. This can be done with non-standard evaluation:
check_result <- function(process_list)
{
# Capture the name of the passed object as a string
list_name <- deparse(substitute(process_list))
# Check the object exists in the calling environment
if(!exists(list_name, envir = parent.frame()))
stop("Object '", list_name, "' not found")
# Create a local copy of the passed object in function scope
copy_of_process_list <- get(list_name, envir = parent.frame())
# If the process has completed, write its output to the copy
# and assign the copy to the name of the object in the calling frame
if(length(copy_of_process_list$process$get_exit_status()) > 0)
{
copy_of_process_list$output <- copy_of_process_list$process$read_all_output_lines()
assign(list_name, copy_of_process_list, envir = parent.frame())
}
print(copy_of_process_list)
}
This will update res if the process has completed; otherwise it leaves it alone. In either case it prints out the current contents. If this is client-facing code you will want further type-checking logic on the object passed in.
So I can do
res <- run_sh(c("naw.sh", "hello"))
and check the contents of res I have:
res
#> $`process`
#> PROCESS 'sh', running, pid 1112.
#>
#> $orig_args
#> [1] "naw.sh" "hello"
#>
#> $output
#> NULL
and if I immediately run:
check_result(res)
#> $`process`
#> PROCESS 'sh', running, pid 1112.
#>
#> $orig_args
#> [1] "naw.sh" "hello"
#>
#> $output
#> NULL
we can see that the process hasn't completed yet. However, if I wait a few seconds and call check_result again, I get:
check_result(res)
#> $`process`
#> PROCESS 'sh', finished.
#>
#> $orig_args
#> [1] "naw.sh" "hello"
#>
#> $output
#> [1] "hello" "naw 1" "naw 2" "naw 3" "naw 4" "naw 5"
#> [7] "All done."
and without explicitly writing to res, it has updated via the function:
res
#> $`process`
#> PROCESS 'sh', finished.
#>
#> $orig_args
#> [1] "naw.sh" "hello"
#>
#> $output
#> [1] "hello" "naw 1" "naw 2" "naw 3" "naw 4" "naw 5"
#> [7] "All done."
I may be missing something here, but why don't you just write the output after you create the object so that it's there the first time the function returns?
run_sh <- function(.args, ...)
{
p <- processx::process$new("sh", .args, ..., stdout = "|", stderr = "2>&1")
return(list(process = p, orig_args = .args, output = p$read_all_output_lines()))
}
So now if you do
res <- run_sh(c("naw.sh", "hello"))
You get
res
#> $`process`
#> PROCESS 'sh', finished.
#>
#> $orig_args
#> [1] "naw.sh" "hello"
#>
#> $output
#> [1] "hello"
#> [2] "naw.sh: line 2: sleep: command not found"
#> [3] "naw 1"
#> [4] "naw.sh: line 4: sleep: command not found"
#> [5] "naw 2"
#> [6] "naw.sh: line 6: sleep: command not found"
#> [7] "naw 3"
#> [8] "naw.sh: line 8: sleep: command not found"
#> [9] "naw 4"
#> [10] "naw.sh: line 10: sleep: command not found"
#> [11] "naw 5"
#> [12] "All done."

Format function output as a customised multiple lines string

I'm trying to make a function which gives output with simple format.
If I already calculated estimated values of beta's, what should I do if I want following result format.
Coefficients
-------------
Constant: 5.2
Beta1: 4
Beta2: 9
Beta3: 2
.
.
.
I tried cat() function but to use cat(), I have to write every line manually like:
cat("Coefficients","\n","-------------","\n","Constant: 5.2","\n","Beta1: 4",....)
Is there any way to make that simple result format?
If you have a vector of 10 results and you want to label them Beta1 to Beta10 you could do:
result = 10:1
b_order = 1:10
paste0("beta", b_order, ": ", result)
This gives:
[1] "beta1: 10" "beta2: 9" "beta3: 8" "beta4: 7" "beta5: 6" "beta6: 5" "beta7: 4" "beta8: 3" "beta9: 2" "beta10: 1"

odd behavior of print within mapply

I am seeing some unexpected behavior (to me anyway) when print() is included as a side effect in a function wrapped in mapply().
For example, this works as expected (and yes I know it's not how we add vectors):
mapply(function(i,j) i+j, i=1:3, j=4:6) # returns [1] 5 7 9
And so does this:
mapply(function(i,j) paste(i, "plus", j, "equals", i+j), i=1:3, j=4:6)
# returns [1] "1 plus 4 equals 5" "2 plus 5 equals 7" "3 plus 6 equals 9"
But this doesn't:
mapply(function(i,j) print(paste(i, "plus", j, "equals", i+j)), i=1:3, j=4:6)
# returns:
# [1] "1 plus 4 equals 5"
# [1] "2 plus 5 equals 7"
# [1] "3 plus 6 equals 9"
# [1] "1 plus 4 equals 5" "2 plus 5 equals 7" "3 plus 6 equals 9"
What's going on here? I haven't used mapply() in a while, so maybe this is a no-brainer... I'm using R version 3.4.0.
print both prints its argument and returns its value.
p <- print("abc")
# [1] "abc"
p
# [2] "abc"
So each element gets printed, then the vector of stuff gets returned (and printed). Try e.g. invisible(mapply(...)) or m <- mapply(...) for comparison.
FWIW cat() returns NULL ...

fuse some information in a vector

Something maybe obvious but I can't seem to see it :
I have a vector like this :
vec<-c("i: 1","n: alpha","a: term1","a: term2", "i: 2","n: beta","a: term3","i: 3","n: gamma","a: term4","a: term5","a: term6")
and I need to get this :
out<-c("i: 1","n: alpha","a: term1;term2", "i: 2","n: beta","a: term3","i: 3","n: gamma","a: term4;term5;term6")
That is, for each unique i:, fuse the a: when there are more than one.
I tried with diff and rle but the resulted code (see below) is too long and I think I'm complicating uselessly the problem...
my code :
out<-vec
a<-which(grepl("^a: ",vec))
diffa<-diff(a)
diffa1<-which(diffa==1)
rle_a<-rle(diffa)$lengths[rle(diffa)$values==1]
indwh<-1
for(ind in 1:length(rle_a)){
allindwh<-indwh:(indwh+rle_a[ind]-1)
out[a[c(diffa1[allindwh],diffa1[allindwh[length(allindwh)]]+1)]]<-paste(out[a[diffa1[allindwh[1]]]],paste(gsub("a: ","",out[a[c(diffa1[allindwh[-1]],diffa1[allindwh[length(allindwh)]]+1)]]),collapse=";"),sep=";")
indwh<-indwh+rle_a[ind]
}
out<-unique(out)
So I get what I want but I would really appreciate any hint to simplify it.
Here's an easier approach with tapply:
# index of 'a's
idx <- grepl("^a", vec)
# find groups
grp <- c(0, cumsum(diff(idx) < 0))
# apply function to vector based on groups
unlist(tapply(vec, grp, FUN = function(x)
c(x[1:2], paste("a:", paste(sub("^a:\\s*", "", x[-(1:2)]), collapse = ";")))),
use.names = FALSE)
# [1] "i: 1" "n: alpha" "a: term1;term2"
# [4] "i: 2" "n: beta" "a: term3"
# [7] "i: 3" "n: gamma" "a: term4;term5;term6"

Resources