Replacement functions in R that don't take input

Replacement functions in R that don't take input - r

This seems very related to several other questions that have been asked (this one for example), but I can't quite figure out how to do exactly what I want. Maybe replacement functions are the wrong tool for the job, which would also be a perfectly acceptable answer. I am much more familiar with Python than R and I can easily think of how I want to do it in Python but I can't quite get my head around how to approach it in R.
The problem: I am trying to modify an object in place within a function, without having to return it, but I don't need to pass in the value that modifies it, because this value is the result of a function call that's already contained in the object.
More specifically, I have a list (technically it's an s3 class, but I don't think that's actually relevant to this issue) that contains some things relating to a process started with processx::process$new() call. For reproducibility, here's a toy shell script you can run, and the code to get my res object:
echo '
echo $1
sleep 1s
echo "naw 1"
sleep 1s
echo "naw 2"
sleep 1s
echo "naw 3"
sleep 1s
echo "naw 4"
sleep 1s
echo "naw 5"
echo "All done."
' > naw.sh
Then my wrapper is something like this:
run_sh <- function(.args, ...) {
p <- processx::process$new("sh", .args, ..., stdout = "|", stderr = "2>&1")
return(list(process = p, orig_args = .args, output = NULL))
}
res <- run_sh(c("naw.sh", "hello"))
And res should look like
$process
PROCESS 'sh', running, pid 19882.
$output
NULL
$orig_args
[1] "naw.sh" "hello"
So, the specific issue here is a bit peculiar to process$new but I think the general principle is relevant. I am trying to collect all the output from this process after it is finished, but you can only call process$new$read_all_output_lines() (or it's sibling functions) once because the first time it will return the result from the buffer and the subsequent times it returns nothing. Also, I am going to call a bunch of these and then come back to "check on them" so I can't just call res$process$read_all_output_lines() right away because then it will wait for the process to finish before the function returns, which is not what I want.
So I'm trying to store the output of that call in res$output and then just keep that and return it on subsequent calls. Soooo... I need to have a function to modify res in place with res$output <- res$process$read_all_output_lines().
Here's what I tried, based on guidance like this, but it didn't work.
get_output <- function(.res) {
# check if process is still alive (as of now, can only get output from finished process)
if (.res$process$is_alive()) {
warning(paste0("Process ", .res$process$get_pid(), " is still running. You cannot read the output until it is finished."))
invisible()
} else {
# if output has not been read from buffer, read it
if (is.null(.res$output)) {
output <- .res$process$read_all_output_lines()
update_output(.res) <- output
}
# return output
return(.res$output)
}
}
`update_output<-` <- function(.res, ..., value) {
.res$output <- value
.res
}
Calling get_output(res) works the first time, but it does not store the output in res$output to be accessed later, so subsequent calls return nothing.
I also tried something like this:
`get_output2<-` <- function(.res, value) {
# check if process is still alive (as of now, can only get output from finished process)
if (.res$process$is_alive()) {
warning(paste0("Process ", .res$process$get_pid(), " is still running. You cannot read the output until it is finished."))
.res
} else {
# if output has not been read from buffer, read it
if (is.null(.res$output)) {
output <- .res$process$read_all_output_lines()
update_output(.res) <- output
}
# return output
print(value)
.res
}
}
Which just throws away the value but this feels silly because you have to call it with the assignment like get_output(res) <- "fake" which I hate.
Obviously I could also just return the modified res object, but I don't like that because then the user has to know to do res <- get_output(res) and if they forget to do that (the first time) then the output is lost to the ether and can never be recovered. Not good.
Any help is much appreciated!

After further information from the OP, it looks as if what is needed is a way to write to the existing variable in the environment that calls the function. This can be done with non-standard evaluation:
check_result <- function(process_list)
{
# Capture the name of the passed object as a string
list_name <- deparse(substitute(process_list))
# Check the object exists in the calling environment
if(!exists(list_name, envir = parent.frame()))
stop("Object '", list_name, "' not found")
# Create a local copy of the passed object in function scope
copy_of_process_list <- get(list_name, envir = parent.frame())
# If the process has completed, write its output to the copy
# and assign the copy to the name of the object in the calling frame
if(length(copy_of_process_list$process$get_exit_status()) > 0)
{
copy_of_process_list$output <- copy_of_process_list$process$read_all_output_lines()
assign(list_name, copy_of_process_list, envir = parent.frame())
}
print(copy_of_process_list)
}
This will update res if the process has completed; otherwise it leaves it alone. In either case it prints out the current contents. If this is client-facing code you will want further type-checking logic on the object passed in.
So I can do
res <- run_sh(c("naw.sh", "hello"))
and check the contents of res I have:
res
#> $`process`
#> PROCESS 'sh', running, pid 1112.
#>
#> $orig_args
#> [1] "naw.sh" "hello"
#>
#> $output
#> NULL
and if I immediately run:
check_result(res)
#> $`process`
#> PROCESS 'sh', running, pid 1112.
#>
#> $orig_args
#> [1] "naw.sh" "hello"
#>
#> $output
#> NULL
we can see that the process hasn't completed yet. However, if I wait a few seconds and call check_result again, I get:
check_result(res)
#> $`process`
#> PROCESS 'sh', finished.
#>
#> $orig_args
#> [1] "naw.sh" "hello"
#>
#> $output
#> [1] "hello" "naw 1" "naw 2" "naw 3" "naw 4" "naw 5"
#> [7] "All done."
and without explicitly writing to res, it has updated via the function:
res
#> $`process`
#> PROCESS 'sh', finished.
#>
#> $orig_args
#> [1] "naw.sh" "hello"
#>
#> $output
#> [1] "hello" "naw 1" "naw 2" "naw 3" "naw 4" "naw 5"
#> [7] "All done."

I may be missing something here, but why don't you just write the output after you create the object so that it's there the first time the function returns?
run_sh <- function(.args, ...)
{
p <- processx::process$new("sh", .args, ..., stdout = "|", stderr = "2>&1")
return(list(process = p, orig_args = .args, output = p$read_all_output_lines()))
}
So now if you do
res <- run_sh(c("naw.sh", "hello"))
You get
res
#> $`process`
#> PROCESS 'sh', finished.
#>
#> $orig_args
#> [1] "naw.sh" "hello"
#>
#> $output
#> [1] "hello"
#> [2] "naw.sh: line 2: sleep: command not found"
#> [3] "naw 1"
#> [4] "naw.sh: line 4: sleep: command not found"
#> [5] "naw 2"
#> [6] "naw.sh: line 6: sleep: command not found"
#> [7] "naw 3"
#> [8] "naw.sh: line 8: sleep: command not found"
#> [9] "naw 4"
#> [10] "naw.sh: line 10: sleep: command not found"
#> [11] "naw 5"
#> [12] "All done."

Related

Capture `{cli}` output for reporting

I would like to capture the dynamic output produced by {cli} to use in error reporting/logging.
Here's an example:
my_function <- function(val, return_message = TRUE) {
if(val == 'a'){
if (return_message) cli::cli_alert("your value {val} = a!")
} else {
if (return_message) cli::cli_alert("your value {val} is not equal to a!")
}
if (return_message) cli::cli_h2("processing now complete")
return(val)
}
Using my_function, it returns val and also prints a message that is dynamic, based on the input value:
→ your value x is not equal to a!
── processing now complete ──
[1] "x"
Is there any way to capture the dynamic output from {cli} functions, ideally by appending to a list or a similar method?
Ideal output would be something like this:
my_data <- list(val = "x", message = c("your value x is not equal to a!", "processing now complete"))

I don't think cli_X() returns the text, but you could imagine a wrapper that might do what you want. Consider this as a start:
cli_wrapper <- function(str, type="alert", return_str = TRUE, return_message=TRUE, ...){
str <- with(list(...), glue::glue(str))
if(return_message){
cmd <- glue::glue('cli::cli_{type}("{str}")')
eval(parse(text=cmd))
}
if(return_str){
invisible(str)
}
}
In the function above, return_message indicates whether the cli_X() function should be run and the return_str indicates whether the string should be returned (invisibly). You could then rewrite your function with the cli_wrapper() function:
my_function <- function(val, return_message = TRUE) {
message <- NULL
if(val == 'a'){
message <- c(message, cli_wrapper("your value {val} = a!", type="alert", val = val, return_message = return_messasge))
}else{
message <- c(message, cli_wrapper("your value {val} is not equal to a!", type="alert", val = val, return_message = return_messasge))
}
message <- c(message, cli_wrapper("processing now complete", type="h2", return_message = return_messasge))
ret <- list(val=val, message = message)
invisible(ret)
}
Running the function a couple of different ways would give the following output:
my_data <- my_function(val="x", return_message=TRUE)
# → your value x is not equal to a!
#
# ── processing now complete ──
#
my_data
# $val
# [1] "x"
#
# $message
# [1] "your value x is not equal to a!" "processing now complete"
#
my_data <- my_function(val="x", return_message=FALSE)
my_data
# $val
# [1] "x"
#
# $message
# [1] "your value x is not equal to a!" "processing now complete"

{cli} provides a utility function cli_fmt() which gives a much easier interface for this behaviour.
# Function to test `cli_fmt()`
noisy_identity <- function(x) {
cli::cli_h1("Noisily returning {.val {x}}")
cli::cli_bullets(c(
"*" = "Here",
"*" = "are",
"*" = "some",
"*" = "bulllets!"
))
x
}
# This just returns `"hi"` and prints a bunch of text
noisy_identity("hi")
#>
#> ── Noisily returning "hi" ──────────────────────────────────────────────────────
#> • Here
#> • are
#> • some
#> • bulllets!
#> [1] "hi"
# Wrapping in `cli_fmt()` means only the message is returned
cli::cli_fmt(noisy_identity("hi"))
#> [1] ""
#> [2] "── Noisily returning \"hi\" ──────────────────────────────────────────────────────"
#> [3] "• Here"
#> [4] "• are"
#> [5] "• some"
#> [6] "• bulllets!"
# We can use this to define a function that modifies other functions so that
# they return both output value *and* messages produced like so:
capture_cli_messages <- function(fun) {
function(..., .quiet = TRUE) {
output <- list(result = NULL, messages = NULL)
output$messages <- cli::cli_fmt({
output$result <- fun(...)
})
if (!.quiet) cat(output$messages, sep = "\n")
output
}
}
# `capture_cli_messages()` modifies the behaviour of `noisy_identity()`
noisy_identity2 <- capture_cli_messages(noisy_identity)
noisy_identity2("hi")
#> $result
#> [1] "hi"
#>
#> $messages
#> [1] ""
#> [2] "── Noisily returning \"hi\" ──────────────────────────────────────────────────────"
#> [3] "• Here"
#> [4] "• are"
#> [5] "• some"
#> [6] "• bulllets!"
# The .quiet argument can be used if you still want to print the messages out
noisy_identity2("hi", .quiet = FALSE)
#>
#> ── Noisily returning "hi" ──────────────────────────────────────────────────────
#> • Here
#> • are
#> • some
#> • bulllets!
#> $result
#> [1] "hi"
#>
#> $messages
#> [1] ""
#> [2] "── Noisily returning \"hi\" ──────────────────────────────────────────────────────"
#> [3] "• Here"
#> [4] "• are"
#> [5] "• some"
#> [6] "• bulllets!"
Created on 2022-09-21 with reprex v2.0.2

Apply mutate to data frame in R: iterates one extra step and causes error

I wonder why my get_http_status function iterates once more than necessary causing an exception
I have a data frame like:
> str(df5)
'data.frame': 10 obs. of 3 variables:
$ text : chr "\n" "\n" "\n" "\n" ...
$ enlace: chr "//www.blogger.com| __truncated__ ...
$ Freq : int 1 1 1 1 1 1 1 1 1 r code here
I'm trying to get the http status code for each "enlace"
Using this function:
get_http_status <- function(url){
if (!is.null(url)){
Sys.sleep(3)
print(url)
ret <- HEAD(url)
return(ret$status_code)
}
return("")
}
df44 <- mutate(df5, status = get_http_status(enlace))
but keeps trowing the error:
** Error in parse_url(url) : length(url) == 1 is not TRUE**
i can warp the function with try/catch and it works, but i don't know why the error is happening in first place.
get_http_status_2 <- function(url){
tryCatch(
expr = {
Sys.sleep(3)
print(url)
ret <- HEAD(url)
return(ret$status_code)
},
error = function(e){
return("")
}
)
}
The content of the df5$enlace is:
> df5$enlace
[1] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Attribution&widgetId=Attribution1&action=editWidget&sectionId=footer-3"
[2] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=BlogArchive&widgetId=BlogArchive1&action=editWidget&sectionId=sidebar-right-1"
[3] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=BlogSearch&widgetId=BlogSearch1&action=editWidget&sectionId=sidebar-right-1"
[4] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Followers&widgetId=Followers1&action=editWidget&sectionId=sidebar-right-1"
[5] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=PageList&widgetId=PageList1&action=editWidget&sectionId=crosscol"
[6] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Text&widgetId=Text1&action=editWidget&sectionId=sidebar-right-1"
[7] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Text&widgetId=Text2&action=editWidget&sectionId=sidebar-right-1"
[8] "http://5d4a.wordpress.com/2010/08/02/smashing-the-stack-in-2010/"
[9] "http://advancedwindowsdebugging.com/ch06.pdf"
[10] "http://beej.us/guide/
I think it iterate one time more because the result of the function is:
> df44 <- mutate(df5, status = get_http_status(enlace))
[1] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Attribution&widgetId=Attribution1&action=editWidget&sectionId=footer-3"
[2] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=BlogArchive&widgetId=BlogArchive1&action=editWidget&sectionId=sidebar-right-1"
[3] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=BlogSearch&widgetId=BlogSearch1&action=editWidget&sectionId=sidebar-right-1"
[4] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Followers&widgetId=Followers1&action=editWidget&sectionId=sidebar-right-1"
[5] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=PageList&widgetId=PageList1&action=editWidget&sectionId=crosscol"
[6] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Text&widgetId=Text1&action=editWidget&sectionId=sidebar-right-1"
[7] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Text&widgetId=Text2&action=editWidget&sectionId=sidebar-right-1"
[8] "http://5d4a.wordpress.com/2010/08/02/smashing-the-stack-in-2010/"
[9] "http://advancedwindowsdebugging.com/ch06.pdf"
[10] "http://beej.us/guide/bgc/"
Error in parse_url(url) : length(url) == 1 is not TRUE

Since your function contains a function that is not vectored, use the apply family of higher order function to iterate over your vector.
Below, get_http_status will be called on each element of df$enlace.
For each call a length one character vector is expected as the return, character(1):
vapply(df5$enlace, get_http_status, character(1))

add trace/breakpoint while already in R's browser

Edit: for the record, the accepted answer has a significant down-fall in that it re-executes the first n lines of code in the function when re-debugged. This might be okay, but when those lines of code include side-effects (e.g., database updates) and/or long-time calculations, it becomes obvious what is happening. I do not believe R provides the ability to do it "properly" (as some other languages do). Bummer.
Some debuggers allow you to dynamically add breakpoints while in the debugger. Is that functionality possible in R? An example:
quux <- function(..)
{ # line 1
"line 2"
"line 3"
"line 4"
"line 5"
"line 6"
}
trace("quux", tracer = browser, at = 3)
# [1] "quux"
quux()
# Tracing quux() step 3
# Called from: eval(expr, envir, enclos)
# Browse[1]>
# debug: [1] "line 3"
While debugging, I believe I want to jump ahead in the code. Imagine the function has a few hundred lines of code, and I'd prefer to not step through them.
I'd like to be able to do this, and jump from the current line to the next interesting line, but unfortunately it just continues out of the function.
# Browse[2]>
trace("quux", tracer = browser, at = 5)
# [1] "quux"
# Browse[2]>
c
# [1] "line 6"
# # (out of the debugger)
The trace call while in the debugger merely added the breakpoint to the original (global) function, as shown if I immediately call the function again:
quux()
# Tracing quux() step 5
# Called from: eval(expr, envir, enclos)
# Browse[1]>
# debug: [1] "line 5"
I tried setting both at once (at=c(3,5)) while inside the browser, but this just sets those lines for when I exit the debugger and call the function again.
I'm guessing this has to do with the function to which trace is attaching the breakpoint. Looking into trace (and .TraceWithMethods), I think I need to be setting where, but I cannot figure out how to get it to set a new breakpoint/trace on the in-debugging function.
(The larger picture is that I'm troubleshooting a function that is dealing with a kafka-led stream of data. My two options are currently (a) restart the function with the more appropriate trace, but this requires me to purge and restart the data stream as well; or (b) go line-by-line in the debugger, tedious when there are many hundreds of lines of code.)

This may be kind of a solution. First do as in your post:
> quux <- function(..)
+ { # line 1
+ x <- 1 # added for illustration
+ "line 3"
+ "line 4"
+ "line 5"
+ print(x) # added for illustration
+ "line 7"
+ "line 8"
+ }
>
> trace("quux", tracer = browser, at = 4)
[1] "quux"
> quux()
Tracing quux() step 4
Called from: eval(expr, p)
Browse[1]> n
debug: [1] "line 4"
Next, we do as follows in the debugger:
Browse[2]> this_func <- eval(match.call()[[1]]) # find out which funcion is called
Browse[2]> formals(this_func) <- list() # remove arguments
Browse[2]> body(this_func) <- body(this_func)[-(2:4)] # remove lines we have evalutaed
Browse[2]> trace("this_func", tracer = browser,
+ at = 8 - 4 + 1) # at new line - old trace point
Tracing function "this_func" in package "base"
[1] "this_func"
Browse[2]> this_func # print for illustration
function ()
{
"line 5"
print(x)
"line 7"
"line 8"
}
Browse[2]> environment(this_func) <- environment() # change enviroment so x is present
Browse[2]> this_func() # call this_func
[1] 1
[1] "line 8"
The downside is that we end back at "line 5" in the original call to quux after we exit from the call to this_func. Further, we have to keep track of the last at value. We may be able to get this from another function?

Strange addTaskCallback work in RStudio

This is my next question from cycle of "strange" questions.
I found same difference in code execution in R console and RStudio and couldn't understand reason of it. It's also connected with incorrect work of "track" package in RStudio and R.NET as I'd written before in Incorrect work of track package in R.NET
So, let's look at example from https://search.r-project.org/library/base/html/taskCallback.html
(I corrected it a little for correct data output for sum in RStudio)
times <- function(total = 3, str = "Task a") {
ctr <- 0
function(expr, value, ok, visible) {
ctr <<- ctr + 1
cat(str, ctr, "\n")
if(ctr == total) {
cat("handler removing itself\n")
}
return(ctr < total)
}
}
# add the callback that will work for
# 4 top-level tasks and then remove itself.
n <- addTaskCallback(times(4))
# now remove it, assuming it is still first in the list.
removeTaskCallback(n)
## Not run:
# There is no point in running this
# as
addTaskCallback(times(4))
print(sum(1:10))
print(sum(1:10))
print(sum(1:10))
print(sum(1:10))
print(sum(1:10))
## End(Not run)
An output in R console:
>
> # add the callback that will work for
> # 4 top-level tasks and then remove itself.
> n <- addTaskCallback(times(4))
Task a 1
>
> # now remove it, assuming it is still first in the list.
> removeTaskCallback(n)
[1] TRUE
>
> ## Not run:
> # There is no point in running this
> # as
> addTaskCallback(times(4))
1
1
Task a 1
>
> print(sum(1:10))
[1] 55
Task a 2
> print(sum(1:10))
[1] 55
Task a 3
> print(sum(1:10))
[1] 55
Task a 4
handler removing itself
> print(sum(1:10))
[1] 55
> print(sum(1:10))
[1] 55
>
> ## End(Not run)
>
Okay, let's run this in RStudio.
Output:
> source('~/callbackTst.R')
[1] 55
[1] 55
[1] 55
[1] 55
[1] 55
Task a 1
>
Second run give us this:
> source('~/callbackTst.R')
[1] 55
[1] 55
[1] 55
[1] 55
[1] 55
Task a 2
Task a 1
>
Third:
> source('~/callbackTst.R')
[1] 55
[1] 55
[1] 55
[1] 55
[1] 55
Task a 3
Task a 2
Task a 1
>
and so on.
There is a strange difference between RStudio and R console and I don't know why. Could anyone help me? Is is bug or it's normal and I have curved hands?
Thank you.
P.S. This post connected with correct working of "track" package, because "track.start" method consist this part of code:
assign(".trackingSummaryChanged", FALSE, envir = trackingEnv)
assign(".trackingPid", Sys.getpid(), envir = trackingEnv)
if (!is.element("track.auto.monitor", getTaskCallbackNames()))
addTaskCallback(track.auto.monitor, name = "track.auto.monitor")
return(invisible(NULL))
which, I think, doesn't work correct in RStudio and R.NET
P.P.S. I use R 3.2.2 x64, RStudio 0.99.489 and Windows 10 Pro x64. On RRO this problem also exists under R.NET and RStudio

addTaskCallback() will add a callback that's executed when R execution returns to the top level. When you're executing code line-by-line, each statement executed will return control to the top level, and callbacks will execute.
When executed within source(), control isn't returned until the call to source() returns, and so the callback is only run once.

Download.file() incongruent with Manual Download

I am working with data from: Environment Canada
I am using download.file() to acquire this data. When I use:
download.file(url="http://dd.weather.gc.ca/model_gem_global/25km/grib2/lat_lon/00/000/CMC_glb_VGRD_ISBL_1000_latlon.24x.24_2015091100_P000.grib2",destfile = "Local_Grib.grib2")
GribInfo(grib.file = "Local_File.grib2",file.type = "grib2")
It yields:
$inventory
[1] "" "*** FATAL ERROR: rd_grib2_msg, missing end section ('7777') ***"
[3] ""
attr(,"status")
[1] 8
$grid
[1] "" "*** FATAL ERROR: rd_grib2_msg, missing end section ('7777') ***"
[3] ""
attr(,"status")
[1] 8
Warning messages:
1: running command 'wgrib2 Local_File.grib2 -inv -' had status 8
2: running command 'wgrib2 Local_File.grib2 -grid' had status 8
Whilst a manual download followed by:
GribInfo(grib.file = "CMC_glb_TMP_ISBL_985_latlon.24x.24_2015091100_P000.grib2",file.type = "grib2")
Yields:
$inventory
[1] "1:0:d=2015091100:TMP:985 mb:anl:"
$grid
[1] "1:0:grid_template=0:winds(N/S):" "\tlat-lon grid:(1500 x 751) units 1e-06 input WE:SN output WE:SN res 48"
[3] "\tlat -90.000000 to 90.000000 by 0.240000" "\tlon 180.000000 to 179.760000 by 0.240000 #points=1126500"
I have attempted using the Curl and Wget methods within download.file() however they fail giving a non exit error. I am able to obtain these files using a wget batch file however, I would prefer my entire system be run within R for consistency and ease of use.

As per #Martin Morgan. Downloading as a binary will circumvent this issue. Thanks again Martin.
download.file(url="http://dd.weather.gc.ca/model_gem_global/25km/grib2/lat_lon/00/000/CMC_glb_VGRD_ISBL_1000_latlon.24x.24_2015091100_P000.grib2",destfile = "Local_Grib.grib2", mode="wb")
GribInfo(grib.file = "Local_File.grib2",file.type = "grib2")

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Replacement functions in R that don't take input - r

Related

Capture `{cli}` output for reporting

Apply mutate to data frame in R: iterates one extra step and causes error

add trace/breakpoint while already in R's browser

Strange addTaskCallback work in RStudio

Download.file() incongruent with Manual Download

Categories

Resources