How can I make R's output more verbose so as to reassure me that it hasn't broken yet?

I often run code that eats up a lot of RAM and may take as much as an hour before it gives its outputs. Often I'll be half an hour into running such code and I'll be worrying that something has gone wrong. Is there any way that I can get R to reassure me that there haven't been any errors yet? I suppose that I could put milestones into the code itself, but I'm wondering if there's anything in R (or RStudio) that can automatically do this job at run time. For example, it would be handy to see how much memory the code is using, because then I'd be reassured that it's still working whenever I see the memory use vary significantly.
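To give an idea of the kind of milestone I mean, a minimal logger might look like this (log_status is just a made-up name; gc()'s second column reports the Mb currently in use, and calling it also triggers a harmless garbage collection):
log_status <- function(...) {
  mem_mb <- sum(gc()[, 2])  # total Mb in use (Ncells + Vcells rows)
  message(format(Sys.time(), "%H:%M:%S"), " | ", round(mem_mb, 1), " Mb used | ", ...)
}
# sprinkled between expensive steps:
# log_status("finished loading data")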

You might like my package {boomer}.
If you rig() your function, all its calls will be exploded and printed as the code is executed.
For instance:
# remotes::install_github("moodymudskipper/boomer")
fun <- function(x) {
  x <- x + 1
  Sys.sleep(3)
  x + 1
}
library(boomer)
# rig() the function and all the calls will be exploded
# and displayed as they're run
rig(fun)(2)

One way is:
- to make a standalone file containing all the stuff to be run,
- to source it and get warned when the code is done, possibly with an error.
The small function warn_me below:
- runs the source file located at "path"
- catches an error, if there was one
- plays a sound when the run is over
- sends an email reporting the status of the run: OK or fail
- optionally plays a sound until you stop it, so you can't miss that it's over
And here it is:
warn_me = function(path, annoying = FALSE){
  # The run
  info = try(source(path))
  # Sound telling it's over
  library(beepr)
  beep()
  # Send an email with status
  library(mailR)
  msg = if(inherits(info, "try-error")) "The run failed" else "It's over, all went well"
  send.mail(from = "me@somewhere.com",
            to = "me@somewhere.com",
            subject = msg,
            body = "All is in the title.",
            smtp = list(host.name = "smtp.mailtrap.io", port = 25,
                        user.name = "********",
                        passwd = "******", ssl = TRUE),
            authenticate = TRUE,
            send = TRUE)
  if(annoying){
    while(TRUE){
      beepr::beep()
      Sys.sleep(1)
    }
  }
}
warn_me(path)
I didn't test the mailR package myself, but any email-sending package would do. See this excellent page on sending emails in R for alternatives.
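For instance, with {blastula} the email step might look like this (untested here; the addresses and the credentials file are placeholders):
library(blastula)
smtp_send(
  compose_email(body = "It's over, all went well"),
  from = "me@somewhere.com",
  to = "me@somewhere.com",
  subject = "R run finished",
  # created beforehand with create_smtp_creds_file()
  credentials = creds_file("smtp_credentials")
)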

If you are running an R script file within RStudio, use the "Source with Echo" command (Ctrl+Shift+Enter, or via the Source dropdown).
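The same echo is available from code, since source() can print each expression as it is evaluated (the file name below is a placeholder):
# echo every expression as it runs; max.deparse.length = Inf avoids
# truncating long expressions in the echoed output
source("my_script.R", echo = TRUE, max.deparse.length = Inf)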

Related

How to call a parallelized script from command prompt?

I'm running into this issue and, for the life of me, I can't figure out how to solve it.
Quick summary before example:
I have several hundred data sets from which I want to create reports every day. To do this efficiently, I parallelized the process with doParallel. From within RStudio the process works fine, but when I try to make it automatic via Task Scheduler on Windows, I can't seem to get it to work.
The process within RStudio is:
I call a script that sources all of my other scripts, each individual script has a header section that performs the appropriate package import, so for instance it would look like:
get_files <- function(){
  get_files.create_path() -> path
  for(file in path){
    if(!(file.info(paste0(path, file))[['isdir']])){
      source(paste0(path, file))
    }
  }
}
get_files.create_path <- function(){
  return(<path to directory>)
}
# self call
get_files()
This would simply be run with "Source on Save" and brings everything I need into the .GlobalEnv.
From there, I could simply type parallel_report(), which calls a script that sources another script housing the parallelization of the report generation. There was an issue a while back with calling the parallelization directly (I wonder if this is related?), so I had to make the doParallel script a non-function script. It therefore couldn't be brought in with the get_files script, because that would start the report generation every time I brought everything in. Instead, I had to put it in its own script, saved elsewhere, to be called when necessary. The parallel_report() function is simply:
parallel_report <- function(){
source(<path to script>)
}
Then the script that is sourced is the real parallelization script, and would look something like:
doParallel::registerDoParallel(cl = (parallel::detectCores() - 1))

foreach(name = report.list$names,
        .packages = c('tidyverse', 'knitr', 'lubridate', 'stringr', 'rmarkdown'),
        .export = c('generate_report'),
        .errorhandling = 'remove') %dopar% {
  tryCatch(expr = {
    generate_report(name)
  }, error = function(e){
    error_handler(error = e, caller = paste0("generate report for ", name, " from parallel"), line = 28)
  })
}

doParallel::stopImplicitCluster()
The generate_report function is simply an .Rmd and render() caller:
generate_report <- function(<arguments>){
  #stuff
  generate_report.render(<arguments>)
  #stuff
}
generate_report.render <- function(<arguments>){
  rmarkdown::render(
    paste0(data.information@location, 'report_generator.Rmd'),
    params = list(
      name = name,
      date = date,
      thoughts = thoughts,
      auto = auto),
    output_file = paste0(str_to_upper(stock), '_report_', str_remove_all(date, '-'))
  )
}
So to recap, in RStudio I would simply perform the following:
1 - Source-on-Save the script to bring everything in
2 - type parallel_report()
2.a - this directly calls the doParallel-ization of generate_report
2.b - generate_report calls an .Rmd file that houses the required function calls and whatnot to produce the reports
And the process starts and successfully completes without a hitch.
In order to make the situation automatic via the Task Scheduler, I made a script that the Task Scheduler can call, named automatic_caller:
source(<path to the get_files script>) # this brings in all the scripts and data into the global
                                       # environment, just as if it were being done manually

tryCatch(
  expr = {
    parallel_report()
  }, error = function(e){
    error_handler(error = e, caller = "parallel_report from automatic_caller", line = 39)
  })
The error_handler function is just an in-house script used to log errors throughout.
So then, among the Task Scheduler's task actions, I have Rscript.exe called with automatic_caller after it. Everything within the automatic_caller script works except for the report generation.
The run finishes almost immediately, and the only output I get is an error:
"pandoc version 1.12.3 or higher is required and was not found (see the help page ?rmarkdown::pandoc_available)."
But rmarkdown is in the .packages argument of the foreach call, it is in the scripts that use it explicitly, and in the actual generate_report it is called directly via rmarkdown::render().
So - I am at a complete loss.
Thoughts and suggestions would be completely appreciated.
So pandoc is apparently an executable that helps convert files from one format to another. RStudio ships with its own pandoc executable, so when running the scripts from RStudio, R knew where to look when pandoc was required.
From the command prompt, the system did not know to look inside RStudio, so simply installing pandoc as a standalone executable gives the system the proper pointer.
Downloaded pandoc and everything works fine.
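If you'd rather not install pandoc separately, another approach I believe works is to point rmarkdown at RStudio's bundled copy via the RSTUDIO_PANDOC environment variable (the path below is an assumption; check where your RStudio installation keeps pandoc):
# point rmarkdown at RStudio's bundled pandoc (path is machine-specific)
Sys.setenv(RSTUDIO_PANDOC = "C:/Program Files/RStudio/bin/pandoc")
rmarkdown::pandoc_available()  # should now return TRUE
Note that with doParallel each worker is a fresh R process, so the variable would need to be set inside the foreach body (or inherited from the worker's environment) for the parallel renders to see it.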

Using system() function in R

I'm running a new script in R and I'm trying to call a .exe file using R's system() function.
I run:
system("C:/Program Files (x86)/OxMetrics6/ox/bin/oxl.exe I:/Code R/GarchOxModelling.ox", show.output.on.console = TRUE, wait = TRUE)
But it seems to do nothing, and when I launch the file GarchOxModelling.ox manually, it works.
Would you have any idea how to make it work from R?
Thanks in advance
Without testing, try
ret <- system(paste(shQuote("C:/Program Files (x86)/OxMetrics6/ox/bin/oxl.exe"),
                    shQuote("I:/Code R/GarchOxModelling.ox")),
              show.output.on.console = TRUE, wait = TRUE)
A few issues with the code you provided in your question:
It is syntactically wrong R:
"system(C:/Program Fil...", show.output.on.console = TRUE, wait = TRUE)
is like typing in
"ABC", x=1, y=2)
which should error.
Even assuming that the leading quote is simply misplaced, you need to start the quote at the beginning of the executable name, as in
system("C:/Program File...", ...)
Further, this string is passed verbatim to the shell. While Windows sometimes guesses correctly about embedded spaces, it is really not good practice to assume that will happen all of the time, so you should manually quote every argument that either (a) includes a space or (b) you do not know because it is a variable. For this, I prefer shQuote, but dQuote might be sufficient.
system(paste(shQuote("C:/Program Files (x86)/OxMetrics6/ox/bin/oxl.exe"),
             shQuote("I:/Code R/GarchOxModelling.ox")),
       show.output.on.console = TRUE, wait = TRUE)
I suggest that you consider using intern=TRUE instead of show.output.on.console, so that you can programmatically verify that the output is what you expect.
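A sketch of that check (exe and script below are just placeholders for the paths above):
out <- system(paste(shQuote(exe), shQuote(script)), intern = TRUE)
# with intern = TRUE, a non-zero exit code is attached as a "status"
# attribute (plus a warning), so it can be inspected programmatically
status <- attr(out, "status")
if (!is.null(status) && status != 0) stop("command failed with status ", status)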
Last suggestion: I find the processx package much more reliable for calls like this,
# library(processx)
ret <- processx::run("C:/Program Files (x86)/OxMetrics6/ox/bin/oxl.exe", "I:/Code R/GarchOxModelling.ox")
where quoting is handled automatically.
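run() throws an R error by default when the command exits non-zero; to inspect the result yourself instead, error_on_status can be turned off:
ret <- processx::run(
  "C:/Program Files (x86)/OxMetrics6/ox/bin/oxl.exe",
  "I:/Code R/GarchOxModelling.ox",
  error_on_status = FALSE  # return instead of erroring on a non-zero exit
)
ret$status  # 0 on success
ret$stdout  # captured standard output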

How do I check if a command used for pipe() failed?

Say, I have written a compressor function of the form:
compress <- function(text, file){
  c <- paste0("gzip -c > ", shQuote(file))
  p <- pipe(c, open = "wb")
  on.exit(close(p))
  writeLines(text, p)
}
Now I can compress strings like this:
compress("Hello World", "test.gz")
system("zcat test.gz")
## Hello World
However, how can I check that the program gzip called by pipe() succeeded?
E.g.
compress("Hello World", "nonexistent-dir/test.gz")
## sh: nonexistent-dir/test.gz: No such file or directory
results in an error message printed on STDERR, but I have no way of turning it into an R error. The program just continues without saving my text.
I know that I can just check for the existence of the target file in this example. But there are many possible errors: running out of disk space, the program was not found, some library is missing, the program was found but the ELF interpreter was not, etc. I just can't think of any way to test for all of them.
I searched the help page ?pipe, but could find no hint.
As a UNIX-specific approach, I tried to trap the SIGPIPE signal, but could not find a way to do this. Stack Overflow questions regarding this topic remain unanswered as of now [1] [2].
How do I check the exit code, or detect premature termination, of a program called with pipe()?
I have not found a solution using pipe(), but the processx package offers one. I first create the external program as a background process, obtain a connection to it, and write to that connection.
Before I start to write, I check that the process is running. After everything is written, I can check the exit code of the program.
library(processx)
p <- process$new("pigz","-c",stdin="|", stdout = "/tmp/test.gz")
if(!p$is_alive()) stop("Subprocess ended prematurely")
con <- p$get_input_connection()
# Write output
conn_write(con, "Hello World")
close(con)
p$wait()
if(p$get_exit_status() != 0) stop("Subprocess failed")
system("zcat /tmp/test.gz")

Avoid console message from package function

I'm using a package function (corenv, from seewave) which creates a "please wait..." message in the console. As I call it iteratively, that message is very annoying. So I need a way either:
to temporarily suppress console messages from my code,
OR
to access the function code and remove the line producing the message.
The following is not my real code, but a very simple one showing the problem
require(seewave)
a = seq(0, (2*pi), by=0.01) # simple, unreal example
for (i in sequence(100)){
  x = sin(a*i/3) # simple, unreal example
  y = sin(a*i/2) # simple, unreal example
  corenv(x, y, 10, plot=FALSE)
}
A very simple question, but I haven't found any solution. I'll appreciate any help.
You could use sink to capture the output, e.g.
sink("tmp.txt")
z = corenv(x,y,10,plot=FALSE)
sink()
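One caveat worth knowing: sink() only diverts standard output, so if corenv() emits the text via message() (which goes to stderr) it will slip through. Depending on which stream seewave uses, one of these alternatives may be tidier:
# if "please wait..." is emitted with message() (stderr):
z <- suppressMessages(corenv(x, y, 10, plot = FALSE))

# if it is emitted with cat()/print() (stdout):
invisible(capture.output(z <- corenv(x, y, 10, plot = FALSE)))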
You can also wrap it in a function, e.g.

corenv <- function(..., verbose = FALSE) {
  if (!verbose) {
    ## unlink deletes the temporary file;
    ## on.exit ensures the sink is closed even if
    ## corenv raises an error.
    sink("tmp.txt")
    on.exit({ sink(); unlink("tmp.txt") })
  }
  seewave::corenv(...)
}
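With the wrapper above, corenv(x, y, 10, plot = FALSE) inside the loop stays quiet by default, while corenv(..., verbose = TRUE) lets the package's messages through again.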

How do I avoid halting the execution of a standalone r script that encounters an error?

I am running an optimization program I wrote in a multi-language framework. Because I rely on different languages to accomplish the task, everything must be standalone so it can be launched through a batch file. Everything has been going fine for 2-3 months, but I finally ran out of luck when one of the crucial parts of this process, executed through a standalone R script, encountered something new and gave me an error message. This error message makes everything screech to a halt despite my best efforts:
selMEM<-forward.sel(muskfreq, musk.MEM, adjR2thresh=adjR2)
Procedure stopped (adjR2thresh criteria) adjR2cum = 0.000000 with 0 variables (superior to -0.005810)
Error in forward.sel(muskfreq, musk.MEM, adjR2thresh = adjR2) :
No variables selected. Please change your parameters.
I know why I am getting this message: it is warning me that no variables are above the threshold I programmed to retain during forward selection. Although this hadn't happened in hundreds of runs, it's not that big a deal; I just need to tell R what to do next. This is where I am lost. After an exhaustive search through several posts (such as here), it seems that try() and tryCatch() are the way to go. So I have tried the following:
selMEM <- try(forward.sel(muskfreq, musk.MEM, adjR2thresh = adjR2))
if(inherits(selMEM, "try-error")) {
  max <- 0
  cumR2 <- 0
  adjR2 <- 0
  pvalue <- NA
} else {
  max <- dim(selMEM)[1]
  cumR2 <- selMEM$R2Cum[max]
  adjR2 <- selMEM$AdjR2Cum[max]
  pvalue <- selMEM$pval[max]
}
The code after the problematic line works perfectly if I execute it line by line in R, but when I execute it as a standalone script from the command prompt, I still get the same error message and my whole process screeches to a halt before it executes what follows.
Any suggestions on how to make this work?
Note this in the try help:

try is implemented using tryCatch; for programming, instead of try(expr, silent = TRUE), something like tryCatch(expr, error = function(e) e) (or other simple error handler functions) may be more efficient and flexible.
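Concretely, that pattern turns the failure into an ordinary value you can test:
res <- tryCatch(stop("boom"), error = function(e) e)
inherits(res, "error")  # TRUE: the error is captured, not raised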
Look to tryCatch, possibly:
selMEM <- tryCatch({
  forward.sel(muskfreq, musk.MEM, adjR2thresh = adjR2)
}, error = function(e) {
  message(e)
  return(NULL)
})

if(is.null(selMEM)) {
  max <- 0
  cumR2 <- 0
  adjR2 <- 0
  pvalue <- NA
} else {
  max <- dim(selMEM)[1]
  cumR2 <- selMEM$R2Cum[max]
  adjR2 <- selMEM$AdjR2Cum[max]
  pvalue <- selMEM$pval[max]
}
Have you tried setting the silent parameter to TRUE in the try() function?
max <- 0
cumR2 <- 0
adjR2 <- 0
pvalue <- NA
try({
  selMEM <- forward.sel(muskfreq, musk.MEM, adjR2thresh = adjR2)
  max <- dim(selMEM)[1]
  cumR2 <- selMEM$R2Cum[max]
  adjR2 <- selMEM$AdjR2Cum[max]
  pvalue <- selMEM$pval[max]
}, silent = TRUE)
