Understanding the behaviour of system.time() - r

I think I must be misunderstanding something about R's system.time() function. If I have the following code in a test.r:
for(i in 1:10)
{
print(system.time(testFunction()))
}
(where testFunction() is defined elsewhere and contains some fairly computationally intensive code), run it, and kill the job after the first iteration of the loop, I receive the following output:
> source("test.r")
user system elapsed
280.388 2.622 288.155
Timing stopped at: 210.891 0.367 211.637
Why is the value for 'Timing stopped at' less than the elapsed time for the function?

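Each call to system.time() starts a fresh timer. The first line of output is the complete first iteration. "Timing stopped at" reports the second iteration, which you killed part way through, so it is less than what you timed for the full first iteration.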
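As a small illustration of that point, the sketch below times three iterations separately. slow_task() is a hypothetical stand-in for testFunction() and simply sleeps; each system.time() call reports only its own iteration, not a running total.

```r
# Hypothetical stand-in for testFunction(): just sleeps for a short while.
slow_task <- function() Sys.sleep(0.2)

# Each system.time() call starts from zero, so every entry is one iteration.
timings <- vapply(1:3, function(i) {
  system.time(slow_task())[["elapsed"]]
}, numeric(1))

print(timings)       # three separate per-iteration timings
print(sum(timings))  # the total is something you have to add up yourself
```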

Related

How can I test an infinite loop bug in R

I have a bug in my R code that causes an infinite loop. I'd like to write a test that checks I have fixed this bug.
foo <- function() {
  while (TRUE) Sys.sleep(1) # oops!
}
# I want something like:
expect_completes_within(foo(), seconds = 10)
Is there any existing solution? Is there a way to interrupt execution and throw an error after a given time?
R.utils::withTimeout(foo(), timeout = 10) # Stop foo() after 10 seconds.
A solution has already been posted here, which also includes code in base R.
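For reference, a base R version can be sketched with setTimeLimit(). The helper name expect_completes_within() is taken from the question and is hypothetical, not an existing function; matching on the error message text is also an assumption about how the time-limit error is reported.

```r
# Hypothetical helper: TRUE if expr finishes in time, FALSE if the limit is hit.
expect_completes_within <- function(expr, seconds) {
  setTimeLimit(elapsed = seconds, transient = TRUE)
  # Always lift the limit again, whatever happens below.
  on.exit(setTimeLimit(elapsed = Inf, transient = FALSE))
  tryCatch(
    {
      expr  # forces the lazily evaluated argument under the time limit
      TRUE
    },
    error = function(e) {
      # Assumption: the time-limit error carries this (translated) message.
      pattern <- gettext("reached elapsed time limit", domain = "R")
      if (grepl(pattern, conditionMessage(e), fixed = TRUE)) FALSE else stop(e)
    }
  )
}

quick <- expect_completes_within(Sys.sleep(0.05), seconds = 2)
stuck <- expect_completes_within(while (TRUE) Sys.sleep(0.1), seconds = 0.5)
```

The limit is only checked at points where R could handle a user interrupt, so long-running compiled code may overrun it.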

Manually interrupt a loop in R and continue below

I have a loop in R that does very time-consuming calculations. I can set a max-iterations variable to make sure it doesn't continue forever (e.g. if it is not converging), and gracefully return meaningful output.
But sometimes the iterations could be stopped way before max-iterations is reached. Anyone who has an idea about how to give the user the opportunity to interrupt a loop - without having to wait for user input after each iteration? Preferably something that works in RStudio on all platforms.
I cannot find a function that listens for keystrokes or similar user input without halting until something is done by the user. Another solution would be to listen for a global variable change. But I don't see how I could change such a variable value when a script is running.
The best idea I can come up with is to make another script that creates a file that the first script checks for the existence of, and then breaks if it is there. But that is indeed an ugly hack.
Inspired by Edo's reply, here is an example of what I want to do:
test.it <- function(t) {
  a <- 0
  for (i in 1:10) {
    a <- a + 1
    Sys.sleep(t)
  }
  print(a)
}
test.it(1)
As you can see, when I interrupt by hitting the red button in RStudio, I break out of the whole function, not just the loop.
Also inspired by Edo's response I discovered the withRestarts function, but I don't think it catches interrupts.
I tried to create a loop as you described it.
a <- 0
for(i in 1:10){
a <- a + 1
Sys.sleep(1)
if(i == 5) break
}
print(a)
If you let it go till the end, a will be equal to 5, because of the break.
If you stop it manually by clicking on the stop sign in the RStudio console, you get a lower number.
So it actually works as you would like.
If you want a better answer, you should post a reproducible example of your code.
EDIT
Based on the edit you posted... Try with this.
It's a tryCatch solution that returns the last available value of a.
test_it <- function(t) {
  a <- 0
  tryCatch(
    for (i in 1:10) {
      a <- a + 1
      message("I'm at ", i)
      Sys.sleep(t)
      if (i == 5) break
    },
    interrupt = function(e) {a}
  )
  a
}
test_it(1)
If you stop it by clicking the Stop Sign, it returns the last value a is equal to.
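The file-based idea mentioned in the question can also be sketched in a few lines. Everything here is hypothetical: the function name run_until_flag(), the stop-file location, and the placeholder work per iteration. A second R session (or a shell) would create the file to request a stop.

```r
# Hypothetical sketch: poll for a "stop file" once per iteration.
run_until_flag <- function(stop_file, max_iter = 10, pause = 0.1) {
  a <- 0
  for (i in seq_len(max_iter)) {
    if (file.exists(stop_file)) {
      message("Stop requested at iteration ", i)
      break              # leaves only the loop
    }
    a <- a + 1           # placeholder for the real work
    Sys.sleep(pause)
  }
  a                      # code after the loop still runs
}

stop_file <- file.path(tempdir(), "stop_now")  # assumed location
unlink(stop_file)                              # make sure no stale flag exists
# From another session: file.create(stop_file)
result <- run_until_flag(stop_file, max_iter = 3, pause = 0.01)
```

Unlike an interrupt, the flag is only noticed between iterations, so one slow iteration still has to finish first.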

stop a running mcparallel job prematurely

I have three tasks:
1. is disk I/O bound
2. is network I/O bound
3. is CPU bound on a remote machine
The result of task 3 will tell me whether the answer I want will come from task 1 or task 2. Since each task requires separate resources, I'd like to start all three tasks with mcparallel, then wait on the result from the third task and determine whether to terminate task 1 or task 2. However, I cannot determine how to prematurely cancel an mcparallel task from within R. Is it safe to just kill the PID of the forked process from a call to system()? If not, is there a better way to cancel the unneeded computation?
I don't think the parallel package supports an official way to kill a process started via mcparallel, but my guess is that it's safe to do, and you can use the pskill function from the tools package to do it. Here's an example:
library(parallel)
library(tools)
fun1 <- function() {Sys.sleep(20); 1}
fun2 <- function() {Sys.sleep(20); 2}
fun3 <- function() {Sys.sleep(5); sample(2, 1)}
f1 <- mcparallel(fun1())
f2 <- mcparallel(fun2())
f3 <- mcparallel(fun3())
r <- mccollect(f3)
if (r[[1]] == 1) {
  cat('killing fun1...\n')
  pskill(f1$pid)
  print(mccollect(f1))
  r <- mccollect(f2)
} else {
  cat('killing fun2...\n')
  pskill(f2$pid)
  print(mccollect(f2))
  r <- mccollect(f1)
}
print(r)
It's usually dangerous to randomly kill threads within a multi-threaded application because they might be holding a shared lock of some kind, but these of course are processes, and the master process seems to handle the situation just fine.
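Current versions of parallel::mccollect() support a wait argument.
Pass FALSE to collect whatever results are already available instead of blocking on jobs that are still running. Note that, according to the documentation, this only skips the waiting; it does not terminate the running jobs.
> mccollect(wait = FALSE)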
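As a rough sketch of how a non-blocking mccollect() call might be combined with pskill(), assuming a Unix-alike system (mcparallel() relies on forking and is not available on Windows). The job body and the one-second timeout are arbitrary choices for illustration.

```r
library(parallel)

# A deliberately slow job that we do not intend to wait for.
job <- mcparallel({ Sys.sleep(30); "finished" })

# Poll for up to 1 second; NULL means no result is available yet.
res <- mccollect(job, wait = FALSE, timeout = 1)

if (is.null(res)) {
  tools::pskill(job$pid)     # sends SIGTERM by default
  invisible(mccollect(job))  # collect the terminated child, as in the example above
}
```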

Automatically always time functions in R

I hacked together an Emacs function in order to send
tOne <- proc.time()[3]
before my "send-to-R" key, followed by
tTwo <- proc.time()[3]
afterwards, then printing the difference. The printing gets quite messy though.
Is there a better way in R to automatically time everything sent to R? (Something like #time "on" in F#.)
EDIT: Currently, it sends some extra newlines since the inferior buffer needs the strings to be sent:
> > a<-ocorrelate.parallel(replicate(2, rnorm(100000)), 0.5)
>
+ user 0.072 sys 0.020 elapsed 14.925
> > a<-ocorrelate.parallel(replicate(2, rnorm(100000)), 0.5)
>
+ user 0.088 sys 0.032 elapsed 16.868
> >
Function:
(defun ess-timed-cc (vis)
(interactive "P")
(process-send-string "R" "tone <- proc.time()[1:3];")
(ess-eval-region-or-function-or-paragraph-and-step vis)
(process-send-string "R" "ttwo <- proc.time()[1:3]; cat(paste(c(\"\",
format(ttwo-tone)), c(\"user\", \"sys\", \"elapsed\", \"\n\")));")
(other-window 1)
(inferior-ess-send-input)
(inferior-ess-send-input)
(goto-char (point-max))
(other-window -1)
)
You can turn on profiling in R and it will tell you the relative amount of time spent in each function; that may be what you want. See ?Rprof for details.
You could also use addTaskCallback to add a callback to show you the time since the last expression finished, though this time would include any idle time and the time to type the expression, not just the run time. If you have all the commands already in a file and just send them to the command line then this should work reasonably well.
There may also be some hooks that you could set that would start and stop the timing, but not all functions have hooks.
For the Emacs solution, you could wrap the call in system.time() instead of calling proc.time() twice and subtracting.
You could also use the trace function to insert the 2 calls to proc.time at the beginning and end of each function that you wanted to time. This would require a vector of the names of the functions that you wanted to time, but ls could help with that.
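The addTaskCallback() idea could look roughly like the sketch below. make_timer() is a hypothetical helper name; the callback signature (expr, value, ok, visible) is the one addTaskCallback() expects, and returning TRUE keeps the callback registered. As noted above, the reported time includes idle time between commands.

```r
# Hypothetical helper: builds a callback that remembers when it last ran.
make_timer <- function() {
  last <- proc.time()[["elapsed"]]
  function(expr, value, ok, visible) {
    now <- proc.time()[["elapsed"]]
    message(sprintf("%.3f s since the previous top-level task", now - last))
    last <<- now
    TRUE  # TRUE keeps the callback installed for the next task
  }
}

timer_id <- addTaskCallback(make_timer(), name = "timer")
# ... work at the prompt; a message is printed after each top-level command ...
# removeTaskCallback(timer_id)  # switch the timing off again
```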

R Script - How to Continue Code Execution on Error

I have written an R script which includes a loop that retrieves external (web) data. The format of the data are most of the time the same, however sometimes the format changes in an unpredictable way and my loop is crashing (stops running).
Is there a way to continue code execution regardless of the error? I am looking for something similar to "On Error Resume Next" from VBA.
Thank you in advance.
Use try or tryCatch.
for (i in something)
{
  res <- try(expression_to_get_data)
  if (inherits(res, "try-error"))
  {
    # error handling code, maybe just skip this iteration using
    next
  }
  # rest of iteration for case of no error
}
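The same loop can be sketched with tryCatch(), which the answer mentions but does not show. fetch() is a hypothetical stand-in for the code that retrieves the data; here it fails on purpose for one input.

```r
# Hypothetical data getter: fails for i == 2 to simulate a format change.
fetch <- function(i) {
  if (i == 2) stop("unexpected format")
  i * 10
}

results <- list()
for (i in 1:3) {
  res <- tryCatch(
    fetch(i),
    error = function(e) {
      message("Skipping ", i, ": ", conditionMessage(e))
      NULL  # sentinel value meaning "this iteration failed"
    }
  )
  if (is.null(res)) next  # move on to the next iteration
  results[[length(results) + 1]] <- res
}
```

Compared with try(), the handler receives the condition object directly, so there is no need to test for the "try-error" class afterwards.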
The modern way to do this uses purrr::possibly.
First, write a function that gets your data, get_data().
Then modify the function to return a default value in the case of an error.
get_data2 <- purrr::possibly(get_data, otherwise = NA)
Now call the modified function in the loop.
for(i in something) {
res <- get_data2(i)
}
You can use try:
# a has not been defined
for(i in 1:3)
{
if(i==2) try(print(a),silent=TRUE)
else print(i)
}
How about the solutions to this related question:
Is there a way to `source()` and continue after an error?
Either use parse(file = "script.R") followed by a looped try(eval()) on each expression in the result,
or use the evaluate package.
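A minimal sketch of the parse() plus try(eval()) approach follows. The script contents and the use of a temporary file and a separate environment are assumptions made for illustration.

```r
# Write a small throwaway script whose second expression fails.
script <- tempfile(fileext = ".R")
writeLines(c("x <- 1", "stop('boom')", "y <- x + 1"), script)

env <- new.env()
for (e in parse(file = script)) {
  # Evaluate one top-level expression at a time; errors are swallowed by try().
  try(eval(e, envir = env), silent = TRUE)
}

ls(env)  # both x and y exist even though the middle expression failed
```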
If all you need to do is a small piece of clean up, then on.exit() may be the simplest option. It will execute the expression "when the current function exits (either naturally or as the result of an error)" (documentation here).
For example, the following will delete my_large_dataframe regardless of whether output_to_save gets created.
on.exit(rm("my_large_dataframe"))
my_large_dataframe = function_that_does_not_error()
output_to_save = function_that_does_error(my_large_dataframe)
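Since on.exit() acts when the enclosing function exits, a self-contained sketch needs a function around it. The names risky() and cleanup_log below are hypothetical and only serve to show that the clean-up runs even when the function fails.

```r
cleanup_log <- character()

# Hypothetical function that registers a clean-up step and then fails.
risky <- function() {
  on.exit(cleanup_log <<- c(cleanup_log, "cleaned up"), add = TRUE)
  stop("failure")
}

try(risky(), silent = TRUE)  # the error is caught, the clean-up has still run
print(cleanup_log)
```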
