Automatically always time functions in R

I hacked together an Emacs function in order to send
tOne <- proc.time()[3]
before my "send-to-R" key, followed by
tTwo <- proc.time()[3]
afterwards, then printing the difference. The printing gets quite messy though.
Is there a better way in R to automatically time everything sent to R (like #time "on" in F#)?
EDIT: Currently, it sends some extra newlines since the inferior buffer needs the strings to be sent:
> > a<-ocorrelate.parallel(replicate(2, rnorm(100000)), 0.5)
>
+ user 0.072 sys 0.020 elapsed 14.925
> > a<-ocorrelate.parallel(replicate(2, rnorm(100000)), 0.5)
>
+ user 0.088 sys 0.032 elapsed 16.868
> >
Function:
(defun ess-timed-cc (vis)
(interactive "P")
(process-send-string "R" "tone <- proc.time()[1:3];")
(ess-eval-region-or-function-or-paragraph-and-step vis)
(process-send-string "R" "ttwo <- proc.time()[1:3]; cat(paste(c(\"\",
format(ttwo-tone)), c(\"user\", \"sys\", \"elapsed\", \"\n\")));")
(other-window 1)
(inferior-ess-send-input)
(inferior-ess-send-input)
(goto-char (point-max))
(other-window -1)
)

You can turn on profiling in R, and it will tell you the relative amount of time spent in each function; that may be what you want. See ?Rprof for details.
You could also use addTaskCallback to add a callback to show you the time since the last expression finished, though this time would include any idle time and the time to type the expression, not just the run time. If you have all the commands already in a file and just send them to the command line then this should work reasonably well.
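A minimal sketch of the addTaskCallback idea, assuming it is acceptable to report wall-clock time between top-level tasks. The names timing_env and time_since_last are made up for illustration; the callback signature (expr, value, ok, visible) is the one R passes to task callbacks.

```r
# timing_env and time_since_last are hypothetical names used for illustration
timing_env <- new.env()
timing_env$last <- proc.time()[["elapsed"]]

time_since_last <- function(expr, value, ok, visible) {
  now <- proc.time()[["elapsed"]]
  # This includes idle and typing time, not just the run time of the task
  cat(sprintf("elapsed since last task: %.3f s\n", now - timing_env$last))
  timing_env$last <- now
  TRUE  # returning TRUE keeps the callback registered
}

# Register the callback; it runs after each top-level expression completes
id <- addTaskCallback(time_since_last, name = "timer")
```

Calling removeTaskCallback("timer") switches the reporting off again.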
There may also be some hooks that you could set that would start and stop the timing, but not all functions have hooks.
For the Emacs solution, you could wrap the call in system.time instead of calling proc.time twice and subtracting.
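As a rough illustration of the system.time approach, the sketch below times a single call and keeps its result. The function slow_square is a hypothetical stand-in for whatever code is being sent to R.

```r
# slow_square is a hypothetical stand-in for the code being timed
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# The assignment inside system.time() still happens in the calling environment
timing <- system.time(result <- slow_square(4))

print(timing)                      # user, system and elapsed seconds
elapsed <- timing[["elapsed"]]     # just the wall-clock time
```

This avoids the two separate proc.time calls and the manual subtraction.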
You could also use the trace function to insert the 2 calls to proc.time at the beginning and end of each function that you wanted to time. This would require a vector of the names of the functions that you wanted to time, but ls could help with that.
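One way the trace idea might look, assuming the functions to time live in the global environment. The names timings and work are hypothetical, and this simple version is not safe for nested or recursive calls because it keeps only one start time.

```r
# timings and work are hypothetical names used for illustration
timings <- new.env()

work <- function(n) {
  Sys.sleep(0.1)
  sum(seq_len(n))
}

# Insert one expression at entry and one at exit of each named function;
# the first argument could be a character vector of several names, e.g. from ls()
trace("work",
      tracer = quote(timings$start <- proc.time()[["elapsed"]]),
      exit   = quote(timings$elapsed <- proc.time()[["elapsed"]] - timings$start),
      print  = FALSE)

total <- work(10)
cat("work() took", timings$elapsed, "seconds\n")

untrace("work")  # restore the original function
```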

Related

Debugging within a namespace

I frequently need to debug a function in one of my R packages, and find it convenient to add test code and print statements throughout the code. When the function is a method inside a package, running tests from the console will use the old version of the function stored in the package and not the new test version. I often resort to something like cat *.r > /tmp/package.r and then source('/tmp/package.r') to override all functions, which allows the test function to take priority. But this doesn't work when I have .Fortran or similar calls within the package.
Is there an elegant way to debug with function overrides within the correct version of a local package?
Regardless of your IDE, you can reload your package under development with devtools:
devtools::load_all("path/to/your/package/directory")
This should load it into your R session (RStudio has buttons and keyboard shortcuts for this too).
This is an extension of my comment above. As said in the comment, check out this guide for more detail.
To inspect a function dynamically while it runs, you can edit it and add code with trace(func, edit = TRUE). However, this is not the recommended approach in any programming language; using a debugger is recommended instead. Luckily, debugging in R is simpler than in many other languages. The most notable debugging tools in R are
debug(func) or debugonce(func)
trace
browser
I'll ignore trace's main usage, and only pass over it briefly in conjunction with browser.
Take this small example code
x <- 3
f <- function(x){
x <- x ** 2
x
}
Now if we wanted to "go into" this function and inspect what happens we can use the debug method simply by doing
debug(f) # alt: debugonce(f)
f(x)
and the following shows in the console
debugging in: f(x)
debug at #1: {
x <- x^2
x
}
We see two things: the original call on line 1, and the function being debugged, including a function line number (#1) and the body of the function (potentially truncated). In addition, the prompt has changed to Browse[n] (where n is a number), indicating we are in the debugger.
At this point nothing has run, so in the "Environment" tab in RStudio we can see x | x. Note that x is still a promise (its value is delayed until used or changed). If we execute x we get the value [1] 3 and we see the environment change to x | 3 (it has been forced, and is no longer a promise).
Inside the debugger we can use the following commands to "walk through" the function
n to move to the "next line"
s to move to the "next line", or if the current call is a function, "step" into the function (debug this function)
f to move forward until the next break (with the added utility that, if you are in a loop, you stop at "loop end").
c to move until next break point or end of function (without breaking at end of loops).
Q to exit immediately
If you enter n, for example, you will see
debug at #2: x <- x^2
printed in the console. This indicates the line that is executed next. Note the value of x in the environment and enter n again; the value changes from x | 3 to x | 9 and
debug at #3: x
is printed. Since this is the last line, pressing n again will exit the function and print
exiting from: f(x)
[1] 9
Once you're done debugging you can run undebug(f) to remove the breakpoint and stop the debugger from activating.
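To double-check whether a function is still flagged for debugging, isdebugged can be used. The small sketch below reuses the f defined earlier and never actually calls f while the flag is set, so it does not open the debugger.

```r
f <- function(x) {
  x <- x^2
  x
}

debug(f)
flag_after_debug <- isdebugged(f)     # TRUE: the next call to f would open the debugger

undebug(f)
flag_after_undebug <- isdebugged(f)   # FALSE: f runs normally again

f(3)
```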
This is a simple function that is easy to debug, but the idea for more complex functions is the same. If you are in a long loop, you can use f to skip to the end of the loop, similar to pressing n a bunch of times. Note that if you hit an error at any point, the debugger will exit automatically when the error occurs, and you'll have to walk back to that point again or alternatively use browser.
In the case where you have a function like
f2 <- function(x){
x <- x + 2
f(x)
}
you can step into the nested function call f(x) by using the s command when the line
debug at #3: f(x)
or by using debug(f2) and debug(f) in conjunction. Both will give the same result (try it out!).
Now in many cases you might hit a bug or debug many lines (potentially thousands). In this case, you might have some "idea" where you want to start, and this might not be the start of the function. In this case you can use browser(). This basically sets a breakpoint. Whenever browser() is hit, it will stop and start the debugger (similar to debug(f) and calling f(x) but at a specific point). Try for example
f3 <- function(x){
x1 <- f(x)
browser()
x2 <- f2(x)
c(x1, x2)
}
f3(x)
and you'll see
Called from: f3(x)
printed (if you have run undebug(f2) and undebug(f) first).
Let's say it is not your function but a function within a namespace; then we can even add the breakpoint ourselves at run time. Try for example calling
trace(f3, edit = TRUE)
and you will see an editing window pop up. Simply add browser() at the desired spot and click save. This edits the function within the namespace. It will be reverted once R is closed, or you can remove it with untrace(f3) or with another call to trace(f3, edit = TRUE).
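As an alternative to the editing window, trace can also insert the breakpoint at a given step of the body. This is only a sketch: it redefines f, f2 and a version of f3 without the hard-coded browser() call, and it assumes these functions live in the global environment.

```r
f  <- function(x) { x <- x^2; x }
f2 <- function(x) { x <- x + 2; f(x) }
f3 <- function(x) {
  x1 <- f(x)
  x2 <- f2(x)
  c(x1, x2)
}

# as.list(body(f3)) numbers the steps: 1 is `{`, 2 is x1 <- f(x), 3 is x2 <- f2(x)
trace("f3", tracer = browser, at = 3)          # pause just before x2 is computed
traced <- inherits(f3, "functionWithTrace")    # TRUE while the trace is in place

# Calling f3(3) at this point would stop at a Browse prompt before step 3

untrace("f3")                                  # restore the original definition
f3(3)
```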

Manually interrupt a loop in R and continue below

I have a loop in R that does very time-consuming calculations. I can set a max-iterations variable to make sure it doesn't continue forever (e.g. if it is not converging), and gracefully return meaningful output.
But sometimes the iterations could be stopped way before max-iterations is reached. Anyone who has an idea about how to give the user the opportunity to interrupt a loop - without having to wait for user input after each iteration? Preferably something that works in RStudio on all platforms.
I cannot find a function that listens for keystrokes or similar user input without halting until something is done by the user. Another solution would be to listen for a global variable change. But I don't see how I could change such a variable value when a script is running.
The best idea I can come up with is to make another script that creates a file that the first script checks for the existence of, and then breaks if it is there. But that is indeed an ugly hack.
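The file-based idea could be sketched roughly as below. The function name run_until_stopped and the stop_file argument are hypothetical; the stop file would be created from another R session or a shell while the loop runs.

```r
# run_until_stopped and stop_file are hypothetical names used for illustration
run_until_stopped <- function(max_iter, stop_file, pause = 0) {
  a <- 0
  for (i in seq_len(max_iter)) {
    # Check once per iteration whether a stop has been requested
    if (file.exists(stop_file)) {
      message("Stop file found, leaving the loop at iteration ", i)
      break
    }
    a <- a + 1
    Sys.sleep(pause)
  }
  a  # code after the loop still runs and returns the partial result
}

stop_file <- tempfile()
run_until_stopped(3, stop_file)   # no stop file yet, so all 3 iterations run
```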
Inspired by Edo's reply, here is an example of what I want to do:
test.it<-function(t) {
a <- 0
for(i in 1:10){
a <- a + 1
Sys.sleep(t)
}
print(a)
}
test.it(1)
As you see, when I interrupt by hitting the red button in RStudio, I break out of the whole function, not just the loop.
Also inspired by Edo's response I discovered the withRestarts function, but I don't think it catches interrupts.
I tried to create a loop as you described it.
a <- 0
for(i in 1:10){
a <- a + 1
Sys.sleep(1)
if(i == 5) break
}
print(a)
If you let it go till the end, a will be equal to 5, because of the break.
If you stop it manually by clicking on the STOP SIGN on the Rstudio Console, you get a lower number.
So it actually works as you would like.
If you want a better answer, you should post a reproducible example of your code.
EDIT
Based on the edit you posted... Try with this.
It's a tryCatch solution that returns the last available value of a:
test_it <- function(t) {
a <- 0
tryCatch(
for(i in 1:10){
a <- a + 1
message("I'm at ", i)
Sys.sleep(t)
if(i==5) break
},
interrupt = function(e){a}
)
a
}
test_it(1)
If you stop it by clicking the Stop Sign, it returns the last value a is equal to.

Good practice on how to store the result of a function for later use in R

I have the situation where I have written an R function, ComplexResult, that computes a computationally expensive result that two other separate functions will later use, LaterFuncA and LaterFuncB.
I want to store the result of ComplexResult somewhere so that both LaterFuncA and LaterFuncB can use it, and it does not need to be recalculated. The result of ComplexResult is a large matrix that only needs to be calculated once, then re-used later on.
R is my first foray into the world of functional programming, so I'm interested to understand what is considered good practice. My first line of thinking is as follows:
# run ComplexResult and get the result
cmplx.res <- ComplexResult(arg1, arg2)
# store the result in the global environment.
# NB this would not be run from a function
assign("CachedComplexResult", cmplx.res, envir = .GlobalEnv)
Is this at all the right thing to do? The only other approach I can think of is having a large "wrapper" function, e.g.:
MyWrapperFunction <- function(arg1, arg2) {
cmplx.res <- ComplexResult(arg1, arg2)
res.a <- LaterFuncA(cmplx.res)
res.b <- LaterFuncB(cmplx.res)
# do more stuff here ...
}
Thoughts? Am I heading at all in the right direction with either of the above? Or is there an Option C which is more cunning? :)
The general answer is you should Serialize/deSerialize your big object for further use. The R way to do this is using saveRDS/readRDS:
## save a single object to file
saveRDS(cmplx.res, "cmplx.res.rds")
## restore it under a different name
cmplx2.res <- readRDS("cmplx.res.rds")
This assigns to the global environment:
CachedComplexResult <- ComplexResult(arg1, arg2)
To store I would use:
write.table(CachedComplexResult, file = "complex_res.txt")
And then to use it directly:
LaterFuncA(read.table("complex_res.txt"))
Your approach works for saving to local memory; other answers have explained saving to global memory or a file. Here are some thoughts on why you would do one or the other.
Save to file: this is slowest, so only do it if your process is volatile and you expect it to crash hard and you need to pick up the pieces where it left off, OR if you just need to save the state once in a while where speed/performance is not a concern.
Save to global: if you need access from multiple spots in a large R program.
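Combining the two ideas, one possible pattern is to compute the result only when no saved copy exists. This is a sketch under assumptions: cached_result, compute and cache_file are hypothetical names, and expensive stands in for ComplexResult.

```r
# cached_result, compute and cache_file are hypothetical names used for illustration
cached_result <- function(compute, cache_file) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))   # reuse the stored object
  }
  result <- compute()             # expensive step, runs only when no file exists
  saveRDS(result, cache_file)
  result
}

cache_file <- tempfile(fileext = ".rds")
calls <- 0
expensive <- function() {         # stand-in for ComplexResult(arg1, arg2)
  calls <<- calls + 1
  matrix(1:6, nrow = 2)
}

first  <- cached_result(expensive, cache_file)   # computes and saves
second <- cached_result(expensive, cache_file)   # reads the saved file
```

Both LaterFuncA and LaterFuncB could then be given the returned object without triggering a second computation.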

Detecting keystrokes in Julia

I have a piece of code in Julia in which a solver iterates many, many times as it seeks a solution to a very complex problem. At present, I have to provide a number of iterations for the code to do, set low enough that I don't have to wait hours for the code to halt in order to save the current state, but high enough that I don't have to keep activating the code every 5 minutes.
Is there a way, with the current state of Julia (0.2), to detect a keystroke instructing the code to either end without saving (in case of problems) or end with saving? I require a method such that the code will continue unimpeded unless such a keystroke event has happened, and that will interrupt on any iteration.
Essentially, I'm looking for a command that will read in a keystroke if a keystroke has occurred (while the terminal that Julia is running in has focus), and run certain code if the keystroke was a specific key. Is this possible?
Note: I'm running julia via xfce4-terminal on Xubuntu, in case that affects the required command.
You can use an asynchronous task to read from STDIN, blocking until something is available to read. In your main computation task, when you are ready to check for input, you can call yield() to lend a few cycles to the read task, and check a global to see if anything was read. For example:
input = ""
@async while true
global input = readavailable(STDIN)
end
for i = 1:10^6 # some long-running computation
if isempty(input)
yield()
else
println("GOT INPUT: ", input)
global input = ""
end
# do some other work here
end
Note that, since this is cooperative multithreading, there are no race conditions.
You may be able to achieve this by sending an interrupt (Ctrl+C). This should work from the REPL without any changes to your code – if you want to implement saving you'll have to handle the resulting InterruptException and prompt the user.
I had some trouble with the answer from steven-g-johnson, and ended up using a Channel to communicate between tasks:
function kbtest()
# allow 'q' pressed on the keyboard to break the loop
quitChannel = Channel(10)
@async while true
kb_input = readline(stdin)
if contains(lowercase(kb_input), "q")
put!(quitChannel, 1)
break
end
end
start_time = time()
while (time() - start_time) < 10
if isready(quitChannel)
break
end
println("in loop @ $(time() - start_time)")
sleep(1)
end
println("out of loop @ $(time() - start_time)")
end
This requires pressing q and then Enter, which works well for my needs.

Understanding the behaviour of system.time()

I think I must be misunderstanding something about R's system.time() function. If I have the following code in a test.r:
for(i in 1:10)
{
print(system.time(testFunction()))
}
(where testFunction() is defined elsewhere, but contains some fairly computationally-intensive code), and run the code, but kill the job after the 1st loop, then receive the following output:
> source("test.r")
user system elapsed
280.388 2.622 288.155
Timing stopped at: 210.891 0.367 211.637
Why is the value for 'Timing stopped' less than the elapsed time for the function?
The timing restarts during the second loop, and since you killed it part way through, it will be less than what you timed for the full first loop.
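A small sketch of this behaviour: each system.time call starts its own clock, so the reported times belong to individual iterations and do not accumulate. Here busy_wait is a hypothetical stand-in for testFunction().

```r
# busy_wait is a hypothetical stand-in for testFunction()
busy_wait <- function(seconds) Sys.sleep(seconds)

elapsed <- numeric(3)
for (i in 1:3) {
  # A fresh timer starts for every call to system.time()
  t_i <- system.time(busy_wait(0.1 * i))
  elapsed[i] <- t_i[["elapsed"]]
  cat("iteration", i, "elapsed:", elapsed[i], "\n")
}
```

If a call is interrupted part way through, system.time reports the time used so far for that call only, which is the "Timing stopped at" line in the question.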
