How to be alerted about the ongoing progress of a loop/lapply - r

In R, I will sometimes have a long for loop or lapply that I want to know the ongoing progress of.
Something like the following is in the spirit of what I want but doesn't work:
lapply(1:n,function(i) { print(i); MAIN COMPUTATIONS })
Ideally the above would print i at the beginning of each new iteration of the lapply.
QUESTION: How do I get ongoing progress updates of how many iterations my lapply or for loop has done?

It sounds like you're using RGui on Windows. There should be an option in one of the menus to tell it to not buffer the output. Alternatively you could call flush.console after every time you print.
lapply(1:1000, function(i){print(i); flush.console()})
Note that this will slow down the code a little bit.
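If flushing after every print is too costly, a common compromise is to report only every N-th iteration. A minimal sketch, with the interval of 100 chosen arbitrarily:
lapply(1:1000, function(i) {
  if (i %% 100 == 0) { print(i); flush.console() }  # report every 100 iterations
  # MAIN COMPUTATIONS
})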

A solution using plyr
l_ply(1:10,function(x) x+1,.progress='text')
or you can define your progress using progress_text
l_ply(1:10000,function(x) x+1,.progress= progress_text(char = '*'))
|*********************************************************************| 100%
or with option print , to get the result of each iteration
l_ply(1:4,function(x) x+1,.progress= progress_text(char = '+'),.print=TRUE)
| | 0%[1] 2
|++++++ | 25%[1] 3
|+++++++++++++++ | 50%[1] 4
|++++++++++++++++++++++ | 75%[1] 5
|++++++++++++++++++++++++++++++++ | 100%[1]

You might also want to look at the functions like winProgressBar, tkProgressBar, or txtProgressBar. The windows and tk versions are nice in that they can show you your progress, but don't clutter your output.
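For example, a minimal sketch using base R's txtProgressBar (the lapply body here is just a placeholder for your own computations):
n <- 1000
pb <- txtProgressBar(min = 0, max = n, style = 3)
result <- lapply(1:n, function(i) {
  setTxtProgressBar(pb, i)  # advance the bar each iteration
  # MAIN COMPUTATIONS
})
close(pb)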

Related

Applying a System Call for ImageJ over a List in R

I am working with a large number of image files within several subdirectories of one parent folder.
I am attempting to run an ImageJ macro to batch-process the images (specifically, I am trying to stitch together a series of images taken on the microscope into single images). Unfortunately, I don't think I can run this as a plain ImageJ macro, because the images were taken with varying grid sizes, i.e. some are 2x3, some are 3x3, some are 3x2, etc.
I've written an R script that is able to evaluate the image folders and determine the grid size, now I am trying to feed that information to my ImageJ macro to batch process the folder.
The issue I am running into seems like it should be easy to solve, but I haven't had any luck figuring it out: in R, I have a data.frame that I need to pass to the system command line-by-line with the columns concatenated into a single character string delimited by *'s.
Here's an example from the data.frame I have in R:
                       X xcoord ycoord input
1 4_10249_XY01_Fused_CH2      2      3 /XY01
2 4_10249_XY02_Fused_CH2      2      2 /XY02
3 4_10249_XY03_Fused_CH2      3      3 /XY03
4 4_10249_XY04_Fused_CH2      2      2 /XY04
5 4_10249_XY05_Fused_CH2      2      2 /XY05
6 4_10249_XY06_Fused_CH2      2      3 /XY06
Here's what each row needs to be transformed into so that ImageJ can understand it:
4_10249_XY01_Fused_CH2*2*3*/XY01
4_10249_XY02_Fused_CH2*2*2*/XY02
4_10249_XY03_Fused_CH2*3*3*/XY03
4_10249_XY04_Fused_CH2*2*2*/XY04
4_10249_XY05_Fused_CH2*2*2*/XY05
4_10249_XY06_Fused_CH2*2*3*/XY06
I tried achieving this with a for loop inside of a function that I thought would pass each row into the system command, but the macro only runs for the first line, none of the others.
macro <- function(i) {
  for (row in 1:nrow(i)) {
    df <- paste(i$X, i$xcoord, i$ycoord, i$input, sep = '*')
  }
  system2('/Applications/Fiji.app/Contents/MacOS/ImageJ-macosx', args = c('-batch "/Users/All Stitched CH2.ijm"', df))
}
macro(table)
I think this is because the for loop is not maintaining the list-form of the data.frame. How do I concatenate the table by row and maintain the list-structure? I don't know if I'm asking the right question, but hopefully I'm close enough that someone here understands what I'm trying to do.
I appreciate any help or tips you can provide!
Turns out taking a break helps a lot!
I came back to this after lunch and came up with an easy solution (duh!)- I thought I would post it in case anyone comes along later with a similar issue.
I used stringr to combine the columns of my data table into strings, then put them back into list form using as.list. Finally, to feed the list into my macro, I edited the macro to contain only the system command and then used lapply to apply the macro to my list of inputs. Here is what my code looks like in the end:
library(stringr)
tablecombined <- str_c(table$X, table$xcoord, table$ycoord, table$input, sep = "*")
listylist <- as.list(tablecombined)
macro <- function(i) {
  system2('/Applications/Fiji.app/Contents/MacOS/ImageJ-macosx', args = c('-batch "/Users/All Stitched CH2.ijm"', i))
}
runme <- lapply(listylist, macro)
Note: I am using the system2 command because it can take arguments, which is necessary for me to be able to feed it a series of images to iterate over. I started with the solution posted here: How can I call/execute an imageJ macro with R?
but needed additional flexibility for my specific situation. Hopefully someone may find this useful in the future when running ImageJ Macros from R!
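As an aside, the str_c() step has a base-R equivalent if you would rather not load stringr; a small sketch using the same table object as above:
# paste() is vectorised over the columns, producing one *-delimited string per row
tablecombined <- paste(table$X, table$xcoord, table$ycoord, table$input, sep = "*")
listylist <- as.list(tablecombined)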

Non-blocking cell execution in Jupyter

In Jupyter with an ipython kernel, is there a canonical way to execute cells in a non-blocking fashion?
Ideally I'd like to be able to run a cell
%%background
time.sleep(10)
print("hello")
such that I can start editing and running the next cells and in 10 seconds see "hello" appear in the output of the original cell.
I have tried two approaches, but haven't been happy with either.
(1) Create a thread by hand:
def foo():
    time.sleep(10)
    print("hello")

threading.Thread(target=foo).start()
The problem with this is that "hello" is printed in whatever cell is active in 10 seconds, not necessarily in the cell where the thread was started.
(2) Use a ipywidget.Output widget.
def foo(out):
    time.sleep(10)
    out.append_stdout("hello")

out = ipywidgets.Output()
display(out)
threading.Thread(target=foo, args=(out,)).start()
This works, but there are problems when I want to update the output (think of monitoring something like memory consumption):
def foo(out):
    while True:
        time.sleep(1)
        out.clear_output()
        out.append_stdout(str(datetime.datetime.now()))

out = ipywidgets.Output()
display(out)
threading.Thread(target=foo, args=(out,)).start()
The output now constantly switches between 0 and 1 lines in size, which results in flickering of the entire notebook.
This should be solvable with wait=True in the call to clear_output. Alas, for me that results in the output never showing anything.
I could have asked about that issue, which seems to be a bug, specifically, but I wondered whether there is maybe another solution that doesn't require me doing all of this by hand.
I've experienced some issues like this with plotting to an output widget. It looks like you have followed the examples in the ipywidgets documentation on asynchronous output widgets.
The other approach I have found sometimes helpful (particularly if you know the size of the desired output) is to fix the height of your output widget when you create it.
out = ipywidgets.Output(layout=ipywidgets.Layout(height='25px'))

Is there a way to let the console in RStudio produce time stamps? [duplicate]

I wonder if there is a way to display the current time in the R command line, like in MS DOS, we can use
Prompt $T $P$G
to include the time clock in every prompt line.
Something like
options(prompt=paste(format(Sys.time(), "%H:%M:%S"),"> "))
will do it, but then it is fixed at the time it was set. I'm not sure how to make it update automatically.
Chase points the right way, as options("prompt"=...) can be used for this. But his solution adds a constant time expression, which is not what we want.
The documentation for the function taskCallbackManager has the rest:
R> h <- taskCallbackManager()
R> h$add(function(expr, value, ok, visible) {
+ options("prompt"=format(Sys.time(), "%H:%M:%S> "));
+ return(TRUE) },
+ name = "simpleHandler")
[1] "simpleHandler"
07:25:42> a <- 2
07:25:48>
We register a callback that gets evaluated after each command completes. That does the trick. More fancy documentation is in this document from the R developer site.
None of the other methods, which are based on callbacks, will update the prompt unless a top-level command is executed. So, pressing return in the console will not create a change. Such is the nature of R's standard callback handling.
If you install the tcltk2 package, you can set up a task scheduler that changes the option() as follows:
library(tcltk2)
tclTaskSchedule(1000, {options(prompt=paste(Sys.time(),"> "))}, id = "ticktock", redo = TRUE)
Voila, something like the MS DOS prompt.
NB: Inspiration came from this answer.
Note 1: The wait time (1000 in this case) is in milliseconds, not seconds. You might adjust it downward if sub-second resolution is useful.
Here is an alternative callback solution:
updatePrompt <- function(...) {options(prompt=paste(Sys.time(),"> ")); return(TRUE)}
addTaskCallback(updatePrompt)
This works the same as Dirk's method, but the syntax is a bit simpler to me.
You can change the default character that is displayed through the options() command. You may want to try something like this:
options(prompt = paste(Sys.time(), ">"))
Check out the help page for ?options for a full list of things you can set. It is a very useful thing to know about!
Assuming this is something you want to do for every R session, consider moving that to your .Rprofile. Several other good nuggets of programming happiness can be found hither on that topic.
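For instance, a sketch of what such a .Rprofile entry could look like, combining this suggestion with the callback approach from the earlier answers (the callback name "promptClock" is just an illustrative label):
local({
  updatePrompt <- function(...) {
    options(prompt = paste(format(Sys.time(), "%H:%M:%S"), "> "))
    TRUE
  }
  updatePrompt()                                                  # set the prompt once at startup
  invisible(addTaskCallback(updatePrompt, name = "promptClock"))  # refresh it after each command
})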
I don't know of a native R function for doing this, but I know R has interfaces with other languages that do have system time commands. Maybe this is an option?
Thierry mentioned system.time() and there is also proc.time(), depending on what you need it for, although neither of these gives you the current time.

Printing repetitively on the same line in R

I was just wondering what is the best way in R to keep on printing on the same line in a loop, to avoid swamping your console? Let's say to print a value indicating your progress, as in
for (i in 1:10) {print(i)}
Edit:
I tried inserting carriage returns before each value as in
for (i in 1:10000) {cat("\r",i)}
but that doesn't quite work either, as it only updates the value on screen after the loop finishes, just showing 10000 in this case... Any thoughts?
NB this is not to make a progress bar, as I know there are various features for that, but just to be able to print some info during the progression of some loop without swamping the console
You have the answer, it's just looping too quickly for you to see. Try:
for (i in 1:10) {Sys.sleep(1); cat("\r",i)}
EDIT: Actually, this is very close to #Simon O'Hanlon's answer, but given the confusion in the comments and the fact that it isn't exactly the same, I'll leave it here.
Try using cat()...
for (i in 1:10) {cat(paste(i," "))}
#1 2 3 4 5 6 7 8 9 10
cat() performs much less conversion than print() (from the horse's mouth).
To repeatedly print in the same place, you need to clear the console. I am not aware of another way to do this, but thanks to this great answer this works (in RStudio on Windows at least):
for (i in 1:1e3) {
  cat(i)
  Sys.sleep(0.01)
  cat("\014")
}
Well... are you worried about hangs, or just about being notified when the job completes?
In the first case, I'd stick w/ my j%%N suggestion, where N is large enough that you don't swamp the console.
In the second case, add a final line to your script or function which, e.g., calls "Beep" .
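A minimal sketch of both suggestions together; N, the loop bounds, and the use of alarm() from the utils package (attached by default) as the completion signal are illustrative choices, not part of the original answer:
N <- 1000                                    # reporting interval
for (j in 1:100000) {
  if (j %% N == 0) cat("iteration", j, "\n")
  # MAIN COMPUTATIONS
}
alarm()                                      # ring the terminal bell when the job is done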

R Code Taking Too Long To Run

I have the following code running and it's taking a long time to finish. How do I know whether it's still doing its job or has gotten stuck somewhere?
noise4 <- NULL
for (i in 1:length(noise3)) {
  if (is.na(noise3[i]) == TRUE) {
    next
  } else {
    noise4 <- c(noise4, noise3[i])
  }
}
noise3 is a vector with 2418233 data points.
You just want to remove the NA values. Do it like this:
noise4 <- noise3[!is.na(noise3)]
This will be pretty much instant.
Or as Joshua suggests, a more readable alternative:
noise4 <- na.omit(noise3)
Your code was slow because:
It uses explicit loops which tend to be slow under the R interpreter.
You reallocate memory every iteration.
The memory reallocation is probably the biggest handicap to your code.
I wanted to illustrate the benefits of pre-allocation, so I tried to run your code... but I killed it after ~5 minutes. I recommend you use noise4 <- na.omit(noise3) as I said in my comments. This code is solely for illustrative purposes.
# Create some random data
set.seed(21)
noise3 <- rnorm(2418233)
noise3[sample(2418233, 100)] <- NA
noise <- function(noise3) {
  # Pre-allocate
  noise4 <- vector("numeric", sum(!is.na(noise3)))
  for (i in seq_along(noise3)) {
    if (is.na(noise3[i])) {
      next
    } else {
      noise4[i] <- noise3[i]
    }
  }
}
system.time(noise(noise3)) # MUCH less than 5+ minutes
# user system elapsed
# 9.50 0.44 9.94
# Let's see what we gain from compiling
library(compiler)
cnoise <- cmpfun(noise)
system.time(cnoise(noise3)) # a decent reduction
# user system elapsed
# 3.46 0.49 3.96
The other answers have given you much, much better ways to do the task that you actually set out to achieve (removing NA values in your data), but an answer to the specific question you asked ("how do I know if R is actually working or if it has instead gotten stuck?") is to introduce some output (cat) statements in your loop, as follows:
rpt <- 10000  ## reporting interval
noise4 <- NULL
for (i in 1:length(noise3)) {
  if (i %% rpt == 0) cat(i, "\n")
  if (is.na(noise3[i]) == TRUE) {
    next
  } else {
    noise4 <- c(noise4, noise3[i])
  }
}
If you run this code you can immediately see that it slows down radically as it gets farther into the loop (a consequence of the failure to pre-allocate space) ...
The others have all given correct ways to do the same problem, so that you needn't worry about speed. #BenBolker also gave a good pointer regarding regular output.
A different thing to note is that if you find yourself stuck in a loop, you can interrupt it and inspect the current value of i. Assuming that restarting from that value of i won't harm things, i.e. using that value twice won't be a problem, you can restart from there. Or, you can just finish the job as the others have suggested.
A separate trick is that if the loop is slow (and can't be vectorized or else you're not eager to break out of the loop), AND you don't have any reporting, you can still look for an external method to see if R is actually consuming cycles on your computer. In Linux, the top command is your best bet. On Windows, the task manager will do the trick (I prefer to use the SysInternals / Microsoft program Process Explorer). 'top' also exists on Macs, though I believe there are some other more popular tools.
One other word of advice: if you have a really long loop to run, I strongly encourage saving the results regularly. I typically create a file with a name like myPrefix_YYYYMMDDHHMMSS.rdat. This way, even if everything goes to hell, you can still restart your loop where you left off.
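To illustrate, a sketch of that checkpointing pattern; expensive_step() and the 10000-iteration save interval are placeholders, not part of the original advice:
results <- vector("list", length(noise3))
for (i in seq_along(noise3)) {
  results[[i]] <- expensive_step(noise3[i])  # placeholder for the real computation
  if (i %% 10000 == 0) {                     # checkpoint every 10000 iterations
    fname <- sprintf("myPrefix_%s.rdat", format(Sys.time(), "%Y%m%d%H%M%S"))
    save(results, i, file = fname)
  }
}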
I don't always iterate, but when I do, I use these tricks. Stay speedy, my friend.
In one case I faced, updating all the packages in use under RStudio resolved the issue.
