Progress bar for single, long command (not a loop) in R

I can see some very nice progress bars in R, for example using the progress package or the cat() function.
The problem is that these only work for long tasks that are performed incrementally, such as loops or multiple commands.
Is there a way to have a progress bar that doesn't rely on a loop or sequence of operations, and can simply work for a single, long command?
Note
If I could somehow start the timed progress bar from the progress package simultaneously with the main operation, that would also solve the issue (but I'm not sure whether that's possible). Here is a timed progress bar for reference:
library(progress)
pb <- progress_bar$new(total = 100)
for (i in 1:100) {
  pb$tick()
  Sys.sleep(1 / 100)
}
Also note
An example of a time-consuming (single) command could be as simple as Sys.sleep(20). The actual use case I have is extracting a large JSON object from an API, which takes 10-30 seconds depending on connection speed etc.
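For what it's worth, one way to approximate this is to run the long command in a second R process and drive a time-based progress bar from the main session while waiting for it. Below is a minimal sketch of that idea using callr::r_bg(); the slow_task() body and the 20-second estimate (est_seconds) are assumptions for illustration, not part of the question.
library(callr)
library(progress)
# Stand-in for the real long-running command (e.g. fetching a large JSON object);
# the body here is purely illustrative
slow_task <- function() {
  Sys.sleep(20)
  "done"
}
est_seconds <- 20                        # rough guess at how long the task takes
proc <- r_bg(slow_task)                  # run the task in a background R process
pb <- progress_bar$new(total = 100, format = "[:bar] :percent :elapsed")
start_time <- Sys.time()
while (proc$is_alive()) {
  elapsed <- as.numeric(difftime(Sys.time(), start_time, units = "secs"))
  pb$update(min(elapsed / est_seconds, 0.99))  # cap below 100% until the task returns
  Sys.sleep(0.2)
}
pb$update(1)                             # task finished: complete the bar
result <- proc$get_result()              # collect the command's return value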

Related

R bootstrap continue execution

I would like to use bootstrapping using the boot library. Since calculating the statistic for each sample is a lengthy process, it is going to take several days for the entire bootstrapping calculation to conclude. Since the computer I am using disconnects every several hours, I would like to use some checkpoint mechanism so that I will not have to start from scratch every time. Currently, I am running:
results <- boot(data=data, statistic=my_slow_function, R=10000, parallel='snow', ncpus=4, cl=cl)
but I would rather run it with R=100 multiple times, so that I can save the intermediate results and retrieve them if the connection hangs up. How can I achieve that?
Thank you in advance
Maybe you can combine results for the bootstrap replicates:
# simulating R = 10000 by combining 100 batches of R = 100
results_list <- lapply(1:100, function(x) {
  boot(data = data, statistic = my_slow_function, R = 100, parallel = 'snow', ncpus = 4)$t
})
results_t <- unlist(results_list)
hist(results_t)
t0 <- mean(results_t)
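If the worry is losing everything when the connection drops, a checkpointed variant of the same idea could write each batch of 100 replicates to disk as it finishes and skip batches that already exist on a re-run. This is only a sketch reusing the objects from the question (data, my_slow_function, cl); the file naming and skip logic are my additions, not from the answer.
n_chunks <- 100                                   # 100 chunks of R = 100 -> 10000 replicates
for (i in seq_len(n_chunks)) {
  chunk_file <- sprintf("boot_chunk_%03d.rds", i)
  if (file.exists(chunk_file)) next               # chunk already finished in an earlier run
  chunk <- boot(data = data, statistic = my_slow_function, R = 100,
                parallel = 'snow', ncpus = 4, cl = cl)
  saveRDS(chunk$t, chunk_file)                    # checkpoint this chunk's replicates
}
# reassemble the replicates once all chunks exist
chunk_files <- sprintf("boot_chunk_%03d.rds", seq_len(n_chunks))
results_t <- unlist(lapply(chunk_files, readRDS))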

How to add a progress bar to a package function in r

I am running a moving window function from the package landscapemetrics. This seems to take some time as the raster is quite big. It would be really helpful to have a progress bar or something similar. How can I code something like this without having a for loop or a self-coded function to begin with? I don't know how to provide an example raster, but here is my code:
my.raster <- raster('forest2_nonforest1_min_extent.tif')
# specify window size
moving_window <- matrix(1, nrow = 5, ncol = 5)
# moving window analysis
tt <- window_lsm(my.raster,
                 window = moving_window,
                 level = "landscape",
                 what = c("lsm_l_ed"))
I need to have a visualization of the progress for the last function (#moving window analysis)
The function window_lsm() uses raster::focal() internally, which doesn't provide a progress bar itself. So without writing your own loop/moving window function I think this won't be possible, unfortunately.
As already mentioned above, the progress argument in window_lsm() refers to layers and metrics only, but not the moving window.
Not familiar with this package, but window_lsm() has an argument progress which will "print progress report" when TRUE.
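Based on the call in the question, that would just be (untested):
tt <- window_lsm(my.raster,
                 window = moving_window,
                 level = "landscape",
                 what = c("lsm_l_ed"),
                 progress = TRUE)  # reports progress over layers/metrics, not over the moving window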
Otherwise, to the best of my knowledge it's not possible to implement a true progress bar without any kind of iteration / loop. The one other option I see would be to look at the source of window_lsm(); find the outermost loop (if there is one); define your own local version of the function; and insert a progress bar incremented inside the loop (e.g. using the progress package). (Obviously, you wouldn't want to redistribute this without looking into the licensing / discussing with package devs.)
I guess another option would be to somehow develop an estimate of how long the operation might take, e.g., based on the size of your raster, then run a countdown timer in a parallel process? My hunch is this would be hard to implement and not especially accurate.

Create an unique pdf with a FOR loop [duplicate]

I'm trying to write a function that plots a ggplot facet_wrap plot over multiple pages. It's just a hack, as this feature seems to be on the ggplot2 feature to-do list. I do some small calculations to find the number of pages I'm going to need, the number of rows of my data.frame that I need per page etc. I'm pretty confident this all works.
pdf(filename)
for (i in seq(num_pages)) {
  slice = seq(((i - 1) * num_rows) + 1, i * num_rows)
  slice = slice[!(slice > nrow(df.merged))]
  df.segment = df.merged[slice, ]
  p <- ggplot(df.segment, aes(y = mean, x = phenotype))
  p <- p + geom_bar(stat = "identity", fill = "white", colour = "black")
  p + facet_wrap("ID", scales = "free_y", ncol = n_facets, nrow = n_facets)
}
dev.off()
My problem is that, with everything wrapped up in a for loop like this between the pdf() and dev.off() calls, the loop doesn't seem to wait for ggplot to do its thing; it blazes through very quickly and outputs an invalid PDF.
If I set i = 1, start the pdf() device, run the code inside the for loop, then set i = 2 and run the code again, and so on until I get bored (i = 3), and then turn off the device, the resulting PDF is brilliant.
Is there a way I can get the for loop to wait for the final line to finish plotting before moving onto the next iteration?
I think the problem is that you need print() around your last line (p + ...) to get it to actually print to the device inside the for loop.
Exactly. Page 39 of the ggplot2 book tells us that when you create ggplot2 objects, you can "Render it on screen, with print(). This happens automatically when running interactively, but inside a loop or function, you'll need to print() it yourself".
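Concretely, the loop from the question with the final expression wrapped in print():
pdf(filename)
for (i in seq(num_pages)) {
  slice <- seq(((i - 1) * num_rows) + 1, i * num_rows)
  slice <- slice[!(slice > nrow(df.merged))]
  df.segment <- df.merged[slice, ]
  p <- ggplot(df.segment, aes(y = mean, x = phenotype))
  p <- p + geom_bar(stat = "identity", fill = "white", colour = "black")
  print(p + facet_wrap("ID", scales = "free_y", ncol = n_facets, nrow = n_facets))  # explicit print() renders the plot to the device
}
dev.off()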

Octave crashes when printing a plot

Solution: As suggested by user Andy in the comments, an update to the newest version of Octave (at the moment: octave-4.0.1-rc4) fixed the problem and the plot could be saved as PNG.
I have a large-ish amount of data that I plot in Octave. But when I try to save the image, the program crashes without any explanation or real error message. My Octave is version 4.0 and it's running on Win 8.1; the graphics_toolkit is qt.
Saving smaller amounts of data has worked so far, but somehow I seem to have reached a size where the plot can be drawn but not saved.
First, I load the data from several files listed in the vector inputs:
data = [];
for i = 1:length(inputs)
  data = [data; load(inputs{i})];
endfor
The result is a 955,524 x 7 matrix containing numbers. Loading alone takes a while on my system (several minutes), but eventually succeeds. I then proceed to plot the data:
hold on;
for j = 1:length(data(1,:))
  currentColumn = normalize(data(:,j)); % make sure all data is in the same range
  plot(1:length(currentColumn), currentColumn, colours{j}); % plot column with distinct colour
endfor
hold off;
This results in a plot being drawn as Figure 1 that shows all 955,524 entries of each of the seven columns correctly in a distinct colour. If the program ends here, it exits properly. However, if I add
print("data.png");
Octave will keep running after opening the plot window and eventually crash with a simple "program does not work anymore" error message. The same happens if I try to save manually from the File->Save menu (which offers saving as PDF). Even just touching and moving the plot window takes a few seconds.
I tried using gnuplot and fltk as graphics_toolkit, but the latter does not even open a plot window, and the former seems to be broken (crashes on the attempt of plotting even simple data like plot(1:10,1:10);).
Now, I could screenshot the plot and try to work with that, but I'd really rather have it be saved automatically. Also, I find it weird that displaying the curves is possible, but not saving said display. As it works for smaller amounts of data, maybe I just need to somehow allocate more resources to Octave?
Octave (version 4.2.2) crashes on my Linux Mint as well, with just a simple graph; it crashed two times in a row. I am going back to R. I had my hopes up, as I wanted to work through the Numerical Analysis Using Matlab text.
Wait, come to think of it, RStudio crashes when I try to use it but not when I run the same program from the command line, so I will go back (one more time) and try to run a plot entirely from the command line. Linux Mint requires a 2-CPU 64-bit machine, and I just have a single-CPU 64-bit one.

Why does a ddply command take so long after the progress bar gets to 100?

I have a data file with 10000 lines. The file contains blocks of 100 lines with a different factor in the first column, and I use ddply to process them. For example like this:
result.df = ddply(data.df, "V1", calc_stuff, .progress = "text")
message("done!")
It takes about one minute for the ddply progress bar to get to 100%. However, R then does "something" for another 5-7 minutes before the next line in the script is processed (the message is printed in this example).
What is R doing in that time? Collecting the results in "result.df"? Can I speed that up somehow? I have many of these files to process.
ddply takes the following approach:
1. Split up the dataset.
2. Apply the function to each component of the split.
3. Combine the components into one big result data set.
The progress bar probably deals with step 2, and states how far along it is in processing each of the chunks. Step 3 is what takes time in your case, and is not included in the progress bar.
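To make the three steps concrete, here is a toy base-R equivalent of what ddply is doing; the example data and calc_stuff() are made up for illustration:
set.seed(1)
data.df <- data.frame(V1 = rep(1:100, each = 100), x = rnorm(10000))
calc_stuff <- function(d) data.frame(mean_x = mean(d$x), sd_x = sd(d$x))
pieces  <- split(data.df, data.df$V1)    # step 1: split up the dataset
applied <- lapply(pieces, calc_stuff)    # step 2: apply (this is what the progress bar tracks)
result  <- do.call(rbind, applied)       # step 3: combine (not covered by the progress bar)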
To speed up your analysis, I would stop using plyr and start using dplyr, its successor, which is orders of magnitude faster. See the tutorial I wrote for some more information.
Your code example would boil down to something like this (assuming calc_stuff() takes the data frame for one group and returns a data frame):
results.df <- data.df %>% group_by(V1) %>% do(calc_stuff(.))
