Benchmarking function that takes a long time - julia

function1(args; kwargs) and function2(args; kwargs) are two functions that, given the same input, return the same output. I would like to check that function1 is faster than function2. Both of these functions take a very long time to run (about 10 minutes each). I tried using @btime, but this seems to take ages; my guess is that it is running each function many times. I don't care too much about how precise the average is, so how can I benchmark the functions with just 1 or 2 runs?

@btime is largely there to take care of two issues in benchmarking: compilation overhead, and random noise from other processes on your system that may make the function run slower for reasons unrelated to the actual machine code being executed.
In your case neither of these is particularly worrying, simply because the runtime of the functions is so long. You can take care of the compilation overhead by running both functions once before timing them, and random system noise will likely even out over such a long runtime.
If you do want to use BenchmarkTools, look at the samples and evals keyword arguments to @benchmark, described in the manual here: https://juliaci.github.io/BenchmarkTools.jl/stable/manual/#Benchmark-Parameters
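The warm-up-then-time-once pattern the answer describes is language-agnostic; here is a minimal sketch in Python, where `slow_function` is a hypothetical stand-in for your own long-running function:

```python
import timeit

def slow_function():
    # Hypothetical stand-in for a long-running computation.
    return sum(i * i for i in range(100_000))

# Warm-up call: absorbs any one-time setup cost
# (in Julia, this is where compilation would happen).
slow_function()

# Take just two samples, each consisting of a single call (number=1),
# rather than running the function thousands of times.
samples = timeit.repeat(slow_function, repeat=2, number=1)
print(min(samples))  # the fastest run is the least noise-contaminated estimate
```

In Julia, the equivalent knobs are the `samples` and `evals` parameters mentioned above.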

Related

Parallel processing in R with "parallel" package - unpredictable runtime

I've been learning to parallelize code in R using the parallel package, and specifically, the mclapply() function with 14 cores.
Something I noticed, just from a few runs of code, is that repeat calls of mclapply() (with the same arguments and same number of cores used) take significantly different lengths of time. For example, the first run took 18s, the next run took 23s, and the next one took 34s when I did them back to back to back (on the same input). So I waited a minute, ran the code again, and it was back down to taking 18s.
Is there some equivalent of "the computer needs a second to cool down" after running the code, which would mean that running separate calls of mclapply() back to back might take longer and longer amounts of time, but waiting for a minute or so and then running mclapply() again gets it back to normal?
I don't have much experience with parallelizing in R, but this is the only ad-hoc explanation I can think of. It would be very helpful to know if my reasoning checks out, and hear in more detail about why this might be happening. Thanks!
To clarify, my calls are like:
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
x <- mclapply(training_data, simulation, testing_data, mc.cores=14, mc.set.seed = TRUE)
Running this twice in a row takes a lot longer the second time for me. Waiting for a minute and then running it again, it becomes fast again.
I haven't used mclapply, but I have used the parallel, foreach and pbapply packages. I think the inconsistency lies in the small overheads involved in firing up workers and in communicating the progress of tasks running in parallel.

Speeding up package load in Julia

I wrote a program to solve a linear program in Julia using GLPKMathProgInterface and JuMP. The Julia code is called by a Python program, which runs multiple instances of the Julia code through multiple command-line calls. While I'm extremely happy with the performance of the actual solver, the initialization is extremely slow. I was wondering whether there are approaches to speed this up.
For example if I just save the following to a file
@time using DataFrames, CSV, GLPKMathProgInterface, JuMP, ArgParse
and run it
mylabtop:~ me$ julia test.jl
12.270137 seconds (6.54 M allocations: 364.537 MiB, 3.05% gc time)
This seems extremely slow; is there some good way to speed up loading modules, like a precompile step I could run once?
Since you haven't gotten any answers yet, let me give you the general first order answers - although I hope someone more qualified will answer your question in more detail (and correct me if I'm wrong).
1) Loading packages in Julia is sometimes rather slow as of the time of this writing. It has been discussed many times, and you can expect improvements in the future. AFAIK this will happen in early 1.x releases after 1.0 is out. Have a look at this thread.
2) Since you typically only have to pay the loading time cost once per Julia session one approach is to keep the session running for as long as possible. You can execute your script with include("test.jl") from within the session. Let me also mention the amazing Revise.jl - it's hardly possible to overemphasize this package!
3) (I have no experience with this more difficult approach.) There is PackageCompiler.jl which allows you to compile a package into your system image. Read this blog post by Simon.
4) (Not recommended) There is also the highly experimental static-julia, which statically compiles your script into a shared library and executable.
Hope that helps.
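Point 2 above rests on the fact that a loaded package is cached for the rest of the session. The mechanism differs from Julia's, but the same once-per-session cost can be seen in Python, using the stdlib `decimal` module as a stand-in for a heavy package:

```python
import importlib
import sys
import time

# First load pays the full cost of executing the package's code.
start = time.perf_counter()
importlib.import_module("decimal")   # stands in for a heavy package
first_load = time.perf_counter() - start

# Within the same session, the module is cached in sys.modules,
# so "loading" it again is essentially free.
start = time.perf_counter()
importlib.import_module("decimal")
second_load = time.perf_counter() - start

print(f"first load: {first_load:.4f}s, cached load: {second_load:.6f}s")
```

This is why keeping one long-lived session and re-running scripts with include("test.jl") avoids the repeated startup cost that separate command-line calls pay every time.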

Profiling valid for parallel efficiency study?

I have been puzzled by the following matter:
I am trying to check the weak scaling of an in-house parallel Fortran code. Initially I tried to use the time command, but I was getting significantly higher real times than the sys+user times. So I ended up using gprof for the time measurements (although it may slow down the execution).
Is gprof a valid approach for benchmarking parallel efficiency (considering it is not an ideal approach)?

When does foreach call .combine?

I have written some code using foreach which processes and combines a large number of CSV files. I am running it on a 32 core machine, using %dopar% and registering 32 cores with doMC. I have set .inorder=FALSE, .multicombine=TRUE, verbose=TRUE, and have a custom combine function.
I notice that if I run this on a sufficiently large set of files, R appears to process EVERY file before calling .combine for the first time. My evidence is that in monitoring my server with htop, I initially see all cores maxed out, and then for the remainder of the job only one or two cores are used while it does the combines in batches of ~100 (the default for .maxcombine), as seen in the verbose console output. What's really telling is that the more jobs I give to foreach, the longer it takes to see "First call to combine"!
This seems counter-intuitive to me; I naively expected foreach to process .maxcombine files, combine them, then move on to the next batch, combining those with the output of the last call to .combine. I suppose for most uses of .combine it wouldn't matter as the output would be roughly the same size as the sum of the sizes of inputs to it; however my combine function pares down the size a bit. My job is large enough that I could not possibly hold all 4200+ individual foreach job outputs in RAM simultaneously, so I was counting on my space-saving .combine and separate batching to see me through.
Am I right that .combine doesn't get called until ALL my foreach jobs are individually complete? If so, why is that, and how can I optimize for that (other than making the output of each job smaller) or change that behavior?
The short answer is to use either doMPI or doRedis as your parallel backend. They work more as you expect.
The doMC, doSNOW and doParallel backends are relatively simple wrappers around functions such as mclapply and clusterApplyLB, and don't call the combine function until all of the results have been computed, as you've observed. The doMPI, doRedis, and (now defunct) doSMP backends are more complex, and get inputs from the iterators as needed and call the combine function on-the-fly, as you have assumed they would. These backends have a number of advantages in my opinion, and allow you to handle an arbitrary number of tasks if you have appropriate iterators and combine function. It surprises me that so many people get along just fine with the simpler backends, but if you have a lot of tasks, the fancy ones are essential, allowing you to do things that are quite difficult with packages such as parallel.
I've been thinking about writing a more sophisticated backend based on the parallel package that would handle results on the fly like my doMPI package, but there hasn't been any call for it to my knowledge. In fact, yours is the only question of this sort that I've seen.
Update
The doSNOW backend now supports on-the-fly result handling. Unfortunately, this can't be done with doParallel because the parallel package doesn't export the necessary functions.

Using system.time in R, getting very varied times

I have written two functions in R and I need to see which is faster, so I used system.time. However, the answers are so varied I can't tell. As it's for assessed work I don't feel I can actually post the code (in case someone corrects it). Both functions call rbinom to generate multiple values, and this is the only part that isn't a simple calculation.
The function needs to be as fast as possible, but both are returning times of anywhere between 0.17 and 0.33 seconds. As the mark is 0.14/(my function's time) x 10, it's important that I know the exact time.
I have left gcFirst=TRUE as recommended in the R help.
My question is why are the times so inconsistent? Is it most likely to be the functions themselves, my laptop or R?
You probably want to use one of the benchmarking packages
rbenchmark
microbenchmark
for this. And even then, variability will always enter. Benchmarking and performance testing is not the most exact science.
Also see the parts on profiling in the "Writing R Extensions" manual.
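The approach those packages take is language-agnostic: take many samples of the same call and report a robust statistic such as the minimum, which is the measurement least contaminated by system noise. A sketch in Python, with a hypothetical `candidate` standing in for one of the functions being compared:

```python
import timeit

def candidate():
    # Hypothetical stand-in for one of the two functions under test.
    return sum(range(10_000))

# 50 samples of a single call each; wall-clock noise from the OS,
# garbage collection, etc. makes them differ from run to run.
samples = timeit.repeat(candidate, repeat=50, number=1)

print(f"min {min(samples):.2e}s  max {max(samples):.2e}s  "
      f"spread {max(samples) / min(samples):.1f}x")
```

A single system.time call is one draw from this noisy distribution, which is why back-to-back timings of identical code can disagree by a factor of two.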
