I have some Julia functions that are several hundred lines long, and I would like to profile them so that I can then work on optimizing the code.
I am aware of the BenchmarkTools package, which allows the overall execution time and memory consumption of a function to be measured using @btime or @benchmark. But those macros tell me nothing about where inside the functions the bottlenecks are. So my first step would have to be using some tool to identify which parts of the code are slow.
In Matlab for instance there is a very nice built-in profiler which runs a script/function and then reports the time spent on every line of the code. Similarly in Python there is a module called line_profiler which can produce a line-by-line report showing how much time was spent on every single line of a function.
What I’m looking for is simply a line-by-line report showing the total time spent on each line of code and how many times a particular piece of code was called.
Is there such a functionality in Julia? Either built-in or via some third-party package.
There is a Profiling chapter in Julia docs with all the necessary info.
Also, you can use ProfileView.jl or similar packages for visual exploration of the profiled code.
A package that is not exactly a profiler, but is very useful in practice, is TimerOutputs.jl.
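For per-section timings together with call counts, something like the following might work. It is only a sketch: it assumes TimerOutputs.jl is installed, and `simulate`, `to` and the section labels are hypothetical names chosen for illustration.

```julia
using TimerOutputs

const to = TimerOutput()   # hypothetical global timer object

# Hypothetical function; each labelled section is timed separately.
function simulate(n)
    total = 0.0
    for i in 1:n
        x = @timeit to "build" rand(100)
        total += @timeit to "reduce" sum(x)
    end
    return total
end

simulate(1_000)
show(to)   # prints a table with call count, time and allocations per label
```

The labels have to be placed by hand, so this suits code where you already suspect which sections matter.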
UPD: Since Julia is a compiled language, it makes no sense to measure the timing of individual lines, because the code that is actually executed can be very different from what is written in Julia.
For example, the following Julia code
function f()
    x = 0
    for i in 0:100_000_000
        x += i
    end
    x
end
is lowered to
julia> @code_llvm f()
;  @ REPL[8]:1 within `f'
define i64 @julia_f_594() {
top:
;  @ REPL[8]:7 within `f'
  ret i64 5000000050000000
}
I.e. there is no loop at all. This is why, instead of execution time, a proxy metric is used: how often a line appears in the set of all backtraces. Of course, it is not the same as the execution time, but it gives a good approximation of where the bottleneck is, because lines with long execution times appear in backtraces more often.
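A minimal sketch of that sampling approach with the Profile standard library is shown here; `work` is a hypothetical stand-in for a long function, and the iteration count is arbitrary.

```julia
using Profile

# Hypothetical workload standing in for a long function.
function work(n)
    s = 0.0
    for i in 1:n
        s += sin(i) * cos(i)
    end
    return s
end

work(10)                     # run once first so compilation is not profiled
Profile.clear()              # drop samples left over from earlier runs
@profile work(50_000_000)    # collect backtrace samples while this runs
Profile.print(format = :flat, sortedby = :count)   # one row per line, by sample count
```

The counts are numbers of samples, not seconds, so they are best read as relative weights between lines.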
Try OwnTime.jl. It doesn't do call counts, though that should be easy to add.
I am a very new Julia user (coming from Matlab), so forgive me if I ask a very dumb question.
I currently have Julia code that runs fine, but it gives different results depending on whether I execute it as a function or run each of the function's lines interactively.
My script is mostly about linear algebra and uses Arrays and Dicts.
As I have some trouble making use of the Juno debugger, I did not find another way to debug my code, which is quite a shame.
I spent the last three hours on this and I still have no clue why these results differ.
I suspect I don't understand some very basic working process of julia related to variable allocation but I'm flying blind here.
Does anyone have an explanation for this behavior?
I can't provide the code here, but here is its basic structure. Basically, the M matrix returned by childfunction is wrong. a is a scalar and dict is a dictionary.
calling function
function motherfunction(...)
M = childfunction(a,dict)
end
child function
function childfunction(...)
...
M = *some linear algebra*
return M
end
function somefun()
    x::Int = 1
    x = 0.5
end
This compiles with no warning. Of course, calling it produces an InexactError: Int64(0.5). Question: can you enforce a compile-time check?
Julia is a dynamic language in this sense. So, no, it appears you cannot detect if the result of an assignment will result in such an error without running the function first, as this kind of type checking is done at runtime.
I wasn't sure myself, so I wrapped this function in a module to force (pre)compilation in the absence of running the function, and the result was that no error was thrown, which confirms this idea. (see here if you want to see what I mean by this).
Having said this, to answer the spirit of your question: is there a way to avoid such obscure runtime errors from creeping up in unexpected ways?
Yes there is. Consider the following two, almost equivalent functions:
function fun1(x ); y::Int = x; return y; end;
function fun2(x::Int); y::Int = x; return y; end;
fun1(0.5) # ERROR: InexactError: Int64(0.5)
fun2(0.5) # ERROR: MethodError: no method matching fun2(::Float64)
You may think, big deal, we exchanged one error for another. But this is not the case. In the first instance, you don't know that your input will cause a problem until the point where it gets used in the function. Whereas in the second case, you are effectively enforcing a type check at the point of calling the function.
This is a trivial example of programming "by contract", by making use of Julia's elegant type-checking system. See Design By Contract for details.
So the answer to your question is yes: if you rethink your design and follow good programming practices, so that errors of this kind are caught early on, then you can avoid having them occur later in obscure scenarios where they are hard to detect or fix.
The Julia manual provides a style guide which may also be of help (the example I give above is right at the top!).
It's worth thinking through what "compile time" really is in Julia — because it's probably not what you're thinking.
When you define the function:
julia> function somefun()
           x::Int = 1
           x = 0.5
       end
somefun (generic function with 1 method)
You are not compiling it. Julia won't compile it, in fact, until you call it. Julia's compiler can be thought of as Just-Barely-Ahead-Of-Time, standing in contrast to typical JIT or AOT designs.
Now, when you call the function it compiles it and then runs it which throws the error. You can see this compilation happening the very first time you call the function — it takes a bit more time and memory as it generates and caches the specialized code:
julia> @time try somefun() catch end
0.005828 seconds (6.76 k allocations: 400.791 KiB)
julia> @time try somefun() catch end
0.000107 seconds (6 allocations: 208 bytes)
So perhaps you can see that with Julia's compilation model it doesn't so much matter if it gets caught at compile time or not — even if Julia refused to compile (and cache) the code it'd behave exactly like what you currently see. It'd still allow you to define the function in the first place, and it'd still only throw its error upon calling the function.
The question you mean to ask is if Julia could (or should) catch this error at function definition time. And then the question is really — is it ok to define a method that always results in an error? What about a function like error itself? In Julia, it's totally fine to define a method that unconditionally errors like this one, and there can be good reasons to do so.
Now, there are ways to ask Julia if it is able to detect that this method will always unconditionally error:
julia> @code_typed somefun()
CodeInfo(
1 ─ invoke Base.convert(Main.Int::Type{Int64}, 0.5::Float64)::Union{}
└── $(Expr(:unreachable))::Union{}
) => Union{}
This is the very first step in Julia's process of compilation, and in this case it can see that everything beyond convert(Int, 0.5) is unreachable (that is, it errors). Further, it knows that since the function will never return, its return type is Union{} (that is, no possible type can ever be returned!). So you can ask Julia to do this step with, for example, the @inferred macro as part of a test suite.
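One way such a check might look in a test suite is sketched here. It queries inference through `Base.return_types`, an internal (unexported) function, instead of calling the method; `never_returns` and `okfun` are hypothetical names introduced only for illustration.

```julia
using Test

function somefun()
    x::Int = 1
    x = 0.5
end

okfun() = 1   # hypothetical counterpart that returns normally

# Hypothetical helper: true when inference concludes that no call with these
# argument types can return, i.e. every inferred return type is Union{}.
never_returns(f, argtypes) = all(T -> T === Union{}, Base.return_types(f, argtypes))

@test !never_returns(okfun, Tuple{})   # a normal function has a real return type
@test_throws InexactError somefun()    # the runtime behaviour described above

# Going by the code_typed output above, this is expected to be true for somefun:
never_returns(somefun, Tuple{})
```

Because inference results can change between Julia versions, a check like this is better treated as a lint than as a guarantee.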
I'm benchmarking Julia execution speed. I executed @time [i^2 for i in 1:1000] at the Julia prompt, which took something on the order of 20 ms. This seems strange, since my computer is modern with an i7 processor (I am using Linux Ubuntu).
Another strange thing is that when I execute the same command on a range of 1:10 the execution time is 15 ms.
There must be something trivial that I am missing here?
Several things, see performance tips:
Don't benchmark in global scope.
Don't measure the first execution of something like this.
Use BenchmarkTools.
Julia is a JIT-compiled language, so the first time you measure things, you're measuring compilation time. This is a small fixed overhead, so for anything that takes a substantial time, it's negligible, but for short-running code like this, it's almost all of the time. Non-constant global variables force the compiler to assume almost nothing about types, which tends to poison all of your performance. This is fine in some circumstances, but most of the time, you a) should write code so that the inputs are explicit parameters to functions, rather than implicit parameters coming from some globals, and b) shouldn't write code that uses mutable global state.
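The three tips can be sketched as follows, using only Base; `squares` is a hypothetical wrapper name for the comprehension from the question.

```julia
# Hypothetical wrapper so the work happens inside a function, not in global scope.
squares(n) = [i^2 for i in 1:n]

squares(10)                  # warm-up call; this one pays the compilation cost
t = @elapsed squares(1000)   # timing of an already compiled call
println("elapsed: ", t, " s")

# With BenchmarkTools.jl installed, `@btime squares(1000)` repeats the
# measurement many times and reports the minimum.
```

A single `@elapsed` reading is still noisy for something this short, which is the reason for the BenchmarkTools recommendation.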
I'm running a simulation in an IPython notebook that is composed of seven functions that depend on each other and requires 13 different parameters. Some of the functions are called within other functions to allow one function to run the entire simulation. The simulation involves manipulating two parameters for a total of >20k iterations. Two simulations can be run asynchronously. Since each iteration is taking ~1.5 seconds, I'm investigating parallel processing.
When I first tried ipyparallel, I got a global name not defined error. It makes sense that local objects can't be found on a worker. In an effort to avoid spending quite a bit of time going down a rabbit hole, what would be the easiest way to pass a whole bunch of objects to all of the workers? Are there other gotchas to consider when using ipyparallel in this way?
There is a bit more detail in this related question, but the gist is: interactively defined modules resolve in the interactive namespace (__main__), which is different on the engine and client. You can send functions to the engine with view.push(dict(func=func, func2=func2)), in which case they will be found. The alternative is to define your functions in a module or package that you ensure is installed on all the engines.
For instance, in a script:
def bar(x):
    return x * x

def foo(y):
    return bar(y)

view.apply(foo, 5) # NameError on bar
view.push(dict(bar=bar)) # send bar
view.apply(foo, 5) # 25
Often when using IPython parallel from a notebook or larger script, one of the early steps is seeding the namespace of the engines:
rc[:].push(dict(
    f1=f1,
    f2=f2,
    const=const,
))
If you have more than a few names to push this way, it might be time to consider defining these functions in a module, and distributing that instead.
I have started using the doMC package for R as the parallel backend for parallelised plyr routines.
The parallelisation itself seems to be working fine (though I have yet to properly benchmark the speedup). My problem is that the logging is now asynchronous and messages from different cores are getting mixed in together. I could create different logfiles for each core, but I think a neater solution is to simply add a different label for each core. I am currently using the log4r package for my logging needs.
I remember when using MPI that each processor got a rank, which was a way of distinguishing each process from one another, so is there a way to do this with doMC? I did have the idea of extracting the PID, but this does seem messy and will change for every iteration.
I am open to ideas though, so any suggestions are welcome.
EDIT (2011-04-08): Going with the suggestion of one answer, I still have the issue of correctly identifying which subprocess I am currently inside, as I would either need separate closures for each log() call so that it writes to the correct file, or I would have a single log() function, but have some logic inside it determining which logfile to append to. In either case, I would still need some way of labelling the current subprocess, but I am not sure how to do this.
Is there an equivalent of the mpi_rank() function in the MPI library?
I think having multiple processes write to the same file is a recipe for disaster (it's just a log though, so maybe "disaster" is a bit strong).
Often times I parallelize work over chromosomes. Here is an example of what I'd do (I've mostly been using foreach/doMC):
foreach(chr=chromosomes, ...) %dopar% {
  cat("+++", chr, "+++\n")
  ## ... some undoubtedly amazing code would then follow ...
}
And it wouldn't be unusual to get output that tramples over each other ... something like (not exactly) this:
+++chr1+++
+++chr2+++
++++chr3++chr4+++
... you get the idea ...
If I were in your shoes, I think I'd split the logs for each process and set their respective filenames to be unique with respect to something happening in that process's loop (like chr in my case above). Collate them later if you must ... i.e. map/reduce your log files :-)