Julia workflow with the JIT compiler

I've recently picked up Julia as a neat way to implement some computationally heavy projects. So far I'm quite impressed by both the speed and the convenience - however, there's one thing I rather dislike: when the codebase becomes fairly large, running scripts takes an increasing amount of time, since the JIT compiler needs to compile all files over and over again (not only the modified ones, as e.g. in C++ with CMake). This slows down my development workflow - what's the most Julian/best-practice way to speed this up, so that I avoid waiting a (sometimes excessive) amount of time?

Besides the workflow outlined in the comments above (keep the REPL open and use Revise.jl), this package might be helpful for you:
https://github.com/dmolina/DaemonMode.jl
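To sketch what that keep-one-process-alive workflow looks like in practice (the file and function names below are placeholders, and the DaemonMode invocation is roughly what its README suggests - check the repository for the current API):

    # --- Revise.jl: one long-running REPL session ---
    using Revise
    includet("analysis.jl")    # include-and-track: later edits to the file are picked up

    run_analysis()             # first call pays the JIT cost
    # ...edit analysis.jl in your editor...
    run_analysis()             # only the modified methods get recompiled

    # --- DaemonMode.jl: keep a warm server process and send scripts to it ---
    # in one terminal:
    #     julia --startup-file=no -e 'using DaemonMode; serve()'
    # in another terminal, as often as needed (startup stays cheap):
    #     julia --startup-file=no -e 'using DaemonMode; runargs()' analysis.jl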

Related

Why does .Net Core's PublishSingleFile do self-extraction instead of embedding assemblies as resources?

To me, merging .Net binaries so that I'm left with one single-file executable has always been both an interesting and an important feature. From what I've read, there are three approaches to this:
ILMerge
Really merges the assemblies. Comparable to creating one single .csproj and pulling in all source code.
Fody.Costura
Keeps the assemblies separate but includes their files as resources and bends the assembly resolution accordingly.
.Net Core 3+ PublishSingleFile / Warp
From what I understand, the approaches are similar: extract the files to a temp dir on first execution and then run them normally.
While the topics are quite related and sometimes overlap, I don't intend to mix up single-file deployment, self-contained deployment and trimming - I'm mainly thinking of the single-file part here.
I think each of the three approaches has its own advantages and disadvantages, and none of them is "the" way to go.
ILMerge
+ Seems like the most natural option for "merging"
+ Requires no runtime loading; if everything is there and works, then that's it; once the image is in memory we're done
- Loading everything at once every time might lead to lower startup performance, which is relevant for desktop apps
- Changes quite a lot about the internals: assembly names change, library internals probably get more entangled with app code, etc.; this might lead to hard-to-debug errors, in the worst case popping up in production, I guess. It simply changes things developers may have taken for granted (e.g. the name of their assembly).
- Probably not a conceptual limitation, but it doesn't support .Net Core => dealbreaker
Fody.Costura
+ Enables compression for dependencies
+ Changes far fewer internals, probably leading to fewer unforeseen situations
+ Probably requires the OS to load the full image, but not the CLR to load all libraries at once, so it might improve start-up performance
- Requires de-gzipping the dependencies every time; might be a performance penalty CPU-wise, could be a performance benefit IO-wise
Self extracting
+ Compressed too
+ Once the code is extracted and running, the only difference is where it's located. Thus, library developers can even still find their assembly on disk etc., which makes for the fewest runtime differences, I guess.
- First-time startup can take some noticeable time and is quite IO-intensive, kind of like an installation.
+ Subsequent runs behave like non-single-file deployments, having their respective (dis)advantages compared to the other two.
- Comes with some non-trivial problems: When to update the extracted files? Who deletes old versions of the extracted files? What if the disk is full/...?
- Executing is not side-effect free anymore.
- More than 2x the disk space needed.
In my personal, obviously non-exhaustive and non-scientific experience, (3) sometimes had rough edges and showed strange behaviour, while (2) worked really well in the years I've been using it (comparing Framework vs Core, though). But probably that's either due to the implementation of (3) being quite new, or my misunderstanding.
My intention in posting this question is to (a) understand this better in general and (b) understand why Microsoft chose (3). So, it'd be great if you could:
point out differences between the three that I missed (feel free to edit the question to keep things in one place).
point out any false assumptions I made.
point out further options in general, or correct me if I'm wrong about one of the concepts.
point me in the right direction if there's content about the reasoning why Microsoft selected (3) as the superior way (that would probably deepen my understanding of the pros and cons).
While I understand it's hard to compete with a framework feature, I'm a bit sad that Costura is in maintenance mode, given that I think it has advantages over the framework's approach that make the second option a viable choice. Thus, one of the reasons I'm asking this question is to understand whether there are any severe advantages of (3) over (2) that I'm overlooking.

Matlab like debugging tool for Julia

I am writing a big project in Julia at the moment and the only option that I found to debug this code is Debug.jl. It is sooo(!) overwhelming to debug this code without a debugger like the one MATLAB has.
Are there any such debugging tools? I could adopt them even if they are in alpha stage.
Anyone has timeline estimates as to when they are planned to appear?
There is work in progress by Keno Fischer (one of the core Julia developers) on a debugger called Gallium.jl.
This is a very complicated piece of work, due to the nature of Julia as a JIT-compiled language; for example, as one piece, it will include a C++ REPL! As I understand it, there are still some technical issues that prevent it being used, but it will hopefully be available for general consumption "soon".
See this video for a demo, and this discussion on the julia-dev mailing list for the latest news.

How to detect type unstable functions in Julia

Setup: Let's say I have a reasonably detailed piece of software (in Julia), involving the interaction of several modules. I feel like it is running slower than it should. Typically the first culprit to check for is type unstable functions, i.e. functions where the compiler is unable to determine ahead of time what the output type will be.
Question: How can I detect these type unstable functions?
What I currently do: I use the profiling tools, e.g. tholy's ProfileView.jl package, to detect bottlenecks, under the assumption that type-unstable functions will show up there (due to their excessive run-time). But what would be really nice is some sort of debugging tool that, after a routine is run, spits out a list of functions where the compiler was unable to determine the output type ahead of time. Is this possible?
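For reference, my current profiling round-trip looks roughly like this (run_simulation() is just a placeholder for my actual entry point):

    using Profile, ProfileView    # ProfileView.jl by tholy

    Profile.clear()
    @profile run_simulation()     # the routine I want to inspect
    ProfileView.view()            # flame graph showing where the time goes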
You could try TypeCheck.jl on the bits the profiler says are slow.
Julia 0.4 has @code_warntype as well.
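For example, with a deliberately unstable toy function (made up here just for illustration), @code_warntype points at the problem directly:

    # Type-unstable: one branch returns an Int, the other a Float64.
    function unstable(x)
        if x > 0
            return 1
        else
            return 0.0
        end
    end

    @code_warntype unstable(3)
    # The inferred return type shows up as a Union of Float64 and Int64
    # (highlighted in the REPL output), the tell-tale sign of type instability.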
In addition to the excellent suggestions of IainDunning, running julia with --track-allocation=user and analyzing the results with analyze_malloc from the Coverage package is a good way to quickly get a high-level overview. The principle is that type-instability triggers memory allocation, so looking for lines of code that have unexpected, large allocations is a good way to find the most egregious instances of type instability.
You can find more information about --track-allocation in the manual, where even more performance-analysis options are described.
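A minimal round-trip might look like this (the script name is a placeholder, and the exact Coverage API may differ slightly between versions):

    # Run once with allocation tracking; a *.mem file is written next to each
    # source file that was executed:
    #     julia --track-allocation=user myscript.jl

    # Then, in a fresh Julia session, rank the allocation sites:
    using Coverage
    mallocs = analyze_malloc(".")    # scans the *.mem files under the current directory
    mallocs[end-9:end]               # the ten largest allocation sites (sorted ascending)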

'make'-like dependency-tracking library?

There are many nice things to like about Makefiles, and many pains in the butt.
In the course of doing various projects (I'm a research scientist, "data scientist", or whatever) I often find myself starting out with a few data objects on disk, generating various artifacts from those, generating artifacts from those artifacts, and so on.
It would be nice if I could just say "this object depends on these other objects", and "this object is created in the following manner from these objects", and then ask a Make-like framework to handle the details of actually building them, figuring out which objects need to be updated, farming out work to multiple processors (like Make's -j option), and so on. Makefiles can do all this - but the huge problem is that all the actions have to be written as shell commands. This is not convenient if I'm working in R or Perl or another similar environment. Furthermore, a strong assumption in Make is that all targets are files - there are some exceptions and workarounds, but if my targets are e.g. rows in a database, that would be pretty painful.
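To make that concrete, here's a rough sketch of the kind of API I'm imagining (written in Julia purely for illustration; the names Target, add! and build! are made up and this is nowhere near a real tool):

    struct Target
        name::String
        deps::Vector{String}     # names of the targets this one depends on
        action::Function         # how to (re)build this target from its deps
    end

    targets = Dict{String,Target}()
    add!(t::Target) = (targets[t.name] = t)

    # Naive rebuild: depth-first over the dependency web, building deps first.
    # A real framework would also check freshness (timestamps/hashes), allow
    # non-file targets (e.g. database rows) and farm work out like make -j.
    function build!(name::String, built = Set{String}())
        name in built && return
        t = targets[name]
        foreach(d -> build!(d, built), t.deps)
        t.action()
        push!(built, name)
    end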
To be clear, I'm not after a software-build system. I'm interested in something that (more generally?) deals with dependency webs of artifacts.
Anyone know of a framework for these kinds of dependency webs? Seems like it could be a nice tool for doing data science, & visually showing how results were generated, etc.
One extremely interesting example I saw recently was IncPy, but it looks like it hasn't been touched in quite a while, and it's very closely coupled with Python. It's probably also much more ambitious than I'm hoping for, which is why it has to be so closely coupled with Python.
Sorry for the vague question, let me know if some clarification would be helpful.
A new system called "Drake" was announced today that targets this exact situation: http://blog.factual.com/introducing-drake-a-kind-of-make-for-data . Looks very promising, though I haven't actually tried it yet.
This question is several years old, but I thought adding a link to remake here would be relevant.
From the GitHub repository:
The idea here is to re-imagine a set of ideas from make but built for R. Rather than having a series of calls to different instances of R (as happens if you run make on R scripts), the idea is to define pieces of a pipeline within an R session. Rather than being language agnostic (like make must be), remake is unapologetically R focussed.
It is not on CRAN yet, and I haven't tried it, but it looks very interesting.
I would give Bazel a try for this. It is primarily a software build system, but with its genrule type of artifacts it can perform pretty arbitrary file generation, too.
Bazel is very extendable, using its Python-like Starlark language which should be far easier to use for complicated tasks than make. You can start by writing simple genrule steps by hand, then refactor common patterns into macros, and if things become more complicated even write your own rules. So you should be able to express your individual transformations at a high level that models how you think about them, then turn that representation into lower level constructs using something that feels like a proper programming language.
Where make depends on timestamps, Bazel checks fingerprints. So if any one step produces the same output even though one of its inputs changed, then subsequent steps won't need to be re-computed. If some of your data processing steps project or filter data, there might be a high probability of this kind of thing happening.
I see your question is tagged for R, even though it doesn't mention it much. Under the hood, R computations in Bazel would still boil down to R CMD invocations on the shell. But you could have complicated multi-line commands assembled in complicated ways, to read your inputs, process them and store the outputs. If the cost of initializing the R binary is a concern, Rserve might help, although using it would make the setup depend on a locally accessible Rserve instance, I believe. Even with that, I see nothing that would avoid the cost of storing the data to a file and loading it back from a file. If you want something that avoids that cost by keeping things in memory between steps, then you'd be looking at a very R-specific tool, not a generic tool like you requested.
In terms of “visually showing how results were generated”, bazel query --output graph can be used to generate a graphviz dot file of the dependency graph.
Disclaimer: I'm currently working at Google, which internally uses a variant of Bazel called Blaze. Actually Bazel is the open-source released version of Blaze. I'm very familiar with using Blaze, but not with setting up Bazel from scratch.
Red-R has a concept of data flow programming. I have not tried it yet.

Is Just-in-Time compilation always faster?

Greetings to all the compiler designers here on Stack Overflow.
I am currently working on a project which focuses on developing a new scripting language for use in high-performance computing. The source code is first compiled into a byte-code representation. The byte code is then loaded by the runtime, which performs aggressive (and possibly time-consuming) optimizations on it (these go much further than what even most "ahead-of-time" compilers do; after all, that's the whole point of the project). Keep in mind that the result of this process is still byte code.
The byte code is then run on a virtual machine. Currently, this virtual machine is implemented using a straight-forward jump table and a message pump. The virtual machine runs over the byte code with a pointer, loads the instruction under the pointer, looks up an instruction handler in the jump table and jumps into it. The instruction handler carries out the appropriate actions and finally returns control to the message loop. The virtual machine's instruction pointer is incremented and the whole process starts over again. The performance I am able to achieve with this approach is actually quite amazing. Of course, the code of the actual instruction handlers is again fine-tuned by hand.
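As a crude illustration of that loop (this is not my actual VM - the opcodes and the stack machine below are invented just to show the shape of the dispatch, and it's written in Julia only to keep the sketch short):

    # Toy stack machine: load instruction, look up its handler, execute, repeat.
    const HALT, PUSH, ADD = 0x00, 0x01, 0x02

    function interpret(bytecode::Vector{UInt8})
        stack = Int[]
        ip = 1                                    # instruction pointer
        while true
            op = bytecode[ip]                     # load the instruction under the pointer
            if op == PUSH                         # dispatch (if/elseif here instead of a
                push!(stack, Int(bytecode[ip+1])) # jump table, but the idea is the same)
                ip += 2
            elseif op == ADD
                push!(stack, pop!(stack) + pop!(stack))
                ip += 1
            elseif op == HALT
                return isempty(stack) ? nothing : stack[end]
            else
                error("unknown opcode $op")
            end
        end
    end

    interpret(UInt8[PUSH, 0x02, PUSH, 0x03, ADD, HALT])    # => 5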
Now most "professional" run-time environments (like Java, .NET, etc.) use Just-in-Time compilation to translate the byte code into native code before execution. A VM using a JIT does usually have much better performance than a byte code interpreter. Now the question is, since all an interpreter basically does is load an instruction and look up a jump target in a jump table (remember the instruction handler itself is statically compiled into the interpreter, so it is already native code), will the use of Just-in-Time compilation result in a performance gain or will it actually degrade performance? I cannot really imagine the jump table of the interpreter to degrade performance that much to make up the time that was spent on compiling that code using a JITer. I understand that a JITer can perform additional optimization on the code, but in my case very aggressive optimization is already performed on the byte code level prior to execution. Do you think I could gain more speed by replacing the interpreter by a JIT compiler? If so, why?
I understand that implementing both approaches and benchmarking will provide the most accurate answer to this question, but it might not be worth the time if there is a clear-cut answer.
Thanks.
The answer lies in the ratio of single-byte-code-instruction complexity to jump table overheads. If you're modelling high level operations like large matrix multiplications, then a little overhead will be insignificant. If you're incrementing a single integer, then of course that's being dramatically impacted by the jump table. Overall, the balance will depend upon the nature of the more time-critical tasks the language is used for. If it's meant to be a general purpose language, then it's more useful for everything to have minimal overhead as you don't know what will be used in a tight loop. To quickly quantify the potential improvement, simply benchmark some nested loops doing some simple operations (but ones that can't be optimised away) versus an equivalent C or C++ program.
When you use an interpreter, the code cache in your processor caches the interpreter code, not the byte code (which may be cached in the data cache). Since code caches are 2 to 3 times faster than data caches, IIRC, you may see a performance boost if you JIT-compile. Also, the native, real code you are executing is probably PIC, something which can be avoided for JITted code.
Everything else depends on how optimized the byte code is, IMHO.
A JIT can theoretically optimize better, since it has information not available at compile time (especially about typical runtime behaviour). So it can, for example, do better branch prediction, unroll loops as needed, etc.
I am sure your jump-table approach is OK, but I still think it would perform rather poorly compared to straight C code, don't you think?
