Profiling GPRBuild (Ada)

I have a large GPR based project that can take over 30 minutes to compile.
Having analyzed the build process, I noticed many obvious inefficiencies (multiple calls to gprbuild rather than aggregate projects, excessive use of alternative files rather than configurations, etc.). I am wondering if there is some means to 'profile' the build process to see what takes so long.
In particular, it takes about 5 minutes to recompile when even a single file changes and there is an error in it. In theory it should be pretty quick to realize that that file has to be recompiled (it's the only one that does) and start the compilation process, rapidly discovering the error.
From the verbose output it looks like it takes quite a while just parsing the massive web of gpr files used to define the build, but I would like to know where it spends most of its time.
Thus my question is: Is it possible to profile a build done by gprbuild? If so, how?

From low to high complexity:
Ask gprbuild to report more details about what it is doing with the -vh flag (high verbosity).
Run gprbuild through strace.
Rebuild gprbuild with the required flags to profile it using gprof (but be aware that gprof doesn't always tell the truth).
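As a sketch of the first two options (the project file name and log paths below are placeholders, not from the original question):

    # High verbosity: shows project-file parsing, dependency checking, and each compiler invocation
    gprbuild -vh -P my_project.gpr 2>&1 | tee gprbuild-verbose.log

    # System-call summary: -c aggregates counts and times per syscall, -f follows the child
    # processes (the actual compiler invocations), -o writes the report to a file
    strace -c -f -o gprbuild-strace.txt gprbuild -P my_project.gpr

    # Coarse wall-clock vs. CPU split for the whole build
    time gprbuild -P my_project.gpr

Watching where the verbose output pauses, together with the strace summary, usually makes it obvious whether the time goes into project parsing, dependency checking, or the compiler itself.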

Related

I'm getting OutOfMemoryExceptions, but my trace file is much smaller than my available memory

As the title says, I've got tons of free memory, but I keep getting OutOfMemoryExceptions when processing traces and calling properties on Data Sources. Why is this happening?
The ETL file format is designed to be very space-efficient, and it also supports optional compression. Due to these factors, taking the data from an .etl file and transforming it into our more useful structures can often require significantly more memory than the original size of the file. However, there are two steps that can be taken to make OutOfMemoryExceptions less likely:
Don't use data sources you don't need. Even if none of the properties on a data source are called by your code, simply turning it on by calling its Use method will result in the data source processing events and preparing data for consumption.
Ensure your program is running as a 64-bit process. The default Visual Studio C# project settings are to compile your program targeting AnyCPU, but to prefer running it as a 32-bit process. Unchecking the "Prefer 32-bit" option in your project's Build properties or switching your project's build configuration to x64 will cause your program to run as a 64-bit process.
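If you build from the command line, the same 64-bit settings can be forced via MSBuild properties; the project file name below is hypothetical:

    REM Build targeting x64 so the trace-processing code runs as a 64-bit process
    msbuild MyTraceApp.csproj /p:Configuration=Release /p:PlatformTarget=x64

    REM Alternatively, keep AnyCPU but turn off the 32-bit preference
    msbuild MyTraceApp.csproj /p:Configuration=Release /p:Prefer32Bit=false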

How to track a process time and disk usage of that particular process in Unix?

I'm trying to compile and run simulations using the tcsh shell in Unix. How can I track when the compilation started and stopped, and what the disk usage is?
Using time you can track the CPU time spent on a process. You can also view what top (or similar tools like ps) displays for your process; typically the CPU minutes spent on that process are shown. You can also run date before and after the process, perhaps with the option +%s to show the date as seconds since 1970-01-01T00:00:00Z, which allows easy arithmetic (taking the difference). Keep in mind that CPU time can be larger (more than one CPU used) or smaller (the CPU also working on other tasks) than real time. date will always show real time; time will try to show both.
Disk usage, however, is more complex. By design, files are not dedicated to a specific process. You can, however, run df before and after the process and compare the two values. You cannot be sure that the difference was created by that process, but if you do several runs, that may help. You can also use du to find out how much storage is used under a particular path. This only works if you happen to know (or can make a good guess about) where the process stores its results. In your case this might be the best way to go, as you are talking about a compilation job.
You can also have a look at the /proc/<PID>/fd/ directory to see the file descriptors a running process currently holds open.
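A minimal sketch tying these together, in tcsh syntax since that is the shell mentioned (the make target, the paths, and the PID are placeholders):

    # Wall-clock time via date arithmetic; time also reports the CPU/real split
    set start = `date +%s`
    df -k . > df_before.txt
    time make all
    set end = `date +%s`
    @ elapsed = $end - $start
    echo "elapsed: $elapsed seconds"
    df -k . > df_after.txt

    # Size of the build output tree, if you know where the results are written
    du -sh ./sim_output

    # Open file descriptors of a still-running process (PID 12345 is just an example)
    ls -l /proc/12345/fd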

Julia compiles the script every time?

The Julia language compiles the script every time; can't we compile binaries with Julia instead?
I tried a small hello-world script with the println function, and it took 2-3 seconds for Julia to show the output! It would be better if we could make binaries instead of compiling every time.
Update: There have been some changes in Julia since I asked this question. I'm not following Julia's updates anymore, so if you're looking for something similar, look at the answers and comments below from people who are following Julia.
Also, it's good to know that it now takes around 150 ms to load a script.
Keno's answer is spot on, but maybe I can give a little more detail on what's going on and what we're planning to do about it.
Currently there is only an LLVM JIT mode:
There's a very trivial interpreter for some simple top-level statements.
All other code is jitted into machine code before execution. The code is aggressively specialized using the run-time types of the values that the code is being applied to, propagated through the program using dynamic type inference.
This is how Julia gets good performance even when code is written without type annotations: if you call f(1) you get code specialized for Int64 — the type of 1 on 64-bit systems; if you call f(1.0) you get a newly jitted version that is specialized for Float64 — the type of 1.0 on all systems. Since each compiled version of the function knows what types it will be getting, it can run at C-like speed. You can sabotage this by writing and using "type-unstable" functions whose return type depends on run-time data, rather than just types, but we've taken great care not to do that in designing the core language and standard library.
Most of Julia is written in itself, then parsed, type-inferred and jitted, so bootstrapping the entire system from scratch takes some 15-20 seconds. To make it faster, we have a staged system where we parse, type-infer, and then cache a serialized version of the type-inferred AST in the file sys.ji. This file is then loaded and used to run the system when you run julia. No LLVM code or machine code is cached in sys.ji, however, so all the LLVM jitting still needs to be done every time julia starts up, which therefore takes about 2 seconds.
This 2-second startup delay is quite annoying and we have a plan for fixing it. The basic plan is to be able to compile whole Julia programs to binaries: either executables that can be run or .so/.dylib shared libraries that can be called from other programs as though they were simply shared C libraries. The startup time for a binary will be like any other C program, so the 2-second startup delay will vanish.
Addendum 1: Since November 2013, the development version of Julia no longer has a 2-second startup delay, since it precompiles the standard library as binary code. Startup is still about 10x slower than Python or Ruby, so there's room for improvement, but it's pretty fast. The next step will be to allow precompilation of packages and scripts so that those can start up just as fast as Julia itself already does.
Addendum 2: Since June 2015, the development version of Julia precompiles many packages automatically, allowing them to load quickly. The next step is static compilation of entire Julia programs.
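A quick way to measure this fixed startup overhead yourself is to time a one-liner from the shell; julia's -e flag evaluates an expression and exits, so nearly all of the elapsed time is startup/JIT cost rather than the println itself:

    # Wall-clock time here is dominated by Julia's startup, not by the script
    time julia -e 'println("hello world")'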
At the moment, Julia JIT-compiles its entire standard library on startup. We are aware of the situation and are currently working on caching the LLVM JIT output to remedy it, but until then there's no way around it (except for using the REPL).

Why is aspnet_compiler.exe so slow (and can it be made any faster)?

During our build process we run aspnet_compiler.exe against our websites to make sure that all the late-bound stuff in ASP.NET/MVC actually builds (I know nothing about ASP.NET but am assured this is necessary to prevent finding the failures at runtime).
Our sites are fairly large, with a few hundred pages/views/controls/etc.; however, the time taken seems excessive, in the 10-15 minute range (for reference, this is longer than it takes to compile the entire solution of approximately 40 projects, and we're only pre-compiling two website projects).
I doubt that hardware is the issue as I'm running on the latest Quad core Intel chip, with 4GB RAM and a WD Velociraptor 10,000rpm hard disk. And part of what's odd is that the EXE doesn't seem to be using much CPU (1-5%) and doesn't seem to be doing an awful lot of I/O either.
So... is this a known issue? Why is it so slow? And is there any way to speed it up?
Note: To clarify a couple of things people have answered about, I am not talking about the compilation of code within Visual Studio. We're using web application projects already, and the speed of compilation of those is not the issue. The problem is the pre-compilation of the site after these projects have already been compiled (see this MSDN page for more details) as part of the dev build script. We are performing in-place pre-compilation, not copying the files to a target directory.
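For reference, an in-place precompilation run typically looks something like the sketch below (the physical path is a placeholder); omitting the target directory is what makes it in-place:

    REM -v is the virtual path, -p the physical path of the already-built web application;
    REM leaving out a target directory requests in-place precompilation.
    REM Adding -c would force a full clean rebuild instead of an incremental one.
    aspnet_compiler -v / -p C:\build\MyWebSite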
Switching to the Roslyn compiler will most likely improve precompile time significantly. Here is a good article about it: https://devblogs.microsoft.com/aspnet/enabling-the-net-compiler-platform-roslyn-in-asp-net-applications/.
In addition to this, make sure that batch compilation is enabled by setting the batch attribute to true on the compilation element in web.config.
Simply put, aspnet_compiler uses what is effectively a "global compiler lock" whenever it starts pre-compiling any individual .aspx page; it is basically only allowed to compile each page sequentially.
There are reasons for this (although I personally disagree with them): primarily, detecting and preventing circular references that could cause an infinite loop of sorts, and ensuring that all dependencies are properly built before the page that requires them is compiled; this avoids a lot of "nasty CS issues".
I once started writing a massively-forked version of aspnet_compiler.exe when I last worked at a web company, but got tied up with "real work" and never finished it. The biggest problem is the ASPX pages: you can parallelize the hell out of the MVC/Razor stuff, but the ASPX parse/compile engine is about 20 levels deep in internal and private classes/methods.
Compiler should generate second code-behind file for every .aspx page, check
During compilation, aspnet_compiler.exe will copy ALL of the web site's files to the output directory, including CSS, JS, and images.
You'll get better compilation times using the Web Application project model instead of the Web Site model.
I don't have any specific hot tips for this compiler, but when I have this sort of problem, I run ProcMon to see what the process is doing on the machine, and I run Wireshark to check that it isn't spending ages timing out on some network access to a long-forgotten machine referenced in some registry key or environment variable.
Just my 2 cents.
One of the things that slows down ASP.NET view precompilation significantly is the -fixednames command-line option for aspnet_compiler.exe. Do not use it, especially if you're on Razor/MVC.
When publishing the web app from Visual Studio, make sure you select "Do not merge", and do not select "create separate assembly", because this is what causes the global lock and slows things down.
More info here: https://msdn.microsoft.com/en-us/library/hh475319(v=vs.110).aspx

What artifacts to save for a nightly build?

Assume that I set up an automatic nightly build. What artifacts of the build should I save?
For example:
Input source code
Output binaries
Also, how long should I save them, and where?
Do your answers change if I do Continuous Integration?
You shouldn't save anything just for the sake of saving it. You should save it because you need it (e.g., QA uses nightly builds to test). At that point, "how long to save it" becomes however long QA wants them.
I wouldn't "save" source code so much as tag/label it. I don't know what source control you're using, but tagging is trivial (in performance and disk space) for any quality source control system. Once your build is tagged, unless you need the binaries, there really isn't any benefit to keeping them around, because you can simply re-compile from source when necessary.
Most CI tools let you tag on each successful build. This can become problematic for some systems as you can easily have 100+ tags a day. For such cases I recommend still running a nightly build and only tagging that.
Here are some artifacts/information that I'm used to keeping for each build:
The tag name of the snapshot you are building (tag and do a clean checkout before you build)
The build scripts themselves, or their version number (if you treat them as a separate project with its own version control)
The output of the build script: logs and final product
A snapshot of your environment:
compiler version
build tool version
library and DLL versions
database version (client & server)
IDE version
script interpreter version
OS version
source control version (client and server)
versions of other tools used in the process, and everything else that might influence the content of your build products. I usually do this with a script that queries all this information and logs it to a text file, which should be stored with the other build artifacts.
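A sketch of such a script; which tools you query obviously depends on your toolchain, so the commands below (gcc, make, svn) are only examples:

    #!/bin/sh
    # Record the versions of everything that can influence the build output
    {
      echo "=== build environment snapshot, $(date -u) ==="
      uname -a                  # OS version
      gcc --version | head -1   # compiler
      make --version | head -1  # build tool
      svn --version | head -1   # source control client
    } > build_environment.txt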
Ask yourself this question: "If something destroyed my build/development environment entirely, what information would I need to create a new one so that I could redo my build #6547 and end up with exactly the same result I got the first time?"
Your answer is what you should keep at each build and it will be a subset or superset of the things I already mentioned.
You can store everything in your SCM (I'd recommend a separate repository), but in that case your question of how long to keep the items loses its meaning. Alternatively, you can store everything in zipped folders, or burn a CD/DVD with the build result and artifacts. Whatever you choose, keep a backup copy.
You should store them as long as you might need them. How long, will depend on your development team pace and your release cycle.
And no, I don't think it changes if you do continuous integration.
This isn't a direct answer to your question, but don't forget to version control the nightly build setup itself. When the project structure changes, you may have to change the build process, which will break older builds from that point on.
In addition to the binaries, as everyone else has mentioned, I would recommend setting up a symbol server and a source server and making sure you get the correct information out and into those. It will aid debugging tremendously.
We save the binaries, stripped and unstripped (so we have exactly the same binary, once with and once without debug symbols). Furthermore, we build everything twice, once with debug output enabled and once without (again, stripped and unstripped, so every build results in 4 binaries). The build is stored in a directory named after the SVN revision number. That way we can always retrieve the source from the SVN repository by simply checking out that very revision (so the source is archived as well).
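A rough sketch of that scheme; the directory layout and binary name below are hypothetical:

    #!/bin/sh
    # Name the archive directory after the SVN revision that was built
    REV=$(svnversion .)
    DEST=/builds/myproduct/r$REV
    mkdir -p "$DEST"

    # Keep the unstripped binary (with debug symbols) and a stripped copy of the same build
    cp myapp "$DEST/myapp-debug"
    strip -o "$DEST/myapp" myapp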
A surprising one I learned about recently: If you're in an environment that might be audited you'll want to save all the output of your build, the script output, the compiler output, etc.
That's the only way you can verify your compiler settings, build steps, etc.
Also, how long to save them for, and where to save them?
Save them until you know that the build won't be going to production, in other words as long as you have the compiled bits around.
One logical place to save them is your SCM system. Another option is to use a tool that will automatically save them for you, like AnthillPro and its ilk.
We're doing something close to "embedded" development here, and I can tell you what we save:
the SVN revision number and timestamp, as well as the machine it was built on and by whom (also burned into the build binaries)
a full build log, showing whether it was a full/incremental build, any interesting (STDERR) output the data baking tools produced, a list of files compiled and any compiler warnings (this compresses very well, being text)
the actual binaries (for anywhere from 1-8 build configurations)
files produced as a side effect of linking: a linker command file, address map, and a sort of "manifest" file indicating what was burned into the final binaries (CRC and size for each), as well as the debugging database (.pdb equivalent)
We also mail out the result of running some tools over the "side-effect" files to interested users. We don't actually archive these since we can reproduce them later, but these reports include:
total and delta of filesystem size, broken down by file type and/or directory
total and delta of code section sizes (.text, .data, .rodata, .bss, .sinit, etc)
When we have unit tests or functional tests (e.g. smoke tests) running, those results show up in the build log.
We've not thrown anything out yet; granted, our target builds usually end up at ~16 or 32 MiB per configuration, and they're fairly compressible.
We do keep uncompressed copies of the binaries around for 1 week for ease of access; after that we keep only the lightly compressed version. About once a month we have a script that extracts each .zip that the build process produces and 7-zips a whole month of build outputs together (which takes advantage of only having small differences per build).
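A sketch of that monthly consolidation step; the directory layout and month below are made up for illustration:

    #!/bin/sh
    # Re-pack a month of per-build .zip outputs into one .7z archive, letting the
    # compressor take advantage of the small differences between consecutive builds
    MONTH=2013-06
    mkdir -p extracted
    for z in builds/$MONTH/*.zip; do
      unzip -q "$z" -d "extracted/$(basename "$z" .zip)"
    done
    7z a "builds-$MONTH.7z" extracted
    rm -rf extracted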
An average day might have a dozen or two builds per project... The buildserver wakes up about every 5 minutes to check for relevant differences and builds. A full .7z on a large very active project for one month might be 7-10GiB, but it's certainly affordable.
For the most part, we've been able to diagnose everything this way. Occasionally there's a hiccup in the build system and a file isn't actually at the revision it's supposed to be when a build happens, but there's usually enough evidence of this in the logs. Sometimes we have to dig out a tool that understands the debugging database format and feed it a few addresses to diagnose a crash (we have automatic stack dumps built into the product). But usually all the information needed is there.
We haven't had to crack open the .7z archives yet, I should mention. But we have the information there, and I have some interesting ideas on how to mine useful data from it.
Save what can't be reproduced easily. I work on FPGAs where only the FPGA team has the tools, and some cores (libraries) of the design are licensed to compile on only one machine. So we save the output bitstreams. But try to check them against one another rather than relying on a date/time/version stamp.
Save as in check in to source code control, or just on disk? Save nothing to source code control. All derived files should be visible in the file system and available to developers. Don't check in binaries, code generated from XML files, message digests, etc. A separate packaging step will make these end products available. Since you have the change number, you can always reproduce the build if necessary, assuming of course that everything you need to do a build is completely in the tree and available to all builds by syncing.
I would save your built binaries for exactly as long as they have a chance to go into production or be used by some other team (like a QA group). Once something has left production, what you do with it can vary a lot. A lot of teams will keep just their most recent prior build around (for rollback) and otherwise discard their builds.
Others have regulatory requirements to keep anything that went into production around for as long as seven years (banks, for example). If you are a product company, I'd keep around any binary a customer might have installed, in case a tech support guy wants to install the same version.
