I'm getting OutOfMemoryExceptions, but my trace file is much smaller than my available memory - .net-traceprocessing

As the title says, I've got tons of free memory, but I keep getting OutOfMemoryExceptions when processing traces and calling properties on Data Sources. Why is this happening?

The ETL file format is designed to be very space-efficient, and it also supports optional compression. Because of this, taking the data from an .etl file and transforming it into our more useful structures can require significantly more memory than the original size of the file. However, there are two steps you can take to make OutOfMemoryExceptions less likely:
Don't use data sources you don't need. Even if your code never reads any of a data source's properties, simply turning it on by calling its Use method causes the data source to process events and prepare data for consumption, which costs memory.
Ensure your program is running as a 64-bit process. The default Visual Studio C# project settings compile your program targeting AnyCPU but prefer running it as a 32-bit process. Unchecking the "Prefer 32-bit" option in your project's Build properties, or switching your project's build configuration to x64, will cause your program to run as a 64-bit process.
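As a rough illustration, here is a minimal sketch using the TraceProcessing API; the trace path is a placeholder, and CPU sampling stands in for whichever data source you actually need:

    using System;
    using Microsoft.Windows.EventTracing;
    using Microsoft.Windows.EventTracing.Cpu;

    class Program
    {
        static void Main()
        {
            // Open the trace (path is illustrative).
            using (ITraceProcessor trace = TraceProcessor.Create("trace.etl"))
            {
                // Each Use* call makes that data source process events and
                // buffer results, even if you never read them - so only
                // enable the data you will actually consume.
                IPendingResult<ICpuSampleDataSource> pendingSamples =
                    trace.UseCpuSamplingData();

                trace.Process();

                Console.WriteLine(pendingSamples.Result.Samples.Count);
            }
        }
    }

For the 64-bit half of the advice, setting <PlatformTarget>x64</PlatformTarget> in the .csproj (or unchecking "Prefer 32-bit") is the usual route.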

Related

Profiling GPRBuild

I have a large GPR-based project that can take over 30 minutes to compile.
Having analyzed the build process, I noticed many obvious inefficiencies (multiple calls to gprbuild rather than aggregates, excessive use of alternative files rather than configurations, etc.). I am wondering if there is some means to 'profile' the build process to see what takes so long.
In particular, it takes about 5 minutes to recompile when even a single file changes and there is an error in it. In theory it should be pretty quick to realize that that file has to be recompiled (it's the only one that needs it) and start the compilation process, rapidly discovering the error.
From the verbose output, it looks like it takes quite a while just parsing the massive web of gpr files used to define the build, but I would like to know where it spends most of its time.
Thus my question is: is it possible to profile a build done by gprbuild? If so, how?
From low to high complexity:
Ask gprbuild to report more details about what it is doing with the flag -vh.
Run gprbuild through strace.
Rebuild gprbuild with the required flags to profile it using gprof (but be aware that gprof doesn't always tell the truth).
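Roughly, those three options look like this on the command line (project name and paths are illustrative, and the gprof step assumes gprbuild was rebuilt with -pg):

    # 1. More verbose reporting from gprbuild itself:
    gprbuild -vh -P main.gpr

    # 2. Summarize where the time goes in system calls (Linux):
    strace -f -c -o gprbuild.strace gprbuild -P main.gpr

    # 3. With a gprbuild rebuilt using -pg, run a build and inspect gmon.out:
    gprbuild -P main.gpr
    gprof /path/to/instrumented/gprbuild gmon.out > profile.txt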

OutOfMemoryError with IBM JDK 1.7

I am using IBM JDK 1.7 (to support TLS ciphers) for a Struts-based application deployed with embedded Tomcat.
We are running into memory leaks (OOM) that have generated almost 30 GB of dumps. This has become a routine event.
We have tried increasing the heap memory by including
wrapper.java.additional.1="-XX:MaxPermSize=256m -Xss2048k" in the wrapper.conf.
But this didn't help much.
Try using Memory Analyzer; you can follow the instructions here to download and install it:
https://www.ibm.com/developerworks/java/jdk/tools/memoryanalyzer/
It should provide an overview of your heap usage.
I'd recommend starting with the dominator tree view to see which objects are responsible for keeping data alive on the heap. You can also run various reports which analyse the heap for you.
You should have core files (.dmp) and heap dumps (.phd). The core files will be large but may be faster to access, and they also contain all the values for basic types in objects and strings; the .phd files contain just object sizes and the connections between them. It may be easier to relate what you are seeing back to your code if you start with the core file.
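One thing worth checking: -XX:MaxPermSize does not grow the object heap (and the IBM J9 VM has no permanent generation to size anyway); -Xmx is what raises the heap ceiling. A hedged wrapper.conf sketch - the 4g figure and property indexes are illustrative:

    # raise the maximum object heap; tune the value to your workload
    wrapper.java.additional.1=-Xmx4g
    wrapper.java.additional.2=-Xss2048k
    # ask the IBM VM for a heap dump when an OutOfMemoryError is thrown
    wrapper.java.additional.3=-Xdump:heap:events=systhrow,filter=java/lang/OutOfMemoryError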

WinDBG - Analyse dump file on local PC

I have created a memory dump of an ASP.NET process on a server using the following command: .dump /ma mydump.dmp. I am trying to identify a memory leak.
I want to look at the dump file in more detail on my local development PC. I read somewhere that it is advisable to debug on the same machine that you create the dump file on. However, I have also read that some developers do analyse the dump file on their local development PCs. What is the best approach?
I notice that when I create a dump file using the command above, the W3WP process memory increases by about 1.5 times. Why is this? I suppose this should be avoided on a live server.
Analyzing on the same machine can save you from the SOS loading issues you would otherwise hit. Unless you are familiar with WinDbg and SOS, you will find those confusing and frustrating.
If you have to use another machine for analysis, make sure you read this blog post carefully: http://blogs.msdn.com/b/dougste/archive/2009/02/18/failed-to-load-data-access-dll-0x80004005-or-what-is-mscordacwks-dll.aspx It shows you how to copy the necessary files from the source machine (where the dump is captured) to the target machine (the one where you launch WinDbg).
For your second question: because you use WinDbg to attach to the process directly and the .dump command to capture the dump, the target process is unfortunately modified (this is not easy to explain in a few words). The recommended way is to use ADPlus.exe or Debug Diag; even procdump from Sysinternals is better (see the sketch below). Those tools are designed for dump capture and have minimal impact on the target processes.
For memory leaks from unmanaged libraries, you should use the memory leak rule of Debug Diag. For managed memory leaks, you can simply capture hang dumps when memory usage is high.
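For example, a procdump capture might look like this (process name, threshold, and file names are illustrative):

    :: one-off full memory dump of the IIS worker process
    procdump -ma w3wp.exe w3wp_full.dmp

    :: or write a full dump automatically once commit charge passes ~1.5 GB
    procdump -ma -m 1500 w3wp.exe w3wp_leak.dmp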
I am no expert on WinDbg, but I once had to analyse a dump file from my ASP.NET site to find a StackOverflowException.
While I got a dump file of my live site (I had no choice, since that was what was failing), originally I tried to analyse that dump file on my local dev PC but ran into problems when trying to load the CLR data from it. The reason was that the exact version of the .NET framework differed between my dev PC and the server - both were .NET 4, but I imagine my dev PC had some cumulative updates installed that the server did not. The SOS module simply refused to load because of this discrepancy. I actually wrote a blog post about my findings.
So to answer part of your question: it may be that you have no choice but to run WinDbg from your server, but at least then you can be sure that the dump file will match your environment.
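If you do end up analysing on another machine, the usual WinDbg incantation is something like the following (paths are illustrative; this assumes .NET 4, where the runtime module is clr):

    $$ point the debugger at the Microsoft symbol server with a local cache
    .symfix C:\symbols
    .reload
    $$ load the SOS extension that matches the CLR captured in the dump
    .loadby sos clr
    $$ inspect the managed exception and call stack
    !pe
    !clrstack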
It is not necessary to debug on the actual machine unless the problem is difficult to reproduce on your development machine.
So long as you have the PDBs with private symbols and the correct version of .NET installed, the symbols should be resolved and call stacks displayed correctly.
In terms of looking at memory leaks, you should enable the gflags user-mode stack trace option and take memory dumps at two intervals, so you can compare the memory usage before and after the action that provokes the memory leak (a command sketch follows below) - and remember to disable gflags afterwards!
You could also run DebugDiag on the server which has automated memory pressure analysis scripts that will work with .Net leaks.
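The gflags/UMDH flow mentioned above might look roughly like this (the pid and file names are illustrative, and UMDH needs _NT_SYMBOL_PATH set to resolve the stacks):

    :: enable user-mode stack traces for the image (takes effect at next start)
    gflags /i w3wp.exe +ust

    :: snapshot heap allocations before and after the leaking action
    umdh -p:1234 -f:before.log
    umdh -p:1234 -f:after.log

    :: diff the snapshots to see which allocation stacks grew
    umdh before.log after.log > diff.log

    :: turn the stack traces off again
    gflags /i w3wp.exe -ust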

Calling functions in Qt from third-party DLL works in debug mode, crashes in release

I use a third-party DLL (FTD2xx) to communicate with an external device. Using Qt4, in debug mode everything works fine, but the release crashes silently after successfully completing a called function. It seems to crash at return, but if I write something to the console (with qDebug) at the end of the function, sometimes it does not crash there, but a few or a few dozen lines later.
I suspect an improperly cleaned-up stack, which the debug build can survive but the release build chokes on. Has anyone encountered a similar problem? The DLL itself cannot be changed, as the source is not available.
It seems reducing the optimization level was the only way around it. The DLL itself might have problems, as a program which does nothing but call a single function from that DLL crashes the same way if optimization is turned on.
Fortunately, the size and speed lost by the change in optimization level are negligible.
Edit: for anyone with similar problems on Qt 5.0 or higher: If you change the optimization level (for example, to QMAKE_CXXFLAGS_RELEASE = -O0), it's usually not enough to just rebuild the application. A full "clean all" is required.
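In .pro-file terms, that change is typically something like this (assuming gcc/MinGW-style flags):

    # release builds: replace the default optimization level with -O0
    QMAKE_CXXFLAGS_RELEASE -= -O2
    QMAKE_CXXFLAGS_RELEASE += -O0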
Be warned - the EPANET library is not thread-safe; it contains a lot of global variables.
Are you calling two methods of that library from different threads?

What artifacts to save for a nightly build?

Assume that I set up an automatic nightly build. What artifacts of the build should I save?
For example:
Input source code
Output binaries
Also, how long should I save them, and where?
Do your answers change if I do Continuous Integration?
You shouldn't save anything for the sake of saving it; you should save it because you need it (i.e., QA uses nightly builds to test). At which point, "how long to save it" becomes however long QA wants them.
I wouldn't "save" source code so much as tag/label it. I don't know what source control you're using, but tagging is trivial (in performance and disk space) for any quality source control system. Once your build is tagged, unless you need the binaries, there really isn't any benefit to having them around, because you can simply re-compile from source when necessary.
Most CI tools let you tag on each successful build. This can become problematic for some systems as you can easily have 100+ tags a day. For such cases I recommend still running a nightly build and only tagging that.
Here are some artifacts/pieces of information that I usually keep for each build:
The tag name of the snapshot you are building (tag and do a clean checkout before you build)
The build scripts themselves, or their version number (if you treat them as a separate project with its own version control)
The output of the build script: logs and final product
A snapshot of your environment:
compiler version
build tool version
library and dll/lib versions
database version (client & server)
ide version
script interpreter version
OS version
source control version (client and server)
versions of other tools used in the process, and everything else that might influence the content of your build products. I usually do this with a script that queries all this information and logs it to a text file that is stored with the other build artifacts (a sketch of such a script follows this list).
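A minimal sketch of such a snapshot script, assuming a Unix shell and that these particular tools are the ones that matter for your build:

    #!/bin/sh
    # record everything that could influence the build output
    {
        date
        uname -a                  # OS version
        cc --version | head -1    # compiler
        make --version | head -1  # build tool
        svn --version | head -1   # source control client
    } > build-environment.txt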
Ask yourself this question: "if something destroys entirely my build/development environment what information would I need to create a new one so I can redo my build #6547 and end up with the exact same result I got the first time?"
Your answer is what you should keep at each build and it will be a subset or superset of the things I already mentioned.
You can store everything in your SCM (I'd recommend a separate repository), but in that case your question of how long to keep the items loses its point. Alternatively, store it in zipped folders or burn a CD/DVD with the build result and artifacts. Whatever you choose, keep a backup copy.
You should store them for as long as you might need them. How long will depend on your development team's pace and your release cycle.
And no, I don't think it changes if you do continuous integration.
This isn't a direct answer to your question, but don't forget to version control the nightly build setup itself. When the project structure changes, you may have to change the build process, which will break older builds from that point on.
In addition to the binaries, as everyone else has mentioned, I would recommend setting up a symbol server and a source server and making sure you get the correct information out and into those. It will aid debugging tremendously.
We save the binaries, stripped and unstripped (so we have the exact same binary, once with and once without debug symbols). Further, we build everything twice, once with debug output enabled and once without (again, stripped and unstripped, so every build results in 4 binaries). The build is stored in a directory named after the SVN revision number. That way we can always retrieve the source from the SVN repository by simply checking out that very revision (so the source is archived as well).
A surprising one I learned about recently: If you're in an environment that might be audited you'll want to save all the output of your build, the script output, the compiler output, etc.
That's the only way you can verify your compiler settings, build steps, etc.
Also, how long to save them for, and where to save them?
Save them until you know that build won't be going to production - in other words, as long as you have the compiled bits around.
One logical place to save them is your SCM system. Another option is to use a tool that will automatically save them for you, like AnthillPro and its ilk.
We're doing something close to "embedded" development here, and I can tell you what we save:
the SVN revision number and timestamp, as well as the machine it was built on and by whom (also burned into the build binaries)
a full build log, showing whether it was a full/incremental build, any interesting (STDERR) output the data baking tools produced, a list of files compiled and any compiler warnings (this compresses very well, being text)
the actual binaries (for anywhere from 1-8 build configurations)
files produced as a side effect of linking: a linker command file, address map, and a sort of "manifest" file indicating what was burned into the final binaries (CRC and size for each), as well as the debugging database (.pdb equivalent)
We also mail out the result of running some tools over the "side-effect" files to interested users. We don't actually archive these since we can reproduce them later, but these reports include:
total and delta of filesystem size, broken down by file type and/or directory
total and delta of code section sizes (.text, .data, .rodata, .bss, .sinit, etc)
When we have unit tests or functional tests (e.g. smoke tests) running, those results show up in the build log.
We've not thrown anything out yet - granted, our target builds usually end up at ~16 or 32 MiB per configuration, and they're fairly compressible.
We do keep uncompressed copies of the binaries around for 1 week for ease of access; after that we keep only the lightly compressed version. About once a month we have a script that extracts each .zip that the build process produces and 7-zips a whole month of build outputs together (which takes advantage of there being only small differences per build; roughly as sketched below).
An average day might have a dozen or two builds per project... The build server wakes up about every 5 minutes to check for relevant differences and builds. A full .7z for one month of a large, very active project might be 7-10 GiB, but it's certainly affordable.
For the most part, we've been able to diagnose everything this way. Occasionally there's a hiccup in the build system and a file isn't actually at the revision it's supposed to be when a build happens, but there's usually enough evidence of this in the logs. Sometimes we have to dig out a tool that understands the debugging database format and feed it a few addresses to diagnose a crash (we have automatic stack dumps built into the product). But usually all the information needed is there.
We haven't actually had to crack open the .7z archives yet, I should mention. But we have the info there, and I have some interesting ideas on how to mine bits of useful data from them.
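That monthly re-compression script might look roughly like this (directory layout and dates are illustrative):

    #!/bin/sh
    # unpack each daily build .zip for the month, then pack them all
    # into one solid .7z so the per-build redundancy compresses away
    mkdir -p month
    for z in builds/2015-06-*.zip; do
        unzip -q "$z" -d month/
    done
    7z a -mx=9 builds-2015-06.7z month/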
Save what can't be reproduced easily. I work on FPGAs, where only the FPGA team has the tools and some cores (libraries) of the design are licensed to compile on only one machine. So we save the output bitstreams. But we try to check them in over one another rather than keeping date/time/version-stamped copies.
Save as in check in to source code control, or just on disk? Save nothing to source code control: all derived files should be visible in the file system and available to developers. Don't check in binaries, code generated from XML files, message digests, etc. A separate packaging step will make these end products available. As you have the change number, you can always reproduce the build if necessary - assuming, of course, that everything you need to do a build is completely in the tree and available to all builds by syncing.
I would save your built binaries for exactly as long as they have a chance to go into production or be used by some other team (like a QA group). Once something has left production, what you do with it can vary a lot. For a lot of teams, they'll keep just their most recent prior build around (for rollback) and otherwise discard their builds.
Others have regulatory requirements to keep anything that went into production around for as long as seven years (banks). If you are a product company, I'd keep around any binary a customer might have installed in case a tech support guy wants to install the same version.
