Intel PIN: print backtrace when segfault happens in tool - intel-pin

I'm developing a tool for Intel PIN. At some point during execution it fails with the error below. Is there a way to tell PIN to print a backtrace, or to let me handle the segfault in the tool itself?
I'm running my tool with MPI and it crashes when I insert values into an unordered map.
C: Tool (or Pin) caused signal 11 at PC 0x2b09594533cb
mpirun -np 44 pin-3.7-97619-g0d0c92f4f-gcc-linux/pin -follow_execv -t pin-3.7-97619-g0d0c92f4f-gcc-linux/source/tools/Simp ... -- program

You can register a handler with the following API:
PIN_AddInternalExceptionHandler()
The handler receives an EXCEPTION_INFO structure, which you inspect through Pin's exception API.
Otherwise, you can also debug your tool from within a debugger, by launching your tool with the -pause_tool 20 option. Then you have 20 seconds to attach your debugger to the process. Once attached, the debugger stops (at least with Visual Studio) and lets you set the breakpoints you need in your tool's code.
This is not that easy to debug though, as the whole system constantly switches between pintool code, Pin itself, and the target application. Hence there is no continuous sequence of steps inside your pintool code that you can follow, as you would expect when debugging "classic single-threaded applications".
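A minimal sketch of such a handler, assuming the Pin 3.x C++ tool API from pin.H. The handler name OnToolException is made up for illustration; PIN_ExceptionToString() and the EHR_UNHANDLED return value are taken from Pin's exception API as I understand it, so check them against the API reference shipped with your kit. It only reports the exception (type and faulting address). It does not walk the stack, so for a real backtrace you would still attach a debugger as described above.

```cpp
#include "pin.H"
#include <iostream>

// Hypothetical handler: called when Pin or the tool itself raises an
// exception (for example the signal 11 from the question).
static EXCEPT_HANDLING_RESULT OnToolException(THREADID tid,
                                              EXCEPTION_INFO* info,
                                              PHYSICAL_CONTEXT* physCtxt,
                                              VOID* v)
{
    // Print a human-readable description of the exception.
    std::cerr << "[tool] internal exception in thread " << tid << ": "
              << PIN_ExceptionToString(info) << std::endl;

    // We only log; returning EHR_UNHANDLED lets Pin continue with its
    // default handling (which terminates the process).
    return EHR_UNHANDLED;
}

int main(int argc, char* argv[])
{
    if (PIN_Init(argc, argv)) return 1;

    // Register the handler before the application starts running.
    PIN_AddInternalExceptionHandler(OnToolException, 0);

    // ... register instrumentation callbacks here ...

    PIN_StartProgram(); // never returns
    return 0;
}
```

With MPI, every rank writes to its own stderr, so it may help to include the rank or process ID in the message.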

Related

How does ARM find my relocated vector table?

I'm using an NXP Kinetis K64 ARM Cortex M4 MCU. I successfully altered the linker configuration file to move my vector table to address 0x8000 (instead of the 0x0000 default). When I tell the CodeWarrior 10.6 debugger to break at the start of the code, it stops at the top of the boot.S file as expected. But it dawned on me, HOW did the MCU/debugger find the code since the flash memory is empty (0xFF) from address 0x0000 to 0x7FFF and the VTOR register shows as 0x0?!
I looked through the datasheets of both the ARM M4 core and the NXP K64, but they don't answer this scenario.
It is probably due to the settings of your Debug Configurations in CodeWarrior. In the Debugger tab, if "Initialized program counter at" is ticked, the debugger sets the program counter at reset to the address of "the top of the boot.S file", which is the program entry point. The normal sequence of finding the vector table is skipped.
Your program will not run without the debugger.
Further details about the CodeWarrior Debugger can be found here
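To illustrate the "normal sequence" that the debugger skips: out of reset a Cortex-M core has VTOR = 0, loads the initial stack pointer from the word at VTOR + 0 and the reset handler address from the word at VTOR + 4. The sketch below simulates that fetch on a host PC against a fake flash image. The image contents (stack top 0x20030000, reset handler 0x00008411) and all function names are made-up values for illustration, not taken from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ResetEntry {
    std::uint32_t initial_sp;     // word at VTOR + 0
    std::uint32_t reset_handler;  // word at VTOR + 4
};

// Reads a little-endian 32-bit word from the simulated flash image.
static std::uint32_t read_word(const std::vector<std::uint8_t>& flash, std::size_t addr) {
    assert(addr + 4 <= flash.size());
    return static_cast<std::uint32_t>(flash[addr]) |
           (static_cast<std::uint32_t>(flash[addr + 1]) << 8) |
           (static_cast<std::uint32_t>(flash[addr + 2]) << 16) |
           (static_cast<std::uint32_t>(flash[addr + 3]) << 24);
}

// Writes a little-endian 32-bit word into the simulated flash image.
static void write_word(std::vector<std::uint8_t>& flash, std::size_t addr, std::uint32_t value) {
    assert(addr + 4 <= flash.size());
    for (int i = 0; i < 4; ++i) {
        flash[addr + i] = static_cast<std::uint8_t>((value >> (8 * i)) & 0xFFu);
    }
}

// What the core fetches when it starts from a vector table at 'vtor'.
ResetEntry fetch_reset_entry(const std::vector<std::uint8_t>& flash, std::uint32_t vtor) {
    return ResetEntry{read_word(flash, vtor), read_word(flash, vtor + 4u)};
}

// Hypothetical image: erased flash (0xFF) with the vector table relocated to 0x8000.
std::vector<std::uint8_t> make_example_flash() {
    std::vector<std::uint8_t> flash(0x10000, 0xFF);
    write_word(flash, 0x8000, 0x20030000u);  // assumed top of stack
    write_word(flash, 0x8004, 0x00008411u);  // assumed reset handler (Thumb bit set)
    return flash;
}
```

Calling fetch_reset_entry(flash, 0x0) on this image yields 0xFFFFFFFF for both words, which is what the bare MCU would see at power-up, while fetch_reset_entry(flash, 0x8000) yields the real entries. That is why the image only starts when something (the debugger here, or a bootloader at address 0) sets SP and PC on its behalf.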

Collecting an MPI Trace

How can I collect an MPI communication trace on Supercomputers?
I need text files with details of each message (say sender, receiver, size, etc.) that I can parse.
I used the following command with Intel MPI, but I do not see any text files.
mpirun -trace -n 4 -trace-pt2pt -trace-collectives ./myApp
I am not familiar with Intel MPI's integrated solution.
There are a number of tools that provide MPI tracing.
Performance focussed:
Score-P (Fileformat OTF2)
TAU
Extrae
Correctness checking:
MUST
I recommend not rolling your own solution, because it's not straightforward to match receives to sends, and you might run into timing issues because timers are not synchronized across nodes.
You could e.g. trace a run using Score-P, and then use the otf2-print command on the trace to get the text output you wanted. Or you can use the OTF2 reader library and develop a tool on top of it. Here is a short tutorial on how to run Score-P, starting at slide 17

Process stop getting network data

We have a process (written in C++ / managed) that receives network data over TCP/IP.
After the process has run for a while (we were tracking network load), the network seems to freeze and the process stops receiving data. Other processes on the system that use the network through the same NIC keep operating normally.
The process gets out of this frozen state by itself after several minutes.
Any idea what is happening?
Is there any counter I can track to see whether my process is hitting some limit?
It is going to be very difficult to answer specifically:
-- without knowing what exactly your process/application does,
-- whether it is a network chat application, a file server/client, or ...
-- without other details about how your process is implemented and what libraries it uses, if relevant to the problem.
You also haven't mentioned what OS and environment you are running this process under, so there is very little anyone can do to help. It could be anything: a busy-wait loop in your code, locking problems if it's multi-threaded code, ...
Nonetheless, here are some options to check:
If it's Linux, try the commands below to debug and monitor the behaviour of the process and see what the problem could be:
top
Check top to see how much of each resource (CPU, memory) your process is using and whether its CPU usage is abnormally high.
pstack
This should show the stack frames of the process as it is executing at the time of the problem.
netstat
Run this with the necessary options (tcp/udp) to check the state of the network sockets opened by your process.
gcore -s -c
This forces your process to dump core when the problem happens, so that you can then analyze the core file using gdb.
gdb
Then use the where command at the gdb prompt to get a full backtrace of the process (which functions it was executing last, and the calls that led there).
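On the "any counter I can track" part, for Linux only: the per-socket queue sizes that netstat shows as Recv-Q and Send-Q are also exposed in /proc/net/tcp, so they can be polled from code. The sketch below parses one IPv4 row of that file. The struct and function names are made up for illustration, and the idea that a steadily growing receive queue means "the process has stopped reading" is a heuristic, not a diagnosis.

```cpp
#include <cstdio>
#include <string>

struct TcpSocketRow {
    unsigned local_port;
    unsigned remote_port;
    unsigned state;     // e.g. 0x01 = ESTABLISHED, 0x0A = LISTEN
    unsigned tx_queue;  // bytes queued for sending
    unsigned rx_queue;  // bytes received but not yet read by the process
};

// Parses one data row of /proc/net/tcp (IPv4), for example:
//   "   1: 0100007F:1F90 0200007F:C350 01 00000000:00001A2B ..."
// Returns false for the header line or any malformed row.
bool parse_proc_net_tcp_row(const std::string& line, TcpSocketRow& out) {
    unsigned local_addr = 0;
    unsigned remote_addr = 0;
    // All numeric fields after the slot index are hexadecimal.
    int fields = std::sscanf(line.c_str(), " %*d: %x:%x %x:%x %x %x:%x",
                             &local_addr, &out.local_port,
                             &remote_addr, &out.remote_port,
                             &out.state, &out.tx_queue, &out.rx_queue);
    return fields == 7;
}
```

Reading the file once a second and logging rx_queue for the socket on your port shows whether data is still arriving while the process is "frozen" (queue grows) or whether nothing reaches the socket at all (queue stays at zero).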

which signal does gdb send when attaching to a process?

Which signal does gdb send when attaching to a process? Does this work the same on different UNIXes, e.g. Linux and Mac OS X?
So far I have only found out that SIGTRAP is used to implement breakpoints. Is it used for attaching as well?
AFAIK gdb does not send SIGTRAP to attach. It suspends the "inferior" by calling ptrace (on Linux, PTRACE_ATTACH stops the target with a SIGSTOP as part of that call, see man ptrace). It also reads the debugged process's memory and registers through ptrace calls, and it can request single-stepping of instructions (provided that is implemented on that port of Linux), etc.
Software breakpoints are implemented by placing, at the right location, an instruction that triggers a "trap" or something similar when reached; the debugged process can run at full speed until then.
Also (next to reading man ptrace, as already mentioned) see the ptrace explanation on Wikipedia.
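For Linux specifically, you can observe what an attach looks like from the tracer's side with a few lines of code. The sketch below forks a child, attaches to it with PTRACE_ATTACH, and returns the signal that waitpid() reports as having stopped it. The function name is made up for illustration, and this says nothing about Mac OS X, where gdb's attach path is different.

```cpp
#include <csignal>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child, attaches to it the way a debugger would, and returns the
// signal that stopped the child. Returns -1 if anything fails.
int stop_signal_after_attach() {
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        for (;;) pause();  // child: sleep until it is killed
    }

    int result = -1;
    if (ptrace(PTRACE_ATTACH, child, nullptr, nullptr) == 0) {
        int status = 0;
        // The tracer learns about the stop through waitpid().
        if (waitpid(child, &status, 0) == child && WIFSTOPPED(status)) {
            result = WSTOPSIG(status);
        }
        ptrace(PTRACE_DETACH, child, nullptr, nullptr);
    }

    // Clean up the child regardless of the outcome.
    kill(child, SIGKILL);
    waitpid(child, nullptr, 0);
    return result;
}
```

On Linux the returned value should be SIGSTOP, which matches the ptrace man page's description of PTRACE_ATTACH. SIGTRAP only shows up later, when a breakpoint or single-step is hit.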

LabVIEW blocking Qt signals?

I have a LabVIEW 8.6 program that is using a DLL written in Qt; the DLL listens to a TCP port for incoming messages and updates some internal data. My LabVIEW program calls into the DLL occasionally to read the internal data. The DLL works perfectly (i.e., receives data from the TCP port) with another Qt program. However, it does not work at all with my LabVIEW program.
I've attached a debugger to the DLL and can see calls from LabVIEW going into it -- my function for getting the internal data is being called and I can step through it. The code that gets the data from the TCP is never called though; it looks like the signal for incoming data on the TCP port is never triggered.
I know this sounds like a Qt issue but the DLL works perfectly with another Qt program. Unfortunately, it fails miserably with LabVIEW.
One theory:
The event loop is not running when LabVIEW calls the DLL
In the Qt DLL's run() function, I call socket->waitForDisconnected(). Perhaps the DLL is not processing incoming events because the event loop is not running? If I call exec() to start the event loop, LabVIEW crashes ("LabVIEW 8.6 Development System has encountered a problem and needs to close."):
AppName: labview.exe AppVer: 8.6.0.4001 ModName: qtcored4.dll
ModVer: 4.5.1.0 Offset: 001af21a
Perhaps when I call the DLL from another Qt program, that program's event loop is allowing for the TCP signal to be seen by the DLL. Unfortunately, kicking off the event loop in the DLL takes down LabVIEW.
Any thoughts on how to keep signals running in the DLL when LabVIEW is the calling program?
EDIT Debug trace of the exec() call:
QThread::exec() -> eventLoop.exec() -> if (qApp->thread() == thread())
in the call to
QObject::thread() {
return d_func()->threadData->thread;
}
The macro Q_DECLARE_PRIVATE(QObject), the second call, triggers the crash.
EDIT 17 Aug 2009: Status update
After two days of trying various ways to get this to work I decided to implement a TCP listener directly in LabVIEW. My LabVIEW application sends data out via the DLL and receives data in via TCP. All is working well.
This question was cross-posted on http://forums.ni.com/ni/board/message?board.id=170&thread.id=431779
You should change the library call to 'run in any thread'; that way the UI thread can still run the event loop.
Can you debug through exec() to see where it crashes LabVIEW?
You can also set debugging to the maximum level in LabVIEW, on the configuration page for the Call Library Node.
LabVIEW is finicky with DLLs. It may be easier to run the DLL as a service (write a service that runs the event loop), and then have LabVIEW call a DLL that retrieves the data from the service.
Old NI help note
Just a shot in the dark...could the Qt data be clobbering some of the LV memory space immediately after the exec loop is started?
You probably don't have a QApplication object created when you are trying to call exec() in the QThread. This might be causing your crash. For the main problem, however, I would say that it is very likely you aren't getting any activity in the DLL due to the event loop not executing.
