Why does my Qt application lock up when I use QProcess or popen? - qt

My Qt application is locking up with high cpu usage after running correctly for a few hours, and I'm trying to figure out why. This is on an embedded linux system.
The first thing I did was attach gdb and look at a stack backtrace. It shows the lockup occurring in ptmalloc_lock_all(), which is called by fork(). The application uses system() to play a video, and periodically uses QProcess to check whether a USB drive is mounted.
I don't intentionally use multiple threads in my application, but gdb shows 4 threads at the point where the freeze occurs. I've included the backtrace for all 4 threads:
(gdb) thread apply all backtrace
Thread 4 (Thread 995):
#0 0x4282dc50 in select () from /opt/filesys/fs/lib/libc.so.6
#1 0x4246a768 in qt_safe_select(int, fd_set*, fd_set*, fd_set*, timeval const*) ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
#2 0x4246f2a8 in QEventDispatcherUNIXPrivate::doSelect(QFlags<QEventLoop::ProcessEventsFlag>, timeval*) ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
#3 0x4246f6e0 in QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
#4 0x42435898 in QEventLoop::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
#5 0x42435bac in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
#6 0x42317834 in QThread::exec() ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
#7 0x4231aaf8 in ?? ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
Cannot access memory at address 0x0
#8 0x4231aaf8 in ?? ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
Cannot access memory at address 0x0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 3 (Thread 973):
#0 0x425c64bc in pthread_cond_wait##GLIBC_2.4 () from /opt/filesys/fs/lib/libpthread.so.0
#1 0x413f8688 in ?? ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtWebKit.so.4
Cannot access memory at address 0x0
#2 0x413f8688 in ?? ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtWebKit.so.4
Cannot access memory at address 0x0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 2 (Thread 949):
#0 0x4282dc50 in select () from /opt/filesys/fs/lib/libc.so.6
#1 0x4240c16c in ?? ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
Cannot access memory at address 0x3054
#2 0x4240c16c in ?? ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
Cannot access memory at address 0x3054
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 1 (Thread 720):
#0 0x427e1194 in ptmalloc_lock_all () from /opt/filesys/fs/lib/libc.so.6
#1 0x428041c4 in fork () from /opt/filesys/fs/lib/libc.so.6
#2 0x4240fce8 in ?? ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
#3 0x4240fce8 in ?? ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I'm not very experienced with multithreaded programming, but from reading online it sounds like it can be dangerous to use fork() in a multithreaded application. I'm not sure what to do as an alternative though.
Any input would be greatly appreciated!
Marlon

Related

Should LWP still be running after Thread.join() returns?

I have a hung python (CPython 3.6) process. From a gdb backtrace, a forked process waits indefinitely for a mutex held by a LWP/Thread which no longer exists. After some debugging, I can see that after join() returns on the blocking Thread, its LWP is still active, with its stack unwinding with a backtrace like this:
#0 dl_open_worker (a=a#entry=0x7fffefb0eb90) at dl-open.c:515
#1 0x00007ffff7b4b2df in __GI__dl_catch_exception (exception=0x7fffefb0eb70, operate=0x7ffff7de9dc0 <dl_open_worker>, args=0x7fffefb0eb90) at dl-error-skeleton.c:196
#2 0x00007ffff7de97ca in _dl_open (file=0x7ffff77d9bc0 "libgcc_s.so.1", mode=-2147483646, caller_dlopen=0x7ffff77d7deb <pthread_cancel_init+43>, nsid=<optimised out>, argc=9, argv=<optimised out>, env=0x7fffffffe318) at dl-open.c:605
#3 0x00007ffff7b4a3ad in do_dlopen (ptr=ptr#entry=0x7fffefb0edc0) at dl-libc.c:96
#4 0x00007ffff7b4b2df in __GI__dl_catch_exception (exception=exception#entry=0x7fffefb0ed60, operate=operate#entry=0x7ffff7b4a370 <do_dlopen>, args=args#entry=0x7fffefb0edc0) at dl-error-skeleton.c:196
#5 0x00007ffff7b4b36f in __GI__dl_catch_error (objname=objname#entry=0x7fffefb0edb0, errstring=errstring#entry=0x7fffefb0edb8, mallocedp=mallocedp#entry=0x7fffefb0edaf, operate=operate#entry=0x7ffff7b4a370 <do_dlopen>, args=args#entry=0x7fffefb0edc0) at dl-error-skeleton.c:215
#6 0x00007ffff7b4a4d9 in dlerror_run (args=0x7fffefb0edc0, operate=0x7ffff7b4a370 <do_dlopen>) at dl-libc.c:46
#7 __GI___libc_dlopen_mode (name=name#entry=0x7ffff77d9bc0 "libgcc_s.so.1", mode=mode#entry=-2147483646) at dl-libc.c:195
#8 0x00007ffff77d7deb in pthread_cancel_init () at ../sysdeps/nptl/unwind-forcedunwind.c:52
#9 0x00007ffff77d7fd4 in _Unwind_ForcedUnwind (exc=0x7fffefb0fd70, stop=stop#entry=0x7ffff77d5d80 <unwind_stop>, stop_argument=0x7fffefb0ef10) at ../sysdeps/nptl/unwind-forcedunwind.c:126
#10 0x00007ffff77d5f10 in __GI___pthread_unwind (buf=<optimised out>) at unwind.c:121
#11 0x00007ffff77cdae5 in __do_cancel () at pthreadP.h:297
#12 __pthread_exit (value=<optimised out>) at pthread_exit.c:28
#13 0x00007ffff7b14504 in __pthread_exit (retval=<optimised out>) at forward.c:173
#14 0x00000000006383c5 in PyThread_exit_thread () at ../Python/thread_pthread.h:300
#15 0x00000000005e5f0f in t_bootstrap () at ../Modules/_threadmodule.c:1030
#16 0x0000000000638084 in pythread_wrapper (arg=<optimised out>) at ../Python/thread_pthread.h:205
#17 0x00007ffff77cc6db in start_thread (arg=0x7fffefb0f700) at pthread_create.c:463
#18 0x00007ffff7b0588f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Should this LWP be there at all or should it terminate before join()? If the latter, am I looking at a CPython bug?
From inspection of the source code, it seems that this is expected. From https://github.com/python/cpython/blob/master/Modules/_threadmodule.c#L1028, when a thread has finished its work, these two calls are made:
PyThreadState_DeleteCurrent();
PyThread_exit_thread();
The first ends up releasing the lock which allows join() to return; the second results in the LWP unwinding its stack.
Debugging a trivial test python script which starts a Thread, join()s it and then sends a signal to itself (which provides a convenient breakpoint which gdb can understand) shows a similar backtrace to the one in the original post, which further supports this conclusion.

Qt 4.8.4 application crashes on Windows Vista while quitting

I have a Qt application which intermittently crashes while exiting on a Windows Vista machine (it works fine on Mac & Windows XP, need to try more on other Windows versions). I am doing a QApplication cleanup on exit and getting the following stack trace on crash. Please let me know if anyone gets any clue what I might be doing wrong.
Program received signal SIGSEGV, Segmentation fault.
[Switching to thread 876.0x15f0]
0x69dfb912 in QEventDispatcherWin32Private::unregisterTimer (this=0x14cdf9b8,
t=0x1a1f6868, closingDown=true) at kernel/qeventdispatcher_win.cpp:620
(gdb) bt
#0 0x69dfb912 in QEventDispatcherWin32Private::unregisterTimer (
this=0x14cdf9b8, t=0x1a1f6868, closingDown=true)
at kernel/qeventdispatcher_win.cpp:620
#1 0x69dfd479 in QEventDispatcherWin32::closingDown (this=0x14cdf998)
at kernel/qeventdispatcher_win.cpp:1115
#2 0x69cd7ff2 in QThreadPrivate::finish (arg=0x14cddb48, lockAnyway=true)
at thread/qthread_win.cpp:376
#3 0x69cd7ec8 in QThreadPrivate::start (arg=0x14cddb48)
at thread/qthread_win.cpp:348
#4 0x766f2599 in wcstombs () from C:\Windows\system32\msvcrt.dll
#5 0x766f26b3 in msvcrt!_beginthreadex () from C:\Windows\system32\msvcrt.dll
#6 0x7624d309 in KERNEL32!AcquireSRWLockExclusive ()
from C:\Windows\system32\kernel32.dll
#7 0x774b16c3 in ntdll!RtlInitializeNtUserPfn ()
from C:\Windows\system32\ntdll.dll
#8 0x774b1696 in ntdll!RtlInitializeNtUserPfn ()
from C:\Windows\system32\ntdll.dll
#9 0x00000000 in ?? ()

XCode 4.2.1 + Enable Guard Malloc -> immediate crash?

Anybody know how to solve this? With Xcode 4.2.1, turning on Enable Guard Malloc and running on iPhone Simulator 5, the app immediately crashes, and this is the stack trace:
#0 0x00000000 in <????> ()
#1 0x91594ef3 in mig_get_reply_port ()
#2 0x9158e70c in mach_ports_lookup ()
#3 0x031f0124 in _xpc_domain_init_local ()
#4 0x031edeb1 in _libxpc_initializer ()
#5 0x8fe5d15b in __dyld__ZN16ImageLoaderMachO18doModInitFunctionsERKN11ImageLoader11LinkContextE ()
#6 0x8fe5ccc0 in __dyld__ZN16ImageLoaderMachO16doInitializationERKN11ImageLoader11LinkContextE ()
#7 0x8fe5a220 in __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEjRNS_21InitializerTimingListE ()
#8 0x8fe5a1b6 in __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEjRNS_21InitializerTimingListE ()
#9 0x8fe5a1b6 in __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEjRNS_21InitializerTimingListE ()
#10 0x8fe5a1b6 in __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEjRNS_21InitializerTimingListE ()
#11 0x8fe5a1b6 in __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEjRNS_21InitializerTimingListE ()
#12 0x8fe5b1c0 in __dyld__ZN11ImageLoader15runInitializersERKNS_11LinkContextERNS_21InitializerTimingListE ()
#13 0x8fe4f626 in __dyld__ZN4dyld24initializeMainExecutableEv ()
#14 0x8fe53ef2 in __dyld__ZN4dyld5_mainEPK12macho_headermiPPKcS5_S5_ ()
#15 0x8fe4d2ef in __dyld__ZN13dyldbootstrap5startEPK12macho_headeriPPKclS2_ ()
#16 0x8fe4d063 in __dyld__dyld_start ()
You san see this answer: Application crashes on simulator 5.0 before reaching main.m
However, the linked solution didn't work for me. I had exactly the same diagnostic and absence of error message, but I found no "weak"-related issues in my project.pbxproj
However, I found that the cause of my problem was a deadlock in an +(void)initialize method.
More precisely, in this method I was calling dispatch_sync(dispatch_get_main_queue(), ^{[some block code]}). Changing this to a dispatch_async (note the "a") solved my problem.
The way I discovered the issue was accidental. While nothing seemed to happen, the "thread" navigator of Xcode was telling me that the app itself wasn't crashed. And I accidentaly clicked on "pause" in the debugger. And suddenly it stopped exactly where the deadlock was.
Enjoy.

Xcode 4 stack trace debugging issue

For a while now (I can't remember exactly which version) Xcode 4 has not been working properly. In that whenever my code crashes the debugger just shows me the main() function and there is a stack trace like this:
#0 0x9018b9c6 in __pthread_kill ()
#1 0x90105f78 in pthread_kill ()
#2 0x900f6bdd in abort ()
#3 0x03c93e78 in dyld_stub__Unwind_DeleteException ()
#4 0x03c9189e in default_terminate() ()
#5 0x0154df4b in _objc_terminate ()
#6 0x03c918de in safe_handler_caller(void (*)()) ()
#7 0x03c91946 in __cxa_bad_typeid ()
#8 0x03c92b3e in __cxa_current_exception_type ()
#9 0x0154de49 in objc_exception_rethrow ()
#10 0x012f2e10 in CFRunLoopRunSpecific ()
#11 0x012f2ccb in CFRunLoopRunInMode ()
#12 0x012a5879 in GSEventRunModal ()
#13 0x012a593e in GSEventRun ()
#14 0x00013a9b in UIApplicationMain ()
#15 0x00002a02 in main at /Users/dan/Dev/Container/Container/main.m:16
In the console I get some more meaningful information, in this case it tells me:
2011-11-09 10:39:53.886 Container[27273:f803] *** Terminating app due to uncaught exception 'NSRangeException', reason: '*** -[__NSArrayI objectAtIndex:]: index 1 beyond bounds for empty array'
In this example I know what the problem is because I made the error (I'm trying to access an object in an empty NSArray) on purpose to try and get to the bottom of this issue. However, I can't figure out what is going wrong. I've lived with it up until now as lot of the bugs I've had I've been familiar with and knew where to look anyway but still, it's becoming a real pain.
Can anyone please shed some light on this issue?
You probably want to stop at the point in code when the exception actually occurs, not when it has been bubbled up to your main method. To do so, set an exception breakpoint as explained here: Exception Breakpoint in Xcode

xlib/ xcb deadlock or block

I’ve a program developed using xlib and cairo. Just for the reference I do mix calls between cairo and xlib, although I’m not sure If that might be the cause of the error.
I get a deadlock or a block in some situations.
I’ve three threads that work with xlib. One is the main UI thread which makes calls to both xlib and cairo, another uses it just to send a XClientMessage and third makes some xlib calls like XCopyArea and at the end send an XClientMessage (those are for some animations).
I’ve called InitThreads in the beginning of the program. I’ve also guarded all xlib calls with XLockDisplay (cairo calls are also guarded with XLockDisplay).
I’m using ubuntu 10.10.
The stack traces are:
(gdb) thread 1
0 in __kernel_vsyscall ()
1 in poll () from /lib/tls/i686/cmov/libc.so.6
2 in ?? () from /usr/lib/libxcb.so.1
3 in ?? () from /usr/lib/libxcb.so.1
4 in xcb_writev () from /usr/lib/libxcb.so.1
5 in _XSend () from /usr/lib/libX11.so.6
6 in _XEventsQueued () from /usr/lib/libX11.so.6
7 in XPending () from /usr/lib/libX11.so.6
(gdb) thread 6
0 in __kernel_vsyscall ()
1 in __lll_lock_wait () from
/lib/tls/i686/cmov/libpthread.so.0
2 in _L_lock_752 () from /lib/tls/i686/cmov/libpthread.so.0
3 in pthread_mutex_lock () from /lib/tls/i686/cmov/libpthread.so.0
4 in ?? () from /usr/lib/libX11.so.6
5 in XLockDisplay () from /usr/lib/libX11.so.6
(gdb) thread 7
0 in __kernel_vsyscall ()
1 in __lll_lock_wait () from /lib/tls/i686/cmov/libpthread.so.0
2 in _L_lock_752 () from /lib/tls/i686/cmov/libpthread.so.0
3 in pthread_mutex_lock () from /lib/tls/i686/cmov/libpthread.so.0
4 in ?? () from /usr/lib/libX11.so.6
5 in XLockDisplay () from /usr/lib/libX11.so.6
Where thread 1 is the main ui thread, currently calling XPending (it has already called XLockDisplay) in an event loop, thead 7 is the thread that just sends XClientMessage and thread 6 is the thread that has made some calls to XCopyArea and is now about to make a call to XSendMessage (it is waiting along with thread 7 for thread 1 to finish). But thread 1 never seem to return from poll.
I’m not sure it is relevant (I’m by no means an expert on linux or libc), but I’ve another thread waiting in poll (it’s a thread for TCP/IP network communication)
(gdb) thread 2
0 in __kernel_vsyscall ()
1 in poll () from /lib/tls/i686/cmov/libc.so.6
Has anyone experience similar deadlock/block? Can this be a bug in xcb and is it worth trying to compile xlib without xcb?
Thanks
I just was running into an issue that had 1 in __lll_lock_wait () from
as a troublemaker as well. That was in an I/O portion of my code, perhaps your problem is there?

Resources