Should LWP still be running after Thread.join() returns? - cpython

I have a hung Python (CPython 3.6) process. According to a gdb backtrace, a forked process is waiting indefinitely for a mutex held by an LWP/thread that no longer exists. After some debugging, I can see that after join() returns on the blocking Thread, its LWP is still active and unwinding its stack, with a backtrace like this:
#0 dl_open_worker (a=a@entry=0x7fffefb0eb90) at dl-open.c:515
#1 0x00007ffff7b4b2df in __GI__dl_catch_exception (exception=0x7fffefb0eb70, operate=0x7ffff7de9dc0 <dl_open_worker>, args=0x7fffefb0eb90) at dl-error-skeleton.c:196
#2 0x00007ffff7de97ca in _dl_open (file=0x7ffff77d9bc0 "libgcc_s.so.1", mode=-2147483646, caller_dlopen=0x7ffff77d7deb <pthread_cancel_init+43>, nsid=<optimised out>, argc=9, argv=<optimised out>, env=0x7fffffffe318) at dl-open.c:605
#3 0x00007ffff7b4a3ad in do_dlopen (ptr=ptr@entry=0x7fffefb0edc0) at dl-libc.c:96
#4 0x00007ffff7b4b2df in __GI__dl_catch_exception (exception=exception@entry=0x7fffefb0ed60, operate=operate@entry=0x7ffff7b4a370 <do_dlopen>, args=args@entry=0x7fffefb0edc0) at dl-error-skeleton.c:196
#5 0x00007ffff7b4b36f in __GI__dl_catch_error (objname=objname@entry=0x7fffefb0edb0, errstring=errstring@entry=0x7fffefb0edb8, mallocedp=mallocedp@entry=0x7fffefb0edaf, operate=operate@entry=0x7ffff7b4a370 <do_dlopen>, args=args@entry=0x7fffefb0edc0) at dl-error-skeleton.c:215
#6 0x00007ffff7b4a4d9 in dlerror_run (args=0x7fffefb0edc0, operate=0x7ffff7b4a370 <do_dlopen>) at dl-libc.c:46
#7 __GI___libc_dlopen_mode (name=name@entry=0x7ffff77d9bc0 "libgcc_s.so.1", mode=mode@entry=-2147483646) at dl-libc.c:195
#8 0x00007ffff77d7deb in pthread_cancel_init () at ../sysdeps/nptl/unwind-forcedunwind.c:52
#9 0x00007ffff77d7fd4 in _Unwind_ForcedUnwind (exc=0x7fffefb0fd70, stop=stop@entry=0x7ffff77d5d80 <unwind_stop>, stop_argument=0x7fffefb0ef10) at ../sysdeps/nptl/unwind-forcedunwind.c:126
#10 0x00007ffff77d5f10 in __GI___pthread_unwind (buf=<optimised out>) at unwind.c:121
#11 0x00007ffff77cdae5 in __do_cancel () at pthreadP.h:297
#12 __pthread_exit (value=<optimised out>) at pthread_exit.c:28
#13 0x00007ffff7b14504 in __pthread_exit (retval=<optimised out>) at forward.c:173
#14 0x00000000006383c5 in PyThread_exit_thread () at ../Python/thread_pthread.h:300
#15 0x00000000005e5f0f in t_bootstrap () at ../Modules/_threadmodule.c:1030
#16 0x0000000000638084 in pythread_wrapper (arg=<optimised out>) at ../Python/thread_pthread.h:205
#17 0x00007ffff77cc6db in start_thread (arg=0x7fffefb0f700) at pthread_create.c:463
#18 0x00007ffff7b0588f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Should this LWP be there at all or should it terminate before join()? If the latter, am I looking at a CPython bug?

From inspection of the source code, it seems that this is expected. From https://github.com/python/cpython/blob/master/Modules/_threadmodule.c#L1028, when a thread has finished its work, these two calls are made:
PyThreadState_DeleteCurrent();
PyThread_exit_thread();
The first call ends up releasing the lock that allows join() to return; the second causes the LWP to unwind its stack. join() can therefore return while the LWP is still exiting.
Debugging a trivial test Python script that starts a Thread, join()s it, and then sends a signal to itself (a convenient breakpoint that gdb understands) shows a backtrace similar to the one in the original post, which further supports this conclusion.
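A test script of the kind described above might look roughly like the following. This is a minimal sketch, not the script actually used: the choice of SIGUSR1, the handler, and the count_lwps() helper (which reads /proc/self/task and so assumes Linux) are all assumptions made for illustration.

```python
import os
import signal
import threading
import time

received = []  # signal numbers seen by the handler


def on_signal(signum, frame):
    # With a handler installed, SIGUSR1 is harmless outside gdb;
    # under gdb the process still stops when the signal arrives.
    received.append(signum)


def count_lwps():
    """Return the number of kernel threads (LWPs), or None if /proc is unavailable."""
    try:
        return len(os.listdir("/proc/self/task"))
    except OSError:
        return None


def run_demo():
    # Must be called from the main thread (signal.signal requires it).
    signal.signal(signal.SIGUSR1, on_signal)
    worker = threading.Thread(target=lambda: None)
    worker.start()
    worker.join()
    # join() has returned, but the worker's LWP may still be unwinding.
    lwps_after_join = count_lwps()
    os.kill(os.getpid(), signal.SIGUSR1)  # gdb stops here by default
    # Give the Python-level handler a moment to run.
    deadline = time.monotonic() + 1.0
    while not received and time.monotonic() < deadline:
        time.sleep(0.01)
    return lwps_after_join


if __name__ == "__main__":
    print("LWPs right after join():", run_demo())
```

Run under gdb, the process stops when SIGUSR1 arrives, and `thread apply all bt` can then be used to see whether the worker's LWP is still present. The count printed outside gdb is inherently racy, so it may show one or two LWPs from run to run.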

Related

Julia JuMP Cbc solver hangs infinitely without message and without exiting

I'm using Julia (version 1.3.1), the JuMP package (version 0.20.1), and the Cbc package (version 0.6.6) to solve an optimization problem in a Docker container based on ubuntu:16.04. The Cbc optimizer appears to hang at 100% CPU usage, without exiting and without printing any message. The problem occurs rarely, on similar problems, and does not seem to be reproducible: if I run the same code with the same data, it no longer hangs. I hope the backtrace obtained with gdb is useful.
I can share my model if needed. It has 11520 variables, 4652 constraints, and 10080 variables in the linear objective function.
This is the Cbc optimizer log:
Welcome to the CBC MILP Solver
Version: 2.10.3 Build Date: Oct 7 2019
command line - Cbc_C_Interface -threads 0 -seconds 360.0 -maxNodes 30000 -logLevel 1 -solve -quit (default strategy 1)
seconds was changed from 1e+100 to 360
maxNodes was changed from 2147483647 to 30000
Continuous objective value is 2.3607e+08 - 0.11 seconds
Cgl0002I 3197 variables fixed
Cgl0005I 7 SOS with 8323 members
Cgl0004I processed model has 15 rows, 8323 columns (8323 integer (8323 of which binary)) and 26556 elements
Cbc0045I Fixing only non-zero variables.
Cbc0045I Warning: mipstart values could not be used to build a solution.
At this point Cbc appears to hang and becomes unresponsive, with 100% CPU usage.
Here is the backtrace of the running process:
#0 0x00007f163c3facc9 in ?? () from target:/root/.julia/packages/Cbc/vWzyC/deps/usr/lib/libCbcSolver.so
#1 0x00007f163c4125b3 in ?? () from target:/root/.julia/packages/Cbc/vWzyC/deps/usr/lib/libCbcSolver.so
#2 0x00007f163c467586 in ?? () from target:/root/.julia/packages/Cbc/vWzyC/deps/usr/lib/libCbcSolver.so
#3 0x00007f163c46aebc in ?? () from target:/root/.julia/packages/Cbc/vWzyC/deps/usr/lib/libCbcSolver.so
#4 0x00007f163c40594a in ?? () from target:/root/.julia/packages/Cbc/vWzyC/deps/usr/lib/libCbcSolver.so
#5 0x00007f163c29afbe in ?? () from target:/root/.julia/packages/Cbc/vWzyC/deps/usr/lib/libCbcSolver.so
#6 0x00007f163c2ad844 in ?? () from target:/root/.julia/packages/Cbc/vWzyC/deps/usr/lib/libCbcSolver.so
#7 0x00007f163b8ea31f in CbcHeuristicDive::solution(double&, int&, int&, OsiRowCut**, CbcSubProblem&, double*) () from target:/root/.julia/packages/Cbc/vWzyC/deps/usr/lib/libCbc.so.3
#8 0x00007f163b8ebf42 in CbcHeuristicDive::solution(double&, double*) () from target:/root/.julia/packages/Cbc/vWzyC/deps/usr/lib/libCbc.so.3
#9 0x00007f163b938fd2 in CbcModel::solveWithCuts(OsiCuts&, int, CbcNode*) () from target:/root/.julia/packages/Cbc/vWzyC/deps/usr/lib/libCbc.so.3
#10 0x00007f163b9472d7 in CbcModel::branchAndBound(int) () from target:/root/.julia/packages/Cbc/vWzyC/deps/usr/lib/libCbc.so.3
#11 0x00007f163c214c47 in CbcMain1(int, char const**, CbcModel&, int (*)(CbcModel*, int), CbcSolverUsefulData&) () from target:/root/.julia/packages/Cbc/vWzyC/deps/usr/lib/libCbcSolver.so
#12 0x00007f163c2252ae in CbcMain1(int, char const**, CbcModel&) () from target:/root/.julia/packages/Cbc/vWzyC/deps/usr/lib/libCbcSolver.so
#13 0x00007f163c19bc50 in Cbc_solve () from target:/root/.julia/packages/Cbc/vWzyC/deps/usr/lib/libCbcSolver.so
#14 0x00007f16698e7e71 in ?? ()
#15 0x000000000000000c in ?? ()
#16 0x00007fff70694480 in ?? ()
#17 0x00007f16604ce110 in ?? ()
#18 0x000000000000262e in ?? ()
#19 0x0000000000000006 in ?? ()
#20 0x00007fff70694480 in ?? ()
#21 0x00007f165966ab40 in ?? ()
#22 0x00007f164a7ce1d0 in ?? ()
#23 0x00007f164a7ce220 in ?? ()
#24 0x00007f164a7ce1d0 in ?? ()
#25 0x00007f1688be7b00 in ?? () at /buildworker/worker/package_linux64/build/src/array.c:738 from
target:/opt/julia/bin/../lib/libjulia.so.1
#26 0x00007f163d909af0 in ?? ()
#27 0x00007f164439d3c0 in ?? ()
#28 0x00007f1689524200 in ?? ()
#29 0x0000000000000000 in ?? ()
When I use the next command in the gdb console, a StackOverflowError() is caught in Cbc.
Does the objective function have too many terms?
Any help would be much appreciated.
Thank you
This appears to be an issue with Cbc. It's impossible to provide more advice without a reproducible example. I suggest you try to simplify your model and create an MPS file.
You can set the time limit using the seconds parameter as follows.
For newer package versions:
using JuMP, Cbc
model = Model(optimizer_with_attributes(
    Cbc.Optimizer,
    "seconds" => 60,       # time limit in seconds
    "threads" => 4,
    "loglevel" => 0,
    "ratioGap" => 0.0001,
))
Or like this for older package versions:
using JuMP, Cbc
model = Model(with_optimizer(
    Cbc.Optimizer,
    seconds=60,            # time limit in seconds
    threads=4,
    loglevel=0,
    ratioGap=0.0001,
))

Running Qt GUI Application on VNC cause Segmentation Fault with error message

I'm trying to run a Qt 5.8 GUI application in vncviewer and I'm getting a segmentation fault.
System Configuration
Qt 5.8
Ubuntu 17.04
vncserver
Xvnc Free Edition 4.1.1 - built Feb 25 2015 23:02:21
vncviewer
TigerVNC Viewer 64-bit v1.7.0
VNC xstartup script contents:
#!/bin/sh
export XKL_XMODMAP_DISABLE=1
unset SESSION_MANAGER
unset DBUS_SESSION_BUS_ADDRESS
xfce4-panel &
xfsettingsd &
xfwm4 &
xfdesktop &
pcmanfm &
xfce4-terminal &
Error Message:
$ ./MyApp
QXcbConnection: Failed to initialize XRandr
Segmentation fault (core dumped)
Core Dump
Note: I changed some paths and app names and omitted some output for brevity.
(gdb) run
Starting program: $HOME/MyApp
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffe2023700 (LWP 5917)]
QXcbConnection: Failed to initialize XRandr
[New Thread 0x7fffd5cbc700 (LWP 5918)]
...
[omitted for brevity]
...
[New Thread 0x7fff6b32a700 (LWP 5945)]
Thread 23 "Chrome_InProcGp" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff8d7fa700 (LWP 5942)]
0x00007ffff081abba in ?? () from /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
(gdb) bt
#0 0x00007ffff081abba in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#1 0x00007ffff081b4bc in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#2 0x00007ffff1a51d54 in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#3 0x00007ffff1a54478 in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#4 0x00007ffff1a55589 in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#5 0x00007ffff1a4ffd0 in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#6 0x00007ffff1a5024e in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#7 0x00007ffff1a50969 in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#8 0x00007ffff1a51225 in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#9 0x00007ffff1a512f3 in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#10 0x00007ffff19f725d in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#11 0x00007ffff19a5dbe in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#12 0x00007ffff19a694d in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#13 0x00007ffff19a6c1b in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#14 0x00007ffff19a8559 in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#15 0x00007ffff19bb18a in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#16 0x00007ffff19d0c05 in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#17 0x00007ffff19d0de7 in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#18 0x00007ffff19cd76d in () at /opt/Qt5.8.0/5.8/gcc_64/lib/libQt5WebEngineCore.so.5
#19 0x00007ffff7bc06da in start_thread (arg=0x7fff8d7fa700) at pthread_create.c:456
#20 0x00007fffef6aad7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
This only happens when the application runs in vncviewer on the remote desktop. On the local system it runs as expected.
Please let me know if there are any workarounds.
The application needs an OpenGL context for rendering. It looks like one is not available on your server.
If you are running the application via ssh, try running it with X11 forwarding enabled:
ssh -X ...
This is needed because Qt uses the xcb backend by default on Linux, so it needs an active X session to show anything.
If that does not work, you may need to check the OpenGL support of the distribution/configuration running on your server.

Why does my Qt application lock up when I use QProcess or popen?

My Qt application is locking up with high cpu usage after running correctly for a few hours, and I'm trying to figure out why. This is on an embedded linux system.
The first thing I did was attach gdb and look at a stack backtrace. It shows the lockup occurring in ptmalloc_lock_all(), which is called by fork(). The application uses system() to play a video, and periodically uses QProcess to check whether a USB drive is mounted.
I don't intentionally use multiple threads in my application, but gdb shows 4 threads at the point where the freeze occurs. I've included the backtrace for all 4 threads:
(gdb) thread apply all backtrace
Thread 4 (Thread 995):
#0 0x4282dc50 in select () from /opt/filesys/fs/lib/libc.so.6
#1 0x4246a768 in qt_safe_select(int, fd_set*, fd_set*, fd_set*, timeval const*) ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
#2 0x4246f2a8 in QEventDispatcherUNIXPrivate::doSelect(QFlags<QEventLoop::ProcessEventsFlag>, timeval*) ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
#3 0x4246f6e0 in QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
#4 0x42435898 in QEventLoop::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
#5 0x42435bac in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
#6 0x42317834 in QThread::exec() ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
#7 0x4231aaf8 in ?? ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
Cannot access memory at address 0x0
#8 0x4231aaf8 in ?? ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
Cannot access memory at address 0x0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 3 (Thread 973):
#0 0x425c64bc in pthread_cond_wait@@GLIBC_2.4 () from /opt/filesys/fs/lib/libpthread.so.0
#1 0x413f8688 in ?? ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtWebKit.so.4
Cannot access memory at address 0x0
#2 0x413f8688 in ?? ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtWebKit.so.4
Cannot access memory at address 0x0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 2 (Thread 949):
#0 0x4282dc50 in select () from /opt/filesys/fs/lib/libc.so.6
#1 0x4240c16c in ?? ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
Cannot access memory at address 0x3054
#2 0x4240c16c in ?? ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
Cannot access memory at address 0x3054
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 1 (Thread 720):
#0 0x427e1194 in ptmalloc_lock_all () from /opt/filesys/fs/lib/libc.so.6
#1 0x428041c4 in fork () from /opt/filesys/fs/lib/libc.so.6
#2 0x4240fce8 in ?? ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
#3 0x4240fce8 in ?? ()
from /opt/filesys/fs/opt/filesys/fs/opt/qt/lib/libQtCore.so.4
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I'm not very experienced with multithreaded programming, but from reading online it sounds like it can be dangerous to use fork() in a multithreaded application. I'm not sure what to do as an alternative though.
Any input would be greatly appreciated!
Marlon
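One way to avoid fork() for the periodic mount check is to read the kernel's mount table directly instead of launching a helper process. The following is a minimal sketch under stated assumptions: it assumes a Linux system that exposes /proc/mounts, it is written in Python for brevity rather than Qt C++, and the helper names and the /media/usb path are hypothetical. It does not address the system() call used for video playback.

```python
import re


def _unescape(field):
    # /proc/mounts writes characters such as spaces as octal escapes, e.g. \040.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def mounted_paths(mounts_text):
    """Return the set of mount points listed in /proc/mounts-style text."""
    paths = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields are: device, mount point, filesystem type, options, dump, pass.
        if len(fields) >= 2:
            paths.add(_unescape(fields[1]))
    return paths


def is_mounted(mount_point, mounts_file="/proc/mounts"):
    """Check whether mount_point is mounted, without spawning a process."""
    with open(mounts_file) as f:
        return mount_point in mounted_paths(f.read())


if __name__ == "__main__":
    # Hypothetical mount point, for illustration only.
    print(is_mounted("/media/usb"))
```

Because the check is just a file read, it never enters fork() and so cannot block in ptmalloc_lock_all(). The same idea would presumably carry over to reading the file from the C++ application.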

XCode 4.2.1 + Enable Guard Malloc -> immediate crash?

Anybody know how to solve this? With Xcode 4.2.1, turning on Enable Guard Malloc and running on iPhone Simulator 5, the app immediately crashes, and this is the stack trace:
#0 0x00000000 in <????> ()
#1 0x91594ef3 in mig_get_reply_port ()
#2 0x9158e70c in mach_ports_lookup ()
#3 0x031f0124 in _xpc_domain_init_local ()
#4 0x031edeb1 in _libxpc_initializer ()
#5 0x8fe5d15b in __dyld__ZN16ImageLoaderMachO18doModInitFunctionsERKN11ImageLoader11LinkContextE ()
#6 0x8fe5ccc0 in __dyld__ZN16ImageLoaderMachO16doInitializationERKN11ImageLoader11LinkContextE ()
#7 0x8fe5a220 in __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEjRNS_21InitializerTimingListE ()
#8 0x8fe5a1b6 in __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEjRNS_21InitializerTimingListE ()
#9 0x8fe5a1b6 in __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEjRNS_21InitializerTimingListE ()
#10 0x8fe5a1b6 in __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEjRNS_21InitializerTimingListE ()
#11 0x8fe5a1b6 in __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEjRNS_21InitializerTimingListE ()
#12 0x8fe5b1c0 in __dyld__ZN11ImageLoader15runInitializersERKNS_11LinkContextERNS_21InitializerTimingListE ()
#13 0x8fe4f626 in __dyld__ZN4dyld24initializeMainExecutableEv ()
#14 0x8fe53ef2 in __dyld__ZN4dyld5_mainEPK12macho_headermiPPKcS5_S5_ ()
#15 0x8fe4d2ef in __dyld__ZN13dyldbootstrap5startEPK12macho_headeriPPKclS2_ ()
#16 0x8fe4d063 in __dyld__dyld_start ()
You can see this answer: Application crashes on simulator 5.0 before reaching main.m
However, the linked solution didn't work for me. I had exactly the same diagnostic and the same absence of an error message, but I found no "weak"-related issues in my project.pbxproj.
Instead, I found that the cause of my problem was a deadlock in a +(void)initialize method.
More precisely, in this method I was calling dispatch_sync(dispatch_get_main_queue(), ^{[some block code]}). Changing this to dispatch_async (note the "a") solved my problem.
I discovered the issue by accident. While nothing seemed to be happening, Xcode's thread navigator showed that the app itself hadn't crashed. I accidentally clicked "pause" in the debugger, and it stopped exactly where the deadlock was.
Enjoy.

Xcode 4 stack trace debugging issue

For a while now (I can't remember exactly since which version), Xcode 4 has not been working properly: whenever my code crashes, the debugger just shows me the main() function and a stack trace like this:
#0 0x9018b9c6 in __pthread_kill ()
#1 0x90105f78 in pthread_kill ()
#2 0x900f6bdd in abort ()
#3 0x03c93e78 in dyld_stub__Unwind_DeleteException ()
#4 0x03c9189e in default_terminate() ()
#5 0x0154df4b in _objc_terminate ()
#6 0x03c918de in safe_handler_caller(void (*)()) ()
#7 0x03c91946 in __cxa_bad_typeid ()
#8 0x03c92b3e in __cxa_current_exception_type ()
#9 0x0154de49 in objc_exception_rethrow ()
#10 0x012f2e10 in CFRunLoopRunSpecific ()
#11 0x012f2ccb in CFRunLoopRunInMode ()
#12 0x012a5879 in GSEventRunModal ()
#13 0x012a593e in GSEventRun ()
#14 0x00013a9b in UIApplicationMain ()
#15 0x00002a02 in main at /Users/dan/Dev/Container/Container/main.m:16
In the console I get some more meaningful information; in this case it tells me:
2011-11-09 10:39:53.886 Container[27273:f803] *** Terminating app due to uncaught exception 'NSRangeException', reason: '*** -[__NSArrayI objectAtIndex:]: index 1 beyond bounds for empty array'
In this example I know what the problem is because I made the error on purpose (I'm trying to access an object in an empty NSArray) to try to get to the bottom of this issue. However, I can't figure out why the debugger behaves this way. I've lived with it until now because I was familiar with a lot of the bugs I've had and knew where to look anyway, but it's becoming a real pain.
Can anyone please shed some light on this issue?
You probably want to stop at the point in the code where the exception is actually thrown, not after it has bubbled up to your main function. To do so, set an exception breakpoint as explained here: Exception Breakpoint in Xcode

Resources