X Tutup
The Wayback Machine - https://web.archive.org/web/20220906175247/https://github.com/python/cpython/issues/96387
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The main thread hang at drop_gil due to FORCE_SWITCHING after daemon thread exited #96387

Open
milizao opened this issue Aug 29, 2022 · 1 comment
Labels
interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@milizao
Copy link

milizao commented Aug 29, 2022

Bug report

Main thread hang at drop_gil after daemon thread exited, my Python version is 3.9.2, but I think the same issue is exists in the Python 3.10 newest version.

(gdb) info threads
  Id   Target Id                                   Frame 
* 1    Thread 0xffff909dd010 (LWP 226769) "python" 0x0000ffff909355c0 in futex_wait_cancelable (private=<optimized out>, expected=0, 
    futex_word=0xaaaadf99e8d4 <_PyRuntime+524>) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
(gdb) bt
#0  0x0000ffff909355c0 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0xaaaadf99e8d4 <_PyRuntime+524>)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0xaaaadf99e8d8 <_PyRuntime+528>, cond=0xaaaadf99e8a8 <_PyRuntime+480>) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=cond@entry=0xaaaadf99e8a8 <_PyRuntime+480>, mutex=mutex@entry=0xaaaadf99e8d8 <_PyRuntime+528>) at pthread_cond_wait.c:638
#3  0x0000aaaadf35d184 in drop_gil (ceval=0xaaaadf99e820 <_PyRuntime+344>, ceval@entry=0xaaaadf300274 <PyObject_Hash+116>, ceval2=ceval2@entry=0xaaaafbc61ff0, 
    tstate=tstate@entry=0xaaaafbc64a00) at Python/ceval_gil.h:178
#4  0x0000aaaadf2c05c4 in eval_frame_handle_pending (tstate=0xaaaafbc64a00) at Python/ceval.c:882
#5  _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0xffff83929d40, throwflag=<optimized out>) at Python/ceval.c:1402
#6  0x0000aaaadf2bee1c in _PyEval_EvalFrame (throwflag=0, f=0xffff83929d40, tstate=0xaaaafbc64a00) at ./Include/internal/pycore_ceval.h:40
#7  function_code_fastcall (tstate=0xaaaafbc64a00, co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:330
#8  0x0000aaaadf31ac04 in _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>, 
    tstate=<optimized out>) at ./Include/cpython/abstract.h:118
#9  _PyObject_CallNoArg (func=<optimized out>) at ./Include/cpython/abstract.h:172
#10 call_unbound_noarg (self=<optimized out>, func=0xffff83691ee0, unbound=<optimized out>) at Objects/typeobject.c:1515
#11 slot_tp_finalize (self=<optimized out>) at Objects/typeobject.c:6992
#12 0x0000aaaadf30011c in PyObject_CallFinalizer (self=self@entry=0xffff837ea6d0) at Objects/object.c:195
#13 0x0000aaaadf302694 in PyObject_CallFinalizerFromDealloc (self=self@entry=0xffff837ea6d0) at Objects/object.c:213
#14 0x0000aaaadf314b1c in subtype_dealloc (self=0xffff837ea6d0) at Objects/typeobject.c:1265
#15 0x0000aaaadf2f3f4c in _Py_DECREF (op=<optimized out>) at ./Include/object.h:430
#16 _Py_XDECREF (op=<optimized out>) at ./Include/object.h:497
#17 insertdict (mp=mp@entry=0xffff8383de80, key=0xffff837450b0, hash=-5519671433175608536, value=value@entry=0xaaaadf94be68 <_Py_NoneStruct>)
    at Objects/dictobject.c:1123
#18 0x0000aaaadf2f52d4 in PyDict_SetItem (op=op@entry=0xffff8383de80, key=<optimized out>, value=value@entry=0xaaaadf94be68 <_Py_NoneStruct>)
    at Objects/dictobject.c:1573
#19 0x0000aaaadf2ffa48 in _PyModule_ClearDict (d=0xffff8383de80) at Objects/moduleobject.c:612
#20 0x0000aaaadf2ffc84 in _PyModule_Clear (m=m@entry=0xffff837f1a40) at Objects/moduleobject.c:560
#21 0x0000aaaadf3798e8 in _PyImport_Cleanup (tstate=tstate@entry=0xaaaafbc64a00) at Python/import.c:606
#22 0x0000aaaadf38e63c in Py_FinalizeEx () at Python/pylifecycle.c:1426
#23 0x0000aaaadf2c9110 in Py_RunMain () at Modules/main.c:679
#24 0x0000aaaadf2c99f8 in pymain_main (args=args@entry=0xffffd3754d88) at Modules/main.c:707
#25 0x0000aaaadf2c9ad4 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:731
#26 0x0000ffff906c1f80 in __libc_start_main (main=0x0, argc=0, argv=0x0, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=<optimized out>) at ../csu/libc-start.c:308
#27 0x0000aaaadf2c87bc in _start ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) py-bt
Traceback (most recent call first):
  File "/home/ossadm/zhao/runtime.py", line 8, in __del__
    def __del__(self):
(gdb) 

Rough analysis

In Python version 3.9.2, the take_gil is implemented as following:

static void
take_gil(PyThreadState *tstate)
{
...
    while (_Py_atomic_load_relaxed(&gil->locked)) {
        unsigned long saved_switchnum = gil->switch_number;

        unsigned long interval = (gil->interval >= 1 ? gil->interval : 1);
        int timed_out = 0;
        COND_TIMED_WAIT(gil->cond, gil->mutex, interval, timed_out);

        /* If we timed out and no switch occurred in the meantime, it is time
           to ask the GIL-holding thread to drop it. */
        if (timed_out &&
            _Py_atomic_load_relaxed(&gil->locked) &&
            gil->switch_number == saved_switchnum)
        {
            if (tstate_must_exit(tstate)) {
                MUTEX_UNLOCK(gil->mutex);
                PyThread_exit_thread();   // [2]
            }
            assert(is_tstate_valid(tstate));

            SET_GIL_DROP_REQUEST(interp);  // [1]
        }
    }
...
}

A daemon thread may call SET_GIL_DROP_REQUEST and then exits from PyThread_exit_thread, so this thread will not get GIL really. This happens if the main thread is on the finalizing procedure and it calls _PyRuntimeState_SetFinalizing in Py_FinalizeEx before the daemon thread gets GIL (The daemon thread wait 5ms at most before exit).

After that, the flag gil_drop_request is set, but only the main thread is left. Then, if the main thread runs into drop_gil again, it will hang at FORCE_SWITCHING:

COND_WAIT(gil->switch_cond, gil->switch_mutex);

Reproducible example

This issue could be reproduced by:

$ python -m http.server 5678 &
$ for ((i=0;;i++)); do echo $i; python -W all service.py; done

And the scripts used are:

runtime.py

import socket

class TestUnclosedSocket:
    def __init__(self):
        self.s = socket.socket()
    def __del__(self):
        self.s.connect(('127.0.0.1', 5678))

ins = TestUnclosedSocket()

service.py

import threading
import runtime
import sys

sys.setswitchinterval(0.001)

def calc():
    sum = 0
    for i in range(100000):
        sum += i * i

def daemon_func():
    while True:
        calc()

if __name__ == '__main__':
    threading.Thread(target=daemon_func, name="daemon", daemon=True).start()
    calc()

I change GIL switch interval to 1ms to increase the probability here.

@milizao milizao added the type-bug An unexpected behavior, bug, or error label Aug 29, 2022
@113xiaoji
Copy link

113xiaoji commented Aug 30, 2022

Reproducible example

This issue could be reproduced by:

No need to__del__ Call self.s.connect(('127.0.0.1', 5678)), only initialize in init

call self.s = socket.socket()

And the scripts used are:

runtime.py

import socket

class TestUnclosedSocket:
    def __init__(self):
        self.s = socket.socket()
ins = TestUnclosedSocket()

service.py

import threading
import runtime
import sys

sys.setswitchinterval(0.001)

def calc():
    sum = 0
    for i in range(100000):
        sum += i * i

def daemon_func():
    while True:
        calc()

if __name__ == '__main__':
    threading.Thread(target=daemon_func, name="daemon", daemon=True).start()
    calc()

change GIL switch interval to 1ms to increase the probability here.

gdb trace is

(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x7f2266928740 (LWP 192050) "python3.9" 0x00007f22662ce995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

(gdb) bt
#0  0x00007f22662ce995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004e9246 in drop_gil (ceval=0x993c38 <_PyRuntime+344>, ceval2=<optimized out>, tstate=<optimized out>) at Python/ceval_gil.h:178
#2  0x00000000004206d6 in eval_frame_handle_pending (tstate=0x23b5a80) at Python/ceval.c:882
#3  _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:1402
#4  0x000000000041eb18 in _PyEval_EvalFrame (throwflag=0, f=0x7f22668d9610, tstate=0x23b5a80) at ./Include/internal/pycore_ceval.h:40
#5  function_code_fastcall (tstate=0x23b5a80, co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:330
#6  0x000000000048ecf1 in _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>, tstate=<optimized out>)
    at ./Include/cpython/abstract.h:118
#7  _PyObject_CallNoArg (func=<optimized out>) at ./Include/cpython/abstract.h:172
#8  call_unbound_noarg (self=<optimized out>, func=0x7f225e808af0, unbound=<optimized out>) at Objects/typeobject.c:1515
#9  slot_tp_repr (self=<optimized out>) at Objects/typeobject.c:6630
#10 0x000000000047151d in PyObject_Repr (v=0x7f22667b7ac0) at Objects/object.c:420
#11 0x00000000004cea36 in unicode_fromformat_arg (vargs=0x7fff4339f510, f=0x7f225ec6e1b8 "R", writer=0x7fff4339f530) at Objects/unicodeobject.c:2989
#12 PyUnicode_FromFormatV (format=format@entry=0x7f225ec6e1ae "unclosed %R", vargs=vargs@entry=0x7fff4339f5b8) at Objects/unicodeobject.c:3053
#13 0x00000000004e7628 in _PyErr_WarnFormatV (vargs=0x7fff4339f5b8, format=0x7f225ec6e1ae "unclosed %R", stack_level=1, category=0x942be0 <_PyExc_ResourceWarning>, source=0x7f22667b7ac0)
    at Python/_warnings.c:1124
#14 PyErr_ResourceWarning (source=source@entry=0x7f22667b7ac0, stack_level=stack_level@entry=1, format=format@entry=0x7f225ec6e1ae "unclosed %R") at Python/_warnings.c:1179
#15 0x00007f225ec62aea in sock_finalize (s=0x7f22667b7ac0) at /opt/cph/Python-3.9.2/Modules/socketmodule.c:5044
#16 0x0000000000475b7a in PyObject_CallFinalizer (self=0x7f22667b7ac0) at Objects/object.c:195
#17 PyObject_CallFinalizerFromDealloc (self=self@entry=0x7f22667b7ac0) at Objects/object.c:213
#18 0x0000000000486afa in subtype_dealloc (self=0x7f22667b7ac0) at Objects/typeobject.c:1265
#19 0x000000000045ded5 in _Py_DECREF (op=<optimized out>) at ./Include/object.h:430
#20 _Py_XDECREF (op=<optimized out>) at ./Include/object.h:497
#21 dict_dealloc (mp=0x7f226676fd80) at Objects/dictobject.c:2018
#22 0x0000000000486df2 in _Py_DECREF (op=<optimized out>) at ./Include/object.h:430
#23 subtype_dealloc (self=0x7f2266796f70) at Objects/typeobject.c:1321
#24 0x0000000000463a75 in _Py_DECREF (op=<optimized out>) at ./Include/object.h:430
#25 _Py_XDECREF (op=<optimized out>) at ./Include/object.h:497
#26 insertdict (value=0x94e5e0 <_Py_NoneStruct>, hash=-5400324744021460460, key=0x7f226679a870, mp=0x7f2266794400) at Objects/dictobject.c:1123
#27 PyDict_SetItem (op=op@entry=0x7f2266794400, key=0x7f226679a870, value=0x94e5e0 <_Py_NoneStruct>) at Objects/dictobject.c:1573
#28 0x0000000000470ca6 in _PyModule_ClearDict (d=0x7f2266794400) at Objects/moduleobject.c:612
#29 0x0000000000470e1e in _PyModule_Clear (m=m@entry=0x7f2266798ea0) at Objects/moduleobject.c:560
#30 0x000000000050c8bf in _PyImport_Cleanup (tstate=tstate@entry=0x23b5a80) at Python/import.c:606
#31 0x000000000052a8ca in Py_FinalizeEx () at Python/pylifecycle.c:1426
#32 0x000000000052d3e5 in Py_FinalizeEx () at Python/pylifecycle.c:1510
#33 0x000000000042a230 in Py_RunMain () at Modules/main.c:679
#34 0x000000000042abef in pymain_main (args=0x7fff4339fa90) at Modules/main.c:707
#35 Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:731
#36 0x00007f226580b505 in __libc_start_main () from /lib64/libc.so.6
#37 0x00000000004299b5 in _start ()

referance https://bugs.python.org/issue39877

#84058 #84058
@vstinner

@ronaldoussoren ronaldoussoren added the interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) label Aug 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants
X Tutup