The Wayback Machine - https://web.archive.org/web/20250209092909/https://github.com/python/cpython/pull/124533

gh-120144: Make it possible to use sys.monitoring for bdb and make it default for pdb #124533

Open · wants to merge 10 commits into base: main

Conversation

@gaogaotiantian (Member) commented Sep 25, 2024

This is the most conservative attempt ever to utilize sys.monitoring in bdb and pdb.

Highlights:

  • Full backward compatibility - no changes to test_pdb and test_bdb at all with the new backend
  • bdb will still default to sys.settrace, which keeps all the old behavior. Users can opt-in to the new sys.monitoring backend, and the interface is still the same, even for trace_dispatch (that's how test_bdb passes).
  • New optional interfaces in bdb let the user disable certain events to improve performance.
  • pdb.Pdb will use sys.settrace by default too, and is configurable via pdb.Pdb.DEFAULT_BACKEND.
  • The pdb CLI and breakpoint() use the monitoring backend, with no noticeable difference that I can observe at this point.

Solution:

Basically, I mimicked the behavior of sys.settrace with sys.monitoring to preserve the old behavior as much as possible, while taking the chance to use the new sys.monitoring API to disable certain events.

Performance:

It's not as impressive as the original proposal, but for the following code:

import time
def f(x):
    # Set breakpoint here
    x *= 2
    return x + 1

def test():
    start_time = time.time()
    for i in range(1000):
        for j in range(1000):
            j + i
    cost = time.time() - start_time
    print(cost)
    f(0)

test()

On my laptop, without debugger, it takes 0.025s. With the new pdb attached (b f then c), it takes the same amount of time, and with Python 3.12 pdb, it takes 1.04s (4100%+ overhead). The performance improvement is significant, to say the least.

And as you can tell from the diff, the actual changes to pdb are minimal: just change sys.settrace(tracefunc) to self.start_trace(tracefunc) and sys.settrace(None) to self.stop_trace(). That's all debugger developers need to do to onboard.

@gaogaotiantian (Member, Author):

And of course we need documentation updates; I will do that later, once the feature is accepted and the interface is decided.

@terryjreedy (Member):

It's after midnight, so I will test much later today. If all is OK using the default, I will patch IDLE to pass 'monitoring'.

@terryjreedy (Member) commented Sep 29, 2024

Ran fine with default backend.

Not fine with backend='monitoring'.
Usually, when I start the debugger, the stack shows one line with bdb.run(). Running a file should put its top line under that. I think showing bdb.run is an error, but this is the baseline to compare against. With monitoring, there is initially nothing in the stack window. Running a file results in 17 lines from threading, idlelib, and bdb. I have to hit 'go' to get to bdb.run + the first line. After that, 'over' and 'step' seem to work, but 'go' freezes the debugger.

EDIT: The remote execution process crashes because of an unrecoverable exception in the rpc code. Monitoring does not seem to work across the socket connection. Some of the debugger code likely needs a change (as pdb does). (But IDLE does not have to use 'monitoring'.)

@gaogaotiantian (Member, Author):

Right - the most important thing is that IDLE can simply keep working as-is, but it's also a very important example for testing the new mechanism.

I think at least part of the issue is multi-threading. sys.settrace only sets the trace function on the current thread, but sys.monitoring sets events on all threads. That should answer some of the questions. I can patch this PR with a thread check, and maybe you can take a look after the fix.

@gaogaotiantian (Member, Author):

Hi @terryjreedy , I "fixed" the multi-threading issue. Well, by "fixed" I mean I made it behave the same as before. Let me know if you have some time to test it out :)

@gaogaotiantian (Member, Author) commented Oct 17, 2024

With clear_tool_id, I can easily switch events like LINE to local, and that gives me full speed! The code example now runs with basically zero overhead.

@pyscripter commented Jan 17, 2025

> It's not as impressive as the original proposal, but for the following code: [code example quoted from the PR description above] On my laptop, without debugger, it takes 0.025s. With the new pdb attached (b f then c), it takes the same amount of time, and with Python 3.12 pdb, it takes 1.04s (4100%+ overhead). The performance improvement is significant, to say the least.

The improvement you see here is entirely due to the changes in break_anywhere(self, frame) introduced in Python 3.14. If you try the released alpha versions of 3.14, you get similar results.

I have tried your new bdb code, based on sys.monitoring, in different cases and, unfortunately, I did not see any improvements. Quite the opposite!

@gaogaotiantian (Member, Author):

> The improvement you see here is entirely due to the changes in break_anywhere(self, frame) introduced in Python 3.14.

#124553 was merged after this PR, so that's not possible (I did it, btw).

But yes, for this specific case, the changes to break_anywhere will give the same result, and that's good.

What case did you try for this implementation? Like I mentioned, this is not the perfect solution; the major problem it solves is when a line event is triggered multiple times. So, for the same code, if you put a breakpoint at print(cost), it should show a similar diff, and the break_anywhere change won't affect it.

When you say quite the opposite, do you mean it's actually significantly slower than the original solution? Do you have any examples?

@pyscripter commented Jan 17, 2025

> So, for the same code, if you put a breakpoint at print(cost), it should show a similar diff, and the break_anywhere change won't affect it.

Yes it does! Massive improvement in this case!

> the major problem it solves is when a line event is triggered multiple times

Could you please elaborate on how this is achieved?

> When you say quite the opposite, do you mean it's actually significantly slower than the original solution? Do you have any examples?

One script where I saw a performance degradation was the following:

from timeit import timeit

class A(object):
    def __init__(self):
        self.value = 1

class B(A):
    @staticmethod
    def create_property(name):
        return property(
            lambda self: getattr(self.srg, name),
            lambda self, v: setattr(self.srg, name, v),
        )
    value = create_property.__func__("value")

    def __init__(self):
        self.srg = __import__("threading").local()
        self.value = 2
        super().__init__()

b1 = B()
b2 = B()

print(timeit("b1.value = 4; c = b1.value", number=100000, globals=globals()))
print(timeit("b1.v = 4; c = b1.v", number=100000, globals=globals()))

With a breakpoint in the last line, the monitoring-based Bdb takes 4 times as much time to reach it.

But let me add some more details about how I am testing. I am using a Bdb-based debugger in PyScripter, which is adapted for multi-threaded debugging. I am not using your modified bdb.py directly; instead I created a subclass of Bdb integrating your code (see fastbdb.zip). Since I wanted to backport your code to Python 3.12 and 3.13, I replaced clear_tool_id with a custom function. Finally, since I am using the code for multi-threaded debugging, I removed the changes in 99ea70c.
fastbdb.zip

@gaogaotiantian (Member, Author):

Okay that's a completely different story. Unfortunately this is the CPython PR so I can't fully solve issues for your debugger, but I have some potential theories.

sys.monitoring, unlike sys.settrace, will be triggered on all threads. Since you mentioned multi-threading, that could be the problem. Are you using 4 threads when you say it's 4x slower?

clear_tool_id is also critical in this case, because local events will not be disabled automatically when you free the tool id. I don't know if your implementation of that is correct.

So, if you can reproduce the performance issue on pdb from this branch, I can investigate it further. If you are experiencing issues on a debugger that's based on a customized version of this bdb, I'm afraid you are on your own, because it's highly possible that the issue is caused by your customization (otherwise you would be able to repro it with pure pdb).

@pyscripter commented Jan 17, 2025

I am testing on python 3.14 and I am using the following clear_tool_id

    def clear_tool_id(tool_id):
        import sys
        if sys.version_info >= (3, 14):
            sys.monitoring.clear_tool_id(tool_id)
        else:
            for event in sys.monitoring.events.__dict__.values():
                if isinstance(event, int) and event:  # ensure it's an event constant
                    sys.monitoring.register_callback(tool_id, event, None)

So, no difference here.

> sys.monitoring, unlike sys.settrace, will be triggered on all threads. Since you mentioned multi-threading, that could be the problem. Are you using 4 threads when you say it's 4x slower?

No. There is just one thread, and it works exactly like your code. PyScripter allows you to debug multi-threaded Python code, but for single-threaded scripts it just uses Bdb.

And I am not asking you to solve my problems. I reported one script where monitoring degrades debugging performance. Why don't you try it on your side?

@gaogaotiantian (Member, Author):

gaogaotiantian@DESKTOP-I8L3RCK:~/programs/mycpython$ ./python -m pdb example.py 
> /home/gaogaotiantian/programs/mycpython/example.py(1)<module>()
-> from timeit import timeit
(Pdb) b 26
Breakpoint 1 at /home/gaogaotiantian/programs/mycpython/example.py:26
(Pdb) c
0.6028096990003178
0.0015645000003132736
> /home/gaogaotiantian/programs/mycpython/example.py(26)<module>()
-> pass
(Pdb) 
gaogaotiantian@DESKTOP-I8L3RCK:~/programs/mycpython$ ./python -m pdb example.py 
> /home/gaogaotiantian/programs/mycpython/example.py(1)<module>()
-> from timeit import timeit
(Pdb) b 26
Breakpoint 1 at /home/gaogaotiantian/programs/mycpython/example.py:26
(Pdb) c
0.4884036080002261
0.004922300000544055
> /home/gaogaotiantian/programs/mycpython/example.py(26)<module>()
-> pass

The former is monitoring and the latter is settrace. Let me know if you get different results. Again, not results from PyScripter; please only send results of pdb monitoring vs pdb settrace. (And you can use the latest PR, which merged in the 3.14 changes.)

@pyscripter:

Using the attached script I get

using settrace
time1=  0.318285699991975
+++ <test> 26 <module> : print("time2= ", timeit("b1.v = 4; c = b1.v", number=100000, globals=globals()))
time2=  0.003290199994808063

using monitoring
time1=  0.3893818999931682
+++ <test> 26 <module> : print("time2= ", timeit("b1.v = 4; c = b1.v", number=100000, globals=globals()))
time2=  0.0014353000151459128

So monitoring is about 25% slower in this example. Consistent with your results.

testfastdbd.zip

@gaogaotiantian (Member, Author) commented Jan 17, 2025

Yes, the callback is actually slower because we have more checks. However, pdb could support multi-threading in the future, which will turn this into a positive.

Performance of a debugger is important, but 25% overhead is not unacceptable (it's a debugger, so overhead is expected). It's a tradeoff between how much we gain in more common cases vs how much we lose in others.

Eliminating repeated line events is a big gain, and we can polish the performance in the future. I think we can actually get much better with the ability to disable events. This is just a start.

@pyscripter:

Thanks! And by the way, I found out why I got the large performance hit in my earlier tests. It looks good.

@pyscripter commented Jan 18, 2025

@gaogaotiantian I would like to mention that using bdb for multi-threaded debugging, even as it stands today, is quite straightforward. What it takes is:

  • store fields like botframe, returnframe, stopframe etc. in thread-local storage
  • patch threading.Thread to enable tracing and send notifications on thread start and stop

I am doing this in PyScripter, now using your monitoring version, which I have backported to 3.12 and 3.13. It also works with the free-threaded version of Python.

@gaogaotiantian (Member, Author):

Well, it's definitely out of the scope of this PR, but from a debugger's view there are a lot of things to consider, more than "how to trigger a callback on an arbitrary thread". For example, will other threads be halted while a thread is being debugged? If so, how? Otherwise, what if they are printing stuff? How do we switch between threads? I believe some of these issues were discussed in an issue dedicated to the matter. I do think there are solutions, but I don't think it'll be trivial. For now, that's an item on the todo list :)

@pyscripter commented Jan 18, 2025

> Well, it's definitely out of the scope of this PR

Of course.

I was talking about Bdb. All Bdb does is allow descendant classes to implement user_line, user_call etc. The multi-threaded bdb would just add one more, say user_thread, to notify about thread start and finish. How these are implemented depends on the debugger UI.

The difficult bit is thinking through how, for example, Pdb would handle multiple threads. This becomes very tricky if the debugger is running in the same process as the UI, especially so in free-threaded Python.

In an IDE context, it would work the way Visual Studio does. In PyScripter, the debugger displays the running threads and their state, and the user can step over any thread, examine the locals, issue commands in the context of the selected frame, resume any thread or all of them, etc.

[Screenshot: PyScripter's debugger window showing running threads and the call stack]

For example in this case, the MainThread is running and all other threads have hit a breakpoint and they are paused. The call stack shown is for the selected (active) thread. You can activate any thread and frame to inspect them or show the respective source code line in the editor.

@pyscripter commented Jan 18, 2025

By the way, I am in awe of the sys.monitoring.DISABLE magic, which appears to work on a per-line (code location) basis.

Your little test:

import time

def test():
    start_time = time.time()
    for i in range(1000):
        for j in range(1000):
            j + i
    cost = time.time() - start_time
    print("Elapsed time: ", cost)
    f(0)

def f(x):
    # Set breakpoint here
    x *= 2
    return x + 1

test()

results in just a few calls to dispatch_line compared to over 1 million for the current bdb.

using monitoring
dispatch_line:  2
dispatch_line:  4
dispatch_line:  13
dispatch_line:  18
+++ call test None
dispatch_line:  5
dispatch_line:  6
dispatch_line:  7
dispatch_line:  8
dispatch_line:  7
dispatch_line:  6
dispatch_line:  9
dispatch_line:  10
Elapsed time:  0.00011396408081054688
dispatch_line:  11
+++ <test> 11 test : f(0)

Fantastic!

@gaogaotiantian (Member, Author) commented Jan 19, 2025

Hi @brandtbucher @markshannon , I'd like to know if we can move this forward. As you can tell, the changes to pdb are minimal (so other debuggers derived from bdb can either keep all the same code and only use sys.settrace, or change a little bit to enjoy both backends). We basically created another optional path that uses sys.monitoring, which, at this point, just mimics what sys.settrace does.

The documentation is not done yet but it should be relatively simple - the interfaces are mostly kept (except for the new start_trace() and stop_trace() functions).

I really want to land this for 3.14 - it will be enabled by default for pdb, but not for any debugger derived from pdb or bdb. (I also got the blessing from @Yhg1s during the sprint.)

@pyscripter commented Jan 20, 2025

@gaogaotiantian Any chance of providing an option to enable tracing of other threads when using monitoring?
This is possible with the current Bdb by calling settrace inside Thread.run.

It could be just an option in _MonitoringTracer.

@gaogaotiantian (Member, Author):

Like I said, it's on the todo list. For CPython, it's not simply "implement it and go"; we need to consider other potential impacts and make sure everything else works well with the changes. If bdb supports multi-threading, it would be natural for pdb to support at least some of the features. It won't be a super quick patch.

Also, this PR is not even approved at this point. We can discuss features based on this PR after it gets merged.

@pyscripter commented Jan 20, 2025

Maybe I did not make myself clear. I did not ask for full multi-threaded support in this PR. Just an option like this:

    def callback_wrapper(func):
        import functools

        @functools.wraps(func)
        def wrapper(self, *args):
            if self.single_thread and (self._tracing_thread != threading.current_thread()):
                return
            try:
                ...  # rest of the original wrapper elided
self.single_thread would be True by default.

@gaogaotiantian (Member, Author):

Yes, I'm aware of what you are asking for - that's not a trivial feature. It's a small change code-wise, but it brings multi-thread support to bdb, and that's a promise. For now, I'm only focused on how to merge this PR, with as little difference from the existing implementation as possible.

@pyscripter commented Jan 29, 2025

I found a bug in the code.

If a breakpoint has a condition or ignore > 0, the code line is disabled and the breakpoint is not triggered when it should be.

Example:

def main():
    sum = 0
    for i in range(1000):
        sum = sum + i
    print(sum)

if __name__ == '__main__':
    main()

If you set a conditional breakpoint on line 4, with a condition "sum > 0" or "i == 100", the breakpoint will never fire. The same happens if you set the breakpoint's ignore property to a value > 0.

The reason is that although break_here returns False the first time, it could return True at a later point.

Suggested fix:

    def dispatch_line(self, frame):
        """Invoke user function and return trace function for line event.

        If the debugger stops on the current line, invoke
        self.user_line(). Raise BdbQuit if self.quitting is set.
        Return self.trace_dispatch to continue tracing in this scope.
        """
        if self.stop_here(frame) or self.break_here(frame):
            self.user_line(frame)
            if self.quitting: raise BdbQuit
        elif not self.get_break(frame.f_code.co_filename, frame.f_lineno):
            self.disable_current_event()
        return self.trace_dispatch

@gaogaotiantian (Member, Author):

You are right, this is a bug. Fixed with regression tests.
