The Wayback Machine - https://web.archive.org/web/20221003132104/https://github.com/python/cpython/issues/72352
The 'subprocess' module leaks memory when called in certain ways #72352

Closed
Xavion mannequin opened this issue Sep 15, 2016 · 21 comments
Labels
performance (Performance or resource usage), stdlib (Python modules in the Lib dir)

Comments

@Xavion
Mannequin

Xavion mannequin commented Sep 15, 2016

BPO 28165
Nosy @vstinner, @bitdancer, @ztane, @The-Compiler
Files
  • Memory-Leak-Test.zip: All test files - in correct hierarchy
  • Test.sh: Memory monitoring script
  • Test-1.py: First test case
  • Test-2.py: Second test case
  • Test-1-no-gc.log: First test results - no garbage collection
  • Test-1-gc.log: First test results - garbage collection enforced
  • Test-2-no-gc.log: Second test results - no garbage collection
  • Test-2-gc.log: Second test results - garbage collection enforced
  • Test-3.py
  • Test-3a.py: Third test case (revised)
  • Test-3a-no-gc.log: Third test results - no garbage collection
  • Test-3a-gc.log: Third test results - garbage collection enforced
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2016-09-15.04:31:15.193>
    labels = ['library', 'performance']
    title = "The 'subprocess' module leaks memory when called in certain ways"
    updated_at = <Date 2016-09-20.22:41:54.050>
    user = 'https://bugs.python.org/Xavion'

    bugs.python.org fields:

    activity = <Date 2016-09-20.22:41:54.050>
    actor = 'Xavion'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2016-09-15.04:31:15.193>
    creator = 'Xavion'
    dependencies = []
    files = ['44723', '44724', '44725', '44726', '44727', '44728', '44729', '44730', '44749', '44761', '44762', '44763']
    hgrepos = []
    issue_num = 28165
    keywords = []
    message_count = 19.0
    messages = ['276514', '276515', '276517', '276523', '276539', '276775', '276794', '276850', '276950', '276986', '276987', '276989', '276990', '276994', '277009', '277010', '277012', '277015', '277077']
    nosy_count = 5.0
    nosy_names = ['vstinner', 'r.david.murray', 'ztane', 'The Compiler', 'Xavion']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue28165'
    versions = ['Python 3.5']

    @Xavion
    Mannequin Author

    Xavion mannequin commented Sep 15, 2016

    Each time I run a shell command via the 'subprocess' module, I notice that the memory footprint of my program increases by roughly 4 KiB.

    I've tested the problem with two different slices of code; the result is the same in either case (long after the function finishes).

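    Code slice 1:

        from subprocess import check_output

        check_output("true")

    Code slice 2:

        import gc
        from subprocess import PIPE, Popen

        pTest = Popen("true", stdout=PIPE, stderr=PIPE)
        pTest.wait()
        pTest.stdout.close()
        pTest.stderr.close()
        del pTest      # drop the last reference to the Popen object
        gc.collect()   # force a full collection pass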

    I'm using Python v3.5.2-1 on Arch Linux; it was installed via the [extra] repository. Let me know if you need any further information.

    @Xavion Xavion mannequin added the stdlib Python modules in the Lib dir label Sep 15, 2016
    @ztane
    Mannequin

    ztane mannequin commented Sep 15, 2016

    Python 3.5.1+ on Ubuntu: I ran the Popen case in a "while True" loop and watched top; not a single digit of the memory usage changed (the last digit being kilobytes). A one-time 4 KiB increase in the memory footprint is nothing; please run this in a loop.

    @Xavion
    Mannequin Author

    Xavion mannequin commented Sep 15, 2016

    I wouldn't have reported this if it was only happening *once*. I already have it in a loop; a new shell command is fired every second.

    The memory footprint increases by roughly 4 KiB *each* time. I monitor it via the following Bash script:
        while true; do
            ps -C "python3 ./Program.pyw" -o pid=,%mem=,rss= >> ./Output.log
            sleep 1
        done &

    I have attached the logfile for your convenience. Let me know if you'd like me to run any other tests.

    @Xavion
    Mannequin Author

    Xavion mannequin commented Sep 15, 2016

    It's easier to reproduce the issue if you use a timer (rather than a loop). The newly attached logfile was generated with the following code fragment.

        import threading
        from subprocess import check_output

        def fTest():
            check_output("true")
            threading.Timer(1, fTest, ()).start()

    @ztane
    Mannequin

    ztane mannequin commented Sep 15, 2016

    Ahhah, the title should say: subprocess module leaks 4 KiB of memory **per thread**.

    @pppery pppery mannequin added the performance Performance or resource usage label Sep 15, 2016
    @Xavion
    Mannequin Author

    Xavion mannequin commented Sep 17, 2016

    Okay, I've modified the title to match what I've written below.

    I've just run some further tests on this problem. The attached archive contains code samples and the output generated (both with and without garbage collection).

    As you can see, the memory stays constant in the first case. In the second, the presence of the loop is probably what throws a spanner in the works. Garbage collection seems to make the outcome slightly worse (in the second case).

    The situation isn't as bad as I first reported, but the memory does nonetheless keep increasing in the second case (which it probably shouldn't).

    @Xavion Xavion mannequin changed the title The 'subprocess' module leaks roughly 4 KiB of memory per call The 'subprocess' module leaks memory when called in certain ways Sep 17, 2016
    @bitdancer
    Member

    bitdancer commented Sep 17, 2016

    Could you post files instead of a zip, please? It will be easier to review.

    @Xavion
    Mannequin Author

    Xavion mannequin commented Sep 17, 2016

    I put them into an archive so that the folder hierarchy would be preserved. Doing it that way makes it faster for you guys to run the tests at your end.

    Nonetheless, I will post the seven (7) files individually as well. It doesn't look like I can upload more than one at a time, so get ready for a few emails!

    @bitdancer
    Member

    bitdancer commented Sep 19, 2016

    I can't reproduce this with Python 3.4.3, 3.5, or the 3.6 tip on Gentoo Linux. For me it bumps up initially but then remains constant, even if I let it run for many more probes than in your example.

    I'm not sure what to suggest for debugging this further. It is surprising that Arch would behave differently from Gentoo in this context. If we are lucky, maybe someone else will be able to reproduce it.

    @Xavion
    Mannequin Author

    Xavion mannequin commented Sep 19, 2016

    Wow, that is surprising (given how simple it is)! Did you try both tests? Remember that only the second one produces the bug here.

    Let's leave this sit for a while. If no-one else can reproduce it on their OSs/distributions, I'll seek advice from the Arch community.

    @bitdancer
    Member

    bitdancer commented Sep 19, 2016

    I only ran the second one. I didn't bother with the first one :)

    @vstinner
    Member

    vstinner commented Sep 19, 2016

    I'm unable to reproduce any memory leak on subprocess itself:
    ---

    import tracemalloc; tracemalloc.start()
    import subprocess, gc
    
    def func(loops) :
        for x in range(loops):
            proc = subprocess.Popen(['true'])
            with proc:
                proc.wait()
    
    # warmup
    func(10)

    gc.collect();gc.collect();gc.collect()
    print(tracemalloc.get_traced_memory()[1])

    func(100)

    gc.collect();gc.collect();gc.collect()
    print(tracemalloc.get_traced_memory()[1])

    gc.collect();gc.collect();gc.collect()
    print(tracemalloc.get_traced_memory()[1])

    func(100)

    gc.collect();gc.collect();gc.collect()
    print(tracemalloc.get_traced_memory()[1])
    ---

    Output on Fedora 24 (Linux) and Python 3.5:
    ---
    996450
    996450
    996450
    996450
    ---

    @vstinner
    Member

    vstinner commented Sep 19, 2016

    No memory leak either if subprocess is spawned in a thread:
    ---

    import tracemalloc; tracemalloc.start()
    import subprocess, threading, time, gc
    
    def spawn(event) :
        subprocess.check_output("true")
        gc.collect(), gc.collect(), gc.collect()
        event.set()
    
    def func(loops):
        event = threading.Event()
        for x in range(loops):
            event.clear()
            timer = threading.Timer(0, spawn, (event,))
            timer.start()
            event.wait()
    
    func(100)

    gc.collect();gc.collect();gc.collect();gc.collect()
    a = tracemalloc.get_traced_memory()[1]
    print("first", a, "B")

    loops = 1000
    func(loops)

    gc.collect();gc.collect();gc.collect();gc.collect()
    b = tracemalloc.get_traced_memory()[1]
    print("after", loops, "loops, mem:", b, "B")

    d = (b-a) / loops
    print("diff: %.1f B/loop" % d)
    
    loops = 1000
    func(loops)

    gc.collect();gc.collect();gc.collect();gc.collect()
    c = tracemalloc.get_traced_memory()[1]
    print("after", loops, "loops, mem:", c, "B")

    d = (c-b) / loops
    print("diff2: %.1f B/loop" % d)
    ---

    Output:
    ---
    first 1013738 B
    after 1000 loops, mem: 1014266 B
    diff: 0.5 B/loop
    after 1000 loops, mem: 1014318 B
    diff2: 0.1 B/loop
    ---

    Sorry, 0.5 byte/loop is not a memory leak :-)

    @Xavion
    Mannequin Author

    Xavion mannequin commented Sep 20, 2016

    What about when you test it using the files I provided? I didn't want you guys to have to write your own code.

    Note that I was monitoring the memory externally (via good old 'ps'). This could make a difference to the outcome.

    @vstinner
    Member

    vstinner commented Sep 20, 2016

    If tracemalloc doesn't show any leak but the RSS memory increases, it can be memory fragmentation or memory allocations not traced by tracemalloc.
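
The distinction drawn above between what tracemalloc sees and what RSS reports can be checked from inside one process. The following is a minimal sketch, not code from this thread: it assumes Linux (it reads VmRSS from /proc/self/status and returns None elsewhere), the helper names read_rss_kib, snapshot and run_children are made up for illustration, and sys.executable stands in for the "true" command. It reports the current traced size rather than the peak value printed in the earlier examples.

```python
import gc
import subprocess
import sys
import tracemalloc


def read_rss_kib():
    """Return this process's VmRSS in KiB, or None if /proc is unavailable."""
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return None


def snapshot():
    """Collect garbage, then return (traced_bytes, rss_kib)."""
    gc.collect()
    traced_current, _traced_peak = tracemalloc.get_traced_memory()
    return traced_current, read_rss_kib()


def run_children(count):
    # Assumption: sys.executable replaces "true" so the sketch runs anywhere.
    for _ in range(count):
        subprocess.check_output([sys.executable, "-c", "pass"])


if __name__ == "__main__":
    tracemalloc.start()
    run_children(5)  # warm-up, so one-time allocations are not counted
    before = snapshot()
    run_children(20)
    after = snapshot()
    print("tracemalloc delta (bytes):", after[0] - before[0])
    if before[1] is not None and after[1] is not None:
        print("RSS delta (KiB):", after[1] - before[1])
```

If the traced delta stays near zero while RSS grows, the growth lies outside Python-level allocations, which is consistent with (but does not prove) fragmentation or untraced allocations.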

    @ztane
    Mannequin

    ztane mannequin commented Sep 20, 2016

    The title of the issue is still wrong. As I noted before the problem is not with subprocess leaking 4K memory *always*. The issue comes from the fact that subprocess seems to leak 4K memory per individual thread. The test code to use is thus

        def test():
            check_output("true")
            threading.Timer(1, test, ()).start()
    
        test()

    which will invoke subprocess always in a new thread. Using subprocess in a loop, or using the timer as above without subprocess will not increase memory usage.

    I have changed the title accordingly.

    @ztane ztane mannequin changed the title The 'subprocess' module leaks memory when called in certain ways The 'subprocess' module leaks 4 kiB memory for each thread Sep 20, 2016
    @Xavion
    Mannequin Author

    Xavion mannequin commented Sep 20, 2016

    haypo: So, what is the result when you run "Test-2.py" and monitor the memory usage with "Test.sh"?

    ztane: The code you've provided is the same as "Test-1.py". You need to run "Test-2.py" in order to see the bug!

    @vstinner
    Member

    vstinner commented Sep 20, 2016

    Test-2.py has issues:

    • it doesn't call Timer.join()
    • it uses a weak synchronization between the main thread and the Timer thread: see msg276990 for an example using Event

    If you use a better synchronization code, call timer.join() and call gc.collect(), the memory usage is very stable even after creating more than 100 000 threads + subprocesses.
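
The combination described here (an Event for the hand-off, Timer.join(), and an explicit gc.collect()) might look roughly like the sketch below. It is not the attached Test-3.py, whose contents are not reproduced in this thread; spawn and run_in_fresh_threads are illustrative names, and sys.executable stands in for the "true" command.

```python
import gc
import subprocess
import sys
import threading


def spawn(done):
    """Run one short-lived child in the timer thread, then signal completion."""
    # Assumption: sys.executable replaces "true" so the sketch runs anywhere.
    subprocess.check_output([sys.executable, "-c", "pass"])
    done.set()


def run_in_fresh_threads(count):
    """Start `count` Timer threads one at a time, joining each before the next."""
    done = threading.Event()
    completed = 0
    for _ in range(count):
        done.clear()
        timer = threading.Timer(0, spawn, (done,))
        timer.start()
        done.wait()   # explicit hand-off from the timer thread
        timer.join()  # make sure the thread has fully exited
        completed += 1
    gc.collect()      # reclaim any reference cycles before measuring memory
    return completed


if __name__ == "__main__":
    print(run_in_fresh_threads(10), "thread+subprocess pairs finished")
```

Joining each timer before starting the next means at most one worker thread exists at any moment, so a memory reading taken afterwards is not skewed by threads that are still winding down.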

    Xavion: "As I noted before the problem is not with subprocess leaking 4K memory *always*. The issue comes from the fact that subprocess seems to leak 4K memory per individual thread."

    I'm unable to reproduce a *leak*.

    Test-3.py output:
    ---
    497 thread+subprocess
    VmRSS: 9584 kB
    986 thread+subprocess
    VmRSS: 9596 kB <== after the warmup, the usage seems stable
    1490 thread+subprocess
    VmRSS: 9596 kB
    (...)
    10361 thread+subprocess
    VmRSS: 9596 kB
    (...)
    30282 thread+subprocess
    VmRSS: 9596 kB
    30695 thread+subprocess
    VmRSS: 9672 kB
    31160 thread+subprocess
    VmRSS: 9684 kB <=== memory usage decreases :-)
    (...)
    60768 thread+subprocess
    VmRSS: 9684 kB
    ^C
    ---

    If you really want to say that something is wrong: I don't understand why we must call gc.collect() to keep the memory usage stable. But I guess that the GC is not always called for performance.

    Without the GC it's not that bad:
    ---
    1083 thread+subprocess
    VmRSS: 9764 kB
    2097 thread+subprocess
    VmRSS: 9888 kB
    3136 thread+subprocess
    VmRSS: 9888 kB
    (...)
    11750 thread+subprocess
    VmRSS: 9888 kB
    12668 thread+subprocess
    VmRSS: 9940 kB
    13705 thread+subprocess
    VmRSS: 9940 kB
    (...)
    70948 thread+subprocess
    VmRSS: 9940 kB
    ^C
    ---
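
On the remark about needing gc.collect(): objects that take part in a reference cycle are not freed by reference counting and wait until the cyclic collector runs, which happens only once allocation counters cross the thresholds reported by gc.get_threshold(). The sketch below illustrates that general mechanism only; whether threading or subprocess objects actually form such cycles in this scenario is not established in this thread, and Node and count_cyclic_garbage are hypothetical names.

```python
import gc


class Node:
    """Object that refers to itself, so reference counting alone cannot free it."""

    def __init__(self):
        self.me = self


def count_cyclic_garbage(n):
    """Create n self-referencing objects with the collector paused, then collect."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        gc.collect()          # start from a clean slate
        for _ in range(n):
            Node()            # unreachable immediately, but not freed yet
        return gc.collect()   # number of unreachable objects found
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print("automatic collection thresholds:", gc.get_threshold())
    print("objects reclaimed by an explicit pass:", count_cyclic_garbage(1000))
```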

    There is no such "4k leak per function call".

    I close the issue. It's a bug in your code, not in Python.

    @Xavion
    Mannequin Author

    Xavion mannequin commented Sep 20, 2016

    Firstly, you've misquoted me. The quote you attributed to me in your latest post was actually made by 'ztane'.

    Secondly, your extra thread/event code makes no difference here. I will attach the memory usage logs in subsequent posts.

    For consistency, I have removed all of the collateral stuff from your "Test-3.py" script and reattached it here as "Test-3a.py".

    @Xavion Xavion mannequin reopened this Sep 20, 2016
    @Xavion Xavion mannequin changed the title The 'subprocess' module leaks 4 kiB memory for each thread The 'subprocess' module leaks memory when called in certain ways Sep 20, 2016
    @Xavion Xavion mannequin removed the invalid label Sep 20, 2016
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @certara-msale

    certara-msale commented May 8, 2022

    Was there ever a resolution to this question? I think I'm having the same issue: memory gradually creeps up with calls to Popen. I'm saving the process

        proc = Popen(command, stdout=DEVNULL, stderr=STDOUT)

    then calling poll() to see whether it is done

        if proc.poll() is not None:

    This seems to accumulate ~5 GB of memory over 100,000 calls.
    Is there a better way to do this, or was there an answer to the original question?

    thanks
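
One pattern for the poll-based approach described above is to cap the number of children running at once and to drop each Popen object as soon as poll() reports an exit code, so that finished processes are not kept alive by a growing list. This is a hedged sketch only: the cause of the growth reported in this comment is not known, and run_batch, max_running and poll_interval are illustrative names rather than anything from the original program.

```python
import subprocess
import sys
import time


def run_batch(commands, max_running=4, poll_interval=0.01):
    """Run commands with bounded concurrency; return exit codes in completion order."""
    pending = list(commands)
    running = []
    exit_codes = []
    while pending or running:
        # Top up the pool of running children.
        while pending and len(running) < max_running:
            running.append(
                subprocess.Popen(
                    pending.pop(0),
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.STDOUT,
                )
            )
        # poll() reaps a finished child and returns its exit code, else None.
        still_running = []
        for proc in running:
            code = proc.poll()
            if code is None:
                still_running.append(proc)
            else:
                exit_codes.append(code)  # the Popen object is not kept
        running = still_running
        if running:
            time.sleep(poll_interval)
    return exit_codes


if __name__ == "__main__":
    # Assumption: sys.executable stands in for the real command.
    codes = run_batch([[sys.executable, "-c", "pass"]] * 8)
    print(len(codes), "children finished")
```

If only the exit status matters and concurrency is not needed, subprocess.run() or Popen.wait() avoids the polling loop altogether.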

    @vstinner
    Member

    vstinner commented May 23, 2022

    Test-3a.py doesn't show any memory leak. I close the issue again.

    Output on Linux (Fedora) with Python 3.11 beta1 if I uncomment the grep line:

    $ python3.11 Test-3a.py 
    .VmRSS:	   10472 kB
    .VmRSS:	   10700 kB
    .VmRSS:	   10700 kB
    .VmRSS:	   10700 kB
    .VmRSS:	   10700 kB
    .VmRSS:	   10700 kB
    .VmRSS:	   10700 kB
    .VmRSS:	   10700 kB
    .VmRSS:	   10700 kB
    .VmRSS:	   10700 kB
    .VmRSS:	   10700 kB
    .VmRSS:	   10700 kB
    .VmRSS:	   10700 kB
    .VmRSS:	   10700 kB
    .VmRSS:	   10700 kB
    .VmRSS:	   10700 kB
    .VmRSS:	   10700 kB
    .VmRSS:	   10700 kB
    .VmRSS:	   10704 kB
    .VmRSS:	   10704 kB
    .VmRSS:	   10704 kB
    .VmRSS:	   10704 kB
    .VmRSS:	   10704 kB
    .VmRSS:	   10704 kB
    .VmRSS:	   10704 kB
    (...)
    
