pkgutil.walk_packages returns extra modules #58992

cjerdonek · 2012-05-12T09:01:11Z

BPO	14787
Nosy	@ncoghlan, @ericvsmith, @merwok, @cjerdonek, @scorphus

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2012-05-12.09:01:10.723>
labels = ['type-bug', 'library', 'docs']
title = 'pkgutil.walk_packages returns extra modules'
updated_at = <Date 2020-01-29.00:16:41.772>
user = 'https://github.com/cjerdonek'

bugs.python.org fields:

activity = <Date 2020-01-29.00:16:41.772>
actor = 'brett.cannon'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Documentation', 'Library (Lib)']
creation = <Date 2012-05-12.09:01:10.723>
creator = 'chris.jerdonek'
dependencies = []
files = []
hgrepos = []
issue_num = 14787
keywords = []
message_count = 11.0
messages = ['160464', '160469', '165094', '165537', '165605', '165612', '165618', '165627', '205021', '221986', '261589']
nosy_count = 10.0
nosy_names = ['ncoghlan', 'eric.smith', 'eric.araujo', 'Arfrever', 'chris.jerdonek', 'docs@python', 'gennad', 'faassen', 'scorphus', 'Andrey Nehaychik']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue14787'
versions = ['Python 2.7', 'Python 3.4', 'Python 3.5']

cjerdonek · 2012-05-12T09:01:11Z

pkgutil.walk_packages(paths) seems to return incorrect results when the name of a subpackage of a path in paths matches the name of a package in the standard library. It both excludes modules it should include, and includes modules it should exclude. Here is an example:

mkdir temp
touch temp/__init__.py
touch temp/foo.py
mkdir temp/logging
touch temp/logging/__init__.py
touch temp/logging/bar.py
python
Python 3.2.3 (default, Apr 29 2012, 01:19:06)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> from pkgutil import walk_packages
>>> for info in walk_packages(['temp']):
...   print(info[1], info[0].path)
... 
foo temp
logging temp
logging.config /opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/logging
logging.handlers /opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/logging
>>>

Observe that logging.bar is absent from the list, and logging.config and logging.handlers are included.

gennad · 2012-05-12T13:32:23Z

I confirm this behavior in 2.7 and 3.2 versions. In my 3.3.0a3+ it actually outputs nothing.
Also note that if you rename logging to logging2, you actually get

foo temp
logging2 temp

brettcannon · 2012-07-09T16:25:06Z

So the lack of output in 3.3 is not surprising as walk_packages() won't work with the new import implementation as it relies on a non-standard method on loaders that import does not provide.

cjerdonek · 2012-07-15T17:20:26Z

For the record, this issue is still present after Nick's pkgutil changes documented in bpo-15343 (not that I expected it to be resolved since this issue is a bit different).

ncoghlan · 2012-07-16T13:53:43Z

Right, this is a separate bug in pkgutil. Specifically, when it goes to import a package in order to check it for submodules, it invokes the global import system via __import__() rather than constraining the import to the path argument supplied to walk_packages.

This means that it will only find the intended package if the path being walked is already on sys.path. In the case of your example, it isn't (temp is a subdirectory of a sys.path entry, not an entry itself).

The reason my new tests didn't pick this up is that they're built on the test_runpy infrastructure, and one of the steps in that infrastructure is to add the new package path to sys.path so it can be imported.

This isn't an easy one to fix - you basically need something along the lines of a PEP-406 style import engine API in order to do the import without having potentially adverse effects on the state in the sys module.
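
For filesystem-only layouts, one way to sidestep the `__import__()` step is to descend by directory instead of by import. The following is a minimal sketch, not a fix for pkgutil itself: `walk_packages_on_disk` is a hypothetical helper name, and it assumes every finder returned by `pkgutil.iter_modules` exposes a `path` attribute (true for the default file finder, not for zip archives). It does not descend into namespace packages, since `iter_modules` does not report directories without an `__init__.py` as packages.

```python
import os
import pkgutil


def walk_packages_on_disk(paths, prefix=""):
    """Yield (name, ispkg) pairs for modules found under `paths`.

    Hypothetical helper, not part of pkgutil. Nothing is imported, so
    sys.modules and sys.path are neither consulted nor modified.
    """
    for info in pkgutil.iter_modules(paths, prefix):
        yield info.name, info.ispkg
        if not info.ispkg:
            continue
        # Assumption: the finder is a filesystem finder exposing `.path`.
        # Finders without it (e.g. zip importers) are not descended into.
        finder_dir = getattr(info.module_finder, "path", None)
        if finder_dir is None:
            continue
        # Strip the prefix to get the directory name of the package.
        subdir = os.path.join(finder_dir, info.name[len(prefix):])
        if os.path.isdir(subdir):
            yield from walk_packages_on_disk([subdir], info.name + ".")
```

Called as `walk_packages_on_disk(['temp'])` on the layout from the original report, this should list `logging.bar` and never look at the standard library's `logging` package, because the package name is never imported.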

ncoghlan · 2012-07-16T14:22:20Z

At the very least, the pkgutil docs need to state clearly that walk_packages only works properly with sys.path entries, and the constraint feature may not descend into packages correctly if an entry is shadowed by a sys.modules entry or an entry earlier on sys.meta_path or sys.path.
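
To illustrate the "only works properly with sys.path entries" constraint, here is a small sketch that puts the directory on sys.path for the duration of the walk. The helper names `on_sys_path` and `walk_names` are made up for this example. It only helps when the package names are not already shadowed by a sys.modules entry (as `logging` usually is), and it still imports every package it descends into, so the side effects on sys.modules remain.

```python
import contextlib
import importlib
import pkgutil
import sys


@contextlib.contextmanager
def on_sys_path(directory):
    """Temporarily put `directory` first on sys.path (hypothetical helper)."""
    sys.path.insert(0, directory)
    # Make sure the import system notices recently created files.
    importlib.invalidate_caches()
    try:
        yield
    finally:
        # Removes the first occurrence, i.e. the entry inserted above.
        sys.path.remove(directory)


def walk_names(directory):
    """Return the names walk_packages reports while `directory` is importable."""
    with on_sys_path(directory):
        # The generator is consumed inside the `with` block, so every
        # __import__ call made by walk_packages sees the extra entry.
        return [info.name for info in pkgutil.walk_packages([directory])]
```

Inserting at the front of sys.path means a package in `directory` can win over a same-named package later on sys.path for the rest of the process if it gets imported during the walk, so this is best kept to throwaway processes or test helpers.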

ncoghlan · 2012-07-16T14:35:17Z

I just realised this is going to behave strangely with namespace packages as well: the __import__ step will pick up *every* portion of the namespace package, not just those defined in the identified subset of sys.path.

cjerdonek · 2012-07-16T15:39:22Z

> This isn't an easy one to fix - you basically need something along the lines of a PEP-406 style import engine API in order to do the import without having potentially adverse effects on the state in the sys module.

By adverse, do you just mean side effects? If so, since the documentation doesn't explicitly say so, is there any reason for the user to think there shouldn't be side effects? For example, I tried this in Python 2.7:

>>> import os, sys, pkgutil, unittest
>>> len(sys.modules)
86
>>> g = pkgutil.walk_packages([os.path.dirname(unittest.__file__)])
>>> len(sys.modules)
86
>>> for i in g:
...   pass
... 
>>> len(sys.modules)
95

Or maybe this isn't what you mean. If not, can you provide an example?

faassen · 2013-12-02T15:58:02Z

I just ran into this bug myself with namespace packages (in Python 2.7). When you have multiple packages (ns.a, ns.b) under a namespace package (ns), and constrain the paths in walk_packages so it should only pick up ns.a, it will pick up ns.b as well.

Any hope for a fix or workaround?

BreamoreBoy · 2014-06-30T21:32:45Z

Note that this is referenced from bpo-15358.

AndreyNehaychik · 2016-03-11T18:14:45Z

Any hope to add the warning in pkgutil docs about this problem?

For example:
Warning: The walk_packages function uses sys.path to import nested packages for the provided paths, which means it descends into subpackages through the regular import system. If you pass a path that is not in sys.path as an argument, the result won't be correct.

cjerdonek added the stdlib and type-bug labels May 12, 2012

ncoghlan added the docs label Jul 16, 2012

ncoghlan assigned docspython Jul 16, 2012

ezio-melotti transferred this issue from another repository Apr 10, 2022

rexor12 mentioned this issue Oct 31, 2022

Discovery fails due to looking for non-existent packages rexor12/kanata#23

Closed
