New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Isolate PyModuleDef to Each Interpreter for Extension/Builtin Modules #101758
Comments
|
Well, this is the main reason why single-phase-init modules don't support multiple interpreters well. |
This change is almost entirely moving code around and hiding import state behind internal API. We introduce no changes to behavior, nor to non-internal API. (Since there was already going to be a lot of churn, I took this as an opportunity to re-organize import.c into topically-grouped sections of code.) The motivation is to simplify a number of upcoming changes. Specific changes: * move existing import-related code to import.c, wherever possible * add internal API for interacting with import state (both global and per-interpreter) * use only API outside of import.c (to limit churn there when changing the location, etc.) * consolidate the import-related state of PyInterpreterState into a single struct field (this changes layout slightly) * add macros for import state in import.c (to simplify changing the location) * group code in import.c into sections *remove _PyState_AddModule() #101758
|
#101919 removed |
|
Or can we be absolutely sure no stable ABI extension since Python 3.2 could use it? |
|
tl;dr I'm fine with adding it back (and will do so today). Thanks for bringing this up! In my mind, it's exceptionally unlikely that anyone is using the function, but obviously not impossible. (more explanation below) Also, I left this footnote on the PR:
("6 years ago" was in time for the 3.6 release, meaning the function was exposed as technically public API from 3.2 to 3.5.) Let me also explain why it's unlikely anyone is using the function. It's an undocumented "private" function corresponding to a similarly named documented public function, where the only differences are that I'll also point out that PEP 384 says "All functions starting with _Py are not available to applications.", which I'm guessing is why @serhiy-storchaka moved All that said, while there's a limit to how conservative we can be with the stable ABI, from a practicality perspective, in this case there really is no motivation to remove this function from the stable ABI other than to clean up. So I don't mind adding it back. I'll do that today. |
We're adding the function back, only for the stable ABI symbol and not as any form of API. I had removed it yesterday. This undocumented "private" function was added with the implementation for PEP 3121 (3.0, 2007) for internal use and later moved out of the limited API (3.6, 2016) and then into the internal API (3.9, 2019). I removed it completely yesterday, including from the stable ABI manifest (where it was added because the symbol happened to be exported). It's unlikely that anyone is using _PyState_AddModule(), especially any stable ABI extensions built against 3.2-3.5, but we're playing it safe. #101758
|
Are the |
|
Yeah. The failures should be fixed now though. |
|
To be clear, I've skipped the leaky/broken tests in test_imp. If there are other refleak failures, they should not be related to the work I've been doing. |
Thank you!
I do think it's possible that nothing uses this, and maybe we can even prove nothing does, but proving it is harder than keeping the function. Anyway, here's a plausible breaking scenario: something like a language binding gets the list of exposed ABI symbols, and re-exports all of them, regardless of whether they're usable. Removing any symbol will break them.
Yeah, but the trouble is that if they were called e.g. by a public macro, they need to stay in the ABI. Who knows how it's been used over the years -- it's possible to find out, but the archaeology is usually more trouble than keeping the function in. For
FWIW, in cases like these, turning the function into a stup that always raises is acceptable (as a last resort of course), since that doesn't technically break the ABI. (It would break specific code paths that the new CPython presumably can't support, rather than the entire extension being not importable due to a missing symbol.) |
|
Did you meant Note that the current In 3.2: Current: I am sure |


Typically each
PyModuleDeffor a builtin/extension module is a static global variable. Currently it's shared between all interpreters, whereas we are working toward interpreter isolation (for a variety of reasons). Isolating eachPyModuleDefis worth doing, especially if you consider we've already run into problems1 because ofm_copy.The main focus here is on
PyModuleDef.m_base.m_copy2 specifically. It's the state that facilitates importing legacy (single-phase init) extension/builtin modules that do not support repeated initialization3 (likely the vast majority).(expand for more context)
PyModuleDeffor an extension/builtin module is usually stored in a static variable and (with immortal objects, see gh-101755) is mostly immutable. The exception ism_copy, which is problematic in some cases for modules imported in multiple interpreters.Note that
m_copyis only relevant for legacy (single-phase init) modules, whether builtin and an extension, and only if the module does not support repeated initialization3. It is never relevant for multi-phase init (PEP 489) modules.m_copyis only set by_PyImport_FixupExtensionObject()(and thus indirectly_PyImport_FixupBuiltin()and_imp.create_builtin())_PyImport_FixupExtensionObject() is called by_PyImport_LoadDynamicModuleWithSpec()` when a legacy (single-phase init) extension module is loadedm_copyis only used inimport_find_extension(), which is only called by_imp.create_builtin()and_imp.create_dynamic()(via the respective importers)When such a legacy module is imported for the first time,
m_copyis set to a new copy of the just-imported module's__dict__, which is "owned" by the current interpreter (the one importing the module). Whenever the module is loaded again (e.g. reloaded or deleted fromsys.modulesand then imported), a new empty module is created andm_copyis [shallow] copied into that object's__dict__.When
m_copyis originally initialized, normally that will be the first time the module is imported. However, that code can be triggered multiple times for that module if it is imported under a different name (an unlikely case but apparently a real one). In that case them_copyfrom the previous import is replaced with the new one right after it is released (decref'ed). This isn't the ideal approach but it's also been the behavior for quite a while.The tricky problem here is that the same code is triggered for each interpreter that imports the legacy module. Things are fine when a module is imported for the first time in any interpreter. However, currently, any subsequent import of that module in another interpreter will trigger that replacing code. The second interpreter decref's the old
m_copy, but that object is "owned" by the first interpreter. This is a problem1.Furthermore, even if the decref-in-the-wrong-interpreter problem was gone. When
m_copyis copied into the new module's__dict__on subsequent imports, it's only a shallow copy. Thus such a legacy module, imported in other interpreters than the first one, would end up with its__dict__filled with objects not owned by the correct interpreter.Here are some possible approaches to isolating each module's
PyModuleDefto the interpreter that imports it:PyModuleDeffor each interpreter (would_PyRuntimeState.imports.extensionsneed to move to the interpreter?)m_copyfor/on each interpreter_PyImport_FixupExtensionObject()some other way...Linked PRs
Footnotes
see https://github.com/python/cpython/pull/101660#issuecomment-1424507393↩ ↩ 2
We should probably consider isolating↩
PyModuleDef.m_base.m_index, but for now we simply sync themodules_by_indexlist of each interpreter. (Also,modules_by_indexandm_indexare only used for single-phase init modules.)specifically↩ ↩ 2
def->m_size == -1; multi-phase init modules always havedef->m_size >= 0; single-phase init modules can also have a non-negativem_sizeThe text was updated successfully, but these errors were encountered: