Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyImport_ImportModule #126038

Closed
mwriter opened this issue Oct 27, 2024 · 19 comments
Closed

PyImport_ImportModule #126038

mwriter opened this issue Oct 27, 2024 · 19 comments
Labels
topic-C-API topic-importlib type-bug An unexpected behavior, bug, or error

Comments

@mwriter
Copy link

mwriter commented Oct 27, 2024

Bug report

Bug description:

import sys

import json
print(json) # <module 'json'>
print(sys.modules["json"]) # <module 'json'>

import ctypes
print(ctypes.pythonapi.PyImport_ImportModule(ctypes.c_char_p(b"json"))) # 2356734321

sys.modules = sys.modules.copy()

import email
print(email) # <module 'email'>
print(sys.modules["email"]) # <module 'email'>

print(ctypes.pythonapi.PyImport_ImportModule(ctypes.c_char_p(b"email"))) # KeyError: 'email'

CPython versions tested on:

3.12, 3.13

Operating systems tested on:

No response

@mwriter mwriter added the type-bug An unexpected behavior, bug, or error label Oct 27, 2024
@mwriter mwriter changed the title sys.modules bug PyImport_ImportModule Oct 27, 2024
@ZeroIntensity
Copy link
Member

I'll investigate this a little. At first, I thought this was just user-error, but all PyImport_ImportModule does is call builtins.__import__, which does seem to work with the reproducer, so I'm not exactly sure what's going on.

@mwriter
Copy link
Author

mwriter commented Oct 27, 2024

so I'm not exactly sure what's going on

I'm researching the development of event audit ideas. And these probably won't be the last problems found :)

@ZeroIntensity
Copy link
Member

A fair warning: issues resulting from manual manipulation of sys.modules probably won't get fixed. I'll see if there's something obvious that can be worked around here, but it might very well be a wontfix.

@mwriter
Copy link
Author

mwriter commented Oct 27, 2024

I'll see if there's something obvious that can be worked around here, but it might very well be a wontfix.

My job is to write about strange things. But the verdict is yours :)
And the manipulation of sys.modules does not affect the python code version in any way.
Since the behavior is very different for actually the same import, this is very similar to a bug.

@ZeroIntensity
Copy link
Member

And the manipulation of sys.modules does not affect the python version in any way.

It does, the interpreter relies on Python modules like traceback--if you break sys.modules and stop those imports from working, there's nothing we can do.

@mwriter
Copy link
Author

mwriter commented Oct 27, 2024

At first, I thought this was just user-error, but all PyImport_ImportModule does is call builtins.__import__

Also

import sys

import builtins

name = "json"
print(builtins.__import__(name)) # <module 'json'>
print(ctypes.pythonapi.PyImport_ImportModule(ctypes.c_char_p(name.encode()))) # 3536781451

sys.modules = sys.modules.copy()

name = "email"
print(builtins.__import__(name)) # <module 'email'>
print(ctypes.pythonapi.PyImport_ImportModule(ctypes.c_char_p(name.encode()))) # KeyError: 'email'

@mwriter
Copy link
Author

mwriter commented Oct 27, 2024

I thought this was just user-error, but all PyImport_ImportModule does is call builtins.__import__, which does seem to work with the reproducer, so I'm not exactly sure what's going on.

Do you know about any internal caches?
I have a suspicion that PyImport_ImportModule is working with the previous version of sys.modules before copying.

@mwriter
Copy link
Author

mwriter commented Oct 27, 2024

This is definitely a bug :)

import sys
import builtins

sms = sys.modules
sys.modules = sms.copy()

name = "email"

m = builtins.__import__(name)
print(m) # <module 'email'>
sms[name] = m
print(ctypes.pythonapi.PyImport_ImportModule(ctypes.c_char_p(name.encode()))) # 456145783

@ZeroIntensity
Copy link
Member

Ah, I found the issue--it's a wontfix. The interpreter uses two references to the modules dictionary: the actual one on the interpreter state, and then sys.modules is simply a reference to that. Reassigning sys.modules doesn't update the one on the interpreter state, so there ends up being two mismatched copies of it. Python code uses sys.modules, and then C code uses the interpreter state.

The fix would be to allow changes made on sys to directly affect the sysdict, but that's a). a lot of work for a niche feature and b). unsafe anyway, because the user could put something other than a dictionary in there, which can segfault depending on what the finalizer does (sysdict gets cleared after everything else in the interpreter, so lots of C API calls fail).

@ZeroIntensity ZeroIntensity added the pending The issue will be closed if no feedback is provided label Oct 27, 2024
@ZeroIntensity ZeroIntensity closed this as not planned Won't fix, can't repro, duplicate, stale Oct 27, 2024
@ZeroIntensity ZeroIntensity removed the pending The issue will be closed if no feedback is provided label Oct 27, 2024
@mwriter
Copy link
Author

mwriter commented Oct 27, 2024

The interpreter uses two references to the modules dictionary: the actual one on the interpreter state, and then sys.modules is simply a reference to that.

I'm using a similar idea but for a different task. But the main collection is still only one.

Reassigning sys.modules doesn't update the one on the interpreter state, so there ends up being two mismatched copies of it. Python code uses sys.modules, and then C code uses the interpreter state.

And is this considered normal and beyond fixing?

@ZeroIntensity
Copy link
Member

And is this considered normal and beyond fixing?

It's "fixable", but would be such a large change that a PEP would probably be needed. The main issue is that it's unsafe to store arbitrary objects in the sysdict. It's much easier to just use the private C API and mess with the interpreter state if you want to change it, but accept that you're in uncharted waters when doing that.

If you really do want this without the C API or some other black magic, then you'll have to go to DPO, and make a convincing argument for a). why this is needed and b). how it would be implemented (the finalizers part will be exceptionally difficult to figure out).

@mwriter
Copy link
Author

mwriter commented Oct 27, 2024

The main issue is that it's unsafe to store arbitrary objects in the sysdict

But at the same time, this is allowed in the python version ;)

If you really do want this without the C API or some other black magic, then you'll have to go
I would still like to hear the core developers' answers here.

I've already talked enough there and I'll talk more. But I was told to write about bugs here. And I finally want to hear not references back and forth but the opinions of the core developers.
If at least five core developers say that such duality is normal, then I will take note of it and willn't interfere :)
Because an argument like "it requires a PEP, so we won't fix it" seems very strange to me ;)

@ZeroIntensity
Copy link
Member

I don't think this is a bug at this point, it's a feature. The behavior works exactly as I would expect it to, knowing that the interpreter's module dictionary cannot be overwritten.

@mwriter
Copy link
Author

mwriter commented Oct 27, 2024

I don't think this is a bug at this point, it's a feature. The behavior works exactly as I would expect it to, knowing that the interpreter's module dictionary cannot be overwritten.

I understand your opinion. But I think you understood mine too :)
It remains to hear the core developers.

@encukou
Copy link
Member

encukou commented Oct 28, 2024

See the sys.modules docs:

This can be manipulated to force reloading of modules and other tricks. However, replacing the dictionary will not necessarily work as expected and deleting essential items from the dictionary may cause Python to fail.

You're welcome to play around with this cache. It can be quite useful for debugging or, if you're careful, testing. But if it breaks, you get to keep both pieces.

@encukou
Copy link
Member

encukou commented Oct 28, 2024

That said, if this was a crash, rather than exception or other unexpected behaviour, we would want to fix it.

@ZeroIntensity
Copy link
Member

It might be worth cross-referencing this from sys.modules onto PyImport_ImportModule (or more specifically PyImport_Import).

@encukou
Copy link
Member

encukou commented Oct 28, 2024

Perhaps the sys.modules warning should be stonger. But I don't think we need to add docs to all the places that can break if you reassign sys.modules.

@ZeroIntensity
Copy link
Member

I think the more useful note would be to mention that PyImport_Import doesn't 100% do what __import__ does, that's what had me tripped up here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-C-API topic-importlib type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants