-
-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-107137: Add _PyTuple_NewNoTrack() internal C API #107183
Conversation
Fix a crash in PySequence_Tuple() when the tuple which is being initialized was accessed via GC introspection, like the gc.get_referrers() function. Now the function only tracks the tuple by the GC once it's fully initialized. Add _PyTuple_NewNoTrack() and _PyTuple_ResizeNoTrack() functions to the internal C API: similar to PyTuple_New() and _PyTuple_Resize(), but don't track the tuple by the GC. Modify PySequence_Tuple() to use these functions.
9c9f4a7
to
55d1b7a
Compare
gc.get_referrers() documentation has a warning added in 2003 by issue #39117:
|
@pablogsal: in 2021, you wrote:
Are you still against the GC bug in PySequence_Tuple()? Note: I'm only proposing to add functions to the internal C API. Make such API public would deserves its own discussion. Also, I proposed a different API to fix the issue, an internal "tuple builder" API: PR #107139. |
I mark this PR as a draft since in 2021, the issue #59313 was closed as "not a bug" by @pablogsal who wrote:
|
This leaves me worried. The issue where the crash due to incomplete tuples (gh-59313) was closed, basically telling people that this is an acceptable risk, and the risk is due to the existence of cg.get_referrers() (see also gh-39117). It seems that using An alternative way to deal with this is to not call Introducing a new internal API, I feel we have two choices here:
I don't feel that just fixing (This discussion should probably go into an issue, but the issues are all closed.) |
Correct
It sounds like a backward incompatible changes. Such subtle change reminds me when instances of heap types should now call Py_DECREF(Py_TYPE(self)) in their tp_dealloc function, implement tp_traverse and tp_clear, visit the type in tp_traverse, and clear the type in tp_clear :-( I'm not saying that so many changes are needed. My worry is that such incompatible change can be silently ignored, and "suddenly" people will get new GC problems that they didn't have before since their tuples were tracked by the GC. Not tracking tuples by the GC sounds worse then the risk of race condition of code calling gc.get_referrers().
That's why I have a bigger plan: provide a new API to replace PyTuple_New() + PyTuple_SET_ITEM(): a "tuple builder" API, issue #107137. The migration plan doesn't imply to remove immediately the old API. It can be slow. Moreover, at this stage, I only propose an internal C API and so how it goes in Python internals first, how the API is used, and see if it does help fixing the race conditions for us at least.
Sure, that's also an option. But even if we decide to not provide a solution to fix 3rd party C extensions, does it mean that we should not fix known bugs in Python internals if there is a simple solution for that?
I created issue #107137. These days, most discussions happen on pull requests, issues are just created to get "an issue number" for the Changelog and What's New entries :-) I don't understand well the rationale against fixing PySequence_Tuple(). Do you mean that my internal API is too complicated to use? |
APIs, even internal ones, are forever. I would like you to ask other core devs to chime in here rather than it just being you and me. After all the problem has existed for decades, so why hurry now? |
The difference is that we don't provide any backward compatibility warranty on the internal C API. We have greater freedom to change it (rename functions, add parameters, remove functions, etc.)
Sure, I would like to hear more voices. I pinged multiple core devs on the issue, on my first PR and this second PR. I only created this issue very recently, other people seem to be busy (the EuroPython is just over ;-)). Why makes you think that there is a hurry? |
Because you have a pattern of making changes, requesting a review (or not!), and merging before you actually get a review. See e.g. gh-106871. |
From PyO3's perspective, if you want us to replace PyO3 can then use either |
Well. I created this PR to get reviews. 3 weeks later: no review. I pinged 3 core devs (including you). I tried to explain my problem with reviews a few times:
How can I get a review on this internal C API to fix a GC crash? :-) |
Not true. I wrote two separate review comments, but stating concerns about API bloat (even internal) and pointing out that the issue was closed as "won't fix" for a good (IMO) reason. What more do you want? We have one core dev (me) still against and nobody else in favor. In that case the status quo wins. |
Since issue #59313 got closed as "wont fix" two years ago, the C API was discussed and a working group is trying to make it better and address C API Problems. I'm proposing a concrete solution to one of these problems. I wrote a concrete PR to discuss the problem and my specific solution. As explained in the issue related to this PR, issue #107137, the problem is that the C API allows to create incomplete and inconsistent Python objects: capi-workgroup/problems#56 We can incrementally fix the C API to reduce this problem and design a migration path to a safer API which doesn't allow that. Fixing the problem in the internal C API is a first step, and IMO it's worth it even if the work stops here (since it does fix a concrete crash).
Well, it would be nice to hear other voices. It seems like @davidhewitt is interested by a fix for this C API issue for PyO3. |
It might also make sense to just wait until the core dev sprint in Brno, where the future of the C API will be discussed. |
FYI, I don't think the problem is just with |
But for tuple objects,
This I'm not so sure about -- IIUC the bug reported in gh-59313 shows how to get the GC to run (note it was closed because it was deemed an acceptable risk, not because it was fixed). FWIW This PR should reference that issue in its title, not gh-107137 (which is about adding a different tuple building API). |
While I'm not aware of crashes in the wild caused by this in PyO3, the issue in #59313 probably does affect Rust code using PyO3. We allow tuples to be built from Rust iterators. Users have the flexibility to run arbitrary Python code while the uninitialized tuple is under construction. We could work around this by first collecting the iterator into a Rust vector of If the fix to avoid the crash and keep current performance "just" exposing a new API which is
I think that's not feasible because stable API extensions built before this change will start producing tuples not tracked by GC, which sounds bad to me. |
As @vstinner said in capi-workgroup/problems#56 creating incomplete tuples is unsafe. Rather than an add another unsafe API, would it not be better to promote the safe ones and deprecate the unsafe ones? From smaller tuples, For larger tuples, the safe thing to do is build a list, then call |
I'd be in favour of making |
Looks good to me also, because I am not sure how people want to use APIs based on elements of size. |
This change is related to issue #107137 which proposes adding a new safe API. But for the implementation, I need to add internal unsafe functions, right. |
Right, initializing tuple times to None would prevent crashes on repr(tuple) when devs play with GC introspection functions. But my concern is more general than that. For me, PyTuple_SET_ITEM() is too different from all other C API functions: it doesn't clear the reference to the old item, since it makes the assumption that the function must only be called on newly created tuple with uninitialized items.
I would prefer to move towards an API where it's impossible to create unitialized/incomplete objects and so a function like PyTuple_SET_ITEM() would make no sense. Reference counting would be more regular and the API would be less error-prone.
On the implementation side, because PyTuple_New() API exists, most Other APIs like PyTuple_Pack() or proposed "tuple builder" API doesn't have such problem: if we manage to remove |
By the way, this change also fix |
I close my issue, see: #107139 (comment) |
Ooops, wrong window, I wanted to close the issue #107137 |
Apparently, I failed to convinced that _PyTuple_NewNoTrack() is the good fix for the issue, and some people consider that this crash is not important enough to be fixed, so I close just my PR. |
Could you open an issue for the crash? |
There was an issue closed by @pablogsal and @rhettinger was also against it. I asked @pablogsal if he can reconsider his decision but he didn't replied. So I give up. If you feel ready to reconsider this design choice, please go ahead and open a new issue :-) I'm done on this topic for now ;-) |
I think we should fix this. It is a crash in legal Python code. I don't think we need a new API, we need to fix |
My PR only added an internal C API. Is it now wrong to add internal C API functions? |
No, but I don't think this is the way to go. |
And If the loss of efficiency in refcount handling is an obstacle to adoption, we could add variants that consume the references. |
See #117747 |
Fix a crash in PySequence_Tuple() when the tuple which is being initialized was accessed via GC introspection, like the gc.get_referrers() function. Now the function only tracks the tuple by the GC once it's fully initialized.
Add _PyTuple_NewNoTrack() and _PyTuple_ResizeNoTrack() functions to the internal C API: similar to PyTuple_New() and _PyTuple_Resize(), but don't track the tuple by the GC.
Modify PySequence_Tuple() to use these functions.