gh-75459: Doc: C API: Improve object life cycle documentation #125962

Open: wants to merge 5 commits into base: main

rhansen
Contributor

@rhansen rhansen commented Oct 25, 2024

  • Add "cyclic isolate" to the glossary.
  • Add a new "Object Life Cycle" page.
    • Illustrate the order of life cycle functions.
    • Document PyObject_CallFinalizer and PyObject_CallFinalizerFromDealloc.
  • PyObject_Init does not call tp_init.
  • PyObject_New:
    • also initializes the memory
    • does not call tp_alloc, tp_new, or tp_init
    • should not be used for GC-enabled objects
    • memory must be freed by PyObject_Free
  • PyObject_GC_New memory must be freed by PyObject_GC_Del.
  • Warn that garbage collector functions can be called from any thread.
  • tp_finalize and tp_clear:
    • Only called when there's a cyclic isolate.
    • Only one object in the cyclic isolate is finalized/cleared at a time.
    • Clearly warn that they might not be called.
    • They can optionally be manually called from tp_dealloc (via PyObject_CallFinalizerFromDealloc in the case of tp_finalize).
  • tp_finalize:
    • Reference object.__del__.
    • The finalizer can resurrect the object.
    • Suggest PyErr_GetRaisedException and PyErr_SetRaisedException instead of the deprecated PyErr_Fetch and PyErr_Restore functions.
    • Add links to PyErr_GetRaisedException and PyErr_SetRaisedException.
    • Suggest using PyErr_WriteUnraisable if an exception is raised during finalization.
    • Rename the example function from local_finalize to foo_finalize for consistency with the tp_dealloc documentation and as a hint that the name isn't special.
    • Minor wording and stylistic tweaks.
    • Warn that tp_finalize can be called during shutdown.
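
The tp_finalize bullets above can be illustrated from Python, where `__del__` is the Python-level counterpart of tp_finalize. This is a minimal sketch and not part of the PR: the `Node` class and `make_isolate` helper are hypothetical names, and the behavior shown is CPython's. It shows a cyclic isolate whose finalizers run only when the collector runs, a finalizer that resurrects its object, and the finalizer not running a second time when the resurrected object is collected later.

```python
import gc


class Node:
    """Participant in a reference cycle; records finalizer calls."""

    finalized = []    # names of nodes whose __del__ has run
    resurrected = []  # strong references created by a finalizer

    def __init__(self, name, resurrect=False):
        self.name = name
        self.partner = None
        self.resurrect = resurrect

    def __del__(self):
        Node.finalized.append(self.name)
        if self.resurrect:
            # Storing a new strong reference resurrects the object.
            Node.resurrected.append(self)


def make_isolate(resurrect=False):
    """Build a two-object cycle, then drop every outside reference to it."""
    a = Node("a", resurrect=resurrect)
    b = Node("b")
    a.partner, b.partner = b, a


gc.disable()  # keep automatic collections from interfering

# 1. A cyclic isolate is finalized only when the collector runs.
make_isolate()
assert Node.finalized == []  # the cycle keeps both refcounts above zero
gc.collect()
assert sorted(Node.finalized) == ["a", "b"]  # order within the isolate is unspecified

# 2. A finalizer may resurrect its object; the partner stays reachable too.
Node.finalized.clear()
make_isolate(resurrect=True)
gc.collect()
assert sorted(Node.finalized) == ["a", "b"]
survivor = Node.resurrected.pop()
assert survivor.partner.partner is survivor

# 3. The finalizer is not called again when the objects are collected later.
del survivor
gc.collect()
assert sorted(Node.finalized) == ["a", "b"]

gc.enable()
```

The assertions compare sorted lists because nothing here relies on the order in which the objects of one isolate are finalized.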

📚 Documentation preview 📚: https://cpython-previews--125962.org.readthedocs.build/

@rhansen
Contributor Author

rhansen commented Oct 25, 2024

I'm not familiar enough with CPython's internals to be super confident about these changes. I would appreciate it if a GC expert would carefully review this.

Thanks!

@hugovk
Member

hugovk commented Oct 25, 2024

Hmm, I'm not sure if we can require graphviz for the docs.

We'd have to consider installing it on the main docs server in addition to Read the Docs, and also make sure the docs can still build without it, for downstream redistributors who might only want to build with "vanilla" Sphinx and no extra extensions. Plus other developers would need an easy way to build the docs on their machines.

cc @AA-Turner

.readthedocs.yml Outdated
Comment on lines 14 to 15
apt_packages:
- graphviz
Contributor Author

I can't figure out how to tell readthedocs to install graphviz. This doesn't work despite the documentation suggesting it should. Any help would be appreciated.

@rhansen
Contributor Author

rhansen commented Oct 25, 2024

> Hmm, I'm not sure if we can require graphviz for the docs.

Maybe I should just commit the generated .svg (and the input dot file so it can be revised easily). Would that be acceptable?

@rhansen
Contributor Author

rhansen commented Oct 25, 2024

I committed the generated .svg so that the substance of this PR can be reviewed while we figure out if it is acceptable to add graphviz as a dependency. (Note that sphinx.ext.graphviz is a built-in extension, so enabling it doesn't add any new sphinx dependencies.)

@AA-Turner
Member

I think that requiring graphviz should be fine -- Debian, Fedora, Gentoo, and OpenSUSE all package it. As Richard notes, it's a built-in extension, so should be fine from the "Vanilla" perspective.

I would want to include a NEWS entry to say that graphviz is now required to build the docs, though.

A

@AA-Turner AA-Turner added the docs Documentation in the Doc dir label Oct 25, 2024
@ZeroIntensity
Member

My main concern with documenting nitty-gritty details of the lifecycle is that we're technically documenting implementation details, which are subject to change (and we've been bad at updating these kinds of things from version to version in the past). I suggest the SVG go into the InternalDocs folder instead.

It's also worth noting here that tp_finalize isn't 100% related to garbage collection; it's supposed to be used over tp_dealloc if complicated things are being done upon finalization, even for non-GC types. And while we're here, I think it would be a good idea to document the cases that tp_clear should exist for a tracked type.

@rhansen
Contributor Author

rhansen commented Oct 26, 2024

> My main concern with documenting nitty-gritty details of the lifecycle is that we're technically documenting implementation details, which are subject to change (and we've been bad at updating these kinds of things from version to version in the past). I suggest the SVG go into the InternalDocs folder instead.

I don't want to document any implementation details here, so I'm happy to remove what isn't necessary. It's hard to tell what is and isn't necessary because the end of an object's life is especially fraught with peril. I think that it is better to err on the side of over-documenting this topic than under-documenting.

I wrote this PR because there were several things that I needed to know that the existing documentation didn't make clear:

  • The order the tp_* functions might be called (to know what invariants are possible).
  • Which threads might execute the functions (for locking correctness).
  • Details about when a function is called (or not) that are necessary for locking correctness. (e.g., if multiple objects in the same cyclic isolate are never finalized concurrently then a lock-free design might be possible)
  • Approximately how often a tp_* function might be called: maybe never, exactly once, at most once, at least once, very frequently, etc.
  • Which other objects might be in an inconsistent state.
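
The thread question in the list above can be observed from Python. This is a minimal sketch, assuming CPython and using a hypothetical `Tracker` class: the object is created in one thread, but its finalizer runs in whichever thread happens to trigger the collection, here a worker thread that calls `gc.collect()`.

```python
import gc
import threading


class Tracker:
    """Cycle participant that records which thread ran its finalizer."""

    finalizer_thread = None

    def __init__(self):
        self.me = self  # self-reference makes this a one-object cycle

    def __del__(self):
        Tracker.finalizer_thread = threading.get_ident()


gc.disable()  # only the explicit collection below may finalize the object
creator_ident = threading.get_ident()
Tracker()     # unreachable right away, but kept alive by its own cycle

# Run the collection in a different thread from the one that created the object.
worker = threading.Thread(target=gc.collect)
worker.start()
worker_ident = worker.ident
worker.join()
gc.enable()

assert Tracker.finalizer_thread == worker_ident
assert Tracker.finalizer_thread != creator_ident
```

With automatic collection enabled, the triggering thread is simply whichever one crosses the allocation threshold, so a finalizer should not assume it runs in the thread that created the object.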

> It's also worth noting here that tp_finalize isn't 100% related to garbage collection; it's supposed to be used over tp_dealloc if complicated things are being done upon finalization, even for non-GC types.

If I understand correctly, tp_finalize is never called for non-GC types unless the class designer calls it from tp_dealloc. In that case tp_finalize is just like any other helper function that might be called from tp_dealloc. (Maybe this is only true for static types and not heap types? I don't fully understand the difference.)
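
On the heap-type question: for classes defined in Python, CPython's deallocation path calls the finalizer itself, so `__del__` runs as soon as the reference count reaches zero, with no cyclic collection involved. The sketch below assumes CPython's reference counting and uses a hypothetical `Resource` class; it says nothing about static types implemented in C, which is the case discussed above.

```python
import gc

events = []


class Resource:
    """Acyclic object whose finalizer records when it runs."""

    def __del__(self):
        events.append("finalized")


gc.disable()  # rule out the cyclic collector as the trigger
r = Resource()
events.append("before del")
del r         # last reference dropped: deallocation starts immediately
events.append("after del")
gc.enable()

assert events == ["before del", "finalized", "after del"]
```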

> And while we're here, I think it would be a good idea to document the cases that tp_clear should exist for a tracked type.

I thought that was already sufficiently explained, even before this PR. Can you explain what you think is lacking?

@ZeroIntensity
Member

First, thanks for doing this!

> I think that it is better to err on the side of over-documenting this topic than under-documenting.

I don't; that limits our ability to modify the lifecycle in the future (especially because there's no good way to deprecate things here). I'll point this out when doing a more in-depth review, though; I don't see anything particularly bad right now.

> If I understand correctly, tp_finalize is never called for non-GC types unless the class designer calls it from tp_dealloc.

You're right, it's not, but I don't think we should limit ourselves to that in the future. It might be possible someday to automatically do this for untracked types as well. We should just document that all types, even GC types, require PyObject_CallFinalizerFromDealloc in the destructor if they want tp_finalize to eventually get called. We can note that it could happen automatically, though.

> I thought that was already sufficiently explained, even before this PR. Can you explain what you think is lacking?

Basically, it's not documented which types need to have a tp_clear, because not all GC types have it. I'm not even sure which cases require it. I think it's only needed if the type can have a direct reference cycle to itself? (As in, running its finalizer will try to Py_DECREF itself.)

Also, I don't think it should be documented that tp_clear is related to tp_dealloc by making an "optional call"; that's sort of incidental. They tend to do the same thing, and the destructor can utilize the clear function for convenience, but they're for different purposes.


A few other notes:

  • It's fine to document what the specific allocators (e.g. PyObject_GC_New and PyObject_GC_Del) do, but we should point users to using tp_alloc and tp_free instead.
  • That said, I don't see the need to mention that functions like PyObject_New don't call tp_init. Those are strictly allocators, and currently documented as such.
  • This isn't exhaustive--static objects and some immortal objects don't follow this lifecycle (single-phase modules don't either, I think). Other objects might not follow this in the future either.
  • If I read "called from any thread" in the docs, I would be worried about holding the GIL. Maybe mention that while it can be called from any thread, it will still hold the GIL.

Comment on lines +33 to +39
Calls :c:func:`PyObject_Malloc` to allocate memory for a new Python object
using the C structure type *TYPE* and the Python type object *typeobj*
(``PyTypeObject*``), then initializes the memory like
:c:func:`PyObject_Init`. The caller will own the only reference to the
object (i.e. its reference count will be one). The size of the memory
allocation is determined from the :c:member:`~PyTypeObject.tp_basicsize`
field of the type object.
Member

I feel like the original sentence did a better job at explaining what the macro does, whereas the new one focuses more on the "how".

Perhaps this is better?

Suggested change
Allocate a new Python object using the C structure type *TYPE*
and the Python type object *typeobj* (``PyTypeObject*``)
by calling :c:func:`PyObject_Malloc` to allocate memory and
initializing it like :c:func:`PyObject_Init`.
The caller will own the only reference to the object
(i.e. its reference count will be one).
The size of the memory allocation is determined from the
:c:member:`~PyTypeObject.tp_basicsize` field of the type object.

Comment on lines +11 to +13
The following is an illustration of the stages of life of an object. Arrows
indicate a "happens before" relationship. Octagons indicate functions specific
to :ref:`garbage collection support <supporting-cycle-detection>`.
Member

This sentence seems unnecessary:

Suggested change
The following is an illustration of the stages of life of an object.
Octagons indicate functions specific
to :ref:`garbage collection support <supporting-cycle-detection>`.

Member

The "alive, refcount > 0" is not entirely clear to me.
There are 4 arrows coming out from it:

  • one says "refcount == 0", seemingly implying that the condition in the rectangle (shouldn't it be a diamond then?) is false;
  • the one that goes to tp_traverse is unlabeled (maybe this happens when the condition is true?);
  • the other two are both labeled "cyclic isolate" -- are both always executed? Is there another hidden condition that determines which one is executed?

Comment on lines +61 to +62
To allocate and free memory, see :ref:`Allocating Objects on the Heap
<allocating-objects>`.
Member

The :ref: should display the title of the section it refers to, so if the title is already "Allocating Objects on the Heap", you should be able to just do:

Suggested change
To allocate and free memory, see :ref:`allocating-objects`.
