Expose `os.DirEntry` objects from pathlib #125413
Add a `Path.dir_entry` attribute. In any path object generated by `Path.iterdir()`, it stores an `os.DirEntry` object corresponding to the path; in other cases it is `None`. This can be used to retrieve the file type and attributes of directory children without necessarily incurring further system calls. Under the hood, we use `dir_entry` in our implementations of `PathBase.glob()`, `PathBase.walk()` and `PathBase.copy()`, the last of which also provides the implementation of `Path.copy()`, resulting in a modest speedup when copying local directory trees.
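The proposed attribute is not available in released Python, but a user can approximate it today by keeping each `os.DirEntry` next to its path. The sketch below assumes a hypothetical helper, `iterdir_with_entries()`, and a hypothetical `describe()` consumer; neither is part of pathlib.

```python
import os
from pathlib import Path
from typing import Iterator, List, Tuple


def iterdir_with_entries(directory: Path) -> Iterator[Tuple[Path, os.DirEntry]]:
    """Yield (child_path, dir_entry) pairs for each child of `directory`.

    Hypothetical helper: it keeps each os.DirEntry alongside its Path,
    roughly what the proposed Path.dir_entry attribute would provide.
    """
    with os.scandir(directory) as entries:
        for entry in entries:
            yield directory / entry.name, entry


def describe(directory: Path) -> List[Tuple[str, str]]:
    """Classify each child as 'symlink', 'dir' or 'other' (sorted by name)."""
    result = []
    for child, entry in iterdir_with_entries(directory):
        # is_symlink() never follows links, and is_dir(follow_symlinks=False)
        # doesn't either, so both normally reuse scandir() data rather than
        # issuing a separate stat() call.
        if entry.is_symlink():
            kind = "symlink"
        elif entry.is_dir(follow_symlinks=False):
            kind = "dir"
        else:
            kind = "other"
        result.append((child.name, kind))
    return sorted(result)
```

Keeping the entry beside the path, rather than discarding it as `Path.iterdir()` does, is what lets the type checks avoid extra system calls.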
I put this feedback on the PR, but it's probably better placed here: while I like the general idea, I don't think this specific API is the right way to do it.
I think we can eliminate both of those bits of awkwardness:
If it's impractical to add |
… once
Improve `pathlib._abc.PathBase.copy()` (which provides `Path.copy()`) by fetching operands' supported metadata keys up-front, rather than once for each path in the tree. This prepares the way for using `os.DirEntry` objects in `copy()`.
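The idea of fetching metadata keys once can be illustrated by hoisting a set intersection out of the per-file loop. This is a sketch under stated assumptions: `SOURCE_READABLE`, `TARGET_WRITABLE`, `copy_metadata()` and `copy_metadata_for_all()` are hypothetical names, not pathlib's private API, and only timestamps and permission bits are handled.

```python
import os
import stat
from pathlib import Path
from typing import FrozenSet, Iterable, Tuple

# Hypothetical capability sets, for illustration only; they are not
# pathlib's private API.
SOURCE_READABLE: FrozenSet[str] = frozenset({"mode", "times_ns", "xattrs"})
TARGET_WRITABLE: FrozenSet[str] = frozenset({"mode", "times_ns"})


def copy_metadata(source: Path, target: Path, keys: FrozenSet[str]) -> None:
    """Copy the metadata named in `keys` from source to target."""
    st = source.stat()
    if "times_ns" in keys:
        os.utime(target, ns=(st.st_atime_ns, st.st_mtime_ns))
    if "mode" in keys:
        os.chmod(target, stat.S_IMODE(st.st_mode))


def copy_metadata_for_all(pairs: Iterable[Tuple[Path, Path]]) -> None:
    # The common keys are computed once, up front, rather than once per
    # (source, target) pair inside the loop.
    keys = SOURCE_READABLE & TARGET_WRITABLE
    for source, target in pairs:
        copy_metadata(source, target, keys)
```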
Add `pathlib.Path.scandir()` as a thin wrapper around `os.scandir()`. In the private `pathlib._abc.PathBase` class, we can rework the `iterdir()`, `glob()`, `walk()` and `copy()` methods to call `scandir()` and make use of cached directory entry information, thereby improving performance. Because the `Path.copy()` method is provided by `PathBase`, this also speeds up traversal when copying local files and directories.
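Because the proposed method simply forwards to `os.scandir()`, an equivalent can be written as a free function on current Python versions. `path_scandir()` and `names_by_kind()` below are hypothetical names used only for illustration.

```python
import os
from pathlib import Path
from typing import Dict, List


def path_scandir(path: Path):
    """Hypothetical stand-in for the proposed Path.scandir() wrapper.

    Returns os.scandir()'s iterator of os.DirEntry objects unchanged;
    use it in a `with` block so the directory handle is closed promptly.
    """
    return os.scandir(path)


def names_by_kind(path: Path) -> Dict[str, List[str]]:
    """Split child names into directories and everything else."""
    groups: Dict[str, List[str]] = {"dirs": [], "other": []}
    with path_scandir(path) as entries:
        for entry in entries:
            # follow_symlinks=False: a symlink to a directory lands in "other"
            key = "dirs" if entry.is_dir(follow_symlinks=False) else "other"
            groups[key].append(entry.name)
    return {key: sorted(names) for key, names in groups.items()}
```

Returning the raw iterator keeps the wrapper trivial and leaves resource management to the caller's `with` block, exactly as with `os.scandir()`.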
Use the new `PathBase.scandir()` method in `PathBase.glob()`, which greatly reduces the number of `PathBase.stat()` calls needed when globbing. There are no user-facing changes, because the pathlib ABCs are still private and `Path.glob()` doesn't use the implementation in its superclass.
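The saving in `glob()` comes from deciding whether to descend into a child using the directory entry rather than a separate `stat()`. The following is a simplified sketch, not pathlib's globbing algorithm: the hypothetical `rglob_names()` only matches the final path component against one `fnmatch` pattern, case-sensitively.

```python
import fnmatch
import os
from pathlib import Path
from typing import Iterator


def rglob_names(root: Path, pattern: str) -> Iterator[Path]:
    """Yield paths below `root` whose name matches `pattern`.

    Hypothetical, simplified stand-in for a scandir()-based recursive glob.
    """
    stack = [root]
    while stack:
        directory = stack.pop()
        try:
            # Materialise the entries so the directory handle is closed
            # before any results are yielded to the caller.
            with os.scandir(directory) as entries:
                children = list(entries)
        except OSError:
            continue  # skip unreadable directories
        for entry in children:
            child = directory / entry.name
            if fnmatch.fnmatchcase(entry.name, pattern):
                yield child
            # Recurse using cached entry data; not following symlinks
            # also avoids infinite loops through linked directories.
            if entry.is_dir(follow_symlinks=False):
                stack.append(child)
```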
To tie up the above loose ends, we went with a |
Use the new `PathBase.scandir()` method in `PathBase.walk()`, which greatly reduces the number of `PathBase.stat()` calls needed when walking. There are no user-facing changes, because the pathlib ABCs are still private and `Path.walk()` doesn't use the implementation in its superclass.
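A top-down walk can split each directory's children into `dirnames` and `filenames` from the entries alone. This sketch, with the hypothetical name `walk_top_down()`, mirrors the default behaviour of `Path.walk()` (symlinks to directories are reported in `filenames`, and `os.scandir()` errors are ignored) but is not pathlib's implementation.

```python
import os
from pathlib import Path
from typing import Iterator, List, Tuple


def walk_top_down(root: Path) -> Iterator[Tuple[Path, List[str], List[str]]]:
    """Yield (dirpath, dirnames, filenames) for `root` and its subdirectories."""
    stack = [root]
    while stack:
        dirpath = stack.pop()
        dirnames: List[str] = []
        filenames: List[str] = []
        try:
            with os.scandir(dirpath) as entries:
                for entry in entries:
                    # follow_symlinks=False: a symlink to a directory is
                    # listed as a file and never descended into
                    if entry.is_dir(follow_symlinks=False):
                        dirnames.append(entry.name)
                    else:
                        filenames.append(entry.name)
        except OSError:
            continue  # ignore unreadable directories
        yield dirpath, dirnames, filenames
        # The caller may prune dirnames in place before we resume here.
        # Push in reverse so subdirectories are visited in listed order.
        for name in reversed(dirnames):
            stack.append(dirpath / name)
```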
Use the new `PathBase.scandir()` method in `PathBase.copy()`, which greatly reduces the number of `PathBase.stat()` calls needed when copying. This also speeds up `Path.copy()`, which inherits the superclass implementation. Under the hood, we use directory entries to distinguish between files, directories and symlinks, and to retrieve a `stat_result` when reading metadata. This logic is extracted into a new `pathlib._abc.CopierBase` class, which helps reduce the number of underscore-prefixed support methods in the path interface.
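The classification step described above can be sketched with plain `os.DirEntry` objects: check `is_symlink()` first so links are recreated rather than followed, then `is_dir()`, and take timestamps from the entry's `stat()` result. The function `copy_tree()` is hypothetical and is not the `CopierBase` design; it assumes a POSIX system (symlink creation) and that `target` does not yet exist.

```python
import os
import shutil
from pathlib import Path


def copy_tree(source: Path, target: Path) -> None:
    """Recursively copy directory `source` to a new directory `target`."""
    target.mkdir()
    with os.scandir(source) as entries:
        children = list(entries)
    for entry in children:
        src = source / entry.name
        dst = target / entry.name
        if entry.is_symlink():
            # Recreate the link itself instead of copying what it points to.
            os.symlink(os.readlink(src), dst)
        elif entry.is_dir(follow_symlinks=False):
            copy_tree(src, dst)
        else:
            shutil.copyfile(src, dst)
            # stat() is cached on the entry after its first call (on Windows
            # it needs no extra system call for regular files).
            st = entry.stat(follow_symlinks=False)
            os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
```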
Feature or enhancement
`Path.iterdir()` uses `os.scandir()` under the hood, but it throws away the resulting `os.DirEntry` objects, despite their numerous useful features. I propose we add a new `Path.dir_entry` attribute that stores an `os.DirEntry` object or `None`. This attribute will be set to a directory entry in paths yielded from `Path.iterdir()`.

This would allow users to call methods such as `child.dir_entry.is_symlink()` to check for symlinks without incurring a mandatory system call. It will help speed up the implementation of `Path.copy()` too.

This description needs a rework. I've found the above suggestion is a whole can of worms!
See discussion: https://discuss.python.org/t/is-there-a-pathlib-equivalent-of-os-scandir/46626
Linked PRs
- `pathlib.Path.dir_entry` attribute #125419
- `pathlib.Path.copy()`: get common metadata keys only once #125990
- `pathlib.Path.scandir()` method #126060
- `scandir()` to speed up `glob()` #126261
- `scandir()` to speed up `walk()` #126262
- `scandir()` to speed up `copy()` #126263