Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-101112: Add "pattern language" section to pathlib docs #114030

Merged
merged 18 commits into from
Feb 26, 2024

Conversation

barneygale
Copy link
Contributor

@barneygale barneygale commented Jan 13, 2024

Explain the full_match() / glob() / rglob() pattern language in its own section. Move rglob() documentation under glob() and reduce duplicated text.


📚 Documentation preview 📚: https://cpython-previews--114030.org.readthedocs.build/

Explain the `match()` / `glob()` / `rglob()` pattern language in its own
section. Move `rglob()` documentation under `glob()` and reduce duplicated
text.
Doc/library/pathlib.rst Outdated Show resolved Hide resolved
@barneygale
Copy link
Contributor Author

barneygale commented Jan 27, 2024

How patterns with trailing slashes should work:

  1. In glob(), only directories are matched
  2. In glob(), results also have trailing slashes
  3. In full_match(), the path must have the same trailing slash configuration.

However pathlib's implementation of globbing only supports (1), and it's more-or-less impossible to support (2) and (3) in a backwards-compatible way. Because of this, I've not documented trailing slashes in this PR. Too many caveats. They get a mention in the glob() documentation instead.

See also: #106747

@barneygale barneygale removed needs backport to 3.11 only security fixes needs backport to 3.12 bug and security fixes labels Jan 27, 2024
@zooba
Copy link
Member

zooba commented Jan 29, 2024

Maybe it's worth mentioning that trailing slashes are currently inconsistent and to check individual docs, but also recommend a way to use them consistently so that as we make fixes, you're less likely to be affected?

e.g. if we'd been telling people "yeah, glob("**") will only include directories right now, but if you do glob("**/") then we'll know you meant it and won't ever change it on you" it would be less trouble to make that fix.

@barneygale
Copy link
Contributor Author

Honestly I think it's a dead end given the strength of feeling in this thread: https://discuss.python.org/t/pathlib-preserve-trailing-slash/33389

The only possible avenue I see is that we introduce a new class like RawPath that doesn't perform normalization. But that would also face considerable opposition: https://discuss.python.org/t/suggestion-for-pathlib-differentiate-explicit-and-implicit-local-paths-pathlib-strictpath/31235. And even if it came to pass, it wouldn't affect pathlib.Path so preparation isn't possible for users I don't think.

@zooba
Copy link
Member

zooba commented Jan 30, 2024

By the logic I see there, ** should return both files and directories with no way to prevent it - that is, **/ should also return files and directories. So clearly we need to pick some logic that has exceptions, because those seem pretty bad.

Since we're talking about a pattern here, rather than a path, I think it's okay for **/ to restrict to directories, **/* to restrict to files, and ** to include both. None of the resulting Path objects will have trailing slashes, but there's a way to efficiently filter, and that's enough. Right now, there isn't even a good way to filter by collecting everything and using a comprehension (expensive, but obvious).

@barneygale
Copy link
Contributor Author

I agree that making trailing slashes meaningful to patterns in glob() but nowhere else is weird and messy. I think Antoine pointed that out in the thread too. With what I know now, I might not have accepted #10349 so readily, and instead suggested we add optional files=True, dirs=True arguments.

... but at least it only affects glob(), whereas to generally support paths with trailing slashes (required to implement behaviours (2) and (3) from my earlier post) would require much more disruptive changes.

@barneygale
Copy link
Contributor Author

I've added a couple of paragraphs that attempts to explain how trailing slashes work, lmk what you think?

Doc/library/pathlib.rst Outdated Show resolved Hide resolved
Doc/library/pathlib.rst Outdated Show resolved Hide resolved
@barneygale
Copy link
Contributor Author

barneygale commented Feb 10, 2024

Is it worth adding a "Comparison to glob module" section? It would note that:

  1. Hidden files are not special (equivalent to glob(include_hidden=True))
  2. ** is always recursive (equivalent to glob(recursive=True))
  3. ** doesn't follow symlinks by default
  4. Trailing slashes are not preserved
  5. bytes paths and fd-based functionality isn't available

It could also note that pathlib's implementation is heckin' fast

Copy link
Member

@zooba zooba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor suggestions, but looks good.

Doc/library/pathlib.rst Outdated Show resolved Hide resolved
Doc/library/pathlib.rst Outdated Show resolved Hide resolved
Doc/library/pathlib.rst Outdated Show resolved Hide resolved
Doc/library/pathlib.rst Outdated Show resolved Hide resolved
Doc/library/pathlib.rst Outdated Show resolved Hide resolved
@barneygale
Copy link
Contributor Author

Thanks for the review, Steve.

@barneygale barneygale enabled auto-merge (squash) February 26, 2024 00:16
@barneygale barneygale merged commit e921f09 into python:main Feb 26, 2024
22 checks passed
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this pull request Mar 4, 2024
…on#114030)

Explain the `full_match()` / `glob()` / `rglob()` pattern language in its own section. Move `rglob()` documentation under `glob()` and reduce duplicated text.
adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 25, 2024
…on#114030)

Explain the `full_match()` / `glob()` / `rglob()` pattern language in its own section. Move `rglob()` documentation under `glob()` and reduce duplicated text.
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024
…on#114030)

Explain the `full_match()` / `glob()` / `rglob()` pattern language in its own section. Move `rglob()` documentation under `glob()` and reduce duplicated text.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir skip news topic-pathlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants