Feature: provide (safe) traversal/extraction facilities for zipfile.Path #123727

Open
jaraco opened this issue Sep 5, 2024 · 4 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@jaraco
Member

jaraco commented Sep 5, 2024

zipfile.Path could provide its own traversal that could offer some safety checks.

Something like that might be nice. Though perhaps it could be more generic instead of being part of zipfile.Path, even if zipfile.Path would then perhaps provide an extraction API using that?

Originally posted by @obfusk in #123270 (comment)

@jaraco jaraco added the stdlib Python modules in the Lib dir label Sep 5, 2024
@jaraco jaraco changed the title Feature: provide (safe) traversal facilities for zipfile.Path Feature: provide (safe) traversal/extraction facilities for zipfile.Path Sep 5, 2024
@jaraco
Member Author

jaraco commented Sep 5, 2024

As mentioned in that issue, there does already exist one extraction API in importlib.resources.as_file, although that implementation is written to create a temporary context. It also has certain limitations, such as it can extract a single file or directory, but not a set of files in a directory, and there's no facility to filter content in subdirectories.

One thing to be careful about when extracting is that using mkdir(exist_ok=True) could lead to traversal outside the target directory (e.g. if the zip file contains ../../../etc/passwd).

Before we embark on any implementation, let's first capture what are the motivations, use-cases, and requirements for such a feature? Who would use it and how?

@picnixz picnixz added the type-feature A feature request or enhancement label Sep 5, 2024
@rruuaanng
Contributor

For example, returning a list after calling, right?

@jaraco
Member Author

jaraco commented Sep 7, 2024

Perhaps. We're looking for more complete user stories. For example:

  • A user has a zipfile.Path to a subdirectory of some zip file in memory that they downloaded from an online source. They wish to extract the contents of that directory to an existing directory on the file system. The existing directory might also contain other contents, which should be merged with the extracted contents. Extracted files should overwrite any existing files, but any existing subdirectories should be merged. The user does not expect any file names to overlap with directory names (such a case would be an error). At the end, the user wants a list of pathlib.Path objects for all files created and directories touched during the extraction.

That's a contrived user story as an illustration. What I want are real user stories from users who have legitimate use-cases that aren't met by the current zipfile.Path and importlib.resources.as_file functionality.
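
The contrived user story above, combined with the earlier caution about traversal outside the target directory, as an added sketch (not part of the thread and not CPython code). `sample_zip`, `resolve_inside` and `extract_tree` are hypothetical names, the archive is constructed in memory, and only documented `zipfile.Path` methods are used.

```python
import io
import pathlib
import zipfile


def sample_zip() -> zipfile.ZipFile:
    """Constructed in-memory archive standing in for a downloaded one."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("pkg/data/a.txt", "alpha")
        zf.writestr("pkg/data/sub/b.txt", "beta")
        zf.writestr("pkg/other.txt", "outside the extracted subtree")
    return zipfile.ZipFile(buf)


def resolve_inside(dest: pathlib.Path, parts: tuple) -> pathlib.Path:
    """Join parts under dest; refuse anything that resolves outside dest."""
    root = dest.resolve()
    # resolve() collapses ".." and follows symlinks already present in dest
    target = root.joinpath(*parts).resolve()
    if root not in target.parents:
        raise ValueError(f"entry escapes target directory: {'/'.join(parts)}")
    return target


def extract_tree(src: zipfile.Path, dest: pathlib.Path, _parts=()) -> list:
    """Merge the subtree at src into the existing directory dest.

    Returns every file written and every directory touched.
    """
    touched = []
    for entry in src.iterdir():
        parts = _parts + (entry.name,)
        target = resolve_inside(dest, parts)  # check before any mkdir or write
        if entry.is_dir():
            if target.exists() and not target.is_dir():
                raise FileExistsError(f"{target} exists and is not a directory")
            target.mkdir(exist_ok=True)  # merge with an existing directory
            touched.append(target)
            touched += extract_tree(entry, dest, parts)
        else:
            if target.is_dir():
                raise IsADirectoryError(f"{target} is an existing directory")
            target.write_bytes(entry.read_bytes())  # overwrite an existing file
            touched.append(target)
    return touched
```

Called as `extract_tree(zipfile.Path(sample_zip(), "pkg/data/"), dest)`, it returns resolved paths, so compare them against `dest.resolve()`.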

@obfusk
Contributor

obfusk commented Sep 7, 2024

Being able to combine .glob() and extract seems potentially useful to me.

FYI: I just noticed .glob() does not seem to be documented.
