Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Base class for PathBase.stat() return value? #3

Open
barneygale opened this issue Dec 16, 2023 · 4 comments
Open

Base class for PathBase.stat() return value? #3

barneygale opened this issue Dec 16, 2023 · 4 comments

Comments

@barneygale
Copy link
Owner

Is it unreasonable or incorrect to expect users to instantiate an os.stat_result object? It's a little tricky.

@barneygale
Copy link
Owner Author

barneygale commented Jan 25, 2024

Proposal: add a status() method that returns a Status object, which would resemble an os.DirEntry and include methods like is_dir() and file_type().

PathBase.status() would be abstract, and PathBase.stat() would call self.status().as_stat_result().

Path.status() would call Status.from_stat_result(self.stat()), and Path.stat() would call os.stat() as it does now.

Users wouldn't need to deal with the low-level stat_result interface.

@ap--
Copy link

ap-- commented Jan 30, 2024

This sounds very interesting for universal_pathlib. Cross-referencing fsspec/universal_pathlib#145

In filesystem_spec this "non os.stat_result returning" method is provided by AbstractFileSystem.info() and its return value is a dictionary with a few standardised keys and unstructured optional information that a filesystem implementation can return.

Apache Airflow provides a wrapper class based on dict that takes this info() dict and provides some of os.stat_result's interface: https://github.com/apache/airflow/blob/8b5a1ff5e9b3ef6941ab932de64dd1c939fcc2fb/airflow/io/utils/stat.py#L22-L83

In internal code I have mostly seen .stat() being used to infer the size or the creation / modification time of files.

@barneygale
Copy link
Owner Author

barneygale commented Jan 30, 2024

Very interesting, thank you!

Out of interest, why not a dataclass rather than a dict with standardised keys? I was leaning towards a dataclass personally. Subclasses (e.g. S3Status) could add their own fields.

And do you think there's any sense in adding a PathBase.set_status/info() method? Other methods like PathBase.chmod() could be made to call it, supplying a Status(permissions=...) object. Users could call it with an S3Status object to set S3-specific metadata.

@ap--
Copy link

ap-- commented Feb 5, 2024

Out of interest, why not a dataclass rather than a dict with standardised keys? I was leaning towards a dataclass personally. Subclasses (e.g. S3Status) could add their own fields.

I'm not sure, but my guess would be that this was a pragmatic decision. fsspec.spec.AbstractFileSystem.info() returns a dict since fsspec's first release.
There is an open issue in the repo to standardise this across filesystems.

And do you think there's any sense in adding a PathBase.set_status/info() method?

Hmmm, interesting. I'll try to collect some examples for specific filesystems, and will report back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants