When using `GCSFileSystem.glob` with a pattern like `"bucket-name/prefix*suffix"`, version 2023.9.0 introduced a performance regression. Previously, this glob was resolved with an efficient API call whose run time was proportional to the number of matching objects. Since 2023.9.0, the run time seems to scale with the total number of objects in the bucket. In my system, the buckets have a "flat" pseudo-folder structure with 1e5+ objects.
Perhaps the `prefix` argument is no longer being passed to the GCS backend (e.g. in `GCSFileSystem._list_objects`). I've been studying the differences between 2023.6.0 and 2023.9.0 in both this repo and `filesystem_spec`, but I haven't seen evidence that this change was explicit or intentional. The unit tests for `glob` seem to check functional behavior only, so they wouldn't catch a performance regression.
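To illustrate the suspected difference, here is a minimal pure-Python sketch. It does not call gcsfs or GCS: `literal_prefix`, `glob_with_prefix`, and `glob_without_prefix` are hypothetical helper names, the in-memory list stands in for a bucket listing, and the object names keep the bucket name in front purely for simplicity. It only shows why restricting the listing to the literal prefix makes the work proportional to the matching objects rather than to the whole bucket.

```python
from fnmatch import fnmatchcase


def literal_prefix(pattern: str) -> str:
    """Return the part of the pattern before the first glob wildcard."""
    for i, ch in enumerate(pattern):
        if ch in "*?[":
            return pattern[:i]
    return pattern


def glob_with_prefix(object_names, pattern):
    """Simulate a prefix-restricted listing, then match client-side.

    Returns (matches, number_of_objects_listed).
    """
    prefix = literal_prefix(pattern)
    # Stand-in for a server-side listing limited to the prefix (assumption).
    listed = [name for name in object_names if name.startswith(prefix)]
    matches = [name for name in listed if fnmatchcase(name, pattern)]
    return matches, len(listed)


def glob_without_prefix(object_names, pattern):
    """Simulate listing every object, then match client-side."""
    listed = list(object_names)
    matches = [name for name in listed if fnmatchcase(name, pattern)]
    return matches, len(listed)


# A "flat" bucket: many unrelated objects plus a few under the prefix.
names = [f"bucket-name/other{i}" for i in range(1000)] + [
    "bucket-name/prefix-a-suffix",
    "bucket-name/prefix-b-suffix",
    "bucket-name/prefix-c-other",
]
pattern = "bucket-name/prefix*suffix"

fast, fast_listed = glob_with_prefix(names, pattern)
slow, slow_listed = glob_without_prefix(names, pattern)
```

Both functions return the same matches, but the second one has to examine every object in the listing. That matches the symptom described above, although it is only a model of the hypothesis, not a confirmed diagnosis of the gcsfs code path.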
Debug output from 2023.6.0:
Debug output from 2023.9.0 (and more recent versions like 2024.6.0):