Upgrade huggingface_hub to v0.27.x in dataset initializer v2 #2379

Merged on Jan 10, 2025 (1 commit)

Conversation

astefanutti
Contributor

What this PR does / why we need it:

This upgrades huggingface_hub in the dataset initializer container image to v0.27.1, which includes huggingface/huggingface_hub#2333, the fix for #2378.

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):

Fixes #2378

Checklist:

  • Docs included if any changes are user facing

@coveralls

coveralls commented Jan 7, 2025

Pull Request Test Coverage Report for Build 12675989950

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals Coverage Status
Change from base Build 12656204551: 0.0%
Covered Lines: 85
Relevant Lines: 85

💛 - Coveralls

Member

@andreyvelich andreyvelich left a comment

Thank you for this fix @astefanutti!

@@ -1 +1 @@
-huggingface_hub==0.23.4
+huggingface_hub==0.27.1
Member

This bug is similar to: #2367

@astefanutti @Electronic-Waste @kubeflow/wg-training-leads @helenxie-bit @akshaychitneni @shravan-achar @seanlaii @deepanker13 @saileshd1402 what do we think about relaxing the huggingface_hub dependency for the initializer in V2?
Since we don't do serialization between the client and server, I am not sure we need a strict version for huggingface_hub.

I can see that datasets are using huggingface-hub>=0.24.0

Contributor Author

@astefanutti astefanutti Jan 8, 2025

Un-constraining the version would certainly work. That being said, non-reproducible builds can sometimes make regressions difficult to track down, because upgrades are pulled in implicitly. So, without going as far as locking the dependency tree with Pipfile.lock for example, I find constraining the versions in the requirements a good compromise.
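
To illustrate the traceability concern, here is a minimal sketch, not part of this PR, of reporting which version was actually resolved at container startup, so that an implicit upgrade shows up in the logs. The helper name `resolved_version` is hypothetical; `importlib.metadata` is the Python standard library.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def resolved_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Printing the resolved version makes an implicit upgrade visible in the logs.
print("huggingface_hub:", resolved_version("huggingface_hub"))
```

With a range constraint rather than an exact pin, a line like this in the logs is one way to tell two image builds apart when a regression appears.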

Member

@andreyvelich andreyvelich Jan 8, 2025

Do we know if the HuggingFace community cherry-picks fixes to the previous minor releases of huggingface_hub?
If yes, we can always do: huggingface-hub>=0.27.0,<0.28.

Contributor Author

It doesn't seem like fixes are cherry-picked in Z stream, but the fix for that particular issue is already present in previous minor releases, so we can use huggingface-hub>=0.27.0,<0.28 in any case and get patches.
I've updated it according to your suggestion.
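
For illustration, a minimal sketch of which versions the range huggingface-hub>=0.27.0,<0.28 admits. It handles only plain numeric release strings (no pre-releases or other PEP 440 forms), so it is a simplification of what an installer actually does. The helper names `parse_release` and `in_range` are made up for this example, and the candidate strings are sample inputs rather than a list of actual releases.

```python
def parse_release(version: str) -> tuple:
    """Turn a plain release string such as "0.27.1" into (0, 27, 1)."""
    parts = [int(part) for part in version.split(".")]
    # Pad short versions so that "0.28" compares equal to "0.28.0".
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def in_range(version: str, lower: str = "0.27.0", upper: str = "0.28") -> bool:
    """Return True when lower <= version < upper (simplified comparison)."""
    return parse_release(lower) <= parse_release(version) < parse_release(upper)


# Sample inputs only: the old pin, the floor used by datasets, and the new range.
for candidate in ["0.23.4", "0.24.0", "0.27.0", "0.27.1", "0.28.0"]:
    print(candidate, in_range(candidate))
```

The range keeps the minor version fixed while still allowing patch releases to be picked up on rebuild.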

Member

Yeah, let's try to use this for now.

@astefanutti astefanutti changed the title from "Upgrade huggingface_hub to v0.27.1 in dataset initializer v2" to "Upgrade huggingface_hub to v0.27.x in dataset initializer v2" on Jan 8, 2025
Member

@andreyvelich andreyvelich left a comment

Thanks for the fix @astefanutti!
/lgtm
/approve

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

@google-oss-prow google-oss-prow bot merged commit 1dfa40c into kubeflow:master Jan 10, 2025
55 checks passed
Development

Successfully merging this pull request may close these issues.

HF dataset initializer v2 fails with KeyError: 'tags' when downloading datasets with no tags
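
The class of failure named in the issue title can be sketched as follows. This is an illustrative example only, not the huggingface_hub code or the upstream fix: the function names and the metadata dictionary are hypothetical, and the point is simply that indexing a missing key raises `KeyError` while a lookup with a default does not.

```python
def read_tags_strict(payload: dict) -> list:
    # Direct indexing raises KeyError when the "tags" key is absent.
    return payload["tags"]


def read_tags_tolerant(payload: dict) -> list:
    # Falling back to an empty list treats "no tags" as a valid case.
    return payload.get("tags", [])


# Hypothetical dataset metadata that carries no "tags" key.
untagged = {"id": "example/dataset"}

try:
    read_tags_strict(untagged)
except KeyError as err:
    print(f"KeyError: {err}")

print(read_tags_tolerant(untagged))
```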
3 participants