There is a bug with the new version of boto used by s3fs that prevents writes to non-AWS S3 buckets #1546

Open
ryanovas opened this issue Jan 21, 2025 · 2 comments

Comments

ryanovas commented Jan 21, 2025

Apache Iceberg version

0.8.1 (latest release)

Please describe the bug 🐞

Here is a link to the relevant boto issue: boto/boto3#4398

Calling `table.append` or `table.overwrite` with the botocore 1.36.x release that s3fs installs by default fails with a very confusing error: `botocore.exceptions.ClientError: An error occurred (MissingContentLength) when calling the PutObject operation: None`

The workaround is to manually downgrade botocore to 1.35.99.
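To see whether an environment falls in the affected range before running `table.append`, a small version check can help. This is a minimal sketch, not part of PyIceberg or s3fs. It assumes, based only on this report, that botocore 1.36.0 and newer trigger the `MissingContentLength` error against non-AWS S3 stores while 1.35.99 does not. The helper names `is_affected_botocore` and `check_installed_botocore` are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption from this report: botocore 1.36.0+ fails against some
# S3-compatible stores, while 1.35.99 works.
FIRST_AFFECTED = (1, 36)


def is_affected_botocore(ver: str) -> bool:
    """Return True if a botocore version string is 1.36 or newer."""
    match = re.match(r"(\d+)\.(\d+)", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    # Compare only (major, minor); every 1.36.x patch release counts as affected.
    major_minor = (int(match.group(1)), int(match.group(2)))
    return major_minor >= FIRST_AFFECTED


def check_installed_botocore() -> None:
    """Print whether the installed botocore falls in the affected range."""
    try:
        installed = version("botocore")
    except PackageNotFoundError:
        print("botocore is not installed")
        return
    if is_affected_botocore(installed):
        print(f"botocore {installed} may fail with MissingContentLength on "
              "non-AWS S3; the reported workaround is botocore 1.35.99")
    else:
        print(f"botocore {installed} predates the 1.36 release")


if __name__ == "__main__":
    check_installed_botocore()
```

Note that s3fs may constrain which botocore versions can be installed alongside it, so a downgrade may also require adjusting the s3fs version.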

I'm unsure whether anything needs to change on the PyIceberg side to resolve this, but I'm posting this issue for others who, like me, are googling this error and struggling to find answers.

I might also suggest that PyIceberg consider a library that doesn't rely on the AWS-controlled boto and is more S3-agnostic, something like s2cmd, for those of us using providers such as DigitalOcean, MinIO, Backblaze, etc.

Fokko (Contributor) commented Jan 21, 2025

Thanks for raising awareness here @ryanovas. It looks like the lock file pins boto3 at 1.36.1:

iceberg-python/poetry.lock, lines 406 to 415 at commit c84dd8d:

```toml
[[package]]
name = "boto3"
version = "1.36.1"
description = "The AWS SDK for Python"
optional = false
python-versions = ">=3.8"
files = [
    {file = "boto3-1.36.1-py3-none-any.whl", hash = "sha256:eb21380d73fec6645439c0d802210f72a0cdb3295b02953f246ff53f512faa8f"},
    {file = "boto3-1.36.1.tar.gz", hash = "sha256:258ab77225a81d3cf3029c9afe9920cd9dec317689dfadec6f6f0a23130bb60a"},
]
```

This is interesting, since we test against MinIO and it didn't show up there. Do you know how to replicate this?

ryanovas (Author) commented

In my case, I was using a DigitalOcean Spaces bucket with Lakekeeper and followed the Getting Started guide from the docs with the taxi data. When I reached the append step, I started getting the errors.
