read_csv with s3 glob pattern fails with TypeError: Object does not have a .read() method. #20827

Open
2 tasks done
laurentS opened this issue Jan 21, 2025 · 0 comments
Labels: bug, needs triage, python

Comments

@laurentS
Contributor

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

# Credentials and the bucket/key `prefix` are elided here.
storage_options = {...}

df = pl.read_csv(
    [f"s3://{prefix}/my_data/**"],
    storage_options=storage_options,
)

This fails with TypeError: Object does not have a .read() method.

Log output

Traceback (most recent call last):
  File "./main.py", line 51, in <module>
    load_files()
  File "./main.py", line 30, in load_files
    df = pl.read_csv(
         ^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/polars/_utils/deprecation.py", line 92, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/polars/_utils/deprecation.py", line 92, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/polars/_utils/deprecation.py", line 92, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/polars/io/csv/functions.py", line 534, in read_csv
    df = _read_csv_impl(
         ^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/polars/io/csv/functions.py", line 682, in _read_csv_impl
    pydf = PyDataFrame.read_csv(

Issue description

I am aware that read_csv does not officially accept a list argument, but I tried it because fsspec allows it. The list reaches prepare_file_arg, which returns an fsspec OpenFiles object. That object is then passed to _read_csv_impl, which fails because it cannot handle a list.

On the other hand, I can write pl.read_csv('/some/local/path/**') and it reads all the files in that folder.

Expected behavior

I would expect the remote s3 read to work the same way, especially as the underlying fsspec and s3fs allow it.
Whether it works through a glob like s3://some/path/** or a list like above doesn't really matter, though I think the latter is easier, as fsspec knows how to deal with it.

Installed versions

--------Version info---------
Polars:              1.19.0
Index type:          UInt32
Platform:            Linux-6.12.8-amd64-x86_64-with-glibc2.40
Python:              3.12.8 (main, Jan 11 2025, 09:42:09) [GCC 14.2.0]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
azure.identity       <not installed>
boto3                1.36.1
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2024.12.0
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           <not installed>
nest_asyncio         <not installed>
numpy                <not installed>
openpyxl             <not installed>
pandas               <not installed>
pyarrow              <not installed>
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           2.0.37
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>