scan_parquet panics when the file is bigger than 2^32 but the materialized query isn't; duckdb and pyarrow can run the query #20777
Labels
A-io (Area: reading and writing data), A-io-cloud (Area: reading/writing to cloud storage), A-io-parquet (Area: reading/writing Parquet files), A-panic (Area: code that results in panic exceptions), bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars), rust (Related to Rust Polars)
Reproducible example
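The query is roughly the following; the path and the filter column are placeholders for the real ones (the real file is larger than 2^32 bytes, while the filtered result is small):

```rust
use polars::prelude::*;

fn main() -> PolarsResult<()> {
    // Placeholder path and filter: the real file is > 2^32 bytes,
    // but the rows matching the predicate easily fit in memory.
    let args = ScanArgsParquet::default();
    let df = LazyFrame::scan_parquet("data/big_file.parquet", args)?
        .filter(col("id").eq(lit(42i64)))
        .collect()?;
    println!("{df}");
    Ok(())
}
```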
Same error with python, of course.
Log output
Issue description
I ran it again with the parallel strategy overridden, but the log still says
parquet scan with parallel = RowGroups
The same is true when I set ParallelStrategy::None.
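Setting that looks roughly like this (placeholder path; the only thing being changed is the parallel field on the scan args):

```rust
use polars::prelude::*;

fn scan_without_row_group_parallelism() -> PolarsResult<LazyFrame> {
    // Explicitly turn off parallel reading instead of leaving it on Auto.
    // The path is a placeholder for the real > 2^32-byte file.
    let args = ScanArgsParquet {
        parallel: ParallelStrategy::None,
        ..Default::default()
    };
    LazyFrame::scan_parquet("data/big_file.parquet", args)
}
```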
Expected behavior
The row group is well under 2^32 bytes, so it should be possible to materialize it. A pyarrow dataset filter and DuckDB can each read that row group just fine.
I can sort of read a row group with a partial scan (sketched below), but I get this weird panic after it prints the df.
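A sketch of that kind of partial read; the slice bounds and path are placeholders, not the real row-group offsets:

```rust
use polars::prelude::*;

fn read_one_row_group_slice() -> PolarsResult<DataFrame> {
    // Materialize a bounded slice of rows instead of the whole file.
    // The offset/length stand in for one row group's bounds.
    let args = ScanArgsParquet::default();
    LazyFrame::scan_parquet("data/big_file.parquet", args)?
        .slice(1_000_000, 100_000)
        .collect()
}
```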
It's weird because the left/right numbers change every time I run this. I compiled with release and didn't get the tokio panic.
Installed versions
polars = { version = "0.45.1", features = ["json", "temporal", "timezones", "dtype-datetime", "strings", "dtype-date", "lazy", "parquet", "simd", "performant", "azure", "dtype-u8", "offset_by", "streaming", "partition_by", "is_in"] }
polars-core = { version = "0.45.1" }
polars-io = { version = "0.45.1" }
polars-plan = { version = "0.45.1" }