smarter, faster, more memory-efficient, lazy pivot #2374
Labels: enhancement, performance, polars
Even though we load the CSV in lazy mode (primarily to facilitate date inferencing with --try-parsedates and --infer-len), pivotp currently uses Polars' eager mode to do the pivot.
https://docs.pola.rs/user-guide/concepts/lazy-api/
https://github.com/pola-rs/polars/blob/9ea5839c52bf0606aaa0b174d9a974992e0ea328/crates/polars-ops/src/frame/pivot/mod.rs#L119-L153
Since we already use the stats cache for pivot validation and smart aggregation, we should also use it to infer a Polars schema, and then do the pivot with the Lazy API's group by method instead of the eager pivot.
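To illustrate the idea, here is a minimal sketch in plain Python (standard library only, not Polars and not the pivotp code). It assumes the distinct values of the pivot column are known up front, which is the role the stats cache would play, so the output columns are fixed before any row is read and the pivot reduces to a single-pass group by. The function name `pivot_by_group`, the sample rows, the choice of sum as the aggregation, and the zero fill for missing combinations are all assumptions made for illustration.

```python
def pivot_by_group(rows, index, on, values, on_values):
    """Pivot `rows` by grouping on `index` and summing `values` per `on` value.

    `on_values` stands in for the distinct values a stats cache could supply
    (hypothetical here); it fixes the output columns before reading any data.
    """
    wanted = set(on_values)
    groups = {}  # index value -> {on value: running sum}
    for row in rows:  # single pass; `rows` can be any iterator
        if row[on] not in wanted:
            continue  # value outside the known schema, skipped in this sketch
        # Create the accumulator for a new group with every column set to 0.
        totals = groups.setdefault(row[index], dict.fromkeys(on_values, 0))
        totals[row[on]] += row[values]
    # One output record per group, in first-seen order.
    return [{index: key, **totals} for key, totals in groups.items()]


rows = [
    {"region": "east", "product": "apples", "sales": 10},
    {"region": "east", "product": "pears", "sales": 5},
    {"region": "west", "product": "apples", "sales": 7},
    {"region": "east", "product": "apples", "sales": 3},
]
result = pivot_by_group(iter(rows), "region", "product", "sales", ["apples", "pears"])
for record in result:
    print(record)
```

Because the column set is known in advance, nothing has to be materialized beyond one accumulator per group. The Polars analogue would be a lazy group by with one filtered aggregation per known value, though the exact expressions pivotp would need are not specified in this issue.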