Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: [comet-parquet-exec] Unit test fixes, default scan impl to native_comet #1265

Merged
merged 19 commits into from
Jan 13, 2025

Conversation

parthchandra
Copy link
Contributor

Notable changes:

  1. There are three scan implementations:
Name Description Operator Underlying implementation
native_comet original, supports primitive types (default) CometScan Comet
native_datafusion POC1, supports (or will support) complex types CometNativeScan DataFusion
native_iceberg_compat POC2, supports (or will support) complex types, exposes API for Iceberg Comet Scan DataFusion

The scan implementation can be selected by setting the conf spark.comet.scan.impl or by setting the environment variable COMET_PARQUET_SCAN_IMPL

  1. Plan compatibility suites generate a different plan based on the implementation. As a result, we now have three sets of expected plans based on the scan implementation chosen

  2. We now use the Spark Session timezone instead of UTC while reading timestamp fields. This is so that we can compare them with literal timestamps (Spark apparently automatically applies the session timezone to those)

@parthchandra
Copy link
Contributor Author

@andygrove @mbutrovich

@andygrove andygrove changed the title Unit test fixes, default scan impl to comet_native [comet-parquet-exec] Unit test fixes, default scan impl to comet_native Jan 10, 2025
@andygrove
Copy link
Member

@parthchandra could you run cargo fmt and cargo clippy

@andygrove andygrove changed the title [comet-parquet-exec] Unit test fixes, default scan impl to comet_native chore: [comet-parquet-exec] Unit test fixes, default scan impl to comet_native Jan 10, 2025
@andygrove
Copy link
Member

@parthchandra could you run cargo fmt and cargo clippy

and make format .. that is if we want to see tests passing on this PR, which I think they should now? Or do we want to wait until we PR comet-parquet-exec against main before trying to get CI green?

@parthchandra
Copy link
Contributor Author

@parthchandra could you run cargo fmt and cargo clippy

and make format .. that is if we want to see tests passing on this PR, which I think they should now? Or do we want to wait until we PR comet-parquet-exec against main before trying to get CI green?

Fixed the build. Yes, we should get the CI to turn green with this PR before we attempt any more changes to the feature branch

@parthchandra
Copy link
Contributor Author

Updated the plns for Spark 3.5 and Spark 4.0. However plan generation for the native_datafusion impl is failing which will not affect the ci, but which needs to be addressed at some point.

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @parthchandra

@parthchandra
Copy link
Contributor Author

Added the plans for Spark 4.0 for the NATIVE_DATAFUSION and NATIVE_ICEBERG_COMPAT scans

@parthchandra parthchandra changed the title chore: [comet-parquet-exec] Unit test fixes, default scan impl to comet_native chore: [comet-parquet-exec] Unit test fixes, default scan impl to native_comet Jan 12, 2025
@andygrove andygrove merged commit b49c17b into apache:comet-parquet-exec Jan 13, 2025
75 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants