-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: introduce feature flags to select major arrow versions #654
base: main
Are you sure you want to change the base?
Conversation
This change introduces arrow_53 and arrow_54 feature flags on kernel which are _required_ when using default-engine or sync-engine. Fundamentally we must push users of the crate to select their arrow major version through flags since Cargo _will_ include multiple major versions in the dependency tree which can cause ABI breakages when passing around symbols such as `RecordBatch` See delta-io#640 Signed-off-by: R. Tyler Croy <[email protected]>
87f2330
to
212b392
Compare
@nicklan @zachschuermann A lot of the ❌ are due to the use of |
Because the arrow_53 and arrow_54 flags are mutually exclusive, we can no longer rely on the --all-features flag 😭 Signed-off-by: R. Tyler Croy <[email protected]>
# Arrow supported versions | ||
## 53 | ||
# Used in default engine | ||
arrow-buffer = { workspace = true, optional = true } | ||
arrow-array = { workspace = true, optional = true, features = ["chrono-tz"] } | ||
arrow-select = { workspace = true, optional = true } | ||
arrow-arith = { workspace = true, optional = true } | ||
arrow-cast = { workspace = true, optional = true } | ||
arrow-json = { workspace = true, optional = true } | ||
arrow-ord = { workspace = true, optional = true } | ||
arrow-schema = { workspace = true, optional = true } | ||
arrow_53 = { package = "arrow", version = "53", features = ["chrono-tz", "json", "prettyprint"], optional = true } | ||
# Used in default and sync engine | ||
parquet_53 = { package = "parquet", version = "53", features = ["async", "object_store"] , optional = true } | ||
###### | ||
## 54 | ||
arrow_54 = { package = "arrow", version = "54", features = ["chrono-tz", "json", "prettyprint"], optional = true } | ||
parquet_54 = { package = "parquet", version = "54", features = ["async", "object_store"] , optional = true } | ||
###### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nicklan @zachschuermann A lot of the ❌ are due to the use of
--all-features
in the GitHub Actions invocations. Unfortunately this will need to be worked around since there are now mutually exclusive features present 😞
Crazy idea... what if we defined the features as lower bounds instead of equality matches? So if e.g. arrow_min_54
feature is requested, that forces the version to be at least arrow-54; selecting arrow_min_53
would not cause any problems in that case, because both conditions are satisfied by choosing arrow-54. Tho maybe it would serve our purposes better to define arrow_max_53
and arrow_max_54
instead (which, if both are requested, chooses arrow-53)? Then --all-features
would still compile and if somebody only specified one of those it would have the desired effect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice idea, but I don't think cargo can support it. That is, you do have to specify something like:
arrow_53 = { package = "arrow", version = "53", features = ["chrono-tz", "json", "prettyprint"], optional = true }
arrow_54 = { package = "arrow", version = "54", features = ["chrono-tz", "json", "prettyprint"], optional = true }
Somewhere to specify the optional dependencies. So now if we want a feature like arrow_max_54
, we need to have that depend on.... something. What exactly it would depend on would depend on what other flags you had enabled, but afaik cargo does not support logic in the toml so you can't unify the flags and pick the right dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two parts to this trick:
- cargo.toml brings in arrow@53 as arrow_53 etc. This effectively bifurcates the
arrow
crate into multiple apparent crates, and would remain unchanged. - the extern crate trickery that decides which of the available versions to map back to
arrow
crate name. This is the part that might use the lower/upper bound concept.
But in retrospect, you're probably right... we can't use the original names to make decisions for the extern crate decl, and if we want e.g. arrow_max_54
to potentially point to arrow-53, then we'd have to bring both arrow-54 and arrow-53 as dependencies, which at last partly defeats the purpose of splitting them out.
Maybe it still works, tho? Technically both arrow versions would be dependencies, but only the one mapped in as an extern crate would actually get used by any kernel code?
(ugh, cargo... this shouldn't be so difficult!)
delta_kernel::arrow::compute::cast( | ||
&col, | ||
&DataType::Timestamp(*unit, Some("UTC".into())), | ||
) | ||
.expect("Could not cast to UTC") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
delta_kernel::arrow::compute::cast( | |
&col, | |
&DataType::Timestamp(*unit, Some("UTC".into())), | |
) | |
.expect("Could not cast to UTC") | |
let data_type = DataType::Timestamp(*unit, Some("UTC".into())); | |
delta_kernel::arrow::compute::cast(&col, &data_type) | |
.expect("Could not cast to UTC") |
pub mod arrow; | ||
pub mod parquet; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might be able to make this cleaner with some extern crate trickery:
pub mod arrow; | |
pub mod parquet; | |
#[cfg(feature = "arrow_53")] | |
macro_rules! declare_extern_arrow_and_parquet { | |
() => { | |
pub extern crate arrow_53 as arrow; | |
pub extern crate parquet_53 as parquet; | |
}; | |
} | |
#[cfg(feature = "arrow_54")] | |
macro_rules! declare_extern_arrow_and_parquet { | |
() => { | |
pub extern crate arrow_54 as arrow; | |
pub extern crate parquet_54 as parquet; | |
}; | |
} | |
declare_extern_arrow_and_parquet!(); |
At least, it seemed to have the desired effect when I tested a bit locally.
The macro is required because apparently extern crate
is immune to conditional compilation and you get double-import compilation failures trying to do it directly... but a conditionally-defined macro that defines the extern crates works fine. 🤷
We probably need to pub export the macro so that the other kernel crates can use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh that seems like a fun trick - I guess could omit our arrow/parquet modules with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
everything looks reasonable :) I do wonder if we could use some of @scovich's tricks to avoid mutually-exclusive features? Though it does seem like this PR follows a somewhat typical approach of allowing major version selection of dependencies...
@@ -20,24 +20,9 @@ license = "Apache-2.0" | |||
repository = "https://github.com/delta-io/delta-kernel-rs" | |||
readme = "README.md" | |||
rust-version = "1.80" | |||
version = "0.6.1" | |||
version = "0.7.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually do the updates to version all at once during a release
version = "0.7.0" | |
version = "0.6.1" |
@@ -24,7 +24,7 @@ url = "2" | |||
delta_kernel = { path = "../kernel", default-features = false, features = [ | |||
"developer-visibility", | |||
] } | |||
delta_kernel_ffi_macros = { path = "../ffi-proc-macros", version = "0.6.1" } | |||
delta_kernel_ffi_macros = { path = "../ffi-proc-macros", version = "0.7.0" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delta_kernel_ffi_macros = { path = "../ffi-proc-macros", version = "0.7.0" } | |
delta_kernel_ffi_macros = { path = "../ffi-proc-macros", version = "0.6.1" } |
@@ -39,27 +39,29 @@ uuid = "1.10.0" | |||
z85 = "3.0.5" | |||
|
|||
# bring in our derive macros | |||
delta_kernel_derive = { path = "../derive-macros", version = "0.6.1" } | |||
delta_kernel_derive = { path = "../derive-macros", version = "0.7.0" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delta_kernel_derive = { path = "../derive-macros", version = "0.7.0" } | |
delta_kernel_derive = { path = "../derive-macros", version = "0.6.1" } |
//! This module exists to help re-export the version of arrow used by default-gengine and other | ||
//! parts of kernel that need arrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//! This module exists to help re-export the version of arrow used by default-gengine and other | |
//! parts of kernel that need arrow | |
//! This module exists to re-export the version of arrow used by default-engine and other | |
//! parts of kernel that need arrow |
pub mod arrow; | ||
pub mod parquet; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh that seems like a fun trick - I guess could omit our arrow/parquet modules with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! A couple of small things, and we should be sure ryan's idea doesn't work, but otherwise looks great.
//! This module exists to help re-export the version of arrow used by default-gengine and other | ||
//! parts of kernel that need arrow | ||
#[cfg(all(feature = "arrow_53", feature = "arrow_54"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oof, this #cfg
is gonna get ugly as the number of supported arrow versions goes up. we can probably make it a little better by not using all
in that case, and having multiple checks, but it's still not gonna be great. I don't think there's much better than that though sadly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the plan was to only support 2-3 versions max at a time? Tho even 3 already makes this pretty ugly... and agree we'd need to split it into multiple checks, e.g. all(v1, v2); all(v1, v3); all(v2, v3).
//! This module exists to help re-export the version of arrow used by default-gengine and other | ||
//! parts of kernel that need arrow | ||
|
||
#[cfg(all(feature = "arrow_53", feature = "arrow_54"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should be the parquet flags:
#[cfg(all(feature = "arrow_53", feature = "arrow_54"))] | |
#[cfg(all(feature = "parquet_53", feature = "parquet_54"))] |
same below.
But that said, do you think we ever want say arrow_53
and parquet_54
? I thought they always co-versioned. Could we just have the arrow feature flags and pull the appropriate parquet crate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point... would it even be safe to use arrow-54 and parquet-53 or vice-versa? Seems like we'd quickly run into ABI compat issues?
This change introduces arrow_53 and arrow_54 feature flags on kernel which are required when using default-engine or sync-engine. Fundamentally we must push users of the crate to select their arrow major version through flags since Cargo will include multiple major versions in the dependency tree which can cause ABI breakages when passing around symbols such as
RecordBatch
See #640