Discussion: make DataFile Serializable && Deserializable #774
Comments
Hey @ZENOTME thanks for raising this. Technically the |
Thanks @Fokko. I think this solution works, but we should expose _serde::DataFile so that users can use this serializable representation. I created PR #797 for this.
Hi, thank you @ZENOTME for starting this discussion. I prefer to make |
To make Also I think contain type info in DataFile can also help #777.
There are two choices: store only the partition type info in DataFile, or store the whole struct type of DataFile. I prefer storing the whole struct type because it gives better compatibility, e.g. if DataFile gains more struct-typed fields in the future.
Thanks @ZENOTME for raising this. In your question, the typical use case is compute engine, which needs to serialized |
For the read path, compute engines need to serialize/deserialize the planned task. For the write path, compute engines need to pass the data file. #797 implements solution 1.
Should we close this now? |
Yes, I think #797 is enough now. |
Context
Making DataFile Serializable && Deserializable is useful. For example, a distributed compute engine creates multiple writers on multiple machines, writes the data in parallel, and gets DataFiles as the results. These DataFiles are then sent to a coordinator and appended using a transaction. In this case, DataFile needs to be Serializable && Deserializable.
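The flow described above can be sketched with threads standing in for machines. This is only an illustration under stated assumptions: the `DataFile` struct here is a hypothetical three-field stand-in (not the real iceberg-rust type), the tab-separated `to_wire`/`from_wire` format is a toy replacement for a real serde format, and `run_distributed_write` is a made-up helper name.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical, heavily simplified stand-in for iceberg's `DataFile`.
#[derive(Debug, Clone, PartialEq)]
struct DataFile {
    file_path: String,
    record_count: u64,
    file_size_in_bytes: u64,
}

impl DataFile {
    /// Toy wire format (assumption): three tab-separated fields.
    /// A real engine would use a serde-based format instead.
    fn to_wire(&self) -> String {
        format!(
            "{}\t{}\t{}",
            self.file_path, self.record_count, self.file_size_in_bytes
        )
    }

    /// Parses the toy wire format; returns `None` on malformed input.
    fn from_wire(s: &str) -> Option<DataFile> {
        let mut parts = s.split('\t');
        let file_path = parts.next()?.to_string();
        let record_count = parts.next()?.parse().ok()?;
        let file_size_in_bytes = parts.next()?.parse().ok()?;
        if parts.next().is_some() {
            return None; // trailing fields are not expected
        }
        Some(DataFile { file_path, record_count, file_size_in_bytes })
    }
}

/// Each worker thread "writes" one file and sends the encoded result to the
/// coordinator, which decodes everything it receives and sorts by path.
fn run_distributed_write(workers: u64) -> Vec<DataFile> {
    let (tx, rx) = mpsc::channel::<String>();
    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let tx = tx.clone();
            thread::spawn(move || {
                let file = DataFile {
                    file_path: format!("s3://bucket/data/part-{}.parquet", id),
                    record_count: 100 * (id + 1),
                    file_size_in_bytes: 4096,
                };
                tx.send(file.to_wire()).expect("coordinator hung up");
            })
        })
        .collect();
    drop(tx); // close the channel once all worker clones are dropped
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    let mut files: Vec<DataFile> = rx
        .iter()
        .filter_map(|msg| DataFile::from_wire(&msg))
        .collect();
    files.sort_by(|a, b| a.file_path.cmp(&b.file_path));
    files
}
```

In a real engine the coordinator would then append the decoded files in a single transaction; that step is omitted here because it depends on the table API.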
Solution
For now, serializing DataFile is supported in the _serde module: the DataFile must first be converted to _serde::DataFile. The interface looks like:
pub fn try_from(value: super::DataFile, partition_type: &StructType, is_version_1: bool) -> _serde::DataFile
More detail: iceberg-rust/crates/iceberg/src/spec/manifest.rs, line 1361 in 98cd34d
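To make the shape of this interface concrete, here is a minimal sketch of a conversion that needs external context (a partition type and a version flag) before a value can be serialized. All types below are hypothetical simplified stand-ins, not the real iceberg-rust definitions; the `Result` return type, the `format_version` field, and the arity check are assumptions made for illustration.

```rust
/// Hypothetical stand-in for the partition struct type: just field names.
#[derive(Debug, Clone, PartialEq)]
struct StructType {
    field_names: Vec<String>,
}

/// Hypothetical in-memory data file: partition values carry no type info.
#[derive(Debug, Clone, PartialEq)]
struct DataFile {
    file_path: String,
    partition: Vec<i64>,
}

/// Hypothetical serializable form: partition values paired with field names.
#[derive(Debug, Clone, PartialEq)]
struct SerdeDataFile {
    file_path: String,
    partition: Vec<(String, i64)>,
    format_version: u8,
}

impl SerdeDataFile {
    /// The caller must supply `partition_type` and `is_version_1` because the
    /// in-memory `DataFile` does not carry them itself.
    fn try_from(
        value: DataFile,
        partition_type: &StructType,
        is_version_1: bool,
    ) -> Result<SerdeDataFile, String> {
        if value.partition.len() != partition_type.field_names.len() {
            return Err(format!(
                "partition has {} values but the type has {} fields",
                value.partition.len(),
                partition_type.field_names.len()
            ));
        }
        // Attach a field name to each partition value.
        let partition = partition_type
            .field_names
            .iter()
            .cloned()
            .zip(value.partition)
            .collect();
        Ok(SerdeDataFile {
            file_path: value.file_path,
            partition,
            format_version: if is_version_1 { 1 } else { 2 },
        })
    }
}
```

The sketch shows why the conversion is awkward for a remote writer: it can only produce the serializable form if it also has the partition type and format version at hand.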
There is something we need to resolve before DataFile can be Serializable && Deserializable:
pub fn try_from(value: super::DataFile, partition_type: &StructType, is_version_1: bool) -> _serde::DataFile
To solve the above, I think there are two solutions:
I prefer solution 1 because it looks more natural. Different opinions and solutions are welcome. cc @liurenjie1024 @Fokko @Xuanwo @c-thiel