-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split metadata tables into separate modules #872
Conversation
10a3d48
to
a6017ce
Compare
a6017ce
to
e8cabed
Compare
e8cabed
to
be77803
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// - <https://iceberg.apache.org/docs/latest/spark-queries/#querying-with-sql> | ||
/// - <https://py.iceberg.apache.org/api/#inspecting-tables> | ||
#[derive(Debug)] | ||
pub struct MetadataTable(Table); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand why we need this data struct here. It seems just a wrapper to provide more api, while just like this:
impl Table {
pub fn snapshots(&self) -> SnapshotsTable {
...
}
pub fn manifests(&self) -> ManifestsTable {
...
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand why we need this data struct here. It seems just a wrapper to provide more api, while just like this:
Yes, I intend to make the API exposed at Table
more organized. For example, users will have:
table.metadata_table().snapshots();
table.metadata_table().manifests();
instead of:
// Could be confused with `table.metadata().snapshots()` which returns `Snapshot`.
table.snapshots();
// Verbose and long
table.metadata_snapshots_table();
table.metadata_manifests_table();
While I believe we could use better API names, such as table.inspect().snapshots()
, the overall structure looks good to me and aligns better with other implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metadata_table
to inspect
rename is here: #881
} | ||
|
||
/// Returns the schema of the snapshots table. | ||
pub fn schema(&self) -> Schema { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as #868 . But we could defer this to later issue.
} | ||
|
||
/// Scans the snapshots table. | ||
pub fn scan(&self) -> Result<RecordBatch> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as #870
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rshkv for this pr, LGTM!
I'll merge this first as it's a pure refactoring. |
Split metadata tables into separate modules.
Context for this is to address #863 (comment) where the point was made that
metadata_scan.rs
will grow unwieldy if we shove all metadata table implementations in there. Especially as we're going to add extra utilities for those metadata tables.The structure in this PR is:
In the future this can expand as described in #863 (comment).