Make view metadata path configurable by write.metadata.path #12017

Merged: 4 commits, Jan 30, 2025

Contributor

@tomtongue tomtongue commented Jan 21, 2025

Overview

This change makes a view's metadata file location configurable through write.metadata.path, as is already possible for tables, and provides a more flexible way to configure the metadata file path for Iceberg views.

Background

Currently, a view's metadata file location is derived from the location of its database first. If the database has no location, the view's metadata location falls back to the warehouse location.

As a result, the only way to customize a view's metadata location today is to set the warehouse parameter. If the database already has a location configured in its properties, such as location: /db/table/path, you instead have to change the database location, which is not always easy.
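To illustrate the precedence this change is aiming for, here is a minimal sketch, not the actual Iceberg implementation. The class name `ViewMetadataPathSketch` and the helper `metadataDir` are hypothetical and exist only for illustration. The sketch assumes the view location has already been resolved (from the database location or the warehouse, as described above) and that the default is the view location plus /metadata. It does not normalize trailing slashes.

```java
import java.util.Map;

// Hypothetical illustration only; Iceberg's real code paths differ.
public class ViewMetadataPathSketch {
  // Property key as named in this PR.
  static final String WRITE_METADATA_PATH = "write.metadata.path";

  // Returns the directory where view metadata files would be written:
  // the custom path if the property is set, else <view location>/metadata.
  static String metadataDir(Map<String, String> properties, String viewLocation) {
    String custom = properties.get(WRITE_METADATA_PATH);
    if (custom != null && !custom.isEmpty()) {
      return custom;
    }
    return viewLocation + "/metadata";
  }

  public static void main(String[] args) {
    String viewLocation = "s3://bucket/path/database-location";

    // Property set: the custom path wins over the view location.
    System.out.println(
        metadataDir(
            Map.of(WRITE_METADATA_PATH, "s3://bucket/path/new-metadata-loc"), viewLocation));

    // Property absent: fall back to the default under the view location.
    System.out.println(metadataDir(Map.of(), viewLocation));
  }
}
```

The point of the sketch is that the property is consulted before any location-derived default, so the database location no longer has to change in order to move a view's metadata.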

Example

-- SparkSQL
CREATE VIEW db.view TBLPROPERTIES('write.metadata.path'='s3://bucket/path/new-metadata-loc') 
AS SELECT year, count(*) as cnt FROM db.tbl GROUP BY year

DESCRIBE EXTENDED db.view
+---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
|col_name                   |data_type                                                                                                                                                               |comment|
+---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
|year                       |int                                                                                                                                                                     |       |
|cnt                        |bigint                                                                                                                                                                  |       |
|                           |                                                                                                                                                                        |       |
|# Detailed View Information|                                                                                                                                                                        |       |
|Comment                    |                                                                                                                                                                        |       |
|View Catalog and Namespace |hive_catalog.db                                                                                                                                                         |       |
|View Query Output Columns  |[year, cnt]                                                                                                                                                             |       |
|View Properties            |['format-version' = '1', 'location' = 's3://bucket/path/database-location', 'provider' = 'iceberg', 'write.metadata.path' = 's3://bucket/path/new-metadata-loc']        |       |
|Created By                 |Spark 3.5.3-amzn-0                                                                                                                                                      |       |
+---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+

@tomtongue
Contributor Author

tomtongue commented Jan 28, 2025

@nastra I believe you implemented the relevant part. If possible, can I ask you to review this change?

@@ -28,6 +28,7 @@ Iceberg views support properties to configure view behavior. Below is an overvie
| Property | Default | Description |
|--------------------------------------------|---------|------------------------------------------------------------------------------------|
| write.metadata.compression-codec | gzip | Metadata compression codec: `none` or `gzip` |
| write.metadata.path | table location + /metadata | Base location for metadata files |

Contributor

nit: could you align the formatting of the table please?

assertThat(view).isNotNull();
assertThat(catalog().viewExists(identifier)).as("View should exist").isTrue();
assertThat(view.properties()).containsEntry("write.metadata.path", customLocation);
assertThat(((BaseView) view).operations().current().metadataFileLocation()).isNotNull();

Contributor

maybe also add a .startsWith(customLocation) check at the end

Contributor Author

Sure, thank you!

.withDefaultNamespace(identifier.namespace())
.withDefaultCatalog(catalog().name())
.withQuery("spark", "select * from ns.tbl")
.withProperty("write.metadata.path", customLocation)

Contributor

let's use the statically defined property from ViewProperties here

@@ -239,6 +239,36 @@ public void completeCreateView() {
assertThat(catalog().viewExists(identifier)).as("View should not exist").isFalse();
}

@Test
public void createViewWithCustomMetadataLocation() {

Contributor

can you also please add a test to TestViews with a custom metadata location? You should be able to run a DESCRIBE in that test and verify that the metadata locations match

Contributor Author

@tomtongue tomtongue Jan 29, 2025

Thank you! Added the test to TestViews.

@tomtongue
Contributor Author

Thanks for the review! I'm working on the comments and will fix them.

@github-actions github-actions bot added the spark label Jan 29, 2025
@tomtongue
Contributor Author

@nastra @amogh-jahagirdar I've addressed the review comments and added tests. Could you review the new changes?

| Property | Default | Description |
|----------------------------------|----------------------------|------------------------------------------------------------------------------------|
| write.metadata.compression-codec | gzip | Metadata compression codec: `none` or `gzip` |
| write.metadata.path | table location + /metadata | Base location for metadata files |

Contributor

Suggested change
| write.metadata.path | table location + /metadata | Base location for metadata files |
| write.metadata.path | view location + /metadata | Base location for metadata files |

Contributor Author

Oh I missed this, let me fix now. Thank you!

Contributor

@nastra nastra left a comment

LGTM with one remaining comment, thanks @tomtongue. I'll also leave this open for a bit so that @amogh-jahagirdar has a chance to review this too

@tomtongue
Contributor Author

Sure. Thanks so much for the review!

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Thanks @tomtongue!

@tomtongue
Contributor Author

Thanks so much for the quick review!

@nastra nastra merged commit 2a0d5e8 into apache:main Jan 30, 2025
46 checks passed
@tomtongue tomtongue deleted the view-metadata-path branch January 30, 2025 16:10