Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Add UnknownType #12012

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

API: Add UnknownType #12012

wants to merge 6 commits into from

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented Jan 20, 2025

Fixes #11732

@Fokko Fokko requested a review from HonahX January 20, 2025 18:37
@@ -93,6 +95,10 @@ public SerializableFunction<T, T> bind(Type type) {

@Override
public boolean canTransform(Type maybePrimitive) {
if (maybePrimitive.typeId() == Type.TypeID.UNKNOWN) {
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we update the spec to reflect this? Currently Identity Transform accepts "any" type: https://iceberg.apache.org/spec/#primitive-types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's follow the spec here. I don't think there is a real application for it, but technically it should work.

@HonahX HonahX linked an issue Jan 22, 2025 that may be closed by this pull request
3 tasks
@@ -539,7 +539,8 @@ private static String sanitize(Type type, Object value, long now, int today) {
case FIXED:
case BINARY:
case VARIANT:
// for boolean, uuid, decimal, fixed, variant, and binary, match the string result
case UNKNOWN:
// for boolean, uuid, decimal, fixed, variant, unknown, and binary, match the string result
return sanitizeSimpleString(value.toString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to be an issue if unknown value as a non-valid string representation? I haven't looked to hard into the potential issues here but i'm imaging a situation where a binary ends up here and it doesn't have a UTF-8 representation? Maybe not possible here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of feel like maybe Unknown should just output "Unknown" rather than anything sanitized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's a good point. I've changed it to (unknown), similar to (time)

public void testUnknownSupported() {
Schema v3Schema = new Schema(Types.NestedField.optional(1, "u", Types.UnknownType.get()));

SortOrder.builderFor(v3Schema).withOrderId(10).asc("u").build();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this allowed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this doesn't make sense, but as @HonahX pointed out here #12012 (comment), the Unknown type is a PrimitiveType and is accepted by the IdentityTransform. It doesn't make sense, but it is the same as sorting on a column that has many null values. I think we trust the user to make a wise decision and not to sort on an UnknownType, until it evolved in an actual type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷 Since they only exist in variant doesn't that never come up?

Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok allowing Unknown in all these transforms and such, but I don't think we actually have a case where it can happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support UnknownType in API Support UnknownType for V3 Schema
4 participants