-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: Add UnknownType
#12012
base: main
Are you sure you want to change the base?
API: Add UnknownType
#12012
Conversation
api/src/test/java/org/apache/iceberg/types/TestSerializableTypes.java
Outdated
Show resolved
Hide resolved
@@ -93,6 +95,10 @@ public SerializableFunction<T, T> bind(Type type) { | |||
|
|||
@Override | |||
public boolean canTransform(Type maybePrimitive) { | |||
if (maybePrimitive.typeId() == Type.TypeID.UNKNOWN) { | |||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we update the spec to reflect this? Currently Identity Transform
accepts "any" type: https://iceberg.apache.org/spec/#primitive-types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's follow the spec here. I don't think there is a real application for it, but technically it should work.
@@ -539,7 +539,8 @@ private static String sanitize(Type type, Object value, long now, int today) { | |||
case FIXED: | |||
case BINARY: | |||
case VARIANT: | |||
// for boolean, uuid, decimal, fixed, variant, and binary, match the string result | |||
case UNKNOWN: | |||
// for boolean, uuid, decimal, fixed, variant, unknown, and binary, match the string result | |||
return sanitizeSimpleString(value.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to be an issue if unknown value as a non-valid string representation? I haven't looked to hard into the potential issues here but i'm imaging a situation where a binary ends up here and it doesn't have a UTF-8 representation? Maybe not possible here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of feel like maybe Unknown should just output "Unknown" rather than anything sanitized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's a good point. I've changed it to (unknown)
, similar to (time)
public void testUnknownSupported() { | ||
Schema v3Schema = new Schema(Types.NestedField.optional(1, "u", Types.UnknownType.get())); | ||
|
||
SortOrder.builderFor(v3Schema).withOrderId(10).asc("u").build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this allowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this doesn't make sense, but as @HonahX pointed out here #12012 (comment), the Unknown
type is a PrimitiveType
and is accepted by the IdentityTransform
. It doesn't make sense, but it is the same as sorting on a column that has many null
values. I think we trust the user to make a wise decision and not to sort on an UnknownType
, until it evolved in an actual type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷 Since they only exist in variant doesn't that never come up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok allowing Unknown in all these transforms and such, but I don't think we actually have a case where it can happen.
Fixes #11732