Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define datum reader/writer interfaces with variant #1635

Merged
merged 6 commits into from
Nov 4, 2024
Merged

Conversation

RCHowell
Copy link
Member

Description

This PR introduces the DatumReader and DatumWriter interfaces along with the IonDatumReader implementation. In the interest of time, I've opened the PR now to get the interfaces in – and will continue to flesh out the implementations when I have time.

Once completed, and RFC will be raised with the encoding scheme.

Other Information

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.is pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link

github-actions bot commented Nov 1, 2024

CROSS-ENGINE-REPORT ❌

BASE (LEGACY-V0.14.8) TARGET (EVAL-48F3A00) +/-
% Passing 89.67% 94.39% 4.72% ✅
Passing 5287 5565 278 ✅
Failing 609 50 -559 ✅
Ignored 0 281 281 🔶
Total Tests 5896 5896 0 ✅

Testing Details

  • Base Commit: v0.14.8
  • Base Engine: LEGACY
  • Target Commit: 48f3a00
  • Target Engine: EVAL

Result Details

  • ❌ REGRESSION DETECTED. See Now Failing/Ignored Tests. ❌
  • Passing in both: 2643
  • Failing in both: 17
  • Ignored in both: 0
  • PASSING in BASE but now FAILING in TARGET: 3
  • PASSING in BASE but now IGNORED in TARGET: 108
  • FAILING in BASE but now PASSING in TARGET: 180
  • IGNORED in BASE but now PASSING in TARGET: 0

Now FAILING Tests ❌

The following 3 test(s) were previously PASSING in BASE but are now FAILING in TARGET:

Click here to see
  1. undefinedUnqualifiedVariableWithUndefinedVariableBehaviorMissing, compileOption: PERMISSIVE
  2. undefinedUnqualifiedVariableIsNullExprWithUndefinedVariableBehaviorMissing, compileOption: PERMISSIVE
  3. undefinedUnqualifiedVariableIsMissingExprWithUndefinedVariableBehaviorMissing, compileOption: PERMISSIVE

Now IGNORED Tests ❌

The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

Now Passing Tests

180 test(s) were previously failing in BASE (LEGACY-V0.14.8) but now pass in TARGET (EVAL-48F3A00). Before merging, confirm they are intended to pass.

The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

CROSS-COMMIT-REPORT ✅

BASE (EVAL-BD7745F) TARGET (EVAL-48F3A00) +/-
% Passing 94.39% 94.39% 0.00% ✅
Passing 5565 5565 0 ✅
Failing 50 50 0 ✅
Ignored 281 281 0 ✅
Total Tests 5896 5896 0 ✅

Testing Details

  • Base Commit: bd7745f
  • Base Engine: EVAL
  • Target Commit: 48f3a00
  • Target Engine: EVAL

Result Details

  • Passing in both: 5565
  • Failing in both: 50
  • Ignored in both: 281
  • PASSING in BASE but now FAILING in TARGET: 0
  • PASSING in BASE but now IGNORED in TARGET: 0
  • FAILING in BASE but now PASSING in TARGET: 0
  • IGNORED in BASE but now PASSING in TARGET: 0

*/
public abstract class PEnum {
public abstract class Enum {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally named PEnum due to the ubiquity of java.lang.Enum. I don't mind it either way since most people don't use Enum directly, but bringing it up for awareness.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I figured since not used directly, parameterized, and requires explicit import then there's little to no ambiguity.

Comment on lines 12 to 19
public interface Variant<T> extends Datum {

/**
* Unpack the inner variant value.
*
* @return T
*/
T unpack();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was my assumption that PType.Kind would have another variant: VARIANT with some method to expose the parameterization. And, a Datum holding type VARIANT would allow for the invocation of two methods: a new Datum unpack() and the existing byte[] getBytes().

Doing it like this, we could still unpack the variant to a PartiQL value (Datum), and if we expose IonVariant as a public class, then Ion's API of IonElement wouldn't be exposed either. I think it would provide enough APIs for variant consumers (ourselves included) to create a DatumReader/Writer for the variant type. Also, it would allow for us to internalize Variant, IonDatumReader, IonDatumWriter, and IonVariant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if this clarifies the model.

  • A variant value is a datum (partiql value) but holds an inner type (hence unpack)
  • You do not unpack the variant to a value because it already is a value – variant extends datum
  • The VARIANT PType is parameterized by a string aka variant("ion"), variant("json"), variant("cbor")
  • Unpack is to provide direct access (like old .ionValue)
  • Pack is to encode as bytes for the writer.

Hopefully this model is a bit more clear? To address the concerns

  • no need to unpack, a variant is a datum implementation
  • DatumReader/DatumWriter are the interfaces to read some encoding of PartiQL values into the Datum interface
  • Variant, IonDatumReader, IonDatumWriter, and IonVariant are explicitly public.

Copy link
Member Author

@RCHowell RCHowell Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you are describing adding variant to the datum fat interface, I thought about that but variant is parameterized on some type. I'm not sure of a better solution without parameterizing Datum or having some method Object unpack()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to unpack, a variant is a datum implementation

Yes, exactly. We've established that variant is a runtime type and should be reflected in our runtime values.

In the IonVariant impl, there is a TODO to return PType.variant("ion") on getType(). That's where it's a bit odd to me, because now, we've broken the contract between getType and any of the other getters. It seems like the contract would be better between PType.Kind.VARIANT and Datum.unpack(): Datum.

We're back to a similar .check<Variant>() as before, where we can't fully trust the relationship between Datum#getType() and the getters. For example:

// This is all pseudo-code, for brevity
class Udf : Function {
    override fun signature() = Signature("udf", params = listOf(PType.variantIon()))

    override fun eval(args: Array<Datum>) {
        val datum = args[0]
        val variant: IonVariant = datum.check<IonVariant>() // which will throw a TypeCheckException and perform the cast
        val underlyingValue: IonElement = datum.unpack()
        return // whatever we want
    }
}

Rather than:

// This is all pseudo-code, for brevity
class Udf : Function {
    override fun signature() = Signature("udf", params = listOf(PType.variantIon()))

    override fun eval(args: Array<Datum>) {
        val underlyingValue: Datum = datum.unpack() // Or, we can call datum.getBytes()
        return // whatever we want
    }
}

This is especially apparent when we want to be in charge of casting. How would we know which Datum getter to invoke when getType() returns PType.variant("ion")?

* @return next Datum or null.
*/
@Nullable
public Datum next();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't this look a bit more like the IonReader with stepIn() and stepOut()?

Kind of similar to IonReader, the constructors for IonTextReader takes in an IonValue or stream of data , and it doesn't return IonValues -- it returns primitives as you walk the data. I assumed this would be similar. They do expose an IonElementLoader which seemingly wraps a reader and creates an IonElement, but it's different than the reader itself.

As a side, this came into my mind because I saw the JSON directory placeholder. JSON doesn't allow for multiple top-level values, so the DatumReader would always just return a single Datum. As a consumer of this, you'd still need to know the in's and out's of the Datum API.

Copy link
Member Author

@RCHowell RCHowell Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reader/write are really the two functions "encode" and "decode" you could conceive of backing them by a cursor-like API, but that is not the intention here. Notice how the methods return datums and not primitives. These encode a Datum into some bytes, and can decode back into the Datum interface.

The consumer of a reader is not responsible for how the value is traversed, that's what the Datum interface allows for them. This is simply about getting bytes into and outof the Datum interface.

I don't believe you would need to know the in's and out's of Datum to use this. It just wraps a stream of one-or-more values. Yes, json is one top-level value, but there's others like

  • jsonlines
  • yaml
  • csv
  • etc.

which allow for multiple values in a stream

*
* @param datum to write.
*/
public DatumWriter write(Datum datum);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got similar thoughts as the DatumReader comments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is intentionally not a cursor. The functionality is really just

write(value) -> bytes
read(bytes) -> value

@RCHowell RCHowell merged commit d561ece into v1 Nov 4, 2024
12 checks passed
@RCHowell RCHowell deleted the v1-datum-rw branch November 4, 2024 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants