Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add discriminator property support #214

Merged
merged 6 commits into from
Nov 6, 2024
Merged

add discriminator property support #214

merged 6 commits into from
Nov 6, 2024

Conversation

eli-bl
Copy link

@eli-bl eli-bl commented Oct 28, 2024

See description in my PR against the upstream repo: openapi-generators#1154

Why we need this fix: Both the v3 and v2 API specs make heavy use of discriminators for polymorphic types.

Without discriminator support, polymorphism will only work for types that have mutually exclusive validation. That's because, without a discriminator, it'll just try deserializing as each of the possible types in the order that they're declared, until deserialization succeeds for one of them. For instance, something like this does work without a discriminator:

components:
  schemas:
    TypeA:
      properties:
        modelType:
          type: string
        prop1:
          type: string
      required: ["prop1"]
    TypeB:
      properties:
        modelType:
          type: string
        prop2:
          type: string
      required: ["prop2"]
    UnionType:
      oneOf:
        - $ref: "#/components/schemas/TypeA"
        - $ref: "#/components/schemas/TypeB"

—because if you try to deserialize {"modelType": "TypeA", "prop2": "x"} as UnionType, it will first try to deserialize it as TypeA, which will fail because it doesn't have a prop1; then it'll try to deserialize it as TypeB, which will pass. But it would not work if you had this:

components:
  schemas:
    TypeA:
      properties:
        modelType:
          type: string
        everyoneHasThisProp:
          type: string
      required: ["everyoneHasThisProp"]
    TypeB:
      properties:
        modelType:
          type: string
        everyoneHasThisProp:
          type: string
      required: ["everyoneHasThisProp"]
    UnionType:
      oneOf:
        - $ref: "#/components/schemas/TypeA"
        - $ref: "#/components/schemas/TypeB"

—because then the same data would pass deserialization for either of those types and so you would always get the first one, TypeA. The correct way to do something like that with a discriminator is:

components:
  schemas:
    TypeA:
      properties:
        modelType:
          type: string
        everyoneHasThisProp:
          type: string
      required: ["everyoneHasThisProp"]
    TypeB:
      properties:
        modelType:
          type: string
        everyoneHasThisProp:
          type: string
      required: ["everyoneHasThisProp"]
    UnionType:
      oneOf:
        - $ref: "#/components/schemas/TypeA"
        - $ref: "#/components/schemas/TypeB"
      discriminator:
        property: "modelType"

I tested this by running the generator locally with this branch against the current v3 API draft spec from aurelia. The spec currently has a lot of invalid uses of discriminator which cause appropriate complaints (like "All schema variants must be objects when using a discriminator"), but it also has a valid one in the type Benchling.CreateEntryContentInput, which is used by the endpoint Benchling.Entry.Update. I verified that the generated code for that endpoint now has the same kind of logic in it that's in the example file model_with_discriminated_union.py in this PR.

@eli-bl eli-bl marked this pull request as ready for review October 28, 2024 20:47
@@ -2841,15 +2841,17 @@
"modelType": {
"type": "string"
}
}
},
"required": ["modelType"]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary in order for the discriminator in the spec to really be valid: a discriminator property must be a required property in all of the variant schemas. My current implementation wouldn't actually catch a mistake like this, but I figured it was best to have the test spec be valid.

T = TypeVar("T", bound="ADiscriminatedUnionType1")


@_attrs_define
class ADiscriminatedUnionType1:
"""
Attributes:
model_type (Union[Unset, str]):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this generated file and the next one are because I changed modelType to be required in the test spec.

raise TypeError()
componentsschemas_a_discriminated_union_type_1 = ADiscriminatedUnionType1.from_dict(data)

return componentsschemas_a_discriminated_union_type_1
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 68-72 here are boilerplate that comes from the generator's template macro for "call the deserializer for some type". The weird function name on line 67 is due to how the generator makes unique Python names for stuff in its internal data structure.

return cast(
Union["ADiscriminatedUnionType1", "ADiscriminatedUnionType2", None, Unset], _parse_fn(data)
)
raise TypeError("unrecognized value for property modelType")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 81-89 are from my new custom template logic for discriminators.

@@ -67,16 +75,7 @@ def build(
return PropertyError(detail=f"Invalid property in union {name}", data=sub_prop_data), schemas
sub_properties.append(sub_prop)

def flatten_union_properties(sub_properties: list[PropertyProtocol]) -> list[PropertyProtocol]:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this helper function to line 204 (it doesn't need to be an inner function because it doesn't rely on any closure variables).

return flattened

sub_properties = flatten_union_properties(sub_properties)
sub_properties, discriminators_list = _flatten_union_properties(sub_properties)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flattening logic is a pre-existing behavior of the generator, causing it to treat this:

  oneOf:
    - $ref: "#/components/schemas/TypeA"
    oneOf:
      - $ref: "#/components/schemas/TypeB"
      - $ref: "#/components/schemas/TypeC"

—as exactly equivalent to this:

  oneOf:
    - $ref: "#/components/schemas/TypeA"
    - $ref: "#/components/schemas/TypeB"
    - $ref: "#/components/schemas/TypeC"

But now we have to also keep track of any discriminator definitions that might have existed in a nested union. For instance, it's totally valid to have something like this—

  UnionType:
    oneOf:
      - oneOf:
        - $ref: "#/components/schemas/TypeA"
        - $ref: "#/components/schemas/TypeB"
        discriminator:
          property: prop1
      - oneOf:
        - $ref: "#/components/schemas/TypeC"
        - $ref: "#/components/schemas/TypeD"
        discriminator:
          property: prop2

In this example, TypeA and TypeB both have the property prop1, whose value will be "TypeA" or "TypeB"; but with TypeC and TypeD, the relevant property is prop2. The discriminators_list returned by _flatten_union_properties in this case looks like: `[DiscriminatorDefinition(property_name="prop1", value_to_model_map={"TypeA": , "TypeB": }),DiscriminatorDefinition(property_name="prop1", value_to_model_map={"TypeA": , "TypeB": }), DiscriminatorDefinition(property_name="prop2", value_to_model_map={"TypeC": , "TypeD": })]. The Jinja template logic will use this to build an appropriate piece of Python deserializer code where first it checks for prop1, then for prop2, etc.

@@ -10,7 +10,14 @@
from ...utils import PythonIdentifier
from ..errors import ParseError, PropertyError
from .protocol import PropertyProtocol, Value
from .schemas import Schemas
from .schemas import Schemas, get_reference_simple_name, parse_reference_path

Copy link
Author

@eli-bl eli-bl Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file defines UnionProperty, which is the generator's internal class for representing any kind of anyOf or oneOf situation.

When parsing a schema, the generator will call UnionProperty.build (line 38), which returns a tuple of 1. either a valid UnionProperty or an error, and 2. an updated version of the Schemas object that keeps track of all parsed schemas so far. This is a standard pattern used with all of the generator's ___Property types. In general, error handling is done via this pattern instead of using exceptions.

if discriminators_list:
if error := _validate_discriminators(discriminators_list):
return error, schemas
prop = evolve(prop, discriminators=discriminators_list)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using attrs.evolve is standard in this codebase; some classes do have mutable fields, but in general mutation is avoided.

# This is needed because, when we built the union list, $refs were changed into a copy of
# the type they referred to, without preserving the original name. We need to know that
# every type in the discriminator is a $ref to a top-level type and we need its name.
for prop in schemas.classes_by_reference.values():
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on here:

The generator defines schemas.classes_by_reference as a map that keeps track of all named schemas— that is, ones that are defined in components.schemas rather than being anonymous inline schemas inside something else. So, for this spec—

components:
  schemas:
    ModelA:
      # ...
    ModelB:
      # ...
    UnionAB:
      oneOf:
      - $ref: "#/components/schemas/ModelA"
      - $ref: "#/components/schemas/ModelB"

—the map will end up looking like this:

{
  "/components/schemas/ModelA": model_property_instance_for_model_a,
  "/components/schemas/ModelB": model_property_instance_for_model_b,
  "/components/schemas/UnionAB": union_property_instance_for_union_ab,
}

It uses that map when it's resolving the $refs inside UnionAB. However, when it does that, the ModelProperty instances that are inside of the UnionProperty are not the same instances as model_property_instance_for_model_a, etc. They are newly parsed ones made from a copy of the YAML data of the original declaration. The only way to find the original instance is what I'm doing here: check the class_info attribute, which describes the Python class (ModelA or whatever) that will be used for this schema in the generated code.

@@ -1,3 +1,36 @@
{% macro construct_inner_property(inner_property) %}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This macro is factored out of old lines 27-32.

raise TypeError()
{{ inner_template.construct(inner_property, "data") | indent(8) }}
return {{ inner_property.python_name }}
{{ construct_inner_property(inner_property) | indent(8, True) }}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though you can see the logic here is slightly different from old lines 27-32 (it doesn't have the if/endif that are on old lines 27 and 30), I'm using the same macro for convenience. That works fine because we already know at this point, due to the test on old line 18, that inner_template.check_type_for_construct is true.


{% macro construct_discriminator_lookup(property) %}
{% set _discriminator_properties = [] -%}
{% for discriminator in property.discriminators %}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a result of the new logic in union.py, `property.discriminators will look like this:

[
  {
    "property_name": <name of the discriminator property>,
    "value_to_model_map": {
      "somePossibleValue": <ModelProperty instance corresponding to that value>
    }
  }
]

There can be multiple discriminators due to the nested-unions situation I described in union.py.

Copy link

@damola-benchling damola-benchling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! Especially this final iteration - much easier to follow, for me.

The, perhaps not very useful commentary I have is that, this code is definitely complex.
I know this is because the spec itself is implicitly complex with how discriminators are defined/supposed to be handled 😞 .
I don't have a recommendation on how to represent this better especially because it seems to me that this implementation pretty much follows the conventions of the rest of the code base and I think that's about as good as it will be.

Comment on lines +296 to +299
if data.mapping:
for discriminator_value, model_ref in data.mapping.items():
if "/" in model_ref:
ref_path = parse_reference_path(model_ref)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a nit comment: while I understand the need for this logic of differentiating between the name of a model vs the path+name.... something about this feels brittle.
I wonder if this is a general thing that can be made into a utility (this is mostly vibes tho)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, there is no other area of OpenAPI/JSON Schema where you can reference a schema by its simple name instead of its path. It's not a general thing, and there is no syntactic way to disambiguate them except by checking for a slash.

@@ -24,6 +53,7 @@ class UnionProperty(PropertyProtocol):
description: str | None
example: str | None
inner_properties: list[PropertyProtocol]
discriminators: list[DiscriminatorDefinition] | None = None

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to confirm that this means that the discriminator logic defined here will automatically work for every where polymorphism is supported in a spec including in the request/response bodies?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. There isn't separate deserialization logic for request vs. response bodies; they just call from_dict on the model class.

@eli-bl eli-bl merged commit d76c0fc into 2.x Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants