-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add discriminator property support #214
Conversation
@@ -2841,15 +2841,17 @@ | |||
"modelType": { | |||
"type": "string" | |||
} | |||
} | |||
}, | |||
"required": ["modelType"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is necessary in order for the discriminator in the spec to really be valid: a discriminator property must be a required property in all of the variant schemas. My current implementation wouldn't actually catch a mistake like this, but I figured it was best to have the test spec be valid.
T = TypeVar("T", bound="ADiscriminatedUnionType1") | ||
|
||
|
||
@_attrs_define | ||
class ADiscriminatedUnionType1: | ||
""" | ||
Attributes: | ||
model_type (Union[Unset, str]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in this generated file and the next one are because I changed modelType
to be required in the test spec.
raise TypeError() | ||
componentsschemas_a_discriminated_union_type_1 = ADiscriminatedUnionType1.from_dict(data) | ||
|
||
return componentsschemas_a_discriminated_union_type_1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 68-72 here are boilerplate that comes from the generator's template macro for "call the deserializer for some type". The weird function name on line 67 is due to how the generator makes unique Python names for stuff in its internal data structure.
return cast( | ||
Union["ADiscriminatedUnionType1", "ADiscriminatedUnionType2", None, Unset], _parse_fn(data) | ||
) | ||
raise TypeError("unrecognized value for property modelType") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 81-89 are from my new custom template logic for discriminators.
caa81a7
to
2136c14
Compare
@@ -67,16 +75,7 @@ def build( | |||
return PropertyError(detail=f"Invalid property in union {name}", data=sub_prop_data), schemas | |||
sub_properties.append(sub_prop) | |||
|
|||
def flatten_union_properties(sub_properties: list[PropertyProtocol]) -> list[PropertyProtocol]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved this helper function to line 204 (it doesn't need to be an inner function because it doesn't rely on any closure variables).
return flattened | ||
|
||
sub_properties = flatten_union_properties(sub_properties) | ||
sub_properties, discriminators_list = _flatten_union_properties(sub_properties) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This flattening logic is a pre-existing behavior of the generator, causing it to treat this:
oneOf:
- $ref: "#/components/schemas/TypeA"
oneOf:
- $ref: "#/components/schemas/TypeB"
- $ref: "#/components/schemas/TypeC"
—as exactly equivalent to this:
oneOf:
- $ref: "#/components/schemas/TypeA"
- $ref: "#/components/schemas/TypeB"
- $ref: "#/components/schemas/TypeC"
But now we have to also keep track of any discriminator definitions that might have existed in a nested union. For instance, it's totally valid to have something like this—
UnionType:
oneOf:
- oneOf:
- $ref: "#/components/schemas/TypeA"
- $ref: "#/components/schemas/TypeB"
discriminator:
property: prop1
- oneOf:
- $ref: "#/components/schemas/TypeC"
- $ref: "#/components/schemas/TypeD"
discriminator:
property: prop2
In this example, TypeA and TypeB both have the property prop1, whose value will be "TypeA" or "TypeB"; but with TypeC and TypeD, the relevant property is prop2. The discriminators_list
returned by _flatten_union_properties
in this case looks like: `[DiscriminatorDefinition(property_name="prop1", value_to_model_map={"TypeA": , "TypeB": }),DiscriminatorDefinition(property_name="prop1", value_to_model_map={"TypeA": , "TypeB": }), DiscriminatorDefinition(property_name="prop2", value_to_model_map={"TypeC": , "TypeD": })]. The Jinja template logic will use this to build an appropriate piece of Python deserializer code where first it checks for prop1, then for prop2, etc.
@@ -10,7 +10,14 @@ | |||
from ...utils import PythonIdentifier | |||
from ..errors import ParseError, PropertyError | |||
from .protocol import PropertyProtocol, Value | |||
from .schemas import Schemas | |||
from .schemas import Schemas, get_reference_simple_name, parse_reference_path | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file defines UnionProperty
, which is the generator's internal class for representing any kind of anyOf
or oneOf
situation.
When parsing a schema, the generator will call UnionProperty.build
(line 38), which returns a tuple of 1. either a valid UnionProperty
or an error, and 2. an updated version of the Schemas
object that keeps track of all parsed schemas so far. This is a standard pattern used with all of the generator's ___Property
types. In general, error handling is done via this pattern instead of using exceptions.
if discriminators_list: | ||
if error := _validate_discriminators(discriminators_list): | ||
return error, schemas | ||
prop = evolve(prop, discriminators=discriminators_list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using attrs.evolve
is standard in this codebase; some classes do have mutable fields, but in general mutation is avoided.
# This is needed because, when we built the union list, $refs were changed into a copy of | ||
# the type they referred to, without preserving the original name. We need to know that | ||
# every type in the discriminator is a $ref to a top-level type and we need its name. | ||
for prop in schemas.classes_by_reference.values(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's going on here:
The generator defines schemas.classes_by_reference
as a map that keeps track of all named schemas— that is, ones that are defined in components.schemas
rather than being anonymous inline schemas inside something else. So, for this spec—
components:
schemas:
ModelA:
# ...
ModelB:
# ...
UnionAB:
oneOf:
- $ref: "#/components/schemas/ModelA"
- $ref: "#/components/schemas/ModelB"
—the map will end up looking like this:
{
"/components/schemas/ModelA": model_property_instance_for_model_a,
"/components/schemas/ModelB": model_property_instance_for_model_b,
"/components/schemas/UnionAB": union_property_instance_for_union_ab,
}
It uses that map when it's resolving the $ref
s inside UnionAB. However, when it does that, the ModelProperty
instances that are inside of the UnionProperty
are not the same instances as model_property_instance_for_model_a
, etc. They are newly parsed ones made from a copy of the YAML data of the original declaration. The only way to find the original instance is what I'm doing here: check the class_info
attribute, which describes the Python class (ModelA
or whatever) that will be used for this schema in the generated code.
@@ -1,3 +1,36 @@ | |||
{% macro construct_inner_property(inner_property) %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This macro is factored out of old lines 27-32.
raise TypeError() | ||
{{ inner_template.construct(inner_property, "data") | indent(8) }} | ||
return {{ inner_property.python_name }} | ||
{{ construct_inner_property(inner_property) | indent(8, True) }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though you can see the logic here is slightly different from old lines 27-32 (it doesn't have the if/endif that are on old lines 27 and 30), I'm using the same macro for convenience. That works fine because we already know at this point, due to the test on old line 18, that inner_template.check_type_for_construct
is true.
|
||
{% macro construct_discriminator_lookup(property) %} | ||
{% set _discriminator_properties = [] -%} | ||
{% for discriminator in property.discriminators %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a result of the new logic in union.py
, `property.discriminators will look like this:
[
{
"property_name": <name of the discriminator property>,
"value_to_model_map": {
"somePossibleValue": <ModelProperty instance corresponding to that value>
}
}
]
There can be multiple discriminators due to the nested-unions situation I described in union.py
.
2136c14
to
1956f68
Compare
1956f68
to
0496ff8
Compare
9e71936
to
2941451
Compare
344ec97
to
de1ddf3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good! Especially this final iteration - much easier to follow, for me.
The, perhaps not very useful commentary I have is that, this code is definitely complex.
I know this is because the spec itself is implicitly complex with how discriminators are defined/supposed to be handled 😞 .
I don't have a recommendation on how to represent this better especially because it seems to me that this implementation pretty much follows the conventions of the rest of the code base and I think that's about as good as it will be.
if data.mapping: | ||
for discriminator_value, model_ref in data.mapping.items(): | ||
if "/" in model_ref: | ||
ref_path = parse_reference_path(model_ref) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a nit comment: while I understand the need for this logic of differentiating between the name of a model vs the path+name.... something about this feels brittle.
I wonder if this is a general thing that can be made into a utility (this is mostly vibes tho)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, there is no other area of OpenAPI/JSON Schema where you can reference a schema by its simple name instead of its path. It's not a general thing, and there is no syntactic way to disambiguate them except by checking for a slash.
@@ -24,6 +53,7 @@ class UnionProperty(PropertyProtocol): | |||
description: str | None | |||
example: str | None | |||
inner_properties: list[PropertyProtocol] | |||
discriminators: list[DiscriminatorDefinition] | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to confirm that this means that the discriminator logic defined here will automatically work for every where polymorphism is supported in a spec including in the request/response bodies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. There isn't separate deserialization logic for request vs. response bodies; they just call from_dict
on the model class.
See description in my PR against the upstream repo: openapi-generators#1154
Why we need this fix: Both the v3 and v2 API specs make heavy use of discriminators for polymorphic types.
Without discriminator support, polymorphism will only work for types that have mutually exclusive validation. That's because, without a discriminator, it'll just try deserializing as each of the possible types in the order that they're declared, until deserialization succeeds for one of them. For instance, something like this does work without a discriminator:
—because if you try to deserialize
{"modelType": "TypeA", "prop2": "x"}
asUnionType
, it will first try to deserialize it asTypeA
, which will fail because it doesn't have aprop1
; then it'll try to deserialize it asTypeB
, which will pass. But it would not work if you had this:—because then the same data would pass deserialization for either of those types and so you would always get the first one,
TypeA
. The correct way to do something like that with a discriminator is:I tested this by running the generator locally with this branch against the current v3 API draft spec from aurelia. The spec currently has a lot of invalid uses of
discriminator
which cause appropriate complaints (like "All schema variants must be objects when using a discriminator"), but it also has a valid one in the typeBenchling.CreateEntryContentInput
, which is used by the endpointBenchling.Entry.Update
. I verified that the generated code for that endpoint now has the same kind of logic in it that's in the example filemodel_with_discriminated_union.py
in this PR.