support generic dataclasses #172
@lovasoa does this seem like the right approach to you? I wanted to keep the previous behaviour of not generating a schema for a generic dataclass if it does not specify any type arguments, but if they do, it feels like it would be a good feature for marshmallow-dataclass to be able to create a schema, respecting the given type arguments |
I don't know how easy it is to do, but it would be nice to generalize some more. As an example, currently, with this PR, the following does not work, though there are no unfixed type parameters.
from dataclasses import dataclass
from typing import Generic
from typing import TypeVar
from marshmallow_dataclass import class_schema
T = TypeVar("T")
@dataclass
class Simple(Generic[T]):
    x: T
@dataclass
class Nested(Generic[T]):
    nested: Simple[T]
schema_class = class_schema(Nested[int]) (⇒ |
Hello! Sorry for the delay! Looks like there is a type issue on Python 3.7 |
Sorry for the delay here, I will take a look at the mypy complaint and see whether it is a low lift to support nested generic dataclasses. It might take a while until I find the time, so feel free to close until then |
Will this get merged? Is there a workaround without merging this PR? |
I will be happy to merge it when conflicts are fixed and the tests pass. |
@@ -354,21 +356,27 @@ def class_schema(
     del current_frame
     _RECURSION_GUARD.seen_classes = {}
     try:
-        return _internal_class_schema(clazz, base_schema, clazz_frame)
+        return _internal_class_schema(clazz, base_schema, clazz_frame, None)
have to add None here explicitly so we hit the cache for _internal_class_schema. lru_cache uses all given params as the key, without filling in the default values for params that are not specified
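The caching point in this review comment can be checked with plain `functools.lru_cache`. A minimal sketch, in which `build_schema` and its parameters are hypothetical stand-ins for `_internal_class_schema` (the real function's signature is not shown in this thread):

```python
from functools import lru_cache

calls = []  # records every real (uncached) invocation


@lru_cache(maxsize=None)
def build_schema(clazz_name, generic_args=None):
    # Hypothetical stand-in for _internal_class_schema.
    calls.append((clazz_name, generic_args))
    return f"schema for {clazz_name}"


build_schema("A")        # cache miss: the key is built from ("A",)
build_schema("A", None)  # another miss: the explicit None yields a different key
build_schema("A", None)  # cache hit: same positional arguments as the previous call

info = build_schema.cache_info()
print(info.hits, info.misses)
```

Both call styles return the same value but occupy separate cache entries, which is why every caller needs to pass the trailing argument the same way.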
@lovasoa should be good for a review, I haven't run tests for older python versions though. Would appreciate it if you could kick off CI again |
I have also added a test to cover @dairiki 's comment. It should support nested generics now and it should only work when the same concrete type is used for the same TypeVar |
@onursatici Tests were failing due to pre-commit hook bitrot/breakage. I've fixed that and merged those fixes here. Now there is a test failure for python < 3.10. I haven't looked at it carefully yet, to see what the fix (or even the cause) might be. |
Thanks a lot @dairiki , it seems like |
Also this implementation has a bug for repeated use of the same generic dataclasses. Example:
# class B is a generic dataclass
class A:
    a: B[str]
    a: B[int]
    c: B[int]
Because marshmallow_dataclass supports forward references, the field generated for c would be a
one ugly fix would be reordering fields:
class A:
    a: B[int]
    c: B[int]
    a: B[str]
now field
Fixing this in general without breaking forward reference support is a bit tricky, because marshmallow's class registry is oblivious to generic arguments. We can accept that this is not something we support, or log a warning, but that would require keeping track of extra state (which variant of this generic class we have seen first, and whether it differs from the variant we are now trying to use a forward reference for; if so, log a warning) |
@dairiki fixed conflicts and the failing tests for python < 3.10. Also added a test for my comment above |
@onursatici Thank you! The tests under python 3.6 had started failing due to the upgrade of the There's still a test failure under python 3.6. (I haven't looked at it in detail.) |
@dairiki py3.6 should be fine now based on my local testing. Would be nice to kick CI one more time 🤞 |
@onursatici I still haven't found time to dig into this deeply, so these may be stupid questions, but...
Does this example depend on having the repeated
Why doesn't this break the (second)
What happens if there are multiple dataclasses with
class A1:
    x: B[int]
SchemaA1 = class_schema(A1)
class A2:
    y: B[str]
SchemaA2 = class_schema(A2)
Does the same issue pertain?
What would fixing that involve?
This does seem like creepy behavior, especially so if we can't issue a warning when someone bumps into it. |
@dairiki here is my understanding:
my bad, the issue happens when the fields have distinct names, so the example should have been:
# class B is a generic dataclass
# this is not fine
class A:
    a: B[str]
    b: B[int]
    c: B[int]
# this is fine
class A:
    b: B[int]
    c: B[int]
    a: B[str]
As far as I understand it goes something like this:
# comments explain the field schemas `class_schema` derives from the dataclass
class A:
    a: B[str]  # field schema correctly created as B[str]
    b: B[int]  # field schema correctly created as B[int]
    c: B[int]  # B[int] is in `seen_classes`, field schema is Nested(B)
# when we do this:
marshmallow_dataclass.class_schema(A)().load(a_object)
# marshmallow does:
# load a first, add its type `B` to class_registry, so the registry has the mapping `B -> schema of B[str]`
# load b, use given field schema, which is for B[int]
# load c, this has a Nested(B) schema, and the deserialiser stored in class_registry for B is for B[str], so this fails
# when we reorder fields like this:
class A:
    b: B[int]  # field schema correctly created as B[int]
    c: B[int]  # B[int] is in `seen_classes`, field schema is Nested(B)
    a: B[str]  # field schema correctly created as B[str]
# now when we do this:
marshmallow_dataclass.class_schema(A)().load(a_object)
# marshmallow does:
# load b first, add its type `B` to class_registry, so the registry has the mapping `B -> schema of B[int]`
# load c, this has a Nested(B) schema, and the deserialiser stored in class_registry for B is for B[int], so this succeeds
# load a, use given field schema, which is for B[str]
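The ordering effect walked through above can be reproduced with a toy model. This is only a sketch of the behaviour as described in this comment, not marshmallow's or marshmallow_dataclass's actual code: `build_fields`, `load`, the name-keyed `registry` and the `seen` guard are all hypothetical, and the registry is simply assumed to keep the first entry registered under a name.

```python
def make_validator(expected_type):
    """Return a loader that accepts only values of expected_type."""
    def validate(value):
        if not isinstance(value, expected_type):
            raise TypeError(f"expected {expected_type.__name__}, got {value!r}")
        return value
    return validate


def build_fields(annotations):
    """annotations maps field name -> (generic class name, type argument)."""
    registry = {}  # toy name-keyed registry: the first registration wins
    seen = {}      # toy recursion guard: (class name, arg) -> registered name
    fields = {}
    for field_name, (class_name, arg) in annotations.items():
        key = (class_name, arg)
        if key in seen:
            # Already seen: keep only the name, to be resolved at load time.
            fields[field_name] = seen[key]
        else:
            validator = make_validator(arg)
            registry.setdefault(class_name, validator)
            seen[key] = class_name
            fields[field_name] = validator
    return fields, registry


def load(fields, registry, data):
    loaded = {}
    for field_name, spec in fields.items():
        # A string spec is a by-name reference into the registry.
        validator = registry[spec] if isinstance(spec, str) else spec
        loaded[field_name] = validator(data[field_name])
    return loaded


data = {"a": "text", "b": 1, "c": 2}
failing_order = {"a": ("B", str), "b": ("B", int), "c": ("B", int)}
working_order = {"b": ("B", int), "c": ("B", int), "a": ("B", str)}

print(load(*build_fields(working_order), data))
try:
    load(*build_fields(failing_order), data)
except TypeError as exc:
    print("failing order:", exc)
```

In the failing order, field `c` is resolved by the bare name `B`, which in this model points at the `str` variant registered first; the name carries no information about the type argument.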
I don't think it will because none of these classes would have a field schema of
Some thoughts:
|
Aha! If I understand correctly, here's what's going on. Currently, we store the class name in The
Maybe the fix is to somehow cache the schema instance (rather than or in addition to schema class name) in the _RECURSION_GUARD thread-local. (Or maybe cache the Marshmallow field instance?) Also of note: these problem cases seem to work in older python versions. (Or at least behave differently.) |
So the name under which a schema gets entered in Marshmallow's class_registry comes from the name specified when the schema class is constructed here. That is whatever "name" we picked up here, which is the same name that gets remembered in _RECURSION_GUARD.seen_classes. That "name" is, e.g., "B" in python >= 3.10 and "<module>.B[int]" in earlier pythons; the latter (usually) works, since it includes the generic parameter types, while the former doesn't. So, just including the generic type parameters within the "name" string is enough to mostly fix the issue. @onursatici I can look at this further to see what a clean fix might be, but I likely won't have time to get to that until after New Year's sometime. (In the meantime, if you figure out something, have at it!) |
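One way to build a name that includes the generic type parameters, using only `typing.get_origin` and `typing.get_args` (Python 3.8+). This is a sketch of the idea in the comment above; `schema_name` is a hypothetical helper, not a function from marshmallow_dataclass.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, get_args, get_origin

T = TypeVar("T")


@dataclass
class B(Generic[T]):
    x: T


def schema_name(tp):
    """Hypothetical helper: a name that distinguishes B[int] from B[str]."""
    origin = get_origin(tp)
    if origin is None:
        # Plain class (or anything unsubscripted): fall back to its own name.
        return getattr(tp, "__name__", repr(tp))
    args = ", ".join(schema_name(arg) for arg in get_args(tp))
    return f"{origin.__name__}[{args}]"


print(schema_name(B[int]), schema_name(B[str]), schema_name(B[B[int]]))
```

Because the name is derived from the origin and arguments rather than from `__name__` on the alias, it does not depend on how a given Python version reports the alias's name.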
@dairiki Thanks for having a look! Yea caching the schema class makes sense to me. I can have a go at it, hopefully soon if I can make the time, will let you know if I can't so we don't do double work |
Currently, this PR works, I think, if the dataclass fields have types that are one of:
It does not appear to work when fields have other generic types, e.g.
T = TypeVar("T")
@dataclasses.dataclass
class Z(Generic[T]):
    z: List[T]
class_schema(Z[int])  # -> TypeError("T is not a dataclass and cannot be turned into one.")
Since we are already passing eta: also, what about:
@dataclasses.dataclass
class ZZ(Generic[T]):
    z: List[Tuple[T, T]]
    t: Z[Tuple[T, T]]
|
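For the nested cases above, the `typing` module can already substitute TypeVars inside arbitrarily nested aliases: subscripting an alias that still has free parameters fills them in. A minimal sketch under that assumption; `typevar_mapping` and `resolve` are hypothetical helpers and not how this PR resolves TypeVars.

```python
import dataclasses
from typing import (Generic, List, Tuple, TypeVar,
                    get_args, get_origin, get_type_hints)

T = TypeVar("T")


@dataclasses.dataclass
class Z(Generic[T]):
    z: List[T]


@dataclasses.dataclass
class ZZ(Generic[T]):
    z: List[Tuple[T, T]]
    t: Z[Tuple[T, T]]


def typevar_mapping(alias):
    """Map the origin's TypeVars to the arguments given in the alias."""
    origin = get_origin(alias)
    return dict(zip(origin.__parameters__, get_args(alias)))


def resolve(tp, mapping):
    """Replace free TypeVars in tp using mapping (hypothetical helper)."""
    if isinstance(tp, TypeVar):
        return mapping.get(tp, tp)
    params = getattr(tp, "__parameters__", ())
    if not params:
        return tp  # nothing left to substitute
    # Subscripting an alias with free parameters substitutes them in place.
    return tp[tuple(mapping.get(p, p) for p in params)]


mapping = typevar_mapping(ZZ[int])
resolved = {name: resolve(tp, mapping)
            for name, tp in get_type_hints(ZZ).items()}
print(resolved)
```

Resolving the field types up front means the field-building code only ever sees concrete types such as `List[Tuple[int, int]]`.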
If generic parameters are not specified for a generic dataclass field types, should we assume
E.g., with the current state of this PR:
T = TypeVar("T")
@dataclasses.dataclass
class GenericDataclass(Generic[T]):
    x: T
@dataclasses.dataclass
class Bar:
    x: GenericDataclass[int]
    y: List  # (interpreted as List[Any])
class_schema(Bar)  # this works
@dataclasses.dataclass
class Foo:
    fu: GenericDataclass
class_schema(Foo)  # throws => TypeError("T is not a dataclass and cannot be turned into one.")
|
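One possible reading of the question, suggested by the `List` comment in the example, is to treat an unparameterized generic dataclass as if every type parameter were `Any`. A sketch of that option only; `fill_missing_args` is a hypothetical helper and the thread does not settle on this behaviour.

```python
import dataclasses
from typing import Any, Generic, TypeVar

T = TypeVar("T")


@dataclasses.dataclass
class GenericDataclass(Generic[T]):
    x: T


def fill_missing_args(tp):
    """Hypothetical helper: bind any free type parameters to Any."""
    params = getattr(tp, "__parameters__", ())
    if not params:
        return tp  # already concrete, or not generic at all
    return tp[tuple(Any for _ in params)]


print(fill_missing_args(GenericDataclass))
print(fill_missing_args(GenericDataclass[int]))
```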
As the name (RECURSION_GUARD) suggests, we can't do exactly this, since the whole point here is to be able to deal with recursive schemas. At the time we create the Nested field, we have not necessarily finished compiling the schema for the nested type. One way around this is that I've forked your onursatici:os/generic-dataclasses branch and, I think, fixed this issue here in dairiki:generic-dataclasses. |
Implements #230 |
Manually merge work in PR lovasoa#172 by @onursatici to support generic dataclasses Refactor _field_for_schema to reduce complexity.
Thanks for having a look, it looks neat. I think I can have a look at this next week, and add the tests for generic types other than dataclasses. Or if you wish, you can also take this PR over; both work for me |
@onursatici So I've been drawn down the rabbit-hole and have started a larger refactor of the marshmallow_code. I've just created a draft PR for that (with a few notes) at #232. The work there pulls in the work here, as well as loads of other stuff. It also fixes things to work for
@dataclasses.dataclass
class ZZ(Generic[T]):
    z: List[Tuple[T, T]]
    t: Z[Tuple[T, T]]
as noted above by moving the resolution of TypeVars to
If you're in a hurry to have support for generic dataclasses, I say we should fix this PR and merge it (I think it's almost there) rather than wait for #232. But if you're not in a big rush, let's wait. |
ah I see, yea that makes sense. Sure I can wait, thanks for doing this |
We would love for this to go in soon! Our use case is to have a generic paginated response shape that can be marshaled out to JSON. Something like:
import typing
from dataclasses import dataclass
Result = typing.TypeVar("Result")
@dataclass
class PaginatedResult(typing.Generic[Result]):
    results: list[Result]
    next_page_token: typing.Optional[str]
|
Previously, generic dataclasses were not supported even if they had type arguments, now it is possible to do the following:
It is also possible to use such generic aliases nested in another dataclass: