Alternative approach to lazy type loading #1046

pvandommelen · 2021-12-29T16:35:13Z

Hey,

Our current GraphQL endpoint uses webonyx/graphql-php with a large, partially generated schema (1k types, 10k fields), built with the standard approach where types are recursively defined. This has become a performance problem because the schema is initialized on every request. The endpoint could of course run as a long-lived server process, but that has other challenges.

The suggested solution of using a type loader is not great in this scenario, for two reasons:

1. It's an all-or-nothing approach. When using the type loader, you have to be able to resolve all types in your schema. It is a lot of additional work to move all definitions into a TypeRepository instead of being able to make inline definitions.
2. There is also the issue of code coupling between different components of the schema. In the end they will all need to be loaded using a single loader.

Some developer-friendly solutions can be created to help with the second point: different parts of the schema could be made responsible for loading different types. But this does not help with the first point, and there is little help from the library as the type loader is not standardized. That is understandable, because the current implementation takes the run-as-server case as its starting point, where lazy type loading plays no role.

Looking through the code (which I'm quite impressed by, thanks!), I noticed that the use of the type loader is limited. Within a regular query (so no schema/type introspection) there are only a couple of usages. This makes me hopeful that we may be able to find a way to lazily load the necessary types during most queries without the use of the type loader. For those cases where it's still necessary, the user supplied loader would still be useful. This would remove the complexity that implementing a type loader in userland would have.

- The most common usage (within Executor\ReferenceExecutor::completeValue) is a check that the type is the exact same object as the one currently available through the schema with its type loader. This is only relevant during development. I believe this runtime assertion can be removed, as it is duplicated within the schema validation step, which can be executed statically.
- Another (Executor\ReferenceExecutor::ensureValidRuntimeType, called from completeAbstractValue) loads a type if it is referenced as a string. It is not surprising that the type is loaded in that case. The current documentation does not mention that types can be referenced by name (perhaps only when resolved from interfaces?), but it probably happens in practice.
- Two other calls within that same function assert that the type resolved from the interface exists within the schema, and that it is the same instance. This is only relevant during development. This one is problematic, though, as it cannot be changed to a static check without changing the API. A debug flag to turn this on or off could also be a solution, but I don't think one exists in the current library. Any ideas?
- Then there are some calls within the Type\Definition\QueryPlan class. I'm not familiar with this code, but I think it's reasonable to expect that it will cause the type loader to be used.
- Another usage is when resolving types from query fragments. Similar to resolving interfaces, this is problematic because the dependent types can't be determined statically from the base type.
- Then there is schema/type introspection and full schema validation. It would be fine to use the type loader or load the full schema in those cases. In my POC I did encounter some issues in tests when constructing a schema object from a schema definition, but I expect those can be resolved.
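
To make the debug-flag idea from the list above concrete, here is a minimal sketch. It is not taken from graphql-php: `assertSameTypeInstance`, its parameters, and the `$debug` switch are hypothetical names, and plain `stdClass` objects stand in for type instances. The point is only that the by-name lookup (and therefore any type loader call) is skipped when the flag is off.

```php
<?php
declare(strict_types=1);

// Hypothetical helper, NOT part of graphql-php: runs the
// instance-identity check only when $debug is true.
function assertSameTypeInstance(object $returned, callable $lookupByName, string $name, bool $debug): void
{
    if (!$debug) {
        return; // production: skip the by-name lookup entirely
    }
    if ($lookupByName($name) !== $returned) {
        throw new LogicException("Found a second instance of the type named \"$name\".");
    }
}

// Count how often the by-name lookup is actually triggered.
$lookups = 0;
$userType = new stdClass(); // stand-in for a type instance
$lookup = static function (string $name) use (&$lookups, $userType): object {
    $lookups++;
    return $userType;
};

assertSameTypeInstance($userType, $lookup, 'User', false); // no lookup happens
assertSameTypeInstance($userType, $lookup, 'User', true);  // one lookup, check passes
```

With the flag off, the check costs nothing at request time; the trade-off is that a duplicate type instance would only be caught in development or by static validation.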

In a POC I've been able to replicate the benchmark numbers for the HugeSchemaBench::benchSmallQueryLazy benchmark without the use of a custom type loader.

My conclusion: It should be possible to remove the usages of the type loader within common queries. This would drastically improve performance without the hit on developer experience in the situation where the GraphQL endpoint is not run as a server process. However, this would lose some runtime checks which are only partially replaced by an existing static check. The method of resolving concrete types from abstract types/interfaces is especially problematic, but I expect it can be done without breaking backwards compatibility. It may still be beneficial to consider making this resolve step statically available without runtime checks in the future (though this would break consistency with the graphql-js reference implementation).

Would it be useful to pursue this? I can turn my POC into a PR, but it's not entirely trivial and would have some impact for which it would be nice to have some support.

spawnia · 2021-12-30T11:22:46Z

Maintainer

In order to have solid grounds for this discussion, let's disambiguate the terminology.

I consider a type loader to be a user defined function with the signature (string $name): ?Type, passed as the configuration option typeLoader to a Schema. It replaces or supplements the use of the configuration option types, removing the need to eagerly instantiate all possible types.

In the latter half of your post, you seem to refer to the method Schema::getType() as type loading, which may internally use the configured typeLoader, types or a combination thereof. This is essential functionality: types are required at runtime to resolve fields and to check their proper resolution.

Please reformulate your proposal by clearly disambiguating between the configuration option typeLoader and the built-in method Schema::getType().

> It's an all or nothing approach.

You can combine both types and typeLoader.

> In the end they will all need to be loaded using a single loader.

It would not be too difficult to wire up multiple separate type loaders into one.

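// $typeLoaders is a list of callables, each with the signature (string $name): ?Type.
// It must be captured with `use`, since PHP functions do not see outer variables.
$combinedTypeLoader = static function (string $name) use ($typeLoaders): ?Type {
    foreach ($typeLoaders as $typeLoader) {
        // The first loader that knows the name wins.
        $type = $typeLoader($name);
        if ($type !== null) return $type;
    }

    return null;
};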

> there is little help from the library as the type loader is not standardized

The Schema already takes care of the persistence part of type loading; I don't see how a type loader could be any simpler than the current signature.

pvandommelen · 2021-12-30T13:31:25Z

Author

To clarify why I referred to Schema::getType() as type loading: for the purpose of getting a type from a name, the Schema works as a cache around the type loader. The exception is the predefined types from the types configuration, which are an uncommon source of types (and must be uncommon when the goal is a lazily constructed schema). So in a lazily constructed schema, getType will almost always call the user-defined type loader.
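
As a rough illustration of that "cache around the type loader" view, the sketch below models the lookup order in isolation. It is a simplified model and not the library's implementation: `NamedType` and `TypeLookup` are hypothetical stand-ins so the example runs without graphql-php installed.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a named GraphQL type.
final class NamedType
{
    /** @var string */
    public $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}

// Hypothetical, simplified model of a by-name lookup: eagerly configured
// and already resolved types first, then the user-defined type loader.
final class TypeLookup
{
    /** @var array<string, NamedType> */
    private $resolved;

    /** @var callable|null */
    private $typeLoader;

    /** @var int number of times the user-defined loader was invoked */
    public $loaderCalls = 0;

    /** @param array<string, NamedType> $types eagerly configured types */
    public function __construct(array $types, ?callable $typeLoader)
    {
        $this->resolved = $types;
        $this->typeLoader = $typeLoader;
    }

    public function getType(string $name): ?NamedType
    {
        if (isset($this->resolved[$name])) {
            return $this->resolved[$name]; // cache hit or eagerly configured
        }
        if ($this->typeLoader === null) {
            return null;
        }
        $this->loaderCalls++;
        $type = ($this->typeLoader)($name);
        if ($type !== null) {
            $this->resolved[$name] = $type; // remember for later lookups
        }
        return $type;
    }
}

$lookup = new TypeLookup(
    ['Query' => new NamedType('Query')],
    static function (string $name): ?NamedType {
        return $name === 'User' ? new NamedType('User') : null;
    }
);

$first = $lookup->getType('User');  // cache miss: calls the loader
$second = $lookup->getType('User'); // cache hit: same instance, no loader call
$query = $lookup->getType('Query'); // eagerly configured: no loader call
```

In a schema with few or no eagerly configured types, nearly every first lookup by name falls through to the loader, which is why the two are treated as the same thing in this discussion.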

The signature of the user defined type loader is indeed simple. An implementation of an aggregate loader is also quite trivial, as you have shown. However, I think the userland implementation of a type loader is anything but simple. It requires a significant amount of code which mirrors the type definition elsewhere, violating the DRY principle.

This is obviously the trade-off that the current solution using the type loader makes. And that is absolutely fine, the benefit is there. But I do not think it is necessary.

> Please reformulate your proposal by clearly disambiguating between the configuration option typeLoader and the built-in method Schema::getType().

I propose to remove the calls to Schema::getType() during the execution of most queries. Since this method internally calls the type loader (if the type is not already cached or loaded through the types configuration), removing those calls would make it possible to have a fast lazy schema without any user-written type loader implementation.

Entirely removing those calls is unlikely to be an option, but it would be nice to have the option available in production/non-debug. If it helps, I've attached a patch with the very rough direction I'm thinking of.
example.txt

> Schema::getType() as ... . This is essential functionality, types are required at runtime to resolve fields and to check their proper resolution.

This is probably the core point of this discussion. I'm not convinced this is necessary. The getType method on the schema is absolutely vital when doing schema introspection. But during a "normal" query all the necessary types are already available! The types are already directly defined on their parent type. We would miss out on the "proper resolution" assertion, but I'm hoping it is enough for those to be part of the static assertion in Schema::assertValid.
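
The claim that types are reachable through their parent can be sketched as follows. This is an illustration under stated assumptions, not graphql-php code: `LazyType` and `getFieldType` are hypothetical stand-ins, and each type's fields are wrapped in a closure so that child types are only instantiated once their parent's fields are first read.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for an object type with lazily defined fields.
final class LazyType
{
    /** @var string[] names of all types instantiated so far */
    public static $instantiated = [];

    /** @var string */
    public $name;

    /** @var callable returns array<string, LazyType> */
    private $fieldsThunk;

    /** @var array<string, LazyType>|null */
    private $fields = null;

    public function __construct(string $name, callable $fieldsThunk)
    {
        $this->name = $name;
        $this->fieldsThunk = $fieldsThunk;
        self::$instantiated[] = $name;
    }

    /** Resolve a field's type through the parent, without any lookup by name. */
    public function getFieldType(string $field): LazyType
    {
        if ($this->fields === null) {
            $this->fields = ($this->fieldsThunk)(); // evaluated once, on first access
        }
        return $this->fields[$field];
    }
}

$noFields = static function (): array {
    return [];
};

$query = new LazyType('Query', static function () use ($noFields): array {
    return [
        'user' => new LazyType('User', static function () use ($noFields): array {
            return ['address' => new LazyType('Address', $noFields)];
        }),
        'order' => new LazyType('Order', static function () use ($noFields): array {
            return ['items' => new LazyType('Item', $noFields)];
        }),
    ];
});

// Walk the selection { user { address } } starting from the root type.
$address = $query->getFieldType('user')->getFieldType('address');
```

Only the types along the walked path (plus their direct siblings) are created; `Item` is never instantiated, and no name-based registry is involved. What this sketch cannot cover are the cases raised elsewhere in the thread where only a name is available, such as variable types or fragment type conditions.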

spawnia · 2021-12-30T17:22:53Z

Maintainer

We can remove some redundant checks at runtime and minimize the calls to Schema::getType(), but we will never be able to get rid of them completely. For example, they will always be required in the validation rule KnownTypeNames to validate that the types of variables in the query exist:

query ($foo: Foo <-- has to be looked up by name) { ... }

pvandommelen · 2021-12-31T11:13:41Z

Author

Good catch. Though it would probably be acceptable to remove that validation in production if you are not worried about creating nice error messages for those using the API.

There are some other usages where I'm currently thinking it should be possible to change their logic to work with the available type object (fragment matching in resolvers).
Working with fragments in the QueryPlan probably cannot avoid getting the type from the schema (unless we have a method of statically getting implementors from an abstract type). Another case is schema extensions, for which this will not be possible, but that is probably less important.

I think it is possible, but any performance bugs from this proposal will be hard to catch. I also think that a method of statically defining implementors on an abstract type will become a necessity.

Do you see any way this approach would work out? I think it would help us and other users who are executing queries like this (would it be useful to demonstrate this through a screen sharing session?). Otherwise we'll just have to go with the alternative, using the current type loader mechanism.
