
Performance problem even for small query on big schema. #569

Open
l-you opened this issue Mar 8, 2023 · 34 comments

@l-you
Contributor

l-you commented Mar 8, 2023

I use GraphQLite extensively for the admin console in my Symfony project, and I really enjoy it!
The only reason I don't use it for the public API is its execution-speed overhead.

For example, take this simple query:

query ProductFeedbackManager_feedback_count(
    $input: ProductFeedbackSearchInput!
) {
    productFeedbacksCount(input: $input)
}

This simple query has an overhead of ~100-140 ms compared to a simple RESTful endpoint with the same logic.

Here are some screenshots of the Xdebug profiler output:
[Screenshot 2023-03-08 at 17:20:29: Xdebug profiler output]

[Screenshot 2023-03-08 at 17:21:50: Xdebug profiler output]

I would like to discuss ideas for improving the codebase's execution speed.

For example, if we look deeper into AggregateControllerQueryProvider->getQueries(), we see the following picture:
[Screenshot 2023-03-08 at 17:34:12: profile of AggregateControllerQueryProvider->getQueries()]
The mapReturnType method takes 9% of the execution time of the entire script.
Inside mapReturnType there are many nested toGraphQLOutputType() calls made by type mappers: almost 5k, according to the screenshot.
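One possible reading of the ~5k nested toGraphQLOutputType() calls is that the same PHP types are mapped over and over. A minimal sketch of memoizing such a mapping per type string, in plain PHP 8; `MemoizingTypeMapper` and its mapping callback are hypothetical stand-ins, not GraphQLite's actual type mapper API:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical wrapper: runs an expensive type-mapping callback at most once
 * per PHP type string and serves repeated lookups from memory.
 */
final class MemoizingTypeMapper
{
    /** @var array<string, string> PHP type => GraphQL type */
    private array $cache = [];

    /** How many times the wrapped (expensive) mapper actually ran. */
    public int $misses = 0;

    /** @param Closure(string): string $mapper the mapping to memoize */
    public function __construct(private Closure $mapper)
    {
    }

    public function map(string $phpType): string
    {
        if (!array_key_exists($phpType, $this->cache)) {
            $this->misses++;
            $this->cache[$phpType] = ($this->mapper)($phpType);
        }

        return $this->cache[$phpType];
    }
}

// A toy mapper standing in for docblock/attribute parsing.
$mapper = new MemoizingTypeMapper(
    static fn (string $phpType): string => match ($phpType) {
        'int' => 'Int!',
        '?int' => 'Int',
        'string' => 'String!',
        '?string' => 'String',
        default => throw new InvalidArgumentException("No mapping for {$phpType}"),
    },
);

foreach (['int', 'string', 'int', '?int', 'int'] as $phpType) {
    $mapper->map($phpType);
}
```

Under PHP-FPM such an in-memory cache only lives for one request, which is why the rest of the thread weighs persistent caches against long-running workers.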

As far as I know, the primary job of the type mappers is converting docblock/attribute types to GraphQL types.
So, is there a way we could move this logic to compile time? Or maybe generate code that is stored in a cache, as overblog/GraphQLBundle does?

I'm pretty sure there are many things that could be optimised, but it seems that can't be done without breaking changes.

@l-you l-you changed the title from "Performance https://github.com/thecodingmachine/graphqlite problem even for small query" to "Performance problem even for small query on big schema." (Mar 8, 2023)
@oojacoboo
Collaborator

So, there seem to be a few ongoing discussions around performance. I'd really like to pull all of them together and come up with something actionable. See #566 & #562. @oprypkhantc, you may also be interested in this discussion.

@oprypkhantc
Contributor

I'm just starting to integrate GraphQL into our projects. We have over 300 models and 1.4k endpoints, so if there's a performance degradation with bigger projects we'll see it pretty quick.

Given we use Swoole and hold the entire Schema in memory, alongside all possible types & fields, it's likely we'll not see any performance hit because it's all already resolved prior to any requests.

But I'll take a look some time this week to see if there's really a problem here.

@oojacoboo
Copy link
Collaborator

oojacoboo commented Mar 9, 2023

I think we should look into offering a schema caching option in the SchemaFactory. In #562 @oprypkhantc outlined what annotations are dynamic for schema generation: #562 (comment).

#[Logged], #[Right], #[FailWith], #[HideIfUnauthorized] are all handled by AuthorizationFieldMiddleware which is partly dynamic:

  • it statically changes the type of the field to nullable
  • it dynamically sets the field resolver based on the authorization (which comes from the request), and hides the field when not authorized

I'm not sure how many people are using these annotations. I'd argue there are probably better ways of doing all these things and that they should generally be avoided in favor of a more native PHP userland approach.

For BC reasons, we should probably keep them and can simply throw an Exception if these are used in conjunction with schema caching being enabled.
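A minimal sketch of such a BC guard, assuming a hypothetical `$schemaCachingEnabled` flag and a hypothetical `DynamicAnnotationWithCachedSchemaException`; neither exists in GraphQLite today, and annotations are matched by their short names only to keep it simple:

```php
<?php

declare(strict_types=1);

/** Hypothetical exception for annotations whose effect depends on the current request. */
final class DynamicAnnotationWithCachedSchemaException extends LogicException
{
}

/**
 * Hypothetical guard: refuses request-dependent annotations when schema caching is on.
 *
 * @param list<string> $annotationNames short names of the attributes found on the field
 */
function assertCacheableField(string $fieldName, array $annotationNames, bool $schemaCachingEnabled): void
{
    // The annotations handled by AuthorizationFieldMiddleware, per the comment above.
    $dynamic = ['Logged', 'Right', 'FailWith', 'HideIfUnauthorized'];

    $offending = array_values(array_intersect($annotationNames, $dynamic));
    if ($schemaCachingEnabled && $offending !== []) {
        throw new DynamicAnnotationWithCachedSchemaException(sprintf(
            'Field "%s" uses #[%s], which cannot be combined with schema caching.',
            $fieldName,
            implode('], #[', $offending),
        ));
    }
}
```

Failing loudly at schema build time, rather than silently ignoring the annotation, avoids accidentally exposing protected fields once the schema is cached.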


I also think we should implement something for stateless routes, something I mentioned here: #125 (comment)


I think the combination of these two would offer significant performance increases while maintaining BC, and would be relatively trivial to implement. I'd love to hear other thoughts on this. I don't currently have the time to put together a PR for these, but I'm happy to assist where I can.

@l-you
Contributor Author

l-you commented Mar 13, 2023

I've spent some time investigating a few ideas.
As I understand it, something like #562 (comment) was rejected because of webonyx/graphql-php#1329?

It's currently hard to see the complete picture:
"runtime" and "compile-time" code are mixed throughout the whole library.

But there should be some starting point that would help us see possible solutions.
So I'm thinking about separating "compile-time" and "runtime" code execution, starting with small (but potentially breaking) changes.

Let's take a look at field middlewares. We can organise the static and dynamic logic as follows
(dynamic code = runtime; static code = compile-time = cached code):

/**
 * @template T
 */
interface FieldMiddlewareInterface
{
    /**
     * @return T serializable output after compilation, stored in a long-lived cache keyed by class and field name
     */
    public function processCompile(QueryFieldDescriptor $queryFieldDescriptor): mixed;

    /**
     * Called on each request with the cached output of processCompile().
     *
     * @param T $compiledOutput
     */
    public function process(mixed $compiledOutput, QueryFieldDescriptor $queryFieldDescriptor, FieldHandlerInterface $fieldHandler): FieldDefinition|null;
}

The current AuthorizationFieldMiddleware::process() looks like this:

public function process(QueryFieldDescriptor $queryFieldDescriptor, FieldHandlerInterface $fieldHandler): FieldDefinition|null
{
    $annotations = $queryFieldDescriptor->getMiddlewareAnnotations();
    $loggedAnnotation = $annotations->getAnnotationByType(Logged::class);
    assert($loggedAnnotation === null || $loggedAnnotation instanceof Logged);
    $rightAnnotation = $annotations->getAnnotationByType(Right::class);
    assert($rightAnnotation === null || $rightAnnotation instanceof Right);

    // Avoid wrapping resolver callback when no annotations are specified.
    if (! $loggedAnnotation && ! $rightAnnotation) {
        return $fieldHandler->handle($queryFieldDescriptor);
    }

    $failWith = $annotations->getAnnotationByType(FailWith::class);
    assert($failWith === null || $failWith instanceof FailWith);
    $hideIfUnauthorized = $annotations->getAnnotationByType(HideIfUnauthorized::class);
    assert($hideIfUnauthorized instanceof HideIfUnauthorized || $hideIfUnauthorized === null);
    if ($failWith !== null && $hideIfUnauthorized !== null) {
        throw IncompatibleAnnotationsException::cannotUseFailWithAndHide();
    }

    // If the failWith value is null and the return type is non-nullable, we must set it to nullable.
    $type = $queryFieldDescriptor->getType();
    if ($failWith !== null && $type instanceof NonNull && $failWith->getValue() === null) {
        $type = $type->getWrappedType();
        assert($type instanceof OutputType);
        $queryFieldDescriptor->setType($type);
    }

    // When using the same Schema instance for multiple subsequent requests, this middleware will only
    // get called once, meaning #[HideIfUnauthorized] only works when Schema is used for a single request
    // and then discarded. This check is to keep the latter case working.
    if ($hideIfUnauthorized !== null && ! $this->isAuthorized($loggedAnnotation, $rightAnnotation)) {
        return null;
    }

    $resolver = $queryFieldDescriptor->getResolver();
    $queryFieldDescriptor->setResolver(function (...$args) use ($rightAnnotation, $loggedAnnotation, $failWith, $resolver) {
        if ($this->isAuthorized($loggedAnnotation, $rightAnnotation)) {
            return $resolver(...$args);
        }
        if ($failWith !== null) {
            return $failWith->getValue();
        }
        if ($loggedAnnotation !== null && ! $this->authenticationService->isLogged()) {
            throw MissingAuthorizationException::unauthorized();
        }
        throw MissingAuthorizationException::forbidden();
    });

    return $fieldHandler->handle($queryFieldDescriptor);
}

And here is AuthorizationFieldMiddleware rewritten following the approach above:

/**
 * Middleware in charge of managing "Logged" and "Right" annotations.
 */
class AuthorizationFieldMiddleware implements FieldMiddlewareInterface
{
    public function __construct(
        private AuthenticationServiceInterface $authenticationService,
        private AuthorizationServiceInterface  $authorizationService,
    )
    {
    }

    public function processCompile(QueryFieldDescriptor $queryFieldDescriptor): array
    {
        $annotations = $queryFieldDescriptor->getMiddlewareAnnotations();

        $loggedAnnotation = $annotations->getAnnotationByType(Logged::class);
        assert($loggedAnnotation === null || $loggedAnnotation instanceof Logged);
        $rightAnnotation = $annotations->getAnnotationByType(Right::class);
        assert($rightAnnotation === null || $rightAnnotation instanceof Right);
        $failWith = $annotations->getAnnotationByType(FailWith::class);
        assert($failWith === null || $failWith instanceof FailWith);
        $hideIfUnauthorized = $annotations->getAnnotationByType(HideIfUnauthorized::class);
        assert($hideIfUnauthorized instanceof HideIfUnauthorized || $hideIfUnauthorized === null);

        if ($failWith !== null && $hideIfUnauthorized !== null) {
            throw IncompatibleAnnotationsException::cannotUseFailWithAndHide();
        }
        // If the failWith value is null and the return type is non-nullable, we must set it to nullable.
        $type = $queryFieldDescriptor->getType();
        return [
            'skipMiddleware' => !$loggedAnnotation && !$rightAnnotation,
            'wrapInNullable' => $failWith !== null && $type instanceof NonNull && $failWith->getValue() === null,
            'hideIfUnauthorized' => null !== $hideIfUnauthorized,
            'right' => $rightAnnotation?->getName(),
            'logged' => $loggedAnnotation !== null,
            'failWith' => $failWith === null ? null : [
                'value' => $failWith->getValue()
            ]
        ];
    }

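    /**
     * @param array<string, mixed> $compiledOutput output of processCompile(), read back from the cache
     */
    public function process(mixed $compiledOutput, QueryFieldDescriptor $queryFieldDescriptor, FieldHandlerInterface $fieldHandler): FieldDefinition|null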
    {
        // Avoid wrapping resolver callback when no annotations are specified.
        if ($compiledOutput['skipMiddleware']) {
            return $fieldHandler->handle($queryFieldDescriptor);
        }
        if ($compiledOutput['wrapInNullable']) {
            $type = $queryFieldDescriptor->getType();
            assert($type instanceof NonNull);
            $type = $type->getWrappedType();
            assert($type instanceof OutputType);
            $queryFieldDescriptor->setType($type);
        }

        // When using the same Schema instance for multiple subsequent requests, this middleware will only
        // get called once, meaning #[HideIfUnauthorized] only works when Schema is used for a single request
        // and then discarded. This check is to keep the latter case working.
        if ($compiledOutput['hideIfUnauthorized'] && !$this->isAuthorized($compiledOutput)) {
            return null;
        }

        $resolver = $queryFieldDescriptor->getResolver();

        $queryFieldDescriptor->setResolver(function (...$args) use ($compiledOutput, $resolver) {
            if ($this->isAuthorized($compiledOutput)) {
                return $resolver(...$args);
            }

            if ($compiledOutput['failWith'] !== null) {
                return $compiledOutput['failWith']['value'];
            }

            if ($compiledOutput['logged'] && !$this->authenticationService->isLogged()) {
                throw MissingAuthorizationException::unauthorized();
            }

            throw MissingAuthorizationException::forbidden();
        });

        return $fieldHandler->handle($queryFieldDescriptor);
    }

    /**
     * Checks the @Logged and @Right annotations.
     */
    private function isAuthorized(array $compiledOutput): bool
    {
        if ($compiledOutput['logged'] && !$this->authenticationService->isLogged()) {
            return false;
        }

        return $compiledOutput['right'] === null || $this->authorizationService->isAllowed($compiledOutput['right']);
    }
}

So, processCompile() is executed only once in production, and its output is cached (and kept in opcache if you use PhpFilesAdapter).
process(), on the other hand, is triggered on each request.
This approach reduces reflection calls and the related parsing work, storing the output in the cache in the following format:

[
    'skipMiddleware' => bool,
    'wrapInNullable' => bool,
    'hideIfUnauthorized' => bool,
    'right' => string|null,
    'logged' => bool,
    'failWith' => null | [
        'value' => mixed,
    ],
]
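This is why the output has to be plain data: a PhpFilesAdapter-style cache writes each value as a `.php` file that returns it via `var_export()`, and reading it back is just an `include`, which opcache can serve from shared memory. A minimal sketch of that mechanism, assuming hypothetical helper names, a temp-directory path and a hypothetical right name `CAN_VIEW_PRICES`:

```php
<?php

declare(strict_types=1);

/** Writes plain data as a PHP file that returns it (the PhpFilesAdapter idea). */
function writeCompiled(string $file, array $compiled): void
{
    $code = '<?php return ' . var_export($compiled, true) . ';' . PHP_EOL;
    // Write to a temporary file, then rename, so readers never see a half-written file.
    $tmp = $file . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $code);
    rename($tmp, $file);
}

/** Loads the data back; with opcache enabled, the included file can be served from memory. */
function readCompiled(string $file): ?array
{
    if (!is_file($file)) {
        return null;
    }

    return include $file;
}

$file = sys_get_temp_dir() . '/authorization_field_middleware.product_price.php';
writeCompiled($file, [
    'skipMiddleware' => false,
    'wrapInNullable' => true,
    'hideIfUnauthorized' => false,
    'right' => 'CAN_VIEW_PRICES',
    'logged' => true,
    'failWith' => ['value' => null],
]);
```

Closures cannot be restored this way, and most objects only round-trip if their class implements `__set_state()`, so anything returned by `processCompile()` would need to stay scalars and arrays.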

My examples show only the general idea.
Potentially, we could apply a similar approach to type mappers, FieldDefinitions and other things that recompute static data on each request.
I believe many internal classes could be refactored too. For the public API, we can mark legacy interfaces as deprecated.

Over time, more classes would be adapted this way, so we could see the complete picture of which code is "compiled" and which is executed on each request.

Seeing the complete picture, we could potentially extract those pieces into separate classes and combine their output (they are all executed only once, at "compile time").
This would reduce the number of cache calls this approach causes.
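Combining the outputs could look like the sketch below: each middleware's compile step for a field runs once, and the results are stored together, so the runtime does one lookup per field instead of one per middleware. `CompiledFieldStore`, the `Product::price` key and the middleware names are hypothetical, and a plain array stands in for the cache pool:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical store: one entry per field holding the compiled output of
 * every middleware, instead of one cache entry per middleware.
 */
final class CompiledFieldStore
{
    /** @var array<string, array<string, mixed>> field key => middleware name => compiled output */
    private array $entries = [];

    /** Number of lookups performed at "runtime". */
    public int $reads = 0;

    /** @param array<string, callable(): mixed> $compilers middleware name => compile step */
    public function compile(string $fieldKey, array $compilers): void
    {
        $compiled = [];
        foreach ($compilers as $middleware => $compile) {
            // Runs once, at "compile time".
            $compiled[$middleware] = $compile();
        }
        $this->entries[$fieldKey] = $compiled;
    }

    /** @return array<string, mixed> every middleware's output for the field, in a single read */
    public function get(string $fieldKey): array
    {
        $this->reads++;

        return $this->entries[$fieldKey] ?? [];
    }
}

$store = new CompiledFieldStore();
$store->compile('Product::price', [
    'authorization' => static fn (): array => ['logged' => true, 'right' => null],
    'logging' => static fn (): array => ['level' => 'info'],
]);
```

Taken further, the same grouping could be done per type or for the whole schema, trading a larger cache entry for fewer round-trips to the pool.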

As a result, fast and contributor-friendly library :)

@oprypkhantc
Contributor

@rusted-love I'm not an active maintainer so I'm not making any decisions here. But here are my two cents on this:

Yes, the approach you've proposed will certainly work. I believe many things in graphqlite are already cached in a similar way.

But there are many problems with this:

  • major complexity in every single part of the library, as practically everything needs to be cached. This means that every single piece of code must think through its own way of caching & invalidating cache, which isn't as easy as it might sound. This affects maintenance efforts and adds a lot to the difficulty of adding new features
  • hard-couples it with caching, i.e. the code is built around caching, not caching is built on top of the code. Instead of having a single CachedTypeMapper, graphqlite has multiple type mappers which have a hard dependency on cache and an unpredictable set of local (in memory) caching properties, again leading to complexity of maintenance, understanding, testing etc.
  • most importantly, it will not give you the absolute best performance you can get; only a moderate improvement

Given all the efforts that are needed to support this (not small) project and lack of maintainers, I don't think it's a good idea.

This might sound counterintuitive, but in the long run I believe it'd be better to drop support for PHPFiles/opcache caching altogether instead. The reason is that I believe PHP is slowly moving towards long-running servers which don't need quick boot times, like the rest of the web industry. If you look at practically any other language/technology/stack, they all have one thing in common: they don't start a new process for every request.

This has become somewhat of a trend in recent years since the release of RoadRunner, Spiral and, just now, Laravel Octane, which made it easy to implement this in real-world apps.

When eventually this becomes the norm, graphqlite (and not only graphqlite) could ditch file-level caching altogether and rely on a pre-initialized instance with everything pre-resolved, without having to worry about compilation or caching. This is essentially the same "compilation" step, but without having to do it manually, handle any of the "compiled" configs, do any of the "php files" caching or invalidate anything.

Might be a bold statement, but it might even be easier for you to move to Swoole/RoadRunner/AMPhp/React or other libraries than to fully cover graphqlite with compile caching everywhere. Our projects (only the .php part) total 700k lines, and we managed to fully move to Swoole in about 3 months start-to-finish, with only me working on it part-time.

As a bonus, you'll definitely see a big increase in RPS and reduction of response times. For us it decreased the average response times about 2x and increased RPS about 2.5x without really having to do any fancy caching.

@l-you
Contributor Author

l-you commented Mar 16, 2023

every single piece of code must think through its own way of caching & invalidating cache, which isn't as easy as it might sound.

Those code pieces do not care about their "own way of caching & invalidating cache". The only concern is "what to cache".
Everything is stored in a pool provided by the developer, which can be any PSR-16 compatible cache adapter. Lifetime of that cache never ends until a new deployment.

hard-couples it with caching, i.e. the code is built around caching, not caching is built on top of the code

Actually, maybe it's better to call it 'compilation' instead of 'caching', for better comprehension.

Instead of having a single CachedTypeMapper, graphqlite has multiple type mappers which have a hard dependency on cache and an unpredictable set of local (in memory) caching properties, again leading to complexity of maintenance, understanding, testing etc.

Yes, I completely agree that the current codebase is very complex and hard to understand.
Still, I believe the proposed separation of code would make the library codebase easier to understand and debugging less complex.

most importantly, it will not give you the absolute best performance you can get; only a moderate improvement

After deeper testing with the Xdebug profiler, I became even more confident that this approach could remove most of the unnecessary execution overhead, even taking into account frequent cache pool calls with some slower cache adapters.

When eventually this becomes the norm, graphqlite (and not only graphqlite) could ditch file-level caching altogether and rely on a pre-initialized instance with everything pre-resolved, without having to worry about compilation or caching. This is essentially the same "compilation" step, but without having to do it manually, handle any of the "compiled" configs, do any of the "php files" caching or invalidate anything.

"everything pre-resolved", what exactly?
It solves the php-fpm architecture problem and removes class initialisation overhead, maybe even reflection overhead (I don't know exactly).
But how does it affect the performance of type resolvers and similar things if they are resolved during executeQuery?

As I see it, combined with Swoole, the proposed logic separation gives a double benefit.

P.S. I suspect I may not have explained the proposal clearly enough.

@l-you
Contributor Author

l-you commented Mar 16, 2023

@oprypkhantc
You mentioned somewhere that your company is migrating 1.4k endpoints to GraphQL.
Would you share some performance insights after that migration? I suppose you have some high-load endpoints.
I'm interested in how Swoole + graphqlite with that many controllers would behave under high load.

@oprypkhantc
Contributor

Lifetime of that cache never ends until a new deployment.

It's not that simple, unless you want to kill local development experience.

Yes, I completely agree that the current codebase is very complex and hard to understand.
Still, I believe the proposed separation of code would make the library codebase easier to understand and debugging less complex.

I don't think so. The thing with compilation (if you wish to call it that) is that you still have to take care of it manually, so the whole codebase would be bloated with it. When doing the same sort of caching in memory only (i.e. as long-running servers allow you to), you only need to build the right architecture to reuse as many object instances as possible. That's easier to achieve and certainly much easier to understand.

"everything pre-resolved", what exactly?

Well, quite literally everything. Vendor auto-loaded, caches warmed up, some services pre-resolved, all the configuration applied. On the GraphQL side, the whole schema AST generated & sitting in memory, controllers found and resolved through DI with dependent services, types also found and fields resolved (with parsed annotations, attributes, docblocks, types), type mapper resolved every single type found in properties & parameters & return types, all field & parameter middleware called.

It solves the php-fpm architecture problem and removes class initialisation overhead, maybe even reflection overhead (I don't know exactly).
But how does it affect the performance of type resolvers and similar things if they are resolved during executeQuery?

That's the point - it is not resolved during executeQuery :)

Basically everything that can be taken care of ahead-of-time - already is in graphqlite if you simply share the same Schema instance across requests. With all of that resolved, webonyx/graphql-php can go through schema AST and validate the payload (not having to resolve AST for every request), then resolve fields (again, not having to parse anything or go through middlewares) and just call the resolver - i.e. literally just call the controller method (once again, not having to parse docblocks or return type or handle attributes on parameters).

As I see it, combined with Swoole, the proposed logic separation gives a double benefit.

No, it actually doesn't bring any benefits at all aside from majorly complicating things. It might even make it slower if Swoole isn't taken into account.


Would you share some performance insights after that migration? I suppose you have some high-load endpoints.
I'm interested in how Swoole + graphqlite with that many controllers would behave under high load.

We just now started integrating GraphQL and only have migrated 10 so far. The rest will take years.

I haven't actually tested it, but the number of controllers/types/fields shouldn't matter. As long as you "warm" your Swoole worker's Schema instance by resolving all types & fields ahead of time, getting & calling an already resolved controller should take the same amount of time regardless of how many there are. I suspect you can easily warm the whole Schema by running an introspection query (not real code):

// done before any requests
$schema = new Schema();
$schema->runIntrospection(); // resolves all types & fields

// request comes in
$schema->resolveType(AController::class) // instantly returns the resolved type
    ->getField('someField') // again instantly returns the field
    ->resolve($payload); // does validation, then deserializes $payload into a DTO and call `someField`

// another request comes in, same schema instance, same type instance, same fields instances
$schema->resolveType(AController::class)
    ->getField('someField')
    ->resolve($payload); // again instantly
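The pseudocode above can be made concrete without any GraphQL library. The sketch below is a hypothetical lazy registry: each type is built on first use, and `warmUp()` (standing in for running introspection when a worker boots) builds all of them, so later requests only do array lookups. In webonyx/graphql-php itself, `Schema::getTypeMap()` or `Schema::assertValid()` should have a similar warming effect, but that is an assumption to check against your version:

```php
<?php

declare(strict_types=1);

/** Hypothetical registry: builds each type once, then hands out the same instance. */
final class LazyTypeRegistry
{
    /** @var array<string, object> */
    private array $resolved = [];

    /** How many times an (expensive) builder actually ran. */
    public int $builds = 0;

    /** @param array<string, callable(): object> $builders type name => build step */
    public function __construct(private array $builders)
    {
    }

    public function get(string $name): object
    {
        if (!isset($this->resolved[$name])) {
            $builder = $this->builders[$name] ?? throw new InvalidArgumentException("Unknown type {$name}");
            $this->builds++;
            $this->resolved[$name] = $builder();
        }

        return $this->resolved[$name];
    }

    /** Resolves every type up front, e.g. once when a long-running worker boots. */
    public function warmUp(): void
    {
        foreach (array_keys($this->builders) as $name) {
            $this->get($name);
        }
    }
}

// Worker boot: runs once, before the first request.
$registry = new LazyTypeRegistry([
    'Product' => static fn (): object => (object) ['name' => 'Product', 'fields' => ['id', 'price']],
    'Feedback' => static fn (): object => (object) ['name' => 'Feedback', 'fields' => ['id', 'text']],
]);
$registry->warmUp();

// Every request afterwards: only array lookups, no builder runs again.
for ($request = 0; $request < 3; $request++) {
    $product = $registry->get('Product');
}
```

Under PHP-FPM the registry would be rebuilt for every request, so `warmUp()` would only add cost; the pattern pays off only when the same instance serves many requests.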

@oojacoboo
Collaborator

oojacoboo commented Mar 18, 2023

@oprypkhantc you make some good points, and I'm certainly not in disagreement that, from a performance perspective, async servers are going to offer the best performance. I don't think this is really up for debate. And with some of the PRs you've submitted and that have been merged, the lib is well on its way to accomplishing those goals. Let's keep them coming!

Also, as an aside, if you'd do us a solid in writing a doc section for using an async server with GraphQLite, that'd be awesome! I'm sure others would find this very helpful going forward.

That said, I don't think it's practical to expect everyone to migrate their entire stack, or even just their GraphQL portion of their stack, to an async server right away. There are many reasons why this might not be possible, and it's still not a norm in PHP land. Therefore, I think trying to cache what we can, now, especially from a schema compilation standpoint, is a worthwhile endeavor.

The performance cost of the extra cache lookups can be minimized, and that should be a consideration and objective for any caching improvement.

@rusted-love as opposed to trying to preserve the runtime field resolving made available by only a few annotations, why don't we consider throwing exceptions within those annotations if schema level caching is enabled - maybe even look at deprecation of these annotations in favor of separate schemas. See #562 for more discussion on this topic.

I think we can achieve both objectives here.

However, I do agree with @oprypkhantc that, within a dev environment, making changes to the schema and having to clear cache within workflows is certainly not desirable. But, even with an async server, you're still going to have to recompile the schema after any changes. So, while you may not have to explicitly clear the cache, you still have longer compilation times during development efforts and have to "reboot" the server or explicitly "recompile" the schemas.

IMHO, having to clear the cache/recompile schemas during dev efforts isn't that big of a deal. We already have to clear-cache for a number of different portions of our stack during development. That would just be a caveat you'd have to deal with for the added performance benefits - again totally optional.

@oojacoboo
Collaborator

oojacoboo commented Mar 18, 2023

As a quick test I tried to throw the schema into a PSR16 cache, but of course that's not possible with closures. I did find this library to support closure serialization though: https://github.com/opis/closure. However, I'm guessing these would have to be wrapped in webonyx, in addition to graphqlite.

Created an issue regarding recursively wrapping closures within an object: opis/closure#131

@l-you
Contributor Author

l-you commented Mar 18, 2023

As a quick test I tried to throw the schema into a PSR16 cache, but of course that's not possible with closures. I did find this library to support closure serialization though: https://github.com/opis/closure. However, I'm guessing these would have to be wrapped in webonyx, in addition to graphqlite.

@oojacoboo
I also tried to make some quick test, but after many failed tries I decided to consider another approach.

The issue with serialization is that closures capture local variables, which are often objects holding the same kind of closures, or which have circular dependencies.
So I thought a logic separation like the one mentioned in #569 (comment) might help extract the static parts as a starting point. Those static parts would later be combined and passed to the methods that resolve things, similar to QueryFieldResolver.

That way, closures can become static methods
(something like [SomeResolver::class, 'someClosure'], with no need for https://github.com/opis/closure).
Data that is currently captured through use (...) would be serializable and stored in QueryFieldResolver, if we take that class as an example.
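The difference is easy to demonstrate with PHP's native `serialize()`: a closure cannot be serialized, while a `[Class::class, 'method']` callable is just two strings and its static data is a plain array. A minimal sketch; `FeedbackCountResolver` and its `multiplier` data are hypothetical:

```php
<?php

declare(strict_types=1);

/** Hypothetical resolver exposing a static method instead of a closure. */
final class FeedbackCountResolver
{
    /** @param array{multiplier: int} $staticData compile-time data stored next to the callable */
    public static function resolve(array $staticData, int $value): int
    {
        return $value * $staticData['multiplier'];
    }
}

/** Returns true if PHP's native serialize() accepts the value. */
function isSerializable(mixed $value): bool
{
    try {
        serialize($value);

        return true;
    } catch (Exception) {
        // serialize() throws "Serialization of 'Closure' is not allowed".
        return false;
    }
}

$multiplier = 2;
// Captures $multiplier implicitly, like `use (...)` does for regular closures.
$closure = static fn (int $value): int => $value * $multiplier;
// Just two strings: the class name and the method name.
$callable = [FeedbackCountResolver::class, 'resolve'];
// The captured state, moved into a plain array.
$staticData = ['multiplier' => $multiplier];
```

Both forms compute the same result; only the second can be cached and restored in another process.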

As a side note, I spent 2 days trying to deep-serialize QueryFieldResolver with https://github.com/opis/closure, refactoring the codebase deeper and deeper to build a prototype, but I failed.

Maybe we need to replace QueryFieldResolver with a serializable alternative that is just a factory for QueryFieldResolver.
Such an alternative would contain only a static function reference and static data.
I believe every anonymous closure could be removed from graphqlite.
There can be a workaround in case a resolver relies on something non-cacheable. For example, webonyx/graphql-php passes $context to each resolver. The point of $context is to let resolvers stay static closures without losing access to things like the Request object, which is different for each request.

@rusted-love as opposed to trying to preserve the runtime field resolving made available by only a few annotations, why don't we consider throwing exceptions within those annotations if schema level caching is enabled - maybe even look at deprecation of these annotations in favor of separate schemas. See #562 for more discussion on this topic.

Do I understand correctly? You propose to disable support for the #[Right] and #[FailWith] annotations? Personally, I use them on ~70% of fields.

@oojacoboo
Collaborator

oojacoboo commented Mar 18, 2023

@rusted-love good to know that you've already gone down this path. I guess we were thinking along the same lines - hoping for an easier solution.

Do I understand correctly? You propose to disable support for the #[Right] and #[FailWith] annotations? Personally, I use them on ~70% of fields.

Based on our discussions in #562 and my understanding of these annotations (we don't currently use them):

  • [FailWith] would only require that the return type is not modified. So it cannot make the field nullable dynamically; it can only fail with a value matching the declared return type.
  • [Right] would be problematic since it causes fields to be hidden from the schema.

The argument was made by @Lappihuan, and I agree with it, that changing the actual schema dynamically is really the wrong approach in most cases. You're better off with an authorized and an unauthorized schema, or multiple schemas per role type. This is something we could add some support for as well, making it easier to compile multiple schemas based on a specific "role" value. You can then deliver the appropriate schema based on your Authorization headers and/or the URL.

Any annotation that modifies an otherwise static schema is problematic. I believe that's

I'm suggesting that these annotations throw an Exception if schema caching is enabled, so as to avoid accidentally exposing fields to unauthorized API consumers.

@oprypkhantc also mentioned that SourceField and MagicField could be problematic, but had some ideas to resolve these. See that discussion here: #562 (comment). It'd be great to get a PR on this @oprypkhantc.

Again, I think this is all doable, but multiple schemas is the most logical and foolproof design IMO.

@oojacoboo
Collaborator

For supporting multiple schemas, I was thinking we could add an additional argument, schema, to the following annotations:

  • [Query]
  • [Mutation]
  • [Type]
  • [Input]
  • [Field]
  • [SourceField]
  • [MagicField]

Then, when compiling the schema, we'd simply check this schema value and only compile the appropriate types and fields.
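A minimal sketch of that compile-time filtering using PHP 8 attributes and reflection. In the proposal the `schema` argument would live on #[Query], #[Field] and the others; here a separate hypothetical `#[InSchema]` attribute keeps the example self-contained, and treating fields without it as part of every schema is an assumption:

```php
<?php

declare(strict_types=1);

/** Hypothetical attribute: the schemas a field belongs to (not a GraphQLite attribute). */
#[Attribute(Attribute::TARGET_METHOD)]
final class InSchema
{
    /** @param list<string> $schemas */
    public function __construct(public array $schemas)
    {
    }
}

final class ProductController
{
    #[InSchema(['public', 'admin'])]
    public function products(): array
    {
        return [];
    }

    #[InSchema(['admin'])]
    public function productFeedbacksCount(): int
    {
        return 0;
    }

    // No attribute: treated as part of every schema in this sketch.
    public function ping(): string
    {
        return 'pong';
    }
}

/**
 * Returns the names of the public methods that should be compiled into $schema.
 *
 * @return list<string>
 */
function fieldsForSchema(string $class, string $schema): array
{
    $fields = [];
    foreach ((new ReflectionClass($class))->getMethods(ReflectionMethod::IS_PUBLIC) as $method) {
        $attributes = $method->getAttributes(InSchema::class);
        if ($attributes === [] || in_array($schema, $attributes[0]->newInstance()->schemas, true)) {
            $fields[] = $method->getName();
        }
    }

    return $fields;
}
```

Each schema would then be compiled (and cached) once per name, and the request only picks which pre-built schema to execute against.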

@l-you
Contributor Author

l-you commented Mar 18, 2023

@oojacoboo
Personally, I don't use [HideIfUnauthorized]. Introspection is only exposed to the front-end through a token.

Going back to the serialization problem:
I mixed up the meanings of contextValue and rootValue in my last reply.
I am convinced we should rely heavily on the $contextValue provided by webonyx/graphql-php. It is a possible solution to the serialization problem.
Here is a small example.

Take a look at this class.

class UnionType extends \GraphQL\Type\Definition\UnionType

It's initialised here:

$graphQlType = new UnionType($nonNullableUnionTypes, $this->recursiveTypeMapper, $this->namingStrategy);
$graphQlType = $this->typeRegistry->getOrRegisterType($graphQlType);
assert($graphQlType instanceof UnionType);

Its type resolver function relies on RecursiveTypeMapperInterface, which is initialised in SchemaFactory:

'types' => $types,
'resolveType' =>
    static function (mixed $value) use ($typeMapper): ObjectType {
        if (! is_object($value)) {
            throw new InvalidArgumentException('Expected object for resolveType. Got: "' . gettype($value) . '"');
        }
        $className = $value::class;
        $result = $typeMapper->mapClassToInterfaceOrType($className, null);
        assert($result instanceof ObjectType);
        return $result;
    },

As specified in the webonyx/graphql-php documentation, resolveType also receives the context.
So, if we use the $context argument, it looks like this: a GraphQL type with completely static, serializable data.

<?php

declare(strict_types=1);

namespace TheCodingMachine\GraphQLite\Types;

use GraphQL\Type\Definition\NamedType;
use GraphQL\Type\Definition\ObjectType;
use InvalidArgumentException;
use TheCodingMachine\GraphQLite\Mappers\RecursiveTypeMapperInterface;
use TheCodingMachine\GraphQLite\NamingStrategyInterface;
use function array_map;
use function assert;
use function gettype;
use function is_object;

class UnionType extends \GraphQL\Type\Definition\UnionType
{
    /** @param array<int,ObjectType&NamedType> $types */
    public function __construct(
        array                        $types,
        RecursiveTypeMapperInterface $typeMapper,
        NamingStrategyInterface      $namingStrategy,
    )
    {
        // Make sure all types are object types
        foreach ($types as $type) {
            if (!$type instanceof ObjectType) {
                throw InvalidTypesInUnionException::notObjectType();
            }
        }

        $typeNames = array_map(static fn(ObjectType $type) => $type->name(), $types);
        $name = $namingStrategy->getUnionTypeName($typeNames);

        parent::__construct([
            'name' => $name,
            'types' => $types,
            'resolveType' => [self::class, 'resolveTypeNonClosure'],
        ]);
    }

    public static function resolveTypeNonClosure(mixed $value, mixed $context): ObjectType
    {
        if (!is_object($value)) {
            throw new InvalidArgumentException('Expected object for resolveType. Got: "' . gettype($value) . '"');
        }

        $className = $value::class;

        // Somewhere at the root we set our context, in any desired format.
        // There is currently no public getRecursiveTypeMapper(), but we can add it.
        $typeMapper = $context['schemaFactory']->getRecursiveTypeMapper();

        $result = $typeMapper->mapClassToInterfaceOrType($className, null);
        assert($result instanceof ObjectType);
        return $result;
    }
}

The resolver may also be modified somewhere in a middleware or elsewhere, but I believe that has a solution too.
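To illustrate why the callable-array form matters, here is a dependency-free sketch. PHP refuses to serialize closures, while a [class, method] array serializes fine, and the runtime service can come from the context instead of being captured. FakeTypeMapper, SerializableUnionDefinition and isSerializable() are hypothetical stand-ins, not GraphQLite or webonyx classes.

```php
<?php

declare(strict_types=1);

// Stand-in for the runtime service (e.g. a recursive type mapper) that the
// closure-based resolver captured with `use (...)`. Hypothetical class.
final class FakeTypeMapper
{
    public function mapClassToTypeName(string $className): string
    {
        // Short class name + "Type", e.g. App\Foo => FooType.
        return substr(strrchr('\\' . $className, '\\'), 1) . 'Type';
    }
}

// A type definition that stores only a callable *array*, never a closure.
final class SerializableUnionDefinition
{
    /** @var array{0: class-string, 1: string} */
    public array $resolveType = [self::class, 'resolveTypeFromContext'];

    /** @param array<string, mixed> $context */
    public static function resolveTypeFromContext(object $value, array $context): string
    {
        // The mapper comes from the per-request context, not from captured state.
        return $context['typeMapper']->mapClassToTypeName($value::class);
    }
}

function isSerializable(mixed $value): bool
{
    try {
        serialize($value);
        return true;
    } catch (Exception) {
        // PHP throws "Serialization of 'Closure' is not allowed" for closures.
        return false;
    }
}
```

The definition survives a serialize()/unserialize() round trip, and calling its resolveType with a context that carries the mapper gives the same answer the closure would have.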

@l-you
Contributor Author

l-you commented Mar 18, 2023

For supporting multiple schemas, I was thinking we could add an additional argument, schema, to the following annotations:

  • [Query]
  • [Mutation]
  • [Type]
  • [Input]
  • [Field]
  • [SourceField]
  • [MagicField]

Then, when compiling the schema, we'd simply check this schema value and only compile the appropriate types and fields.

Sounds great, but implementing this in the current state will take a lot of work.
@oojacoboo Does this proposal mean having multiple annotations to control the type, as follows?

#[Field(schema: 'public', outputType: 'Foo')]
#[Field(schema: 'admin', outputType: 'AdminOnlyFoo | Foo')]
public function getFoo() {

}

@oojacoboo
Collaborator

oojacoboo commented Mar 18, 2023

$typeMapper = $context['schemaFactory']->getRecursiveTypeMapper();

So, the recursiveTypeMapper would then be responsible for determining the cached status? That means we'd have a cache hit as every type is evaluated?

#[Field(schema: 'public', outputType: 'Foo')]
#[Field(schema: 'admin', outputType: 'AdminOnlyFoo | Foo')]
public function getFoo() {
}

I think an array of schemas probably makes more sense. You could also add multiple attributes where additional customization is needed for a given schema.
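A sketch of the array-of-schemas variant, assuming a repeatable attribute where a later, schema-specific entry refines a shared one. MultiSchemaField, Foo and outputTypeFor() are hypothetical; the "later attribute wins" rule is an assumption made for this example, not an agreed design.

```php
<?php

declare(strict_types=1);

// Hypothetical repeatable attribute: `schemas` is an array, and the attribute
// may be repeated to customize one schema differently.
#[Attribute(Attribute::TARGET_METHOD | Attribute::IS_REPEATABLE)]
final class MultiSchemaField
{
    /** @param list<string> $schemas */
    public function __construct(
        public readonly array $schemas,
        public readonly ?string $outputType = null,
    ) {
    }
}

final class Foo
{
    #[MultiSchemaField(schemas: ['public', 'admin'])]
    #[MultiSchemaField(schemas: ['admin'], outputType: 'AdminOnlyFoo')]
    public function getFoo(): string
    {
        return 'foo';
    }
}

/**
 * Returns the output type override for one method in one schema:
 * null means "infer the type as usual", false means "not exposed in this schema".
 * Attributes are read in declaration order, so later ones override earlier ones.
 */
function outputTypeFor(string $class, string $method, string $schema): string|false|null
{
    $exposed = false;
    $outputType = null;
    foreach ((new ReflectionMethod($class, $method))->getAttributes(MultiSchemaField::class) as $attribute) {
        $field = $attribute->newInstance();
        if (in_array($schema, $field->schemas, true)) {
            $exposed = true;
            $outputType = $field->outputType ?? $outputType;
        }
    }
    return $exposed ? $outputType : false;
}
```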

@l-you
Contributor Author

l-you commented Mar 18, 2023

$typeMapper = $context['schemaFactory']->getRecursiveTypeMapper();

So, the recursiveTypeMapper would then be responsible for determining the cached status? That means we'd have a cache hit as every type is evaluated?

What do you mean?

$context['schemaFactory']->getRecursiveTypeMapper();

is a replacement for

use ($typeMapper): ObjectType

The logic of that class didn't change at all; we just made the GraphQL type serializable.
I didn't include the part where we handle caching.
For type mappers and the GraphQL types they output, I guess this could happen in TypeHandler::mapReturnType.

Making the output of mapNameToType, toGraphQLInputType and toGraphQLOutputType serializable gives us the ability to cache it with Psr16Cache.
So thousands of method calls can be avoided if the cache handler is placed in the right spot.
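A minimal sketch of such a cache handler as a decorator around a type mapper. To stay dependency-free it uses a local ArrayCache with the same get()/set() shape as a PSR-16 cache and represents a mapped type as a plain array. OutputTypeMapper, CachingOutputTypeMapper and ArrayCache are hypothetical; GraphQLite's real mapper interfaces have different signatures.

```php
<?php

declare(strict_types=1);

// Minimal stand-in for a PSR-16 cache (same get()/set() shape), kept local so
// the sketch has no dependencies.
final class ArrayCache
{
    /** @var array<string, string> */
    private array $items = [];

    public function get(string $key, mixed $default = null): mixed
    {
        return array_key_exists($key, $this->items) ? unserialize($this->items[$key]) : $default;
    }

    public function set(string $key, mixed $value): bool
    {
        // Storing the serialized form mimics a real backend: closures would fail here.
        $this->items[$key] = serialize($value);
        return true;
    }
}

// Hypothetical mapper interface; a mapped type is a serializable array here.
interface OutputTypeMapper
{
    /** @return array<string, mixed> */
    public function toGraphQLOutputType(string $phpType): array;
}

final class CachingOutputTypeMapper implements OutputTypeMapper
{
    public function __construct(
        private readonly OutputTypeMapper $inner,
        private readonly ArrayCache $cache,
    ) {
    }

    public function toGraphQLOutputType(string $phpType): array
    {
        // Hash the PHP type: PSR-16 reserves the characters {}()/\@: in keys.
        $key = 'output_type_' . md5($phpType);
        $cached = $this->cache->get($key);
        if ($cached !== null) {
            return $cached;
        }
        $type = $this->inner->toGraphQLOutputType($phpType);
        $this->cache->set($key, $type);
        return $type;
    }
}
```

On a warm cache, the expensive inner mapper is skipped entirely; only the cache lookup remains.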

The second step is to make QueryFieldDescriptor cacheable.
Currently it holds types in its properties and cannot be serialized, so once our types are serializable we could tackle QueryFieldDescriptor.
@oojacoboo mentioned some ideas and discussions earlier.

@oojacoboo
Collaborator

oojacoboo commented Mar 18, 2023

@rusted-love I see what you mean now. That could work. Obviously we're talking about a lot of cache hits, but that's going to be significantly faster than recompiling these types. I think it's worth giving it a shot - seems doable without too much complexity. You want to put together a PR on this?

@l-you
Contributor Author

l-you commented Mar 18, 2023

I think it's worth giving it a shot - seems doable without too much complexity. You want to put together a PR on this?

Sure! I'll have time for this. Hopefully there are no hidden pitfalls.

@oojacoboo
A question about webonyx's $context, in case you remember: is it exposed to users anywhere, or is it already used by some GraphQLite internals?

Also, we currently use webonyx's executeQuery, which accepts $contextValue. Should we make a wrapper for it and update the documentation?
Edit: I see we already have a Context class for prefetching, so should we use it instead?

I was also thinking about some kind of "all types" cache, so that we only have one cache hit.
We can afford it once all types become serializable.
Is it possible that this will fix #531?
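A sketch of the "all types" idea: the whole type map is stored under one cache key, so a warm request pays for a single cache read instead of one per type. AllTypesCache is hypothetical, and the public $backend array only stands in for a real PSR-16 backend shared between requests.

```php
<?php

declare(strict_types=1);

final class AllTypesCache
{
    private const KEY = 'graphqlite_all_types';

    /** @var array<string, string> serialized entries, keyed by cache key (stand-in backend) */
    public array $backend = [];

    /** Number of reads against the backend, to show there is only one. */
    public int $reads = 0;

    /** @var array<string, array<string, mixed>>|null in-memory copy for the current request */
    private ?array $types = null;

    /** @param callable(string): array<string, mixed> $compile builds one type on a cache miss */
    public function getType(string $name, callable $compile): array
    {
        if ($this->types === null) {
            // One read for the entire schema's types.
            $this->reads++;
            $this->types = isset($this->backend[self::KEY]) ? unserialize($this->backend[self::KEY]) : [];
        }
        if (!isset($this->types[$name])) {
            $this->types[$name] = $compile($name);
            // Write the whole map back under the single key.
            $this->backend[self::KEY] = serialize($this->types);
        }
        return $this->types[$name];
    }
}
```

The trade-off is that any newly compiled type rewrites the whole entry, which is acceptable if the map is fully warmed once, for example at deploy time.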

@oprypkhantc
Contributor

That said, I don't think it's practical to expect everyone to migrate their entire stack, or even just their GraphQL portion of their stack, to an async server right away. There are many reasons why this might not be possible, and it's still not a norm in PHP land. Therefore, I think trying to cache what we can, now, especially from a schema compilation standpoint, is a worthwhile endeavor.

I think it's worth adding: we're not using async yet, and Swoole doesn't require it. It does allow async through coroutines, but Laravel currently doesn't support that. Requests are handled concurrently by multiple workers, but each worker handles only one request at a time and doesn't share any state with the others. Any state is manually cleaned up after each request; there's no magic.

In our case, the PR to migrate to Swoole touched only ~70 files (incl. tests & infrastructure code), mostly fixes and small refactors to avoid stateful singletons.

[Right]: maybe resolved in this PR #571 (@oprypkhantc confirm)
[Logged]: maybe resolved in this PR #571 (@oprypkhantc confirm)

Correct, these are now working properly with a shared Schema instance, no longer checking authorization during middleware processing.

@oprypkhantc also mentioned that SourceField and MagicField could be problematic, but had some ideas to resolve these. See that discussion here: #562 (comment). It'd be great to get a PR on this @oprypkhantc.

No, those aren't an issue at all as long as middlewares only run once when a field is needed.

@oprypkhantc
Contributor

I am convinced we should widely rely on $contextValue provided by webonyx/graphql. This is a possible solution to the serialization problem.

How do you pass it through for the controller objects of the Query and Mutation types? I hit this roadblock when refactoring for immutability: an object is passed from FieldsBuilder::getFields into each QueryField's resolver as a callable (with use ($controller)).

To avoid that, you'd also need to refactor other parts of GraphQLite to pass a class name and resolve the objects later, as needed. That would require changes in multiple places where FieldsBuilder is used. Thoughts on this @oojacoboo @rusted-love? I can work on this as part of the immutability work, since some kind of solution is needed anyway.

@oprypkhantc
Contributor

Currently, as a workaround for passing the controller object while keeping things immutable, I added a public readonly object|null $fallbackSource = null field to QueryFieldDescriptor, which is passed through to QueryField itself and used if $rootValue is null.

I don't like this at all and would much prefer the QueryField to only have the name of the dependency (either class name or an id that you can pull from the container).
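A sketch of the "keep only the dependency name" approach: the field stores a container id and a method name, and the controller is fetched when the field actually resolves, so the field itself stays serializable. SimpleContainer (same get() shape as a PSR-11 container) and LazyControllerField are hypothetical names, not GraphQLite classes.

```php
<?php

declare(strict_types=1);

// Minimal stand-in for a PSR-11 container; hypothetical.
final class SimpleContainer
{
    /** @param array<string, object> $services */
    public function __construct(private array $services)
    {
    }

    public function get(string $id): object
    {
        if (!isset($this->services[$id])) {
            throw new RuntimeException("Service \"$id\" not found");
        }
        return $this->services[$id];
    }
}

// Hypothetical immutable field: it keeps the controller *id* and method name
// instead of a closure capturing the controller object.
final class LazyControllerField
{
    public function __construct(
        public readonly string $controllerId,
        public readonly string $method,
    ) {
    }

    /** @param array<string, mixed> $args GraphQL arguments, passed as named PHP arguments */
    public function resolve(array $args, SimpleContainer $container): mixed
    {
        // The controller is pulled from the container only when the field runs.
        $controller = $container->get($this->controllerId);
        return $controller->{$this->method}(...$args);
    }
}
```

Spreading string-keyed $args as named arguments requires PHP 8.1; an id instead of a class name also works if controllers are registered under service ids.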

@oojacoboo
Collaborator

oojacoboo commented Mar 18, 2023

Question is about $context of webonyx, maybe you remember. Is it exposed to users anywhere? Or maybe already in use by some graphqlite internals.

@rusted-love the WebonyxGraphqlMiddleware has ServerConfig::getContext() ($this->config->getContext()). I'm guessing this could get passed through somehow.

Is it possible that this will fix #531?

I thought that was fixed in #532.

@oprypkhantc I believe this will only cache the types and not the operation fields. But the types, and compiling them, account for a large portion of the cycles. As I understand it, passing through won't be an issue, since it'll still use the same schema compilation; it will just check whether the type has already been cached and use it instead of recompiling.

@oojacoboo
Collaborator

oojacoboo commented Mar 18, 2023

[Right]: maybe resolved in this PR #571 (@oprypkhantc confirm)
[Logged]: maybe resolved in this PR #571 (@oprypkhantc confirm)

Correct, these are now working properly with a shared Schema instance, no longer checking authorization during middleware processing.

@oprypkhantc [Right] would still be an issue though since it hides the field, right? That makes it dynamic and means you cannot cache the type - correct?

@l-you
Contributor Author

l-you commented Mar 18, 2023

@oojacoboo
As far as I know, [Right] does not hide a field unless a [HideIfUnauthorized] attribute is provided.

It just throws a GraphQLException. You can use #[FailWith] to return null instead.
Schema is not affected at all.

@oprypkhantc
Contributor

@oprypkhantc I believe this will only cache the types and not the operation fields. But, the types and compiling those are a large portion of the cycles. As I understand it, passing through won't be an issue since it'll still use the same schema compiling, it will just check to see if the type has already been cached and use it instead of recompiling.

Well, $object is used even for regular types. Take a look at TypeGenerator::extendAnnotatedObject or TypeAnnotatedObjectType::createFromAnnotatedClass.

@oprypkhantc [Right] would still be an issue though since it hides the field, right? That makes it dynamic and means you cannot cache the type - correct?

Only if used with #[HideIfUnauthorized]; #[Right] doesn't hide anything by itself. If used with #[HideIfUnauthorized], then yes, it will break. Honestly, I'd drop that attribute altogether to avoid confusion about why it doesn't work under some conditions.

@l-you
Contributor Author

l-you commented Mar 18, 2023

@rusted-love the WebonyxGraphqlMiddleware has ServerConfig::getContext() ($this->config->getContext()). I'm guessing this could get passed through somehow.

$this->config->getContext() is the $contextValue instance passed to executeQuery.
Please look at this section of the documentation.

Maybe it's better to require all users to pass the context at the root. This will require documentation changes.
We already do this anyway to support prefetching.

$result = GraphQL::executeQuery($schema, $query, null, new Context(), $variableValues);

For those who register custom webonyx types, we could allow something like this:

$context = new Context(parentContext: new MyCustomContextObject());
$result = GraphQL::executeQuery($schema, $query, null, $context, $variableValues);

// Usage for those people (webonyx passes the context as the third resolver argument):
'resolve' => function (mixed $value, array $args, Context $context) {
       $context->getParentContext()->....
}

I looked at alternatives for injecting it in middlewares, but that's a bad idea: if the user passed their own custom context, we cannot add our stuff to it.
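A dependency-free sketch of the wrapping idea: a GraphQLite-owned context carries the internals GraphQLite needs, while the user's own context is reachable through getParentContext() and never mutated. WrappingContext and userResolver() are hypothetical names; the (root, args, context) argument order follows webonyx's field resolvers, with the ResolveInfo argument omitted.

```php
<?php

declare(strict_types=1);

final class WrappingContext
{
    /** @param array<string, object> $internals services GraphQLite needs (e.g. a type mapper) */
    public function __construct(
        private readonly array $internals,
        private readonly mixed $parentContext = null,
    ) {
    }

    public function getParentContext(): mixed
    {
        return $this->parentContext;
    }

    public function getInternal(string $name): object
    {
        return $this->internals[$name] ?? throw new OutOfBoundsException("Unknown internal \"$name\"");
    }
}

// A user resolver: it reaches its own context through the wrapper.
function userResolver(mixed $rootValue, array $args, WrappingContext $context): string
{
    return $context->getParentContext()->currentUser;
}
```

Since GraphQLite never writes into the parent object, users can keep whatever context type they already have.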

@oojacoboo Please approve that small breaking change.

@oojacoboo
Collaborator

oojacoboo commented Mar 18, 2023

@rusted-love this is only for executing queries “manually” using webonyx, right? Or are you talking about having to pass Context elsewhere?

If only for “manual” or “custom” execution, I think this is a reasonable breaking change.

@l-you
Contributor Author

l-you commented Mar 18, 2023

@rusted-love this is only for executing queries “manually” using webonyx, right? Or are you talking about having to pass Context elsewhere?

If only for “manual” or “custom” execution, I think this is a reasonable breaking change.

@oojacoboo No, passing context will be required for everyone.
Also, I remembered another thing: it won't be created as new Context(), because our context now needs access to SchemaFactory internals.
Here is a final example of what EVERY developer would be forced to do after that breaking change.

The updated minimal example would look like the following:

<?php
use GraphQL\GraphQL;
use GraphQL\Type\Schema;
use TheCodingMachine\GraphQLite\SchemaFactory;
use TheCodingMachine\GraphQLite\Context\Context;

// $cache is a PSR-16 compatible cache.
// $container is a PSR-11 compatible container.
$factory = new SchemaFactory($cache, $container);
$factory->addControllerNamespace('App\\Controllers\\')
        ->addTypeNamespace('App\\');

$schema = $factory->createSchema();

$rawInput = file_get_contents('php://input');
$input = json_decode($rawInput, true);
$query = $input['query'];
$variableValues = $input['variables'] ?? null;
$context = $factory->getContext(); // Proposed API: the context would be required for GraphQLite to work
$result = GraphQL::executeQuery($schema, $query, null, $context, $variableValues);
$output = $result->toArray();

header('Content-Type: application/json');
echo json_encode($output);

@Lappihuan
Contributor

Looking at webonyx/graphql-php#104, caching the schema will probably never be supported by graphql-php, since they are more focused on schema-first.

Perhaps the processCompile stage suggested by @rusted-love could be called optionally, so official middlewares can use it while 3rd-party middlewares get time to implement it; until then they keep working, just less efficiently than they could.
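A sketch of how an optional compile stage could be detected, assuming an opt-in sub-interface: middlewares that implement processCompile() are applied once at compile time, and the rest are deferred to the usual per-request call. FieldMiddleware, CompilableFieldMiddleware and compileField() are hypothetical names, and a field is modelled as a plain array.

```php
<?php

declare(strict_types=1);

interface FieldMiddleware
{
    /** @param array<string, mixed> $field @return array<string, mixed> */
    public function process(array $field): array;
}

// Opt-in interface for middlewares that support the compile stage.
interface CompilableFieldMiddleware extends FieldMiddleware
{
    /** Runs once while building (and caching) the schema. */
    public function processCompile(array $field): array;
}

/**
 * Applies the compile stage where available; other middlewares are returned
 * so they can still run per request, just less efficiently.
 *
 * @param array<string, mixed> $field
 * @param list<FieldMiddleware> $middlewares
 * @return array{0: array<string, mixed>, 1: list<FieldMiddleware>}
 */
function compileField(array $field, array $middlewares): array
{
    $deferred = [];
    foreach ($middlewares as $middleware) {
        if ($middleware instanceof CompilableFieldMiddleware) {
            $field = $middleware->processCompile($field);
        } else {
            $deferred[] = $middleware;
        }
    }
    return [$field, $deferred];
}
```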

@rusted-love if this change is made with the context, what can be achieved with it?

@oojacoboo
Collaborator

oojacoboo commented Mar 27, 2023

I was wondering if there might be a way to use an AST as a compilation source for caching and re-initializing webonyx somehow. I don't think webonyx supports that currently, but an AST seems like a fairly ideal caching source. However, I assume some of the mapping might be problematic from an AST without additional mapping output.

@Lappihuan
Contributor

Lappihuan commented Mar 27, 2023

@oojacoboo You mean this?
webonyx/graphql-php#104 (comment)
The link in the issue is dead, but I think it pointed to this: https://webonyx.github.io/graphql-php/schema-definition-language/#performance-considerations

@oojacoboo
Collaborator

@Lappihuan yes - exactly that actually. Has anyone tried it?

@oojacoboo
Collaborator

oojacoboo commented Sep 7, 2024

So, @frodeborli has released a library that may resolve the closure serialization issue: opis/closure#131 (comment)

I've tried testing it, but ran into some issues - created a ticket: frodeborli/serializor#3
