-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance problem even for small query on big schema. #569
Comments
So, there seems to be a few ongoing discussions around performance. I'd really like to pull all of these together and come up with something actionable. See #566 & #562. @oprypkhantc you may also find interest in this discussion. |
I'm just starting to integrate GraphQL into our projects. We have over 300 models and 1.4k endpoints, so if there's a performance degradation with bigger projects we'll see it pretty quick. Given we use Swoole and hold the entire Schema in memory, alongside all possible types & fields, it's likely we'll not see any performance hit because it's all already resolved prior to any requests. But I'll take a look some time this week to see if there's really a problem here. |
I think we should look into offering a schema caching option in the
I'm not sure how many people are using these annotations. I'd argue there are probably better ways of doing all these things and that they should generally be avoided in favor of a more native PHP userland approach. For BC reasons, we should probably keep them and can simply throw an Exception if these are used in conjunction with schema caching being enabled. I also think we should implement something for stateless routes, something I mentioned here: #125 (comment) I think the combination of these two would offer significant performance increases while maintaining BC and relatively trivial to implement. Would love to hear other thoughts on this. I don't currently have the time to put together a PR for these, but happy to assist where I can. |
I've spend some time on investigation considering some ideas. It's hard currently to see the complete picture. But there should be some starting point, that would help to see possible solutions. Let's take a look at field middlewares. We can organise logic for static and dynamic code as following. /**
* @template T
*/
interface FieldMiddlewareInterface
{
/**
* @return T serializable output after compilation that will be stored in long live cache, where key consist of class and definition name
*/
public function processCompile(QueryFieldDescriptor $queryFieldDescriptor,):mixed;
/**
* @param T $compileOutput
*/
public function process(mixed $compileOutput, FieldHandlerInterface $fieldHandler): FieldDefinition|null;
} Current graphqlite/src/Middlewares/AuthorizationFieldMiddleware.php Lines 32 to 89 in 5a9c03b
And here is a /**
* Middleware in charge of managing "Logged" and "Right" annotations.
*/
class AuthorizationFieldMiddleware implements FieldMiddlewareInterface
{
public function __construct(
private AuthenticationServiceInterface $authenticationService,
private AuthorizationServiceInterface $authorizationService,
)
{
}
public function processCompile(QueryFieldDescriptor $queryFieldDescriptor): array
{
$annotations = $queryFieldDescriptor->getMiddlewareAnnotations();
$loggedAnnotation = $annotations->getAnnotationByType(Logged::class);
assert($loggedAnnotation === null || $loggedAnnotation instanceof Logged);
$rightAnnotation = $annotations->getAnnotationByType(Right::class);
assert($rightAnnotation === null || $rightAnnotation instanceof Right);
$failWith = $annotations->getAnnotationByType(FailWith::class);
assert($failWith === null || $failWith instanceof FailWith);
$hideIfUnauthorized = $annotations->getAnnotationByType(HideIfUnauthorized::class);
assert($hideIfUnauthorized instanceof HideIfUnauthorized || $hideIfUnauthorized === null);
if ($failWith !== null && $hideIfUnauthorized !== null) {
throw IncompatibleAnnotationsException::cannotUseFailWithAndHide();
}
// If the failWith value is null and the return type is non-nullable, we must set it to nullable.
$type = $queryFieldDescriptor->getType();
return [
'skipMiddleware' => !$loggedAnnotation && !$rightAnnotation,
'wrapInNullable' => $failWith !== null && $type instanceof NonNull && $failWith->getValue() === null,
'hideIfUnauthorized' => null !== $hideIfUnauthorized,
'right' => $rightAnnotation?->getName(),
'logged' => $loggedAnnotation !== null,
'failWith' => $failWith === null ? null : [
'value' => $failWith->getValue()
]
];
}
public function process(array $compiledOutput, QueryFieldDescriptor $queryFieldDescriptor, FieldHandlerInterface $fieldHandler): FieldDefinition|null
{
// Avoid wrapping resolver callback when no annotations are specified.
if ($compiledOutput['skipMiddleware']) {
return $fieldHandler->handle($queryFieldDescriptor);
}
if ($compiledOutput['wrapInNullable']) {
$type = $queryFieldDescriptor->getType()->getWrappedType();
assert($type instanceof OutputType);
$queryFieldDescriptor->setType($type);
}
// When using the same Schema instance for multiple subsequent requests, this middleware will only
// get called once, meaning #[HideIfUnauthorized] only works when Schema is used for a single request
// and then discarded. This check is to keep the latter case working.
if ($compiledOutput['hideIfUnauthorized'] && !$this->isAuthorized($compiledOutput)) {
return null;
}
$resolver = $queryFieldDescriptor->getResolver();
$queryFieldDescriptor->setResolver(function (...$args) use ($compiledOutput, $resolver) {
if ($this->isAuthorized($compiledOutput)) {
return $resolver(...$args);
}
if ($compiledOutput['failWith'] !== null) {
return $compiledOutput['failWith'];
}
if ($compiledOutput['logged'] && !$this->authenticationService->isLogged()) {
throw MissingAuthorizationException::unauthorized();
}
throw MissingAuthorizationException::forbidden();
});
return $fieldHandler->handle($queryFieldDescriptor);
}
/**
* Checks the @Logged and @Right annotations.
*/
private function isAuthorized(array $compiledOutput): bool
{
if ($compiledOutput['logged'] && !$this->authenticationService->isLogged()) {
return false;
}
return $compiledOutput['right'] === null || $this->authorizationService->isAllowed($compiledOutput['right']);
}
} So, method [
'skipMiddleware' => bool,
'wrapInNullable' => bool,
'hideIfUnauthorized' => bool,
'right' => bool,
'logged' => bool,
'failWith' => null | [
'value' => mixed
]
] My examples shows only a general idea. With time there will be many classes adopted in such way, so we can see the complete picture which code is "compiled" and which is executed on each request. Seeing the complete picture, we can potentially extract such pieces into separate classes and combine their output (they all are executed only once, at the "compile-time"). As a result, fast and contributor-friendly library :) |
@rusted-love I'm not an active maintainer so I'm not making any decisions here. But here are my two cents on this: Yes, the approach you're proposed will certainly work. Many things in But there are many problems with this:
Given all the efforts that are needed to support this (not small) project and lack of maintainers, I don't think it's a good idea. This might sound counterintuitive, but in the long run, I believe it'd better to instead drop support for PHPFiles/opcache caching altogether. The reason I'm saying this is I believe PHP is slowly moving towards long running servers which don't need quick boot times, like the rest of the web industry. If you look at practically any other language/technology/stack, they all have one thing in common: they don't start a new process for every request. This has become somewhat of a trend in the recent years since the release of RoadRunner, Spiral and just now - Laravel Octane, which made it easy to implement this in real world apps. When eventually this becomes the norm, Might be a bold statement, but it might even be easier for you to move to Swoole/RoadRunner/AMPhp/React or other libraries than to fully cover As a bonus, you'll definitely see a big increase in RPS and reduction of response times. For us it decreased the average response times about 2x and increased RPS about 2.5x without really having to do any fancy caching. |
Those code pieces does not care about "own way of caching & invalidating cache". The only care is "what to cache".
Actually, maybe it's better instead of 'caching' name it 'compilation' for better comprehension.
Yes, I completely agree that current codebase is very complex and hard to understand.
After deeper testing with X-Debug profile I even became more confident that such approach potentially fixes most unnecessary execution overhead. Even if we take into account frequent cache pool calls for some not performant cache adapters.
"everything pre-resolved", what exactly? As I see it, in combination with Swoole proposed idea of logic separation gives double profit. P.S I suspect, maybe I didn't explain clearly enough the proposal. |
@oprypkhantc |
It's not that simple, unless you want to kill local development experience.
I don't think so. The thing with compilation (if you wish to call it that) is you still have to take care of it manually, so the whole codebase would be bloated with it. When doing the same sort of caching in memory only (i.e. as long running servers allow you to), you only need build the correct architecture to reuse as many object instances as possible. That's easier to achieve and certainly is much easier to understand.
Well, quite literally everything. Vendor auto-loaded, caches warmed up, some services pre-resolved, all the configuration applied. On the GraphQL side, the whole schema AST generated & sitting in memory, controllers found and resolved through DI with dependent services, types also found and fields resolved (with parsed annotations, attributes, docblocks, types), type mapper resolved every single type found in properties & parameters & return types, all field & parameter middleware called.
That's the point - it is not resolved during Basically everything that can be taken care of ahead-of-time - already is in
No, it actually doesn't bring any benefits at all aside from majorly complicating things. It might even make it slower if Swoole isn't taken into account.
We just now started integrating GraphQL and only have migrated 10 so far. The rest will take years. I haven't actually tested it, but the amount of controllers/types/fields shouldn't matter. As long as you "warm" your Swoole worker's Schema instance by resolving all types & fields ahead of time, actually getting & calling an already resolved controller should take the same amount of time regardless of their amount. I suspect you can warm the whole Schema easily by running an introspection (not real code): // done before any requests
$schema = new Schema();
$schema->runIntrospection(); // resolves all types & fields
// request comes in
$schema->resolveType(AController::class) // instantly returns the resolved type
->getField('someField') // again instantly returns the field
->resolve($payload); // does validation, then deserializes $payload into a DTO and call `someField`
// another request comes in, same schema instance, same type instance, same fields instances
$schema->resolveType(AController::class)
->getField('someField')
->resolve($payload); // again instantly |
@oprypkhantc you make some good points and I'm certainly not in disagreement that, from a performance perspective, async servers are going to offer the best performance. I don't think this is really up for debate. And, I think with some of the PRs you've submitted, and have been merged, the lib is well on it's way to accomplishing those goals. Let's keep them coming! Also, as an aside, if you'd do us a solid in writing a doc section for using an async server with GraphQLite, that'd be awesome! I'm sure others would find this very helpful going forward. That said, I don't think it's practical to expect everyone to migrate their entire stack, or even just their GraphQL portion of their stack, to an async server right away. There are many reasons why this might not be possible, and it's still not a norm in PHP land. Therefore, I think trying to cache what we can, now, especially from a schema compilation standpoint, is a worthwhile endeavor. The performance hits this might have from cache hits, can be minimized and should be a consideration and objective with any improvements to caching. @rusted-love as opposed to trying to preserve the runtime field resolving made available by only a few annotations, why don't we consider throwing exceptions within those annotations if schema level caching is enabled - maybe even look at deprecation of these annotations in favor of separate schemas. See #562 for more discussion on this topic. I think we can achieve both objectives here. However, I do agree with @oprypkhantc that, within a dev environment, making changes to the schema and having to clear cache within workflows is certainly not desirable. But, even with an async server, you're still going to have to recompile the schema after any changes. So, while you may not have to explicitly clear the cache, you still have longer compilation times during development efforts and have to "reboot" the server or explicitly "recompile" the schemas. IMHO, having to clear the cache/recompile schemas during dev efforts isn't that big of a deal. We already have to clear-cache for a number of different portions of our stack during development. That would just be a caveat you'd have to deal with for the added performance benefits - again totally optional. |
As a quick test I tried to throw the schema into a PSR16 cache, but of course that's not possible with closures. I did find this library to support closure serialization though: https://github.com/opis/closure. However, I'm guessing these would have to be wrapped in webonyx, in addition to graphqlite. Created an issue regarding recursively wrapping closures within an object: opis/closure#131 |
@oojacoboo The issue with serialization is that closures rely on local variables that often are objects with same type of closures or have circular dependencies. That way closures can become a static methods As a side note, I have tried for a 2 days to make deep serializations of Maybe, we need to replace
Do I understand it right way? You propose to disable support for |
@rusted-love good to know that you've already gone down this path. I guess we were thinking along the same lines - hoping for an easier solution.
Based on our discussions in #562 and my understanding of these annotations (we don't currently use them):
The argument was made by @Lappihuan, which I agree with; that changing the actual schema dynamically, is really the wrong approach and in most cases. You're better off with an authorized and unauthorized schema, or multiple schemas per role type. This is something that we could add some support for as well - making it easier for multiple schemas to be compiled based on a specific "role" value. You can then deliver the appropriate schema based on your Any annotation that makes modifications to an otherwise static schema, is problematic. I believe that's
I'm suggesting that these annotations throw an @oprypkhantc also mentioned that Again, I think this is all doable but multiple schemas is the most logical and fool proof design IMO. |
For supporting multiple schemas, I was thinking we could add an additional argument,
Then, when compiling the schema, we'd simply check this |
@oojacoboo Going back to serialization problem. Take a look at this class. graphqlite/src/Types/UnionType.php Line 18 in 51a3011
Its initialised here graphqlite/src/Mappers/Root/CompoundTypeMapper.php Lines 151 to 155 in 51a3011
Type resolver function relies on RecursiveTypeMapperInterface that is initialised in SchemaFactory. graphqlite/src/Types/UnionType.php Lines 39 to 51 in 51a3011
As specified in documentation we can access context in <?php
declare(strict_types=1);
namespace TheCodingMachine\GraphQLite\Types;
use GraphQL\Type\Definition\NamedType;
use GraphQL\Type\Definition\ObjectType;
use InvalidArgumentException;
use TheCodingMachine\GraphQLite\Mappers\RecursiveTypeMapperInterface;
use TheCodingMachine\GraphQLite\NamingStrategyInterface;
use function array_map;
use function assert;
use function gettype;
use function is_object;
class UnionType extends \GraphQL\Type\Definition\UnionType
{
/** @param array<int,ObjectType&NamedType> $types */
public function __construct(
array $types,
RecursiveTypeMapperInterface $typeMapper,
NamingStrategyInterface $namingStrategy,
)
{
// Make sure all types are object types
foreach ($types as $type) {
if (!$type instanceof ObjectType) {
throw InvalidTypesInUnionException::notObjectType();
}
}
$typeNames = array_map(static fn(ObjectType $type) => $type->name(), $types);
$name = $namingStrategy->getUnionTypeName($typeNames);
parent::__construct([
'name' => $name,
'types' => $types,
'resolveType' => [self::class, 'resolveTypeNonClosure']
]);
}
public static function resolveTypeNonClosure(mixed $value,mixed $context): ObjectType
{
if (!is_object($value)) {
throw new InvalidArgumentException('Expected object for resolveType. Got: "' . gettype($value) . '"');
}
$className = $value::class;
//Somewhere in root we set our context with any desired format
//yes, currently there is no public getRecursiveTypeMapper, but we can add it
$typeMapper = $context['schemaFactory']->getRecursiveTypeMapper();
$result = $typeMapper->mapClassToInterfaceOrType($className, null);
assert($result instanceof ObjectType);
return $result;
}
} Yes, maybe somewhere in middleware or other place resolver is modified, but I believe it has a solution too. |
Sounds great, but there will be much work to implement this in current state. #[Field(schema:'public',outputType:"Foo"]
#[Field(schema:'admin',outputType:"AdminOnlyFoo | Foo")]
public function getFoo(){
} |
So, the recursiveTypeMapper would then be responsible for determining the cached status? That means we'd have a cache hit as every type is evaluated?
I think maybe an array of |
What do you mean? $context['schemaFactory']->getRecursiveTypeMapper(); is replacement for use ($typeMapper): ObjectType Logic of that class didn't change at all. We just made graphql type serializable. Serializable output of Second step is to make |
@rusted-love I see what you mean now. That could work. Obviously we're talking about a lot of cache hits, but that's going to be significantly faster than recompiling these types. I think it's worth giving it a shot - seems doable without too much complexity. You want to put together a PR on this? |
Sure! I will have time for this. Hope there is no underwater rocks. @oojacoboo Also, currently we use Also, I was thinking about some kind of "all types" cache. So that we will have only one cache hit. |
I think it's worth adding: we're not using async yet, and Swoole doesn't set you a requirement on that. It does allow async through coroutines, but Laravel currently doesn't support it. Requests are handled concurrently through multiple workers, but each worker only handles one request at a time and doesn't share any state with the rest. Any state is manually cleaned up after each request, there's no magic. In our case the PR to migrate to Swoole was only ~70 files (incl. tests & infrastructure code) with mostly fixes and small refactors to avoid stateful singletons.
Correct, these are now working properly with a shared Schema instance, no longer checking authorization during middleware processing.
No, those aren't an issue at all as long as middlewares only run once when a field is needed. |
How do you pass it through for controller objects for To avoid that, you'd also need to refactor other parts of the |
Currently as a workaround for passing the controller object for immutability I added a I don't like this at all and would much prefer the QueryField to only have the name of the dependency (either class name or an id that you can pull from the container). |
@rusted-love the
I thought that was fixed in #532. @oprypkhantc I believe this will only cache the types and not the operation fields. But, the types and compiling those are a large portion of the cycles. As I understand it, passing through won't be an issue since it'll still use the same schema compiling, it will just check to see if the type has already been cached and use it instead of recompiling. |
@oprypkhantc |
@oojacoboo It just throws an |
Well even for regular types
Only if used with |
Maybe it's better to force all users pass context in root. This will require documentation changes. $result = GraphQL::executeQuery($schema, $query, null, new Context(), $variableValues); For those who register some custom $context = new Context(parentContext: new MyCustomContextObject());
$result = GraphQL::executeQuery($schema, $query, null, $context, $variableValues);
//So the usage is for those people is
'resolve'=>function (mixed $value,Context $context) {
$context->getParentContext()->....
} I was looking about some alternatives to inject it in middleware's, but this is bad idea, because if user passed his custom context we cannot modify it with our stuff. @oojacoboo Please, approve that small breaking change. |
@rusted-love this is only for executing queries “manually” using webonyx, right? Or are you talking about having to pass If only for “manual” or “custom” execution, I think this is a reasonable breaking change. |
@oojacoboo No, passing context will be required for everyone. Updated minimal example will look like following. <?php
use GraphQL\GraphQL;
use GraphQL\Type\Schema;
use TheCodingMachine\GraphQLite\SchemaFactory;
use TheCodingMachine\GraphQLite\Context\Context;
// $cache is a PSR-16 compatible cache.
// $container is a PSR-11 compatible container.
$factory = new SchemaFactory($cache, $container);
$factory->addControllerNamespace('App\\Controllers\\')
->addTypeNamespace('App\\');
$schema = $factory->createSchema();
$rawInput = file_get_contents('php://input');
$input = json_decode($rawInput, true);
$query = $input['query'];
$variableValues = isset($input['variables']) ? $input['variables'] : null;
$context = $factory->getContext(); //Context is required to make graphqlite work
$result = GraphQL::executeQuery($schema, $query, null, $context, $variableValues);
$output = $result->toArray();
header('Content-Type: application/json');
echo json_encode($output); |
looking at webonyx/graphql-php#104 caching the schema will probably be never supported by graphql-php since they are more focused on schema first. perhaps the @rusted-love if this change is made with the context, what can be achieved with it? |
I was wondering if there might be a way to use an AST as a compilation source for caching and re-initializing webonyx somehow. I don't think webonyx supports that currently, but an AST seems like a fairly ideal caching source. However, I assume some of the mapping might be problematic from an AST without additional mapping output. |
@oojacoboo You mean this? |
@Lappihuan yes - exactly that actually. Has anyone tried it? |
So, @frodeborli has released a library that may resolve the closure serialization issue: opis/closure#131 (comment) I've tried testing it, but ran into some issues - created a ticket: frodeborli/serializor#3 |
I widely use GraphQLite for admin console in my Symphony project. I really enjoy it!
The only reason I don't use it for public API is overhead in execution speed.
For example such simple query
Such simple query have overhead of ~100-140ms comparing to simple RESTful endpoint with the same logic.
Here is some screenshots of XDebug profiler output
I would like to discuss about ideas how codebase could be improved in direction of execution speed.
For example, if we look deeper into
AggregateControllerQueryProvider->getQueries()
we would see such picture.Method
mapReturnType
takes 9% of the execution time for the entire script.Inside
mapReturnType
we have many nestedtoGraphQLOutputType()
called by type mappers. Almost 5k following the screenshot.As I know primary work for type mappers is converting docblocks/attibutes Type to GraphQL type.
So, is there a way we could move such logic to the compile time? Or, maybe some kind of generated code that is stored in cache like in overblog/GraphQLBundle
Im pretty sure there are many things that could be optimised. But seems it cannot be done without breaking changes.
The text was updated successfully, but these errors were encountered: