-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat/scale-out: support for proxying GQL queries #2588
base: main
Are you sure you want to change the base?
Conversation
There's still quite a bit of work to be done on this change. Sharing an early draft for review on the approach and handling. I'll address the non-TODO style comments in the next commit. |
@@ -13,6 +11,7 @@ import ( | |||
"zotregistry.dev/zot/pkg/api/constants" | |||
"zotregistry.dev/zot/pkg/cluster" | |||
"zotregistry.dev/zot/pkg/common" | |||
"zotregistry.dev/zot/pkg/proxy" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this split between pkg/api/proxy.go and pkg/proxy/proxy.go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During development, there was a circular import for api and the new gql proxy in the extensions. This led me to break up the generic proxy logic into its own package and call it from the api package as well as from the gql proxy package.
That said, I do agree that the file naming could potentially be better.
StarCount int `json:"starCount"` | ||
DownloadCount int `json:"downloadCount"` | ||
NewestImage ImageSummary `json:"newestImage"` | ||
Name string `json:"Name"` //nolint:tagliatelle // graphQL schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this change needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, the GQL server was handling serialization of the data for responding to the client. Now, since the proxy handler is responding on behalf of the GQL server (after proxying), the standard Golang serialization takes place.
GQL starts keys with an uppercase, but the standard struct annotations don't encourage the same - hence the nolint
.
Example:
{
"errors": [
{
"message": "unable to run vulnerability scan on tag v0.0.19.231225-squashfs in repo machine/bootkit/rootfs: error: image 'machine/bootkit/rootfs@sha256:9efb1b9dd349e3fc5fa2dd658354a395a28ac7e391c85a72b39384da7a5ec7a1' scanning is not supported for given image media type",
"path": [
"GlobalSearch"
]
},
{
"message": "unable to run vulnerability scan in repo machine/bootkit/rootfs: manifest digest: sha256:9efb1b9dd349e3fc5fa2dd658354a395a28ac7e391c85a72b39384da7a5ec7a1, error: image 'machine/bootkit/rootfs@sha256:9efb1b9dd349e3fc5fa2dd658354a395a28ac7e391c85a72b39384da7a5ec7a1' scanning is not supported for given image media type",
"path": [
"GlobalSearch"
]
}
],
"data": {
"GlobalSearch": {
"Page": {
"TotalCount": 37,
"ItemCount": 3
},
"Repos": [
{
"Name": "machine/bootkit/rootfs",
"LastUpdated": "2023-12-25T15:31:59.110429376Z",
"Size": "319472431",
"Platforms": [
{
"Os": "linux",
"Arch": "amd64"
}
],
"IsStarred": false,
"IsBookmarked": false,
"NewestImage": {
"Tag": "v0.0.19.231225-squashfs",
"Vulnerabilities": {
"MaxSeverity": "",
"Count": 0
},
"Description": "A minimal bootable root filesystem",
"IsSigned": false,
"SignatureInfo": [],
"Licenses": "GPLv2 and others",
"Vendor": "project-machine",
"Labels": ""
},
"StarCount": 0,
"DownloadCount": 5285
},
{
"Name": "c3/ubuntu/base-amd64",
"LastUpdated": "2024-03-01T00:46:16.186838886Z",
"Size": "273201849",
"Platforms": [
{
"Os": "linux",
"Arch": "amd64"
}
],
"IsStarred": false,
"IsBookmarked": false,
"NewestImage": {
"Tag": "jammy",
"Vulnerabilities": {
"MaxSeverity": "",
"Count": 0
},
"Description": "base is a minimal glibc-based Linux system",
"IsSigned": false,
"SignatureInfo": [],
"Licenses": "GPL-2.0-or-later",
"Vendor": "Cisco Systems, Inc.",
"Labels": ""
},
"StarCount": 0,
"DownloadCount": 260
},
{
"Name": "tools/busybox",
"LastUpdated": "2022-10-04T18:22:45.289257759Z",
"Size": "773920",
"Platforms": [
{
"Os": "linux",
"Arch": "amd64"
}
],
"IsStarred": false,
"IsBookmarked": false,
"NewestImage": {
"Tag": "1.34.1",
"Vulnerabilities": {
"MaxSeverity": "",
"Count": 0
},
"Description": "",
"IsSigned": false,
"SignatureInfo": [],
"Licenses": "",
"Vendor": "",
"Labels": ""
},
"StarCount": 0,
"DownloadCount": 112
}
]
}
}
}
} | ||
|
||
proxyBody, err := io.ReadAll(proxyResponse.Body) | ||
proxyResponse.Body.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
close it right here? or defer this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it is completely read after ReadAll
, we may not need to keep it open till the end of the function execution. We can close it right away.
} | ||
|
||
// aggregate errors | ||
collatedResult.Errors = append(collatedResult.Errors, proxyRespData.Errors...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we now have a situation where we may a good result mixed with errors. This could be problematic/confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, though - it may not be all response errors. For example, here's a snippet from zothub:
{
"errors": [
{
"message": "unable to run vulnerability scan on tag v0.0.19.231225-squashfs in repo machine/bootkit/rootfs: error: image 'machine/bootkit/rootfs@sha256:9efb1b9dd349e3fc5fa2dd658354a395a28ac7e391c85a72b39384da7a5ec7a1' scanning is not supported for given image media type",
"path": [
"GlobalSearch"
]
},
{
"message": "unable to run vulnerability scan in repo machine/bootkit/rootfs: manifest digest: sha256:9efb1b9dd349e3fc5fa2dd658354a395a28ac7e391c85a72b39384da7a5ec7a1, error: image 'machine/bootkit/rootfs@sha256:9efb1b9dd349e3fc5fa2dd658354a395a28ac7e391c85a72b39384da7a5ec7a1' scanning is not supported for given image media type",
"path": [
"GlobalSearch"
]
}
],
"data": {
"GlobalSearch": {
"Page": {
"TotalCount": 37,
"ItemCount": 3
},
"Repos": [
{
"Name": "machine/bootkit/rootfs",
"LastUpdated": "2023-12-25T15:31:59.110429376Z",
"Size": "319472431",
"Platforms": [
{
"Os": "linux",
"Arch": "amd64"
}
],
"IsStarred": false,
"IsBookmarked": false,
"NewestImage": {
"Tag": "v0.0.19.231225-squashfs",
"Vulnerabilities": {
"MaxSeverity": "",
"Count": 0
},
"Description": "A minimal bootable root filesystem",
"IsSigned": false,
"SignatureInfo": [],
"Licenses": "GPLv2 and others",
"Vendor": "project-machine",
"Labels": ""
},
"StarCount": 0,
"DownloadCount": 5285
},
The data is valid, but there are some errors.
Overall the patch is not super-complicated. |
@@ -214,14 +214,14 @@ type GlobalSearchResultResp struct { | |||
} | |||
|
|||
type GlobalSearchResult struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't these property renames mean we break backwards compatibility with older zots? Do we care about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I gathered, the GlobalSearch and other such payloads that are part of the GQL schema are handled entirely by the GQL server.
Ideally, this implementation should send the same payload as the GQL server except aggregated.
Not sure if I fully got your question though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you mean this code did not produce the same payload as the GQL server before the change?
In this case we should definitely fix this issue.
Update on this: However, the resolver has a signature like the following:
This appears to be called by the internal resolver from the GQL server where we don't have any control of the behaviour. Next items:
|
Started a discussion on the gqlgen repo: 99designs/gqlgen#3295 While looking through their Discord, I came across the following gqlparser library that gqlgen uses internally. I'll take a look and evaluate this for parsing the query and formatting the output. |
Previously, only dist-spec APIs were supported for scale-out as in a shared storage environment, the metadata was shared and any instance could correctly respond to the GQL queries as all the data is available. In a local scale-out cluster deployment, the metadata store, in addition to the file storage is isolated to each member in the cluster. Due to this, there is a need to proxy the GQL queries as well for UI and client requests to work as expected. This change introduces a new GQL proxy + a handler for the GlobalSearch request that proxies the request to all the members and collects them for response to the client. Support for other GQL queries is pending. Signed-off-by: Vishwas Rajashekar <[email protected]>
e723bf8
to
c4e11de
Compare
Signed-off-by: Vishwas Rajashekar <[email protected]>
c4e11de
to
bcf52c8
Compare
What was left partially handled was that the spec appears to allow for multiple GQL operations in a single request payload, but zot appears to only use one request at a time. TODO: need to verify this. If this is only one request at a time, then it will allow the current GQL proxy logic to function, otherwise, we may need to re-think the deserialization and aggregation part. I'm currently looking into making the output available exactly as the GQL spec would. |
What type of PR is this?
feature
Which issue does this PR fix:
Towards #2434
What does this PR do / Why do we need it:
Previously, only dist-spec APIs were supported for scale-out as in a shared storage environment, the metadata was shared and any instance could correctly respond to the GQL queries as all the data is available.
In a local scale-out cluster deployment, the metadata store, in addition to the file storage is isolated to each member in the cluster. Due to this, there is a need to proxy the GQL queries as well for UI and client requests to work as expected.
This change introduces a new GQL proxy + a handler for the GlobalSearch request that proxies the request to all the members and collects them for response to the client.
Support for other GQL queries is pending.
Testing done on this change:
Unit Tests added.
Will this break upgrades or downgrades?
No, there shouldn't be any impact to upgrades and downgrades.
Does this PR introduce any user-facing change?:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.