Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GEP-1713: Standard Mechanism to Merge Multiple Gateways #1863
GEP-1713: Standard Mechanism to Merge Multiple Gateways #1863
Changes from 24 commits
056c52f
b0da061
3f50cea
09b29d1
f4f3928
4c5b38b
cbfea8c
832fa8f
492eff7
bc245cd
951081e
aa18975
ed2c90b
ba5a47d
e2331f0
e914896
b3f7113
e094dff
dcdd142
668af46
ec9ce63
d3ca6b1
5d6853d
aa2d649
df3b0dc
6e4e7ab
23f5e15
969c5a1
4e05526
a979bca
98eca50
0de79b6
9add2e6
01cb1fd
df249a5
43074ed
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the primary motivation here is to enable use cases that would rely on Listener generation. If that's correct, maybe there's a simpler solution than Gateway merging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Care to provide some alternative solutions? I'm going to close this out end of week otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree there are lots of other reasons we'd want to specify gateway merging - the main one would be having fewer resources to manage. If multiple listeners shared an endpoint, may as well merge them to reduce traffic as well. If this proposal isn't really motivated by making things more efficient for those reasons, maybe gateway merging isn't the right solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started a doc for brainstorming here: https://docs.google.com/document/d/1qj7Xog2t2fWRuzOeTsWkabUaVeOF7_2t_7appe8EXwA/edit. I've added this proposal + a couple alternatives there, interested in any additional ideas people have. (They all include some form of Gateway merging for the record).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels logically safe because it gates the removal of the existing restriction on explicitly setting a new optional field, so my understanding is that this wouldn't risk breaking older implementations (a concern previously cited in #1817 (comment) and #1596 (comment)).
@robscott @youngnick thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm definitely support of this but its worth pointing out that this does not represent a guarantee that the merged set produces MinItems=1 since there may be no children. Thus we cannot completely state that runtime merging is the exact equivalent of a single merged resource.
Given that runtime handling of empty lists by controllers was given as a reason for rejecting
#1596
#1817
we may have to reconsider the logic of that decision given the desire to support merging. E.g.
"if controllers want to support merging as per the spec they must tolerate empty listeners even if merging is not used"
It seems like we are making a similar constraint loosening decision around certs for TLS listeners
#2721
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @louiscryan here. I think it's important to note this tradeoff in the GEP. Even implementations that didn't support Gateway merging would still need to be aware of this potential configuration and ensure that it didn't break them. If we move forward with this approach, we'll need to take extra efforts to communicate the disruptive nature of this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I may toss in an opinion here.. I've been watching this proposal because I like the use case of service-based teams and trusting them to manage any needed domains. It would be ideal if these teams didn't have to disrupt core infrastructure resources (e.g. a top-level 80/443 gateway) before deploying their domain.
Could this be more specific about the way
allowedChildren
should be used? Could it specify an intent to support regex and/or broader matching, e.g. all gateways in a namespace.?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cornfeedhobo I'd expect
allowedChildren
to look very similar to AllowedRoutes, and specifically thenamespaces
field type RouteNamespaces, which yes would support matching all gateways in one or more namespaces (Same
namespace,All
namespaces, or by using a label selector match).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that just occurred to me when rereading this is that changing
minItems
to 0 for Listeners means that we are moving Listeners from required to optional, which is a breaking API change (because fields that you used to be able to assume would always have at least one entry now do not). Strictly, that would mean we'd have to do an API revision to v2, which does not seem likely.Realistically, it may not be too much of a problem for Go implementations, if they're doing a
range
acrossspec.listeners
, since that will execute the code 0 times if the list is empty. But it is a substantial UX change to be able to add Gateways with no Listeners. It's very likely that code will break as a result.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this could be solved conditionally for example we allow
minItems == 0
ifspec.infrastructure.allowedChildren
is not nil which implies the Gateway will be a parent. @robscott is that something that is possible with CEL?This doesn't break existing users or implementations. I'd say this is more analogous to supporting a new feature in a newer version of the API.
When an implementation adds 1.2 support they can reject Gateways with no listeners if they do not support gateway merging etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CEL was the intent here and how we implement "If, and only if, they have done so..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not keen on enforcing the parent/child relationship for this. It might not be a bad idea to consider a policy or other CRD to record the members of a group that can merge with each other. Then you are free to design a solution without the restrictions of breaking API changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outside of the generated listeners use case, it seems like a primary use case here would be to delegate domains to different namespaces. Unfortunately this doesn't provide any kind of controls for that so it seems like the first namespace to claim a domain would win. I'm concerned that this would become rather chaotic and confusing in practice. A key idea of Gateway listeners was that they allowed you to delegate domains to routes in different namespaces, this seems to make that distinction less clear.
Are there any controls/guardrails we can offer here? Is cross-namespace delegation a key goal? If not, could/should we start without it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, given the following things, which seem on the face of it to be reasonable asks:
There's no way around the fact that conflicts are not predictable by looking at any single namespace plus the parent Gateway. There's no way to know if someone in a sibling Gateway in another namespace is already using the same port/protocol/hostname/TLS config you are, which will put those Listeners into conflict.
By the current rules, Listeners that are conflicting are both rejected, because they must necessarily be created at the same time (since they are in the same object). Assuming that we use our usual conflict-resolution rules, then, as @robscott says above, the first Gateway to claim the any distinct Listener combination will be the winner forever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really - it's really more about splitting up the listener list across resources.
This doesn't change that in my opinion.
@howardjohn @mikemorris you have comments here regarding the cross namespace need?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't really any different than what we have now. eg. if Gateway's aren't visible to Application Developers then they wouldn't know. I agree with your point it becomes more difficult when the resource is split up.
The GEP tweaks the rules such that child Gateways can be rejected and it doesn't affect siblings. So like you said conflict resolution rules apply.
Yup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's two aspects to this:
allowedChildren
up in GEP-1713: Standard Mechanism to Merge Multiple Gateways #1863 (comment), but we should probably consider this directly - it may add more complexity than value.allowedRoutes
config on the parent Gateway apply here, such as if the parent specifiedallowedRoutes
with a selector for only a different namespace? Should we specifyallowedRoutes
as scoped to only directly attached routes?https://docs.google.com/document/d/1-0mgRRAY784OgGQ1_LCOshpLLbeAtIr4eXd0YVYK4RY/edit has some context on cross-namespace delegation being an explicit goal for the prior route delegation proposal, but I'm not sure if there's a specific need for it with Gateway merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for at least some of the use cases, cross-namespace attachment seems important - which implies that we should have a two-way handshake for safety. Which is difficult with the current object. Sigh. Another reason that @robscott and I have been talking about a new object type instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you enforce this? In the event of conflicts, which one wins?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the question - what conflict are you referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this required or can the child simply omit. Omission is likely better if the root gateway is infrastructure that is being managed centrally. The children don't actually need to know or care about the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would great to not require it.
I'm not familiar if it's possible with CEL to make a field optional if some other property is set (eg. infrastructure.attachTo`
@robscott do you know - I'd consider you the CEL expert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A thing to note is that getting the gateway would show an empty class
gateway-api/apis/v1/gateway_types.go
Line 27 in df978ef
Unless we made a change so that it appears in the status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do it, you just need to attach the rule to some common root (here that would be
spec
), rather than ongatewayClassName
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does make sense that the class needs to be managed centrally (by the parent Gateway administrator), but
spec.gatewayClassName
is a required field on Gateway and while the field doesn't appear to actually be immutable, it's recommended that the specified GatewayClass be used as a template when creating a Gateway, so while it's a bit of extra boilerplate I don't expect it's something that should ever be changing dynamically.What we may want to add is a clarification that about the GatewayClass "template snapshotting" recommendation - this should likely be tied to the parent Gateway rather than attempting to reconcile potentially drifting GatewayClass configuration snapshots from multiple child Gateways created at different points in time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I get the templating recommendation in principle I think it presents a toil problem for users when you attempt to apply it to the specification of Gateway.gatewayClassName.
Since the compositional model for routes & policies is back-reference the toil fanout of forcing Gateway immutability is high on users. In this situation any class transition (infrastructure, version etc.) would force a reference update on every inbound policy since the user would have to create a new Gateway resource.
Even immutability is somewhat of an illusion since the recommendation can't prevent the deletion & recreation of the resource with the same name and a different value. The user is simply left with the side-effects of their workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarity, I would be in favor of allowing an empty
gatewayClassName
contingent onattachTo
being configured if that's permissible - I just think it's a nicety rather than a necessity because changing agatewayClassName
will often be a destructive action and as such not a significantly different, safer, or less downtime operation than deleting and recreating all child and parent resources.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again,
gatewayClassName
is a required field, and changing it to optional is a breaking API change.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can remove
gatewayClassName
because if we try to attach to a non-existent parent then who would be responsible to update the status of the Gateway saying invalid references etc?I'm going to resolve this thread end of week unless someone has an idea on solving the situation I just mentioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes a lot of sense to me, hadn't considered that situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One other scenario I think we need to work through here:
parent
,child-1
,child-2
, andchild-3
.child-2
andchild-3
have one listener that conflicts.child-2
was created first.What happens?
How do the owners of each child know what's happened? (Preferably without looking at the parent's status)
What actions do the owners of
child-2
andchild-3
need to take to fix this? Can they know that which other sibling Gateway has caused the problem? Ifchild-2
wins precedence because of age, how do they know that something tried to claim their config and failed?Note that in this case, if
child-2
ever deletes and recreates their Gateway, they will now be in the same position as the owner ofchild-3
.How does the owner of the parent know if any children are bad, and is it possible for them to know which ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
child-3
would be rejected andchild-2
wouldn't change. I'll add a condition about said conflict - I thought this was covered elsewhere.The status on the child could state an appropriate error message.
Gateway owners (cluster operator) would have to address this by tweaking listeners.
I think it's worth the error message saying there's a conflict and providing the more detailed message about siblings on the parent's status message.
Do you have a use case why child-2 would want to know this information?
Yup
It's mentioned that
ListenersNotValid
will appear on the parent's status. A more detailed message can be presented.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the scenario here 0de79b6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to consider or rule out an alternative that is defined in a policy that could be attached to the parent Gateway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate what you mean? I don't understand.