Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

draft of changes to simplify the recording mechanism #2005

Closed
wants to merge 1 commit into from

Conversation

shawkins
Copy link
Collaborator

@shawkins shawkins commented Aug 8, 2023

cc @csviri @metacosm It seems like we can simply some of the mechanics discussed on fabric8io/kubernetes-client#4077 what are your thoughts on the following

  • Filtering your own creates / updates:

Add an annotation to the resource

https://github.com/operator-framework/java-operator-sdk/compare/main...shawkins:eventing?expand=1#diff-aa20588ab4b1ff4f171a897d4042d1b055c02737b067d09aaeb9cfd770adf3a0R132

This is checked later - if it matches, then we skip the event. If another actor modifies the resource and leaves this annotation alone or removes it, then we'll know it's an event that this KubernetesDependentResource was not responsible for.

https://github.com/operator-framework/java-operator-sdk/compare/main...shawkins:eventing?expand=1#diff-5947c6b7b60c804e59df2f5cfe62719dc51d3cc50b5e6cb1ddd1bc41323d3b77R162

Pros: Removes complexity associated with event recording.

Cons: Could be seen as "polluting" managed resources. Someone could also be "adversaral" and update this annotation for their own create / updates to make the operator ignore them.

Alternative: Don't worry about doing this - just allow the temporary cache check and the desired matcher minimize the api server calls. If the event comes before you get a chance to update the temporary cache, then you'll just end up processing it.

  • Update the temporary cache based solely upon the resourceVersion

https://github.com/operator-framework/java-operator-sdk/compare/main...shawkins:eventing?expand=1#diff-5947c6b7b60c804e59df2f5cfe62719dc51d3cc50b5e6cb1ddd1bc41323d3b77R267

Pros: Again removes the complexity associated with event recording.

Cons: The logic may move further ahead by basing a change on what is in the temporary cache - without first receiving the event. Once the original event is recieved, that will move the cache state backwards (although that same problem exists currently I believe).

Alternative: Parse the resourceVersion to determine the actual latest state - but this may break at any time if the api server either obfuscates or moves away from the etcd incrementing resourceVersions.

  • All operations are locked

https://github.com/operator-framework/java-operator-sdk/compare/main...shawkins:eventing?expand=1#diff-aa20588ab4b1ff4f171a897d4042d1b055c02737b067d09aaeb9cfd770adf3a0R131

I changed the non-SSA case to be locked, but I'm guessing it was probably unlocked for a reason.

Cons: If the Watch lags, or you mishandle the temporary cache, and some other actor makes change to the resource, you will not be able to reconcile that resource until the Watch is re-established. Also for SSA and a reconstructive controller it mostly shouldn't matter if you know the current state, you just need to assert your desired state.

@csviri
Copy link
Collaborator

csviri commented Aug 8, 2023

This is checked later - if it matches, then we skip the event. If another actor modifies the resource and leaves this annotation alone or removes it, then we'll know it's an event that this KubernetesDependentResource was not responsible for.

This makes sense, seems radically simplifies the solution. However there might be some edge cases when this might not work. For example if an event received from previous update later (this probably a extremely rare case) but resource was already used from the cache during the update. (Although this is probably an issue also now).

Regarding polluting of the resource tbh I would not worry to much, this is a higher level abstraction, such an inconvenience I think is acceptable, at least I don't see any issues why it would not be. But, yes the current solution is nicer in this regard.

Alternative: Don't worry about doing this - just allow the temporary cache check and the desired matcher minimize the api server calls. If the event comes before you get a chance to update the temporary cache, then you'll just end up processing it.

I think this is a valid approach too, although once I measured this, and roughly in 50% cases I actually received the event before the resource ended up in cache. What was very surprising for me. Might worth to just repeat the experiment and see if this is the case also for DR (I was doing it for primary resources). If this works in most of the cases, we could just do it this way.

Cons: The logic may move further ahead by basing a change on what is in the temporary cache - without first receiving the event. Once the original event is recieved, that will move the cache state backwards (although that same problem exists currently I believe).

This is not the case, there is now guaranteed "monotonity". But for that need the optimistic locking. Actually if case of event recording we might not need that, might be possible without (for updates), but something have to think about.

I changed the non-SSA case to be locked, but I'm guessing it was probably unlocked for a reason.

I don't get this, it is locked now.

@csviri
Copy link
Collaborator

csviri commented Aug 8, 2023

Alternative: Parse the resourceVersion to determine the actual latest state - but this may break at any time if the api server either obfuscates or moves away from the etcd incrementing resourceVersions.

The first implementations were doing this, not sure about solution like k3s and others if still use resourceVersion as a number in the backgroun, rather would keep to the contract.

@csviri
Copy link
Collaborator

csviri commented Aug 8, 2023

But overall, I think we can simplify this, on the other hand it is working now. All the other solutions are simpler, but makes compromises that the event recording does not. And it's implemented already. Soo 🤷

@metacosm
Copy link
Collaborator

metacosm commented Aug 8, 2023

But overall, I think we can simplify this, on the other hand it is working now. All the other solutions are simpler, but makes compromises that the event recording does not. And it's implemented already. Soo 🤷

While I agree on the general approach that if it's not broken, we shouldn't fix it, I also think there's value, moving forward, to simplify things if we can as this will make it easier for people to contribute and maintain the code.

@shawkins
Copy link
Collaborator Author

shawkins commented Aug 8, 2023

This is not the case, there is now guaranteed "monotonity". But for that need the optimistic locking. Actually if case of event recording we might not need that, might be possible without (for updates), but something have to think about.

That's not what I'm saying. If you have the sequence:

  • update temporary cache to v1

  • some other event triggers reconcilitation, you work off of v1 without yet seeing the v1 event and create version v2 - arguably this may be indicative of problem with the matching logic, but as we have seen this does now happen with server side apply

  • now you get the event for v1 - you will remove the temporary cache entry and the actual cache will be at v1.

until you get the event for v2, you are operating off of a stale version.

I don't get this, it is locked now.

It is not -

is calling replace, not update. Is there an invocation of lockResourceVersion that I'm missing?

@csviri
Copy link
Collaborator

csviri commented Aug 8, 2023

is calling replace, not update. Is there an invocation of lockResourceVersion that I'm missing?

will create a PR to replace this to update

@csviri
Copy link
Collaborator

csviri commented Aug 8, 2023

But overall, I think we can simplify this, on the other hand it is working now. All the other solutions are simpler, but makes compromises that the event recording does not. And it's implemented already. Soo shrug

While I agree on the general approach that if it's not broken, we shouldn't fix it, I also think there's value, moving forward, to simplify things if we can as this will make it easier for people to contribute and maintain the code.

I actually don't like that proverb :) I mean that we do some other compromises.

@csviri
Copy link
Collaborator

csviri commented Aug 8, 2023

That's not what I'm saying. If you have the sequence:

  • update temporary cache to v1
  • some other event triggers reconcilitation, you work off of v1 without yet seeing the v1 event and create version v2 - arguably this may be indicative of problem with the matching logic, but as we have seen this does now happen with server side apply
  • now you get the event for v1 - you will remove the temporary cache entry and the actual cache will be at v1.

until you get the event for v2, you are operating off of a stale version.

yes, this is true, we could avoid this just by parsing the resource version.

@csviri
Copy link
Collaborator

csviri commented Aug 8, 2023

Maybe we can discuss this on next community / fabric8 meeting? regarding what approach to take?

@shawkins
Copy link
Collaborator Author

shawkins commented Aug 8, 2023

Made a couple of changes:

  • aligned with fix: use update instead of replace in DR #2006 (it still makes sense to me though to just move up the call that is setting the resourceVersion for SSA - even if you change how the matcher works later that would still be correct)
  • because of where the tracking annotation is set, it needs filtered out of the SSA matching logic - that was causing the test failure as it would keep thinking the configmap needed updated, which had the side effect of deleting the pods. There's probably a better way to do this.

The integration tests are failing due to the changes to CreateUpdateEventFilterTestReconciler - that was manually exercising the dependent logic. With that removed there are 2 executions because we can't tell without the annotation that we're responsible for the create - but as expected the second execution does nothing.

As for the issue of putting the entry unconditionally into cache: #2006 (comment) that's done here conditionally - although with slightly different logic than what is in TemporaryResourceCache.putUpdatedResource In fact I don't think the code in this pr requires explicit optomistic locking for correctness - you should see the same eventual correctness either way.

While I agree on the general approach that if it's not broken, we shouldn't fix it, I also think there's value, moving forward, to simplify things if we can as this will make it easier for people to contribute and maintain the code.

I wouldn't say that things are broken, but there are implications to locking, temporary cache, and the SSA matching that don't degrade nicely - if the temporary cache gets off it blocks reconciliation when locked, and if the SSA matching is off then we can get stuck in a reconciliation loop. Ideally we want things to degrade as much as possible to simply an unomptimized state - I'm thinking that reducing some of the complexity may help with that.

if (oldObject == null && parts.length == 1) {
known = true;
} else if (oldObject != null && parts.length == 2
&& oldObject.getMetadata().getResourceVersion().equals(parts[1])) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe I'm not following, but this shouldn't be newObject here what is compared with previous. So that we know that we just received has the same version as in the annotation (own). Basically saying that if the resource version did not change since we updated the resource we can skip.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry ignore this, brain lag from my side.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The annotation added to what should become v2 will reference reference v1. If we are getting the event before the blocking update has completed, then newObject here is v2 and as long as the old is v1 (our basis for the change), we'll skip propagation.

The problematic case here (which we talked about a little with the current code, but with respect to the temporaryCache) is if the event for v2 is skipped. Normally this shouldn't happen. However it's possible when the informer must fully relist and thus skips ahead, which will generate an update event where the old is the current cache item v1 and the new is the latest, v3. That event would still be seen as "known". I don't think there's a great way to overcome that with local knowledge.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a great way to overcome that with local knowledge.

What I mean by that is that there is missing information from the event. Just like onDelete provides unknownFinalState, onUpdate/onAdd could convey whether they are from a relist such that we know the prior state isn't exact.

Copy link
Collaborator

@csviri csviri Aug 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. theoretically we could cache v2 version (we know that from update response) , and if the oldVersion == annotation version but newObjectVersion != v2 ==> emit the event

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remaining - exceedingly narrow case where this doesn't work:

  • starting with the temporary cache item as null, and informer cache state at v1
  • a full relist of the informer needs to be initiated - just a restart of the watch shouldn't cause a problem
  • the dependent resource initiates an update of v1, which will result in v2 - but the blocking call does not complete yet
  • on the api server in response to v2, v3 is generated by some other actor
  • the relist is processed such that it starts at v3, and emits an add event where the old state is v1 and the new is v3. The logic will detect this as "known", since it matches state expectations - temporary cache item is null and v1 matches the annotation.

This is what I mean to cover by this:

Yes. theoretically we could cache v2 version (we know that from update response) , and if the oldVersion == annotation version but newObjectVersion != v2 ==> emit the event

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. theoretically we could cache v2 version (we know that from update response)

You won't cache that because the update call is completing at the end of the above sequence and by that time the informer cache has already moved ahead to v3.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean caching it explicitly from the response.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there would be still a minor chance I guess that we receive v3 before the v2 is cached, but that is extremely low. (Probably won't happen in our lifetime :) )

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there would be still a minor chance I guess that we receive v3 before the v2 is cached, but that is extremely low. (Probably won't happen in our lifetime :) )

Yes this is what I mean by extremely narrow.

@csviri
Copy link
Collaborator

csviri commented Aug 9, 2023

Have to take a look on the test, but in general I like this idea very much. Simplifies the code, there is overhead with the annotation but that is negligible. Added some points to clarify, have to think through some cases, but I think we should go this way.

@csviri
Copy link
Collaborator

csviri commented Aug 9, 2023

One additional though regarding optimistic locking (ssa version) , we saw sometimes that users use some data from the actual resource to create the desired for update. Without the optimistic locking, this might lead to some unexpected behavior, if some would need more strict consistency. So thinking about a feature flag for this (will create a separate issue). Just wanted to hear some opinions in case.

@shawkins shawkins force-pushed the eventing branch 2 times, most recently from f93fddc to 6c4b1d0 Compare August 9, 2023 12:51
@shawkins
Copy link
Collaborator Author

To move this out of draft there are a couple of things to cover:

@csviri
Copy link
Collaborator

csviri commented Aug 10, 2023

@shawkins one more note, it would be better to target the next branch with this PR, there is already a small feature there where will be probably conflicts but nothing too serious, in next we are developing the next minor/major version usually. Would like to have this rather as part of next minor version.

@shawkins
Copy link
Collaborator Author

Moving the last comment over to the new pr and closing this one.

@shawkins shawkins closed this Aug 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants