Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Worker Versioning API #337

Closed
wants to merge 8 commits into from
Closed

Conversation

ShahabT
Copy link
Contributor

@ShahabT ShahabT commented Dec 12, 2023

What changed?

  • Deprecating UpdateWorkerBuildIdCompatibility and GetWorkerBuildIdCompatibility and replacing them with UpdateWorkerVersioningRules and ListWorkerVersioningRules.
  • Deprecating the concept of Version Set and "default Version Set", replacing them with more flexible BuildIdAssignmentRule and CompatibleBuildIdRedirectRule concepts.
  • The assignment rules assign a Build ID to new (Unassigned) Workflow Executions and Activities.
  • The redirect rules are only used in context of compatible Build IDs. These rules apply to tasks assigned to a particular Build ID (source_build_id) to redirect them to another compatible Build ID (target_build_id).

Why?

  • Be able to support Gradual rollout for Canary, Blue/Green, Rolling deployments (and alike)
    • Rolling: ability to send new and/or existing executions to the new build based on the worker count of new vs old build.
    • Blue/Green: ability to send a particular percentage of the new executions to the new build
    • Canary: ability to filter new executions based on “VersioningHint” Search Attribute set by Starter
  • Version Set is not the right abstraction for the Build ID compatibility relationships, replaced it with CompatibleBuildIdRedirectRule which:
    • Do not loose the directionality: a redirect is from a source Build ID to a target Build ID
    • Are removable (useful in cases that the compatibility was claimed by mistake)
    • Are more powerful: redirect rules support worker-ratio-based ramps. Other ramp types, dry run mode, and filter can be added in the future.
  • Making the API Incompatible-first: The primary beneficiary of WV are use cases with presumed-incompatible build IDs because WV frees them from caring about compatibility. They should be able to use WV without having to learn or think about Version Sets which currently takes the center stage of the API. Now these use cases can use BuildIdAssignmentRule with independent Build IDs.

Breaking changes
The UpdateWorkerBuildIdCompatibility and GetWorkerBuildIdCompatibility become deprecated, but supported for one version beside the new API.

@ShahabT ShahabT requested review from a team as code owners December 12, 2023 19:24
@@ -92,6 +92,12 @@ message ActivityType {
string name = 1;
}

message KeywordFilter {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would strongly recommend we use search attributes for this and would also prefer if we don't introduce this in the initial version of this API to minimize the scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll use search attributes. this is to be able to filter based on those search attributes. Given that the search attribute eliminates the extra SDK side work for this feature, I though we can add it to initial version to have a stronger message for supporting canary deployments. but I'm open to commenting it out rn and add it later.

Copy link
Member

@dnr dnr Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we simplify this even more: each assignment rule has a single optional string, that's an exact match for the hint search attribute? No excludes, no multiple includes. You can get the same effect as multiple includes by adding multiple assignment rules.

temporal/api/common/v1/message.proto Outdated Show resolved Hide resolved
temporal/api/common/v1/message.proto Outdated Show resolved Hide resolved
temporal/api/taskqueue/v1/message.proto Outdated Show resolved Hide resolved
temporal/api/workflowservice/v1/request_response.proto Outdated Show resolved Hide resolved
temporal/api/workflowservice/v1/request_response.proto Outdated Show resolved Hide resolved
temporal/api/workflowservice/v1/request_response.proto Outdated Show resolved Hide resolved
@@ -92,6 +92,12 @@ message ActivityType {
string name = 1;
}

message KeywordFilter {
Copy link
Member

@dnr dnr Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we simplify this even more: each assignment rule has a single optional string, that's an exact match for the hint search attribute? No excludes, no multiple includes. You can get the same effect as multiple includes by adding multiple assignment rules.

temporal/api/taskqueue/v1/message.proto Outdated Show resolved Hide resolved
temporal/api/workflowservice/v1/request_response.proto Outdated Show resolved Hide resolved
temporal/api/workflowservice/v1/request_response.proto Outdated Show resolved Hide resolved
Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: My approval is purely on syntax. Would appreciate not using it as the "SDK team approve" and get another, more-versioning-familiar SDK team member to provide logical approval.

// The subset is defined by:
// - Including or excluding particular values of the `VersioningHint` Search
// Attribute. This is useful for Canary deployments, for example, when
// only the executions with `versioning_hint="canary"` should be sent to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this comment should be updated.


// Filter executions based on the "VersioningHint" Search Attribute set by
// the Starter or parent Workflow.
string hint_filter = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain how this is supposed to be used? Is this a visibility query?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea is to have a pre-defined SA such as "VersioningHint" that user can use. History sends only that SA to matching (if present). If we support any SA, then History has to send all the SA's to matching.

message BuildIdAssignmentRule {
string target_build_id = 1;

// Filter executions based on the "VersioningHint" Search Attribute set by
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we want to filter on any search attribute. Not on a specific one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a note against search attributes, we can use this to assign activity tasks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this feature requires more thought and I would remove it from the API entirely for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll do this at the very end. in the meanwhile we can finalize the design. the current thought is to use a pre-defined SA specific for this purpose.

For activities, maybe we can send the wf's hint to matching automatically. not sure if there is any need to set a different hint for the activity when it comes to deployment use cases.

rpc UpdateWorkerVersioningRules (UpdateWorkerVersioningRulesRequest) returns (UpdateWorkerVersioningRulesResponse) {}

// Fetches the Build ID assignment and redirect rules for a Task Queue
rpc ListWorkerVersioningRules (ListWorkerVersioningRulesRequest) returns (ListWorkerVersioningRulesResponse) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if Get is a more appropriate verb than list here.
Our other list APIs imply pagination of identifiable objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an opinion. will wait to see what others have to say.

@bergundy
Copy link
Member

Not sure if mentioned before, please do not merge this until the feature is implemented.

// is automatically set as the ratio of the target Build ID's active
// workers over all active workers for any Build ID used in the
// assignment rules.
RampByWorkerRatio worker_ratio_ramp = 4;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this part to support a ramp based on worker count instead of the contingent mode. Feeding target build ID as much as it can could have undesired consequences. The ramp based on ratio of new vs old workers is better in the following ways:
◦ does not overwhelm new workers when they are not tuned well
◦ when operating with a lot of headroom, ramp % still remains close to upgrade %
◦ you don’t end up in a situation that more wfs are moved to the new build than it can really handle. For example, if wf pollers are plenty but activity pollers are not (which may often be the case) you’d still end up moving a lot of workflows without having capacity to process their activities.

@ShahabT
Copy link
Contributor Author

ShahabT commented Apr 10, 2024

API Changes merged from versioning-2 feature branch to main in #393

@ShahabT ShahabT closed this Apr 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants