Version Drain is a workflow for dynamically migrating long-running Temporal workflows to a new versioned task queue to reduce risk and toil from backwards incompatibility.
Please read the Temporal worker versioning docs if you are not already familiar with them.
Our team treats developer efficiency and deployment risk as top priorities. To enable quick iteration in Temporal with our month-long workflows, we tried other solutions, which had too many failure modes:
- Patching with `if/else` is not comprehensive, and developers make mistakes
- Using `replayer.ReplayWorkflowHistoryFromJSONFile` in CI is great but has a race condition if a new workflow starts after CI passes but before your new code executes
- Versioning entire workflows leaves toilsome cleanup, especially at our rate of CD iteration (10 commits per day)
The Version Drain workflow has been running successfully in production for 5 months, performing hundreds of live workflow drains. Developers choose to version the task queue when they want to reduce risk. Because the drain workflow is idempotent, we also run it when the version is not bumped, which avoids the risk of a race condition (e.g., CI goes down for hours in the middle of a deployment and new code gets pushed on top of the old).
- Set the new compatible build version in the Temporal server
  - `BuildIDOpAddNewIDInNewDefaultSet` for new versions
  - `BuildIDOpPromoteSet` for existing versions (e.g., reverting to an old version)
- Use a query to find the running workflows with a specific `WorkflowType`. Filter out workflows that are already on the new version (this maintains idempotency)
- Execute `ContinuanceWorkflow` to ContinueAsNew all running workflows in parallel
- Poll, checking whether each workflow exits with `ContinuedAsNew` status
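The decision logic in the steps above can be sketched without the Temporal SDK. This is a minimal, standard-library-only model, not the project's implementation: `RunningWorkflow`, `chooseBuildIDOp`, `needsDrain`, and `waitForStatus` are hypothetical names, and the returned strings only echo the operation names from the list rather than calling any server API.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// RunningWorkflow is a hypothetical, simplified view of one open execution.
type RunningWorkflow struct {
	ID      string
	BuildID string // build ID the execution is currently assigned to
}

// chooseBuildIDOp mirrors the first step: a build ID the server has not seen
// gets a new default set, while a known one (e.g. a revert) has its set promoted.
func chooseBuildIDOp(known map[string]bool, target string) string {
	if known[target] {
		return "BuildIDOpPromoteSet"
	}
	return "BuildIDOpAddNewIDInNewDefaultSet"
}

// needsDrain keeps only executions that are not yet on the target build ID.
// Skipping the rest is what makes a repeated drain a no-op.
func needsDrain(running []RunningWorkflow, target string) []RunningWorkflow {
	var out []RunningWorkflow
	for _, wf := range running {
		if wf.BuildID != target {
			out = append(out, wf)
		}
	}
	return out
}

// waitForStatus polls getStatus until it returns want or the attempts run out.
func waitForStatus(getStatus func() string, want string, attempts int, interval time.Duration) error {
	for i := 0; i < attempts; i++ {
		if getStatus() == want {
			return nil
		}
		time.Sleep(interval)
	}
	return errors.New("timed out waiting for status " + want)
}

func main() {
	known := map[string]bool{"v1": true}
	fmt.Println(chooseBuildIDOp(known, "v2"))

	running := []RunningWorkflow{
		{ID: "order-1", BuildID: "v1"},
		{ID: "order-2", BuildID: "v2"},
	}
	for _, wf := range needsDrain(running, "v2") {
		fmt.Println("drain", wf.ID)
	}
}
```

Running `needsDrain` a second time after every workflow has moved to the target build ID returns an empty slice, which is the property that lets the drain be invoked on every deployment.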
The following are requirements of your system before invoking `QueueDrainWorkflow`:
- `WorkflowType` must be able to receive `ContinueAsNewSignal` and checkpoint itself for continuing
- The drain workflow must be called separately for each `WorkflowType` you want to version
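The first requirement, receiving a signal and checkpointing, can be modeled with plain Go channels. This is only an illustrative sketch under assumptions: `Checkpoint` and `runUntilSignaled` are hypothetical, and a plain channel stands in for the signal. In a real Temporal workflow the signal would typically be read via `workflow.GetSignalChannel`, and the checkpoint would be passed to the next run via `workflow.NewContinueAsNewError`.

```go
package main

import "fmt"

// Checkpoint is hypothetical state a workflow hands to its next run.
type Checkpoint struct {
	NextStep int
}

// runUntilSignaled processes steps from cp.NextStep up to totalSteps.
// Before each step it checks continueSignal without blocking. If a signal is
// pending, it stops and returns the checkpoint with continued=true so the
// caller can start a fresh run from that point.
func runUntilSignaled(cp Checkpoint, totalSteps int, continueSignal <-chan struct{}) (Checkpoint, bool) {
	for cp.NextStep < totalSteps {
		select {
		case <-continueSignal:
			return cp, true
		default:
			// No signal pending: fall through and do the next step.
		}
		// Placeholder for the real work of one step.
		cp.NextStep++
	}
	return cp, false
}

func main() {
	sig := make(chan struct{}, 1)
	sig <- struct{}{} // simulate the drain workflow sending the signal

	cp, continued := runUntilSignaled(Checkpoint{NextStep: 3}, 10, sig)
	fmt.Println(cp.NextStep, continued)
}
```

The point of the design is that the workflow only stops at a step boundary, so the checkpoint always describes a state the next run can resume from.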
See the ExampleContinueWorkflow and worker to get started.
- History size is clipped because the mechanism uses ContinueAsNew, which helps workflow performance
- There are only ever at most 2 versions running in production concurrently (during a migration), so cognitive complexity is very low compared to other versioning solutions