Description
The Topology Service provides two core features in Vitess:
Storing metadata about the cluster: which processes live where, what role each is currently serving, how each should be configured, and so on. This covers the Cells, Keyspaces, Shards, Tablets, and associated metadata (e.g. vschema) that make up the cluster.
A distributed lock/coordination service: a means for the loosely coupled processes within the Vitess cluster to safely coordinate on tasks.
A simple example is updating Keyspace configuration options, where a Keyspace lock is used.
More complex examples include reparenting a shard, where a Shard lock is taken and a failover is performed so that the tablet types and shard configuration are updated atomically, and switching traffic in a VReplication workflow, where a Keyspace lock is taken on the source and target keyspaces so that routing rules, shard records, vreplication state, etc. are updated atomically. The latter is a good example of an operation that affects many keys across the topology server, so it is not simply a matter of locking a specific key.
It's these more complex cases that are problematic. We have to wait for state across processes to converge (e.g. replication catching up on N shards, where N can be in the thousands) before proceeding to the next step, and that can easily take 30-60+ seconds. During that time the lock may actually be lost, at which point the behavior becomes undefined.
Within Vitess we use TTL (time-to-live) values with locks, and these differ across the topo implementations:
ZooKeeper has no lock TTLs, so this is not an issue there. You hold the lock until you release it or your session ends.
Consul has session TTLs that we specify when requesting the lock. The lock is held until you release it, the session TTL is reached, or your session ends. The default TTL is 15 seconds, which comes from the --topo_consul_lock_session_ttl flag.
With Etcd we obtain a lease with a TTL when getting the lock and tie that lease to the lock (which in turn is an ephemeral KV). We auto-extend the lease via client/server keep-alive cycles until you release the lock, the context used is cancelled, the TTL is reached, or the session ends. The default TTL is 30 seconds, which comes from the --topo_etcd_lease_ttl flag.
All three topo server implementations use client/server sessions with some form of keep-alive cycle. When the client/session is lost (e.g. the client process that took the lock, such as vtctld or vttablet, crashes), the lock is removed/released because the client performs no more keep-alive work.
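The TTL and keep-alive behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `lease` type and the `heldAfterWork` helper are hypothetical names invented for illustration, time is simulated rather than real, and none of this is the actual Vitess, Consul, or Etcd client code.

```go
package main

import (
	"fmt"
	"time"
)

// lease is a hypothetical in-memory stand-in for a topo lock that carries a TTL.
type lease struct {
	ttl     time.Duration
	expires time.Time
}

// acquire starts a lease at the given (simulated) time.
func acquire(now time.Time, ttl time.Duration) *lease {
	return &lease{ttl: ttl, expires: now.Add(ttl)}
}

// held reports whether the lease is still valid at the given time.
func (l *lease) held(now time.Time) bool {
	return now.Before(l.expires)
}

// keepAlive extends the lease by one TTL, but only if it has not already expired.
func (l *lease) keepAlive(now time.Time) bool {
	if !l.held(now) {
		return false
	}
	l.expires = now.Add(l.ttl)
	return true
}

// heldAfterWork simulates work lasting workSecs seconds under a lock with the
// given TTL. A keep-alive is sent every keepAliveSecs seconds (0 means never).
// It reports whether the lock is still held when the work finishes.
func heldAfterWork(ttlSecs, workSecs, keepAliveSecs int) bool {
	start := time.Unix(0, 0)
	l := acquire(start, time.Duration(ttlSecs)*time.Second)
	if keepAliveSecs > 0 {
		for s := keepAliveSecs; s <= workSecs; s += keepAliveSecs {
			l.keepAlive(start.Add(time.Duration(s) * time.Second))
		}
	}
	return l.held(start.Add(time.Duration(workSecs) * time.Second))
}

func main() {
	// Fixed 15s TTL, no renewal, 60s of work: the lock is gone before we finish.
	fmt.Println(heldAfterWork(15, 60, 0)) // false
	// 30s TTL renewed every 10s: the lock survives the 60s of work.
	fmt.Println(heldAfterWork(30, 60, 10)) // true
	// 30s TTL but the first keep-alive arrives after 45s: too late to renew.
	fmt.Println(heldAfterWork(30, 60, 45)) // false
}
```

The first and third cases are the dangerous ones: the work finishes as if the lock were still held, and nothing in the worker's own control flow tells it otherwise.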
Problems
Locks are sometimes not held long enough. The lock is lost without the caller being aware of it, which leads to N processes performing actions while each assumes it holds an exclusive lock on related resources. This leads to undefined behavior and can cause very serious problems.
There are various related timeouts in place across processes and actions. For example, vtctld and vttablet have the --topo_etcd_lease_ttl flag, which determines the TTL for any lock they take, while the SwitchTraffic command has a --timeout flag and VDiff has a --filtered-replication-wait-time flag, both of which determine how long to wait for replication to catch up. These interrelated timeouts exist throughout the code base, and it's not clear to users when they are putting consistency at risk, e.g. by using a command timeout larger than the TTL when doing a traffic switch. The full set of lock-related flags and behaviors is not well understood or documented, which poses challenges for Vitess developers and users alike. Only the caller knows all of this context, so the caller needs a way to override the default TTL for a given lock.
You may need/want to coordinate on work that is not related to data stored directly in the topology server. Today you can only lock a topo entity/key (e.g. a Keyspace record).
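The second problem above, a command timeout that exceeds the lock TTL, can be sketched as a simple check plus a caller-side TTL calculation. The function names `ttlAtRisk` and `lockTTLFor` and the safety margin are assumptions made for illustration; they are not existing Vitess functions or flags.

```go
package main

import (
	"fmt"
	"time"
)

// ttlAtRisk reports whether a command could still be waiting (e.g. for
// replication to catch up) after the lock's TTL has elapsed.
func ttlAtRisk(lockTTL, cmdTimeout time.Duration) bool {
	return cmdTimeout > lockTTL
}

// lockTTLFor returns the TTL a caller might request: the default TTL, unless
// the command timeout plus a safety margin is longer. The margin is an
// assumption of this sketch, not something the proposal specifies.
func lockTTLFor(defaultTTL, cmdTimeout, margin time.Duration) time.Duration {
	if needed := cmdTimeout + margin; needed > defaultTTL {
		return needed
	}
	return defaultTTL
}

func main() {
	defaultTTL := 30 * time.Second // the default Etcd lease TTL cited above
	timeout := 60 * time.Second    // a hypothetical command timeout

	fmt.Println(ttlAtRisk(defaultTTL, timeout))                   // true
	fmt.Println(lockTTLFor(defaultTTL, timeout, 5*time.Second))   // 1m5s
	fmt.Println(lockTTLFor(defaultTTL, 10*time.Second, 5*time.Second)) // 30s
}
```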
Proposal
Provide the following new mechanisms to improve topology server locking:
Provide a way to coordinate on work that is not directly related to a topo entity/key.
Provide a way to allow a caller to override any default lock TTL.
Proof-Of-Concept
That PR implements the two proposals by:
Adding LockWithTTL to the topology server interface.
Both of those are then used to improve the locking done by VReplication. We leverage the named locks to take locks on a workflow to coordinate across the VReplication and VDiff engines. VDiff then no longer blocks any unrelated operations (the Keyspace lock blocks work on other workflows in the keyspace, schema changes, keyspace config changes, etc.) and the lock is held as long as needed. We leverage LockWithTTL to ensure that during traffic switching operations we hold the Keyspace lock as long as needed, based on the command's timeout.
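To make the two mechanisms concrete, here is a minimal in-memory sketch of how a named lock and a caller-supplied TTL could fit together. Everything in it is an assumption for illustration: the `memTopo` type, the method signatures, the `named_locks/` path prefix, and the keyspace and workflow names are hypothetical and are not taken from the proof-of-concept PR or the real Vitess topo interface.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errLocked = errors.New("lock is already held")

type lockEntry struct {
	owner   int
	expires time.Time
}

// memTopo is a hypothetical in-memory stand-in for a topology server.
type memTopo struct {
	mu        sync.Mutex
	nextOwner int
	locks     map[string]lockEntry
	now       func() time.Time // injectable clock, so the demo is deterministic
}

func newMemTopo(now func() time.Time) *memTopo {
	return &memTopo{locks: map[string]lockEntry{}, now: now}
}

// LockWithTTL locks path for at most ttl and returns an unlock function.
func (t *memTopo) LockWithTTL(path string, ttl time.Duration) (func(), error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if e, ok := t.locks[path]; ok && t.now().Before(e.expires) {
		return nil, errLocked
	}
	t.nextOwner++
	owner := t.nextOwner
	t.locks[path] = lockEntry{owner: owner, expires: t.now().Add(ttl)}
	return func() {
		t.mu.Lock()
		defer t.mu.Unlock()
		// Release only if we still own the lock; it may have expired and been retaken.
		if e, ok := t.locks[path]; ok && e.owner == owner {
			delete(t.locks, path)
		}
	}, nil
}

// LockName takes a named lock that is not tied to any topo entity/key.
func (t *memTopo) LockName(name string, ttl time.Duration) (func(), error) {
	return t.LockWithTTL("named_locks/"+name, ttl)
}

// runDemo reports, in order, whether each of four lock attempts succeeded.
func runDemo() []bool {
	topo := newMemTopo(func() time.Time { return time.Unix(0, 0) })

	// A VDiff-style caller coordinates on the workflow only.
	unlockWorkflow, err := topo.LockName("commerce/wf1", 30*time.Second)
	ok1 := err == nil
	// A traffic switch takes the Keyspace lock with a TTL sized to its timeout.
	_, err = topo.LockWithTTL("keyspaces/commerce", 90*time.Second)
	ok2 := err == nil
	// A second caller on the same workflow is blocked.
	_, err = topo.LockName("commerce/wf1", 30*time.Second)
	ok3 := err == nil
	// Once the first caller unlocks, the workflow lock can be taken again.
	if unlockWorkflow != nil {
		unlockWorkflow()
	}
	_, err = topo.LockName("commerce/wf1", 30*time.Second)
	ok4 := err == nil

	return []bool{ok1, ok2, ok3, ok4}
}

func main() {
	fmt.Println(runDemo()) // [true true false true]
}
```

The point of the sketch is the second result: the workflow lock and the Keyspace lock live under different keys, so holding one does not block the other.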