Related issues from the cartridge project: Raft failover, vshard router master=auto.
Why do you need this duplication then? You can subscribe for the … It will be changed only when one changes the config on the node, isn't it? And you propose to check if it is …
It was decided not to patch vshard's … Instead, it is going to be extracted into an external module … The good side is that it will probably improve code re-use between vshard and net.replicaset. Bad sides are: …
The related issue is #313. The discussion starts with a description of how the task looks in my understanding. Then I provide my vision of the API and behaviour, some insights into the internals, frequently asked questions, and alternatives.
## Problems with how it works now
The router needs to find out which replica in each replicaset is the master. It used to be done manually via the router's config, but that is too fragile: in case of any config update issue the router would be stuck with wrong information about who the master is. The least of the problems would be that it couldn't execute write requests on that replicaset.
Then the `master = 'auto'` feature was introduced. The router became able to discover the master automatically. It solves the problem on a high level, but it is polling-based: discovery happens with a certain period by calling a function on the storages. If a change happens between the discovery calls, the router won't see it. To mitigate the problem the router resorts to implicit master discovery hacks. For example, an error about a replica being read-only also contains whom that replica considers the master. The router then might be able to switch to it and retry the original user's request transparently.
Besides, the polling-based solution is simply complicated code-wise. The discovery requests have to be sent from a fiber, because netbox doesn't support async requests with callbacks. And even if netbox supported them, it would be expensive, because such a request would have to be a long-polling request sleeping on the storage while waiting for changes. It would be -1 to `box.cfg.net_msg_max` and +1 fiber doing nothing on every replica, for each router in the cluster.

## How it should work
Since the introduction of the box watchers feature (appears in >= 2.10.0-beta2), Tarantool is able to provide subscription endpoints. An arbitrary key-value pair can be set on a Tarantool instance using the `box.broadcast(key, value)` API. Local users and remote connections can subscribe on the `key` and get updates every time the `value` changes. These APIs are `netbox_conn:watch()` and `box.watch()`. Tarantool will provide some built-in events in the future, but currently that is not done even in master.
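To make the machinery concrete, here is a minimal sketch of these APIs; the key name `my_key` and the URI are placeholders:

```lua
-- Publisher side (requires Tarantool >= 2.10.0-beta2): set a key-value
-- pair visible to all subscribers.
box.broadcast('my_key', {answer = 42})

-- Local subscriber: the callback fires once with the current value
-- (nil if the key was never broadcast) and then on every change.
local handle = box.watch('my_key', function(key, value)
    print(key, value and value.answer)
end)

-- Remote subscriber over netbox works the same way.
local netbox = require('net.box')
local conn = netbox.connect('localhost:3301')
local remote = conn:watch('my_key', function(key, value)
    -- Invoked on subscription and on every broadcast of 'my_key'.
end)

-- Watchers can be cancelled when no longer needed.
handle:unregister()
remote:unregister()
```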
The idea is that vshard should define its own events. They would contain vshard-specific info based on the planned built-in events. The first proposed one is `vshard.storage.election`.

Its fields `term`, `role`, `leader`, and `is_ro` are related to automatic leader election. It is not supported in vshard for now, but it will be in the future, so it is better to expose these fields right away. They repeat the future planned built-in event called `box.election`.

The field `is_master_cfg` is vshard-specific. It is not the same as 'leader', or even as being actually writable. It is just what is specified in the storage's config in its `[replica_uuid] = {master = <bool>, ...}` entry. The router will use this field to detect who the master is when the field is present. With automatic elections the idea is that `is_master_cfg` will just be `nil`.
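A hypothetical sketch of how the storage side could broadcast such an event; the helper name and the exact value layout are assumptions based on the fields listed above, not the final format:

```lua
-- Hypothetical: publish the proposed event from the storage.
local function broadcast_election_state(cfg_is_master)
    box.broadcast('vshard.storage.election', {
        term = box.info.election.term,
        role = box.info.election.state,   -- 'leader' / 'follower' / 'candidate'
        leader = box.info.election.leader,
        is_ro = box.info.ro,
        -- Taken from the [replica_uuid] = {master = <bool>} config entry;
        -- would be nil when elections are automatic.
        is_master_cfg = cfg_is_master,
    })
end
```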
## API and behaviour

### Storage

`vshard.storage.election` is a public storage event available for subscriptions. It will be documented and will be used by the routers. Other connectors are free to use it as well, although some fields might be documented as 'risky' to rely on. For example, `is_master_cfg` might disappear in some version far in the future or become optional.

The event works for versions >= 2.10.0-beta2. It is fired every time any of the event's fields changes.
Before the vshard storage is configured, the event returns `nil`, as documented for `box.broadcast`/`box.watch`. Connectors just need to be ready for that and treat it as if the event has never been fired yet, or as if the storage is simply of an older version and does not have this event at all.

When a subscription is attempted on an instance < 2.10.0-beta2, it fails with an unknown request type error. Netbox is able to handle it. Other connectors should adapt too if they want subscriptions.
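For illustration, this is how a subscriber might treat the `nil` case; the URI is a placeholder and the fallback is only sketched:

```lua
local netbox = require('net.box')

local conn = netbox.connect('storage:3301')
conn:watch('vshard.storage.election', function(key, state)
    if state == nil then
        -- Either vshard.storage.cfg() has not been called yet, or the
        -- storage is too old to define the event. Behave as if the
        -- event does not exist, e.g. keep using polling.
        return
    end
    -- From here on `state` carries the election fields described above.
end)
```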
### Router
The router can't drop polling-based master discovery, because it supports Tarantool versions >= 1.10.1 and vshard storage versions <= 0.1.19. The polling is not going anywhere in the foreseeable future. It means the router will need to support both polling and events. However, it might not be as complicated as it sounds.
Consider the connections to one replicaset. They have `on_connect` triggers installed by the router. In the trigger it is possible to check the `conn.peer_protocol_features.watchers` field; by the time the trigger is called, the protocol features are already revealed.

For each connection, in `on_connect` check whether `peer_protocol_features.watchers` is true, and if so, subscribe on `vshard.storage.election`. If any replica does not support watchers, and until non-nil data is received for this event on every connection, the entire replicaset uses polling for master discovery. Having the events work on only a subset of nodes wouldn't simplify anything.

Now assume the feature is true on all connections and that all replicas have received something for `vshard.storage.election`. Then the last replica switches the replicaset to event-based master discovery. Polling won't be used for it.

In case of a disconnect, the replica is considered to be in the same state until a reconnect happens, except that it probably will stop being considered the master.

When a new event arrives and the replica becomes the master or stops being the master, the router changes the `replicaset.master` field right from the event callback. The same procedure happens for each replicaset, as the sketch below shows.
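A hedged sketch of that per-connection logic; `replica`, `replicaset`, and the trigger name are illustrative stand-ins, not actual vshard internals:

```lua
-- Hypothetical router-side trigger; `replica` and `replicaset` are
-- illustrative tables, not real vshard structures.
local function on_storage_connect(conn, replica, replicaset)
    -- Protocol features are already known when on_connect fires.
    if not conn.peer_protocol_features.watchers then
        -- At least one replica can't send events: keep polling.
        replicaset.is_polling = true
        return
    end
    conn:watch('vshard.storage.election', function(key, state)
        if state == nil then
            -- Storage is not configured yet or is too old.
            return
        end
        replica.election_state = state
        -- Update the master right from the event callback.
        if state.is_master_cfg then
            replicaset.master = replica
        elseif replicaset.master == replica then
            replicaset.master = nil
        end
        -- Once all replicas have reported non-nil state, the router
        -- could switch this replicaset to event-based discovery.
    end)
end
```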
To sum up: the user still just sets `master = 'auto'` in the router's config, like before. The router will choose between polling and events internally.
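For reference, a minimal router config sketch with automatic master discovery; the UUIDs, URIs, and names are placeholders:

```lua
local vshard = require('vshard')

-- Placeholder UUIDs and URIs; only `master = 'auto'` matters here.
vshard.router.cfg({
    bucket_count = 3000,
    sharding = {
        ['cbf06940-0790-498b-948d-042b62cf3d29'] = {
            master = 'auto',
            replicas = {
                ['8a274925-a26d-47fc-9e1b-af88ce939412'] = {
                    uri = 'storage@127.0.0.1:3301',
                    name = 'storage_1_a',
                },
                ['3de2e3e1-9ebe-4d0d-abb1-26d301b84633'] = {
                    uri = 'storage@127.0.0.1:3302',
                    name = 'storage_1_b',
                },
            },
        },
    },
})
```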
## FAQ
Why is the event called `vshard.storage.election` instead of `vshard.election`?

I think the router in the future might want to expose its own events too. One feature of events is that they don't need `box.cfg` to be called, which means the router could potentially have its own. Given that a router and a storage can be hosted in the same process, it seems logical to split their event namespaces: `vshard.storage.*` and `vshard.router.*`.

## Alternatives
### Netbox async call with a callback
There was an idea (it still exists) to make netbox able to take a callback along with the `is_async` option and call it when the request is finished. It would allow sending long-polling requests to the storages, which would wait for changes and do `return` when something happens. On the router the result would be processed in the callback.

There are problems with that when trying to use it for subscriptions:

- each waiting request takes -1 from `box.cfg.net_msg_max` on the storage;
- each waiting request occupies +1 fiber doing nothing on the storage, for every router in the cluster, on every replica.

Subscriptions don't have any of these problems.
### Split vshard versions like Tarantool does

Develop vshard in 2 branches: the first for Tarantool < 2.10.0-beta2, the second for >= that version. The first one wouldn't have event support in its code at all; this is the same as what is in master now. The second one wouldn't have polling at all; everything would be event-based.

This might simplify the code. However, it is a last-resort measure in case of serious complications with making polling and events cooperate. It might simplify each version, but in total it means twice as much code to support, more packages to produce, and more CI work.