Why
There are situations where we need to perform a large number of puts, for instance:
khepri_mnesia_migration when it copies tables from Mnesia to Khepri.
RabbitMQ when a user imports definitions or when thousands of queues start at once.
Currently, this means a lot of commands are committed to the underlying Ra server. This puts load on the server itself and may trigger snapshots frequently.
One solution is to use Khepri transactions. However, transactions have constraints that may not be acceptable for the kinds of uses listed at the beginning.
How
It would be nice to have an API to batch many puts and/or deletes and send them to the Ra server in a single command.
Here is a proposal through an example:
NewBatch = khepri_batch:new(StoreId, Options),

%% Add put and delete commands to the batch.
%%
%% This doesn't have the properties of a transaction: if the code reads from
%% the database, the returned values won't take the commands in the batch into
%% account.
%%
%% The code could perform any calls with side effects if it needs to.
FinalBatch = lists:foldl(
               fun({Path, Term}, Batch0) ->
                       Batch1 = khepri_batch:put(Batch0, Path, Term),
                       %% Or, delete:
                       %% Batch1 = khepri_batch:delete(Batch0, Path),
                       Batch1
               end, NewBatch, ListOfTerms),

%% The set of commands can be committed partially, or entirely/not at all
%% (configurable behavior).
{ok, PerCommandRets} = khepri_batch:commit(FinalBatch).

%% In the case of the batched simple puts above, the list of return values
%% would be:
%% [ok, ok, ok, ...] = PerCommandRets
Therefore, the idea would be to introduce a khepri_batch module. The caller would use it to prepare a batch of commands (puts and deletes only, no reads) and commit it in a single Ra command.
The caller could choose the atomicity: either batched commands are applied individually and a failure of one command doesn't affect the others, or batched commands are applied in an "all or nothing" way.
Also, to avoid batching millions of commands, the caller could set an option to commit automatically every N commands.
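As a rough illustration of those two knobs, the options passed to khepri_batch:new/2 could look something like this; the option names below are placeholders made up for the example, not a settled API:

%% Hypothetical options; names are illustrative only.
Options = #{atomic => false,            %% apply commands individually
            auto_commit_every => 1000}, %% implicit commit every N commands
NewBatch = khepri_batch:new(StoreId, Options),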
dumbbell changed the title from "Add a batch API to push many puts/deletes in a single call" to "Add a batch API to push many serial puts/deletes from a single process" on May 2, 2023.
I just updated the issue description with the current state of the API I'm working on.
Like the regular and transaction APIs, the batch API would offer both a simple (khepri_batch) and an advanced (khepri_batch_adv) API. The only difference would be the return values, exactly as with the other APIs.
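To make the difference concrete, here is a guess at what the two commit calls could return, following the pattern of the existing khepri and khepri_adv modules; the exact shape of the advanced return values is an assumption, not the final design:

%% SimpleBatch and AdvBatch are assumed to be already prepared batches.
%% Simple API: one bare result per batched command.
{ok, SimpleRets} = khepri_batch:commit(SimpleBatch),
[ok, ok, ok] = SimpleRets,

%% Advanced API: each entry would carry node properties instead, mirroring
%% khepri_adv; the exact shape here is assumed for illustration.
{ok, AdvRets} = khepri_batch_adv:commit(AdvBatch),
[{ok, #{payload_version := _}} | _] = AdvRets,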
What do you think of the general API? Does it comfortably fit the use cases we currently have?
For the concept of atomicity, I wonder if instead of a boolean flag, we could offer three options:
The batch is successfully applied entirely or everything is reverted.
The batch is applied entirely, regardless of individual command return values; all successes and failures are returned.
The batch is applied until a command fails: previously applied commands are kept, but following commands are ignored.
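One possible way to express these three behaviors, with atom names invented purely for the sake of discussion, would be a single option instead of a boolean flag:

%% Atom names are placeholders, not a proposal for the final API.
Opts1 = #{on_error => revert},   %% 1. all or nothing: revert everything
Opts2 = #{on_error => continue}, %% 2. apply all, return each success/failure
Opts3 = #{on_error => stop},     %% 3. keep what was applied, skip the rest
Batch = khepri_batch:new(StoreId, Opts1),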