Add a batch API to push many serial puts/deletes from a single process #203

Open
dumbbell opened this issue May 2, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@dumbbell
Member

dumbbell commented May 2, 2023

Why

There are situations where we need to perform a large number of puts, for instance:

  • khepri_mnesia_migration when it copies tables from Mnesia to Khepri.
  • RabbitMQ when the user does a definition import or when thousands of queues start at once.

Currently, this means a lot of commands are committed to the underlying Ra server. This puts load on the server itself and may trigger snapshots frequently.

One solution is to use Khepri transactions. However, transactions have constraints that may not be acceptable for the kinds of uses listed above.

How

It would be nice to have an API to batch many puts and/or deletes and send them to the Ra server in a single command.

Here is a proposal through an example:

NewBatch = khepri_batch:new(StoreId, Options),

%% Add put and delete commands to the batch.
%%
%% This doesn't have the properties of a transaction: if the code reads from
%% the database, the returned values won't take the commands in the batch into
%% account.
%%
%% The code could perform any calls with side effects if it needs to.

FinalBatch = lists:foldl(
               fun({Path, Term}, Batch0) ->
                   Batch1 = khepri_batch:put(Batch0, Path, Term),
                   %% Or, delete:
                   %% Batch1 = khepri_batch:delete(Batch0, Path),
                   Batch1
               end, NewBatch, ListOfTerms),

%% The set of commands can be committed partially, or entirely/not at all
%% (configurable behavior).

{ok, PerCommandRets} = khepri_batch:commit(FinalBatch).

%% In the case of the batched simple puts above, the list of return values would be:
%% [ok, ok, ok, ...] = PerCommandRets

Therefore, the idea would be to introduce a khepri_batch module. The caller would use it to prepare a batch of commands (only puts and deletes, no reads) and commit it in a single command.

The caller could choose the atomicity: either batched commands are applied individually and a failure of one command doesn't affect the others, or batched commands are applied in an "all or nothing" way.

Also, to avoid batching millions of commands, the caller could set an option such as "commit automatically every N commands".
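
To make the "commit automatically every N commands" idea concrete, here is a minimal sketch. It is not the khepri_batch API: the module name auto_commit_sketch, its functions, and CommitFun (a stand-in for "send one command to the Ra server") are all hypothetical and exist only for illustration.

```erlang
-module(auto_commit_sketch).
-export([new/2, add/2, flush/1]).

%% Hypothetical illustration only; none of this is part of Khepri.
%%
%% Every:     number of pending commands that triggers an automatic commit.
%% CommitFun: stand-in for "send these commands to Ra as a single command".
new(Every, CommitFun)
  when is_integer(Every), Every > 0, is_function(CommitFun, 1) ->
    #{every => Every, commit_fun => CommitFun, pending => [], count => 0}.

%% Queue one command; commit automatically once `Every' commands are pending.
add(Command, #{every := Every, pending := Pending, count := Count} = Batch) ->
    Batch1 = Batch#{pending := [Command | Pending], count := Count + 1},
    case Count + 1 >= Every of
        true  -> flush(Batch1);
        false -> Batch1
    end.

%% Commit whatever is pending, in insertion order. No-op on an empty batch.
flush(#{pending := []} = Batch) ->
    Batch;
flush(#{commit_fun := CommitFun, pending := Pending} = Batch) ->
    _ = CommitFun(lists:reverse(Pending)),
    Batch#{pending := [], count := 0}.
```

With this shape, the caller still has to call flush/1 once at the end to commit the remainder, which is the role khepri_batch:commit/1 plays in the proposal above.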

@dumbbell dumbbell added the enhancement New feature or request label May 2, 2023
@dumbbell dumbbell self-assigned this May 2, 2023
@dumbbell dumbbell changed the title Add a batch API to push many puts/deletes in a single call Add a batch API to push many serial puts/deletes from a single process May 2, 2023
@dumbbell
Member Author

I just updated the issue description with the current state of the API I'm working on.

Like the regular and transaction APIs, the batch API would offer both simple (khepri_batch) and advanced (khepri_batch_adv) APIs. The only difference will be the return values, exactly like the other APIs.

What do you think of the general API? Does it comfortably fit our current use cases?

For the concept of atomicity, I wonder if instead of a boolean flag, we could offer three options:

  • The batch is successfully applied entirely or everything is reverted.
  • The batch is applied entirely, regardless of a specific command return value; all successes and failures are returned.
  • The batch is applied until a command fails: previously applied commands are kept, but following commands are ignored.

Any opinion on this?
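
To help compare the three options, here is a toy model of how each would treat the same list of commands. It is only a sketch under loud assumptions: the store is a plain map rather than a Khepri tree, the module batch_modes_sketch and the mode names all_or_nothing, keep_going and stop_on_failure are hypothetical, and a command "fails" simply when its path is not a list.

```erlang
-module(batch_modes_sketch).
-export([apply_batch/3]).

%% Hypothetical illustration only; none of this is part of Khepri.

-type command() :: {put, term(), term()} | {delete, term()}.
-type mode() :: all_or_nothing | keep_going | stop_on_failure.
-type ret() :: ok | {error, term()} | skipped.

-spec apply_batch(mode(), [command()], map()) -> {map(), [ret()]}.
%% Option 1: apply everything, or revert to the original store.
%% The per-command results are still returned so the caller can see what failed.
apply_batch(all_or_nothing, Commands, Store) ->
    {Store1, Rets} = run(Commands, Store, false),
    case lists:all(fun(Ret) -> Ret =:= ok end, Rets) of
        true  -> {Store1, Rets};
        false -> {Store, Rets}
    end;
%% Option 2: apply every command regardless of earlier failures.
apply_batch(keep_going, Commands, Store) ->
    run(Commands, Store, false);
%% Option 3: keep what was applied, ignore everything after the first failure.
apply_batch(stop_on_failure, Commands, Store) ->
    run(Commands, Store, true).

%% Fold over the commands; the third accumulator element is the "stopped" flag.
run(Commands, Store, StopOnFailure) ->
    {Store1, RevRets, _Stopped} =
        lists:foldl(
          fun(_Command, {S, Rets, true}) ->
                  {S, [skipped | Rets], true};
             (Command, {S, Rets, false}) ->
                  case apply_command(Command, S) of
                      {ok, S1} -> {S1, [ok | Rets], false};
                      {error, _} = Error -> {S, [Error | Rets], StopOnFailure}
                  end
          end, {Store, [], false}, Commands),
    {Store1, lists:reverse(RevRets)}.

%% Toy failure rule: a path that is not a list is rejected.
apply_command({put, Path, Term}, Store) when is_list(Path) ->
    {ok, Store#{Path => Term}};
apply_command({delete, Path}, Store) when is_list(Path) ->
    {ok, maps:remove(Path, Store)};
apply_command(_Command, _Store) ->
    {error, invalid_path}.
```

One design question this surfaces is what the third option should return for commands that were never attempted; the sketch uses a skipped atom, but returning a shorter list would be another choice.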
