Currently, the library contains a query planner and an executor for distributed SQL in a Tarantool Cartridge cluster.
Currently, Sbroad is only supported by the Picodata distribution of Tarantool. You can download it here.
First you need to have the `rust`, `tarantool` and `cartridge-cli` CLI tools installed and available in your environment, plus some other development packages Sbroad depends on, including `openssl`, `libicu` and `libreadline`.
An example of the `sbroad` integration with Tarantool can be found in the `test_app` folder. Check out its separate README.md for using Sbroad with the test Cartridge application.
```bash
tarantoolctl rocks --only-server https://download.picodata.io/luarocks/ install sbroad <version>
```

(use one of the version numbers in Tags)
For a Cartridge application, add the command above to the `cartridge.pre-build` file and the sbroad roles to your role's dependencies (see `test_app` for an example).
```bash
git clone https://gitlab.com/picodata/picodata/sbroad.git
cd sbroad
make test_all
```
The `cartridge.roles.sbroad-storage` role initializes functions that accept a local SQL query from the router for execution. It depends on the `vshard-storage` role.
The `cartridge.roles.sbroad-router` role initializes functions that transform a distributed SQL query into a sequence of local SQL queries and dispatch them to the storages. It depends on the `vshard-router` role.
The `sbroad` library consists of three main parts:
- SQL frontend
- planner
- executor
We try to keep the planner independent of the other modules, so that, for example, another frontend can be implemented on top of the already existing planner and executor.
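For illustration only, here is a minimal Rust sketch of that separation; all the names in it are invented for the example and are not sbroad's real API. The point is that the planner and executor only depend on the IR, so any frontend that produces IR can be plugged in.

```rust
// Illustrative sketch only: the names below are invented to show the
// idea of a pluggable frontend; they are not sbroad's real API.

/// A planner-friendly intermediate representation of a query.
struct Ir {
    nodes: Vec<String>, // toy stand-in for real IR nodes
}

/// Anything that can turn a query in some language into the IR.
trait Frontend {
    fn to_ir(&self, query: &str) -> Result<Ir, String>;
}

/// The SQL frontend is just one implementation of the trait.
struct SqlFrontend;

impl Frontend for SqlFrontend {
    fn to_ir(&self, query: &str) -> Result<Ir, String> {
        // A real frontend would build the PT, the AST and only then the IR.
        Ok(Ir { nodes: vec![format!("parsed: {query}")] })
    }
}

/// The planner and executor only ever see the IR, so they stay
/// independent of whichever frontend produced it.
fn plan_and_execute(frontend: &dyn Frontend, query: &str) -> Result<(), String> {
    let ir = frontend.to_ir(query)?;
    println!("planning {} IR node(s)", ir.nodes.len());
    Ok(())
}

fn main() -> Result<(), String> {
    plan_and_execute(&SqlFrontend, "SELECT a FROM t")
}
```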
More information about the ideas behind `sbroad` can be found in the Sbroad internal design presentation. You can also find more details in the built-in module documentation:
```bash
cargo doc --open
```
We use a custom PEG for the `pest` parser generator to compile an SQL query and a `vshard` cluster schema into the planner's intermediate representation (IR).
The SQL query passes through three main steps:
- the parse tree (PT) iterator, produced by the `pest` parser generated from our grammar (see the toy example below);
- the abstract syntax tree (AST), a wrapper over the PT nodes that transforms them into a more IR-friendly tree (we also have our own iterator to traverse the AST in a convenient order);
- the planning intermediate representation (IR), a complete, self-describing tree with the information derived from the SQL query and the table schema; it can be used by the planner for further transformations.
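As a toy illustration of the first step, the sketch below feeds a deliberately tiny grammar (not sbroad's real one) to the `pest` parser generator and walks the resulting parse-tree pairs. It assumes the `pest` and `pest_derive` crates.

```rust
use pest::Parser;
use pest_derive::Parser;

// A minimal, made-up SQL grammar, only for demonstration purposes.
#[derive(Parser)]
#[grammar_inline = r#"
WHITESPACE = _{ " " }
ident      = @{ ASCII_ALPHA+ }
select     =  { ^"select" ~ ident ~ ^"from" ~ ident }
query      =  { SOI ~ select ~ EOI }
"#]
struct ToySqlParser;

fn main() {
    // `parse` yields an iterator over parse-tree pairs; a frontend would
    // walk these pairs to build the AST and then the IR.
    let pairs = ToySqlParser::parse(Rule::query, "select a from t")
        .expect("query should match the toy grammar");
    for pair in pairs.flatten() {
        println!("{:?}: {}", pair.as_rule(), pair.as_str());
    }
}
```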
The main goal of the planner is to insert data motion nodes into the IR tree, instructing the executor where to transfer which portions of data from one Tarantool instance to another within the cluster.
To make these motions efficient (i.e. to transfer as little data as possible), we rely on tuple distribution information. When we detect a distribution conflict, we resolve it by inserting a motion node. The better the distribution information we have for every tuple in the IR tree, the less data we move across the cluster. To save on data transfer, the planner also applies many IR transformations besides motion derivation.
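To make the idea concrete, here is a heavily simplified sketch of conflict resolution. The `Distribution`, `Node` and `resolve_conflict` names are invented for this example and do not reflect sbroad's actual IR.

```rust
// A toy model of distribution-conflict resolution; sbroad's real IR and
// distribution types are richer than this.

#[derive(Clone, Debug, PartialEq)]
enum Distribution {
    /// Tuples are sharded by the listed key columns.
    Segment { key: Vec<String> },
    /// Distribution is unknown, so matching tuples may live anywhere.
    Any,
}

#[derive(Debug)]
enum Node {
    Scan { table: String, dist: Distribution },
    /// Motion re-shards its child's output to the target distribution.
    Motion { child: Box<Node>, target: Distribution },
}

/// If a join input is not already segmented by the join key, wrap it in
/// a Motion so that matching tuples end up on the same instance.
fn resolve_conflict(input: Node, join_key: &[String]) -> Node {
    let dist = match &input {
        Node::Scan { dist, .. } => dist.clone(),
        Node::Motion { target, .. } => target.clone(),
    };
    match dist {
        // No conflict: the input is already sharded by the join key.
        Distribution::Segment { ref key } if key == join_key => input,
        // Conflict: insert a motion that re-shards by the join key.
        _ => Node::Motion {
            child: Box::new(input),
            target: Distribution::Segment { key: join_key.to_vec() },
        },
    }
}

fn main() {
    let scan = Node::Scan {
        table: "orders".into(),
        dist: Distribution::Segment { key: vec!["customer_id".into()] },
    };
    // Joining on "id" conflicts with the "customer_id" segmentation,
    // so a Motion node gets inserted above the scan.
    println!("{:?}", resolve_conflict(scan, &["id".to_string()]));
}
```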
The executor is located on the coordinator node of the `vshard` cluster. It collects all the intermediate results of the plan execution in memory and executes the IR plan tree in a bottom-up manner. It goes like this (a code sketch follows the list):
1. The executor collects all the motion nodes from the bottom layer. In theory, all the motions in the same layer can be executed in parallel (this feature is yet to come).
2. For every motion, the executor:
   - inspects the IR sub-tree and detects the buckets to execute the query on;
   - performs a map-reduce for that IR sub-tree (sending it to the shards deduced from the buckets):
     - serializes the IR sub-tree and sends it with MessagePack;
     - deserializes it on the storage and builds a valid SQL query;
     - executes the SQL query and returns the results;
   - builds a virtual table with the query results that correspond to the original motion.
3. Moves to the next motion layer in the IR tree.
4. For every motion, the executor then:
   - links in the virtual tables with the results of the motions from the previous layer it depends on;
   - inspects the IR sub-tree and detects the buckets to execute the query on;
   - performs a map-reduce for that IR sub-tree:
     - on the map stage, uses the virtual tables to build temporary spaces (populated with the dispatched virtual tables' data);
     - builds and executes a valid SQL query from the IR sub-tree and these new temporary spaces;
     - removes the temporary spaces;
   - builds a virtual table with the query results that correspond to the original motion.
5. Repeats steps 3 and 4 until all the motion layers are processed.
6. Executes the final top IR sub-tree and returns the final result to the user.
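The following sketch condenses that loop into code. Everything in it (`VirtualTable`, `Motion`, `map_reduce`) is an invented placeholder for the real machinery, not sbroad's executor API.

```rust
use std::collections::HashMap;

/// Materialized intermediate result of a motion.
struct VirtualTable {
    rows: Vec<Vec<String>>,
}

/// A motion node together with its dependencies from lower layers.
struct Motion {
    id: usize,
    // ids of motions whose virtual tables must be linked in first
    deps: Vec<usize>,
}

/// Stand-in for the map-reduce stage: serialize the sub-tree, send it to
/// the shards deduced from the buckets, materialize the dependencies as
/// temporary spaces, run the local SQL and merge the results back.
fn map_reduce(motion: &Motion, done: &HashMap<usize, VirtualTable>) -> VirtualTable {
    let linked = motion.deps.iter().filter(|d| done.contains_key(*d)).count();
    println!("motion {}: linked {} virtual table(s)", motion.id, linked);
    VirtualTable { rows: Vec::new() }
}

/// Walk the motion layers bottom-up; every layer only depends on virtual
/// tables produced by the layers below it.
fn execute(layers: &[Vec<Motion>]) -> HashMap<usize, VirtualTable> {
    let mut done = HashMap::new();
    for layer in layers {
        // In theory every motion in a layer could run in parallel.
        for motion in layer {
            let vtable = map_reduce(motion, &done);
            done.insert(motion.id, vtable);
        }
    }
    done
}

fn main() {
    // Two layers: motion 2 consumes the virtual table produced by motion 1.
    let layers = vec![
        vec![Motion { id: 1, deps: vec![] }],
        vec![Motion { id: 2, deps: vec![1] }],
    ];
    let done = execute(&layers);
    let total: usize = done.values().map(|v| v.rows.len()).sum();
    println!("materialized {} virtual table(s), {} row(s)", done.len(), total);
}
```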
The most complicated logic here lives in the IR-to-SQL serialization (the SQL backend) and in deducing the buckets on which to execute the resulting SQL query.
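As a rough illustration of bucket deduction, the sketch below assumes an equality filter on the sharding key; the hash function and the bucket count are made up for the example, whereas a real cluster would use vshard's sharding function and configured bucket count.

```rust
// Invented constant: a real cluster takes this from the vshard config.
const BUCKET_COUNT: u64 = 3000;

/// Which buckets a sub-tree has to be executed on.
enum Buckets {
    /// The filter pins the full sharding key, so a single bucket suffices.
    Single(u64),
    /// Otherwise the query has to be broadcast to every bucket.
    All,
}

/// Toy stand-in for the sharding function.
fn bucket_id(sharding_key: &str) -> u64 {
    let hash: u64 = sharding_key.bytes().map(u64::from).sum();
    hash % BUCKET_COUNT + 1
}

/// Deduce the buckets from an (optional) equality filter on the sharding key.
fn deduce_buckets(filter_on_key: Option<&str>) -> Buckets {
    match filter_on_key {
        Some(value) => Buckets::Single(bucket_id(value)),
        None => Buckets::All,
    }
}

fn main() {
    match deduce_buckets(Some("customer-42")) {
        Buckets::Single(id) => println!("send to the shard owning bucket {id}"),
        Buckets::All => println!("broadcast to all shards"),
    }
}
```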