This repository has been archived by the owner on Dec 7, 2018. It is now read-only.

Functional specification

mdcallag edited this page Oct 24, 2014 · 8 revisions

Overview

This document describes the behavior of the RocksDB storage engine for MySQL, similar to what the MySQL manual does for InnoDB.

Other documents have been written to discuss issues or define features for early versions of the project including:

This is an extremely rough draft, and the following points require much more detail. I doubt that one huge wiki page is the right format, so one problem is figuring out how to split this across multiple wiki pages. As part of this effort we must also write regression tests to confirm that the engine behaves as described:

  • multi-statement transactions (TODO: does a pending commit have to fit in the memtable?). A transaction sees its own changes prior to commit. See ms2 for max_row_locks.
  • index only scans of secondary indexes
    • max length of an index key (single or multi-part) is ...
    • single-part index on (a)
    • multi-part index on (a,b) or (a,b,c,...)
    • datatypes for which this is supported include all integer types, char(n), CHAR(n) COLLATE latin1_bin and VARCHAR(n) COLLATE latin1_bin; utf8_bin may also be supported.
    • charset/collation for which this is supported. What is done for case-insensitive collations, given that the index must store an upper-case version of the column values? See ms2 for more details on what has been done.
  • replication: statement-based or row-based?
  • SELECT
    • when is a snapshot vs. an iterator from RocksDB used?
  • SELECT ... FOR UPDATE
  • SELECT ... LOCK IN SHARE MODE
  • REPLACE
  • INSERT
  • INSERT ... SELECT
  • UPDATE
  • INSERT or UPDATE (upsert) that avoids doing a disk read (optional). The number of affected rows cannot be returned.
  • UPDATE ... SELECT
  • DELETE
  • CREATE TABLE
  • DROP TABLE
  • TRUNCATE TABLE
  • LOAD DATA
  • column datatypes (list the supported types; are BLOB/CLOB types supported?)
  • online DROP/ADD INDEX
  • online DROP/ADD COLUMN
  • HANDLER
  • RocksDB
    • how are column families managed: one CF per index, one CF per table, or many indexes per CF?
    • Keys are encoded as indexid | col1 | col2. TODO: explain how variable-length columns are encoded and confirm that VARCHAR values are not padded to their max length.
    • Keys are compared via memcmp. A custom comparator is not needed.
    • how are bloom filters managed and used?
  • indexes
    • use DB::GetApproximateSizes for ha_rocksdb::records_in_range
    • how do we get rec_per_key and index cardinality stats?
    • primary
    • unique
    • non-unique - stores {key values, PK values} to make the key unique
    • secondary - composite? which datatypes?
    • can "" or a NULL value be in the index?
  • FOREIGN KEYS
  • TRIGGERS: online schema change (OSC) uses triggers, so if we need OSC then we need triggers
  • partitioning
  • crash safe replication on a master
  • crash safe replication on a slave
  • backup:
    • physical full hot backup
    • physical incremental hot backup
    • logical full backup
  • auto increment
    • if there is a per-table auto-inc lock, how long is it held (end of statement, end of transaction, ...)?
    • what index is required to support this?
    • is SELECT MAX(a) FROM table used at process start to get the next value for auto-inc columns?
  • concurrency
    • what cursor isolation levels are provided: read committed, repeatable read, serializable?
    • is gap or next-key locking used?
    • can an INSERT, UPDATE or DELETE statement see committed rows that are more recent than the read snapshot used by the transaction?
    • Is record locking used to prevent concurrent changes to a record by uncommitted transactions? What is locked: rows by PK, or rows by secondary index key? If rows are locked by PK, the standard pattern is to 1) do a consistent read to get the data and then 2) lock the data by PK. Because the PK columns are stored in both the secondary and primary indexes, this can be done without extra index searches.
  • transactions
    • Do we need statement rollback within a transaction? Has it been implemented?
  • Configuration
    • what my.cnf parameters are supported - see RocksDB/RocksDB Storage Engine parameters
  • Monitoring
    • which SHOW commands are supported: SHOW STATUS, SHOW VARIABLES, SHOW ENGINE ROCKSDB STATUS?
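
The transactions bullet says a transaction sees its own changes prior to commit. A minimal sketch of that behavior, assuming writes are buffered per transaction and overlaid on a read snapshot, is below. The `Transaction` class and its methods are hypothetical; a dict stands in for RocksDB. RocksDB's WriteBatchWithIndex offers a comparable batch-over-DB overlay, but this is not the engine's code.

```python
from typing import Dict, Optional

# Sentinel marking a key deleted inside the pending (uncommitted) batch.
_TOMBSTONE = object()


class Transaction:
    """Hypothetical sketch: buffer writes and overlay them on a read snapshot."""

    def __init__(self, snapshot: Dict[bytes, bytes]):
        # Copy the data so later commits by others are not visible to reads.
        self._snapshot = dict(snapshot)
        self._pending: Dict[bytes, object] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._pending[key] = value

    def delete(self, key: bytes) -> None:
        self._pending[key] = _TOMBSTONE

    def get(self, key: bytes) -> Optional[bytes]:
        # This transaction's own uncommitted changes take precedence.
        if key in self._pending:
            value = self._pending[key]
            return None if value is _TOMBSTONE else value
        return self._snapshot.get(key)

    def commit(self, db: Dict[bytes, bytes]) -> None:
        # Apply the whole batch; a real engine would do this atomically.
        for key, value in self._pending.items():
            if value is _TOMBSTONE:
                db.pop(key, None)
            else:
                db[key] = value
        self._pending.clear()
```

Before `commit`, other readers of `db` see nothing, while `txn.get` already returns the new values. Whether the pending batch must fit in the memtable is still the open TODO above.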
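
The RocksDB bullets state that keys are encoded as indexid | col1 | col2 and compared with memcmp, with no custom comparator. That only works if each column is encoded so byte order matches value order. A sketch for fixed-width integer columns follows. It assumes a 4-byte big-endian index id and signed 32-bit integer columns, both illustrative choices rather than the documented format. The sign bit is flipped so that negative values sort before positive ones. Without the flip, -1 (0xFFFFFFFF) would sort after 0. Variable-length columns are not covered, matching the TODO above.

```python
import struct


def encode_index_id(index_id: int) -> bytes:
    # Assumption: a 4-byte big-endian index id prefix.
    return struct.pack(">I", index_id)


def encode_int(value: int) -> bytes:
    # Adding 2**31 flips the sign bit of a signed 32-bit value, so unsigned
    # big-endian byte order equals numeric order (-2**31 -> 0x00000000).
    return struct.pack(">I", value + (1 << 31))


def encode_key(index_id: int, *cols: int) -> bytes:
    # indexid | col1 | col2 | ...; fixed-width fields compare field by field.
    return encode_index_id(index_id) + b"".join(encode_int(c) for c in cols)
```

Python compares `bytes` lexicographically as unsigned bytes, the same as memcmp. So sorting by `encode_key(...)` gives the same order as sorting the column tuples, and all keys of one index sort together ahead of the next index id.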
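
The indexes bullets say a non-unique index stores {key values, PK values} so that every entry is unique. A sketch of that idea follows, assuming the same sign-flipped 32-bit integer encoding and a sorted Python list standing in for a RocksDB key range. `secondary_key` and `lookup` are hypothetical names. Duplicate secondary values become distinct keys because the PK is appended, and an equality lookup becomes a prefix scan that returns the PKs directly.

```python
import bisect
import struct
from typing import List


def enc(value: int) -> bytes:
    # Order-preserving encoding for signed 32-bit ints (sign bit flipped).
    return struct.pack(">I", value + (1 << 31))


def secondary_key(index_id: int, sk: int, pk: int) -> bytes:
    # Hypothetical layout: index id | secondary key value | PK value.
    # Appending the PK keeps entries unique even when sk repeats.
    return struct.pack(">I", index_id) + enc(sk) + enc(pk)


def lookup(index: List[bytes], index_id: int, sk: int) -> List[int]:
    """Return PKs of all rows whose secondary key equals sk (prefix scan)."""
    prefix = struct.pack(">I", index_id) + enc(sk)
    start = bisect.bisect_left(index, prefix)
    pks = []
    for key in index[start:]:
        if not key.startswith(prefix):
            break
        # Decode the trailing PK by undoing the sign-bit flip.
        pks.append(struct.unpack(">I", key[-4:])[0] - (1 << 31))
    return pks
```

For example, with rows `{1: 10, 2: 20, 3: 10, 4: 30}` (PK to secondary value), the index holds four distinct keys and `lookup(index, 7, 10)` returns `[1, 3]`. Because the PK is in every entry, an index-only scan can also return PK columns without touching the primary index.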
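
The auto increment bullets ask whether SELECT MAX(a) is used at startup and which index that requires. If there is an index whose first column is the auto-inc column, MAX(a) is simply the last key in that index's range, so it costs one reverse seek rather than a scan. The sketch below assumes a 4-byte index id followed by an 8-byte unsigned big-endian auto-inc value. `next_auto_inc` is a hypothetical helper, and a sorted list stands in for a RocksDB iterator.

```python
import bisect
import struct
from typing import List


def enc(value: int) -> bytes:
    # Assumption: unsigned 64-bit auto-inc values, big-endian.
    return struct.pack(">Q", value)


def next_auto_inc(index: List[bytes], index_id: int) -> int:
    """Hypothetical: MAX(a) + 1 taken from the last key of an index on (a, ...)."""
    prefix = struct.pack(">I", index_id)
    # Position just past every key that starts with this index id.
    end = bisect.bisect_left(index, struct.pack(">I", index_id + 1))
    if end == 0 or not index[end - 1].startswith(prefix):
        return 1  # empty index: start from 1
    last = index[end - 1]
    return struct.unpack(">Q", last[4:12])[0] + 1
```

Without such an index, finding MAX(a) at startup would require a full table scan, which is one reason the question of the required index matters.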
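
The concurrency bullets describe the standard pattern when rows are locked by PK: do a consistent read, then lock the rows by PK. A minimal sketch follows, assuming exclusive per-PK locks that fail immediately on conflict rather than waiting. The source does not say which behavior is used. `RowLockManager` and `select_for_update` are hypothetical names, and the secondary index is a list of `(secondary value, PK)` pairs. Because each secondary entry already carries the PK, step 2 needs no extra index search.

```python
import threading
from typing import Dict, List, Tuple


class RowLockManager:
    """Hypothetical exclusive row locks keyed by PK, owned by a txn id."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._owners: Dict[int, int] = {}  # pk -> owning txn id

    def try_lock(self, txn_id: int, pk: int) -> bool:
        with self._mutex:
            # Take the lock if free; succeed if this txn already holds it.
            return self._owners.setdefault(pk, txn_id) == txn_id

    def release_all(self, txn_id: int) -> None:
        with self._mutex:
            for pk in [p for p, owner in self._owners.items() if owner == txn_id]:
                del self._owners[pk]


def select_for_update(secondary: List[Tuple[int, int]], sk: int,
                      locks: RowLockManager, txn_id: int) -> List[int]:
    # Step 1: consistent read of the secondary index; entries carry the PK.
    pks = [pk for key, pk in secondary if key == sk]
    # Step 2: lock matching rows by PK. On conflict, locks taken so far are
    # kept; the caller is expected to roll back and call release_all.
    for pk in pks:
        if not locks.try_lock(txn_id, pk):
            raise RuntimeError(f"row {pk} is locked by another transaction")
    return pks
```

A real engine would also need to decide whether the rows must be re-read after locking, since they may have changed after the snapshot. That is the question the INSERT/UPDATE/DELETE visibility bullet raises.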