This repository has been archived by the owner on Dec 7, 2018. It is now read-only.

Functional specification

mdcallag edited this page Nov 26, 2014 · 8 revisions

Overview

This document describes the behavior of the RocksDB/MySQL storage engine, similar to what the MySQL manual does for InnoDB.

Other documents have been written to discuss issues or define features for early versions of the project.

This is an extremely rough draft. The following points require much more detail, and one huge wiki page is unlikely to be the right format, so part of the work is figuring out how to split this across multiple wiki pages. As part of this effort we must also write regression tests to confirm that the behavior works as described.

Features

Multi statement transactions

  • Must all changes from a pending commit fit in the memtable? See ms2 for max_row_locks.
  • A transaction sees its own changes prior to commit.
  • Do we need statement rollback within a transaction? Has it been implemented?

Secondary indexes

  • What is the maximum length of an index key (single or multi-part)?
  • Are index-only scans supported for a single-part index on (a) and for a multi-part index on (a,b) or (a,b,c,...)?
  • For which datatypes are index-only scans supported -- all int-based types, CHAR(n), ...?
  • For which charset/collation combinations is this supported -- CHAR(n) COLLATE latin1_bin, VARCHAR(n) COLLATE latin1_bin, possibly utf8_bin?
  • What is done for case-insensitive collations, since the index must store an upper-case version of column values? See ms2 for more details on what has been done.
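The case-insensitive collation point above can be sketched as follows. This is an illustrative reading only, not the engine's code: it assumes (as the bullet suggests) that the index key stores an upper-cased, space-padded copy of a CHAR column so that plain byte comparison matches case-insensitive order. The function name and latin1 assumption are hypothetical.

```python
def encode_ci_char(value: str, width: int) -> bytes:
    """Encode a CHAR(width) value for a case-insensitive memcmp index key.

    Illustrative sketch: case-fold to upper case so 'abc' and 'ABC'
    produce identical key bytes, then space-pad to the fixed CHAR width
    so keys stay byte-comparable.
    """
    folded = value.upper()
    return folded.ljust(width).encode("latin-1")  # latin1_bin assumed

# Equal under the collation -> identical key bytes:
assert encode_ci_char("abc", 5) == encode_ci_char("ABC", 5)
# Byte order matches collation order:
assert encode_ci_char("abc", 5) < encode_ci_char("abd", 5)
```

One consequence of storing only the folded form: the original case is lost in the key, so the engine would still need the row (or extra stored data) to return the column value, which limits index-only scans for such collations.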

Replication

  • Statement or row based?
  • Supports GTID?
  • Supports crash safe slave state?
  • Supports crash safe master (uses XA with binlog and does rollback of prepared on recovery)?
  • Supports multi-threaded slave apply?

SELECT

  • When is snapshot vs iterator vs neither from RocksDB used?
  • SELECT ... FOR UPDATE
  • SELECT ... LOCK IN SHARE MODE
  • Is there an option to prevent the block cache from being wiped when a full table scan is done for a logical backup?
  • HANDLER

Row change statements

  • REPLACE
  • INSERT
  • INSERT ... SELECT
  • UPDATE
  • INSERT or UPDATE (upsert) that avoids doing a disk read (optional). Number of affected rows cannot be returned.
  • UPDATE ... SELECT
  • DELETE
  • LOAD DATA

DDL

  • CREATE TABLE
  • DROP TABLE
  • TRUNCATE TABLE
  • column datatypes (list supported types, is blob/clob supported)
  • online DROP/ADD INDEX
  • online DROP/ADD COLUMN

RocksDB

  • How are column families managed -- CF per index, per table or many indexes per CF?
  • Keys are encoded as indexid | col1 | col2.
  • Variable-length fields in an index are encoded in groups of 8 data bytes with a control byte every 9th byte; the control byte gives the number of bytes actually used in the adjacent group of 8.
  • NULL values use 1 byte in index keys and 1 bit in the row header.
  • Keys are compared via memcmp. A custom comparator is not needed.
  • How are bloom filters managed and used?
  • use DB::GetApproximateSizes for ha_rocksdb::records_in_range
  • How do we get rec_per_key, index cardinality stats?
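The key-encoding bullets above can be sketched as follows. This is an illustrative reconstruction, not the engine's actual code: it places the control byte after each 8-byte group and assumes a sentinel value of 9 means "full group, more groups follow", since a trailing marker is what keeps plain memcmp ordering consistent with the original value order (per the memcmp bullet). The 4-byte index id is also an assumption.

```python
def encode_varlen(data: bytes) -> bytes:
    """Memcmp-comparable encoding of a variable-length value.

    The value is split into 8-byte groups, each zero-padded and followed
    by a control byte: 0..8 = bytes used in the final group, 9 (assumed
    sentinel) = full group with more groups following.
    """
    out = bytearray()
    i = 0
    while True:
        chunk = data[i:i + 8]
        i += 8
        out += chunk.ljust(8, b"\x00")  # zero-pad to a fixed 8 bytes
        if i < len(data):
            out.append(9)               # more groups follow
        else:
            out.append(len(chunk))      # last group: bytes actually used
            return bytes(out)

def encode_key(index_id: int, *cols: bytes) -> bytes:
    """A composite key: index id, then each encoded column in order."""
    key = index_id.to_bytes(4, "big")   # 4-byte id is an assumption
    for c in cols:
        key += encode_varlen(c)
    return key

# Plain byte comparison (memcmp) matches the logical ordering:
assert encode_varlen(b"ab") < encode_varlen(b"b")
assert encode_varlen(b"ab") < encode_varlen(b"abc")
assert encode_key(1, b"a") < encode_key(2, b"a")
```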

Indexes

  • primary
  • unique
  • non-unique - stores {key values, PK values} to make the key unique
  • secondary - composite? which datatypes?
  • Can "" (the empty string) or a NULL value be stored in the index?
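The non-unique bullet above can be sketched as follows: a non-unique secondary index is made unique by appending the primary-key columns to the secondary-key columns. The function name and the simple `\x00`-joined encoding are illustrative only (a real encoding must be memcmp-safe; see the RocksDB section).

```python
def secondary_index_key(sec_cols: list[bytes], pk_cols: list[bytes]) -> bytes:
    """Build a secondary index entry: secondary columns then PK columns."""
    return b"\x00".join(sec_cols + pk_cols)

# Two rows with the same secondary-key value still get distinct index
# entries because their PK values differ:
k1 = secondary_index_key([b"smith"], [b"row1"])
k2 = secondary_index_key([b"smith"], [b"row2"])
assert k1 != k2
```

A side benefit: since the PK value is embedded in every secondary entry, a secondary-index lookup yields the PK needed to fetch the full row without any extra index search.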

Other

  • FOREIGN KEYS
  • TRIGGERS - OSC (online schema change) uses triggers, so if we need OSC then we need triggers
  • partitioning

Backup

  • physical full hot backup
  • physical incremental hot backup
  • logical full backup

Auto increment

  • if there is an auto-inc lock per table, for how long is it held (end of statement, end of transaction, ...)?
  • what index is required to support this?
  • is SELECT MAX(a) FROM t used at process start to get the next value for auto-inc columns?

Concurrency

  • Read committed and repeatable read cursor isolation will be provided. See Cursor Isolation for more details.
  • If read committed is provided, are rows locked during an index scan unlocked immediately when the row doesn't match the other predicates in the WHERE clause?
  • is gap or next-key locking used
  • can an INSERT, UPDATE, or DELETE statement see committed rows that are more recent than the read snapshot used by the transaction?
  • Is record locking used to prevent concurrent changes to a record by uncommitted transactions? What is locked -- rows by PK, or rows by secondary index key? If rows are locked by PK, the standard pattern is to 1) do a consistent read to get the data and then 2) lock the data by PK. Given that the PK columns are in both the secondary and primary indexes, this can be done without extra index searches.
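The two-step pattern described above (consistent read first, then lock by PK) can be sketched as follows. The lock manager, row store, and function names are stand-ins, not engine code:

```python
import threading

row_locks = {}           # pk -> threading.Lock (toy lock manager)
rows = {b"pk1": b"v1"}   # toy row store keyed by PK

def lock_row_by_pk(pk: bytes) -> threading.Lock:
    """Acquire the record lock for a row, keyed by its PK."""
    lock = row_locks.setdefault(pk, threading.Lock())
    lock.acquire()
    return lock

def update_via_secondary(pk_from_secondary: bytes, new_val: bytes) -> None:
    # Step 1 happened already: a consistent (non-locking) read of the
    # secondary index returned pk_from_secondary -- possible without an
    # extra index search because PK columns are stored in every entry.
    # Step 2: lock the row by PK, then modify it.
    lock = lock_row_by_pk(pk_from_secondary)
    try:
        rows[pk_from_secondary] = new_val
    finally:
        lock.release()

update_via_secondary(b"pk1", b"v2")
assert rows[b"pk1"] == b"v2"
```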

Configuration

  • what my.cnf parameters are supported - see RocksDB/RocksDB Storage Engine parameters

Monitoring

  • what show commands are supported - SHOW STATUS, SHOW VARIABLES, SHOW ENGINE ROCKSDB STATUS?