ReefDB is a minimalistic, in-memory and on-disk database management system written in Rust, implementing basic SQL query capabilities and full-text search.
To use ReefDB, you can choose between an in-memory storage (InMemoryReefDB
) or on-disk storage (OnDiskReefDB
).
use reefdb::InMemoryReefDB;
fn main() {
let mut db = InMemoryReefDB::new();
// Basic SQL operations
db.query("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)");
db.query("INSERT INTO users VALUES (1, 'Alice')");
db.query("SELECT * FROM users WHERE id = 1");
// Full-text search
db.query("CREATE TABLE books (title TEXT, description TSVECTOR)");
db.query("INSERT INTO books VALUES ('Book 1', 'A book about computer science')");
db.query("SELECT title FROM books WHERE description MATCH 'computer'");
// Joins
db.query("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)");
db.query("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER)");
db.query("SELECT authors.name, books.title FROM authors INNER JOIN books ON authors.id = books.author_id");
}
use reefdb::OnDiskReefDB;
fn main() {
let mut db = OnDiskReefDB::new("db.reef".to_string(), "index.bin".to_string());
// Use the same SQL queries as with InMemoryReefDB
}
- ✅ In-Memory and On-Disk storage modes
- ✅ Basic SQL statements (CREATE, INSERT, SELECT, UPDATE, DELETE)
- ✅ ALTER TABLE with ADD/DROP/RENAME column support
- ✅ DROP TABLE functionality
- ✅ INNER JOIN support
- ✅ Primary key constraints
- ✅ Basic error handling system
- ✅ TSVECTOR data type
- ✅ Inverted index implementation
- ✅ Basic tokenization
- ✅ Memory and disk-based index storage
- ✅ MATCH operator for text search
- ✅ Basic transaction structure
- ✅ Transaction isolation levels (ReadUncommitted, ReadCommitted, RepeatableRead, Serializable)
- ✅ Write-Ahead Logging (WAL)
- ✅ Transaction manager with locking mechanism
- ✅ Full ACID compliance
- ✅ Deadlock detection
- ✅ MVCC implementation
- ✅ B-Tree index implementation
- ✅ CREATE INDEX and DROP INDEX support
- ✅ Index persistence for on-disk storage
- ✅ Basic query optimization with indexes
- Real-time index updates with ACID compliance
- Index consistency verification
- Atomic index operations
- Index recovery mechanisms
- Concurrent index access
- Basic aggregate functions (COUNT, SUM)
- ORDER BY implementation
- LIMIT and OFFSET support
- LEFT JOIN support
- Query timeout mechanism
- Memory-mapped storage
- Page-level operations
- Buffer management
- Memory-mapped file handling
- Crash recovery
- Basic query optimization
- Statistics-based planning
- Index usage optimization
- Join order optimization
- Index compression
- Parallel query execution
- Basic query metrics
- Index usage statistics
- Transaction monitoring
- Error logging and tracing
- Multi-column indexes
- Hash indexes for equality comparisons
- Bitmap indexes for low-cardinality columns
- Cost-based optimizer
- Query plan visualization
- Incremental indexing
- RIGHT JOIN
- OUTER JOIN
- CROSS JOIN
- FULL JOIN
- NATURAL JOIN
- SELF JOIN
- Additional aggregate functions (AVG, MIN, MAX)
- GROUP BY and HAVING clauses
- Index statistics
- Query explanation
- Performance metrics
- Index health checks
- Query profiling
-
Advanced Index Types
- BM25 scoring with configurable parameters
- TF-IDF with normalization options
- Custom scoring functions
- Position-aware indexing
- Field norms support
-
Query Features
- Fuzzy matching with configurable distance
- Regular expression support
- Range queries
- Boolean queries with minimum match
- Phrase queries with slop
- Query rewriting and optimization
- Query expansion
- Prefix matching (e.g.,
web:*
) - Complex boolean expressions with parentheses
- Result ranking with
ts_rank
- Text highlighting with
ts_headline
-
Vector Data Types and Operations
- VECTOR(dimensions) data type
- Vector similarity operators (<->, <=>, <#>)
- Configurable distance metrics (L2, Cosine, Dot Product)
- Vector normalization options
-
Dimension-Optimized Indexes
- KD-Tree for low dimensions (≤ 8)
- HNSW for medium dimensions (≤ 100)
- Brute Force with SIMD for high dimensions
- Index selection based on dimensionality
-
Advanced Vector Search Features
- Approximate Nearest Neighbors (ANN)
- Hybrid search (combine with text/filters)
- Batch vector operations
- Vector quantization
- Dynamic index rebuilding
- Multi-vector queries
-
Vector Search Optimizations
- SIMD acceleration
- Parallel search
- Memory-mapped vectors
- Vector compression
- Incremental index updates
- Cache-friendly layouts
-
Multiple analyzer support
-
Custom token filters
-
Token position tracking
-
SIMD-accelerated processing
-
Phonetic matching
-
Configurable tokenization pipelines
-
CJK (Chinese, Japanese, Korean) Support
- Character-based tokenization
- N-gram tokenization
- Dictionary-based word segmentation
- Language-specific stop words
- Unicode normalization
- Ideograph handling
- Reading/pronunciation support
- Pinyin for Chinese
- Hiragana/Katakana for Japanese
- Hangul/Hanja for Korean
- Mixed script handling
- CJK-specific scoring adjustments
- Compound word processing
- Character variant normalization
-
Faceted Search
- Hierarchical facets
- Dynamic facet counting
- Custom facet ordering
- Multi-value facets
-
Enhanced Scoring & Ranking
- Configurable scoring algorithms
- Score explanation
- Custom boosting factors
- Field-weight customization
- Position-based scoring
-
Search Quality
- Highlighting with snippets
- Relevance tuning tools
- Search quality metrics
- UNIQUE constraints
- CHECK constraints
- NOT NULL constraints
- DEFAULT values
- Enhanced FOREIGN KEY support with ON DELETE/UPDATE actions
- Subqueries
- Common Table Expressions (CTEs)
- Views
- Stored procedures
- User-defined functions
- DATE and TIME types
- DECIMAL/NUMERIC types
- BOOLEAN type
- BLOB/BINARY types
- Array types
- JSON type
- User-defined types
- User authentication
- Role-based authorization
- Row-level security
- Column-level security
- Audit logging
- SSL/TLS support
- Replication using raft-rs
- Master-slave configuration
- Sharding support
- Distributed transactions
- Failover support
- Query result cache
- Index page cache
- Buffer pool management
- Connection pooling
- Query plan optimization
- Command-line interface
- Web-based admin interface
- Query visualization
- Performance monitoring dashboard
- Schema visualization
- Comprehensive documentation
This project is licensed under the MIT License. See LICENSE for more information.