Skip to content

vijayee/WaveDB

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

745 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

WaveDB Logo

WaveDB

A hierarchical key-value database with MVCC concurrency, WAL durability, and schema layer access patterns. Inspired by MUMPS multi-dimensional subscripts and ForestDB's copy-on-write HB+trie architecture.

Features

  • Hierarchical key paths — Store and retrieve values at paths like users/alice/name
  • MVCC — Lock-free concurrent reads with snapshot isolation
  • WAL durability — Thread-local write-ahead log with configurable sync modes
  • Schema Layers — GraphQL and Graph access today, SQL roadmap
  • Graph Layer — Triple-store with Gremlin-style traversal, schema-driven indexing, and cost-based query optimization
  • Language bindings — C, Node.js, Dart
  • High throughput — Sharded LRU cache with memory pooling, inline key comparison, MVCC fast-path

Quick Start

#include "Database/database.h"
#include "Database/database_config.h"

int main() {
    int error = 0;

    // Create with default config (in-memory, ASYNC WAL)
    database_config_t* config = database_config_default();
    database_t* db = database_create_with_config("/tmp/mydb", config, &error);
    database_config_destroy(config);

    if (!db) { fprintf(stderr, "Error: %d\n", error); return 1; }

    // Write
    path_t* key = path_create_from_string("/", "users/alice/name", 0);
    identifier_t* val = identifier_create_from_string("Alice", 0);
    database_put_sync(db, key, val);
    path_destroy(key);
    identifier_destroy(val);

    // Read
    path_t* key2 = path_create_from_string("/", "users/alice/name", 0);
    identifier_t* result = NULL;
    int rc = database_get_sync(db, key2, &result);
    if (rc == 0 && result) {
        // use result...
        identifier_destroy(result);
    }
    path_destroy(key2);

    // Batch
    batch_t* batch = batch_create(0);
    path_t* p1 = path_create_from_string("/", "users/bob/age", 0);
    identifier_t* v1 = identifier_create_from_string("30", 0);
    batch_add_put(batch, p1, v1);  // ownership transfers on success
    database_write_batch_sync(db, batch);
    batch_destroy(batch);

    // Iterate
    path_t* start = path_create_from_string("/", "users/alice", 0);
    path_t* end = path_create_from_string("/", "users/alice/~", 0);
    database_iterator_t* it = database_scan_start(db, start, end);
    path_t* out_path = NULL;
    identifier_t* out_val = NULL;
    while (database_scan_next(it, &out_path, &out_val) == 0) {
        // process out_path, out_val
        path_destroy(out_path);
        identifier_destroy(out_val);
    }
    database_scan_end(it);
    path_destroy(start);
    path_destroy(end);

    database_destroy(db);
    return 0;
}

Architecture

WaveDB stores values at hierarchical key paths using an HBTrie — a B+tree where each level is itself a B+tree. A path like ["users", "alice", "name"] traverses three levels of B+trees, one per path component.

                     root bnode
                    /    |    \
              "users"   "posts"   ...
              /  |  \      |
           bnode  ...  bnode  ...
           /  \          |
     "alice" "bob"   "title"
        |       |        |
      bnode   bnode    value
      / | \     |
   "name" "age" ...
     |      |
   value  value

Concurrency: MVCC (Multi-Version Concurrency Control) provides snapshot isolation — readers never block writers and never see partial updates. Each write creates a versioned entry; reads see the latest committed version at their transaction ID.

Durability: Thread-local WAL files eliminate write contention. Three sync modes trade durability for throughput: IMMEDIATE (fsync every write), DEBOUNCED (batched fsync every 250ms, recommended), ASYNC (buffer flushed to kernel on 250ms idle timer, survives process crash but not power failure).

Schema Layers: Access the same data through different query paradigms. The GraphQL layer maps type definitions to hierarchical paths and resolves queries with scan plans. The Graph layer provides a triple-store with Gremlin-style traversal and cost-based optimization. Multiple layers can coexist on the same database using subtrees for namespace isolation.

Configuration

database_config_t* config = database_config_default();

// Mutable — can change when reopening
config->lru_memory_mb = 100;       // LRU cache size (default: 50 MB)
config->lru_shards = 64;           // LRU shards (default: 64, 0 = auto-scale)
config->wal_config.sync_mode = WAL_SYNC_DEBOUNCED;  // WAL mode
config->wal_config.debounce_ms = 250;                // fsync window (default: 250ms)
config->worker_threads = 4;        // Worker pool size (default: 4)

// Immutable — set at creation, persisted
config->chunk_size = 4;            // HBTrie chunk size (default: 4)
config->btree_node_size = 4096;    // B+tree node size (default: 4096)
config->enable_persist = 1;        // 0 = in-memory, 1 = persistent (default: 1)

database_t* db = database_create_with_config("/path/to/db", config, &error);
database_config_destroy(config);

Config is persisted as CBOR at <db_path>/.config and automatically loaded on reopen. database_config_merge reconciles saved and passed configs — immutable settings always use the on-disk values.

WAL Sync Modes

Mode Behavior Durability Performance
WAL_SYNC_IMMEDIATE fsync after every write Highest ~1K ops/sec
WAL_SYNC_DEBOUNCED batched fsync (default 250ms) High ~300K ops/sec
WAL_SYNC_ASYNC buffered write, idle drain every 250ms Process crash only ~400K ops/sec

Database Operations

Operation Async (promise) Sync
Put database_put(db, path, val, promise) database_put_sync(db, path, val)
Get database_get(db, path, promise) database_get_sync(db, path, &result)
Delete database_delete(db, path, promise) database_delete_sync(db, path)
Batch database_write_batch(db, batch, promise) database_write_batch_sync(db, batch)
Increment database_increment_sync(db, path, delta)
Snapshot database_snapshot(db)

Encryption

WaveDB supports AES-256-GCM encryption at rest for both WAL files and persisted pages. Two key modes are available:

Mode Key Use case
ENCRYPTION_SYMMETRIC 32-byte user-supplied key Simple setups, embedded devices
ENCRYPTION_ASYMMETRIC RSA key pair (DER-encoded) Key rotation, write-only nodes

Encryption is set at database creation and cannot be changed on an existing database. Key material is never persisted — only a salt and verification check are stored, allowing key verification on re-open.

Symmetric Encryption

#include "Database/database.h"
#include "Database/database_config.h"
#include "Storage/encryption.h"

uint8_t key[32] = { /* your 32-byte AES-256 key */ };

encrypted_database_config_t* config = encrypted_database_config_default();
encrypted_database_config_set_type(config, ENCRYPTION_SYMMETRIC);
encrypted_database_config_set_symmetric_key(config, key, sizeof(key));
database_config_set_enable_persist(&config->config, 1);

int error = 0;
database_t* db = database_create_encrypted("/path/to/db", config, &error);
if (db == NULL) {
    // error == DATABASE_ERR_ENCRYPTION_KEY_INVALID if key is wrong
    // error == DATABASE_ERR_ENCRYPTION_REQUIRED if DB exists but is unencrypted
    fprintf(stderr, "Error: %d\n", error);
    return 1;
}
encrypted_database_config_destroy(config);

// Use normally — encryption is transparent
database_put_sync_raw(db, "users/alice/name", 16, '\037',
                      (const uint8_t*)"Alice", 5);

// Reopen with the same key
encrypted_database_config_t* config2 = encrypted_database_config_default();
encrypted_database_config_set_type(config2, ENCRYPTION_SYMMETRIC);
encrypted_database_config_set_symmetric_key(config2, key, sizeof(key));
database_config_set_enable_persist(&config2->config, 1);
database_t* db2 = database_create_encrypted("/path/to/db", config2, &error);
encrypted_database_config_destroy(config2);

database_destroy(db);

Asymmetric Encryption

// Load DER-encoded keys (generated externally, e.g. OpenSSL)
uint8_t private_key_der[1217] = { /* DER RSA private key */ };
uint8_t public_key_der[294]  = { /* DER RSA public key */ };

encrypted_database_config_t* config = encrypted_database_config_default();
encrypted_database_config_set_type(config, ENCRYPTION_ASYMMETRIC);
encrypted_database_config_set_asymmetric_private_key(
    config, private_key_der, sizeof(private_key_der));
encrypted_database_config_set_asymmetric_public_key(
    config, public_key_der, sizeof(public_key_der));
database_config_set_enable_persist(&config->config, 1);

int error = 0;
database_t* db = database_create_encrypted("/path/to/db", config, &error);
encrypted_database_config_destroy(config);

In asymmetric mode, a random data encryption key (DEK) is generated per session and wrapped with the RSA public key. The private key is required for decryption; a write-only node can omit it.

Error Codes

Code Meaning
DATABASE_ERR_ENCRYPTION_REQUIRED Database is encrypted but no key was provided
DATABASE_ERR_ENCRYPTION_KEY_INVALID Wrong key or key verification failed
DATABASE_ERR_ENCRYPTION_UNSUPPORTED Cannot add encryption to an existing unencrypted DB

Schema Layers

GraphQL

The GraphQL layer maps SDL type definitions to hierarchical paths and resolves queries with scan plans.

#include "Layers/graphql/graphql.h"

graphql_layer_t* layer = graphql_layer_create("/path/to/db", NULL);

// Register schema
const char* sdl =
    "type User { name: String age: Int }"
    "type Query { user(id: ID!): User }";
graphql_schema_parse(layer, sdl, NULL);

// Query
graphql_result_t* result = graphql_query_sync(layer, "{ user(id: \"alice\") { name } }");
// result contains { "data": { "user": { "name": "Alice" } } }
graphql_result_destroy(result);

graphql_layer_destroy(layer);

Query plans are compiled from SDL + query and executed as scan, get, or batch-get operations against the database. Custom resolvers can be registered for computed fields.

Graph

The Graph layer provides a triple-store (subject-predicate-object) with Gremlin-inspired traversal, schema-driven automatic indexing, and cost-based query optimization.

#include "Layers/graph/graph.h"

graph_layer_t* layer = graph_layer_create("/path/to/db", NULL);

// Define schema with index hints
graph_schema_parse(layer,
    "type Clip @index(spo, pos) {"
    "  tagged_with: [Tag];"
    "  name: String @index(pos);"
    "}", NULL);

// Insert triples
graph_insert_sync(layer, "clip_abc", "tagged_with", "gaming");
graph_insert_sync(layer, "clip_abc", "name", "My Clip");
graph_insert_sync(layer, "clip_xyz", "tagged_with", "gaming");

// Query with Gremlin-style DSL
graph_result_t* r1 = graph_parse_execute(
    "g.V(\"clip_abc\").Out(\"tagged_with\")", layer, NULL);
// r1 → ["gaming"]

// Intersection
graph_result_t* r2 = graph_parse_execute(
    "g.V(\"gaming\").In(\"tagged_with\")"
    ".And(g.V(\"clip_abc\").Out(\"tagged_with\"))", layer, NULL);

// Range predicates
graph_result_t* r3 = graph_parse_execute(
    "g.Has(\"age\", >=, \"25\").Out(\"follows\")", layer, NULL);

// Morphisms (reusable query fragments)
graph_morphism_define(layer, "friends_content",
    "g.Morphism(\"friends_content\").Out(\"follows\").Out(\"likes\")", NULL);
graph_result_t* r4 = graph_parse_execute(
    "g.V(\"alice\").Follow(\"friends_content\")", layer, NULL);

graph_result_destroy(r1);
graph_result_destroy(r2);
graph_result_destroy(r3);
graph_result_destroy(r4);
graph_layer_destroy(layer);

The optimizer automatically reorders Has filters by selectivity and sorts Intersect children by estimated cardinality, using statistics computed from PSO index scans. See docs/graph-db-schema-layer.md for the full design.

Subtrees

Subtrees provide scoped views into a database with automatic prefix isolation. This enables multiple schema layers to coexist in different namespaces on the same database.

#include "Database/database_subtree.h"

// Open subtrees with different prefixes
database_subtree_t* gql_st = database_subtree_open(db, "graphql", '/');
database_subtree_t* graph_st = database_subtree_open(db, "graph", '/');

// Operations are scoped — keys are automatically prefixed
database_subtree_put_sync_raw(gql_st, "Users/1/name", 12, '/',
                              (const uint8_t*)"Alice", 5);
database_subtree_put_sync_raw(graph_st, "triples/s/p/o", 14, '/',
                              (const uint8_t*)"data", 4);

// Each subtree only sees its own data
size_t gql_count = database_subtree_count(gql_st);   // 1
size_t graph_count = database_subtree_count(graph_st); // 1

// Delete all data under a prefix
database_subtree_delete_prefix(db, "graphql", '/');

// Close subtrees (does not destroy the database)
database_subtree_close(gql_st);
database_subtree_close(graph_st);

Layer Coexistence

Create multiple layers on the same database by passing a subtree:

database_subtree_t* gql_sub = database_subtree_open(db, "layer/graphql", '/');
database_subtree_t* graph_sub = database_subtree_open(db, "layer/graph", '/');

int error_code = 0;
graphql_layer_t* gql = graphql_layer_create(db_path, gql_config, gql_sub, &error_code);
graph_layer_t* graph = graph_layer_create(db_path, graph_config, graph_sub, &error_code);

// Both layers operate independently on the same database

Without a subtree, layer creation fails with error_code = -3 if a different layer type already owns the database.

Language Bindings

Node.js

npm install wavedb
const { WaveDB } = require('wavedb');

const db = new WaveDB('/path/to/db', {
  delimiter: '/',
  lruMemoryMb: 50,
  wal: { syncMode: 'debounced' }
});

// Async (Promise)
await db.put('users/alice/name', 'Alice');
const name = await db.get('users/alice/name');

// Sync (blocking)
db.putSync('users/bob/name', 'Bob');
const name2 = db.getSync('users/bob/name');

// Callback style
db.putCb('users/charlie/name', 'Charlie', (err) => { /* ... */ });
db.getCb('users/charlie/name', (err, val) => { /* ... */ });

// Object operations (nested data → flattened paths)
await db.putObject('users/alice', { name: 'Alice', age: 30 });
const user = await db.getObject('users/alice'); // { name: 'Alice', age: 30 }

// Batch
await db.batch([
  { type: 'put', path: 'users/alice/name', value: 'Alice' },
  { type: 'del', path: 'users/bob/name' }
]);

// Iterator
const iter = new Iterator(db, { start: 'users/alice', end: 'users/alice~' });
let entry;
while ((entry = iter.read())) {
  console.log(entry.key, entry.value);
}
iter.end();

// Graph Layer
const { GraphLayer, g } = require('wavedb');
const graph = new GraphLayer();
graph.parseSchema('type Clip @index(spo, pos) { tagged_with: [Tag]; }');
graph.insertSync('clip_abc', 'tagged_with', 'gaming');
const tags = g.V('clip_abc').Out('tagged_with').All(); // ['gaming']
graph.close();

// Subtrees — namespace isolation for multiple layers on one database
const gqlSub = db.openSubtree('graphql');
const graphSub = db.openSubtree('graph');
gqlSub.putSync('Users/1/name', 'Alice');
graphSub.putSync('triples/s/p/o', 'data');
console.log(gqlSub.count());  // 1
console.log(graphSub.count()); // 1
gqlSub.close();
graphSub.close();

// Create layers with subtrees on the same database
const { GraphQLLayer } = require('wavedb');
const gqlLayer = new GraphQLLayer(dbPath, { subtree: gqlSub });
const graphLayer = new GraphLayer(dbPath, { subtree: graphSub });

See bindings/nodejs/README.md for full API reference.

Dart

# pubspec.yaml
dependencies:
  wavedb: ^0.1.0
import 'package:wavedb/wavedb.dart';

final db = WaveDB('/path/to/db', delimiter: '/');

// Sync
db.putSync('users/alice/name', 'Alice');
final name = db.getSync('users/alice/name');

// Async
await db.put('users/bob/name', 'Bob');
final name2 = await db.get('users/bob/name');

// Object operations
await db.putObject('users/alice', { 'name': 'Alice', 'age': 30 });
final user = await db.getObject('users/alice');

// Streaming
db.createReadStream(start: 'users/alice', end: 'users/alice~').listen((kv) {
  print('${kv.key}: ${kv.value}');
});

// GraphQL
final layer = GraphQLLayer();
layer.create(GraphQLLayerConfig(path: '/path/to/db'));
layer.parseSchema('type User { name: String } type Query { user(id: ID!): User }');
final result = layer.querySync('{ user(id: "alice") { name } }');

// Graph Layer
final graph = GraphLayer();
graph.parseSchema('type Clip @index(spo, pos) { tagged_with: [Tag]; }');
graph.insertSync('clip_abc', 'tagged_with', 'gaming');
final clips = g().V('clip_abc').Out('tagged_with').All(); // ['gaming']

// Subtrees — namespace isolation for multiple layers on one database
final gqlSub = db.openSubtree('graphql');
final graphSub = db.openSubtree('graph');
gqlSub.putSync('Users/1/name', 'Alice');
graphSub.putSync('triples/s/p/o', 'data');
print(gqlSub.count());  // 1
print(graphSub.count()); // 1
gqlSub.close();
graphSub.close();

// Create layers with subtrees on the same database
final gqlLayer = GraphQLLayer()..create(GraphQLLayerConfig(path: dbPath, subtree: gqlSub));
final graphLayer = GraphLayer(subtree: graphSub);

db.close();

See bindings/dart/README.md for full API reference.

Performance

Benchmarks on AMD Ryzen 9700X (8-core Zen 5), Windows 11, 50MB LRU cache, ASYNC WAL mode, GCC 15.1 -O3. 5-run averages, CV <6% unless noted.

Sync Operations (single-threaded)

Operation Throughput P50 Latency P99 Latency
Get 2.61M ops/sec 474 ns 490 ns
Put 498K ops/sec 2.02 µs 4.80 µs
Delete 183K ops/sec 3.52 µs 7.33 µs
Mixed (70% read) 2.94M ops/sec 459 ns 490 ns

Concurrent Throughput (sync API, 4 worker threads)

Threads Write Read Mixed (70R/20W/10D)
1 275K ops/sec 1.48M ops/sec 312K ops/sec
2 484K ops/sec 2.48M ops/sec 535K ops/sec
4 647K ops/sec 4.59M ops/sec 709K ops/sec
8 811K ops/sec 4.47M ops/sec 873K ops/sec
16 949K ops/sec 2.85M ops/sec 982K ops/sec
32 1.02M ops/sec 2.88M ops/sec 1.06M ops/sec

Building

git clone --recursive https://github.com/vijayee/WaveDB.git
cd WaveDB
mkdir build && cd build
cmake ..
make -j$(nproc)

# Run tests
ctest --output-on-failure

# Build shared library for Dart/FFI bindings
cmake -DBUILD_DART_BINDINGS=ON ..
make -j$(nproc)

Installation

cmake -DCMAKE_INSTALL_PREFIX=/usr/local ..
make
sudo make install

Installs headers, static library (libwavedb.a), and CMake config.

Using as a CMake Dependency

# After install
find_package(WaveDB REQUIRED)
target_link_libraries(myapp WaveDB::wavedb)

# Or embedded
add_subdirectory(path/to/WaveDB)
target_link_libraries(myapp WaveDB::wavedb)

Dependencies

Bundled — no external dependencies required:

  • xxhash — fast hashing
  • hashmap — hash map implementation
  • libcbor — CBOR serialization
  • OpenSSL — AES-256-GCM encryption (required for encrypted databases)

Roadmap

  • P2P Database Replication — Distributed multi-master replication with conflict resolution
  • SQL Schema Layer — SQL query interface over hierarchical data

License

MIT

About

A Hierarchical Key Value Store

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors