A lightweight, distributed SQL database engine. Designed for cloud environments with a focus on simplicity, type safety, and PostgreSQL compatibility. cloudSQL bridges the gap between single-node databases and complex distributed systems by providing horizontal scaling with a familiar interface.
- Modern C++ Architecture: High-performance, object-oriented codebase using C++17.
- Distributed Consensus (Raft): Global metadata and catalog consistency powered by a custom Raft implementation.
- Horizontal Sharding: Hash-based data partitioning across multiple Data Nodes.
- Distributed Query Optimization:
  - Shard Pruning: Intelligent routing to avoid cluster-wide broadcasts.
  - Aggregation Merging: Global coordination for `COUNT`, `SUM`, and other aggregates.
  - Broadcast Joins: Optimized cross-shard joins for small-to-large table scenarios.
- Multi-Node Transactions: ACID guarantees across the cluster via Two-Phase Commit (2PC).
- Type-Safe Value System: Robust handling of SQL data types using `std::variant`.
- Volcano Execution Engine: Iterator-based execution supporting sequential scans, index scans, filtering, projection, hash joins, sorting, and aggregation.
- PostgreSQL Wire Protocol: Handshake and simple query protocol implementation for tool compatibility.
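
The hash-based sharding and shard pruning features listed above could be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: the `Value` alias, `hash_value`, `shard_for`, and `target_shards` are hypothetical names, and the real value class and routing logic may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <optional>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical value type; the README only states that values build on std::variant.
using Value = std::variant<std::monostate, std::int64_t, double, std::string>;

// Hash whichever alternative is currently active.
inline std::size_t hash_value(const Value& v) {
    return std::visit(
        [](const auto& x) -> std::size_t {
            using T = std::decay_t<decltype(x)>;
            if constexpr (std::is_same_v<T, std::monostate>) {
                return 0;  // assumption: all NULL keys map to the same shard
            } else {
                return std::hash<T>{}(x);
            }
        },
        v);
}

// Hash partitioning: a shard key always maps to the same shard.
inline std::size_t shard_for(const Value& key, std::size_t shard_count) {
    assert(shard_count > 0);
    return hash_value(key) % shard_count;
}

// Shard pruning: an equality predicate on the shard key needs one shard;
// anything else falls back to contacting every shard.
inline std::vector<std::size_t> target_shards(const std::optional<Value>& shard_key_eq,
                                              std::size_t shard_count) {
    if (shard_key_eq.has_value()) {
        return {shard_for(*shard_key_eq, shard_count)};
    }
    std::vector<std::size_t> all(shard_count);
    std::iota(all.begin(), all.end(), std::size_t{0});
    return all;
}
```

With this shape, a query such as `WHERE id = 42` on a table sharded by `id` would touch a single Data Node, while a predicate on a non-key column would be broadcast.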
- `include/`: Header files defining the core engine and distributed API.
- `src/`: Implementation modules.
  - `catalog/`: Metadata and schema management.
  - `distributed/`: Raft consensus, shard management, and distributed execution.
  - `executor/`: Volcano operators and local query coordination.
  - `network/`: PostgreSQL server and internal cluster RPC.
  - `parser/`: Lexical analysis and SQL parsing.
  - `storage/`: Paged storage, heap files, and B+ tree indexes.
- `docs/`: Technical documentation and phase-by-phase roadmap.
- `tests/`: Comprehensive test suite including simulation-based Raft tests and distributed scenarios.
- CMake (>= 3.16)
- C++17 compatible compiler (Clang or GCC)
mkdir build
cd build
cmake ..
make -j$(nproc)

# Run all tests
./build/sqlEngine_tests
# Run distributed-specific tests
./build/distributed_tests
./build/distributed_txn_tests

Start a Coordinator:
./build/sqlEngine --mode coordinator --port 5432 --cluster-port 6432 --data ./coord_data

Start a Data Node:
./build/sqlEngine --mode data --cluster-port 6433 --data ./data_node_1

The Raft-replicated catalog ensures that all Coordinator nodes share an identical view of the database schema and shard mappings. DDL operations are replicated and committed via the Raft log before being applied to the local catalog.
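
The "commit via the log, then apply" ordering can be illustrated with a heavily simplified sketch. It assumes a fixed leader and omits terms, elections, and persistence, so it is not a Raft implementation; `ReplicatedCatalog`, `propose`, and `acknowledge` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Simplified leader-side view: a DDL entry reaches the catalog only after
// a majority of the cluster is known to hold it in its log.
class ReplicatedCatalog {
public:
    explicit ReplicatedCatalog(std::size_t cluster_size) : cluster_size_(cluster_size) {
        assert(cluster_size > 0);
    }

    // Append a DDL statement to the log; returns its 1-based log index.
    std::uint64_t propose(std::string table, std::string ddl) {
        log_.push_back(Entry{std::move(table), std::move(ddl), {kLeaderId}});
        try_apply();  // a single-node cluster already has its majority
        return log_.size();
    }

    // A follower reports that it holds every entry up to and including `index`.
    void acknowledge(std::uint64_t index, int node_id) {
        if (index > log_.size()) return;  // ignore acks for unknown entries
        for (std::uint64_t i = 0; i < index; ++i) {
            log_[i].acked_by.insert(node_id);
        }
        try_apply();
    }

    bool has_table(const std::string& table) const { return tables_.count(table) > 0; }
    std::size_t applied_count() const { return applied_; }

private:
    struct Entry {
        std::string table;
        std::string ddl;
        std::set<int> acked_by;  // node ids; a set so duplicate acks count once
    };
    static constexpr int kLeaderId = 0;  // assumption: the leader is node 0

    // Apply entries strictly in log order, stopping at the first uncommitted one.
    void try_apply() {
        while (applied_ < log_.size() &&
               log_[applied_].acked_by.size() * 2 > cluster_size_) {
            tables_[log_[applied_].table] = log_[applied_].ddl;
            ++applied_;
        }
    }

    std::size_t cluster_size_;
    std::vector<Entry> log_;
    std::size_t applied_ = 0;
    std::map<std::string, std::string> tables_;  // table name -> defining DDL
};
```

In a three-node cluster, a `CREATE TABLE` proposed by the leader becomes visible in the catalog once one follower acknowledges it, because two of three nodes then hold the entry.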
The distributed executor orchestrates query fragments across the cluster. It performs plan splitting, dispatches sub-queries to relevant Data Nodes, and merges partial results (e.g., summing partial counts) before returning the final set to the client.
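
The merge step for partial aggregates can be sketched as below. The `PartialAgg` and `FinalAgg` structs and the `merge_partials` function are hypothetical names for illustration. Deriving `AVG` from the merged sum and count is an assumption about how "other aggregates" might be handled, not something the project documents.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// What each Data Node might return for its shard (hypothetical shape).
struct PartialAgg {
    std::int64_t count = 0;
    double sum = 0.0;
};

// What the coordinator returns to the client after merging.
struct FinalAgg {
    std::int64_t count = 0;
    double sum = 0.0;
    std::optional<double> avg;  // empty when no rows matched anywhere
};

// COUNT and SUM merge by addition; AVG must be recomputed from the merged
// totals, because averaging per-shard averages would weight shards unequally.
inline FinalAgg merge_partials(const std::vector<PartialAgg>& parts) {
    FinalAgg out;
    for (const PartialAgg& p : parts) {
        out.count += p.count;
        out.sum += p.sum;
    }
    if (out.count > 0) {
        out.avg = out.sum / static_cast<double>(out.count);
    }
    return out;
}
```

For example, shards reporting (count 2, sum 10) and (count 3, sum 20) merge to count 5, sum 30, and an average of 6, whereas the mean of the two per-shard averages would be about 5.83.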
Data is persisted in fixed-size pages (default 4KB) using a slot-based layout. The StorageManager coordinates access, while the BufferPoolManager provides an LRU-K caching layer to minimize disk I/O.
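
The LRU-K policy mentioned above picks the frame whose K-th most recent access lies furthest in the past, and treats frames with fewer than K recorded accesses as the first candidates. A minimal sketch follows; the `LruKReplacer` class and its methods are hypothetical, and page pinning, locking, and the actual `BufferPoolManager` interface are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <optional>

class LruKReplacer {
public:
    explicit LruKReplacer(std::size_t k) : k_(k) { assert(k > 0); }

    // Record one access to a frame, keeping only its K most recent timestamps.
    void record_access(int frame_id) {
        std::deque<std::uint64_t>& history = history_[frame_id];
        history.push_back(++clock_);
        if (history.size() > k_) history.pop_front();
    }

    // Choose and forget a victim frame, or return nothing if no frame is tracked.
    std::optional<int> evict() {
        std::optional<int> victim;
        bool victim_infinite = false;
        std::uint64_t victim_ts = 0;
        for (const auto& [frame, history] : history_) {
            // Fewer than K accesses means an infinite backward K-distance.
            const bool infinite = history.size() < k_;
            // Oldest retained timestamp: the K-th most recent access when the
            // history is full, otherwise the first access ever recorded.
            const std::uint64_t ts = history.front();
            const bool better = !victim.has_value() ||
                                (infinite && !victim_infinite) ||
                                (infinite == victim_infinite && ts < victim_ts);
            if (better) {
                victim = frame;
                victim_infinite = infinite;
                victim_ts = ts;
            }
        }
        if (victim.has_value()) history_.erase(*victim);
        return victim;
    }

    std::size_t size() const { return history_.size(); }

private:
    std::size_t k_;
    std::uint64_t clock_ = 0;  // logical clock, incremented on every access
    std::map<int, std::deque<std::uint64_t>> history_;
};
```

Compared with plain LRU, this keeps a page that was touched once by a large sequential scan from displacing pages that are accessed repeatedly.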
MIT