poyrazK/cloudSQL

cloudSQL

cloudSQL is a lightweight, distributed SQL database engine designed for cloud environments, with a focus on simplicity, type safety, and PostgreSQL compatibility. It bridges the gap between single-node databases and complex distributed systems by providing horizontal scaling behind a familiar interface.

Key Features

  • Modern C++ Architecture: High-performance, object-oriented codebase using C++17.
  • Distributed Consensus (Raft): Global metadata and catalog consistency powered by a custom Raft implementation.
  • Horizontal Sharding: Hash-based data partitioning across multiple Data Nodes.
  • Distributed Query Optimization:
    • Shard Pruning: Intelligent routing to avoid cluster-wide broadcasts.
    • Aggregation Merging: Global coordination for COUNT, SUM, and other aggregates.
    • Broadcast Joins: Optimized cross-shard joins for small-to-large table scenarios.
  • Multi-Node Transactions: ACID guarantees across the cluster via Two-Phase Commit (2PC).
  • Type-Safe Value System: Robust handling of SQL data types using std::variant.
  • Volcano Execution Engine: Iterator-based execution supporting sequential scans, index scans, filtering, projection, hash joins, sorting, and aggregation.
  • PostgreSQL Wire Protocol: Handshake and simple query protocol implementation for tool compatibility.
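To make the std::variant value system and hash-based sharding features concrete, here is a minimal sketch. The names Value, hash_value, and shard_for are hypothetical and are not taken from the cloudSQL headers; the actual set of supported types and the hash function used by the engine may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical value type: std::monostate stands in for SQL NULL.
using Value = std::variant<std::monostate, std::int64_t, double, std::string>;

// Hash a value by visiting whichever alternative is currently held.
inline std::size_t hash_value(const Value& v) {
    return std::visit(
        [](const auto& x) -> std::size_t {
            using T = std::decay_t<decltype(x)>;
            if constexpr (std::is_same_v<T, std::monostate>) {
                return 0;  // assumption: all NULL keys map to the same shard
            } else {
                return std::hash<T>{}(x);
            }
        },
        v);
}

// Map a shard key to one of num_shards Data Nodes.
inline std::size_t shard_for(const Value& key, std::size_t num_shards) {
    assert(num_shards > 0);
    return hash_value(key) % num_shards;
}
```

With a mapping like this, an equality predicate on the shard key can be routed to a single Data Node, which is the idea behind shard pruning. Note that std::hash is not guaranteed to produce the same values across standard library implementations, so a real cluster would likely need a fixed, portable hash function.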

Project Structure

  • include/: Header files defining the core engine and distributed API.
  • src/: Implementation modules.
    • catalog/: Metadata and schema management.
    • distributed/: Raft consensus, shard management, and distributed execution.
    • executor/: Volcano operators and local query coordination.
    • network/: PostgreSQL server and internal cluster RPC.
    • parser/: Lexical analysis and SQL parsing.
    • storage/: Paged storage, heap files, and B+ tree indexes.
  • docs/: Technical documentation and the phase-by-phase roadmap.
  • tests/: Comprehensive test suite including simulation-based Raft tests and distributed scenarios.

Building and Running

Prerequisites

  • CMake (>= 3.16)
  • C++17 compatible compiler (Clang or GCC)

Build Instructions

mkdir build
cd build
cmake ..
make -j$(nproc)

Running Tests

# Run all tests (paths are relative to the repository root)
./build/sqlEngine_tests
# Run distributed-specific tests
./build/distributed_tests
./build/distributed_txn_tests

Starting the Cluster

Start a Coordinator:

./build/sqlEngine --mode coordinator --port 5432 --cluster-port 6432 --data ./coord_data

Start a Data Node:

./build/sqlEngine --mode data --cluster-port 6433 --data ./data_node_1

Core Components

1. Raft Consensus

Raft ensures that all Coordinator nodes share an identical view of the database schema and shard mappings. DDL operations are replicated and committed via the Raft log before being applied to the local catalog.
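The "commit before apply" rule can be illustrated with a small sketch. CatalogReplica, LogEntry, and advance_commit are hypothetical names, and leader election, replication, and persistence are omitted entirely; this only shows that a DDL entry becomes visible in the catalog once the commit index has moved past it.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical log entry: a DDL statement reduced to "create this table".
struct LogEntry {
    std::string table_name;
};

// Hypothetical catalog state machine held by one Coordinator.
class CatalogReplica {
public:
    // Append to the local log; the entry is not yet visible in the catalog.
    void append(LogEntry e) { log_.push_back(std::move(e)); }

    // Called once a majority has acknowledged the first new_commit entries.
    void advance_commit(std::size_t new_commit) {
        assert(new_commit <= log_.size());
        if (new_commit > commit_index_) commit_index_ = new_commit;
        // Apply committed entries strictly in log order.
        while (last_applied_ < commit_index_) {
            tables_.insert(log_[last_applied_].table_name);
            ++last_applied_;
        }
    }

    bool has_table(const std::string& name) const {
        return tables_.count(name) > 0;
    }

private:
    std::vector<LogEntry> log_;
    std::size_t commit_index_ = 0;  // number of entries known to be committed
    std::size_t last_applied_ = 0;  // number of entries applied to tables_
    std::set<std::string> tables_;
};
```

Because every Coordinator applies the same committed entries in the same order, each one ends up with the same catalog contents.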

2. Distributed Executor

Orchestrates query fragments across the cluster. It performs plan splitting, dispatches sub-queries to relevant Data Nodes, and merges partial results (e.g., summing partial counts) before returning the final set to the client.
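The merge step for aggregates can be sketched as follows. PartialAgg, GlobalAgg, and merge_partials are hypothetical names used for illustration, not the engine's actual types. The sketch assumes each Data Node returns a partial COUNT and SUM for its shard.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical partial result returned by one Data Node.
struct PartialAgg {
    std::int64_t count;
    std::int64_t sum;
};

// Hypothetical merged result produced on the Coordinator.
struct GlobalAgg {
    std::int64_t count;
    std::int64_t sum;
    std::optional<double> avg;  // empty when no rows matched
};

inline GlobalAgg merge_partials(const std::vector<PartialAgg>& parts) {
    GlobalAgg out{0, 0, std::nullopt};
    for (const PartialAgg& p : parts) {
        assert(p.count >= 0);
        out.count += p.count;  // global COUNT is the sum of partial counts
        out.sum += p.sum;      // global SUM is the sum of partial sums
    }
    if (out.count > 0) {
        // AVG is derived from the merged SUM and COUNT; averaging the
        // per-shard averages would weight small shards incorrectly.
        out.avg = static_cast<double>(out.sum) / static_cast<double>(out.count);
    }
    return out;
}
```

For example, partials of (count 2, sum 10) and (count 3, sum 50) merge to count 5, sum 60, and an average of 12.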

3. Storage Layer

Data is persisted in fixed-size pages (default 4 KB) using a slot-based layout. The StorageManager coordinates access, while the BufferPoolManager provides an LRU-K caching layer to minimize disk I/O.
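A slot-based page is commonly organized with a slot directory growing forward from the header and record bytes growing backward from the end of the page. The sketch below follows that common scheme; the SlottedPage class, its header fields, and the 16-bit offsets are assumptions for illustration and may not match cloudSQL's on-disk format.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <string_view>

constexpr std::size_t kPageSize = 4096;   // default page size from the text
constexpr std::size_t kHeaderSize = 4;    // [u16 slot_count][u16 free_end]
constexpr std::size_t kSlotSize = 4;      // [u16 offset][u16 length]

class SlottedPage {
public:
    SlottedPage() {
        write_u16(0, 0);          // no slots yet
        write_u16(2, kPageSize);  // records grow backward from the page end
    }

    // Returns the new slot id, or nullopt if the record does not fit.
    std::optional<std::uint16_t> insert(std::string_view rec) {
        const std::size_t count = read_u16(0);
        const std::size_t free_end = read_u16(2);
        const std::size_t dir_end = kHeaderSize + (count + 1) * kSlotSize;
        if (dir_end > free_end || rec.size() > free_end - dir_end) {
            return std::nullopt;
        }
        const std::size_t off = free_end - rec.size();
        if (!rec.empty()) {
            std::memcpy(bytes_.data() + off, rec.data(), rec.size());
        }
        const std::size_t slot_pos = kHeaderSize + count * kSlotSize;
        write_u16(slot_pos, off);
        write_u16(slot_pos + 2, rec.size());
        write_u16(0, count + 1);
        write_u16(2, off);
        return static_cast<std::uint16_t>(count);
    }

    // Returns a view of the record stored in the given slot.
    std::string_view get(std::uint16_t slot) const {
        assert(slot < read_u16(0));
        const std::size_t slot_pos = kHeaderSize + slot * kSlotSize;
        return std::string_view(bytes_.data() + read_u16(slot_pos),
                                read_u16(slot_pos + 2));
    }

private:
    std::uint16_t read_u16(std::size_t pos) const {
        std::uint16_t v;
        std::memcpy(&v, bytes_.data() + pos, sizeof v);
        return v;
    }
    void write_u16(std::size_t pos, std::size_t value) {
        const auto v = static_cast<std::uint16_t>(value);
        std::memcpy(bytes_.data() + pos, &v, sizeof v);
    }

    std::array<char, kPageSize> bytes_{};
};
```

The indirection through the slot directory lets a record be referenced by a stable (page, slot) pair even if its bytes are later moved within the page. Deletion, compaction, and the buffer pool's LRU-K replacement policy are out of scope for this sketch.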

License

MIT
