
feat: Implement LEFT JOIN and fix NULL handling in PostgreSQL protocol#12

Merged
poyrazK merged 15 commits into main from feature/fix-joins-and-nulls
Mar 9, 2026

Conversation


poyrazK (Owner) commented Mar 9, 2026

This PR introduces several improvements to the execution engine and network layer:

  1. Join Support: Implemented LEFT JOIN logic in HashJoinOperator, ensuring non-matching left tuples are preserved with NULL values for right-side columns.
  2. NULL Handling: Fixed the PostgreSQL wire protocol implementation to correctly transmit NULL values (length -1).
  3. Qualified Column Resolution: Improved ColumnExpr to correctly resolve columns qualified with table names.
  4. Test Automation: Integrated SLT (SqlLogicTest) suite into run_test.sh and added initial test cases for joins, aggregates, and expressions.
  5. Core Types: Added JoinType enum to unify join representation across parser and executor.
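On point 2: in the PostgreSQL v3 wire protocol, each DataRow field carries a 4-byte big-endian length prefix, and NULL is signaled by a length of -1 (0xFFFFFFFF on the wire) with no payload bytes. A minimal sketch of that encoding follows; the helper names are illustrative, not this repository's actual code:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Append a 32-bit value in network byte order (big-endian).
inline void put_int32(std::vector<uint8_t>& buf, int32_t v) {
    buf.push_back(static_cast<uint8_t>((v >> 24) & 0xFF));
    buf.push_back(static_cast<uint8_t>((v >> 16) & 0xFF));
    buf.push_back(static_cast<uint8_t>((v >> 8) & 0xFF));
    buf.push_back(static_cast<uint8_t>(v & 0xFF));
}

// Encode one DataRow field: NULL is length -1 with no payload bytes;
// non-NULL values get a 4-byte length prefix followed by the text.
inline std::vector<uint8_t> encode_field(const std::optional<std::string>& val) {
    std::vector<uint8_t> buf;
    if (!val.has_value()) {
        put_int32(buf, -1);  // 0xFFFFFFFF on the wire: SQL NULL
    } else {
        put_int32(buf, static_cast<int32_t>(val->size()));
        buf.insert(buf.end(), val->begin(), val->end());
    }
    return buf;
}
```

Emitting a zero-length field instead of -1 would make clients read an empty string rather than NULL, which is the bug this PR fixes.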

Summary by CodeRabbit

New Features

  • Added support for LEFT, RIGHT, and FULL join operations alongside INNER joins; output columns are appropriately nullified based on join type

Bug Fixes

  • Improved column resolution to properly handle table-qualified column names in queries
  • Enhanced NULL value handling in protocol message encoding

Tests

  • Added comprehensive test coverage for joins, aggregates, expressions, and fundamental SQL operations
  • Implemented automated SLT test runner for SQL logic verification


coderabbitai bot commented Mar 9, 2026

📝 Walkthrough

This pull request extends HashJoinOperator to support LEFT, RIGHT, and FULL outer joins alongside INNER joins. It introduces a JoinType enum, modifies the operator's internal structure to track tuple match states, adjusts output schema nullability based on join type, enhances column resolution logic, and updates NULL encoding in network protocol. Comprehensive test coverage is added including new SLT test suite and runner framework.
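The nullability adjustment mentioned above can be illustrated with a small standalone sketch: an outer join makes the columns of the side that may go unmatched nullable. `Column` and `join_output_schema` below are simplified stand-ins, not the repository's actual types:

```cpp
#include <string>
#include <vector>

// Unified join representation shared by parser and executor.
enum class JoinType { Inner, Left, Right, Full };

struct Column {
    std::string name;
    bool nullable = false;
};

// Build the join output schema: right-side columns become nullable for
// LEFT joins, left-side columns for RIGHT joins, both sides for FULL.
std::vector<Column> join_output_schema(const std::vector<Column>& left,
                                       const std::vector<Column>& right,
                                       JoinType jt) {
    std::vector<Column> out;
    for (Column c : left) {
        if (jt == JoinType::Right || jt == JoinType::Full) c.nullable = true;
        out.push_back(c);
    }
    for (Column c : right) {
        if (jt == JoinType::Left || jt == JoinType::Full) c.nullable = true;
        out.push_back(c);
    }
    return out;
}
```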

Changes

Cohort / File(s) Summary
Join Type Support
include/executor/types.hpp, include/executor/operator.hpp, src/executor/operator.cpp, src/executor/query_executor.cpp
Added JoinType enum (Inner, Left, Right, Full) and extended HashJoinOperator with join_type parameter. BuildTuple struct wraps right tuples with matched flag. Constructor updated to accept join_type, schema columns become nullable based on join type semantics, and operator now emits NULL-extended tuples for unmatched rows in outer joins.
Column Resolution Enhancement
src/parser/expression.cpp
ColumnExpr evaluation now performs two-step column index lookup: first attempts fully-qualified name via to_string(), then falls back to unqualified name if table-qualified reference exists. Aligns column resolution with aliased table references.
Network Protocol
src/network/server.cpp
Changed NULL value encoding from zero-length marker to 0xFFFFFFFF (4-byte) placeholder. Non-NULL values now include 4-byte length prefix before data payload.
Test Framework
tests/logic/slt_runner.py, tests/run_test.sh
Introduced new CloudSQLClient class and SLT (SQL Logic Test) runner in Python. Test script now supports absolute path resolution, dynamic CPU-based parallelism, and conditional SLT test execution.
Test Coverage
tests/logic/aggregates.slt, tests/logic/basic.slt, tests/logic/expressions.slt, tests/logic/joins.slt, tests/cloudSQL_tests.cpp
Added comprehensive SLT test suites covering aggregates, basic SQL operations, expressions, and join scenarios. Updated existing test to use explicit table aliases.
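The probe-side behavior described in the join cohort above — emit a joined tuple per match, otherwise NULL-extend the left tuple over the right columns — can be sketched as a toy loop, independent of the real HashJoinOperator plumbing (all names below are illustrative):

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy tuple: a row of optional strings, where std::nullopt models SQL NULL.
using Tuple = std::vector<std::optional<std::string>>;

// LEFT join on string keys: every left tuple appears at least once; when no
// right tuple matches, the right columns are padded with NULLs.
std::vector<Tuple> left_join(const std::vector<std::pair<std::string, Tuple>>& left,
                             const std::vector<std::pair<std::string, Tuple>>& right,
                             size_t right_width) {
    std::unordered_multimap<std::string, Tuple> ht;  // build side
    for (const auto& [key, tup] : right) ht.emplace(key, tup);

    std::vector<Tuple> out;
    for (const auto& [key, ltup] : left) {           // probe side
        auto [first, last] = ht.equal_range(key);
        if (first == last) {
            Tuple joined = ltup;  // no match: NULL-extend the right side
            joined.insert(joined.end(), right_width, std::nullopt);
            out.push_back(std::move(joined));
        } else {
            for (auto it = first; it != last; ++it) {
                Tuple joined = ltup;
                joined.insert(joined.end(), it->second.begin(), it->second.end());
                out.push_back(std::move(joined));
            }
        }
    }
    return out;
}
```

The actual operator additionally tracks a per-build-tuple matched flag so RIGHT/FULL joins can later emit unmatched right tuples, which this sketch omits.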

Sequence Diagram

sequenceDiagram
    participant QE as Query Executor
    participant HOp as HashJoinOperator
    participant Left as Left Operator
    participant Right as Right Operator
    participant Schema as Output Schema

    QE->>HOp: Constructor(left, right, left_key, right_key, JoinType::Left)
    HOp->>Schema: Adjust nullability (right columns nullable)
    
    QE->>HOp: Open()
    HOp->>Left: Open()
    HOp->>Right: Open()
    HOp->>HOp: Build hash_table with BuildTuple{tuple, matched=false}
    
    QE->>HOp: Next()
    HOp->>Left: Next() - get left_tuple
    HOp->>HOp: Lookup in hash_table using left_key
    
    alt Match found
        HOp->>HOp: Iterate matches, emit joined tuples
        HOp->>HOp: Mark BuildTuple.matched = true
    else No match (LEFT join)
        HOp->>HOp: Emit NULL-extended tuple (right side NULL)
    end
    
    HOp->>HOp: Check right_idx_iter for unmatched right tuples
    alt Unmatched right exists (LEFT join doesn't emit)
        HOp->>HOp: Continue to next phase
    end
    
    QE->>HOp: Close()
    HOp->>Left: Close()
    HOp->>Right: Close()

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

enhancement

Poem

🐰 In hash tables deep, where tuples dance,
Left and right joins now get their chance,
With BuildTuple flags and nulls so bright,
Our outer joins bloom—plain sight!
Tests multiply, SLT runners stride,
Cloud SQL grows with humble pride! 🌿

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 31.25%, below the required 80.00% threshold. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): the title clearly identifies the primary changes, implementing LEFT JOIN support and fixing NULL handling in the PostgreSQL protocol, which are the main objectives reflected across the changeset.



coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (4)
src/network/server.cpp (1)

418-423: Reserve col_vals up front.

num_cols is already known here, so reserving once avoids repeated reallocations and string moves while building each row.

♻️ Small cleanup
                                 struct ColValue {
                                     bool is_null;
                                     std::string val;
                                 };
                                 std::vector<ColValue> col_vals;
+                                col_vals.reserve(num_cols);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/network/server.cpp` around lines 418 - 423, The vector col_vals is
constructed without reserved capacity even though num_cols is available; to
avoid repeated reallocations and string moves, call col_vals.reserve(num_cols)
immediately after declaring col_vals (before populating it) so the ColValue
entries can be emplaced/pushed without growing the buffer repeatedly.
tests/logic/slt_runner.py (1)

125-129: Don't silently ignore the SLT sort modifier.

Line 129 parses the optional [sort] token and then drops it. rowsort / valuesort queries will still be compared in wire order, which makes the runner incompatible with valid SLT files and can produce false failures.

Do you want me to sketch a minimal rowsort / valuesort implementation, or at least a fail-fast check for unsupported modifiers?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/logic/slt_runner.py` around lines 125 - 129, The code currently drops
the optional SLT sort modifier after parsing types; update the parsing in the
query handling (where line, parts, and types are used) to detect a third token
and handle 'rowsort' and 'valuesort' (e.g., set a rowsort/valuesort flag or
mode) instead of ignoring it; then propagate that flag into the comparison logic
that verifies query results so results are either sorted appropriately before
comparison (rows sorted by entire row, values sorted per-row) or, if you prefer
a simpler change, raise a clear error for unsupported modifiers when the third
token is present so the runner fails fast for 'rowsort'/'valuesort'.
src/executor/operator.cpp (1)

581-588: Consider extending nullable marking for Right and Full joins.

Currently, only LEFT joins mark right-side columns as nullable. For completeness, RIGHT joins should mark left-side columns as nullable, and FULL joins should mark both sides as nullable. This is relevant if/when Right and Full joins are implemented.

♻️ Proposed schema construction for all join types
     if (left_ && right_) {
         for (const auto& col : left_->output_schema().columns()) {
-            schema_.add_column(col);
+            auto col_meta = col;
+            if (join_type_ == executor::JoinType::Right || join_type_ == executor::JoinType::Full) {
+                col_meta.set_nullable(true);
+            }
+            schema_.add_column(col_meta);
         }
         for (const auto& col : right_->output_schema().columns()) {
             /* Left Join results in nullable columns for the right side */
             auto col_meta = col;
-            if (join_type_ == executor::JoinType::Left) {
+            if (join_type_ == executor::JoinType::Left || join_type_ == executor::JoinType::Full) {
                 col_meta.set_nullable(true);
             }
             schema_.add_column(col_meta);
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/executor/operator.cpp` around lines 581 - 588, The schema construction
only marks right-side columns nullable for JoinType::Left; update the logic so
JoinType::Right marks left-side columns nullable and JoinType::Full marks both
sides nullable: when iterating right_->output_schema().columns() set
col_meta.set_nullable(true) if join_type_ == executor::JoinType::Left ||
join_type_ == executor::JoinType::Full, and when iterating
left_->output_schema().columns() set col_meta.set_nullable(true) if join_type_
== executor::JoinType::Right || join_type_ == executor::JoinType::Full; keep
using schema_.add_column(...) and the existing col/col_meta variables and
join_type_ checks to implement this.
tests/run_test.sh (1)

22-22: Consider cross-platform compatibility for CPU count detection.

sysctl -n hw.ncpu is macOS-specific. On Linux, nproc is the equivalent command. If CI runs on Linux, this will fail.

♻️ Cross-platform suggestion
-make -j"$(sysctl -n hw.ncpu)"
+if command -v nproc &> /dev/null; then
+    make -j"$(nproc)"
+else
+    make -j"$(sysctl -n hw.ncpu)"
+fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/run_test.sh` at line 22, The current line using "make -j$(sysctl -n
hw.ncpu)" is macOS-specific; replace it with a cross-platform CPU detection
pattern: determine CPU_COUNT by trying "nproc", then "sysctl -n hw.ncpu", then
"getconf _NPROCESSORS_ONLN", falling back to 1 if none succeed, and use "make
-j$CPU_COUNT" instead; update tests/run_test.sh to set that CPU_COUNT variable
before invoking make so the script works on both Linux and macOS CI.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/executor/query_executor.cpp`:
- Around line 651-663: The join mapping in query_executor.cpp correctly produces
JoinType::Right and JoinType::Full, but HashJoinOperator::next() (in class
HashJoinOperator) only implements Left and Inner semantics; update
HashJoinOperator::next() to explicitly handle JoinType::Right and JoinType::Full
by tracking matched build/probe rows (use the existing hash table entries or
matched flags), emitting unmatched right-side tuples padded with NULLs on the
left for Right joins, and for Full joins emit both unmatched left-side tuples
padded with NULLs on the right and unmatched right-side tuples padded with NULLs
on the left (preserving existing matched emission for inner matches). Ensure you
use the same internal data structures and matching flags used for Left joins so
behavior is consistent with memory/accounting already present in
HashJoinOperator.

In `@src/parser/expression.cpp`:
- Around line 237-245: The vectorized lookup in
ColumnExpr::evaluate_vectorized() still only calls schema->find_column(name_)
and must be updated to mirror the scalar fix: first attempt
schema->find_column(this->to_string()) for an exact/qualified match, and only if
that returns not-found (static_cast<size_t>(-1)) and has_table() is true, fall
back to schema->find_column(name_); update the lookup logic in
evaluate_vectorized() to use the same two-step resolution (to_string() then
name_) referencing ColumnExpr::has_table(), name_, and to_string().

In `@tests/cloudSQL_tests.cpp`:
- Around line 786-788: The test's SQL string passed to the Lexer uses undeclared
aliases `a` and `b`, so update the literal in the Lexer instantiation (the call
creating variable `lexer` via std::make_unique<Lexer>) to use real table
qualifiers `t1` and `t2` instead of `a`/`b` (e.g., replace occurrences of `a.id`
with `t1.id` and `b.val`/`b.x` with `t2.val`/`t2.x`) so the test actually
exercises the table-qualified column path logic added in this PR.

In `@tests/run_test.sh`:
- Line 22: The make invocation uses unquoted command substitution: make
-j$(sysctl -n hw.ncpu); update the command to quote the substitution to prevent
word-splitting by changing the invocation that contains "$(sysctl -n hw.ncpu)"
(or assign the output to a variable like NC and use "$NC") so the -j argument is
always a single token; locate the line with make -j$(sysctl -n hw.ncpu) in
tests/run_test.sh and replace it with a quoted substitution.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 763e14a2-069e-4057-a7f0-560ed3cf9a88

📥 Commits

Reviewing files that changed from the base of the PR and between 0f756c3 and eb4b5cc.

📒 Files selected for processing (15)
  • include/executor/operator.hpp
  • include/executor/types.hpp
  • src/executor/operator.cpp
  • src/executor/query_executor.cpp
  • src/network/server.cpp
  • src/parser/expression.cpp
  • src/parser/parser.cpp
  • tests/cloudSQL_tests.cpp
  • tests/logic/aggregates.slt
  • tests/logic/basic.slt
  • tests/logic/expressions.slt
  • tests/logic/joins.slt
  • tests/logic/slt_runner.py
  • tests/raft_simulation_tests.cpp
  • tests/run_test.sh


coderabbitai bot left a comment


Actionable comments posted: 3


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d257f190-f3c7-461c-9688-9751eba18675

📥 Commits

Reviewing files that changed from the base of the PR and between eb4b5cc and a49f061.

📒 Files selected for processing (7)
  • include/executor/operator.hpp
  • src/executor/operator.cpp
  • src/network/server.cpp
  • src/parser/expression.cpp
  • tests/cloudSQL_tests.cpp
  • tests/logic/slt_runner.py
  • tests/run_test.sh
✅ Files skipped from review due to trivial changes (1)
  • tests/cloudSQL_tests.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/parser/expression.cpp

Comment on lines 609 to +610

     const common::Value key = right_key_->evaluate(&right_tuple, &right_schema);
-    hash_table_.emplace(key.to_string(), std::move(right_tuple));
+    hash_table_.emplace(key.to_string(), BuildTuple{std::move(right_tuple), false});


⚠️ Potential issue | 🔴 Critical

Do not hash SQL NULL join keys as ordinary values.

Lines 609-610 and 661-665 route join keys through to_string(), so a NULL key becomes just another hash entry. That makes NULL = NULL match here and can even collide with a real text key like "NULL", which breaks join semantics and prevents the unmatched-row paths from firing correctly in LEFT/RIGHT/FULL joins.

Also applies to: 661-675

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/executor/operator.cpp` around lines 609 - 610, The code currently calls
right_key_->evaluate(...) and then uses key.to_string() to build the hash_table_
entry, which treats SQL NULL like an ordinary string and causes NULL=NULL
matches; change both the build side (where hash_table_.emplace(...) is called
with key.to_string()) and the probe/lookup side (the symmetric block around the
661-675 region) to detect SQL NULL via the common::Value null-check (e.g.,
key.is_null() or equivalent) and handle it specially: do not convert NULL to a
string or insert it into the normal string-keyed hash (skip inserting NULL keys
or store them using a distinct null marker/optional type), and ensure lookups
treat NULL keys as non-matching so NULLs never match each other or collide with
the literal "NULL" string. Ensure you update both BuildTuple insertion sites and
the corresponding lookup logic to use the same null-handling convention.

Comment on lines +155 to +163
        # Apply sort mode
        if sort_mode == 'rowsort':
            actual_rows.sort()
            expected_rows.sort()
        elif sort_mode == 'valuesort':
            actual_values = sorted([str(val) if val is not None else "NULL" for row in actual_rows for val in row])
            expected_values = sorted([val for row in expected_rows for val in row])
            actual_rows = [[v] for v in actual_values]
            expected_rows = [[v] for v in expected_values]


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Show the NULL decode path and the raw rowsort branch.
rg -n -C2 'row_data.append\(None\)|rowsort|actual_rows.sort|expected_rows.sort' tests/logic/slt_runner.py

python3 - <<'PY'
rows = [[None], ["a"]]
try:
    rows.sort()
    print("sorted:", rows)
except Exception as exc:
    print(type(exc).__name__, exc)
PY

Repository: poyrazK/cloudSQL

Length of output: 695


🏁 Script executed:

cd tests/logic && cat -n slt_runner.py | sed -n '140,170p'

Repository: poyrazK/cloudSQL

Length of output: 1619


🏁 Script executed:

# Search for test files that use rowsort and might test NULL handling
fd -e 'sql' tests/ | head -20 | xargs grep -l 'rowsort' | head -5

Repository: poyrazK/cloudSQL

Length of output: 389


🏁 Script executed:

# Check the expected_rows data structure and how it's populated
rg -n -B5 'expected_rows.*append|expected_rows.*=' tests/logic/slt_runner.py | head -30

Repository: poyrazK/cloudSQL

Length of output: 832


🏁 Script executed:

find tests -type f -name '*.sql' | head -10

Repository: poyrazK/cloudSQL

Length of output: 42


🏁 Script executed:

# Search for test files that contain both rowsort and NULL patterns
find tests -type f \( -name '*.sql' -o -name '*.txt' \) -exec grep -l 'rowsort' {} \; | head -5

Repository: poyrazK/cloudSQL

Length of output: 42


🏁 Script executed:

# Check the structure of test data to see if rowsort tests include NULL
if [ -d tests/data ] || [ -d tests/sql ]; then 
  find tests -type f | head -20
fi

Repository: poyrazK/cloudSQL

Length of output: 42


🏁 Script executed:

cat -n tests/logic/slt_runner.py | sed -n '1,50p'

Repository: poyrazK/cloudSQL

Length of output: 1867


🏁 Script executed:

# Find where client.query is defined
rg -n 'def query' tests/logic/slt_runner.py

Repository: poyrazK/cloudSQL

Length of output: 88


🏁 Script executed:

# Look at the comparison logic to understand the actual data types
cat -n tests/logic/slt_runner.py | sed -n '168,200p'

Repository: poyrazK/cloudSQL

Length of output: 1780


Normalize NULLs before applying rowsort.

actual_rows carries SQL NULLs as None, while expected_rows uses "NULL" string tokens parsed from the test file. Sorting both sides raw will crash with TypeError when comparing None and str in NULL-containing outer-join results. The valuesort branch already applies this normalization pattern; rowsort should too.

🛠️ Suggested fix
             if sort_mode == 'rowsort':
-                actual_rows.sort()
-                expected_rows.sort()
+                actual_rows = sorted(
+                    actual_rows,
+                    key=lambda row: [("NULL" if val is None else str(val)) for val in row],
+                )
+                expected_rows = sorted(
+                    expected_rows,
+                    key=lambda row: [str(val) for val in row],
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/logic/slt_runner.py` around lines 155 - 163, The rowsort branch fails
when comparing SQL NULLs (None) with string "NULL" because actual_rows contains
None while expected_rows contains the string token; modify the rowsort handling
in the block that checks sort_mode == 'rowsort' to normalize NULLs the same way
valuesort does—convert None to "NULL" (or stringify values consistently) for
both actual_rows and expected_rows before sorting so actual_rows.sort() and
expected_rows.sort() won't raise TypeError; update the code paths around
sort_mode, actual_rows, and expected_rows to perform this normalization prior to
sorting.

Comment on lines +37 to +39
cd "$BUILD_DIR" || exit 1
make -j"$CPU_COUNT"
./cloudSQL -p 5438 -d "$TEST_DATA_DIR" &


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Show the build/start block and whether the script enables fail-fast mode.
rg -n -C2 'set -e|pipefail|make -j|cloudSQL -p' tests/run_test.sh

Repository: poyrazK/cloudSQL

Length of output: 184


🏁 Script executed:

head -50 tests/run_test.sh | cat -n

Repository: poyrazK/cloudSQL

Length of output: 1647


Add error handling to the build step to prevent launching a stale binary.

Line 38 does not guard the make command, so if the build fails, the script proceeds to launch ./cloudSQL with a stale binary from a previous build, masking build regressions and producing misleading test results.

Fix
 cd "$BUILD_DIR" || exit 1
-make -j"$CPU_COUNT"
+make -j"$CPU_COUNT" || exit 1
 ./cloudSQL -p 5438 -d "$TEST_DATA_DIR" &
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 cd "$BUILD_DIR" || exit 1
-make -j"$CPU_COUNT"
+make -j"$CPU_COUNT" || exit 1
 ./cloudSQL -p 5438 -d "$TEST_DATA_DIR" &
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/run_test.sh` around lines 37 - 39, The build step does not check the
result of running make, so a failed build can be hidden and the script will
still launch the old ./cloudSQL binary; update the test script to guard the
build (the make -j"$CPU_COUNT" invocation) by failing the script when make
returns non-zero (e.g., exit on error or explicitly check the exit status)
before proceeding to cd "$BUILD_DIR" || exit 1 and launching ./cloudSQL -p 5438
-d "$TEST_DATA_DIR" &, ensuring a failed build prevents starting the server.


coderabbitai bot left a comment


♻️ Duplicate comments (1)
src/executor/operator.cpp (1)

608-611: ⚠️ Potential issue | 🔴 Critical

SQL NULL join keys must not be hashed as ordinary values.

The code converts join keys to strings via to_string(), which treats NULL as a regular hashable value. This violates SQL semantics where NULL = NULL is UNKNOWN (not TRUE), causing:

  1. Incorrect matches when both sides have NULL keys
  2. Potential collision with the literal string "NULL"
  3. Broken LEFT/RIGHT/FULL join semantics for rows that should remain unmatched
Proposed fix to skip NULL keys
     while (right_->next(right_tuple)) {
         const common::Value key = right_key_->evaluate(&right_tuple, &right_schema);
+        if (key.is_null()) {
+            // NULL keys never match; for RIGHT/FULL joins, store separately or mark specially
+            continue;  // Skip for now; these will be emitted as unmatched in finalization
+        }
         hash_table_.emplace(key.to_string(), BuildTuple{std::move(right_tuple), false});
     }

The probe side (line 666) needs symmetric handling:

         const common::Value key = left_key_->evaluate(&(left_tuple_.value()), &left_schema);
+        if (key.is_null()) {
+            // NULL never matches; for LEFT/FULL joins, emit with NULLs immediately
+            if (join_type_ == JoinType::Left || join_type_ == JoinType::Full) {
+                std::vector<common::Value> joined_values = left_tuple_->values();
+                for (size_t i = 0; i < right_schema.column_count(); ++i) {
+                    joined_values.push_back(common::Value::make_null());
+                }
+                out_tuple = Tuple(std::move(joined_values));
+                left_tuple_ = std::nullopt;
+                return true;
+            }
+            left_tuple_ = std::nullopt;
+            continue;
+        }
         auto range = hash_table_.equal_range(key.to_string());

For RIGHT/FULL joins, NULL-keyed right tuples need separate tracking to emit as unmatched during finalization.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/executor/operator.cpp` around lines 608 - 611, The loop that builds the
hash table currently hashes right join keys by calling
right_key_->evaluate(...).to_string(), which treats SQL NULL as a normal value;
change the logic in the build phase (the loop using right_->next and
right_key_->evaluate) to skip inserting any tuple whose evaluated join key is
NULL (use the Value is-null accessor, e.g., common::Value::is_null()/isNull()).
Also add tracking for NULL-keyed right tuples (e.g., a separate container or
counter of skipped right tuples stored alongside BuildTuple info) so RIGHT and
FULL join finalization can still emit those right-side rows as unmatched; make
symmetric changes to the probe logic that evaluates left keys (the probe path
using key evaluation and hash_table_ lookup) to treat NULL keys as non-matchable
(i.e., do not probe/hash when key is NULL). Ensure references to hash_table_,
BuildTuple, right_key_->evaluate, right_->next and the probe key-evaluation code
are updated accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: da3b95b0-2d93-4ea2-997c-49e5584686f3

📥 Commits

Reviewing files that changed from the base of the PR and between a49f061 and 0b8d82b.

📒 Files selected for processing (3)
  • include/executor/operator.hpp
  • src/executor/operator.cpp
  • tests/cloudSQL_tests.cpp

poyrazK merged commit e8f2c69 into main Mar 9, 2026
1 check passed