feat: Implement LEFT JOIN and fix NULL handling in PostgreSQL protocol #12
📝 Walkthrough

This pull request extends HashJoinOperator to support LEFT, RIGHT, and FULL outer joins alongside INNER joins. It introduces a JoinType enum, modifies the operator's internal structure to track tuple match states, adjusts output-schema nullability based on join type, enhances column resolution logic, and updates NULL encoding in the network protocol. Comprehensive test coverage is added, including a new SLT test suite and runner framework.
Sequence Diagram

```mermaid
sequenceDiagram
    participant QE as Query Executor
    participant HOp as HashJoinOperator
    participant Left as Left Operator
    participant Right as Right Operator
    participant Schema as Output Schema
    QE->>HOp: Constructor(left, right, left_key, right_key, JoinType::Left)
    HOp->>Schema: Adjust nullability (right columns nullable)
    QE->>HOp: Open()
    HOp->>Left: Open()
    HOp->>Right: Open()
    HOp->>HOp: Build hash_table with BuildTuple{tuple, matched=false}
    QE->>HOp: Next()
    HOp->>Left: Next() - get left_tuple
    HOp->>HOp: Lookup in hash_table using left_key
    alt Match found
        HOp->>HOp: Iterate matches, emit joined tuples
        HOp->>HOp: Mark BuildTuple.matched = true
    else No match (LEFT join)
        HOp->>HOp: Emit NULL-extended tuple (right side NULL)
    end
    HOp->>HOp: Check right_idx_iter for unmatched right tuples
    alt Unmatched right exists (LEFT join doesn't emit)
        HOp->>HOp: Continue to next phase
    end
    QE->>HOp: Close()
    HOp->>Left: Close()
    HOp->>Right: Close()
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 4
🧹 Nitpick comments (4)
src/network/server.cpp (1)
418-423: Reserve `col_vals` up front.

`num_cols` is already known here, so reserving once avoids repeated reallocations and string moves while building each row.

♻️ Small cleanup

```diff
 struct ColValue { bool is_null; std::string val; };
 std::vector<ColValue> col_vals;
+col_vals.reserve(num_cols);
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/network/server.cpp` around lines 418 - 423, The vector col_vals is constructed without reserved capacity even though num_cols is available; to avoid repeated reallocations and string moves, call col_vals.reserve(num_cols) immediately after declaring col_vals (before populating it) so the ColValue entries can be emplaced/pushed without growing the buffer repeatedly.

tests/logic/slt_runner.py (1)
125-129: Don't silently ignore the SLT sort modifier.

Line 129 parses the optional [sort] token and then drops it. `rowsort`/`valuesort` queries will still be compared in wire order, which makes the runner incompatible with valid SLT files and can produce false failures.

Do you want me to sketch a minimal `rowsort`/`valuesort` implementation, or at least a fail-fast check for unsupported modifiers?

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/logic/slt_runner.py` around lines 125 - 129, The code currently drops the optional SLT sort modifier after parsing types; update the parsing in the query handling (where line, parts, and types are used) to detect a third token and handle 'rowsort' and 'valuesort' (e.g., set a rowsort/valuesort flag or mode) instead of ignoring it; then propagate that flag into the comparison logic that verifies query results so results are either sorted appropriately before comparison (rows sorted by entire row, values sorted per-row) or, if you prefer a simpler change, raise a clear error for unsupported modifiers when the third token is present so the runner fails fast for 'rowsort'/'valuesort'.

src/executor/operator.cpp (1)
581-588: Consider extending nullable marking for RIGHT and FULL joins.

Currently, only LEFT joins mark right-side columns as nullable. For completeness, RIGHT joins should mark left-side columns as nullable, and FULL joins should mark both sides as nullable. This is relevant if/when RIGHT and FULL joins are implemented.

♻️ Proposed schema construction for all join types

```diff
 if (left_ && right_) {
   for (const auto& col : left_->output_schema().columns()) {
-    schema_.add_column(col);
+    auto col_meta = col;
+    if (join_type_ == executor::JoinType::Right || join_type_ == executor::JoinType::Full) {
+      col_meta.set_nullable(true);
+    }
+    schema_.add_column(col_meta);
   }
   for (const auto& col : right_->output_schema().columns()) {
     /* Left Join results in nullable columns for the right side */
     auto col_meta = col;
-    if (join_type_ == executor::JoinType::Left) {
+    if (join_type_ == executor::JoinType::Left || join_type_ == executor::JoinType::Full) {
       col_meta.set_nullable(true);
     }
     schema_.add_column(col_meta);
   }
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/executor/operator.cpp` around lines 581 - 588, The schema construction only marks right-side columns nullable for JoinType::Left; update the logic so JoinType::Right marks left-side columns nullable and JoinType::Full marks both sides nullable: when iterating right_->output_schema().columns() set col_meta.set_nullable(true) if join_type_ == executor::JoinType::Left || join_type_ == executor::JoinType::Full, and when iterating left_->output_schema().columns() set col_meta.set_nullable(true) if join_type_ == executor::JoinType::Right || join_type_ == executor::JoinType::Full; keep using schema_.add_column(...) and the existing col/col_meta variables and join_type_ checks to implement this.

tests/run_test.sh (1)
22-22: Consider cross-platform compatibility for CPU count detection.

`sysctl -n hw.ncpu` is macOS-specific. On Linux, `nproc` is the equivalent command. If CI runs on Linux, this will fail.

♻️ Cross-platform suggestion

```diff
-make -j"$(sysctl -n hw.ncpu)"
+if command -v nproc &> /dev/null; then
+  make -j"$(nproc)"
+else
+  make -j"$(sysctl -n hw.ncpu)"
+fi
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/run_test.sh` at line 22, The current line using "make -j$(sysctl -n hw.ncpu)" is macOS-specific; replace it with a cross-platform CPU detection pattern: determine CPU_COUNT by trying "nproc", then "sysctl -n hw.ncpu", then "getconf _NPROCESSORS_ONLN", falling back to 1 if none succeed, and use "make -j$CPU_COUNT" instead; update tests/run_test.sh to set that CPU_COUNT variable before invoking make so the script works on both Linux and macOS CI.
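Returning to the join-type nullability comment on src/executor/operator.cpp above: the rule generalizes neatly, since each outer join pads exactly the side(s) it does not preserve. A minimal Python sketch of that rule (illustrative only — the real code is C++, and the `(name, nullable)` pairs here are a hypothetical stand-in for the engine's column metadata):

```python
def output_nullability(left_cols, right_cols, join_type):
    """Compute output-column nullability for a join.

    left_cols / right_cols are (name, nullable) pairs; join_type is one of
    'inner', 'left', 'right', 'full'. The padded side becomes nullable:
    LEFT pads the right side, RIGHT pads the left side, FULL pads both.
    """
    left_padded = join_type in ("right", "full")
    right_padded = join_type in ("left", "full")
    out = [(name, nullable or left_padded) for name, nullable in left_cols]
    out += [(name, nullable or right_padded) for name, nullable in right_cols]
    return out
```

An INNER join leaves nullability untouched, which matches the existing behavior the comment asks to extend.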
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/executor/query_executor.cpp`:
- Around line 651-663: The join mapping in query_executor.cpp correctly produces
JoinType::Right and JoinType::Full, but HashJoinOperator::next() (in class
HashJoinOperator) only implements Left and Inner semantics; update
HashJoinOperator::next() to explicitly handle JoinType::Right and JoinType::Full
by tracking matched build/probe rows (use the existing hash table entries or
matched flags), emitting unmatched right-side tuples padded with NULLs on the
left for Right joins, and for Full joins emit both unmatched left-side tuples
padded with NULLs on the right and unmatched right-side tuples padded with NULLs
on the left (preserving existing matched emission for inner matches). Ensure you
use the same internal data structures and matching flags used for Left joins so
behavior is consistent with memory/accounting already present in
HashJoinOperator.
In `@src/parser/expression.cpp`:
- Around line 237-245: The vectorized lookup in
ColumnExpr::evaluate_vectorized() still only calls schema->find_column(name_)
and must be updated to mirror the scalar fix: first attempt
schema->find_column(this->to_string()) for an exact/qualified match, and only if
that returns not-found (static_cast<size_t>(-1)) and has_table() is true, fall
back to schema->find_column(name_); update the lookup logic in
evaluate_vectorized() to use the same two-step resolution (to_string() then
name_) referencing ColumnExpr::has_table(), name_, and to_string().
In `@tests/cloudSQL_tests.cpp`:
- Around line 786-788: The test's SQL string passed to the Lexer uses undeclared
aliases `a` and `b`, so update the literal in the Lexer instantiation (the call
creating variable `lexer` via std::make_unique<Lexer>) to use real table
qualifiers `t1` and `t2` instead of `a`/`b` (e.g., replace occurrences of `a.id`
with `t1.id` and `b.val`/`b.x` with `t2.val`/`t2.x`) so the test actually
exercises the table-qualified column path logic added in this PR.
In `@tests/run_test.sh`:
- Line 22: The make invocation uses unquoted command substitution: make
-j$(sysctl -n hw.ncpu); update the command to quote the substitution to prevent
word-splitting by changing the invocation that contains "$(sysctl -n hw.ncpu)"
(or assign the output to a variable like NC and use "$NC") so the -j argument is
always a single token; locate the line with make -j$(sysctl -n hw.ncpu) in
tests/run_test.sh and replace it with a quoted substitution.
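The Right/Full join handling requested for HashJoinOperator::next() above can be sketched compactly with the matched-flag approach. A Python sketch (illustrative only — rows here are hypothetical two-column `(key, payload)` tuples, not the engine's Tuple type, and `None` stands in for SQL NULL):

```python
def hash_join(left, right, join_type):
    """Hash-join sketch with matched-flag tracking for outer joins.

    join_type is 'inner', 'left', 'right', or 'full'. A NULL (None) key
    never matches anything.
    """
    # Build phase: index right rows by key, remembering a matched flag.
    build = {}
    null_keyed_right = []  # kept aside so RIGHT/FULL can still emit them
    for row in right:
        if row[0] is None:
            null_keyed_right.append(row)
        else:
            build.setdefault(row[0], []).append({"row": row, "matched": False})

    out = []
    # Probe phase: emit matches; LEFT/FULL pad unmatched left rows.
    for lrow in left:
        matches = build.get(lrow[0], []) if lrow[0] is not None else []
        if matches:
            for entry in matches:
                entry["matched"] = True
                out.append(lrow + entry["row"])
        elif join_type in ("left", "full"):
            out.append(lrow + (None, None))  # right side padded with NULLs

    # Finalization: RIGHT/FULL emit right rows that never matched.
    if join_type in ("right", "full"):
        for entries in build.values():
            for entry in entries:
                if not entry["matched"]:
                    out.append((None, None) + entry["row"])
        for row in null_keyed_right:
            out.append((None, None) + row)
    return out
```

The key point the prompt makes is that the same `matched` flags already maintained for LEFT joins drive the RIGHT/FULL finalization pass, so no extra bookkeeping structure is needed beyond tracking NULL-keyed build rows.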
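Likewise, the two-step column resolution described for ColumnExpr above can be sketched in Python (the schema is modeled as a flat list of output column names; both helper names are hypothetical):

```python
def find_column(schema, name):
    """Return the index of `name` in schema (a list of column names), or None."""
    return schema.index(name) if name in schema else None

def resolve_column(schema, qualified_name, bare_name, has_table):
    """Two-step resolution: exact qualified match first, then bare fallback.

    An exact match on e.g. "t1.id" wins; only when that misses and the
    expression carries a table qualifier do we retry with the bare "id".
    """
    idx = find_column(schema, qualified_name)
    if idx is None and has_table:
        idx = find_column(schema, bare_name)
    return idx
```

Trying the qualified form first is what keeps `t1.id` and `t2.id` distinguishable in join output schemas, while the bare fallback preserves compatibility with schemas that only store unqualified names.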
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 763e14a2-069e-4057-a7f0-560ed3cf9a88
📒 Files selected for processing (15)
include/executor/operator.hpp, include/executor/types.hpp, src/executor/operator.cpp, src/executor/query_executor.cpp, src/network/server.cpp, src/parser/expression.cpp, src/parser/parser.cpp, tests/cloudSQL_tests.cpp, tests/logic/aggregates.slt, tests/logic/basic.slt, tests/logic/expressions.slt, tests/logic/joins.slt, tests/logic/slt_runner.py, tests/raft_simulation_tests.cpp, tests/run_test.sh
Actionable comments posted: 3
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d257f190-f3c7-461c-9688-9751eba18675
📒 Files selected for processing (7)
include/executor/operator.hpp, src/executor/operator.cpp, src/network/server.cpp, src/parser/expression.cpp, tests/cloudSQL_tests.cpp, tests/logic/slt_runner.py, tests/run_test.sh
✅ Files skipped from review due to trivial changes (1)
- tests/cloudSQL_tests.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
- src/parser/expression.cpp
```diff
 const common::Value key = right_key_->evaluate(&right_tuple, &right_schema);
-hash_table_.emplace(key.to_string(), std::move(right_tuple));
+hash_table_.emplace(key.to_string(), BuildTuple{std::move(right_tuple), false});
```
Do not hash SQL NULL join keys as ordinary values.
Lines 609-610 and 661-665 route join keys through to_string(), so a NULL key becomes just another hash entry. That makes NULL = NULL match here and can even collide with a real text key like "NULL", which breaks join semantics and prevents the unmatched-row paths from firing correctly in LEFT/RIGHT/FULL joins.
Also applies to: 661-675
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/executor/operator.cpp` around lines 609 - 610, The code currently calls
right_key_->evaluate(...) and then uses key.to_string() to build the hash_table_
entry, which treats SQL NULL like an ordinary string and causes NULL=NULL
matches; change both the build side (where hash_table_.emplace(...) is called
with key.to_string()) and the probe/lookup side (the symmetric block around the
661-675 region) to detect SQL NULL via the common::Value null-check (e.g.,
key.is_null() or equivalent) and handle it specially: do not convert NULL to a
string or insert it into the normal string-keyed hash (skip inserting NULL keys
or store them using a distinct null marker/optional type), and ensure lookups
treat NULL keys as non-matching so NULLs never match each other or collide with
the literal "NULL" string. Ensure you update both BuildTuple insertion sites and
the corresponding lookup logic to use the same null-handling convention.
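To see the collision concretely, here is a small Python sketch of the build phase (illustrative only — the engine's hash table is C++, and `None` stands in for SQL NULL):

```python
def build_stringly(rows):
    """Buggy build phase mirroring to_string(): NULL becomes the text "NULL"."""
    table = {}
    for key, payload in rows:
        # A NULL key and the literal string "NULL" land in the same bucket.
        table.setdefault("NULL" if key is None else str(key), []).append(payload)
    return table

def build_null_safe(rows):
    """Fixed build phase: NULL keys are never inserted, so they can't match."""
    table = {}
    for key, payload in rows:
        if key is None:
            continue  # SQL NULL never equals anything, including NULL
        table.setdefault(str(key), []).append(payload)
    return table
```

A NULL-keyed row and a row whose key is the literal text "NULL" collide in the buggy version but stay distinct in the fixed one; the skipped NULL-keyed rows must still be tracked separately so RIGHT/FULL finalization can emit them as unmatched.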
```python
# Apply sort mode
if sort_mode == 'rowsort':
    actual_rows.sort()
    expected_rows.sort()
elif sort_mode == 'valuesort':
    actual_values = sorted([str(val) if val is not None else "NULL" for row in actual_rows for val in row])
    expected_values = sorted([val for row in expected_rows for val in row])
    actual_rows = [[v] for v in actual_values]
    expected_rows = [[v] for v in expected_values]
```
Normalize NULLs before applying rowsort.
actual_rows carries SQL NULLs as None, while expected_rows uses "NULL" string tokens parsed from the test file. Sorting both sides raw will crash with TypeError when comparing None and str in NULL-containing outer-join results. The valuesort branch already applies this normalization pattern; rowsort should too.
🛠️ Suggested fix

```diff
 if sort_mode == 'rowsort':
-    actual_rows.sort()
-    expected_rows.sort()
+    actual_rows = sorted(
+        actual_rows,
+        key=lambda row: [("NULL" if val is None else str(val)) for val in row],
+    )
+    expected_rows = sorted(
+        expected_rows,
+        key=lambda row: [str(val) for val in row],
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/logic/slt_runner.py` around lines 155 - 163, The rowsort branch fails
when comparing SQL NULLs (None) with string "NULL" because actual_rows contains
None while expected_rows contains the string token; modify the rowsort handling
in the block that checks sort_mode == 'rowsort' to normalize NULLs the same way
valuesort does—convert None to "NULL" (or stringify values consistently) for
both actual_rows and expected_rows before sorting so actual_rows.sort() and
expected_rows.sort() won't raise TypeError; update the code paths around
sort_mode, actual_rows, and expected_rows to perform this normalization prior to
sorting.
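One way to factor that normalization, sketched in Python (assuming rows are lists of cell values and `None` represents SQL NULL, as in the runner):

```python
def normalize(val):
    """Render a cell like an SLT expected-output token: SQL NULL -> "NULL"."""
    return "NULL" if val is None else str(val)

def rowsort(rows):
    """SLT 'rowsort': order whole rows by their normalized string form."""
    return sorted(rows, key=lambda row: [normalize(v) for v in row])

def valuesort(rows):
    """SLT 'valuesort': flatten every cell, normalize, and sort individually."""
    return sorted(normalize(v) for row in rows for v in row)
```

Sorting by the normalized key avoids the `TypeError` that `rows.sort()` raises when a `None` cell is compared against a string.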
```sh
cd "$BUILD_DIR" || exit 1
make -j"$CPU_COUNT"
./cloudSQL -p 5438 -d "$TEST_DATA_DIR" &
```
Add error handling to the build step to prevent launching a stale binary.
Line 38 does not guard the make command, so if the build fails, the script proceeds to launch ./cloudSQL with a stale binary from a previous build, masking build regressions and producing misleading test results.
Fix

```diff
 cd "$BUILD_DIR" || exit 1
-make -j"$CPU_COUNT"
+make -j"$CPU_COUNT" || exit 1
 ./cloudSQL -p 5438 -d "$TEST_DATA_DIR" &
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```sh
cd "$BUILD_DIR" || exit 1
make -j"$CPU_COUNT" || exit 1
./cloudSQL -p 5438 -d "$TEST_DATA_DIR" &
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/run_test.sh` around lines 37 - 39, The build step does not check the
result of running make, so a failed build can be hidden and the script will
still launch the old ./cloudSQL binary; update the test script to guard the
build (the make -j"$CPU_COUNT" invocation) by failing the script when make
returns non-zero (e.g., exit on error or explicitly check the exit status)
before proceeding to cd "$BUILD_DIR" || exit 1 and launching ./cloudSQL -p 5438
-d "$TEST_DATA_DIR" &, ensuring a failed build prevents starting the server.
♻️ Duplicate comments (1)
src/executor/operator.cpp (1)
608-611: ⚠️ Potential issue | 🔴 Critical

SQL NULL join keys must not be hashed as ordinary values.

The code converts join keys to strings via `to_string()`, which treats NULL as a regular hashable value. This violates SQL semantics, where `NULL = NULL` is `UNKNOWN` (not `TRUE`), causing:

- Incorrect matches when both sides have NULL keys
- Potential collision with the literal string "NULL"
- Broken LEFT/RIGHT/FULL join semantics for rows that should remain unmatched

Proposed fix to skip NULL keys

```diff
 while (right_->next(right_tuple)) {
   const common::Value key = right_key_->evaluate(&right_tuple, &right_schema);
+  if (key.is_null()) {
+    // NULL keys never match; for RIGHT/FULL joins, store separately or mark specially
+    continue;  // Skip for now; these will be emitted as unmatched in finalization
+  }
   hash_table_.emplace(key.to_string(), BuildTuple{std::move(right_tuple), false});
 }
```

The probe side (line 666) needs symmetric handling:

```diff
 const common::Value key = left_key_->evaluate(&(left_tuple_.value()), &left_schema);
+if (key.is_null()) {
+  // NULL never matches; for LEFT/FULL joins, emit with NULLs immediately
+  if (join_type_ == JoinType::Left || join_type_ == JoinType::Full) {
+    std::vector<common::Value> joined_values = left_tuple_->values();
+    for (size_t i = 0; i < right_schema.column_count(); ++i) {
+      joined_values.push_back(common::Value::make_null());
+    }
+    out_tuple = Tuple(std::move(joined_values));
+    left_tuple_ = std::nullopt;
+    return true;
+  }
+  left_tuple_ = std::nullopt;
+  continue;
+}
 auto range = hash_table_.equal_range(key.to_string());
```

For RIGHT/FULL joins, NULL-keyed right tuples need separate tracking to emit as unmatched during finalization.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/executor/operator.cpp` around lines 608 - 611, The loop that builds the hash table currently hashes right join keys by calling right_key_->evaluate(...).to_string(), which treats SQL NULL as a normal value; change the logic in the build phase (the loop using right_->next and right_key_->evaluate) to skip inserting any tuple whose evaluated join key is NULL (use the Value is-null accessor, e.g., common::Value::is_null()/isNull()). Also add tracking for NULL-keyed right tuples (e.g., a separate container or counter of skipped right tuples stored alongside BuildTuple info) so RIGHT and FULL join finalization can still emit those right-side rows as unmatched; make symmetric changes to the probe logic that evaluates left keys (the probe path using key evaluation and hash_table_ lookup) to treat NULL keys as non-matchable (i.e., do not probe/hash when key is NULL). Ensure references to hash_table_, BuildTuple, right_key_->evaluate, right_->next and the probe key-evaluation code are updated accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: da3b95b0-2d93-4ea2-997c-49e5584686f3
📒 Files selected for processing (3)
include/executor/operator.hpp, src/executor/operator.cpp, tests/cloudSQL_tests.cpp
This PR introduces several improvements to the execution engine and network layer:
- `LEFT JOIN` logic in `HashJoinOperator`, ensuring non-matching left tuples are preserved with NULL values for right-side columns.
- `ColumnExpr` updated to correctly resolve columns qualified with table names.
- `run_test.sh` added, along with initial test cases for joins, aggregates, and expressions.
- A `JoinType` enum to unify join representation across parser and executor.