Skip to content
This repository was archived by the owner on Apr 30, 2026. It is now read-only.

Follow-ups to #8: variadic and/or in WHERE, sym_atom_cmp invariant, INT_MIN→null DAG, glob escape docs#9

Merged
singaraiona merged 6 commits intoRayforceDB:masterfrom
ser-vasilich:post-pr8-followups
Apr 29, 2026
Merged

Follow-ups to #8: variadic and/or in WHERE, sym_atom_cmp invariant, INT_MIN→null DAG, glob escape docs#9
singaraiona merged 6 commits intoRayforceDB:masterfrom
ser-vasilich:post-pr8-followups

Conversation

@ser-vasilich
Copy link
Copy Markdown
Contributor

Six follow-up commits on top of #8 (now merged) — picked up issues that surfaced after that PR landed plus the two non-blockers from @singaraiona's review I'd flagged for follow-up.

Commits

  • 399b3cc fix(arith): neg/abs INT_MIN → typed null (k/q convention) — the original PR fix(8 bugs) + test(coverage +1.3pp): probes, type-preservation, broad workout #8 fix used unsigned-wrap negation, but (abs INT16_MIN) returning a negative value is mathematically wrong. Restored k/q convention: INT_MIN → typed null of the same width, detectable via nil?. Atom path only.

  • aaa4e61 fix(expr): DAG path neg/abs INT_MIN → typed null (i64) — extends the same contract to the DAG-fused (select x: (neg col) from: t) path via a single-threaded post-pass that runs after ray_pool_dispatch joins. mark_i64_overflow_as_null zeroes the data slot (preserving the "null position is 0" invariant) and sets the null bit through ray_vec_set_null (which drops any attached accelerator index inline).

  • 103132a fix(cmp): sym_atom_cmp asserts intern invariant instead of fake fallback — the raw-id fallback I added in fix(8 bugs) + test(coverage +1.3pp): probes, type-preservation, broad workout #8 to address the NULL ray_sym_str case was logically inconsistent: with no backing string we have no defensible lexicographic answer, and returning ±1 by interned id fabricates a non-lex total order — same shape of lie as the return 0 it replaced. Replaced with assert matching v1's "trust the invariant, crash if it breaks" stance.

  • 8f5e04f docs(glob): document literal-metachar escape via one-element classglob.h documented the lenient unterminated-[ policy but not how to intentionally match */?/[/]/-. There is no backslash escape; the idiom is one-element class wrap ([*], [?], [[], []], [-]). Documented in glob.h and pinned in like.rfl.

  • f0afb20 fix(query): variadic and/or in WHERE — balanced auto-fold to DAG binary — turned out to be a correctness regression, not the silent perf cliff I initially thought: compile_expr_dag errored with "WHERE predicate not supported by DAG compiler" rather than falling back to eval. Fixed by folding variadic AND/OR into a balanced binary tree before lowering — same fused-expr path as hand-nested binary, no extra column allocations.

  • 868a1ff fix(cmp): restore v1 short-circuit semantics for and/or — discovered this was a v1 regression, not just absent feature: v1 registered both with FN_SPECIAL_FORM, PR fix(8 bugs) + test(coverage +1.3pp): probes, type-preservation, broad workout #8 dropped it. Restored. Short-circuit triggers only when the accumulator is a scalar with the determining truth value, so vector broadcast semantics ((or [t f t] true) → [t t t]) are preserved.

Tests

974 of 975 passing under ASan + UBSan (1 skipped, 0 failed).

ser-vasilich and others added 6 commits April 29, 2026 12:23
Previous implementation used unsigned-wrap negation, so:

  (abs -32768h)           -> -32768h   ❌ negative result from abs!
  (abs (- (neg 32767h) 1h)) -> -32768h ❌ same path
  (neg INT_MIN)           -> INT_MIN   ❌ wrap, silent

Math: -INT_MIN doesn't fit in the same signed width.  Per k/q
convention, surface the overflow as a typed null of the same width:

  (neg INT16_MIN) -> 0Nh
  (neg INT32_MIN) -> 0Ni
  (neg INT64_MIN) -> 0Nl
  (abs INT_MIN)   -> typed null   (same per width)

Properties:
  ✓ Type preserved (no widening to i64)
  ✓ No UB (overflow detected before negation)
  ✓ Detectable via `nil?`
  ✓ Consistent with existing null propagation: (neg 0Ni) → 0Ni

The check is wrapped in RAY_UNLIKELY since INT_MIN is extremely rare
in real data; modern branch predictors will treat it as a
near-zero-cost test on the hot path.

DAG inline loop in expr.c:811-812 still uses unsigned wrap (separate
hot path for OP_NEG/OP_ABS over i64 vectors in select-with-derived-
columns).  Inconsistency between atom and DAG paths to be addressed
in a follow-up.

Tests in test/rfl/arith/{neg,abs}.rfl now pin:
  - (neg/abs INT16_MIN/INT32_MIN/INT64_MIN) → typed null of same type
  - adjacent values (INT_MAX, -INT_MAX) still negate normally

Addresses Anton's PR RayforceDB#8 review blocker 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The fused expression path (`(select x: (neg col) from: t)` and the
fallback `exec_elementwise_unary`) used unsigned-wrap negation and let
INT64_MIN survive in the result vector with no null bit set — same
silent-overflow shape as the atom path that a9e2223 fixed, but in the
hot vector path.

Add a single-threaded post-pass `mark_i64_overflow_as_null` that, for
last-instruction OP_NEG/OP_ABS over i64, scans the result, zeroes any
INT64_MIN slot (preserving the "data at null position is zero"
invariant) and sets the null bit through `ray_vec_set_null` (which also
drops any attached accelerator index, per existing semantics).

Wired into `expr_eval_full`, `expr_eval_full_parted`, and the
`exec_elementwise_unary` fallback.  No-op for f64 / non-NEG-ABS / inputs
without INT_MIN.  RAY_UNLIKELY hint lets the compiler keep the scan
branch-predicted as cold.

Tests added in test/rfl/arith/{neg,abs}.rfl exercise both the DAG-fused
select-from path and the eval-path vec via concat-built INT_MIN columns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous fallback (82fae9d) "compare raw i64 ids when ray_sym_str
returns NULL" was logically inconsistent: with no backing string we
have no defensible answer for *lexicographic* ordering — returning
±1 by raw id fabricates a non-lex total order, the same shape of lie
as the original `return 0` that Anton flagged, just lying about
"unequal" instead of "equal".

NULL from ray_sym_str only fires under corruption (uninitialised intern
table, out-of-range id, evicted slot) — none of these are reachable via
the normal RFL API.  v1 (core/sort.c::compare_symbols) trusts the same
invariant and would SIGSEGV on strcmp(NULL,...) if it broke.

Replace the fake fallback with a documented assert.  Honours Anton's
"never silently say 'equal'" directive without inventing a fake
ordering: in debug we abort with a clear diagnostic; in release the
assert is elided and we fall through to ray_str_cmp(NULL,...) which
matches v1's UB-crash semantics — both modes loud, neither silent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The lenient-parsing note in glob.h covered "what happens with a broken
pattern" but not the closely related question of how to *intentionally*
match a metacharacter.  There is no backslash escape — the only way to
match a literal `*`/`?`/`[`/`]`/`-` is to wrap it in a one-element
character class (`[*]`, `[?]`, `[[]`, `[]]`, `[-]`).  This is the
standard fnmatch idiom but not obvious to a Rayforce user coming from
SQL `LIKE` (which has no metachars at all) or from a regex background
(which assumes `\` works).

Document the idiom in glob.h alongside the supported metacharacters,
and pin the behaviour in test/rfl/strop/like.rfl with five concrete
patterns.  No code change to the matcher itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR RayforceDB#8 dropped `register_binary_op("and", ..., OP_AND)` and switched
`and`/`or` to eval-time variadic via `register_vary`.  `compile_expr_dag`
still recognised the 2-arg form via `resolve_binary_dag` (binary
shortcut path) but bailed out with NULL on n>=4 — and the WHERE-clause
caller upgraded that to "WHERE predicate not supported by DAG compiler".
Anton flagged this as a silent perf cliff; in practice it is a
correctness regression (queries that worked in v1 / pre-PR fail
outright).

Fold variadic AND/OR into a balanced binary tree of OP_AND/OP_OR nodes
before lowering: `(and a b c d)` → `(and (and a b) (and c d))`,
depth log2(N).  The fused-expr executor evaluates this as a flat
sequence of binary AND/OR instructions sharing scratch registers, so
no extra column allocations vs the hand-nested form.

Iterative pairwise reduction inside `compile_expr_dag`; bail at >64
inputs (avoids unbounded stack array, no realistic query needs more).
Balanced shape (rather than left-fold) keeps the tree symmetric and
minimises dependency-chain depth, leaving room for a future
predicate-parallel executor without rewriting this pass.

Tests in dag_binary_ops.rfl pin: 3/4-arg AND in WHERE matches the
nested-binary form; 3/5-arg OR in WHERE; mixed `(and (or...) (or...))`;
fold semantics agrees with composing two filtered selects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR RayforceDB#8 changed `and`/`or` from `register_binary_op` (with OP_AND/OP_OR
opcodes) to `register_vary` — but registered them with `RAY_FN_NONE`,
losing the `RAY_FN_SPECIAL_FORM` flag that v1 carried.  Result: every
arg is eagerly evaluated even when the result is already determined
by a prior scalar — `(and false (heavy-fn))` runs `heavy-fn`,
`(or true (slow-query))` runs `slow-query`.  v1 short-circuited; v2
silently regressed.

Promote both to special form so the kernel receives unevaluated AST
nodes and calls `ray_eval` itself — same dispatch shape as `if`, `do`,
`let`, `select` etc.  Short-circuit triggers only when the running
accumulator is a *scalar* with the determining truth value: scalar
falsy for AND, scalar truthy for OR.  When the accumulator is a
vector we fall through to `ray_and_fn` / `ray_or_fn` for proper
element-wise broadcast (preserves `(or [t f t] true) → [t t t]` shape
semantics that the existing test asserts).

Tests in cmp/and.rfl and cmp/or.rfl pin: short-circuit prevents
evaluation of subsequent args (probe via `undefined-name`), and
short-circuit does *not* trigger when the result is undetermined.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@singaraiona singaraiona merged commit c32d379 into RayforceDB:master Apr 29, 2026
4 checks passed
@ser-vasilich ser-vasilich deleted the post-pr8-followups branch April 29, 2026 10:08
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants