Environment
- cbm version: 0.6.0
- Platform: macOS (Darwin arm64)
Bug
WITH DISTINCT ... RETURN count(x) returns the non-distinct count: the DISTINCT keyword in a WITH clause is silently ignored. This breaks the standard openCypher pattern for counting distinct values, and there is no fallback because count(DISTINCT ...) isn't supported either (separate issue).
Minimal reproducer
With a graph containing multiple nodes all sharing the same file_path value (e.g. 14 symbol nodes in JarvisApp.swift):
MATCH (n) WHERE n.file_path CONTAINS 'JarvisApp.swift'
WITH DISTINCT n.file_path AS path
RETURN count(path) AS distinct_count
Expected: distinct_count = 1 (only one unique file_path value).
Actual: distinct_count = 14. DISTINCT was ignored, so all 14 rows passed through and count(path) counted every one of them.
Root cause
In src/cypher/cypher.c, execute_with_clause (line 2758) calls with_sort_skip_limit at line 2783, which sorts, skips, and limits the bindings, but it never applies DISTINCT. The wc->distinct flag on the WITH clause (a cbm_return_clause_t*) is never honored.
Proposed fix
Add a bindings_apply_distinct helper that deduplicates a binding array on the WITH-projected fields, and call it from execute_with_clause before with_sort_skip_limit. The helper can follow the same hash-or-comparison pattern as rb_apply_distinct at line 2433, adapted for bindings instead of result rows.
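A rough sketch of what such a helper could look like, to make the proposal concrete. This is not the actual cbm code: binding_t, MAX_FIELDS, and bindings_equal are hypothetical stand-ins, and the sketch assumes the WITH-projected values are already available as strings. It uses the simple comparison variant (O(n^2)); a hash-based version would be preferable for large binding sets.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a binding row: the values projected by WITH,
 * rendered as strings. The real binding type in cypher.c will differ. */
#define MAX_FIELDS 8
typedef struct {
    const char *fields[MAX_FIELDS];
    size_t field_count;
} binding_t;

/* Two bindings are duplicates when every projected field matches.
 * NULL fields are treated as equal to each other for deduplication. */
static bool bindings_equal(const binding_t *a, const binding_t *b) {
    if (a->field_count != b->field_count) return false;
    for (size_t i = 0; i < a->field_count; i++) {
        const char *x = a->fields[i];
        const char *y = b->fields[i];
        if (x == NULL || y == NULL) {
            if (x != y) return false;
        } else if (strcmp(x, y) != 0) {
            return false;
        }
    }
    return true;
}

/* Compacts rows in place, keeping the first occurrence of each distinct
 * binding in its original order. Returns the number of rows kept. */
static size_t bindings_apply_distinct(binding_t *rows, size_t count) {
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        bool seen = false;
        for (size_t j = 0; j < kept; j++) {
            if (bindings_equal(&rows[i], &rows[j])) {
                seen = true;
                break;
            }
        }
        if (!seen) rows[kept++] = rows[i];
    }
    return kept;
}
```

In execute_with_clause, the call would presumably be guarded by wc->distinct and placed before with_sort_skip_limit, so that SKIP and LIMIT operate on the deduplicated rows.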
Happy to open a PR if useful.