Cypher: WITH DISTINCT is silently ignored (no deduplication) #238

@maksodf

Description

Environment

  • cbm version: 0.6.0
  • Platform: macOS (Darwin arm64)

Bug

WITH DISTINCT ... RETURN count(x) returns the non-distinct count: the DISTINCT keyword in a WITH clause is silently ignored. This breaks the standard openCypher pattern for counting distinct values, which matters here because count(DISTINCT ...) is also unsupported (separate issue).

Minimal reproducer

With a graph containing multiple nodes all sharing the same file_path value (e.g. 14 symbol nodes in JarvisApp.swift):

MATCH (n) WHERE n.file_path CONTAINS 'JarvisApp.swift'
WITH DISTINCT n.file_path AS path
RETURN count(path) AS distinct_count

Expected: distinct_count = 1 (only one unique file_path value).
Actual: distinct_count = 14. DISTINCT was ignored, so all 14 rows passed through and count(path) counted every one of them.

Root cause

src/cypher/cypher.c, execute_with_clause at line 2758. It calls with_sort_skip_limit at line 2783 (which does sort + skip + limit on bindings) but never applies DISTINCT. The wc->distinct flag on the cbm_return_clause_t* WITH clause is never honored.

Proposed fix

Add a bindings_apply_distinct helper that deduplicates a binding array based on the WITH-projected fields (using the same hash-or-comparison pattern as rb_apply_distinct at line 2433, adapted for bindings instead of result rows), and call it from execute_with_clause before with_sort_skip_limit.
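
To make the idea concrete, here is a minimal standalone sketch of the deduplication step. Everything in it is an assumption for illustration: sketch_binding_t, SKETCH_MAX_FIELDS and sketch_bindings_apply_distinct are hypothetical stand-ins, not the real binding structures or helpers in cypher.c, and projected values are modelled as plain strings. It uses a simple pairwise comparison that keeps the first occurrence of each row; a real implementation would probably want to hash the projected fields for large binding sets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one row of WITH-projected values.
 * The real binding type in cypher.c is not shown in this issue. */
#define SKETCH_MAX_FIELDS 8

typedef struct {
    const char *fields[SKETCH_MAX_FIELDS]; /* projected values; NULL models a null */
    size_t field_count;                    /* number of projected fields in use */
} sketch_binding_t;

/* Two fields match if both are NULL or both hold the same string.
 * Treating null as equal to null follows DISTINCT grouping semantics. */
static int sketch_field_equal(const char *a, const char *b) {
    if (a == NULL || b == NULL) {
        return a == b;
    }
    return strcmp(a, b) == 0;
}

/* Two rows match if every projected field matches. */
static int sketch_binding_equal(const sketch_binding_t *a,
                                const sketch_binding_t *b) {
    if (a->field_count != b->field_count) {
        return 0;
    }
    for (size_t i = 0; i < a->field_count; i++) {
        if (!sketch_field_equal(a->fields[i], b->fields[i])) {
            return 0;
        }
    }
    return 1;
}

/* Deduplicate rows in place, keeping the first occurrence of each distinct
 * row and preserving relative order. Returns the new row count.
 * O(n^2) comparisons: fine for a sketch, slow for large inputs. */
size_t sketch_bindings_apply_distinct(sketch_binding_t *rows, size_t n) {
    assert(rows != NULL || n == 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < kept; j++) {
            if (sketch_binding_equal(&rows[i], &rows[j])) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            rows[kept++] = rows[i];
        }
    }
    return kept;
}
```

Applied to the reproducer, 14 rows that all carry the same path value would collapse to a single row before sorting, skipping and limiting, so a later count(path) would see one row. Running the dedup before with_sort_skip_limit matters because SKIP and LIMIT should operate on the distinct rows, not on the raw ones.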

Happy to open a PR if useful.
