Cypher: DISTINCT applied after ORDER BY + LIMIT instead of before (execution order bug) #237

@maksodf

Environment

  • cbm version: 0.6.0
  • Platform: macOS (Darwin arm64)
  • Binary: /Users/*/.local/bin/codebase-memory-mcp (official install)

Bug

RETURN DISTINCT ... ORDER BY ... LIMIT N returns far fewer rows than it should because DISTINCT is applied AFTER ORDER BY + LIMIT instead of before. In standard Cypher (openCypher spec), DISTINCT is part of the RETURN projection and must be evaluated BEFORE ORDER BY → SKIP → LIMIT.

Minimal reproducer

In a graph with ~72 distinct Swift files indexed, each containing ~30 symbol nodes:

MATCH (n) WHERE n.file_path ENDS WITH '.swift'
RETURN DISTINCT n.file_path AS path
ORDER BY path
LIMIT 100

Expected: up to 72 rows (all distinct Swift file paths).
Actual: only ~5 rows, all from the alphabetically earliest files. Sorting by file_path ASC and cutting to 100 rows keeps rows from just the first 5 files (each has >20 nodes), so DISTINCT applied to those 100 rows leaves only ~5 distinct paths.

Removing ORDER BY makes it work correctly:

MATCH (n) WHERE n.file_path ENDS WITH '.swift'
RETURN DISTINCT n.file_path AS path
LIMIT 100

→ returns 72 distinct rows as expected.
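
A scaled-down sketch of the same effect, in case it helps when reviewing. It assumes a toy result set of 4 files with 3 nodes each and LIMIT 5. The helpers `apply_order_by`, `apply_limit`, `apply_distinct`, and `fill_rows` are hypothetical stand-ins written for this illustration, not the real `rb_*` functions from cypher.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the result-buffer passes (NOT the real rb_*
 * functions). Each pass works in place and returns the new row count. */

static int cmp_str(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* ORDER BY path ASC */
static size_t apply_order_by(const char **rows, size_t n) {
    qsort(rows, n, sizeof rows[0], cmp_str);
    return n;
}

/* LIMIT: keep at most `limit` rows from the front */
static size_t apply_limit(const char **rows, size_t n, size_t limit) {
    (void)rows;
    return n < limit ? n : limit;
}

/* DISTINCT: keep the first occurrence of each value (simple O(n^2) scan,
 * does not depend on the rows being sorted) */
static size_t apply_distinct(const char **rows, size_t n) {
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < out && strcmp(rows[j], rows[i]) != 0) j++;
        if (j == out) rows[out++] = rows[i];
    }
    return out;
}

/* Toy data: 4 files x 3 nodes each = 12 rows, deliberately unsorted */
static size_t fill_rows(const char **rows) {
    static const char *files[] = {"d.swift", "b.swift", "a.swift", "c.swift"};
    size_t n = 0;
    for (int node = 0; node < 3; node++)
        for (int f = 0; f < 4; f++)
            rows[n++] = files[f];
    return n;
}

/* Order described in this report: ORDER BY, then LIMIT, then DISTINCT */
size_t run_buggy_order(size_t limit) {
    const char *rows[12];
    size_t n = fill_rows(rows);
    n = apply_order_by(rows, n);
    n = apply_limit(rows, n, limit);
    return apply_distinct(rows, n);
}

/* Order expected from the spec: DISTINCT, then ORDER BY, then LIMIT */
size_t run_spec_order(size_t limit) {
    const char *rows[12];
    size_t n = fill_rows(rows);
    n = apply_distinct(rows, n);
    n = apply_order_by(rows, n);
    return apply_limit(rows, n, limit);
}
```

With LIMIT 5, `run_buggy_order(5)` sorts the 12 rows, keeps a, a, a, b, b, and dedupes to 2 rows, while `run_spec_order(5)` dedupes first and returns all 4 distinct paths.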

Root cause

src/cypher/cypher.c lines 3266–3270:

rb_apply_order_by(rb, ret);
rb_apply_skip_limit(rb, ret->skip, ret->limit > 0 ? ret->limit : max_rows);
if (ret->distinct) {
    rb_apply_distinct(rb);
}

rb_apply_distinct at line 2433 is self-contained: it reads rb->rows, rb->col_count, and rb->row_count and dedupes in place, with no dependency on row order. It can therefore run before rb_apply_order_by without other changes.

Proposed fix (reorder only, no new logic)

if (ret->distinct) {
    rb_apply_distinct(rb);
}
rb_apply_order_by(rb, ret);
rb_apply_skip_limit(rb, ret->skip, ret->limit > 0 ? ret->limit : max_rows);

Happy to open a PR if useful.
