Environment
- cbm version: 0.6.0
- Platform: macOS (Darwin arm64)
- Binary: /Users/*/.local/bin/codebase-memory-mcp (official install)
Bug
RETURN DISTINCT ... ORDER BY ... LIMIT N returns far fewer rows than it should because DISTINCT is applied AFTER ORDER BY + LIMIT instead of before. In standard Cypher (openCypher spec), DISTINCT is part of the RETURN projection and must be evaluated BEFORE ORDER BY → SKIP → LIMIT.
Minimal reproducer
In a graph with ~72 distinct Swift files indexed, each containing ~30 symbol nodes:
MATCH (n) WHERE n.file_path ENDS WITH '.swift'
RETURN DISTINCT n.file_path AS path
ORDER BY path
LIMIT 100
Expected: up to 72 rows (all distinct Swift file paths).
Actual: only ~5 rows, covering just the alphabetically earliest files. ORDER BY + LIMIT run first and keep the 100 node rows with the smallest file_path values. Those rows come entirely from the first 5 files (each has >20 nodes), so DISTINCT applied afterwards yields only ~5 distinct paths.
Removing ORDER BY makes it work correctly:
MATCH (n) WHERE n.file_path ENDS WITH '.swift'
RETURN DISTINCT n.file_path AS path
LIMIT 100
→ returns 72 distinct rows as expected.
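The effect of the evaluation order can be reproduced outside cbm with a small toy model. The sketch below is not the project's code: toy_rb, toy_fill, toy_order_by, toy_limit, toy_distinct, and the two toy_run_* functions are hypothetical names invented for illustration, and the data is scaled down to 4 file paths with 3 node rows each.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy result buffer: one projected column, fixed capacity. */
typedef struct {
    const char *rows[16];
    int row_count;
} toy_rb;

/* 4 distinct paths x 3 node rows each = 12 rows, interleaved. */
static void toy_fill(toy_rb *rb) {
    static const char *paths[4] = {"a.swift", "b.swift", "c.swift", "d.swift"};
    rb->row_count = 0;
    for (int node = 0; node < 3; node++)
        for (int f = 0; f < 4; f++)
            rb->rows[rb->row_count++] = paths[f];
}

static int cmp_str(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* ORDER BY path ASC */
static void toy_order_by(toy_rb *rb) {
    qsort(rb->rows, (size_t)rb->row_count, sizeof rb->rows[0], cmp_str);
}

/* LIMIT n: truncate the buffer to at most n rows. */
static void toy_limit(toy_rb *rb, int limit) {
    assert(limit >= 0);
    if (rb->row_count > limit) rb->row_count = limit;
}

/* DISTINCT: in-place dedupe keeping the first occurrence (O(n^2), toy only). */
static void toy_distinct(toy_rb *rb) {
    int out = 0;
    for (int i = 0; i < rb->row_count; i++) {
        int seen = 0;
        for (int j = 0; j < out; j++) {
            if (strcmp(rb->rows[i], rb->rows[j]) == 0) { seen = 1; break; }
        }
        if (!seen) rb->rows[out++] = rb->rows[i];
    }
    rb->row_count = out;
}

/* Order reported in this issue: ORDER BY, LIMIT, then DISTINCT. */
int toy_run_limit_first(int limit) {
    toy_rb rb;
    toy_fill(&rb);
    toy_order_by(&rb);
    toy_limit(&rb, limit);
    toy_distinct(&rb);
    return rb.row_count;
}

/* Order expected by the spec: DISTINCT, ORDER BY, then LIMIT. */
int toy_run_distinct_first(int limit) {
    toy_rb rb;
    toy_fill(&rb);
    toy_distinct(&rb);
    toy_order_by(&rb);
    toy_limit(&rb, limit);
    return rb.row_count;
}
```

With a limit of 5, toy_run_limit_first(5) returns 2 (the truncated rows are a, a, a, b, b) while toy_run_distinct_first(5) returns 4, which mirrors the ~5 versus 72 rows seen in the real graph.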
Root cause
src/cypher/cypher.c lines 3266–3270:
rb_apply_order_by(rb, ret);
rb_apply_skip_limit(rb, ret->skip, ret->limit > 0 ? ret->limit : max_rows);
if (ret->distinct) {
    rb_apply_distinct(rb);
}
rb_apply_distinct (line 2433) is self-contained: it reads rb->rows, rb->col_count, and rb->row_count and dedupes in place, with no dependency on row ordering, so it can run before rb_apply_order_by.
Proposed fix (one-line reorder)
if (ret->distinct) {
    rb_apply_distinct(rb);
}
rb_apply_order_by(rb, ret);
rb_apply_skip_limit(rb, ret->skip, ret->limit > 0 ? ret->limit : max_rows);
Happy to open a PR if useful.