Cypher: count(DISTINCT x) not supported (parse error 'expected token type 85, got 9') #239

@maksodf

Environment

  • cbm version: 0.6.0
  • Platform: macOS (Darwin arm64)

Bug

count(DISTINCT expr), a standard openCypher aggregation, fails with a parse error instead of returning the number of distinct values. It is the usual way to count the unique values of a property.

Minimal reproducer

MATCH (n) WHERE n.file_path CONTAINS '.swift'
RETURN count(DISTINCT n.file_path) AS distinct_swift_files

Expected: an integer (count of distinct file_path values).
Actual: parse error — expected token type 85, got 9 at pos 59.

Root cause

src/cypher/cypher.c, parse_aggregate_item at line 1109:

static int parse_aggregate_item(parser_t *p, cbm_return_item_t *item) {
    cbm_token_type_t ft = peek(p)->type;
    advance(p);
    expect(p, TOK_LPAREN);
    if (match(p, TOK_STAR)) {
        item->variable = heap_strdup("*");
    } else {
        if (parse_var_dot_prop(p, item) < 0) {
            return CBM_NOT_FOUND;
        }
    }
    expect(p, TOK_RPAREN);
    item->func = heap_strdup(agg_func_name(ft));
    return 0;
}

After expect(p, TOK_LPAREN), the parser immediately calls parse_var_dot_prop. It never checks for the DISTINCT keyword. TOK_DISTINCT exists in the enum already, so the token is being produced — it's just not accepted inside aggregate function calls.

Proposed fix

Add a bool distinct_in_agg field to cbm_return_item_t, have parse_aggregate_item check for TOK_DISTINCT after TOK_LPAREN, and have the aggregation executor dedupe values when the flag is set. Medium-size change (header + parser + executor).
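
A minimal sketch of the shape this could take, not the actual cbm code. Everything prefixed with mini_ is a hypothetical stand-in written for illustration (the real parser_t, cbm_return_item_t, and executor are not reproduced here); only the token names and the distinct_in_agg field name come from this issue. It shows the two pieces: accepting an optional DISTINCT right after the opening parenthesis, and deduping values when the flag is set. The dedupe is a naive O(n^2) string comparison, chosen for brevity; the real executor would likely want a hash set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token enum: only TOK_LPAREN, TOK_RPAREN, TOK_STAR and
 * TOK_DISTINCT are names taken from the issue. */
typedef enum {
    TOK_IDENT, TOK_LPAREN, TOK_RPAREN, TOK_STAR, TOK_DISTINCT, TOK_EOF
} mini_tok_t;

/* Hypothetical parser state: a token array terminated by TOK_EOF. */
typedef struct {
    const mini_tok_t *toks;
    size_t pos;
} mini_parser_t;

/* Hypothetical return item carrying the proposed flag. */
typedef struct {
    bool distinct_in_agg; /* proposed new field */
    bool star;            /* true for count(*) */
} mini_item_t;

/* Consume the current token if it has type t. */
static bool mini_match(mini_parser_t *p, mini_tok_t t) {
    if (p->toks[p->pos] == t) {
        p->pos++;
        return true;
    }
    return false;
}

/* Parse "( [DISTINCT] ( * | ident ) )". The aggregate function name is
 * assumed to be consumed already. Whether DISTINCT combined with * should
 * be rejected is left open here. Returns 0 on success, -1 on error. */
static int mini_parse_aggregate_args(mini_parser_t *p, mini_item_t *item) {
    item->distinct_in_agg = false;
    item->star = false;
    if (!mini_match(p, TOK_LPAREN)) return -1;
    if (mini_match(p, TOK_DISTINCT)) item->distinct_in_agg = true; /* the new check */
    if (mini_match(p, TOK_STAR)) {
        item->star = true;
    } else if (!mini_match(p, TOK_IDENT)) {
        return -1;
    }
    return mini_match(p, TOK_RPAREN) ? 0 : -1;
}

/* Count non-NULL values; when distinct is set, count each value once. */
static size_t mini_count(const char *const *values, size_t n, bool distinct) {
    assert(values != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (values[i] == NULL) continue; /* NULLs are not counted */
        bool seen = false;
        /* Only scan earlier values when deduping is requested. */
        for (size_t j = 0; distinct && j < i && !seen; j++) {
            seen = values[j] != NULL && strcmp(values[i], values[j]) == 0;
        }
        if (!seen) count++;
    }
    return count;
}
```

With the token sequence for "(DISTINCT n.file_path)" the parse succeeds and sets distinct_in_agg; the executor side then passes that flag to the counting routine so duplicate file_path values are counted once.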

Happy to open a PR if useful.
