Add MERGE ON CREATE SET / ON MATCH SET support (#1619)#2347
Add MERGE ON CREATE SET / ON MATCH SET support (#1619)#2347gregfelice wants to merge 4 commits intoapache:masterfrom
Conversation
Implements the openCypher-standard ON CREATE SET and ON MATCH SET
clauses for the MERGE statement. This allows conditional property
updates depending on whether MERGE created a new path or matched
an existing one:
MERGE (n:Person {name: 'Alice'})
ON CREATE SET n.created = timestamp()
ON MATCH SET n.updated = timestamp()
Implementation spans parser, planner, and executor:
- Grammar: new merge_actions_opt/merge_actions/merge_action rules
in cypher_gram.y, with ON keyword added to cypher_kwlist.h
- Nodes: on_match/on_create lists on cypher_merge, corresponding
on_match_set_info/on_create_set_info on cypher_merge_information,
and prop_expr on cypher_update_item (all serialized through
copy/out/read funcs)
- Transform: cypher_clause.c transforms ON SET items and stores
prop_expr for direct expression evaluation
- Executor: cypher_set.c extracts apply_update_list() from
process_update_list(); cypher_merge.c calls it at all merge
decision points (simple merge, terminal, non-terminal with
eager buffering, and first-clause-with-followers paths)
Key design choice: prop_expr stores the Expr* directly in
cypher_update_item rather than using prop_position into the scan
tuple. The planner strips target list entries for SET expressions
that CustomScan doesn't need, making prop_position references
dangling. By storing the expression directly (only for MERGE ON
SET items), we evaluate it with ExecInitExpr/ExecEvalExpr
independent of the scan tuple layout.
Includes regression tests covering: basic ON CREATE SET, basic
ON MATCH SET, combined ON CREATE + ON MATCH, multiple SET items,
expression evaluation, interaction with WITH clause, and edge
property updates.
All 31 regression tests pass.
|
Friendly ping — this PR adds MERGE ON CREATE SET / ON MATCH SET support (issue #1619), one of the most requested Cypher features for AGE. This is critical for users migrating from Kuzu (recently archived) and Neo4j. The implementation adds grammar rules, executor support, and full regression test coverage. Would really appreciate a review when someone has bandwidth. Thanks! |
There was a problem hiding this comment.
Pull request overview
This PR implements the ON CREATE SET and ON MATCH SET sub-clauses for MERGE statements (openCypher-standard feature, issue #1619). These allow conditional property updates depending on whether MERGE created a new path or matched an existing one.
Changes:
- New grammar rules (
ONkeyword,merge_actions_opt/actions/actionrules), new node fields (on_match/on_createoncypher_merge;on_match_set_info/on_create_set_infooncypher_merge_information;prop_exproncypher_update_item) with full serialization support - Extracted shared
apply_update_list()from theSETexecutor and wired it into all four MERGE execution paths (simple merge, terminal non-first clause, non-terminal eager-buffering path, first-clause-with-followers) - Regression tests covering basic ON CREATE SET, ON MATCH SET, combined clauses, multiple items, reverse ordering, duplicate clause error detection, and edge property updates
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/include/parser/cypher_kwlist.h |
Adds on as a RESERVED_KEYWORD |
src/backend/parser/cypher_gram.y |
Adds ON token and merge_actions_opt/merge_actions/merge_action grammar rules |
src/include/nodes/cypher_nodes.h |
Adds on_match/on_create to cypher_merge; on_match_set_info/on_create_set_info to cypher_merge_information; prop_expr to cypher_update_item |
src/backend/nodes/cypher_copyfuncs.c |
Copies new prop_expr, on_match_set_info, on_create_set_info fields |
src/backend/nodes/cypher_outfuncs.c |
Serializes new fields; fixes wrong comment (cypher_delete → cypher_merge) |
src/backend/nodes/cypher_readfuncs.c |
Deserializes new prop_expr, on_match_set_info, on_create_set_info fields |
src/backend/parser/cypher_clause.c |
Transforms ON MATCH/CREATE SET item lists and stores prop_expr for direct evaluation |
src/include/executor/cypher_utils.h |
Declares apply_update_list(); adds on_match_set_info/on_create_set_info to scan state |
src/backend/executor/cypher_set.c |
Extracts apply_update_list() from process_update_list(); adds prop_expr-based direct evaluation path |
src/backend/executor/cypher_merge.c |
Calls apply_update_list() at all four merge decision points; reorders mark_tts_isnull/ExecStoreVirtualTuple for correctness |
regress/sql/cypher_merge.sql |
Adds regression tests for ON CREATE SET, ON MATCH SET, errors, and cleanup |
regress/expected/cypher_merge.out |
Expected output for new test cases |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@gregfelice Please see Copilot's suggestions. |
|
Will do
…On Mon, Mar 2, 2026 at 1:53 PM John Gemignani ***@***.***> wrote:
*jrgemignani* left a comment (apache/age#2347)
<#2347 (comment)>
@gregfelice <https://github.com/gregfelice> Please see Copilot's
suggestions.
—
Reply to this email directly, view it on GitHub
<#2347 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABBXGH2CKSBODOEPJIOOND4OXKDLAVCNFSM6AAAAACWCZROSSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTSOBWGIZTKNJTHA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
Greg Felice
415.343.5227
***@***.***
linkedin.com/in/gregfelice
calendly.com/gregfelice/30min
|
…sor test - Move ExecInitExpr for ON CREATE/MATCH SET items from per-row execution in apply_update_list() to plan initialization in begin_cypher_merge(). Follows the established pattern used by cypher_target_node (id_expr_state, prop_expr_state). - Add prop_expr_state field to cypher_update_item with serialization support in outfuncs/readfuncs/copyfuncs. - apply_update_list() uses pre-initialized state when available, falls back to per-row init for plain SET callers. - Fix misleading comment: "ON MATCH SET" → "ON CREATE SET" for Case 1 first-run test. - Add Case 1 second-run test that triggers ON MATCH SET with a predecessor clause (MATCH ... MERGE ... ON MATCH SET).
|
Addressed all three Copilot suggestions:
All regression tests pass ( |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| LIMIT | ||
| MATCH MERGE | ||
| NOT NULL_P | ||
| OPERATOR OPTIONAL OR ORDER | ||
| ON OPERATOR OPTIONAL OR ORDER |
There was a problem hiding this comment.
The ON token is added to the %token list here (making it a lexical keyword), but it is NOT added to the safe_keywords production rule later in the file (around lines 2431-2477). All other keywords added as tokens in this file are also listed in safe_keywords so they can be used as identifiers, property names, or labels via the schema_name → reserved_keyword → safe_keywords path.
By omitting ON from safe_keywords, property keys or node labels named on (e.g., n.on, MATCH (n:on)) will now fail to parse with a syntax error. This is a breaking change for any graph using on as a property name or label. Adding | ON { $$ = KEYWORD_STRDUP($1); } to the safe_keywords rule would fix this.
| ON MATCH SET n.b = 2 | ||
| RETURN n | ||
| $$) AS (n agtype); | ||
|
|
There was a problem hiding this comment.
There is no test covering the interaction of ON CREATE SET / ON MATCH SET with chained (non-terminal) MERGE statements, which exercises the eager-buffering code path added in PR #2344. The non-terminal MERGE path (lines 664-750 in cypher_merge.c) has ON CREATE SET and ON MATCH SET logic, but the tests only cover Case 1 with a MATCH predecessor (not a MERGE-then-MERGE chain). A test like:
MERGE (a:A {name: 'X'}) ON CREATE SET a.new = true
MERGE (b:B {name: 'Y'}) ON CREATE SET b.new = true
RETURN a, b
or with a chained terminal MERGE following a non-terminal MERGE with ON SET would validate that the eager buffering path handles ON SET correctly.
| -- Chained MERGE with ON CREATE SET in non-terminal and terminal clauses | |
| SELECT * FROM cypher('merge_actions', $$ | |
| MERGE (a:Person {name: 'ChainCreateA'}) | |
| ON CREATE SET a.new = true | |
| MERGE (b:Person {name: 'ChainCreateB'}) | |
| ON CREATE SET b.new = true | |
| RETURN a.name, a.new, b.name, b.new | |
| $$) AS (a_name agtype, a_new agtype, b_name agtype, b_new agtype); | |
| -- Chained MERGE with non-terminal ON CREATE SET and terminal ON MATCH SET | |
| -- Setup an existing node to be matched by the second MERGE | |
| SELECT * FROM cypher('merge_actions', $$ | |
| CREATE (p:Person {name: 'ChainMatch', seen: false}) | |
| $$) AS (n agtype); | |
| SELECT * FROM cypher('merge_actions', $$ | |
| MERGE (a:Person {name: 'ChainCreateOnce'}) | |
| ON CREATE SET a.created = true | |
| MERGE (b:Person {name: 'ChainMatch'}) | |
| ON MATCH SET b.seen = true | |
| RETURN a.name, a.created, b.name, b.seen | |
| $$) AS (a_name agtype, a_created agtype, b_name agtype, b_seen agtype); |
|
@gregfelice Please see the last 2 Copilot reviews |
|
will do
…On Wed, Mar 4, 2026 at 10:47 AM John Gemignani ***@***.***> wrote:
*jrgemignani* left a comment (apache/age#2347)
<#2347 (comment)>
@gregfelice <https://github.com/gregfelice> Please see the last 2 Copilot
reviews
—
Reply to this email directly, view it on GitHub
<#2347 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABBXGDY6IMMIQUNDJYOOA34PBFXTAVCNFSM6AAAAACWCZROSSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTSOJYGM4TINJWGU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
Greg Felice
415.343.5227
***@***.***
linkedin.com/in/gregfelice
calendly.com/gregfelice/30min
|
1. Add ON to safe_keywords in cypher_gram.y so that property keys and labels named 'on' still work (e.g., n.on, MATCH (n:on)). All other keywords added as tokens are also in safe_keywords. 2. Add chained (non-terminal) MERGE regression tests exercising the eager-buffering code path with ON CREATE SET and ON MATCH SET. First run creates both nodes (ON CREATE SET fires), second run matches both (ON MATCH SET fires). All regression tests pass (cypher_merge: ok).
|
Addressed the remaining Copilot suggestions:
All regression tests pass ( |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 13 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| PG_KEYWORD("merge", MERGE, RESERVED_KEYWORD) | ||
| PG_KEYWORD("not", NOT, RESERVED_KEYWORD) | ||
| PG_KEYWORD("null", NULL_P, RESERVED_KEYWORD) | ||
| PG_KEYWORD("on", ON, RESERVED_KEYWORD) |
There was a problem hiding this comment.
The ON keyword is registered as RESERVED_KEYWORD in cypher_kwlist.h, which means it cannot be used as a plain identifier (variable name) in Cypher queries. While it is listed in safe_keywords in the grammar (so it can be used as a label, property key, or schema-qualified name), using ON as a variable name (e.g., MATCH (on)) would fail because var_name only accepts symbolic_name (i.e., IDENTIFIER), and reserved keywords are not identifiers.
If any existing user queries use on as a variable name, this would be a breaking change. Consider whether on needs to be RESERVED_KEYWORD or if it could use a different classification (though the current grammar likely requires it to be RESERVED_KEYWORD to be recognized as the start of ON MATCH SET / ON CREATE SET in the merge context). At minimum, the regression tests should cover the case of using on as an identifier/property key to confirm backward compatibility.
src/backend/parser/cypher_clause.c
Outdated
| foreach(lc2, merge_information->on_match_set_info->set_items) | ||
| { | ||
| cypher_update_item *item = lfirst(lc2); | ||
| TargetEntry *set_tle = get_tle_by_resno(query->targetList, | ||
| item->prop_position); | ||
| if (set_tle != NULL) | ||
| item->prop_expr = (Node *)set_tle->expr; | ||
| } |
There was a problem hiding this comment.
The prop_expr field is only populated when set_tle != NULL. The comment states that the planner may strip the prop_position target entry, which is precisely why prop_expr was introduced. However, if get_tle_by_resno returns NULL (which should not happen at parse-time since the TLE was just added, but is guarded defensively), prop_expr stays NULL, causing the executor to fall back to reading from scanTupleSlot->tts_values[prop_position - 1] — but this is exactly the entry that the planner might have stripped. This silent fallback path could cause incorrect behavior or a crash instead of a clear error.
An assertion or error should be added for the case where set_tle is NULL, since this would indicate an internal bug in the transform. The silent fallback to the potentially-stripped scan tuple position is dangerous.
src/backend/parser/cypher_clause.c
Outdated
| foreach(lc2, merge_information->on_create_set_info->set_items) | ||
| { | ||
| cypher_update_item *item = lfirst(lc2); | ||
| TargetEntry *set_tle = get_tle_by_resno(query->targetList, | ||
| item->prop_position); | ||
| if (set_tle != NULL) | ||
| item->prop_expr = (Node *)set_tle->expr; | ||
| } |
There was a problem hiding this comment.
The same issue described in the on_match_set_info loop (silent fallback to potentially-stripped prop_position) also applies here for on_create_set_info. An assertion or error should be added for the case when set_tle is NULL.
| if (css->on_create_set_info) | ||
| apply_update_list(&css->css, | ||
| css->on_create_set_info); |
There was a problem hiding this comment.
In the Case 1 (has predecessor, non-terminal) ON CREATE SET path (line 728-730), apply_update_list is called after process_path(css, prebuilt_path_array, true) with ecxt_scantuple still pointing to the lateral join's scantuple (set at the top of the loop). Unlike Case 2 (process_simple_merge, line 360) and Case 3 (line 988), ExecStoreVirtualTuple is not called before apply_update_list to ensure tts_nvalid is set correctly.
While this may work if the lateral join's projected slot already has tts_nvalid = natts after ExecProject, the entity values for the newly-created MERGE path entities (e.g., a fresh node b in MERGE (a)-[:R]->(b)) would be null in the lateral join's slot. However, apply_update_list reads the entity from scanTupleSlot->tts_values[entity_position - 1] to get its graphid for the UPDATE. If the entity slot is null (from the lateral join), this would fail at the tts_isnull check (line 453, if (scanTupleSlot->tts_isnull[update_item->entity_position - 1]) continue), silently skipping the ON CREATE SET update without applying it.
The fix would be to use a fresh scan tuple slot (like the sss->ss.ss_ScanTupleSlot approach used in Case 3), calling ExecStoreVirtualTuple before apply_update_list.
| if (css->on_create_set_info) | |
| apply_update_list(&css->css, | |
| css->on_create_set_info); | |
| if (css->on_create_set_info) | |
| { | |
| /* Use the scan tuple slot populated by process_path */ | |
| econtext->ecxt_scantuple = | |
| css->css.ss.ss_ScanTupleSlot; | |
| ExecStoreVirtualTuple(econtext->ecxt_scantuple); | |
| apply_update_list(&css->css, | |
| css->on_create_set_info); | |
| } |
| if (css->on_create_set_info) | ||
| apply_update_list(&css->css, css->on_create_set_info); |
There was a problem hiding this comment.
The same issue as in the non-terminal Case 1: in the terminal Case 1 ON CREATE SET path (line 824-828), apply_update_list is called with ecxt_scantuple pointing to the lateral join's projection (set at line 789-790). The newly-created entity's values would be null in this slot. Depending on whether process_path writes entity values into this lateral join slot (which it does via merge_vertex), this may or may not work, but it is inconsistent with the Case 2 and Case 3 approaches that call ExecStoreVirtualTuple first.
| if (css->on_create_set_info) | |
| apply_update_list(&css->css, css->on_create_set_info); | |
| if (css->on_create_set_info) | |
| { | |
| /* | |
| * Materialize the current virtual tuple into the scan | |
| * slot so that apply_update_list sees the values | |
| * corresponding to the newly-created path. | |
| * | |
| * This mirrors the behavior in the other MERGE cases, | |
| * which call ExecStoreVirtualTuple before | |
| * apply_update_list. | |
| */ | |
| ExecStoreVirtualTuple(econtext->ecxt_scantuple); | |
| apply_update_list(&css->css, css->on_create_set_info); | |
| } |
|
@gregfelice Yeah, yeah, Copilot has more to say :( If it weren't for this PR adding a new keyword and grammar I wouldn't worry so much. But, it does so we need to be sure everything works. |
…N keyword test - Move ExecStoreVirtualTuple before apply_update_list unconditionally in Case 1 non-terminal and terminal MERGE paths, matching the pattern at Case 3 (line 994). Ensures tts_nvalid is set for downstream ExecProject even when ON CREATE SET is absent. - Add resolve_merge_set_exprs() helper to deduplicate the prop_expr resolution loops for ON MATCH SET and ON CREATE SET. Includes ereport when target entry is missing (internal error, should never happen). - Add regression test for ON keyword as label name, confirming backward compatibility via safe_keywords grammar path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Addressed the remaining 5 Copilot suggestions:
All regression tests pass ( |
Summary
Implements the openCypher-standard
ON CREATE SETandON MATCH SETclauses for the MERGE statement, resolving #1619. This allows conditional property updates depending on whether MERGE created a new path or matched an existing one:Design
The implementation spans parser, planner, and executor:
Parser — New grammar rules (
merge_actions_opt,merge_actions,merge_action) incypher_gram.y. TheONkeyword is added tocypher_kwlist.h.Nodes —
on_match/on_createlists oncypher_merge, correspondingon_match_set_info/on_create_set_infooncypher_merge_information, andprop_exproncypher_update_item. All fields serialized through copy/out/read funcs.Transform —
cypher_clause.ctransforms ON SET items and storesprop_exprfor direct expression evaluation.Executor —
apply_update_list()is extracted fromprocess_update_list()incypher_set.cas reusable SET logic.cypher_merge.ccalls it at all merge decision points:Why prop_expr?
The PostgreSQL planner strips target list entries for SET expressions that the CustomScan doesn't reference. This makes
prop_positionreferences into the scan tuple dangling. The solution: store theExpr*directly incypher_update_item->prop_exprand evaluate it withExecInitExpr/ExecEvalExpr, independent of scan tuple layout. This is only done for MERGE ON SET items — regular SET continues to useprop_positionunchanged.Files changed (12)
src/include/parser/cypher_kwlist.hONkeywordsrc/backend/parser/cypher_gram.ysrc/include/nodes/cypher_nodes.hsrc/backend/nodes/cypher_copyfuncs.csrc/backend/nodes/cypher_outfuncs.csrc/backend/nodes/cypher_readfuncs.csrc/backend/parser/cypher_clause.csrc/include/executor/cypher_utils.happly_update_listdeclarationsrc/backend/executor/cypher_set.capply_update_list()fromprocess_update_list()src/backend/executor/cypher_merge.capply_update_listat all merge decision pointsregress/sql/cypher_merge.sqlregress/expected/cypher_merge.outTest plan
Closes #1619