Summary
flatten_join_alias_var_optimizer in src/backend/optimizer/util/clauses.c unconditionally called pfree(havingQual) even when flatten_join_alias_vars returned the same pointer (i.e., nothing was changed). This caused a use-after-free that led to non-deterministic ORCA fallback to the Postgres planner for correlated subqueries with GROUP BY () HAVING <outer_ref>.
Root Cause
In the original code:
Node *havingQual = queryNew->havingQual;
if (NULL != havingQual)
{
    queryNew->havingQual = flatten_join_alias_vars(queryNew, havingQual);
    pfree(havingQual); // ← always freed, even when pointer unchanged
}

When flatten_join_alias_vars returns the same pointer (e.g., havingQual is an outer-reference Var with varlevelsup=1, so there is nothing to flatten), the code frees the live node and leaves queryNew->havingQual pointing to freed memory.
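The ownership rule at stake can be modeled in a few lines of standalone C. This is a hedged sketch, not Postgres code: ToyVar, toy_flatten, toy_free, flatten_guarded and the free_calls counter are all hypothetical. toy_flatten only imitates the contract described above: it hands back its input pointer when there is nothing to rewrite (for example, a Var from an outer query level) and a newly allocated node otherwise. The caller may free the old node only in the second case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Var node; not the real Postgres struct. */
typedef struct ToyVar {
    int varlevelsup;       /* 0 = current query level, 1 = outer reference */
    bool is_join_alias;    /* true if it refers to a JOIN that would be flattened */
} ToyVar;

static int free_calls;     /* number of nodes released through toy_free */

static void toy_free(ToyVar *v)
{
    free_calls++;
    free(v);
}

/* Hypothetical mutator: returns the SAME pointer when nothing changes,
 * and a freshly allocated node when it actually rewrites the input. */
static ToyVar *toy_flatten(ToyVar *v)
{
    if (v->varlevelsup != 0 || !v->is_join_alias)
        return v;                          /* unchanged: caller still owns v */
    ToyVar *copy = malloc(sizeof *copy);
    assert(copy != NULL);
    copy->varlevelsup = 0;
    copy->is_join_alias = false;           /* alias replaced by its expansion */
    return copy;
}

/* Guarded pattern: free the old node only if the mutator replaced it.
 * The buggy variant called toy_free(having) unconditionally, which for an
 * outer reference released the node the caller keeps pointing at. */
static ToyVar *flatten_guarded(ToyVar *having)
{
    ToyVar *result = toy_flatten(having);
    if (result != having)
        toy_free(having);
    return result;
}
```

With an outer reference (varlevelsup = 1) the pointer comes back unchanged and nothing is freed; with a join alias at the current level the old node is replaced and freed exactly once.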
Observed Mechanism (Debug Instrumentation)
For the query:
SELECT v.c, (SELECT count(*) FROM gstest2 GROUP BY () HAVING v.c)
FROM (VALUES (false),(true)) v(c) ORDER BY v.c;

The inner subquery's havingQual is v.c (a T_Var, nodeTag=150). Debug output:
DEBUG flatten_join_alias_var_optimizer: pfree havingQual=0x55fc9d054080 (same=1) nodeTag_before=150
DEBUG after pfree: havingQual=0x55fc9d054080 nodeTag_after=2139062143 ← freed (0x7F7F7F7F)
DEBUG copyQuery: havingQual=0x55fc9d054080 nodeTag=596 ← memory REUSED as T_List!
Step-by-step:
- pfree(v.c Var) at address 0x55fc9d054080 returns the chunk to the palloc free pool.
- EliminateDistinctClause calls gpdb::CopyObject(query) → copyObjectImpl(T_Query).
- While copying earlier fields (targetList, groupingSets, …), palloc reuses 0x55fc9d054080 for a new T_List node (nodeTag=596).
- COPY_NODE_FIELD(havingQual) calls copyObjectImpl(0x55fc9d054080), which now sees a T_List instead of a T_Var.
- pqueryEliminateDistinct->havingQual becomes a copy of an unrelated T_List.
- ORCA's query translator receives a T_List as the HAVING expression (it expects a scalar boolean).
- ORCA finds a RangeTblEntry for gstest2 inside that list and throws: GPORCA does not support the following feature: ({RTE :alias <> :eref {ALIAS :aliasname gstest2 ...} :rtekind 0 ...})
- This is a non-ExmaGPDB GPOS exception, so it is caught in CGPOptimizer::PlannedStmtFromQueryInternal and ORCA falls back to the Postgres planner.
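The clobber-then-reuse progression in the debug log can be replayed with a tiny free-list allocator. This is an illustrative sketch under stated assumptions, not the real AllocSet: toy_palloc, toy_pfree, the fixed pool and the LIFO reuse policy are hypothetical. The 0x7F fill imitates the freed-memory wipe visible in the log (0x7F7F7F7F), and the tag values 150 and 596 are simply the ones printed there. Because the pool is a static array, reading through the stale pointer is well-defined in this toy, unlike the real use-after-free.

```c
#include <assert.h>
#include <string.h>

/* Tag values copied from the debug log; the enum itself is hypothetical. */
enum { TOY_T_Var = 150, TOY_T_List = 596 };

typedef struct ToyNode {
    int tag;                        /* plays the role of NodeTag */
    int payload;
} ToyNode;

typedef struct ToyChunk {
    ToyNode user;                   /* bytes handed out to callers (first member) */
    struct ToyChunk *next_free;     /* free-list link kept outside the user bytes */
} ToyChunk;

#define TOY_POOL_SIZE 4
static ToyChunk toy_pool[TOY_POOL_SIZE];
static ToyChunk *toy_free_list;
static int toy_next_unused;

/* Reuse the most recently freed chunk first (LIFO), else take a new one. */
static ToyNode *toy_palloc(void)
{
    ToyChunk *c;
    if (toy_free_list != NULL) {
        c = toy_free_list;
        toy_free_list = c->next_free;
    } else {
        assert(toy_next_unused < TOY_POOL_SIZE);
        c = &toy_pool[toy_next_unused++];
    }
    return &c->user;
}

/* Wipe the user bytes with 0x7F, then push the chunk onto the free list. */
static void toy_pfree(ToyNode *p)
{
    ToyChunk *c = (ToyChunk *) p;   /* valid: user is the first member */
    memset(&c->user, 0x7F, sizeof c->user);
    c->next_free = toy_free_list;
    toy_free_list = c;
}

typedef struct ReplayResult {
    int tag_before;                 /* tag before the bad pfree */
    int tag_after_free;             /* tag seen through the stale pointer */
    int tag_after_reuse;            /* tag after the chunk is handed out again */
    int same_address;               /* 1 if the new List landed on the freed Var */
} ReplayResult;

/* Replays the logged sequence: free a still-referenced Var, then allocate a List. */
static ReplayResult replay_use_after_free(void)
{
    ReplayResult r;
    ToyNode *var = toy_palloc();
    var->tag = TOY_T_Var;
    ToyNode *having_qual = var;            /* queryNew->havingQual keeps this pointer */

    r.tag_before = having_qual->tag;
    toy_pfree(var);                        /* the unconditional pfree */
    r.tag_after_free = having_qual->tag;   /* all bytes 0x7F */

    ToyNode *list = toy_palloc();          /* copyObject allocates a List elsewhere */
    list->tag = TOY_T_List;
    r.same_address = (list == having_qual);
    r.tag_after_reuse = having_qual->tag;  /* the stale pointer now "is" a List */
    return r;
}
```

On a fresh pool one call yields the same progression as the DEBUG lines: 150, then 2139062143, then 596 at the same address. In the real allocator, whether the freed chunk is reused depends on chunk size and on the order of intervening allocations, which is consistent with the fallback being non-deterministic.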
Why the Bug Went Unnoticed
The Postgres planner fallback produced the correct result (f | NULL), so no regression test ever failed. The memory corruption was silently masked.
The bug was exposed while fixing the same function's list_free guards on targetList/returningList (adding pointer-equality checks before freeing). After that fix, ORCA no longer fell back for this query, but ORCA's decorrelation logic for GROUP BY () HAVING <outer_ref> turned out to be incorrect and produced wrong results (f | 0 instead of f | NULL).
Fix
Guard pfree with a pointer-equality check (same pattern already applied to targetList, returningList, scatterClause, limitOffset, limitCount):
Node *havingQual = queryNew->havingQual;
if (NULL != havingQual)
{
    queryNew->havingQual = flatten_join_alias_vars(queryNew, havingQual);
    if (havingQual != queryNew->havingQual) // ← only free if mutated
        pfree(havingQual);
}

Related ORCA Fix
The ORCA decorrelation bug exposed by this fix (an incorrect COALESCE(count(*), 0) applied to GROUP BY () HAVING <outer_ref>) was fixed separately in src/backend/gporca/libgpopt/src/xforms/CSubqueryHandler.cpp. SSubqueryDesc::Psd() now detects the correlated-HAVING pattern and forces m_fCorrelatedExecution = true, routing the subquery through the SubPlan (correlated execution) path instead of the incorrect Left Outer Join + COALESCE decorrelation path. The COALESCE is wrong here because when the outer value is false, HAVING removes the single group, the scalar subquery returns no row, and its value must therefore be NULL rather than 0.
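Why the two plan shapes disagree can be shown by evaluating the scalar subquery both ways for a single outer row. This is a hedged semantic model, not ORCA code: NullableCount, subplan_eval and loj_coalesce_eval are hypothetical names, and gstest2 is reduced to its row count. With GROUP BY () there is exactly one group, which survives only if HAVING v.c is true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical SQL-style nullable count. */
typedef struct NullableCount {
    bool isnull;
    long value;
} NullableCount;

/* Correlated SubPlan semantics: run the subquery for this outer row.
 * GROUP BY () yields one group; HAVING v.c keeps it only when c is true. */
static NullableCount subplan_eval(bool c, long gstest2_rows)
{
    NullableCount r = { true, 0 };    /* no row returned -> scalar NULL */
    if (c) {
        r.isnull = false;
        r.value = gstest2_rows;       /* count(*) of the single group */
    }
    return r;
}

/* Decorrelated shape: the outer row is left-outer-joined to the aggregate,
 * so a filtered-out group shows up as a NULL count, and
 * COALESCE(count(*), 0) then rewrites that NULL to 0. */
static NullableCount loj_coalesce_eval(bool c, long gstest2_rows)
{
    NullableCount joined = subplan_eval(c, gstest2_rows);  /* NULL when unmatched */
    NullableCount r = { false, joined.isnull ? 0 : joined.value };
    return r;
}
```

For c = false the two evaluators return NULL and 0 respectively, matching the f | NULL versus f | 0 results reported for this query; for c = true they agree.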