Bug: flatten_join_alias_var_optimizer unconditional pfree causes use-after-free, triggering ORCA fallback via T_List type confusion #1618

@yjhjstz

Summary

flatten_join_alias_var_optimizer in src/backend/optimizer/util/clauses.c unconditionally called pfree(havingQual) even when flatten_join_alias_vars returned the same pointer (i.e., nothing was changed). This caused a use-after-free that led to non-deterministic ORCA fallback to the Postgres planner for correlated subqueries with GROUP BY () HAVING <outer_ref>.

Root Cause

In the original code:

Node *havingQual = queryNew->havingQual;
if (NULL != havingQual)
{
    queryNew->havingQual = flatten_join_alias_vars(queryNew, havingQual);
    pfree(havingQual);   // ← always freed, even when pointer unchanged
}

When flatten_join_alias_vars returns the same pointer (e.g., havingQual is an outer-reference Var with varlevelsup=1 — nothing to flatten), the code frees the live node and leaves queryNew->havingQual pointing to freed memory.

Observed Mechanism (Debug Instrumentation)

For the query:

SELECT v.c, (SELECT count(*) FROM gstest2 GROUP BY () HAVING v.c)
FROM (VALUES (false),(true)) v(c) ORDER BY v.c;

The inner subquery's havingQual is v.c (a T_Var, nodeTag=150). Debug output:

DEBUG flatten_join_alias_var_optimizer: pfree havingQual=0x55fc9d054080 (same=1) nodeTag_before=150
DEBUG after pfree:  havingQual=0x55fc9d054080 nodeTag_after=2139062143   ← freed (0x7F7F7F7F)
DEBUG copyQuery:    havingQual=0x55fc9d054080 nodeTag=596                ← memory REUSED as T_List!

Step-by-step:

  1. pfree(v.c Var) at address 0x55fc9d054080 — returned to palloc free pool
  2. EliminateDistinctClause calls gpdb::CopyObject(query) → copyObjectImpl(T_Query)
  3. While copying earlier fields (targetList, groupingSets…), palloc reuses 0x55fc9d054080 for a new T_List node (nodeTag=596)
  4. COPY_NODE_FIELD(havingQual) calls copyObjectImpl(0x55fc9d054080) — now sees T_List instead of T_Var
  5. pqueryEliminateDistinct->havingQual = copy of a random T_List
  6. ORCA's query translator receives a T_List as the HAVING expression (expects a scalar boolean)
  7. ORCA finds a RangeTblEntry for gstest2 inside that list and throws:
    GPORCA does not support the following feature:
    ({RTE :alias <> :eref {ALIAS :aliasname gstest2 ...} :rtekind 0 ...})
    
  8. This is a non-ExmaGPDB GPOS exception → caught in CGPOptimizer::PlannedStmtFromQueryInternal → ORCA falls back to Postgres planner
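
Steps 1 to 4 can be replayed in miniature with a toy allocator. The sketch below is not palloc: it assumes a fixed pool with a LIFO free list and a debug-style 0x7F clobber on free, and every name in it (ToyNode, toy_alloc, toy_free, replay_use_after_free) is hypothetical. It only illustrates how a stale pointer can first read the clobber pattern and then a valid-looking tag of a different node type once the slot is handed out again.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical tags; the values mirror the debug output in this issue. */
enum { TAG_VAR = 150, TAG_LIST = 596 };

typedef struct ToyNode {
    int tag;
    int payload;
} ToyNode;

#define POOL_SLOTS 4

/* Fixed pool plus a LIFO stack of free slots (a simplification of palloc). */
static ToyNode pool[POOL_SLOTS];
static ToyNode *free_stack[POOL_SLOTS];
static int free_count = 0;

static void toy_init(void)
{
    free_count = 0;
    for (int i = 0; i < POOL_SLOTS; i++)
        free_stack[free_count++] = &pool[i];
}

static ToyNode *toy_alloc(int tag)
{
    assert(free_count > 0);
    ToyNode *n = free_stack[--free_count];  /* most recently freed slot first */
    n->tag = tag;
    n->payload = 0;
    return n;
}

static void toy_free(ToyNode *n)
{
    memset(n, 0x7F, sizeof *n);             /* clobber, as a debug build would */
    free_stack[free_count++] = n;
}

typedef struct Observed {
    int before;        /* tag while the node is live */
    int after_free;    /* tag read through the stale pointer after the free */
    int after_reuse;   /* tag read through the stale pointer after reuse */
    int same_address;  /* 1 if the new node landed in the freed slot */
} Observed;

/* Reading the stale pointer is well defined here only because the slot
 * lives in a static array; with a real allocator it is undefined behavior. */
static Observed replay_use_after_free(void)
{
    Observed o;
    toy_init();

    ToyNode *having_qual = toy_alloc(TAG_VAR);
    o.before = having_qual->tag;

    toy_free(having_qual);                  /* step 1: the live node is freed */
    o.after_free = having_qual->tag;

    ToyNode *list = toy_alloc(TAG_LIST);    /* step 3: the slot is reused */
    o.same_address = (list == having_qual);
    o.after_reuse = having_qual->tag;       /* step 4: stale pointer sees a list */
    return o;
}
```

With a 32-bit int, the three tags recorded by replay_use_after_free() correspond to the nodeTag_before, nodeTag_after, and copyQuery values in the debug output. In the real allocator, whether and when the chunk is reused depends on the sizes and order of later allocations, which fits the non-deterministic fallback described in the summary.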

Why the Bug Went Unnoticed

The Postgres planner fallback produced the correct result (f | NULL), so no regression test ever failed. The memory corruption was silently masked.

The bug was exposed when fixing the same function's list_free guards on targetList/returningList (adding pointer-equality checks before freeing). After that fix, ORCA no longer fell back for this query — but ORCA's decorrelation logic for GROUP BY () HAVING <outer_ref> was incorrect, producing wrong results (f | 0 instead of f | NULL).

Fix

Guard pfree with a pointer-equality check (same pattern already applied to targetList, returningList, scatterClause, limitOffset, limitCount):

Node *havingQual = queryNew->havingQual;
if (NULL != havingQual)
{
    queryNew->havingQual = flatten_join_alias_vars(queryNew, havingQual);
    if (havingQual != queryNew->havingQual)   // ← only free if mutated
        pfree(havingQual);
}
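
The same guard can be expressed as a small standalone helper. This is a minimal sketch using malloc/free and hypothetical names (Node, rewrite_node, replace_and_free_if_changed) in place of the real Query node, flatten_join_alias_vars, and pfree; it only shows the shape of the check: free the old node if, and only if, the rewriter handed back a different pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a parse-tree node. */
typedef struct Node {
    int tag;
} Node;

/* Hypothetical rewriter: returns the input pointer unchanged when there is
 * nothing to rewrite, otherwise returns a freshly allocated replacement. */
static Node *rewrite_node(Node *in, int needs_change)
{
    if (!needs_change)
        return in;

    Node *out = malloc(sizeof *out);
    if (out == NULL)
        abort();
    out->tag = in->tag + 1;
    return out;
}

/* Rewrites *slot in place. The old node is freed only when the rewriter
 * produced a different pointer. Returns 1 if the old node was freed. */
static int replace_and_free_if_changed(Node **slot, int needs_change)
{
    Node *old = *slot;
    if (old == NULL)
        return 0;

    *slot = rewrite_node(old, needs_change);
    if (*slot != old) {
        free(old);          /* safe: nothing references the old node now */
        return 1;
    }
    return 0;               /* same pointer: the node is still live */
}
```

Calling replace_and_free_if_changed(&slot, 0) leaves slot pointing at the original, still-valid node, which is the case the unguarded code got wrong.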

Related ORCA Fix

The ORCA decorrelation bug exposed by this fix (incorrect COALESCE(count(*), 0) applied to GROUP BY () HAVING <outer_ref>) was fixed separately in src/backend/gporca/libgpopt/src/xforms/CSubqueryHandler.cpp. That fix detects the correlated-HAVING pattern in SSubqueryDesc::Psd() and forces m_fCorrelatedExecution = true, routing the subquery through the SubPlan (correlated execution) path instead of the incorrect Left Outer Join + COALESCE decorrelation path.
