Describe the bug / opportunity
LogicalPlan is 320 bytes on the stack today, but the typical query-execution path never produces the variants that drive that size. The Ddl(DdlStatement) variant is the offender: it carries CreateExternalTable (312 bytes) and CreateFunction (288 bytes), and the enum-size rule (max(variant) + tag) forces the whole LogicalPlan enum to the same width on every code path — including SELECT queries that will never instantiate a DDL node.
This shows up directly on the planning hot path. Profiling sql_planner (samply, logical_plan_tpch_all) on macOS aarch64:
55% in sql_planner binary (DataFusion + Rust stdlib)
31% libsystem_malloc.dylib (malloc / free / realloc)
13% libsystem_platform.dylib (memcpy / memmove)
1% other (kernel, dyld, pthread)
A non-trivial share of the 13% memcpy/memmove time is LogicalPlan moves: every std::mem::take in the optimizer's in-place rewriters, every owned-API LogicalPlan::map_*, and every Arc<LogicalPlan> write currently moves the full 320 bytes, even when the active variant is something smaller like Projection (40 bytes) or Filter (128 bytes).
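To illustrate the take-and-rewrite pattern described above, here is a minimal sketch. `Plan` and `rewrite_in_place` are hypothetical stand-ins, not DataFusion types, and whether the compiler emits a full-width copy at each move depends on optimization. The point is only that the moved value has the width of the whole enum, not of the active variant.

```rust
use std::mem;

/// Hypothetical stand-in for a plan enum: one rarely used wide variant
/// sets the width of every value, including the small ones.
#[derive(Debug, Default, PartialEq)]
enum Plan {
    #[default]
    Empty,
    Small(u64),
    Wide([u64; 39]), // 312 bytes of payload, like the largest DDL struct
}

/// In-place rewrite in the style of an optimizer pass: take the node out
/// of its slot, transform it by value, and write the result back.
fn rewrite_in_place(slot: &mut Plan) {
    // `mem::take` moves the old value out and leaves `Plan::default()` behind.
    let owned = mem::take(slot);
    *slot = match owned {
        Plan::Small(n) => Plan::Small(n + 1),
        other => other,
    };
}

fn main() {
    let mut small = Plan::Small(1);
    let mut wide = Plan::Wide([0; 39]);
    rewrite_in_place(&mut small);
    rewrite_in_place(&mut wide);
    // Both values occupy the same number of bytes, whichever variant is active.
    println!("{:?} occupies {} bytes", small, mem::size_of_val(&small));
    println!("Wide occupies {} bytes", mem::size_of_val(&wide));
}
```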
Per-variant sizes
=== LogicalPlan enum total ===
320 bytes LogicalPlan
=== Per-variant inner struct ===
40 bytes Projection
128 bytes Filter
40 bytes Window
64 bytes Aggregate
48 bytes Sort
176 bytes Join
40 bytes Repartition
32 bytes Union
56 bytes Subquery
72 bytes SubqueryAlias
24 bytes Limit
88 bytes Distinct
16 bytes Extension
56 bytes RecursiveQuery
48 bytes Analyze
48 bytes Explain
168 bytes TableScan
32 bytes Values
144 bytes Unnest
96 bytes DmlStatement
120 bytes CreateMemoryTable
96 bytes CreateView
88 bytes DistinctOn
56 bytes Statement
320 bytes DdlStatement <-- forces LogicalPlan to 320
16 bytes EmptyRelation
16 bytes DescribeTable
=== Inside DdlStatement ===
312 bytes CreateExternalTable <-- dominates DdlStatement
288 bytes CreateFunction <-- second-largest
144 bytes CreateIndex
72 bytes DropTable / DropView
48 bytes DropCatalogSchema
40 bytes CreateCatalog / CreateCatalogSchema / DropFunction
If CreateExternalTable and CreateFunction are boxed inside DdlStatement, the largest DDL variant becomes CreateIndex at 144 bytes, the largest LogicalPlan variant becomes Join at 176 bytes, and LogicalPlan shrinks from 320 to 176 bytes (–45%). The enum discriminant fits inside Join's alignment padding, so LogicalPlan ends up the same width as Join itself. The cost is one heap allocation per DDL plan, which is negligible because DDL plans are not on the per-query hot path.
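The effect of boxing the widest variant can be sketched in isolation. The types below are hypothetical stand-ins sized to mirror the numbers above, not the real DataFusion structs. Rust does not guarantee the layout of default-repr enums, and these payloads are plain `u64` arrays with no spare bits for the tag, so the boxed enum here will probably come out slightly wider than its largest payload rather than matching the 176 bytes reported for the real type.

```rust
use std::mem::size_of;

// Hypothetical payloads; only their sizes matter for this sketch.
#[allow(dead_code)]
struct SmallNode([u64; 5]); // 40 bytes, like Projection
#[allow(dead_code)]
struct JoinSized([u64; 22]); // 176 bytes, like Join
#[allow(dead_code)]
struct DdlSized([u64; 39]); // 312 bytes, like CreateExternalTable

// The wide payload is stored inline, so it sets the width of every value.
#[allow(dead_code)]
enum PlanInline {
    Small(SmallNode),
    Join(JoinSized),
    Ddl(DdlSized),
}

// The wide payload sits behind a pointer, so the next-largest variant
// determines the width instead.
#[allow(dead_code)]
enum PlanBoxed {
    Small(SmallNode),
    Join(JoinSized),
    Ddl(Box<DdlSized>),
}

/// Returns (inline size, boxed size) in bytes.
fn sizes() -> (usize, usize) {
    (size_of::<PlanInline>(), size_of::<PlanBoxed>())
}

fn main() {
    let (inline, boxed) = sizes();
    println!("inline: {inline} bytes, boxed: {boxed} bytes");
}
```

Running the same `size_of` comparison against the real `LogicalPlan` before and after the change is the authoritative check; this sketch only shows why the width drops.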
To Reproduce
// in datafusion/expr, with all relevant types in scope:
println!("{}", std::mem::size_of::<LogicalPlan>()); // 320
println!("{}", std::mem::size_of::<DdlStatement>()); // 320
println!("{}", std::mem::size_of::<CreateExternalTable>()); // 312
println!("{}", std::mem::size_of::<CreateFunction>()); // 288
Expected behavior
LogicalPlan should not be sized by variants that never appear on the query path. Moving the two outsized DDL variants behind a Box brings LogicalPlan down to 176 bytes, the size of Join, a variant that real queries actually use. Every plan node on every query pays for the enum's full width, so the width should be set by a query-path variant.
Additional context
Local cargo bench -p datafusion --bench sql_planner --quick on macOS aarch64, comparing main vs. boxed DDL variants:
| bench | main | boxed | delta |
|---|---|---|---|
| optimizer_tpch_all | 8.61 ms | 8.18 ms | –5.0% |
| optimizer_tpcds_all | 168.0 ms | 163.5 ms | –2.7% |
Smaller benches (sub-200 µs) are within --quick noise.
CI bench on the GKE aarch64 runner should give a tighter signal; willing to open a draft PR so a maintainer can trigger it.