
[Bug] (dynamic-partition) DynamicPartitionScheduler.runtimeInfos leaks entries on DROP TABLE, causing FE OOM #62883

@horus-leonardo

Description


Search before asking

  • I had searched in the issues and found no similar issues.

Version

4.0.5-rc01 (commit 59de8c4c524). The same code paths are present on branch-4.0 HEAD and master HEAD as of today.

What's Wrong?

DynamicPartitionScheduler.runtimeInfos (a Map<Long, Map<String, String>> keyed by tableId) accumulates entries indefinitely.

Entries are added by createOrUpdateRuntimeInfo() on every scheduler tick for tables with dynamic_partition.enable=true or partitionRetentionCount > 0.

The only place removeRuntimeInfo() is called in production code is ShowDynamicPartitionCommand.doRun() (fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/ShowDynamicPartitionCommand.java:107). That cleanup is opportunistic. It only runs when:

  • a user explicitly issues SHOW DYNAMIC PARTITION against a database;
  • the table is still present in db.getTables() (so DROP TABLE never reaches it — the table is already gone from the catalog);
  • the table no longer satisfies olapTable.dynamicPartitionExists().

No catalog mutation path calls it. Three scenarios leave stale entries behind:

  1. InternalCatalog.unprotectDropTable() — table is gone from the catalog, the runtimeInfos entry stays.
  2. DynamicPartitionScheduler.executeDynamicPartition() when db == null — the scheduler removes the pair from dynamicPartitionTableInfo via iterator.remove() and leaves runtimeInfos untouched.
  3. Same method when olapTable is null, is an MTMV, or has lost both its dynamic_partition.enable flag and its partitionRetentionCount: the pair is removed from dynamicPartitionTableInfo, but the runtimeInfos entry again stays.

In a cluster where users don't run SHOW DYNAMIC PARTITION regularly (most automated ETL workloads), the map grows unbounded.
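The failure mode can be modeled in isolation. A minimal sketch of the pattern (hypothetical LeakModel class with simplified stand-ins for the catalog and the scheduler tick; not the actual Doris code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal model of the leak: runtime info is written on every scheduler
// tick, but nothing removes it when the table disappears.
class LeakModel {
    // keyed by tableId, like DynamicPartitionScheduler.runtimeInfos
    final Map<Long, Map<String, String>> runtimeInfos = new ConcurrentHashMap<>();
    final Map<Long, String> catalog = new HashMap<>(); // stand-in for the table catalog

    void createTable(long tableId) {
        catalog.put(tableId, "t_" + tableId);
    }

    void schedulerTick(long tableId) {
        // like createOrUpdateRuntimeInfo(): insert-if-absent, then update
        runtimeInfos.computeIfAbsent(tableId, k -> new HashMap<>())
                    .put("LastUpdateTime", "2026-04-27 00:00:00");
    }

    void dropTable(long tableId) {
        catalog.remove(tableId);
        // BUG: no runtimeInfos.remove(tableId) here, so the entry leaks
    }

    public static void main(String[] args) {
        LeakModel m = new LeakModel();
        for (long id = 0; id < 10_000; id++) {
            m.createTable(id);
            m.schedulerTick(id);
            m.dropTable(id);
        }
        // The catalog is empty, but runtimeInfos kept every dropped table.
        System.out.println(m.catalog.size());       // 0
        System.out.println(m.runtimeInfos.size());  // 10000
    }
}
```

Run long enough, the second map is the one that shows up in the heap dump.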

In our production cluster (4.0.5-rc01, ETL workload with frequent CREATE/DROP on dynamic_partition tables, ~24K DDL/hour), the FE OOMed after a few weeks of uptime. Heap dump:

  • runtimeInfos backed by a ConcurrentHashMap$Node[] of 2,097,152 buckets, roughly 1M–1.5M leaked entries.
  • 554 MB retained on DynamicPartitionScheduler (17% of live heap post-GC walk).
  • Dump file: 52 GiB; live heap 3.23 GB after the reachability walk; the rest is on the leak path, which holds old Database/Table graphs alive transitively through these stale entries.
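The bucket count is consistent with the estimated entry range. A ConcurrentHashMap table doubles to the next power of two when the default load factor (0.75) is exceeded, so a 2^21-bucket table brackets the live entry count. A quick sanity check (order-of-magnitude only; internal sizing details vary slightly by JDK):

```java
// Sanity-check the heap-dump numbers: a ConcurrentHashMap table of
// 2^21 buckets implies the entry count crossed 0.75 * 2^20 and is
// at most 0.75 * 2^21 (default load factor 0.75).
public class HeapDumpCheck {
    public static void main(String[] args) {
        int buckets = 1 << 21;                          // 2,097,152 from the dump
        int minEntries = (int) (0.75 * (buckets >> 1)); // 786,432
        int maxEntries = (int) (0.75 * buckets);        // 1,572,864
        System.out.println(minEntries + " .. " + maxEntries);

        // 554 MB retained over ~1.25M entries -> rough bytes per entry
        long retainedBytes = 554L * 1024 * 1024;
        long perEntry = retainedBytes / 1_250_000;
        System.out.println(perEntry + " bytes/entry"); // 464
    }
}
```

A few hundred bytes retained per entry is plausible once each stale entry pins its value map plus whatever object graph it transitively references.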

What You Expected?

runtimeInfos.remove(tableId) should run when:

  1. A table is dropped — inside InternalCatalog.unprotectDropTable(), alongside db.unregisterTable().
  2. The scheduler removes a table from its working set in executeDynamicPartition(), in each of the iterator.remove(); continue; branches.

How to Reproduce?

Run a CREATE/DROP loop against a table with dynamic_partition.enable=true for long enough and watch the FE heap grow. The leak is fastest under high-DDL-churn ETL workloads, but slow churn hits it eventually because nothing ever clears the map.

A self-contained repro:

CREATE DATABASE leak_repro;
USE leak_repro;
-- in a shell loop, repeat for N=10000 cycles:
CREATE TABLE t (
  k INT, dt DATE
) DUPLICATE KEY(k) PARTITION BY RANGE(dt) ()
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES (
  "dynamic_partition.enable"="true",
  "dynamic_partition.time_unit"="DAY",
  "dynamic_partition.start"="-3",
  "dynamic_partition.end"="3",
  "dynamic_partition.prefix"="p",
  "replication_num"="1"
);
DROP TABLE t;

Take a heap dump, open it in Eclipse MAT, and look at the retained heap on DynamicPartitionScheduler. The bucket count of runtimeInfos will track the iteration count.

Anything Else?

Suggested patch (4 lines added across two files):

diff --git a/fe/fe-core/src/main/java/org/apache/doris/clone/DynamicPartitionScheduler.java b/fe/fe-core/src/main/java/org/apache/doris/clone/DynamicPartitionScheduler.java
@@ -671,6 +671,7 @@ public class DynamicPartitionScheduler extends MasterDaemon {
             Database db = Env.getCurrentInternalCatalog().getDbNullable(dbId);
             if (db == null) {
                 iterator.remove();
+                removeRuntimeInfo(tableId);
                 continue;
             }
@@ -688,6 +689,7 @@ public class DynamicPartitionScheduler extends MasterDaemon {
                             || !olapTable.getTableProperty().getDynamicPartitionProperty().getEnable())
                     && olapTable.getPartitionRetentionCount() <= 0) {
                 iterator.remove();
+                removeRuntimeInfo(tableId);
                 continue;
             } else if (olapTable.isBeingSynced()) {
diff --git a/fe/fe-core/src/main/java/org/apache/doris/datasource/InternalCatalog.java b/fe/fe-core/src/main/java/org/apache/doris/datasource/InternalCatalog.java
@@ -1027,6 +1027,8 @@ public class InternalCatalog implements CatalogIf<Database> {
         Env.getCurrentEnv().getQueryStats().clear(...);
         table.removeTableIdentifierFromPrimaryTable();
         db.unregisterTable(table.getId());
+        // Fix DynamicPartitionScheduler.runtimeInfos leak on DROP TABLE.
+        Env.getCurrentEnv().getDynamicPartitionScheduler().removeRuntimeInfo(table.getId());
         StopWatch watch = StopWatch.createStarted();
         Env.getCurrentRecycleBin().recycleTable(...);

A patched build has been running in our production cluster since 2026-04-27. The same workload that previously grew runtimeInfos past 500 MB now keeps it flat.

Two analogous leaks were fixed in the past by similar remove() calls on the cleanup path:

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

