[Refactor](Multi Catalog) Unify external meta cache framework#60937
[Refactor](Multi Catalog) Unify external meta cache framework#60937morningman merged 3 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run external |
|
run buildall |
d86d58c to
d3bbd5e
Compare
|
run buildall |
TPC-H: Total hot run time: 28796 ms |
|
run buildall |
TPC-DS: Total hot run time: 183863 ms |
|
run buildall |
1 similar comment
|
run buildall |
TPC-H: Total hot run time: 28648 ms |
TPC-DS: Total hot run time: 184044 ms |
FE Regression Coverage ReportIncrement line coverage |
|
run buildall |
TPC-H: Total hot run time: 28843 ms |
TPC-DS: Total hot run time: 185041 ms |
FE Regression Coverage ReportIncrement line coverage |
|
/review |
There was a problem hiding this comment.
Code Review Summary — Unified External Metadata Cache Framework
Overall this is a well-structured refactoring that introduces a clean 3-level cache model (Engine → Catalog → Entry) with SPI-based plugin support. The migration of Iceberg/Paimon/Hudi/MaxCompute/Doris to engine-level adapters while keeping Hive on a legacy path is a sound incremental rollout strategy.
Critical Checkpoints
Goal & Correctness: The goal is to unify scattered per-engine cache managers into a single framework with consistent lifecycle, invalidation, and configuration. The framework structure accomplishes this, but there is a type-safety bug in the generic getSchemaCacheValue path (see inline comment) that would cause IllegalArgumentException at runtime for any engine registering a SchemaCacheKey subclass.
Concurrency: The framework relies on ConcurrentHashMap for engineCaches and CatalogEntryGroup, and Caffeine caches are inherently thread-safe. CatalogEntryGroup extends ConcurrentHashMap which is fine. The routeCatalogEngines broadcast pattern is safe because safeInvalidate catches IllegalStateException for uninitialized catalogs. No new lock hierarchy concerns introduced.
Lifecycle Management: CatalogMgr.removeCatalog() now calls catalog.onClose() before removing from maps and calls removeCatalog() on cache mgr before removing from idToCatalog — this is a correctness improvement ensuring cache cleanup while catalog is still accessible.
Configuration: CacheSpec handles enable/ttl/capacity with compatibility key mapping. Dynamic changes are supported through notifyPropertiesUpdated() which does removeCatalog + prepareCatalog.
Parallel Code Paths: The DefaultExternalMetaCache serves as fallback for engines without dedicated entries. Each ExternalTable subclass routes via getMetaCacheEngine(). The base ExternalTable.getMetaCacheEngine() throws by default — table types that forget to override will fail at runtime.
Test Coverage: Good unit test coverage for CacheSpec, MetaCacheEntry, SPI loading, Iceberg, and Paimon caches. Missing: No unit tests for HudiExternalMetaCache (the most complex engine cache with partition values, fs_view, and meta_client entries), MaxComputeExternalMetaCache, or DorisExternalMetaCache.
Performance: routeCatalogEngines broadcasts prepareCatalog to ALL 7 engines for every catalog, creating ~19 empty Caffeine cache instances per catalog regardless of catalog type. Memory impact is modest (~tens of KB per catalog) but scales linearly. Consider filtering by relevant engine or documenting this as intentional.
Observability: No new metrics or logging for cache hit/miss rates at the engine level, though Caffeine's built-in stats are exposed via MetaCacheEntry.getStats().
Issues Found
-
[Bug]
ensureTypeCompatibleuses strictClass.equals()which breaks when engine-specificSchemaCacheKeysubclasses are used through the generic path (see inline comment onExternalMetaCacheMgr.java:406) -
[Design]
routeCatalogEnginesbroadcasts to all engines unconditionally — wasteful but functionally safe (see inline comment on line 343) -
[Risk]
ExternalTable.getMetaCacheEngine()throws by default — any new table type that forgets to override will fail at runtime with no compile-time safety net (see inline comment onExternalTable.java:219) -
[Test Gap] No unit tests for HudiExternalMetaCache despite being the most complex engine cache implementation
fe/fe-core/src/main/java/org/apache/doris/datasource/ExternalMetaCacheMgr.java
Outdated
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/datasource/ExternalMetaCacheMgr.java
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/datasource/metacache/AbstractExternalMetaCache.java
Show resolved
Hide resolved
|
run buildall |
TPC-H: Total hot run time: 27864 ms |
TPC-DS: Total hot run time: 154005 ms |
FE Regression Coverage ReportIncrement line coverage |
|
run buildall |
TPC-H: Total hot run time: 27849 ms |
TPC-DS: Total hot run time: 153786 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
5f7fd7a to
9fd79d3
Compare
|
run buildall |
FE UT Coverage ReportIncrement line coverage |
TPC-H: Total hot run time: 27128 ms |
TPC-DS: Total hot run time: 168966 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
There was a problem hiding this comment.
Code Review: [Refactor](Multi Catalog) Unify external meta cache framework
Reviewed ~95 files (6810 additions, 3610 deletions). The new 3-layer framework (Manager → Engine → Entry) is well-structured and the MetaCacheEntryDef/CatalogEntryGroup/CacheSpec abstractions are clean. Below are findings organized by the Critical Checkpoints from Part 1.3.
Critical Checkpoint Analysis
Goal & correctness: The PR unifies 6+ engine-specific cache implementations under a common framework. The refactoring preserves existing cache semantics (TTL, capacity, invalidation granularity). Tests cover routing, delegation, and basic cache operations.
Modification scope: Large but focused — all changes serve the unification goal. No unrelated changes detected.
Concurrency: See inline comments. getCatalogProperties() can throw unchecked IllegalStateException on query-path threads racing with catalog removal. CatalogMgr.removeCatalog() calls onClose() (blocking I/O) while holding writeLock and before removing catalog from maps, exposing partially torn-down state.
Configuration: CacheSpec introduces property-based config (meta.cache.<engine>.<entry>.*). Dynamic changes are handled via prepareCatalogByEngine re-initialization on property change events in ExternalCatalog.notifyCachePropertiesChanged(). This looks correct.
Incompatible changes: catalog_meta_cache_statistics schema changed from 4 columns to 23. The FE filterColumns mechanism provides partial backward compatibility (new FE + old BE works). However, new BE + old FE would fail. Since this is an information_schema table (not user data), impact is limited to monitoring queries during rolling upgrade.
Parallel code paths: invalidateCatalog vs invalidateCatalogEntries vs removeCatalog — semantics are distinct and correctly separated. Engine-specific invalidateCatalog overrides (Iceberg, Hudi) correctly call cleanup before super.invalidateCatalog().
Test coverage: Route resolution, schema cache delegation, Doris/Hudi engine caches have tests. Missing: no test for concurrent catalog removal + cache access race; no test for Iceberg/Paimon/MaxCompute engine caches; no test for CacheSpec property parsing; no test for MetaCacheEntryStats collection accuracy.
Observability: The expanded catalog_meta_cache_statistics table (23 columns with hit rates, load times, eviction counts, etc.) is a significant improvement over the previous 4-column version.
Performance: safeInvalidate short-circuits via isCatalogInitialized check — good. Cache loaders use memoized suppliers for lazy initialization (Iceberg snapshots, Paimon projections) — good. No obvious hot-path regressions.
See inline comments for specific issues.
fe/fe-core/src/main/java/org/apache/doris/datasource/ExternalMetaCacheMgr.java
Outdated
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/datasource/hudi/HudiExternalMetaCache.java
Outdated
Show resolved
Hide resolved
|
run buildall |
TPC-H: Total hot run time: 26898 ms |
TPC-DS: Total hot run time: 168515 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
PR approved by at least one committer and no changes requested. |
What problem does this PR solve?
Part of #60686
Summary
This PR continues the external metadata cache refactor by introducing a unified engine-scoped cache framework and aligning multiple external catalog implementations on top of it.
The goal is not to finish every historical cache migration in one shot. The goal is to make the framework shape explicit and consistent enough that later migrations can continue on top of one model instead of adding more engine-specific cache flows.
At a high level, this PR does three things:
Framework View
The current framework can be viewed as three layers:
flowchart TD A["ExternalMetaCacheMgr"] --> B["ExternalMetaCacheRegistry"] A --> C["ExternalMetaCacheRouteResolver"] A --> D["ExternalMetaCache(engine facade)"] D --> E["AbstractExternalMetaCache"] E --> F["CatalogEntryGroup(catalog scoped)"] F --> G["MetaCacheEntry(table/schema/partition/...)"] H["IcebergExternalMetaCache"] --> E I["PaimonExternalMetaCache"] --> E J["HudiExternalMetaCache"] --> E K["MaxComputeExternalMetaCache"] --> E L["DorisExternalMetaCache"] --> E M["HiveExternalMetaCache"] --> EThis structure makes a few framework boundaries explicit:
Main Changes
datasource.metacacheExternalMetaCacheMgrso registration and routing are more explicit instead of staying mixed in one manager pathEngine Alignment Direction
This PR is mainly about pulling different engines closer to one framework shape.
The important point is not that every engine is identical now. The important point is that they are being moved toward one consistent framework model.
Compatibility
Validation
mvn -pl fe-core -am -DskipTests compileCheck List (For Reviewer who merge this PR)