HIVE-29281: Make proactive cache eviction work with catalog#6379

Open
Neer393 wants to merge 1 commit into apache:master from Neer393:HIVE-29281

@Neer393
Contributor

@Neer393 Neer393 commented Mar 19, 2026

What changes were proposed in this pull request?

Made proactive cache eviction catalog-aware by changing the ProactiveEviction and CacheTag files.

Why are the changes needed?

Proactive cache eviction should be catalog-aware; otherwise, tables with the same name under different catalogs may cause false cache hits or misses. To avoid this, cache eviction must take the catalog into account.
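
To make the collision concrete, here is a minimal sketch. The helper names and the `other_cat` catalog are hypothetical and are not Hive's `CacheTag` API; the sketch only shows that a tag built from `db.table` cannot distinguish two catalogs, while a catalog-qualified tag can.

```java
// Illustration only: these helpers are hypothetical, not Hive's CacheTag API.
final class TagCollisionSketch {

  // Tag without a catalog: the catalog is accepted but ignored.
  static String unqualifiedTag(String catalog, String db, String table) {
    return db + "." + table;
  }

  // Catalog-qualified tag: catalog.db.table.
  static String qualifiedTag(String catalog, String db, String table) {
    return catalog + "." + db + "." + table;
  }
}
```

With these helpers, `sales.orders` in catalog `hive` and `sales.orders` in catalog `other_cat` map to the same unqualified tag but to different qualified tags.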

Does this PR introduce any user-facing change?

No. Users are not exposed to proactive cache eviction, so there are no user-facing changes.

How was this patch tested?

Added unit tests for cases with and without a catalog; all of them pass. I am not sure how to test proactive cache eviction manually, so the change is verified only via unit tests.

@Neer393
Contributor Author

Neer393 commented Mar 20, 2026

@zhangbutao I need a review here. I looked at all the merged PRs under HIVE-22820 and made the corresponding changes to make this catalog-aware. Please let me know if I missed anything. Thanks.

Copilot AI left a comment

Pull request overview

This PR makes LLAP proactive cache eviction catalog-aware by propagating catalog names through cache tags and eviction requests, preventing collisions when identical db/table names exist across catalogs.

Changes:

  • Extend LLAP proactive eviction request structure to include catalog scoping (catalog → db → table → partitions).
  • Introduce catalog tracking on TableDesc/PartitionDesc and update cache-tag generation to include catalog-qualified names.
  • Update LLAP cache metadata serialization and unit tests to reflect catalog-qualified cache tags.
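
The catalog → db → table → partitions scoping in the first bullet can be pictured as a nested map. The class below is a hypothetical sketch of that shape, not the actual `ProactiveEviction.Request` code; the class and method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a catalog-scoped eviction request; not the Hive class.
final class EvictionScopeSketch {

  // catalog -> db -> table -> list of partition specs
  private final Map<String, Map<String, Map<String, List<Map<String, String>>>>> entities =
      new LinkedHashMap<>();

  // Creates the intermediate levels on demand, then records the partition spec.
  EvictionScopeSketch addPartition(String catalog, String db, String table,
      Map<String, String> partSpec) {
    entities.computeIfAbsent(catalog, k -> new LinkedHashMap<>())
        .computeIfAbsent(db, k -> new LinkedHashMap<>())
        .computeIfAbsent(table, k -> new ArrayList<>())
        .add(new LinkedHashMap<>(partSpec));
    return this;
  }

  // Number of partition specs recorded for one catalog/db/table triple.
  int partitionCount(String catalog, String db, String table) {
    return entities
        .getOrDefault(catalog, Collections.emptyMap())
        .getOrDefault(db, Collections.emptyMap())
        .getOrDefault(table, Collections.emptyList())
        .size();
  }
}
```

Because the catalog is the outermost key, partitions of `sales.orders` in two different catalogs are stored and counted separately.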

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| storage-api/src/java/org/apache/hadoop/hive/common/io/CacheTag.java | Updates cache tag semantics/docs and parent-tag derivation to preserve catalog prefix. |
| ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java | Adds catalogName field and updates constructors/clone to carry catalog without polluting EXPLAIN. |
| ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java | Exposes catalog name via PartitionDesc based on TableDesc, with default fallback. |
| ql/src/java/org/apache/hadoop/hive/llap/ProactiveEviction.java | Makes eviction requests catalog-scoped and includes catalog in proto requests and tag matching. |
| llap-common/src/protobuf/LlapDaemonProtocol.proto | Adds catalog name field to EvictEntityRequestProto. |
| ql/src/java/org/apache/hadoop/hive/llap/LlapHiveUtils.java | Prefixes cache tags with catalog when deriving metrics tags. |
| llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java | Adjusts eviction debug logging for catalog+db structure. |
| llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapCacheMetadataSerializer.java | Adds backward-ish handling for cache tags missing catalog during decode. |
| llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileEstimateErrors.java | Updates synthetic tag to include default catalog prefix. |
| ql/src/java/org/apache/hadoop/hive/ql/ddl/** | Ensures eviction builders are invoked with catalog where available. |
| ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java | Ensures TableDesc created from Table carries catalog name. |
| ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java | Updates TableDesc construction call sites for new signature. |
| Various test files | Update existing tests and add new coverage for catalog-aware eviction and proto round-trips. |

Comment on lines +337 to 344
/**
 * Add a partition of a table scoped to the given catalog.
 */
public Builder addPartitionOfATable(String catalog, String db, String tableName,
    LinkedHashMap<String, String> partSpec) {
  ensureTable(catalog, db, tableName);
  entities.get(catalog).get(db).get(tableName).add(partSpec);
  return this;

Copilot AI Mar 23, 2026

Request.Builder claims the catalog key defaults to Warehouse.DEFAULT_CATALOG_NAME, but the builder currently stores the catalog parameter as-is. If a caller passes null (or an empty string), this will create a null key and later NPE in toProtoRequests() when calling toLowerCase(). Normalize catalog (and arguably db/table) at the builder boundary, e.g. default null/blank catalog to the default catalog and enforce non-null keys.
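
One way to normalize at the builder boundary is sketched below. This is not code from the PR: the class and method names are hypothetical, and `DEFAULT_CATALOG` is a stand-in for `Warehouse.DEFAULT_CATALOG_NAME`, whose value is assumed here to be `"hive"`.

```java
import java.util.Locale;

// Hypothetical helper; not part of ProactiveEviction.
final class CatalogNames {

  // Assumption: stands in for Warehouse.DEFAULT_CATALOG_NAME.
  static final String DEFAULT_CATALOG = "hive";

  // Maps null or blank input to the default catalog and lower-cases everything else,
  // so later toLowerCase() calls on map keys can never hit a null.
  static String normalizeCatalog(String catalog) {
    if (catalog == null || catalog.trim().isEmpty()) {
      return DEFAULT_CATALOG;
    }
    return catalog.trim().toLowerCase(Locale.ROOT);
  }
}
```

A builder method could call `normalizeCatalog(catalog)` once on entry and use the result as the map key, which keeps the null handling in one place.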

Contributor Author

None of the current callers pass a null catalog to this method.

@Neer393
Contributor Author

Neer393 commented Mar 25, 2026

@zhangbutao resolved the copilot's reviews

@zhangbutao
Contributor

@zhangbutao resolved the copilot's reviews

Thanks for pinging me. I will do the code review later.

@Neer393
Contributor Author

Neer393 commented Apr 2, 2026

Made the requested changes.
Need approval here @zhangbutao @deniskuzZ

@Neer393
Contributor Author

Neer393 commented Apr 3, 2026

Fixed the SonarQube issues as well. This is ready for approval and merge @zhangbutao @deniskuzZ

Contributor

@zhangbutao zhangbutao left a comment

+1
I think most of this PR is fine. Thanks.

Copilot AI left a comment

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.


Comment on lines +222 to +228
public String getCatalogName() {
  return catalogName;
}

public void setCatalogName(String catalogName) {
  this.catalogName = catalogName == null ? Warehouse.DEFAULT_CATALOG_NAME : catalogName;
}

Copilot AI Apr 7, 2026

TableDesc.catalogName can remain null for instances built via the no-arg constructor + setters, even though the new constructor/setter normalize null to Warehouse.DEFAULT_CATALOG_NAME. Consider initializing catalogName eagerly (field initializer or in TableDesc()) so getCatalogName() never returns null. Also, equals()/hashCode() currently ignore catalogName, which can cause different-catalog descriptors to compare equal and collide in hash-based collections; include catalogName in both (or document why it must be excluded).
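
A minimal sketch of both suggestions follows, using a hypothetical stand-in class rather than the real `TableDesc`. `DEFAULT_CATALOG` stands in for `Warehouse.DEFAULT_CATALOG_NAME` (value assumed to be `"hive"`), and the `tableName` field is only there to give `equals()` something else to compare.

```java
import java.util.Objects;

// Hypothetical stand-in for TableDesc; only the catalog-related parts are shown.
final class TableDescSketch {

  // Assumption: stands in for Warehouse.DEFAULT_CATALOG_NAME.
  static final String DEFAULT_CATALOG = "hive";

  // Eager initialization: the getter never returns null, even after the no-arg constructor.
  private String catalogName = DEFAULT_CATALOG;
  private String tableName;

  TableDescSketch() {
  }

  String getCatalogName() {
    return catalogName;
  }

  void setCatalogName(String catalogName) {
    this.catalogName = catalogName == null ? DEFAULT_CATALOG : catalogName;
  }

  void setTableName(String tableName) {
    this.tableName = tableName;
  }

  // catalogName participates in equality, so same-named tables in
  // different catalogs do not compare equal.
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof TableDescSketch)) {
      return false;
    }
    TableDescSketch other = (TableDescSketch) o;
    return Objects.equals(catalogName, other.catalogName)
        && Objects.equals(tableName, other.tableName);
  }

  @Override
  public int hashCode() {
    return Objects.hash(catalogName, tableName);
  }
}
```

Whether the real `TableDesc` can include the catalog in `equals()`/`hashCode()` without affecting plan comparison elsewhere is a separate question that this sketch does not answer.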

Comment on lines +169 to +170
* the same DB name, and that getSingleCatalogName/getSingleDbName return null when multiple
* catalog-DB pairs are present.

Copilot AI Apr 7, 2026

This test’s Javadoc mentions getSingleCatalogName/getSingleDbName, but those methods no longer exist (they were replaced by hasCatalogName/hasDatabaseName). Update the comment to reflect the current API/behavior to avoid confusion for future maintainers.

Suggested change
- * the same DB name, and that getSingleCatalogName/getSingleDbName return null when multiple
- * catalog-DB pairs are present.
+ * the same DB name, and that requests spanning multiple catalog-DB pairs are not treated as
+ * having a single catalog or database; callers should use hasCatalogName/hasDatabaseName with
+ * explicit values instead.

public static CacheTag cacheTagBuilder(String dbAndTable, String... partitions) {
  String[] parts = dbAndTable.split("\\.");
  if(parts.length < 3) {
Member

nit: space

Contributor Author

Space where?

  partDescs.put("mutli=one", "one=/1");
  partDescs.put("mutli=two/", "two=2");
- tag = CacheTag.build("math.rules", partDescs);
+ tag = CacheTag.build(Warehouse.DEFAULT_CATALOG_NAME + ".math.rules", partDescs);
Member

@deniskuzZ deniskuzZ Apr 7, 2026

Why not handle DEFAULT_CATALOG_NAME inside CacheTag.build when size == 2?
Then you wouldn't need CacheTag.build(Warehouse.DEFAULT_CATALOG_NAME, "math.rules") either.
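
For illustration, the suggestion could look roughly like the sketch below. The helper is hypothetical and is not `CacheTag.build`; `DEFAULT_CATALOG` stands in for `Warehouse.DEFAULT_CATALOG_NAME`, with its value assumed to be `"hive"`.

```java
// Hypothetical helper showing a default-catalog fallback for two-part names.
final class QualifiedNameSketch {

  // Assumption: stands in for Warehouse.DEFAULT_CATALOG_NAME.
  static final String DEFAULT_CATALOG = "hive";

  // db.table          -> <default catalog>.db.table
  // catalog.db.table  -> returned unchanged
  static String qualify(String name) {
    String[] parts = name.split("\\.");
    switch (parts.length) {
      case 2:
        return DEFAULT_CATALOG + "." + name;
      case 3:
        return name;
      default:
        throw new IllegalArgumentException(
            "Expected db.table or catalog.db.table, got: " + name);
    }
  }
}
```

The trade-off is that a two-part name silently resolves to the default catalog, which hides callers that forgot to pass one.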

Contributor Author

The same reason as this - #6379 (comment)

  if (part == null) {
-   return CacheTag.build(LlapUtil.getDbAndTableNameForMetrics(path, includeParts));
+   return CacheTag.build(
+       Warehouse.DEFAULT_CATALOG_NAME, LlapUtil.getDbAndTableNameForMetrics(path, includeParts));
Member

@deniskuzZ deniskuzZ Apr 7, 2026

Should we handle a path that includes a catalog?
It seems we always use the default catalog when partitionDesc is null. Would it ever be non-null for unpartitioned tables?

Contributor Author

Yes, I understand, but all current usages pass a partition.
As for non-partitioned tables: I tried, but apart from that I found no way of determining the catalog. Could you suggest something?
