
OAK-12249: lazy ES index provisioning — skip creation for empty reindex#2955

Draft
bhabegger wants to merge 2 commits into apache:trunk from bhabegger:OAK-12249-lazy-provisioning


Conversation

@bhabegger

Contributor

Summary

  • Defers provisionIndex() from the ElasticIndexWriter constructor to the first updateDocument() or deleteDocuments() call when both FT_OAK-12249 and FT_OAK-12248 are enabled
  • A reindex that produces zero documents never creates an Elasticsearch index or alias, eliminating the empty-index problem described in OAK-12249
  • ensureProvisioned() handles the incremental-write-after-empty-reindex case: if an alias does not exist when the first document arrives, it creates a new backing index with a fresh seed and points the alias at it
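The deferral described in the bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `LazyWriterSketch`, `Provisioner`, and the `lazy` flag are hypothetical stand-ins, and only the method names `provisionIndex()`, `updateDocument()`, `deleteDocuments()`, and `ensureProvisioned()` are taken from the description. Alias lookup and seed generation are left out, and the sketch is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for whatever creates the ES index and alias. */
interface Provisioner {
    void provisionIndex();
}

/** Sketch of a lazily provisioning writer; NOT the real ElasticIndexWriter. */
class LazyWriterSketch {
    private final Provisioner provisioner;
    private boolean provisioned;
    // Paths written so far, kept only so the sketch has observable state.
    final List<String> written = new ArrayList<>();

    LazyWriterSketch(Provisioner provisioner, boolean lazy) {
        this.provisioner = provisioner;
        if (!lazy) {
            // Eager path: provision in the constructor.
            ensureProvisioned();
        }
    }

    /** Provisions at most once, on the first call that reaches it. */
    private void ensureProvisioned() {
        if (!provisioned) {
            provisioner.provisionIndex();
            provisioned = true;
        }
    }

    void updateDocument(String path) {
        ensureProvisioned();
        written.add(path);
    }

    void deleteDocuments(String path) {
        ensureProvisioned();
        written.remove(path);
    }
}
```

With `lazy` set to `true`, a writer that is constructed and closed without any `updateDocument()` or `deleteDocuments()` call never invokes `provisionIndex()`, which is the empty-reindex case this PR targets.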

Dependency on OAK-12248

This PR must be merged after #2950 (OAK-12248: graceful 404 handling).

This branch is currently based on OAK-12248-graceful-404, so the diff includes both changes. Once OAK-12248 merges to trunk, this branch will be rebased and the diff will show only the OAK-12249 commit.

Deployment order is enforced at runtime: isLazyProvisioningActive() returns true only when both FT_OAK-12249 and FT_OAK-12248 are enabled. Enabling FT_OAK-12249 alone falls back to eager provisioning and logs a WARN.
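The runtime gate described above amounts to a logical AND of the two toggles plus a warning for the unsupported combination. A minimal sketch, assuming the toggle states are available as plain booleans; the class name, parameter names, and log message are hypothetical, and only `isLazyProvisioningActive()` is named in this PR:

```java
import java.util.logging.Logger;

/** Hypothetical sketch of the toggle gate; not the code in this PR. */
class LazyProvisioningGate {
    private static final Logger LOG =
            Logger.getLogger(LazyProvisioningGate.class.getName());

    /**
     * @param ft12249 assumed state of FT_OAK-12249 (lazy provisioning)
     * @param ft12248 assumed state of FT_OAK-12248 (graceful 404 handling)
     * @return true only when both toggles are enabled
     */
    static boolean isLazyProvisioningActive(boolean ft12249, boolean ft12248) {
        if (ft12249 && !ft12248) {
            // Lazy provisioning without graceful 404 handling would expose
            // queries to missing-alias errors, so fall back to eager mode.
            LOG.warning("FT_OAK-12249 is enabled without FT_OAK-12248; "
                    + "falling back to eager provisioning");
        }
        return ft12249 && ft12248;
    }
}
```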

Tests

Three new unit tests in ElasticIndexWriterTest:

  • lazyProvisioning_requiresGraceful404Toggle — asserts lazy provisioning is inactive when OAK-12248 toggle is off
  • emptyReindex_doesNotCreateEsIndex — verifies no ES index is created during construction when no documents are written
  • nonEmptyReindex_provisionsOnFirstDocument — verifies provisionIndex() is called on the first updateDocument() and not before

All 11 tests in ElasticIndexWriterTest pass.

Jira

https://issues.apache.org/jira/browse/OAK-12249

bhabegger and others added 2 commits June 15, 2026 08:12
When an Elasticsearch alias does not exist (index_not_found_exception / 404),
the query now returns an empty cursor and logs INFO instead of propagating
an ERROR via FulltextIndex.getPlans(). The fix is behind feature toggle
FT_OAK-12248 (disabled by default) registered in ElasticIndexProviderService.

The 404 is caught in ElasticIndexStatistics.getCountOrZeroOn404() so the
planner receives 0 estimated documents and proceeds normally;
ElasticResultRowAsyncIterator.onFailure() / hasNext() also suppress the
ERROR when the toggle is on. ElasticGraceful404QueryTest verifies both
toggle-off (ERROR expected) and toggle-on (empty result, no ERROR, INFO
logged) behaviours.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
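
The count fallback described in the commit message above could look roughly like this. It is a sketch only: the method name `getCountOrZeroOn404()` comes from the commit message, but its real signature is not shown here, so the parameters, the `LongSupplier` wrapper, and the exception type `IndexNotFoundSketchException` are all assumptions made for illustration.

```java
import java.util.function.LongSupplier;

/** Hypothetical exception standing in for an Elasticsearch 404 response. */
class IndexNotFoundSketchException extends RuntimeException {
    IndexNotFoundSketchException(String message) {
        super(message);
    }
}

/** Sketch of the fallback; NOT the real ElasticIndexStatistics. */
class CountSketch {
    /**
     * Runs the count request and maps a missing index to zero documents
     * when the graceful-404 toggle (assumed to be a boolean) is on.
     */
    static long getCountOrZeroOn404(LongSupplier countRequest,
                                    boolean graceful404Enabled) {
        try {
            return countRequest.getAsLong();
        } catch (IndexNotFoundSketchException e) {
            if (graceful404Enabled) {
                // Missing alias: report an empty index to the planner.
                return 0L;
            }
            // Toggle off: keep the existing behaviour and propagate.
            throw e;
        }
    }
}
```

Returning 0 rather than throwing lets the query planner treat a never-provisioned index as an empty one, which is what makes lazy provisioning safe on the query path.
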
When FT_OAK-12249 and FT_OAK-12248 are both enabled, ElasticIndexWriter
defers provisionIndex() from the constructor to the first updateDocument()
or deleteDocuments() call. A reindex that produces zero documents never
creates an Elasticsearch index or alias, eliminating the empty-index
problem described in OAK-12249.

Deployment order is enforced at runtime: isLazyProvisioningActive() returns
true only when both toggles are on. Enabling FT_OAK-12249 alone logs a WARN
and falls back to eager provisioning, preventing 404 errors on query paths
that lack graceful 404 handling.

ensureProvisioned() handles the incremental-write-after-empty-reindex case:
if an alias does not exist when the first document arrives, it creates a new
backing index with a fresh seed and points the alias at it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bhabegger bhabegger force-pushed the OAK-12249-lazy-provisioning branch from d61927f to 77817e0 Compare June 15, 2026 06:15
