[CORRUPTED] Synthetic Benchmark PR #136889 - Remove early phase failure in batched #16
Benchmark PR elastic#136889
Type: Corrupted (contains bugs)
Original PR Title: Remove early phase failure in batched
Original PR Description: Resolves elastic#134151, elastic#130821.
Background
elastic#121885 introduced a bug in the following code, which handles exceptions raised in a batched query when a batched partial reduction fails: https://github.com/elastic/elasticsearch/blob/bd356491b1e32b19993ed6cd70cc2415df1253ce/server/src/main/java/org/elasticsearch/action/search/SearchQueryThenFetchAsyncAction.java#L525-L544
Raising a phase failure in this way leads to a couple of issues:
Solution
Problem 1 could be resolved with a simple flag, as proposed in elastic#131085. Problem 2 could be resolved with careful use of the same flag to clean up contexts upon receiving stale query results. However, in the interest of stability, I propose a solution that more closely resembles how the non-batched query phase handles a reduction failure. In the non-batched case, a reduction failure is held in the QueryPhaseResultConsumer until shard fanout is complete. Only later, during final reduction at the beginning of the fetch phase, do we fail the search.
Fast failure + proper task cancellation are worthy goals for the future. I am tracking these as follow-up improvements for after the release of batched query execution.
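The deferred-failure approach described above can be illustrated with a minimal sketch. This is not the Elasticsearch implementation: `DeferredFailureConsumer` and all of its methods are hypothetical names, and shard results are reduced to plain hit counts. It only shows the idea of holding a reduction failure until shard fanout is complete and surfacing it at final reduction.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified stand-in for a query-phase result consumer.
 * It is NOT the real QueryPhaseResultConsumer; names and types are assumptions.
 */
final class DeferredFailureConsumer {
    private final List<Integer> partialHits = new ArrayList<>();
    private RuntimeException failure; // first reduction failure, held until reduce()
    private int pendingShards;        // shards that have not reported yet

    DeferredFailureConsumer(int shardCount) {
        this.pendingShards = shardCount;
    }

    /** Accept one shard's result; after a failure, results are counted but dropped. */
    synchronized void onShardResult(int hits) {
        if (failure == null) {
            partialHits.add(hits);
        }
        pendingShards--;
    }

    /** Record a partial-reduction failure without failing the phase. */
    synchronized void onPartialReduceFailure(RuntimeException e) {
        if (failure == null) {
            failure = e; // keep only the first failure
        }
        partialHits.clear(); // buffered results are no longer needed
    }

    synchronized boolean isFanoutComplete() {
        return pendingShards == 0;
    }

    /** Final reduction: the held failure, if any, is raised only here. */
    synchronized int reduce() {
        if (pendingShards != 0) {
            throw new IllegalStateException("shard fanout still in progress");
        }
        if (failure != null) {
            throw failure;
        }
        int total = 0;
        for (int hits : partialHits) {
            total += hits;
        }
        return total;
    }
}
```

In this sketch, a failure recorded mid-fanout does not interrupt the remaining shard responses; every shard still reports, so no result arrives after the search has already been failed, and the error is raised once at `reduce()`.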
This PR:
Original PR URL: Remove early phase failure in batched elastic/elasticsearch#136889
Issues Breakdown