Skip to content

Conversation

@valeriy42
Copy link
Contributor

@valeriy42 valeriy42 commented Nov 13, 2025

The test testTwoJobsWithSameRandomizeSeedUseSameTrainingSet fails intermittently because documents may be processed in different orders during reindexing. Since we use an online reservoir sampling algorithm, this order actually matters. To ensure deterministic reindexing of the document sequence, both the number of shards and the number of segments must be 1.

This PR fixes the test by creating the source index with only 1 segment. This ensures deterministic document order during reindexing, resulting in consistent ID assignments and training set selection when using the same seed.

Fixes #117805

@valeriy42 valeriy42 added the >test Issues or PRs that are addressing/adding tests label Nov 13, 2025
@valeriy42 valeriy42 added the :ml Machine learning label Nov 13, 2025

public void testTwoJobsWithSameRandomizeSeedUseSameTrainingSet() throws Exception {
String sourceIndex = "regression_two_jobs_with_same_randomize_seed_source";
indexData(sourceIndex, 100, 0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indexData is calling directly client().admin().indices().prepareCreate() instead of prepareCreate() of the test framework. This ensures that the index has always 1 shard.

However, it still can have multiple segments which then leads to non-deterministics order in which the reservoir sampling might get the documents. Hence, we need to fix both the shards and the segments to 1.

@valeriy42 valeriy42 marked this pull request as ready for review November 14, 2025 09:57
@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Nov 14, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@valeriy42 valeriy42 merged commit c0e0bda into elastic:main Nov 14, 2025
34 checks passed
@valeriy42 valeriy42 deleted the tests/is-117805 branch November 14, 2025 14:17
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Nov 16, 2025
* main: (135 commits)
  Mute org.elasticsearch.upgrades.IndexSortUpgradeIT testIndexSortForNumericTypes {upgradedNodes=1} elastic#138130
  Mute org.elasticsearch.upgrades.IndexSortUpgradeIT testIndexSortForNumericTypes {upgradedNodes=2} elastic#138129
  Mute org.elasticsearch.search.basic.SearchWithRandomDisconnectsIT testSearchWithRandomDisconnects elastic#138128
  [DiskBBQ] avoid EsAcceptDocs bug by calling cost before building iterator (elastic#138127)
  Log NOT_PREFERRED shard movements (elastic#138069)
  Improve bulk loading of binary doc values (elastic#137995)
  Add internal action for getting inference fields and inference results for those fields (elastic#137680)
  Address issue with DateFieldMapper#isFieldWithinQuery(...) (elastic#138032)
  WriteLoadConstraintDecider: Have separate rate limiting for canRemain and canAllocate decisions (elastic#138067)
  Adding NodeContext to TransportBroadcastByNodeAction (elastic#138057)
  Mute org.elasticsearch.simdvec.ESVectorUtilTests testSoarDistanceBulk elastic#138117
  Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeIT test elastic#137909
  Backport batched_response_might_include_reduction_failure version to 8.19 (elastic#138046)
  Add summary metrics for tdigest fields (elastic#137982)
  Add gp-llm-v2 model ID and inference endpoint (elastic#138045)
  Various tracing fixes (elastic#137908)
  [ML] Fixing KDE evaluate() to return correct ValueAndMagnitude object (elastic#128602)
  Mute org.elasticsearch.xpack.shutdown.NodeShutdownIT testStalledShardMigrationProperlyDetected elastic#115697
  [ML] Fix Flaky Audit Message Assertion in testWithDatastream for RegressionIT and ClassificationIT (elastic#138065)
  [ML] Fix Non-Deterministic Training Set Selection in RegressionIT testTwoJobsWithSameRandomizeSeedUseSameTrainingSet (elastic#138063)
  ...

# Conflicts:
#	rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search.vectors/200_dense_vector_docvalue_fields.yml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:ml Machine learning Team:ML Meta label for the ML team >test Issues or PRs that are addressing/adding tests v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] RegressionIT testTwoJobsWithSameRandomizeSeedUseSameTrainingSet failing

3 participants