Commit c0e0bda

[ML] Fix Non-Deterministic Training Set Selection in RegressionIT testTwoJobsWithSameRandomizeSeedUseSameTrainingSet (#138063)
The test testTwoJobsWithSameRandomizeSeedUseSameTrainingSet fails intermittently because documents may be processed in a different order on each reindex. Since we use an online reservoir sampling algorithm, the order in which documents arrive affects which ones are selected, even when the seed is the same. For the reindexed document sequence to be deterministic, the source index must have exactly 1 shard and exactly 1 segment. This PR fixes the test by force-merging the source index to a single segment after indexing. This ensures a deterministic document order during reindexing, resulting in consistent ID assignments and training set selection when using the same seed.

Fixes #117805
1 parent c560ee0 commit c0e0bda
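
To illustrate why arrival order matters for a seeded online sampler, here is a minimal, self-contained sketch. It uses the textbook "Algorithm R" reservoir sampler, which is an assumption chosen for illustration and is not taken from the Elasticsearch ML code; the class name `ReservoirOrderDemo`, the `sample` helper, the reservoir size and the seed are all hypothetical. With a fixed seed the sampler always keeps the same stream positions, so feeding the same documents in a different order yields a different sample.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ReservoirOrderDemo {

    // Textbook "Algorithm R": keep k items from a stream, deciding per arrival position.
    // Illustrative only; not the sampler used by Elasticsearch ML.
    public static List<Integer> sample(List<Integer> stream, int k, long seed) {
        Random random = new Random(seed);
        List<Integer> reservoir = new ArrayList<>(k);
        for (int i = 0; i < stream.size(); i++) {
            if (i < k) {
                // The first k arrivals fill the reservoir.
                reservoir.add(stream.get(i));
            } else {
                // Arrival i replaces a random slot with probability k / (i + 1).
                int j = random.nextInt(i + 1);
                if (j < k) {
                    reservoir.set(j, stream.get(i));
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        // 100 hypothetical document ids, mirroring the 100 documents indexed by the test.
        List<Integer> docs = new ArrayList<>();
        for (int id = 0; id < 100; id++) {
            docs.add(id);
        }
        // The same documents arriving in a different order.
        List<Integer> reversed = new ArrayList<>(docs);
        Collections.reverse(reversed);

        long seed = 42L;
        // Same seed, same order: identical samples.
        System.out.println(sample(docs, 10, seed).equals(sample(docs, 10, seed)));     // true
        // Same seed, different order: the samples differ.
        System.out.println(sample(docs, 10, seed).equals(sample(reversed, 10, seed))); // false
    }
}
```

The seed fixes which positions in the stream are kept, not which documents, so a stable document order is a precondition for two jobs with the same seed to pick the same training set.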

2 files changed, 3 additions and 3 deletions

muted-tests.yml

Lines changed: 0 additions & 3 deletions
@@ -58,9 +58,6 @@ tests:
 - class: org.elasticsearch.xpack.test.rest.XPackRestIT
   method: test {p0=transform/transforms_reset/Test reset running transform}
   issue: https://github.com/elastic/elasticsearch/issues/117473
-- class: org.elasticsearch.xpack.ml.integration.RegressionIT
-  method: testTwoJobsWithSameRandomizeSeedUseSameTrainingSet
-  issue: https://github.com/elastic/elasticsearch/issues/117805
 - class: org.elasticsearch.packaging.test.ArchiveTests
   method: test44AutoConfigurationNotTriggeredOnNotWriteableConfDir
   issue: https://github.com/elastic/elasticsearch/issues/118208

x-pack/plugin/ml/qa/native-multi-node-tests/src/javaRestTest/java/org/elasticsearch/xpack/ml/integration/RegressionIT.java

Lines changed: 3 additions & 0 deletions
@@ -361,6 +361,9 @@ public void testStopAndRestart() throws Exception {
     public void testTwoJobsWithSameRandomizeSeedUseSameTrainingSet() throws Exception {
         String sourceIndex = "regression_two_jobs_with_same_randomize_seed_source";
         indexData(sourceIndex, 100, 0);
+        // Force merge to single segment to ensure deterministic _doc sort order during reindexing
+        // Without this, multiple segments or segment merges can cause non-deterministic document processing order
+        client().admin().indices().prepareForceMerge(sourceIndex).setMaxNumSegments(1).setFlush(true).get();
 
         String firstJobId = "regression_two_jobs_with_same_randomize_seed_1";
         String firstJobDestIndex = firstJobId + "_dest";
