tests: add weekly random DDL integration coverage #5542
…into 0115-ddl-test
…into 0115-ddl-test
# Conflicts: # Makefile
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of these files.
📝 Walkthrough
Adds a new …
Changes
- Event broker sync-point scan fix
- Random DDL test runner binary and integration test scenarios
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Code Review
This pull request introduces a comprehensive random DDL and DML weekly integration testing suite, including a deterministic workload generator and verifier utility (random_ddl_test_runner) and several test scenarios (single capture, multi-capture, failover, and slow lossy DDL). It also updates the event broker to handle pending syncpoint events when advancing scan windows. The review feedback identifies multiple high-severity data races in ddl_worker.go and extra_workers.go due to concurrent unsynchronized reads of table names via tbl.fqName(), and suggests a minor improvement in bootstrap.go to use the standard library's math.MaxInt32 instead of a custom bitwise function.
start := time.Now()
_, err := db.ExecContext(ctx, sqlText)
if err == nil {
	apply()
	selector.record(kind.name)
}
if logger != nil {
	logger.Printf("ddl worker=%d kind=%s target=%s elapsed=%s err=%v",
		workerID, kind.name, tbl.fqName(), time.Since(start), err)
}
if trace != nil {
	trace.record(kind.name, tbl.fqName(), sqlText, err)
}
The tbl.fqName() method reads tbl.name without holding tbl.mu. Since tbl.name can be mutated concurrently by the apply function of genRenameTable under tbl.mu.Lock(), calling tbl.fqName() concurrently without a lock causes a data race. To prevent this, retrieve the fully qualified name while holding tbl.mu.Lock() before executing the DDL, and use the cached string for logging and tracing.
start := time.Now()
tbl.mu.Lock()
fqName := tbl.fqName()
tbl.mu.Unlock()
_, err := db.ExecContext(ctx, sqlText)
if err == nil {
	apply()
	selector.record(kind.name)
}
if logger != nil {
	logger.Printf("ddl worker=%d kind=%s target=%s elapsed=%s err=%v",
		workerID, kind.name, fqName, time.Since(start), err)
}
if trace != nil {
	trace.record(kind.name, fqName, sqlText, err)
}
func runBigInsertTxn(ctx context.Context, db *sql.DB, tbl *table, rows int) error {
	// Build a single multi-row INSERT inside a transaction to create a "big txn" workload.
	tbl.mu.Lock()
	if !tbl.exists {
		tbl.mu.Unlock()
		return nil
	}
	schema := tbl.schema.clone()
	startID := tbl.nextID
	tbl.nextID += int64(rows)
	tbl.mu.Unlock()

	var cols []column
	for _, c := range schema.columns {
		if c.generated != "" {
			continue
		}
		cols = append(cols, c)
	}
	if len(cols) == 0 {
		return nil
	}

	colNames := make([]string, 0, len(cols))
	for _, c := range cols {
		colNames = append(colNames, c.name)
	}

	var valuesSQL strings.Builder
	var args []any
	for i := 0; i < rows; i++ {
		if i > 0 {
			valuesSQL.WriteString(",")
		}
		valuesSQL.WriteString("(")
		for j := range cols {
			if j > 0 {
				valuesSQL.WriteString(",")
			}
			valuesSQL.WriteString("?")
		}
		valuesSQL.WriteString(")")
		rowID := startID + int64(i)
		for _, c := range cols {
			args = append(args, buildRandomValue(rand.New(rand.NewSource(rowID)), tbl, c, rowID))
		}
	}

	stmt := fmt.Sprintf("INSERT INTO %s (%s) VALUES %s",
		tbl.fqName(),
		backtickJoin(colNames),
		valuesSQL.String(),
	)

	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx, stmt, args...); err != nil {
		_ = tx.Rollback()
		return err
	}
	return tx.Commit()
}
The tbl.fqName() method reads tbl.name without holding tbl.mu. Since tbl.name can be mutated concurrently by the apply function of genRenameTable under tbl.mu.Lock(), calling tbl.fqName() concurrently without a lock causes a data race. To prevent this, retrieve the fully qualified name while holding tbl.mu.Lock() at the beginning of runBigInsertTxn, and use the cached string to construct the SQL statement.
func runBigInsertTxn(ctx context.Context, db *sql.DB, tbl *table, rows int) error {
	// Build a single multi-row INSERT inside a transaction to create a "big txn" workload.
	tbl.mu.Lock()
	if !tbl.exists {
		tbl.mu.Unlock()
		return nil
	}
	fqName := tbl.fqName()
	schema := tbl.schema.clone()
	startID := tbl.nextID
	tbl.nextID += int64(rows)
	tbl.mu.Unlock()

	var cols []column
	for _, c := range schema.columns {
		if c.generated != "" {
			continue
		}
		cols = append(cols, c)
	}
	if len(cols) == 0 {
		return nil
	}

	colNames := make([]string, 0, len(cols))
	for _, c := range cols {
		colNames = append(colNames, c.name)
	}

	var valuesSQL strings.Builder
	var args []any
	for i := 0; i < rows; i++ {
		if i > 0 {
			valuesSQL.WriteString(",")
		}
		valuesSQL.WriteString("(")
		for j := range cols {
			if j > 0 {
				valuesSQL.WriteString(",")
			}
			valuesSQL.WriteString("?")
		}
		valuesSQL.WriteString(")")
		rowID := startID + int64(i)
		for _, c := range cols {
			args = append(args, buildRandomValue(rand.New(rand.NewSource(rowID)), tbl, c, rowID))
		}
	}

	stmt := fmt.Sprintf("INSERT INTO %s (%s) VALUES %s",
		fqName,
		backtickJoin(colNames),
		valuesSQL.String(),
	)

	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx, stmt, args...); err != nil {
		_ = tx.Rollback()
		return err
	}
	return tx.Commit()
}

func conflictWriter(
	ctx context.Context,
	db *sql.DB,
	model *clusterModel,
	seed int64,
	cfg dmlConfig,
	counters *dmlCounters,
) {
	// conflictWriter continuously upserts into a small key space to create write conflicts.
	// This targets row-level contention and duplicate key paths.
	if !cfg.KeyConflictEnabled || cfg.KeyConflictKeyspace <= 0 {
		return
	}
	rng := rand.New(rand.NewSource(seed))
	targetTables := collectConflictTables(model)
	if len(targetTables) == 0 {
		return
	}

	for {
		select {
		case <-ctx.Done():
			return
		default:
		}

		tbl := targetTables[rng.Intn(len(targetTables))]
		tbl.mu.Lock()
		exists := tbl.exists
		tbl.mu.Unlock()
		if !exists {
			_ = sleepWithContext(ctx, 200*time.Millisecond)
			continue
		}

		key := rng.Intn(cfg.KeyConflictKeyspace) + 1
		stmt := fmt.Sprintf("INSERT INTO %s (`id`,`a`,`b`,`c`,`d`,`e`,`bin`) VALUES (?,?,?,?,?,?,?) "+
			"ON DUPLICATE KEY UPDATE `a`=VALUES(`a`),`b`=VALUES(`b`),`c`=VALUES(`c`)",
			tbl.fqName(),
		)
		args := []any{
			int64(key),
			int32(rng.Intn(1_000_000)),
			randASCII(rng, 16),
			deterministicDecimal(int64(key)),
			deterministicTime(int64(key)),
			fmt.Sprintf("{\"k\":%d}", key),
			[]byte(fmt.Sprintf("%064x", key)),
		}
		_, err := db.ExecContext(ctx, stmt, args...)
		counters.record(err)
		_ = sleepWithContext(ctx, 20*time.Millisecond)
	}
}
The tbl.fqName() method reads tbl.name without holding tbl.mu. Since tbl.name can be mutated concurrently by the apply function of genRenameTable under tbl.mu.Lock(), calling tbl.fqName() concurrently without a lock causes a data race. To prevent this, retrieve the fully qualified name while holding tbl.mu.Lock() inside conflictWriter, and use the cached string to construct the SQL statement.
func conflictWriter(
	ctx context.Context,
	db *sql.DB,
	model *clusterModel,
	seed int64,
	cfg dmlConfig,
	counters *dmlCounters,
) {
	// conflictWriter continuously upserts into a small key space to create write conflicts.
	// This targets row-level contention and duplicate key paths.
	if !cfg.KeyConflictEnabled || cfg.KeyConflictKeyspace <= 0 {
		return
	}
	rng := rand.New(rand.NewSource(seed))
	targetTables := collectConflictTables(model)
	if len(targetTables) == 0 {
		return
	}

	for {
		select {
		case <-ctx.Done():
			return
		default:
		}

		tbl := targetTables[rng.Intn(len(targetTables))]
		tbl.mu.Lock()
		exists := tbl.exists
		fqName := tbl.fqName()
		tbl.mu.Unlock()
		if !exists {
			_ = sleepWithContext(ctx, 200*time.Millisecond)
			continue
		}

		key := rng.Intn(cfg.KeyConflictKeyspace) + 1
		stmt := fmt.Sprintf("INSERT INTO %s (`id`,`a`,`b`,`c`,`d`,`e`,`bin`) VALUES (?,?,?,?,?,?,?) "+
			"ON DUPLICATE KEY UPDATE `a`=VALUES(`a`),`b`=VALUES(`b`),`c`=VALUES(`c`)",
			fqName,
		)
		args := []any{
			int64(key),
			int32(rng.Intn(1_000_000)),
			randASCII(rng, 16),
			deterministicDecimal(int64(key)),
			deterministicTime(int64(key)),
			fmt.Sprintf("{\"k\":%d}", key),
			[]byte(fmt.Sprintf("%064x", key)),
		}
		_, err := db.ExecContext(ctx, stmt, args...)
		counters.record(err)
		_ = sleepWithContext(ctx, 20*time.Millisecond)
	}
}

func mathMaxInt32() int64 {
	return int64(^uint32(0) >> 1)
}
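The review summary above suggests replacing this bootstrap.go helper with the standard library's `math.MaxInt32`. The two are equal: `^uint32(0) >> 1` shifts an all-ones 32-bit value right by one, leaving 0x7FFFFFFF. A minimal check, where `maxInt32` is a hypothetical name for the replacement:

```go
package main

import "math"

// mathMaxInt32 reproduces the bitwise helper quoted in the review.
func mathMaxInt32() int64 {
	return int64(^uint32(0) >> 1)
}

// maxInt32 is the standard-library equivalent suggested in the review summary.
func maxInt32() int64 {
	return math.MaxInt32
}
```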
Actionable comments posted: 20
🧹 Nitpick comments (2)
tests/utils/random_ddl_test_runner/ddl_test.go (1)
9-47: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Rename these tests to camelCase.
Lines 9 and 33 declare test functions with underscores, which violates the repo Go naming rule for functions. As per coding guidelines, "Functions should use camelCase naming and do not include underscores (e.g., getPartitionNum, not get_partition_num)."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/utils/random_ddl_test_runner/ddl_test.go` around lines 9 - 47, rename the test functions in ddl_test.go to camelCase to match the Go naming rule and repo guidelines: update TestGenDropColumn_DoesNotDropPrimaryKey and TestGenAddPartition_RequiresRangePartition to use camelCase without underscores, while keeping their behavior and references to genDropColumn and genAddPartition unchanged.
Source: Coding guidelines
tests/utils/random_ddl_test_runner/selector_test.go (1)
8-30: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Rename this test to camelCase.
Line 8 uses underscores in the test function name, which violates the repo Go naming rule for functions. As per coding guidelines, "Functions should use camelCase naming and do not include underscores (e.g., getPartitionNum, not get_partition_num)."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/utils/random_ddl_test_runner/selector_test.go` around lines 8 - 30, the test function name in the selector test uses underscores and should be renamed to camelCase to match the repo's Go naming rule. Update the `TestDDLSelector_CoverageDebtLikeBehavior` symbol in `selector_test.go` to a camelCase form with no underscores, keeping the same test logic and coverage behavior unchanged.
Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/integration_tests/weekly_rand_multi_failover/run.sh`:
- Line 149: The workdir cleanup in the script is using an unguarded WORK_DIR
value, which can make the reset dangerous if the variable is empty or malformed.
Update the reset logic around the rm -rf in the run.sh flow to validate WORK_DIR
before deletion, and only proceed when it is a non-empty expected path; use the
WORK_DIR handling near the rm -rf/mkdir sequence as the place to add the guard.
In `@tests/integration_tests/weekly_rand_multi/run.sh`:
- Line 143: The cleanup command in the weekly_rand_multi run script is unsafe
because rm -rf uses WORK_DIR unquoted and without validating it first. Update
the WORK_DIR cleanup logic to check that WORK_DIR is set and non-empty before
deleting, and quote the variable everywhere it is used in the rm/mkdir sequence
to avoid accidental expansion or deletion of the wrong directory tree.
- Around line 118-120: The MySQL sink fallback in the run script is hardcoded to
localhost, which bypasses the configured downstream endpoint. Update the sink
URI construction in the case branch that sets SINK_URI so it uses the same
downstream host and port variables already used elsewhere in the script,
matching the runner config and SQL target. Keep the change localized to the
SINK_URI assignment and use the existing DOWN_TIDB_HOST and DOWN_TIDB_PORT
symbols for consistency.
In `@tests/integration_tests/weekly_rand_single/run.sh`:
- Around line 113-115: The MySQL sink URI is hard-coded to localhost in the sink
selection branch, which makes it diverge from the configured downstream used
elsewhere in run.sh. Update the SINK_URI assignment in the MySQL case to use the
same downstream host and port variables already referenced by the runner config
and run_sql checks, so the changefeed replicates into the intended target. Keep
the fix localized to the sink URI setup logic in the shell script.
- Line 138: The cleanup step in the weekly_rand_single run script uses an
unquoted recursive delete on WORK_DIR, which can target the wrong path if the
variable is empty or malformed. Update the rm -rf invocation to guard against
empty WORK_DIR and quote the variable expansion, and keep the mkdir -p step tied
to the same validated WORK_DIR value so the safe path is only removed and
recreated when it is set correctly.
In `@tests/integration_tests/weekly_rand_slow_lossy_ddl/run.sh`:
- Line 95: Guard the workdir reset in the weekly_rand_slow_lossy_ddl run.sh
script before the rm -rf call, since WORK_DIR can be blank or malformed and
cause deletion outside the test sandbox. Update the startup cleanup around the
WORK_DIR handling so the recursive delete only runs when WORK_DIR is verified as
a safe, non-empty path, keeping the existing mkdir -p flow intact.
- Around line 123-128: The weekly_rand_slow_lossy_ddl run script exits right
after check_sync_diff, so late PANIC/FATAL/DATA RACE messages in captured logs
can be missed. Add the same post-diff shell-side log scan used by the sibling
weekly failover script immediately after check_sync_diff in run.sh, using the
existing scan_logs_for_failures helper and the current WORK_DIR/TEST_NAME
context to detect failures before printing success.
In `@tests/utils/random_ddl_test_runner/bootstrap.go`:
- Around line 17-18: The bootstrap flow is using an unbounded context for
DDL/DML execution, which can let schema creation or bulk insert calls hang
indefinitely. Update `bootstrap()` in `random_ddl_test_runner/bootstrap.go` to
create a timeout/deadline-aware context for the `ExecContext` work, and thread
that same bounded context through the related bootstrap phases and helpers at
the referenced call sites so the TiDB/MySQL operations fail fast instead of
wedging the weekly job.
In `@tests/utils/random_ddl_test_runner/config.go`:
- Around line 210-221: The config defaults in the DML initialization logic are
overwriting explicit user-supplied false/0 values, so update the
parsing/defaulting path to distinguish “unset” from “set to zero/false.” In the
code that sets c.DML.HotspotRatio, c.DML.HotTableRatio, c.DML.BigTxnInterval,
c.DML.BigTxnEnabled, and c.DML.KeyConflictEnabled (and the related
log_scan_enabled/fail_on_panic_match handling noted in the review), only apply
defaults when the field was omitted, not when it was explicitly provided as 0 or
false. Use presence-aware fields or separate “was set” checks in the config
loader so runner_config.json values are preserved.
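One way to make the config loader presence-aware, as requested above, is to decode into pointer fields and apply defaults only when a pointer is still `nil` after unmarshalling. A sketch under assumptions: the field names, JSON keys, and default values (`true`, `0.2`) are placeholders, not the runner's actual ones:

```go
package main

import "encoding/json"

// dmlConfig uses pointer fields so the loader can tell "omitted" (nil) from
// an explicit false or 0. Field names and JSON keys are hypothetical.
type dmlConfig struct {
	BigTxnEnabled *bool    `json:"big_txn_enabled"`
	HotspotRatio  *float64 `json:"hotspot_ratio"`
}

func boolPtr(v bool) *bool         { return &v }
func floatPtr(v float64) *float64 { return &v }

// applyDefaults fills only the fields that were absent from the JSON.
func (c *dmlConfig) applyDefaults() {
	if c.BigTxnEnabled == nil {
		c.BigTxnEnabled = boolPtr(true)
	}
	if c.HotspotRatio == nil {
		c.HotspotRatio = floatPtr(0.2)
	}
}

// loadDMLConfig decodes raw JSON and then applies defaults.
func loadDMLConfig(raw []byte) (dmlConfig, error) {
	var c dmlConfig
	if err := json.Unmarshal(raw, &c); err != nil {
		return dmlConfig{}, err
	}
	c.applyDefaults()
	return c, nil
}
```

Note that an explicit JSON `null` also leaves the pointer `nil` and therefore gets the default, which is usually the desired behavior.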
In `@tests/utils/random_ddl_test_runner/ddl_worker.go`:
- Around line 132-145: The pickMissingTable helper is using only 10 random
probes, which can miss an existing dropped table and return nil too often.
Update pickMissingTable to deterministically scan the candidates slice for a
table whose exists flag is false, using the existing table.mu lock, and return
the first missing table found; keep the rand.Rand parameter only if it is still
needed elsewhere in ddl_worker.go.
In `@tests/utils/random_ddl_test_runner/dml.go`:
- Around line 342-346: The VARCHAR generator in randASCII handling should
respect the actual c.typ.varcharN limit instead of forcing a minimum of 8, since
that can exceed narrowed columns and trigger Data too long errors. Update the
logic in the VARCHAR case to cap the requested length by varcharN and use 1 as
the lower bound, while preserving the special pad column behavior in the same
switch.
In `@tests/utils/random_ddl_test_runner/extra_workers.go`:
- Around line 155-170: The conflictWriter upsert in extra_workers.go is
hard-coded to a fixed column list, so it keeps failing after random DDL changes
to tbl’s schema. Update the logic around the conflict insert path to either
derive the INSERT/UPDATE columns from the current table schema while holding
tbl.mu, or switch conflictWriter to a table family that is not touched by
column-altering DDL. Keep the reference points in conflictWriter and
collectConflictTables in mind so the statement always matches the live schema
before calling db.ExecContext.
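For the schema-derived option, the upsert text can be built from the column list snapshotted under `tbl.mu` (excluding generated columns), updating every non-key column on conflict. A sketch; `buildUpsert`, its parameters, and the `keyCols` set are illustrative, and the caller would still need to generate one argument per column:

```go
package main

import "strings"

// buildUpsert derives an INSERT ... ON DUPLICATE KEY UPDATE statement from a
// live column list, so it tracks the current schema instead of a hard-coded
// column set. Key columns are inserted but never updated.
func buildUpsert(fqName string, cols []string, keyCols map[string]bool) string {
	quoted := make([]string, len(cols))
	placeholders := make([]string, len(cols))
	var updates []string
	for i, c := range cols {
		q := "`" + strings.ReplaceAll(c, "`", "``") + "`"
		quoted[i] = q
		placeholders[i] = "?"
		if !keyCols[c] {
			updates = append(updates, q+"=VALUES("+q+")")
		}
	}
	stmt := "INSERT INTO " + fqName + " (" + strings.Join(quoted, ",") +
		") VALUES (" + strings.Join(placeholders, ",") + ")"
	if len(updates) > 0 {
		stmt += " ON DUPLICATE KEY UPDATE " + strings.Join(updates, ",")
	}
	return stmt
}
```

If every column is part of the key, this sketch emits a plain `INSERT`, so a duplicate key shows up as an error that `counters.record` would count.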
In `@tests/utils/random_ddl_test_runner/failover.go`:
- Around line 121-133: The status probe in the capture helper is only limited by
the workload context, so a slow `/status` request can hang failover handling.
Update the request flow in the capture status check to use a short per-request
timeout, ideally by creating a derived context or client timeout around the
existing http.DefaultClient.Do calls in this helper, and keep the fallback
basic-auth retry path in the same timeout-bounded logic.
In `@tests/utils/random_ddl_test_runner/health.go`:
- Around line 54-64: The checkpoint decode in the health response parser is
losing TSO precision because `json.Decoder` is unmarshaling numeric fields into
`float64` before `parseUint64` sees them. Update the decode path in `health.go`
to call `dec.UseNumber()` and make the `changefeedStatus` parsing logic handle
`json.Number` for `checkpoint_tso` and `checkpoint_ts` so the exact integer
value is preserved during comparisons.
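The precision problem is easy to reproduce: `encoding/json` decodes numbers into `float64` by default, and a TSO above 2^53 cannot be represented exactly. A sketch comparing the default decode with `Decoder.UseNumber`; the key mirrors `checkpoint_tso` above, the function names are hypothetical, and the TSO value in the test is illustrative:

```go
package main

import (
	"encoding/json"
	"strconv"
	"strings"
)

// decodeCheckpointFloat shows the lossy path: numbers become float64, which
// represents integers exactly only up to 2^53.
func decodeCheckpointFloat(body string) (uint64, error) {
	var m map[string]any
	if err := json.NewDecoder(strings.NewReader(body)).Decode(&m); err != nil {
		return 0, err
	}
	f, ok := m["checkpoint_tso"].(float64)
	if !ok {
		return 0, strconv.ErrSyntax
	}
	return uint64(f), nil
}

// decodeCheckpointExact keeps the literal digits via json.Number and parses
// them as uint64, so no precision is lost.
func decodeCheckpointExact(body string) (uint64, error) {
	dec := json.NewDecoder(strings.NewReader(body))
	dec.UseNumber()
	var m map[string]any
	if err := dec.Decode(&m); err != nil {
		return 0, err
	}
	n, ok := m["checkpoint_tso"].(json.Number)
	if !ok {
		return 0, strconv.ErrSyntax
	}
	return strconv.ParseUint(n.String(), 10, 64)
}
```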
In `@tests/utils/random_ddl_test_runner/main.go`:
- Around line 77-84: `main()` currently dispatches to `runner.workload()`
without validating workload-only required config, so bad `cdc.changefeed_id`
values fail later with weaker errors. Add an upfront check in the workload
branch before calling `runner.workload()`, using the existing config/runner
setup that reads `cdc.changefeed_id`, and return a clear error immediately when
it is missing or empty. Keep the bootstrap path unchanged and use the `main()`
phase switch plus `runner.workload()` as the entry points for the fix.
In `@tests/utils/random_ddl_test_runner/model.go`:
- Around line 113-118: `tableSchema.clone()` is only doing a shallow copy of
`indexes`, so each `index.columns` slice still aliases the original backing
array. Update the cloning logic in `tableSchema.clone()` to deep-copy every
`index` entry, including its `columns` field, while preserving the existing
copies of `columns` and `primaryKey`. Use the `tableSchema.clone()` and `index`
symbols to locate the fix.
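A deep `clone()` copies each index's `columns` slice as well as the top-level slices. A sketch with simplified, hypothetical types (columns are plain strings here, whereas the runner uses a `column` type):

```go
package main

// index and tableSchema are hypothetical, simplified mirrors of the runner's
// model types.
type index struct {
	name    string
	columns []string
}

type tableSchema struct {
	columns    []string
	primaryKey []string
	indexes    []index
}

// clone deep-copies every slice, including each index's columns, so edits to
// the clone never write through to the original's backing arrays.
func (s tableSchema) clone() tableSchema {
	out := tableSchema{
		columns:    append([]string(nil), s.columns...),
		primaryKey: append([]string(nil), s.primaryKey...),
		indexes:    make([]index, len(s.indexes)),
	}
	for i, idx := range s.indexes {
		out.indexes[i] = index{name: idx.name, columns: append([]string(nil), idx.columns...)}
	}
	return out
}
```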
In `@tests/utils/random_ddl_test_runner/motif.go`:
- Around line 38-53: Stop the motif sequence as soon as any phase fails in the
motif runner. In the function that drives the phases around motifAddSiteCode,
motifUnifySiteCodeDefault, and motifAddCompositePK, check each call result and
return immediately on error instead of continuing to later sleeps and steps.
Only advance motifStep after each phase succeeds so DML workers never switch to
the post-PK pattern before the default-unification phase completes.
In `@tests/utils/random_ddl_test_runner/syncpoint_diff.go`:
- Around line 161-164: Carry the syncpoint cursor forward into post-workload
catch-up instead of restarting from zero. In ensureSyncpointDiffAfterWorkload,
preserve and reuse the lastPrimary value from the periodic syncpoint scan so the
follow-up checks continue from the next candidate rather than rechecking earlier
ones. Update the call flow between workload.go and runSyncpointDiffChecks to
pass the current cursor along with the remaining count, using the same
lastPrimary tracking already present in the syncpoint_diff.go loop.
- Around line 225-226: fetchDDLWindows currently calls http.DefaultClient.Do
with the long-lived workload context, which can block syncpoint diffing on a
stalled /ddl/history request. Update the request path in fetchDDLWindows to use
a bounded per-request timeout by creating a request-scoped context or dedicated
http.Client with timeout before the Do call, so the periodic loop can continue
even if TiDB hangs. Keep the change localized to the fetchDDLWindows HTTP
request handling.
In `@tests/utils/random_ddl_test_runner/workload.go`:
- Around line 260-266: In the health tracking logic around lastCheckpoint and
lastAdvance, avoid treating repeated zero checkpoints as progress. Update the
progress timestamp only when st.Checkpoint is non-zero and differs from the
previous checkpoint, so the initial zero state does not keep resetting
lastAdvance and prevent NoAdvanceHard from firing.
---
Nitpick comments:
In `@tests/utils/random_ddl_test_runner/ddl_test.go`:
- Around line 9-47: Rename the test functions in ddl_test.go to camelCase to
match the Go naming rule and repo guidelines: update
TestGenDropColumn_DoesNotDropPrimaryKey and
TestGenAddPartition_RequiresRangePartition to use camelCase without underscores,
while keeping their behavior and references to genDropColumn and genAddPartition
unchanged.
In `@tests/utils/random_ddl_test_runner/selector_test.go`:
- Around line 8-30: The test function name in the selector test uses underscores
and should be renamed to camelCase to match the repo’s Go naming rule. Update
the `TestDDLSelector_CoverageDebtLikeBehavior` symbol in `selector_test.go` to a
camelCase form with no underscores, keeping the same test logic and coverage
behavior unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0746e1ef-db06-4c9f-9650-30fdb250b2c9
📒 Files selected for processing (42)
- Makefile
- pkg/eventservice/event_broker.go
- pkg/eventservice/event_broker_test.go
- tests/integration_tests/_utils/test_prepare
- tests/integration_tests/run_weekly_rand_ddl_it_in_ci.sh
- tests/integration_tests/weekly_rand_multi/conf/changefeed.toml
- tests/integration_tests/weekly_rand_multi/conf/changefeed_mysql.toml
- tests/integration_tests/weekly_rand_multi/conf/consumer.toml
- tests/integration_tests/weekly_rand_multi/run.sh
- tests/integration_tests/weekly_rand_multi_failover/conf/changefeed.toml
- tests/integration_tests/weekly_rand_multi_failover/conf/changefeed_mysql.toml
- tests/integration_tests/weekly_rand_multi_failover/conf/consumer.toml
- tests/integration_tests/weekly_rand_multi_failover/run.sh
- tests/integration_tests/weekly_rand_single/conf/changefeed_mysql.toml
- tests/integration_tests/weekly_rand_single/conf/consumer.toml
- tests/integration_tests/weekly_rand_single/run.sh
- tests/integration_tests/weekly_rand_slow_lossy_ddl/conf/changefeed_mysql.toml
- tests/integration_tests/weekly_rand_slow_lossy_ddl/run.sh
- tests/utils/random_ddl_test_runner/autotune.go
- tests/utils/random_ddl_test_runner/autotune_test.go
- tests/utils/random_ddl_test_runner/bootstrap.go
- tests/utils/random_ddl_test_runner/config.go
- tests/utils/random_ddl_test_runner/db.go
- tests/utils/random_ddl_test_runner/ddl.go
- tests/utils/random_ddl_test_runner/ddl_test.go
- tests/utils/random_ddl_test_runner/ddl_worker.go
- tests/utils/random_ddl_test_runner/dml.go
- tests/utils/random_ddl_test_runner/dml_test.go
- tests/utils/random_ddl_test_runner/extra_workers.go
- tests/utils/random_ddl_test_runner/failover.go
- tests/utils/random_ddl_test_runner/health.go
- tests/utils/random_ddl_test_runner/logger.go
- tests/utils/random_ddl_test_runner/logscan.go
- tests/utils/random_ddl_test_runner/logscan_test.go
- tests/utils/random_ddl_test_runner/main.go
- tests/utils/random_ddl_test_runner/model.go
- tests/utils/random_ddl_test_runner/motif.go
- tests/utils/random_ddl_test_runner/runner.go
- tests/utils/random_ddl_test_runner/selector.go
- tests/utils/random_ddl_test_runner/selector_test.go
- tests/utils/random_ddl_test_runner/syncpoint_diff.go
- tests/utils/random_ddl_test_runner/workload.go
trap 'cleanup' EXIT

rm -rf $WORK_DIR && mkdir -p $WORK_DIR
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Guard the workdir reset before rm -rf.
Line 149 deletes an unguarded variable path. If WORK_DIR is empty or malformed, this cleanup can wipe the wrong directory and make the job fail destructively.
Suggested fix
-rm -rf $WORK_DIR && mkdir -p $WORK_DIR
+rm -rf -- "${WORK_DIR:?WORK_DIR is unset}" && mkdir -p -- "$WORK_DIR"
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 149-149: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 149-149: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/integration_tests/weekly_rand_multi_failover/run.sh` at line 149, The
workdir cleanup in the script is using an unguarded WORK_DIR value, which can
make the reset dangerous if the variable is empty or malformed. Update the reset
logic around the rm -rf in the run.sh flow to validate WORK_DIR before deletion,
and only proceed when it is a non-empty expected path; use the WORK_DIR handling
near the rm -rf/mkdir sequence as the place to add the guard.
Source: Linters/SAST tools
*)
	SINK_URI="mysql://root@127.0.0.1:3306/"
	;;
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Use the configured downstream address for the MySQL sink.
This script writes runner config and downstream SQL against ${DOWN_TIDB_HOST}:${DOWN_TIDB_PORT}, but the mysql sink still points at 127.0.0.1:3306. That breaks the test whenever the downstream endpoint is overridden.
Suggested fix
*)
- SINK_URI="mysql://root@127.0.0.1:3306/"
+ SINK_URI="mysql://root@${DOWN_TIDB_HOST}:${DOWN_TIDB_PORT}/"
;;
esac
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/integration_tests/weekly_rand_multi/run.sh` around lines 118 - 120, The
MySQL sink fallback in the run script is hardcoded to localhost, which bypasses
the configured downstream endpoint. Update the sink URI construction in the case
branch that sets SINK_URI so it uses the same downstream host and port variables
already used elsewhere in the script, matching the runner config and SQL target.
Keep the change localized to the SINK_URI assignment and use the existing
DOWN_TIDB_HOST and DOWN_TIDB_PORT symbols for consistency.
trap 'cleanup' EXIT

rm -rf $WORK_DIR && mkdir -p $WORK_DIR
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Guard the recursive delete with quotes and a non-empty check.
If WORK_DIR resolves unexpectedly, this can delete the wrong directory tree.
Suggested fix
-rm -rf $WORK_DIR && mkdir -p $WORK_DIR
+rm -rf -- "${WORK_DIR:?WORK_DIR is unset}" && mkdir -p "$WORK_DIR"
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 143-143: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 143-143: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/integration_tests/weekly_rand_multi/run.sh` at line 143, The cleanup
command in the weekly_rand_multi run script is unsafe because rm -rf uses
WORK_DIR unquoted and without validating it first. Update the WORK_DIR cleanup
logic to check that WORK_DIR is set and non-empty before deleting, and quote the
variable everywhere it is used in the rm/mkdir sequence to avoid accidental
expansion or deletion of the wrong directory tree.
Source: Linters/SAST tools
*)
	SINK_URI="mysql://root@127.0.0.1:3306/"
	;;
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Use the configured downstream address for the MySQL sink.
The runner config, run_sql calls, and final diff all target ${DOWN_TIDB_HOST}:${DOWN_TIDB_PORT}, but the changefeed sink is hard-coded to 127.0.0.1:3306. In any non-default topology this replicates into one downstream and validates another.
Suggested fix
*)
- SINK_URI="mysql://root@127.0.0.1:3306/"
+ SINK_URI="mysql://root@${DOWN_TIDB_HOST}:${DOWN_TIDB_PORT}/"
;;
esac
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/integration_tests/weekly_rand_single/run.sh` around lines 113 - 115,
The MySQL sink URI is hard-coded to localhost in the sink selection branch,
which makes it diverge from the configured downstream used elsewhere in run.sh.
Update the SINK_URI assignment in the MySQL case to use the same downstream host
and port variables already referenced by the runner config and run_sql checks,
so the changefeed replicates into the intended target. Keep the fix localized to
the sink URI setup logic in the shell script.
trap 'cleanup' EXIT

rm -rf $WORK_DIR && mkdir -p $WORK_DIR
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Guard the recursive delete with quotes and a non-empty check.
If WORK_DIR is empty or malformed, this can remove an unintended path.
Suggested fix
-rm -rf $WORK_DIR && mkdir -p $WORK_DIR
+rm -rf -- "${WORK_DIR:?WORK_DIR is unset}" && mkdir -p "$WORK_DIR"
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 138-138: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 138-138: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/integration_tests/weekly_rand_single/run.sh` at line 138, The cleanup
step in the weekly_rand_single run script uses an unquoted recursive delete on
WORK_DIR, which can target the wrong path if the variable is empty or malformed.
Update the rm -rf invocation to guard against empty WORK_DIR and quote the
variable expansion, and keep the mkdir -p step tied to the same validated
WORK_DIR value so the safe path is only removed and recreated when it is set
correctly.
Source: Linters/SAST tools
func (s tableSchema) clone() tableSchema {
	cp := s
	cp.columns = append([]column(nil), s.columns...)
	cp.primaryKey = append([]string(nil), s.primaryKey...)
	cp.indexes = append([]index(nil), s.indexes...)
	return cp
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Deep-copy index.columns in tableSchema.clone().
cp.indexes = append([]index(nil), s.indexes...) only copies the outer slice. Each cloned index.columns still shares the same backing array, so later DDL mutations can leak across schema, initialSchema, and sibling tables cloned from the same family.
Possible fix
func (s tableSchema) clone() tableSchema {
cp := s
cp.columns = append([]column(nil), s.columns...)
cp.primaryKey = append([]string(nil), s.primaryKey...)
- cp.indexes = append([]index(nil), s.indexes...)
+ cp.indexes = make([]index, len(s.indexes))
+ for i, idx := range s.indexes {
+ cp.indexes[i] = idx
+ cp.indexes[i].columns = append([]string(nil), idx.columns...)
+ }
return cp
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/utils/random_ddl_test_runner/model.go` around lines 113 - 118,
`tableSchema.clone()` is only doing a shallow copy of `indexes`, so each
`index.columns` slice still aliases the original backing array. Update the
cloning logic in `tableSchema.clone()` to deep-copy every `index` entry,
including its `columns` field, while preserving the existing copies of `columns`
and `primaryKey`. Use the `tableSchema.clone()` and `index` symbols to locate
the fix.
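The aliasing is easy to reproduce in isolation. Below is a minimal Go sketch, assuming simplified stand-ins for the runner's `column`, `index`, and `tableSchema` types (the real definitions in `model.go` are not shown here and likely carry more fields). `shallowClone` is a hypothetical name for the current behaviour; `clone` follows the suggested fix.

```go
package main

import "fmt"

// Hypothetical stand-ins for the runner's model types; only the fields
// needed to show the aliasing problem are included.
type column struct{ name string }

type index struct {
	name    string
	columns []string
}

type tableSchema struct {
	columns    []column
	primaryKey []string
	indexes    []index
}

// shallowClone copies the index structs by value, so each copied
// index.columns still shares its backing array with the original.
func (s tableSchema) shallowClone() tableSchema {
	cp := s
	cp.columns = append([]column(nil), s.columns...)
	cp.primaryKey = append([]string(nil), s.primaryKey...)
	cp.indexes = append([]index(nil), s.indexes...)
	return cp
}

// clone also copies every index.columns slice, so mutating the clone
// cannot leak into the original schema.
func (s tableSchema) clone() tableSchema {
	cp := s
	cp.columns = append([]column(nil), s.columns...)
	cp.primaryKey = append([]string(nil), s.primaryKey...)
	cp.indexes = make([]index, len(s.indexes))
	for i, idx := range s.indexes {
		cp.indexes[i] = idx
		cp.indexes[i].columns = append([]string(nil), idx.columns...)
	}
	return cp
}

func main() {
	orig := tableSchema{indexes: []index{{name: "idx_a", columns: []string{"a", "b"}}}}

	shallow := orig.shallowClone()
	shallow.indexes[0].columns[0] = "renamed"
	fmt.Println(orig.indexes[0].columns[0]) // renamed: the mutation leaked

	orig.indexes[0].columns[0] = "a"
	deep := orig.clone()
	deep.indexes[0].columns[0] = "renamed"
	fmt.Println(orig.indexes[0].columns[0]) // a: the original is untouched
}
```

Because `append([]index(nil), ...)` copies each `index` struct by value, only the slice headers inside it are duplicated; the arrays behind `columns` stay shared until they are copied explicitly.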
if err := motifAddSiteCode(ctx, db, model, trace, logger); err == nil {
	atomic.StoreInt32(motifStep, 1)
}
if err := sleepWithContext(ctx, step2At-step1At); err != nil {
	return
}
if err := motifUnifySiteCodeDefault(ctx, db, model, trace, logger); err == nil {
	atomic.StoreInt32(motifStep, 2)
}
if err := sleepWithContext(ctx, step3At-step2At); err != nil {
	return
}
if err := motifAddCompositePK(ctx, db, model, trace, logger); err == nil {
	atomic.StoreInt32(motifStep, 3)
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Stop the motif sequence on the first failed phase.
Step 2 and step 3 still run after earlier failures. That can set motifStep=3 without ever finishing the default-unification phase, so DML workers switch to the post-PK pattern against rows that can never match site_code=''.
Suggested fix
if err := sleepWithContext(ctx, step1At); err != nil {
return
}
- if err := motifAddSiteCode(ctx, db, model, trace, logger); err == nil {
- atomic.StoreInt32(motifStep, 1)
+ if err := motifAddSiteCode(ctx, db, model, trace, logger); err != nil {
+ return
}
+ atomic.StoreInt32(motifStep, 1)
if err := sleepWithContext(ctx, step2At-step1At); err != nil {
return
}
- if err := motifUnifySiteCodeDefault(ctx, db, model, trace, logger); err == nil {
- atomic.StoreInt32(motifStep, 2)
+ if err := motifUnifySiteCodeDefault(ctx, db, model, trace, logger); err != nil {
+ return
}
+ atomic.StoreInt32(motifStep, 2)
if err := sleepWithContext(ctx, step3At-step2At); err != nil {
return
}
- if err := motifAddCompositePK(ctx, db, model, trace, logger); err == nil {
- atomic.StoreInt32(motifStep, 3)
+ if err := motifAddCompositePK(ctx, db, model, trace, logger); err != nil {
+ return
}
+ atomic.StoreInt32(motifStep, 3)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/utils/random_ddl_test_runner/motif.go` around lines 38 - 53, Stop the
motif sequence as soon as any phase fails in the motif runner. In the function
that drives the phases around motifAddSiteCode, motifUnifySiteCodeDefault, and
motifAddCompositePK, check each call result and return immediately on error
instead of continuing to later sleeps and steps. Only advance motifStep after
each phase succeeds so DML workers never switch to the post-PK pattern before
the default-unification phase completes.
var lastPrimary uint64
checked := 0
for checked < required {
	n, err := r.runSyncpointDiffChecks(ctx, down, model, required-checked, &lastPrimary, true)
🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift
Carry the syncpoint cursor into post-workload catch-up.
ensureSyncpointDiffAfterWorkload restarts from lastPrimary = 0, while workload.go only passes the remaining count. If the periodic loop already checked syncpoints, catch-up can recheck the earliest candidates and still report completion without adding the intended new coverage.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/utils/random_ddl_test_runner/syncpoint_diff.go` around lines 161 - 164,
Carry the syncpoint cursor forward into post-workload catch-up instead of
restarting from zero. In ensureSyncpointDiffAfterWorkload, preserve and reuse
the lastPrimary value from the periodic syncpoint scan so the follow-up checks
continue from the next candidate rather than rechecking earlier ones. Update the
call flow between workload.go and runSyncpointDiffChecks to pass the current
cursor along with the remaining count, using the same lastPrimary tracking
already present in the syncpoint_diff.go loop.
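Here is a stripped-down model of the cursor handoff. The sketch assumes that candidates are ordered by primary TSO and that `lastPrimary` marks the newest candidate already checked. `syncpointChecker` and `runChecks` are hypothetical names, and the real `runSyncpointDiffChecks` takes more arguments.

```go
package main

import "fmt"

// syncpointChecker is a hypothetical model of the runner's syncpoint diff
// loop. Candidates are primary TSOs in ascending order, and each check
// consumes the next candidate strictly after the cursor.
type syncpointChecker struct {
	candidates []uint64
	checked    []uint64 // every primary TS that was actually diffed
}

// runChecks diffs up to limit candidates newer than *cursor and advances
// the cursor past each one. It returns how many checks ran.
func (c *syncpointChecker) runChecks(limit int, cursor *uint64) int {
	n := 0
	for _, ts := range c.candidates {
		if n >= limit {
			break
		}
		if ts <= *cursor {
			continue // already covered by an earlier call
		}
		c.checked = append(c.checked, ts)
		*cursor = ts
		n++
	}
	return n
}

func main() {
	c := &syncpointChecker{candidates: []uint64{100, 200, 300, 400}}
	var cursor uint64 // shared by the periodic loop and the catch-up phase

	c.runChecks(2, &cursor) // periodic loop checks 100 and 200
	c.runChecks(2, &cursor) // catch-up continues with 300 and 400
	fmt.Println(c.checked)  // [100 200 300 400]
}
```

If the catch-up phase were handed a fresh zero cursor instead, `runChecks` would start again from the earliest candidate, which is the recheck the review describes.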
resp, err := http.DefaultClient.Do(req)
if err != nil {
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the relevant file and nearby context
wc -l tests/utils/random_ddl_test_runner/syncpoint_diff.go
sed -n '180,270p' tests/utils/random_ddl_test_runner/syncpoint_diff.go
Repository: pingcap/ticdc
Length of output: 2938
🏁 Script executed:
#!/bin/bash
set -euo pipefail
set -euo pipefail
rg -n "fetchDDLWindows\\(|pickNextSyncpointCandidate\\(|context.WithTimeout\\(|WithDeadline\\(" tests/utils/random_ddl_test_runner/syncpoint_diff.go tests/utils/random_ddl_test_runner -g'*.go'
sed -n '1,180p' tests/utils/random_ddl_test_runner/syncpoint_diff.go
Repository: pingcap/ticdc
Length of output: 6779
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect where syncpoint diff loop is started and how its context is scoped.
rg -n "syncpointDiffLoop\\(|ensureSyncpointDiffAfterWorkload\\(|runSyncpointDiffChecks\\(" tests/utils/random_ddl_test_runner -g'*.go'
sed -n '160,260p' tests/utils/random_ddl_test_runner/workload.go
sed -n '260,360p' tests/utils/random_ddl_test_runner/workload.go
Repository: pingcap/ticdc
Length of output: 7455
Use a bounded timeout for /ddl/history. fetchDDLWindows uses http.DefaultClient with the long-lived workload context, so a stalled TiDB handler can freeze periodic syncpoint diffing until the test ends. A per-request timeout keeps the loop moving.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/utils/random_ddl_test_runner/syncpoint_diff.go` around lines 225 - 226,
fetchDDLWindows currently calls http.DefaultClient.Do with the long-lived
workload context, which can block syncpoint diffing on a stalled /ddl/history
request. Update the request path in fetchDDLWindows to use a bounded per-request
timeout by creating a request-scoped context or dedicated http.Client with
timeout before the Do call, so the periodic loop can continue even if TiDB
hangs. Keep the change localized to the fetchDDLWindows HTTP request handling.
if lastCheckpoint == 0 {
	lastCheckpoint = st.Checkpoint
	lastAdvance = now
} else if st.Checkpoint != 0 && st.Checkpoint != lastCheckpoint {
	lastCheckpoint = st.Checkpoint
	lastAdvance = now
}
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Don’t reset checkpoint progress while checkpoint is zero.
When st.Checkpoint remains 0, lastCheckpoint == 0 stays true and lastAdvance is refreshed every health tick, so NoAdvanceHard never triggers. Only update progress when a non-zero checkpoint changes.
Proposed fix
- if lastCheckpoint == 0 {
- lastCheckpoint = st.Checkpoint
- lastAdvance = now
- } else if st.Checkpoint != 0 && st.Checkpoint != lastCheckpoint {
+ if st.Checkpoint != 0 && st.Checkpoint != lastCheckpoint {
lastCheckpoint = st.Checkpoint
lastAdvance = now
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/utils/random_ddl_test_runner/workload.go` around lines 260 - 266, In
the health tracking logic around lastCheckpoint and lastAdvance, avoid treating
repeated zero checkpoints as progress. Update the progress timestamp only when
st.Checkpoint is non-zero and differs from the previous checkpoint, so the
initial zero state does not keep resetting lastAdvance and prevent NoAdvanceHard
from firing.
What problem does this PR solve?
Issue Number: close #5543
This PR adds weekly random DDL+DML integration coverage and fixes issues found while running it. The runner also needs to avoid false failures when random DML payload strings contain words such as `panic` or `fatal`.
What is changed and how it works?
`[PANIC]`, `[FATAL]`, `panic:`, `fatal error:`, and `WARNING: DATA RACE`, while keeping custom substring patterns supported.
Check List
Tests
Manual checks run:
`PATH=/usr/local/go/bin:$PATH GOTOOLCHAIN=auto go test ./tests/utils/random_ddl_test_runner -count=1`
`bash -n` for weekly random DDL scripts and `_utils/test_prepare`
`/tmp/tidb_cdc_test/weekly_rand_single` logs with the new shell rule.
`make integration_test_weekly_rand_ddl_mysql`; workload/converge/syncpoint diff passed, then old log scan rule failed on random payload false positives.
Questions
Will it cause performance regression or break compatibility?
No. Runtime code only expands event-service scan progress when pending syncpoints exist; the remaining changes are test tooling and integration cases.
Do you need to update user documentation, design documentation or monitoring documentation?
No.
Release note
Summary by CodeRabbit
New Features
Bug Fixes
Tests