[DX-3764] [CRE-3579] [CRE-3578] [CRE-3577] `diagnose` cmd + Skill Improvements + Fix Flaky Tests by kalverra · Pull Request #22368 · smartcontractkit/chainlink

kalverra · 2026-05-10T01:24:00Z

Improve `fix-chainlink-tests` Skill

Rename the skill from `chainlink-diagnose-tests` to `fix-chainlink-tests`, which is better aligned with best practices.
Fix some edge cases and counting bugs in the diagnose loop.
Add memory for fix attempts in the form of a `fix-attempt-*.jsonl` file, which preserves past attempts and limits what is lost to context compaction.
Speed up the test Postgres instance: https://github.com/smartcontractkit/chainlink/compare/postgresTestImprovements?expand=1#diff-90715ad3f71e06e08c4314d35148b9574ac20c98ff16c2db9afeefcd890dc035R107-R108
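
For readers unfamiliar with the JSONL memory idea, here is a minimal sketch of an append-only attempts log. The `FixAttempt` fields, the function names, and the example file name are all hypothetical; the skill's real record shape is not shown in this PR description.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// FixAttempt is a hypothetical record shape, used only for illustration.
type FixAttempt struct {
	Test       string `json:"test"`
	Hypothesis string `json:"hypothesis"`
	Outcome    string `json:"outcome"`
}

// appendAttempt writes one JSON object per line, so earlier attempts are never rewritten.
func appendAttempt(path string, a FixAttempt) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	return json.NewEncoder(f).Encode(a) // Encode adds the trailing newline
}

// readAttempts loads every recorded attempt back, in the order it was written.
func readAttempts(path string) ([]FixAttempt, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var out []FixAttempt
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if len(sc.Bytes()) == 0 {
			continue // tolerate blank lines
		}
		var a FixAttempt
		if err := json.Unmarshal(sc.Bytes(), &a); err != nil {
			return nil, err
		}
		out = append(out, a)
	}
	return out, sc.Err()
}

func main() {
	path := filepath.Join(os.TempDir(), "fix-attempt-example.jsonl")
	os.Remove(path) // start clean so reruns behave the same
	defer os.Remove(path)

	attempts := []FixAttempt{
		{Test: "Test_workflowDeletedHandler", Hypothesis: "increase timeout", Outcome: "still flaky"},
		{Test: "Test_workflowDeletedHandler", Hypothesis: "hardcoded DB key collides", Outcome: "fixed"},
	}
	for _, a := range attempts {
		if err := appendAttempt(path, a); err != nil {
			fmt.Println("append failed:", err)
			return
		}
	}
	got, err := readAttempts(path)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	for _, a := range got {
		fmt.Printf("%s: %s -> %s\n", a.Test, a.Hypothesis, a.Outcome)
	}
}
```

Because the file is only ever appended to, an agent that has lost earlier context can re-read it and avoid retrying a fix that already failed.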

Faster (kinda not really) Postgres Tests

Tuned our test Postgres instances to drop some production protections that don't help us and only slow down tests. This wasn't very effective at speeding up tests, but it also didn't hurt. Left is before the changes, right is after.

If anything, this indicates to me that our bottleneck for test speed IS NOT the postgres instance.

Fix Flaky Tests

Show off the skills

Remove unnecessary subtest that was giving confusing test count results.

Fix flaky tests in `core/services/workflows/syncer/` package

Tests in this package were using hardcoded DB keys, meaning they would stomp on each other all the time. The diagnose runs found these flake rates:

Flaky (8)
- github.com/smartcontractkit/chainlink/v2/core/services/workflows/
|-- syncer/ (3/25) 12.0%
|---- Test_workflowDeletedHandler (1/25) 4.0%
|---- Test_workflowDeletedHandler/success_deleting_existing_engine_and_spec (1/25) 4.0%
|---- Test_workflowPausedActivatedUpdatedHandler (2/25) 8.0%
|---- Test_workflowPausedActivatedUpdatedHandler/success_pausing_activating_and_updating_existing_engine_and_spec (2/25) 8.0%
|---- Test_workflowRegisteredHandler (3/25) 12.0%
|---- Test_workflowRegisteredHandler/correctly_generates_the_workflow_name (1/25) 4.0%
|---- Test_workflowRegisteredHandler/success_with_active_workflow_registered (2/25) 8.0%

Trunk.io has these tests and tickets listed for this package:

Test_workflowDeletedHandler 24% flaky | CRE-3577
Test_workflowDeletedHandler/success_deleting_existing_engine_and_spec 24% flaky | CRE-3578
Test_StratReconciliation_RetriesWithBackoff 18% flaky | CRE-3579

`diagnose` runs with `--iterations 100` after the fixes show a 0% flake rate!
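
To illustrate the hardcoded-key problem described above, here is a minimal sketch of one common remedy: deriving a unique key per test. The `uniqueKey` helper and the map standing in for a shared table are assumptions for illustration; the actual changes live in `handler_test.go` and are not reproduced here.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// uniqueKey returns prefix plus 16 random hex characters, so two tests
// sharing a database are extremely unlikely to pick the same key.
// Hypothetical helper, not taken from the PR.
func uniqueKey(prefix string) string {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return prefix + "-" + hex.EncodeToString(b)
}

func main() {
	// A map stands in for a table shared by tests running in parallel.
	shared := map[string]string{}

	// Hardcoded key: the second write silently replaces the first.
	shared["workflow-1"] = "written by test A"
	shared["workflow-1"] = "written by test B"

	// Unique keys: each test only ever touches its own row.
	shared[uniqueKey("workflow")] = "written by test A"
	shared[uniqueKey("workflow")] = "written by test B"

	fmt.Println(len(shared)) // one hardcoded row plus two unique rows
}
```

In a real test, the key could also incorporate `t.Name()` so that leftover rows are easy to trace back to the test that wrote them.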

Review

This PR contains a lot of linting fixes, made while running experiments, that can largely be ignored during review. Focus on changes in:

tools/test/: Updates to the diagnose harness and the AI skill.
core/services/workflows/syncer/v2/handler_test.go: Flaky test fixes

github-actions · 2026-05-10T01:24:17Z

👋 kalverra, thanks for creating this pull request!

To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team.

Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks!

github-actions · 2026-05-10T01:24:31Z

CORA - Pending Reviewers

Codeowners Entry	Overall	Num Files	Owners
`*`	💬	1	@smartcontractkit/foundations, @smartcontractkit/core
`/core/services/ocr*/`	💬	7	@smartcontractkit/foundations, @smartcontractkit/core
`/core/services/ocr2/plugins/ocr2keeper/`	💬	2	@smartcontractkit/dev-services
`/core/services/workflows/`	🚫	13	@smartcontractkit/keystone
`.tool-versions`	💬	1	@smartcontractkit/core

For more details, see the full review summary.

github-actions · 2026-05-10T01:25:10Z

I see you updated files related to core. Please run `make gocs` in the root directory to add a changeset, and include at least one of the following tags in its text:

#added For any new functionality added.
#breaking_change For any functionality that requires manual action for the node to boot.
#bugfix For bug fixes.
#changed For any change to the existing functionality.
#db_update For any feature that introduces updates to database schema.
#deprecation_notice For any upcoming deprecation functionality.
#internal For changesets that need to be excluded from the final changelog.
#nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
#removed For any functionality/config that is removed.
#updated For any functionality that is updated.
#wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

github-actions · 2026-05-10T01:25:28Z

✅ No conflicts with other open PRs targeting develop

Copilot

Pull request overview

Risk Rating: MEDIUM

This PR improves the tools/test diagnose harness UX and performance by speeding up the ephemeral Postgres instance, enhancing progress/output behavior, and enriching the analyze/report pipeline (build-failure detection, slow reporting, runtime estimates, and metadata capture).

Changes:

Speed up test Postgres containers (tuned settings + tmpfs) and persist additional run metadata (e.g., Postgres version, has DB).
Improve diagnose runner output: better live-progress coordination, “analyzing” live timer, longest-possible runtime estimate, and stop-on-build-failure behavior.
Expand analyze/reporting: detect build failures from go test -json, adjust slow reporting to include top packages, and tweak summary/overall formatting.
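
As a rough illustration of the build-failure detection mentioned above, here is a sketch that scans a `go test -json` stream for packages that failed to compile. It is not the code in `analyze.go`; the `buildFailures` name and the exact signals checked are assumptions. Newer Go toolchains emit a `build-fail` action and a `FailedBuild` field, while older ones only print a `[build failed]` output line, so the sketch checks all three.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// testEvent holds the subset of `go test -json` event fields used here.
type testEvent struct {
	Action      string
	Package     string
	ImportPath  string
	Test        string
	Output      string
	FailedBuild string
}

// buildFailures returns the packages whose build failed, in first-seen order.
// Hypothetical helper; the harness's real detection logic may differ.
func buildFailures(stream string) ([]string, error) {
	seen := map[string]bool{}
	var pkgs []string
	sc := bufio.NewScanner(strings.NewReader(stream))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long output lines
	for sc.Scan() {
		line := sc.Bytes()
		if len(line) == 0 || line[0] != '{' {
			continue // skip anything that is not a JSON event
		}
		var ev testEvent
		if err := json.Unmarshal(line, &ev); err != nil {
			return nil, err
		}
		pkg := ""
		switch {
		case ev.Action == "build-fail":
			pkg = ev.ImportPath // build events identify the package by import path
		case ev.Action == "fail" && ev.Test == "" && ev.FailedBuild != "":
			pkg = ev.Package
		case ev.Action == "output" && ev.Test == "" && strings.Contains(ev.Output, "[build failed]"):
			pkg = ev.Package
		}
		if pkg != "" && !seen[pkg] {
			seen[pkg] = true
			pkgs = append(pkgs, pkg)
		}
	}
	return pkgs, sc.Err()
}

func main() {
	stream := `{"Action":"output","Package":"example.com/a","Output":"FAIL\texample.com/a [build failed]\n"}
{"Action":"fail","Package":"example.com/a"}
{"Action":"pass","Package":"example.com/b"}
`
	pkgs, err := buildFailures(stream)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(pkgs)
}
```

Separating build failures from test failures matters for a flake harness: a package that does not compile fails on every iteration, so rerunning it tells you nothing about flakiness.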

Scrupulous human review recommended (high-impact logic):

tools/test/internal/runner/analyze.go: slow/top-packages merging and its interaction with summary metrics/CSV output.
tools/test/internal/runner/runner.go: new fail-fast-on-build-failure behavior and new AI-output markers (lpr_s:*, bf_stop ...).
tools/test/internal/db/db.go: Postgres container tuning (durability disabled, tmpfs) and any CI/platform implications.
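
For context on what "durability disabled" can look like, here is a sketch that turns a set of Postgres settings into `-c key=value` server arguments. The three durability parameters are real Postgres settings, but whether `db.go` uses exactly these values is an assumption; only `max_connections=1000` appears in the diff discussed in this PR. The `postgresCmdArgs` helper is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// fastTestSettings trades crash safety for speed. That is acceptable only
// because the test database is thrown away after each run.
// The durability values are assumed, not copied from the PR.
var fastTestSettings = map[string]string{
	"fsync":              "off",  // do not force WAL writes to disk
	"synchronous_commit": "off",  // do not wait for WAL flush on commit
	"full_page_writes":   "off",  // skip full-page images after checkpoints
	"max_connections":    "1000", // present in the original container config
}

// postgresCmdArgs flattens settings into "-c", "key=value" pairs.
// Keys are sorted so the resulting command line is deterministic.
func postgresCmdArgs(settings map[string]string) []string {
	keys := make([]string, 0, len(settings))
	for k := range settings {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	args := make([]string, 0, 2*len(keys))
	for _, k := range keys {
		args = append(args, "-c", k+"="+settings[k])
	}
	return args
}

func main() {
	fmt.Println(postgresCmdArgs(fastTestSettings))
}
```

Keeping the settings in one map also makes it easier to hold the diagnose container and the CI Postgres instance to the same configuration, which is the parity concern raised in review.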

Reviewed changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 3 comments.

File	Description
tools/test/internal/runner/runner.go	Diagnose runner: build-failure stop, runtime estimate output, improved progress/analyzing behavior, run metadata updates.
tools/test/internal/runner/runner_test.go	Updated/added unit tests for analyzing progress, timeout parsing, runtime estimates, build-failure stop, serial progress mutex.
tools/test/internal/runner/diagnose_results_dir.go	Include `-run` pattern in diagnose results directory slug.
tools/test/internal/runner/diagnose_progress.go	More robust package-pattern detection when flags appear after packages; progress time clamping cleanup.
tools/test/internal/runner/diagnose_progress_test.go	Added regression tests for progress-line vs digest-line merging behavior.
tools/test/internal/runner/analyze.go	Build failure signals, slow-report restructuring (top packages), summary/overall formatting changes, new run meta fields.
tools/test/internal/runner/analyze_test.go	Added coverage for build-failure detection and severity color output in summary.
tools/test/internal/repo/repo.go	Use `strings.SplitSeq` iteration for go.mod parsing.
tools/test/internal/output/output.go	Add `NewForTest` helper to control live-inline behavior in unit tests.
tools/test/internal/output/output_test.go	Add test coverage for `NewForTest` live-inline behavior and AI-output interaction.
tools/test/internal/db/db.go	Speed up test Postgres container via config knobs and tmpfs.
tools/test/internal/config/config.go	Use `strings.SplitSeq` for fail-fast-on parsing.
tools/test/.agents/skills/chainlink-test-diagnosis/SKILL.md	Update agent skill guidance and references layout.
tools/test/.agents/skills/chainlink-test-diagnosis/references/flaky-patterns/filter.md	Add reference doc for a common flaky filter/logpoller pattern.
tools/test/.agents/skills/chainlink-test-diagnosis/eval/real-fix-shas.json	Add eval metadata file mapping PR SHA(s) to real fixes.
core/services/ocr2/plugins/ocr2keeper/integration_test.go	Switch pointer helpers to `new(...)` usage for config fields.
core/services/ocr2/plugins/ocr2keeper/evmregistry/v21/logprovider/integration_test.go	Switch pointer helpers to `new(...)` usage for config fields.
core/services/ocr2/plugins/mercury/plugin_test.go	Switch pointer helpers to `new(...)` usage in test fixtures.
core/services/ocr2/plugins/mercury/helpers_test.go	Switch pointer helpers to `new(...)` usage; keep local `ptr` helper for remaining cases.
core/services/ocr2/plugins/llo/onchain_channel_definition_cache_integration_test.go	Use `maps.Copy` for merging definitions; switch SHA3 import usage.
core/services/ocr2/plugins/llo/integration_test.go	Misc test cleanups; config pointer creation updates; remove skipped subtest in favor of commented block.
core/services/ocr2/plugins/llo/helpers_test.go	Listener creation with `net.ListenConfig`; test assertions changed to avoid hard-failing HTTP handler after read errors; pointer helper removal.
core/services/ocr2/plugins/llo/config/config.go	Simplify validation control flow for channel definitions vs contract address.
core/services/ocr2/plugins/llo/config/config_test.go	Tighten error assertions with `require.EqualError`.
.gitignore	Ignore `diagnose-attempted-fixes-*.jsonl`.

trunk-io · 2026-05-10T01:36:01Z

Copilot

Pull request overview

Copilot reviewed 39 out of 41 changed files in this pull request and generated 3 comments.

erikburt · 2026-05-11T19:21:52Z

 		postgres.WithUsername("postgres"),
 		postgres.WithPassword("postgres"),
-		testcontainers.WithCmdArgs("-c", "max_connections=1000"),
+		testcontainers.WithCmdArgs(


Should this maintain parity with the one we use in our unit tests? Otherwise, we may see different errors using the diagnose tool and what we see in practice.

And if these flags reduce flakes, then perhaps they should be present for the instance running with our unit tests?

Agreed we want to keep parity. Made this PR to help us do so: smartcontractkit/.github#1549

cl-sonarqube-production · 2026-05-12T00:04:04Z

Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.3% Duplication on New Code

See analysis details on SonarQube

kalverra added 5 commits May 7, 2026 20:19

Fix progress bar bug

c5521e2

Merge branch 'develop' of github.com:smartcontractkit/chainlink into …

4570c46

…postgresTestImprovements

Speedy postgres

e114c08

skill improvmenets

1c6acf8

ptr > new

712c090

Copilot AI review requested due to automatic review settings May 10, 2026 01:24

kalverra requested review from a team as code owners May 10, 2026 01:24

product-security-plaid-production Bot requested review from ChrisAmora, leeyikjiun, tvc-robsondebraga and vyzaldysanchez May 10, 2026 01:24

Copilot started reviewing on behalf of kalverra May 10, 2026 01:24

kalverra enabled auto-merge May 10, 2026 01:24

Copilot AI reviewed May 10, 2026

Comment thread tools/test/internal/runner/analyze.go Outdated

Comment thread tools/test/internal/runner/analyze.go

Comment thread tools/test/internal/runner/analyze.go Outdated

Fix syncer tests

ea3e431

kalverra requested a review from a team as a code owner May 11, 2026 03:36

lints

3ef3df2

kalverra requested a review from Copilot May 11, 2026 03:45

Copilot started reviewing on behalf of kalverra May 11, 2026 03:46

Copilot AI reviewed May 11, 2026

Comment thread tools/test/internal/runner/runner.go Outdated

Comment thread tools/test/internal/runner/analyze.go

Comment thread tools/test/internal/runner/analyze.go

Comments

b87712a

Copilot AI reviewed May 11, 2026

Comment thread tools/test/internal/runner/runner.go

Comment thread tools/test/internal/runner/analyze.go

Comment thread core/services/ocr2/plugins/llo/integration_test.go Outdated

Merge branch 'develop' of github.com:smartcontractkit/chainlink into …

524bef5

…postgresTestImprovements

kalverra changed the title ~~[DX-3764] [CRE-3579] [CRE-3578] [CRE-3577] Postgres diagnose Improvements + Fix Flaky Tests~~ [DX-3764] [CRE-3579] [CRE-3578] [CRE-3577] diagnosecmd + Skill Improvements + Fix Flaky Tests May 11, 2026

kalverra added 2 commits May 11, 2026 09:31

guard test counts

e06f4fd

Lint

3e03a41

Tofel reviewed May 11, 2026

Comment thread tools/test/internal/db/db.go

Tofel reviewed May 11, 2026

Comment thread tools/test/internal/db/db.go

kalverra enabled auto-merge May 11, 2026 14:39

kalverra requested a review from Tofel May 11, 2026 15:11

jmank88 reviewed May 11, 2026

Comment thread core/services/ocr2/plugins/llo/onchain_channel_definition_cache_integration_test.go Outdated

jmank88 reviewed May 11, 2026

Comment thread tools/test/internal/runner/analyze.go Outdated

jmank88 reviewed May 11, 2026

Comment thread core/services/ocr2/plugins/llo/integration_test.go Outdated

jmank88 reviewed May 11, 2026

Comment thread core/services/ocr2/plugins/llo/integration_test.go Outdated

Comments

a83c1f2

kalverra changed the title ~~[DX-3764] [CRE-3579] [CRE-3578] [CRE-3577] diagnosecmd + Skill Improvements + Fix Flaky Tests~~ [DX-3764] [CRE-3579] [CRE-3578] [CRE-3577] diagnose cmd + Skill Improvements + Fix Flaky Tests May 11, 2026

kalverra requested a review from jmank88 May 11, 2026 17:21

timothyF95 reviewed May 11, 2026

Comment thread core/services/ocr2/plugins/ocr2keeper/evmregistry/v21/logprovider/integration_test.go

jmank88 previously approved these changes May 11, 2026

erikburt reviewed May 11, 2026

fix too many params

60a5d46

kalverra dismissed jmank88’s stale review via 60a5d46 May 11, 2026 20:46

kalverra added 4 commits May 11, 2026 16:53

Update tool version

47ac947

lint

55bacd2

use same settings in CI postgres

09c1ee1

Merge branch 'develop' of github.com:smartcontractkit/chainlink into …

a1fd7e3

…postgresTestImprovements

kalverra requested review from erikburt, jmank88 and timothyF95 May 12, 2026 00:33

erikburt approved these changes May 12, 2026

6 participants
