[DX-3764] [CRE-3579] [CRE-3578] [CRE-3577] diagnose cmd + Skill Improvements + Fix Flaky Tests #22368

Open
kalverra wants to merge 18 commits into develop from postgresTestImprovements

@kalverra
Collaborator

@kalverra kalverra commented May 10, 2026

Improve fix-chainlink-tests Skill

Faster (kinda not really) Postgres Tests

Tuned our test Postgres instances to drop some production protections that don't help us and only slow down tests. This wasn't very effective at speeding up tests, but it also didn't hurt. Left is before the changes, right is after.

image

If anything, this indicates to me that our bottleneck for test speed IS NOT the postgres instance.
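
To make the tuning above concrete, here is a minimal sketch of how durability-related settings could be turned into `-c key=value` arguments for the postgres binary. The setting names (`fsync`, `synchronous_commit`, `full_page_writes`) are standard Postgres options, but the exact set this PR applies is not shown in this description, so both the values and the helper name `postgresCmdArgs` are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"sort"
)

// postgresCmdArgs is a hypothetical helper (not from this PR). It turns a map
// of Postgres settings into the "-c key=value" pairs accepted by the postgres
// binary. Keys are sorted so the resulting command line is deterministic.
func postgresCmdArgs(settings map[string]string) []string {
	keys := make([]string, 0, len(settings))
	for k := range settings {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	args := make([]string, 0, 2*len(keys))
	for _, k := range keys {
		args = append(args, "-c", k+"="+settings[k])
	}
	return args
}

func main() {
	// Assumed example settings: they trade crash safety for speed, which is
	// only acceptable for throwaway test databases.
	args := postgresCmdArgs(map[string]string{
		"fsync":              "off",
		"synchronous_commit": "off",
		"full_page_writes":   "off",
	})
	fmt.Println(args)
}
```

A slice like this could presumably be passed on to a container option such as `testcontainers.WithCmdArgs(args...)`, which is the option visible in the review thread on tools/test/internal/db/db.go.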

Fix Flaky Tests

Show off the skills

Fix flaky tests in core/services/workflows/syncer/ package

Tests in this package were using hardcoded DB keys, meaning they would stomp on each other all the time. The diagnose runs found these flake rates:

Flaky (8)
- github.com/smartcontractkit/chainlink/v2/core/services/workflows/
|-- syncer/ (3/25) 12.0%
|---- Test_workflowDeletedHandler (1/25) 4.0%
|---- Test_workflowDeletedHandler/success_deleting_existing_engine_and_spec (1/25) 4.0%
|---- Test_workflowPausedActivatedUpdatedHandler (2/25) 8.0%
|---- Test_workflowPausedActivatedUpdatedHandler/success_pausing_activating_and_updating_existing_engine_and_spec (2/25) 8.0%
|---- Test_workflowRegisteredHandler (3/25) 12.0%
|---- Test_workflowRegisteredHandler/correctly_generates_the_workflow_name (1/25) 4.0%
|---- Test_workflowRegisteredHandler/success_with_active_workflow_registered (2/25) 8.0%
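
One common way to avoid the hardcoded-key collisions described above is to give each test its own randomly generated key. The sketch below is not the fix from this PR (the actual changes live in the syncer package and are not shown here); `uniqueKey` is a hypothetical helper name.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// uniqueKey is a hypothetical helper (not from this PR). It returns the prefix,
// a dash, and 16 random hex characters, so that tests sharing one database do
// not read or write the same rows.
func uniqueKey(prefix string) string {
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		panic(err) // crypto/rand failing is not recoverable in a test helper
	}
	return prefix + "-" + hex.EncodeToString(buf)
}

func main() {
	a := uniqueKey("workflow")
	b := uniqueKey("workflow")
	fmt.Println(a, b, a != b)
}
```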

Trunk.io has these tests and tickets listed for this package:

diagnose runs with --iterations 100 after the fixes show a 0% flake rate!
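
For reference, the percentages in the report are just failures divided by iterations. A minimal sketch of that arithmetic, using hypothetical helper names (`flakeRate`, `formatFlake`) rather than anything from the diagnose tool itself:

```go
package main

import "fmt"

// flakeRate returns the percentage of runs that failed.
func flakeRate(failures, runs int) float64 {
	if runs == 0 {
		return 0 // avoid dividing by zero when nothing ran
	}
	return 100 * float64(failures) / float64(runs)
}

// formatFlake renders a line in the "(failures/runs) rate%" style of the
// report quoted in this PR description.
func formatFlake(name string, failures, runs int) string {
	return fmt.Sprintf("%s (%d/%d) %.1f%%", name, failures, runs, flakeRate(failures, runs))
}

func main() {
	fmt.Println(formatFlake("syncer/", 3, 25)) // prints "syncer/ (3/25) 12.0%"
}
```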

Review

This PR contains a lot of linting fixes made while running experiments; these can largely be ignored during review. Focus on changes in:

Copilot AI review requested due to automatic review settings May 10, 2026 01:24
@kalverra kalverra requested review from a team as code owners May 10, 2026 01:24
@github-actions
Contributor

👋 kalverra, thanks for creating this pull request!

To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team.

Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks!

@github-actions
Contributor

github-actions Bot commented May 10, 2026

CORA - Pending Reviewers

| Codeowners Entry | Overall | Num Files | Owners |
| --- | --- | --- | --- |
| * | 💬 | 1 | @smartcontractkit/foundations, @smartcontractkit/core |
| /core/services/ocr*/ | 💬 | 7 | @smartcontractkit/foundations, @smartcontractkit/core |
| /core/services/ocr2/plugins/ocr2keeper/ | 💬 | 2 | @smartcontractkit/dev-services |
| /core/services/workflows/ | 🚫 | 13 | @smartcontractkit/keystone |
| .tool-versions | 💬 | 1 | @smartcontractkit/core |

Legend: ✅ Approved | ❌ Changes Requested | 💬 Commented | 🚫 Dismissed | ⏳ Pending | ❓ Unknown

For more details, see the full review summary.

@github-actions
Contributor

I see you updated files related to core. Please run make gocs in the root directory to add a changeset, and include at least one of the following tags in the changeset text:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

@github-actions
Contributor

github-actions Bot commented May 10, 2026

✅ No conflicts with other open PRs targeting develop

Contributor

Copilot AI left a comment

Pull request overview

Risk Rating: MEDIUM

This PR improves the tools/test diagnose harness UX and performance by speeding up the ephemeral Postgres instance, enhancing progress/output behavior, and enriching the analyze/report pipeline (build-failure detection, slow reporting, runtime estimates, and metadata capture).

Changes:

  • Speed up test Postgres containers (tuned settings + tmpfs) and persist additional run metadata (e.g., Postgres version, has DB).
  • Improve diagnose runner output: better live-progress coordination, “analyzing” live timer, longest-possible runtime estimate, and stop-on-build-failure behavior.
  • Expand analyze/reporting: detect build failures from go test -json, adjust slow reporting to include top packages, and tweak summary/overall formatting.
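
As an illustration of the build-failure detection mentioned in the last bullet, here is a minimal sketch that scans `go test -json` output for packages that failed to build. It relies on the `[build failed]` marker that `go test` prints for such packages. This is a heuristic written for this example; the actual logic in tools/test/internal/runner/analyze.go is not shown in this conversation, and `buildFailedPackages` is a hypothetical name.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// testEvent holds the subset of `go test -json` event fields this sketch uses.
type testEvent struct {
	Action  string
	Package string
	Output  string
}

// buildFailedPackages is a hypothetical helper (not from this PR). It returns
// each package whose output contains the "[build failed]" marker, in order of
// first appearance. Lines that are not valid JSON are skipped.
func buildFailedPackages(jsonLines string) []string {
	seen := map[string]bool{}
	var pkgs []string
	sc := bufio.NewScanner(strings.NewReader(jsonLines))
	for sc.Scan() {
		var ev testEvent
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			continue
		}
		if ev.Package == "" || seen[ev.Package] {
			continue
		}
		if ev.Action == "output" && strings.Contains(ev.Output, "[build failed]") {
			seen[ev.Package] = true
			pkgs = append(pkgs, ev.Package)
		}
	}
	return pkgs
}

func main() {
	// Made-up sample input with a made-up package path.
	sample := `{"Action":"output","Package":"example.com/a","Output":"FAIL\texample.com/a [build failed]\n"}
{"Action":"pass","Package":"example.com/b"}`
	fmt.Println(buildFailedPackages(sample))
}
```

Note that `bufio.Scanner` has a default line-length limit, so a real implementation handling very long output lines would likely need a larger buffer or a `json.Decoder`.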

Scrupulous human review recommended (high-impact logic):

  • tools/test/internal/runner/analyze.go: slow/top-packages merging and its interaction with summary metrics/CSV output.
  • tools/test/internal/runner/runner.go: new fail-fast-on-build-failure behavior and new AI-output markers (lpr_s:*, bf_stop ...).
  • tools/test/internal/db/db.go: Postgres container tuning (durability disabled, tmpfs) and any CI/platform implications.

Reviewed changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tools/test/internal/runner/runner.go | Diagnose runner: build-failure stop, runtime estimate output, improved progress/analyzing behavior, run metadata updates. |
| tools/test/internal/runner/runner_test.go | Updated/added unit tests for analyzing progress, timeout parsing, runtime estimates, build-failure stop, serial progress mutex. |
| tools/test/internal/runner/diagnose_results_dir.go | Include -run pattern in diagnose results directory slug. |
| tools/test/internal/runner/diagnose_progress.go | More robust package-pattern detection when flags appear after packages; progress time clamping cleanup. |
| tools/test/internal/runner/diagnose_progress_test.go | Added regression tests for progress-line vs digest-line merging behavior. |
| tools/test/internal/runner/analyze.go | Build failure signals, slow-report restructuring (top packages), summary/overall formatting changes, new run meta fields. |
| tools/test/internal/runner/analyze_test.go | Added coverage for build-failure detection and severity color output in summary. |
| tools/test/internal/repo/repo.go | Use strings.SplitSeq iteration for go.mod parsing. |
| tools/test/internal/output/output.go | Add NewForTest helper to control live-inline behavior in unit tests. |
| tools/test/internal/output/output_test.go | Add test coverage for NewForTest live-inline behavior and AI-output interaction. |
| tools/test/internal/db/db.go | Speed up test Postgres container via config knobs and tmpfs. |
| tools/test/internal/config/config.go | Use strings.SplitSeq for fail-fast-on parsing. |
| tools/test/.agents/skills/chainlink-test-diagnosis/SKILL.md | Update agent skill guidance and references layout. |
| tools/test/.agents/skills/chainlink-test-diagnosis/references/flaky-patterns/filter.md | Add reference doc for a common flaky filter/logpoller pattern. |
| tools/test/.agents/skills/chainlink-test-diagnosis/eval/real-fix-shas.json | Add eval metadata file mapping PR SHA(s) to real fixes. |
| core/services/ocr2/plugins/ocr2keeper/integration_test.go | Switch pointer helpers to new(...) usage for config fields. |
| core/services/ocr2/plugins/ocr2keeper/evmregistry/v21/logprovider/integration_test.go | Switch pointer helpers to new(...) usage for config fields. |
| core/services/ocr2/plugins/mercury/plugin_test.go | Switch pointer helpers to new(...) usage in test fixtures. |
| core/services/ocr2/plugins/mercury/helpers_test.go | Switch pointer helpers to new(...) usage; keep local ptr helper for remaining cases. |
| core/services/ocr2/plugins/llo/onchain_channel_definition_cache_integration_test.go | Use maps.Copy for merging definitions; switch SHA3 import usage. |
| core/services/ocr2/plugins/llo/integration_test.go | Misc test cleanups; config pointer creation updates; remove skipped subtest in favor of commented block. |
| core/services/ocr2/plugins/llo/helpers_test.go | Listener creation with net.ListenConfig; test assertions changed to avoid hard-failing HTTP handler after read errors; pointer helper removal. |
| core/services/ocr2/plugins/llo/config/config.go | Simplify validation control flow for channel definitions vs contract address. |
| core/services/ocr2/plugins/llo/config/config_test.go | Tighten error assertions with require.EqualError. |
| .gitignore | Ignore diagnose-attempted-fixes-*.jsonl. |

Comment thread tools/test/internal/runner/analyze.go Outdated
Comment thread tools/test/internal/runner/analyze.go
Comment thread tools/test/internal/runner/analyze.go Outdated
@trunk-io

trunk-io Bot commented May 10, 2026

@kalverra kalverra requested a review from a team as a code owner May 11, 2026 03:36
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 39 out of 41 changed files in this pull request and generated 3 comments.

Comment thread tools/test/internal/runner/runner.go Outdated
Comment thread tools/test/internal/runner/analyze.go
Comment thread tools/test/internal/runner/analyze.go
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 39 out of 41 changed files in this pull request and generated 3 comments.

Comment thread tools/test/internal/runner/runner.go
Comment thread tools/test/internal/runner/analyze.go
Comment thread core/services/ocr2/plugins/llo/integration_test.go Outdated
@kalverra kalverra changed the title from "[DX-3764] [CRE-3579] [CRE-3578] [CRE-3577] Postgres diagnose Improvements + Fix Flaky Tests" to "[DX-3764] [CRE-3579] [CRE-3578] [CRE-3577] diagnosecmd + Skill Improvements + Fix Flaky Tests" May 11, 2026
Comment thread tools/test/internal/db/db.go
Comment thread tools/test/internal/db/db.go
@kalverra kalverra enabled auto-merge May 11, 2026 14:39
@kalverra kalverra requested a review from Tofel May 11, 2026 15:11
Comment thread tools/test/internal/runner/analyze.go Outdated
Comment thread core/services/ocr2/plugins/llo/integration_test.go Outdated
Comment thread core/services/ocr2/plugins/llo/integration_test.go Outdated
@kalverra kalverra changed the title from "[DX-3764] [CRE-3579] [CRE-3578] [CRE-3577] diagnosecmd + Skill Improvements + Fix Flaky Tests" to "[DX-3764] [CRE-3579] [CRE-3578] [CRE-3577] diagnose cmd + Skill Improvements + Fix Flaky Tests" May 11, 2026
@kalverra kalverra requested a review from jmank88 May 11, 2026 17:21
jmank88 previously approved these changes May 11, 2026
postgres.WithUsername("postgres"),
postgres.WithPassword("postgres"),
testcontainers.WithCmdArgs("-c", "max_connections=1000"),
testcontainers.WithCmdArgs(
Collaborator

Should this maintain parity with the one we use in our unit tests? Otherwise, we may see different errors with the diagnose tool than we see in practice.

And if these flags reduce flakes, then perhaps they should be present for the instance running with our unit tests?

Collaborator Author

Agreed we want to keep parity. Made this PR to help us do so: smartcontractkit/.github#1549

Comment thread tools/test/internal/runner/runner.go Outdated