[DX-3764] [CRE-3579] [CRE-3578] [CRE-3577] diagnose cmd + Skill Improvements + Fix Flaky Tests#22368
[DX-3764] [CRE-3579] [CRE-3578] [CRE-3577] diagnose cmd + Skill Improvements + Fix Flaky Tests#22368kalverra wants to merge 18 commits into
diagnose cmd + Skill Improvements + Fix Flaky Tests#22368Conversation
|
👋 kalverra, thanks for creating this pull request! To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team. Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks! |
CORA - Pending Reviewers
Legend: ✅ Approved | ❌ Changes Requested | 💬 Commented | 🚫 Dismissed | ⏳ Pending | ❓ Unknown For more details, see the full review summary. |
|
I see you updated files related to
|
|
✅ No conflicts with other open PRs targeting |
There was a problem hiding this comment.
Pull request overview
Risk Rating: MEDIUM
This PR improves the tools/test diagnose harness UX and performance by speeding up the ephemeral Postgres instance, enhancing progress/output behavior, and enriching the analyze/report pipeline (build-failure detection, slow reporting, runtime estimates, and metadata capture).
Changes:
- Speed up test Postgres containers (tuned settings + tmpfs) and persist additional run metadata (e.g., Postgres version, has DB).
- Improve diagnose runner output: better live-progress coordination, “analyzing” live timer, longest-possible runtime estimate, and stop-on-build-failure behavior.
- Expand analyze/reporting: detect build failures from
go test -json, adjust slow reporting to include top packages, and tweak summary/overall formatting.
Scrupulous human review recommended (high-impact logic):
tools/test/internal/runner/analyze.go: slow/top-packages merging and its interaction with summary metrics/CSV output.tools/test/internal/runner/runner.go: new fail-fast-on-build-failure behavior and new AI-output markers (lpr_s:*,bf_stop ...).tools/test/internal/db/db.go: Postgres container tuning (durability disabled, tmpfs) and any CI/platform implications.
Reviewed changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/test/internal/runner/runner.go | Diagnose runner: build-failure stop, runtime estimate output, improved progress/analyzing behavior, run metadata updates. |
| tools/test/internal/runner/runner_test.go | Updated/added unit tests for analyzing progress, timeout parsing, runtime estimates, build-failure stop, serial progress mutex. |
| tools/test/internal/runner/diagnose_results_dir.go | Include -run pattern in diagnose results directory slug. |
| tools/test/internal/runner/diagnose_progress.go | More robust package-pattern detection when flags appear after packages; progress time clamping cleanup. |
| tools/test/internal/runner/diagnose_progress_test.go | Added regression tests for progress-line vs digest-line merging behavior. |
| tools/test/internal/runner/analyze.go | Build failure signals, slow-report restructuring (top packages), summary/overall formatting changes, new run meta fields. |
| tools/test/internal/runner/analyze_test.go | Added coverage for build-failure detection and severity color output in summary. |
| tools/test/internal/repo/repo.go | Use strings.SplitSeq iteration for go.mod parsing. |
| tools/test/internal/output/output.go | Add NewForTest helper to control live-inline behavior in unit tests. |
| tools/test/internal/output/output_test.go | Add test coverage for NewForTest live-inline behavior and AI-output interaction. |
| tools/test/internal/db/db.go | Speed up test Postgres container via config knobs and tmpfs. |
| tools/test/internal/config/config.go | Use strings.SplitSeq for fail-fast-on parsing. |
| tools/test/.agents/skills/chainlink-test-diagnosis/SKILL.md | Update agent skill guidance and references layout. |
| tools/test/.agents/skills/chainlink-test-diagnosis/references/flaky-patterns/filter.md | Add reference doc for a common flaky filter/logpoller pattern. |
| tools/test/.agents/skills/chainlink-test-diagnosis/eval/real-fix-shas.json | Add eval metadata file mapping PR SHA(s) to real fixes. |
| core/services/ocr2/plugins/ocr2keeper/integration_test.go | Switch pointer helpers to new(...) usage for config fields. |
| core/services/ocr2/plugins/ocr2keeper/evmregistry/v21/logprovider/integration_test.go | Switch pointer helpers to new(...) usage for config fields. |
| core/services/ocr2/plugins/mercury/plugin_test.go | Switch pointer helpers to new(...) usage in test fixtures. |
| core/services/ocr2/plugins/mercury/helpers_test.go | Switch pointer helpers to new(...) usage; keep local ptr helper for remaining cases. |
| core/services/ocr2/plugins/llo/onchain_channel_definition_cache_integration_test.go | Use maps.Copy for merging definitions; switch SHA3 import usage. |
| core/services/ocr2/plugins/llo/integration_test.go | Misc test cleanups; config pointer creation updates; remove skipped subtest in favor of commented block. |
| core/services/ocr2/plugins/llo/helpers_test.go | Listener creation with net.ListenConfig; test assertions changed to avoid hard-failing HTTP handler after read errors; pointer helper removal. |
| core/services/ocr2/plugins/llo/config/config.go | Simplify validation control flow for channel definitions vs contract address. |
| core/services/ocr2/plugins/llo/config/config_test.go | Tighten error assertions with require.EqualError. |
| .gitignore | Ignore diagnose-attempted-fixes-*.jsonl. |
…postgresTestImprovements
diagnose Improvements + Fix Flaky Testsdiagnosecmd + Skill Improvements + Fix Flaky Tests
diagnosecmd + Skill Improvements + Fix Flaky Testsdiagnose cmd + Skill Improvements + Fix Flaky Tests
| postgres.WithUsername("postgres"), | ||
| postgres.WithPassword("postgres"), | ||
| testcontainers.WithCmdArgs("-c", "max_connections=1000"), | ||
| testcontainers.WithCmdArgs( |
There was a problem hiding this comment.
Should this maintain parity with the one we use in our unit tests? Otherwise, we may see different errors using the diagnose tool and what we see in practice.
And if these flags reduce flakes, then perhaps they should be present for the instance running with our unit tests?
There was a problem hiding this comment.
Agreed we want to keep parity. Made this PR to help us do so: smartcontractkit/.github#1549
|




Improve
fix-chainlink-testsSkillchainlink-diagnose-teststofix-chainlink-tests, better aligned with best practices.diagnoseloop.fix-attempt-*.jsonlfile to preserve past attempts and reduce context compaction consequences.Faster (kinda not really) Postgres Tests
Tuned our test postgres instances to discard some production protections that don't help us, and only slow down tests. This wasn't very effective at speeding up tests, but it also didn't hurt.
Leftis before the changes,Rightis after.If anything, this indicates to me that our bottleneck for test speed IS NOT the postgres instance.
Fix Flaky Tests
Show off the skills
Fix flaky tests in
core/services/workflows/syncer/packageTests in this package were using hardcoded DB keys, meaning they would stomp on each other all the time. The
diagnoseruns found these flake rates:Trunk.io has these tests and tickets listed for this package:
24% flaky| CRE-357724% flaky| CRE-357818% flaky| CRE-3579diagnoseruns with--iterations 100after the fixes show a0%flake rate!Review
There are a lot of linting fixes in this PR that we did as part of running experiments that can largely be ignored for reviews. Focus on changes in:
tools/test/: Updates to thediagnoseharness and the AI skill.