Skip to content

[Test] Improve debuggability, stability and coverage of test_dcv_configuration and test_dcv_remote_access#7322

Open
gmarciani wants to merge 1 commit intoaws:developfrom
gmarciani:wip/mgiacomo/3160/test-dcv-0325-1
Open

[Test] Improve debuggability, stability and coverage of test_dcv_configuration and test_dcv_remote_access#7322
gmarciani wants to merge 1 commit intoaws:developfrom
gmarciani:wip/mgiacomo/3160/test-dcv-0325-1

Conversation

@gmarciani
Copy link
Copy Markdown
Contributor

@gmarciani gmarciani commented Mar 30, 2026

Description of changes

Improve stability, debuggability and coverage of test_dcv_configuration and test_dcv_remote_access

  • debuggability: retrieve, print and analyze a comprehensive report of crashes (not only the crash filename, but the stack trace of the crash). Also, moved from hard assertions to soft assertions to have a final report of all the observed failures.
  • stability: prevent false positive failures, by ignoring harmless crashes related to gnome, unrelated to nvidia or dcv. Also fixed a gap that was causing failures when multiple instances of this test are executed in parallel by serializing the modifications to ssh known_hosts.
  • coverage: the test is now able to detect crashes on all supported OSs, not only Ubuntu.

Tests

  1. test_dcv_configuration and test_dcv_remote_access executed on all OSs on both g5g.2xlarge and c5.xlarge. The test now fails only for relevant crashes reporting detailed information about those. The only relevant crash detected is related to an issue with DCV, so it is expected to report it.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@gmarciani gmarciani force-pushed the wip/mgiacomo/3160/test-dcv-0325-1 branch from 9816039 to 8add2d0 Compare March 30, 2026 22:32
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.08%. Comparing base (0a83ff2) to head (8add2d0).
⚠️ Report is 6 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #7322   +/-   ##
========================================
  Coverage    90.08%   90.08%           
========================================
  Files          182      182           
  Lines        16730    16730           
========================================
  Hits         15071    15071           
  Misses        1659     1659           
Flag Coverage Δ
unittests 90.08% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@gmarciani gmarciani changed the title DRAFT [Test] Make test_dcv_configuration able to print out the conten… DRAFT [Test] Improve DCV test crash file detection with cross-OS support Mar 31, 2026
@gmarciani gmarciani added skip-changelog-update Disables the check that enforces changelog updates in PRs 3.x Test labels Mar 31, 2026
@gmarciani gmarciani force-pushed the wip/mgiacomo/3160/test-dcv-0325-1 branch 2 times, most recently from 24f0bcb to 301e69a Compare March 31, 2026 16:29
@gmarciani gmarciani changed the title DRAFT [Test] Improve DCV test crash file detection with cross-OS support [Test] Improve DCV test crash file detection with cross-OS support Mar 31, 2026
@gmarciani gmarciani force-pushed the wip/mgiacomo/3160/test-dcv-0325-1 branch 5 times, most recently from 1b91b14 to 48f5768 Compare April 3, 2026 13:43
@gmarciani gmarciani changed the title [Test] Improve DCV test crash file detection with cross-OS support [Test] Improve stability, debuggability and coverage of test_dcv_configuration Apr 3, 2026
@gmarciani gmarciani changed the title [Test] Improve stability, debuggability and coverage of test_dcv_configuration [Test] Improve debuggability, stability and coverage of test_dcv_configuration Apr 3, 2026
@gmarciani gmarciani force-pushed the wip/mgiacomo/3160/test-dcv-0325-1 branch from 48f5768 to b89788c Compare April 3, 2026 13:54
@gmarciani gmarciani marked this pull request as ready for review April 3, 2026 13:54
@gmarciani gmarciani requested review from a team as code owners April 3, 2026 13:54
@gmarciani gmarciani force-pushed the wip/mgiacomo/3160/test-dcv-0325-1 branch 3 times, most recently from de5798c to 7b54217 Compare April 3, 2026 14:20
@gmarciani gmarciani changed the title [Test] Improve debuggability, stability and coverage of test_dcv_configuration [Test] Improve debuggability, stability and coverage of test_dcv_configuration and test_dcv_remote_access Apr 3, 2026
@gmarciani gmarciani force-pushed the wip/mgiacomo/3160/test-dcv-0325-1 branch from 7b54217 to e5f9c88 Compare April 3, 2026 14:39
@himani2411
Copy link
Copy Markdown
Contributor

himani2411 commented Apr 3, 2026

[Non-blocking] Is it possible to keep this crash report irrespective of the DCV test i.e can we diagnose all the integ test clusters to run this crash report irrespective of which test we are running?

This can generate false positives (some of which you have already identified) during our test runs. The advantage i see is that we will be aware of other components which could cause crashes in our product.

This same script can be extended to detect Slurm related Coredumps (/var/log, /var/spool/slurmd, /var/tmp/ or var/spool/abrt) and generate Backtrace

[1] https://slurm.schedmd.com/faq.html#backtrace
[2] https://slurm.schedmd.com/faq.html#core_dump

This can be de-scoped to another PR.

@gmarciani
Copy link
Copy Markdown
Contributor Author

[Non-blocking] Is it possible to keep this crash report irrespective of the DCV test i.e can we diagnose all the integ test clusters to run this crash report irrespective of which test we are running?

This can generate false positives (some of which you have already identified) during our test runs. The advantage i see is that we will be aware of other components which could cause crashes in our product.

This same script can be extended to detect Slurm related Coredumps (/var/log, /var/spool/slurmd, /var/tmp/ or var/spool/abrt) and generate Backtrace

[1] https://slurm.schedmd.com/faq.html#backtrace [2] https://slurm.schedmd.com/faq.html#core_dump

This can be de-scoped to another PR.

This is already possible. The logic to build the crash report is fully contained into the script get_crash_report.sh introduced in this PR. That script does not depend on the test itself. when we will work on the diagnosis script, we will include this script in the tollset.

@gmarciani gmarciani force-pushed the wip/mgiacomo/3160/test-dcv-0325-1 branch from 8b8ec60 to 765a35e Compare April 10, 2026 17:10
@gmarciani gmarciani force-pushed the wip/mgiacomo/3160/test-dcv-0325-1 branch from 765a35e to dd05b23 Compare April 10, 2026 17:40
@gmarciani gmarciani force-pushed the wip/mgiacomo/3160/test-dcv-0325-1 branch 5 times, most recently from 0d46567 to cf045fe Compare April 14, 2026 19:41
…figuration` and `text_dcv_remote_access`:

  * debuggability: retrieve, print and analyze a comprehensive report of crashes (not only the crash filename, but the stack trace of the crash).  Also, moved from hard assertions to soft assertions to have a final report of all the observed failures.
  * stability: prevent false positive failures, by ignoring harmless crashes related to gnome, unrelated to nvidia or dcv. Also fixed a gap that was causing failures when multiple instances of this test are executed in parallel by serializing the modifications to ssh known_hosts.
  * coverage: the test is now able to detect crashes on all supported OSs, not only Ubuntu.
@gmarciani gmarciani force-pushed the wip/mgiacomo/3160/test-dcv-0325-1 branch from cf045fe to 3012848 Compare April 14, 2026 22:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3.x skip-changelog-update Disables the check that enforces changelog updates in PRs Test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants