[Test] Improve debuggability, stability and coverage of test_dcv_configuration and test_dcv_remote_access#7322
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## develop #7322 +/- ##
========================================
Coverage 90.08% 90.08%
========================================
Files 182 182
Lines 16730 16730
========================================
Hits 15071 15071
Misses 1659 1659
[Non-blocking] Is it possible to generate this crash report irrespective of the DCV test, i.e. can we run this crash report on all the integ test clusters regardless of which test we are running? This can generate false positives (some of which you have already identified) during our test runs. The advantage I see is that we will be aware of other components that could cause crashes in our product. The same script can be extended to detect Slurm-related core dumps (in /var/log, /var/spool/slurmd, /var/tmp/ or /var/spool/abrt) and generate a backtrace [1]. This can be de-scoped to another PR.
[1] https://slurm.schedmd.com/faq.html#backtrace
This is already possible. The logic to build the crash report is fully contained in the script get_crash_report.sh introduced in this PR. That script does not depend on the test itself. When we work on the diagnosis script, we will include this script in the toolset.
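As a rough illustration of the extension discussed above, the sketch below scans the directories named in the review comment for files that look like core dumps. It is not taken from get_crash_report.sh: the function name `find_core_dumps` and the "file name starts with `core`" heuristic are assumptions made for this example, and where core dumps actually land depends on the OS and its core dump configuration.

```python
import os
from typing import Iterable, List

# Directories named in the review comment. Whether core dumps are really
# written there depends on the OS and its configuration (assumption).
CANDIDATE_DIRS = ["/var/log", "/var/spool/slurmd", "/var/tmp", "/var/spool/abrt"]


def find_core_dumps(dirs: Iterable[str] = CANDIDATE_DIRS) -> List[str]:
    """Return the sorted paths of files whose name starts with 'core'.

    The name prefix is a hypothetical heuristic chosen for this sketch,
    not a rule taken from the PR or from Slurm.
    """
    found = []
    for directory in dirs:
        # os.walk silently skips directories that are missing or unreadable.
        for dirpath, _dirnames, filenames in os.walk(directory):
            for name in filenames:
                if name.startswith("core"):
                    found.append(os.path.join(dirpath, name))
    return sorted(found)


if __name__ == "__main__":
    for path in find_core_dumps():
        print(path)
```

Each path found this way could then be passed to a debugger to produce a backtrace, as described in the Slurm FAQ entry linked in the comment.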
tests/integration-tests/tests/common/diagnosis/get_crash_report.sh
…figuration` and `test_dcv_remote_access`:
* debuggability: retrieve, print and analyze a comprehensive report of crashes (not only the crash filename, but also the stack trace of the crash). Also moved from hard assertions to soft assertions, to produce a final report of all the observed failures.
* stability: prevent false positive failures by ignoring harmless crashes that are related to gnome and unrelated to nvidia or dcv. Also fixed a gap that caused failures when multiple instances of this test were executed in parallel, by serializing the modifications to ssh known_hosts.
* coverage: the test is now able to detect crashes on all supported OSs, not only Ubuntu.
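The soft-assertion and crash-filtering behavior described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: the `Crash` record, the `is_relevant` rule and the `SoftAssertions` collector are all hypothetical names, and the real test may rely on an assertion library rather than a hand-written collector.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical keywords: a crash mentioning one of these is always reported.
RELEVANT_KEYWORDS = ("nvidia", "dcv")


@dataclass
class Crash:
    """Hypothetical record for one entry of the crash report."""
    executable: str
    stack_trace: str


def is_relevant(crash: Crash) -> bool:
    """Ignore gnome crashes unless they also involve nvidia or dcv."""
    text = f"{crash.executable}\n{crash.stack_trace}".lower()
    if any(keyword in text for keyword in RELEVANT_KEYWORDS):
        return True
    return "gnome" not in crash.executable.lower()


class SoftAssertions:
    """Collect failures and raise once at the end, instead of at the first one."""

    def __init__(self) -> None:
        self.failures: List[str] = []

    def check(self, condition: bool, message: str) -> None:
        if not condition:
            self.failures.append(message)

    def assert_all(self) -> None:
        if self.failures:
            raise AssertionError("\n".join(self.failures))


def verify_no_relevant_crashes(crashes: List[Crash]) -> None:
    soft = SoftAssertions()
    for crash in crashes:
        soft.check(
            not is_relevant(crash),
            f"Relevant crash in {crash.executable}:\n{crash.stack_trace}",
        )
    # A single AssertionError lists every relevant crash with its stack trace.
    soft.assert_all()
```

With a hard assertion the test would stop at the first relevant crash. Collecting the failures first means the final error message covers every relevant crash, which is what makes the report useful for debugging.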
Description of changes
Improve stability, debuggability and coverage of `test_dcv_configuration` and `test_dcv_remote_access`.
Tests
`test_dcv_configuration` and `test_dcv_remote_access` executed on all OSs on both g5g.2xlarge and c5.xlarge. The test now fails only for relevant crashes, reporting detailed information about them. The only relevant crash detected is related to an issue with DCV, so it is expected to be reported.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.