Modify URM Test Runner#459
Conversation
5717167 to
fc0863e
Compare
| # ---------- Lock (avoid concurrent runs on same host) ---------- | ||
| LOCKFILE="/tmp/${TESTNAME}.lock" | ||
| # Delete stale lock file, if it exists | ||
| rm "${LOCKFILE}" |
There was a problem hiding this comment.
Do not remove the lock file before flock. With flock, stale lock files are harmless because the lock is tied to the open file descriptor, not just the path. Keep the file and let flock -n decide whether another run is active. If stale directory cleanup is required for the no-flock fallback, handle only the directory fallback and only when it is provably stale. Do not remove the flock file before acquiring the lock.
There was a problem hiding this comment.
The issue is that even while attempting fresh runs (i.e. flashing a new image and then running the tests through Axiom) we are running into the error that: "another userspace resource manager run" is active. Which should not be happening, since this is the first run.
Also, as mentioned, each time a CI/CD job is triggered a new image (from qli-artifacts) will be flashed on the device (and we won't be using this device directly manually), hence I do not see much use for this lock-protection code in general.
There was a problem hiding this comment.
@smuppand even we tried this as well.
if command -v flock >/dev/null 2>&1; then
exec 9>"$LOCKFILE"
if ! flock -n 9; then
log_warn "Another $ run is active; skipping"
echo "$TESTNAME SKIP" >"$RES_FILE"
exit 0
fi
trap 'exec 9>&-' EXIT INT TERM [use to release the lock]
still not working.
| fi | ||
|
|
||
| case $score in | ||
| case $rc in |
There was a problem hiding this comment.
Only return codes 0 and 1 are handled. Any other non-zero return code, such as:
124 from timeout
126 command found but not executable
127 command not found
signal-based exits
will not be explicitly handled. Depending on shell behavior, this can cause run_one() to return success or an unintended status, leading to wrong PASS/FAIL accounting.
There was a problem hiding this comment.
The URM test framework handles most of the exceptions internally (through exception handling) and returns an aggregate status code (https://github.com/qualcomm/userspace-resource-manager/blob/main/tests/Utils/URMTests.cpp#L123), which will always be 0 or 1. I wanted to map the script behavior more closely with that of the URM Test Framework.
Also, since we are checking whether the executable exists (and whether it is executable) earlier in the script already, hence those issues won't factor in later.
In any case, i'll add a default (wildcard) to the shell script switching logic as a fallaback.
|
@kartnema Please include an appropriate commit message and sign off on the commit to resolve the CI failures. |
- Add support to delete any stale lock files created in previous runs - Remove log-based result mapping (parse_and_score_log) and solely rely on the return code (0 or 1). The URM Test Runner already handles the return code, which the script can rely on. Judging pass / fail based on the logs results in some false positives. - Modify test skip logic, a suite should only be skipped if it is not in the approved list, of if the required configs are not found, or if the test executable itself is absent (or is not marked executable). Skipping of one or more tests part of the testsuite, should not result in an overall SKIP qualification, it should be either PASS OR FAIL. Signed-off-by: Kartik Nema <kartnema@qti.qualcomm.com>
fc0863e to
61baee1
Compare
Uh oh!
There was an error while loading. Please reload this page.