Skip to content

Modify URM Test Runner#459

Open
kartnema wants to merge 1 commit into
qualcomm-linux:mainfrom
kartnema:improve-urm-test-runner
Open

Modify URM Test Runner#459
kartnema wants to merge 1 commit into
qualcomm-linux:mainfrom
kartnema:improve-urm-test-runner

Conversation

@kartnema
Copy link
Copy Markdown
Contributor

@kartnema kartnema commented May 14, 2026

  • Add support to delete any stale lock files created in previous runs
  • Remove log-based result mapping (parse_and_score_log) and solely rely on the return code (0 or 1). The URM Test Runner already handles the return code, which the script can rely on. Judging pass / fail based on the logs results in some false positives.
  • Modify test skip logic, a suite should only be skipped if it is not in the approved list, of if the required configs are not found, or if the test executable itself is absent. Skipping of one or more tests part of the testsuite, should not result in an overall SKIP qualification, it should be either PASS OR FAIL.

@kartnema kartnema force-pushed the improve-urm-test-runner branch from 5717167 to fc0863e Compare May 14, 2026 07:44
# ---------- Lock (avoid concurrent runs on same host) ----------
LOCKFILE="/tmp/${TESTNAME}.lock"
# Delete stale lock file, if it exists
rm "${LOCKFILE}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not remove the lock file before flock. With flock, stale lock files are harmless because the lock is tied to the open file descriptor, not just the path. Keep the file and let flock -n decide whether another run is active. If stale directory cleanup is required for the no-flock fallback, handle only the directory fallback and only when it is provably stale. Do not remove the flock file before acquiring the lock.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that even while attempting fresh runs (i.e. flashing a new image and then running the tests through Axiom) we are running into the error that: "another userspace resource manager run" is active. Which should not be happening, since this is the first run.

Also, as mentioned, each time a CI/CD job is triggered a new image (from qli-artifacts) will be flashed on the device (and we won't be using this device directly manually), hence I do not see much use for this lock-protection code in general.

Copy link
Copy Markdown

@varunsinghal29 varunsinghal29 May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smuppand even we tried this as well.

if command -v flock >/dev/null 2>&1; then
exec 9>"$LOCKFILE"
if ! flock -n 9; then
log_warn "Another $ run is active; skipping"
echo "$TESTNAME SKIP" >"$RES_FILE"
exit 0
fi
trap 'exec 9>&-' EXIT INT TERM [use to release the lock]

still not working.

fi

case $score in
case $rc in
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only return codes 0 and 1 are handled. Any other non-zero return code, such as:

124 from timeout

126 command found but not executable

127 command not found

signal-based exits

will not be explicitly handled. Depending on shell behavior, this can cause run_one() to return success or an unintended status, leading to wrong PASS/FAIL accounting.

Copy link
Copy Markdown
Contributor Author

@kartnema kartnema May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URM test framework handles most of the exceptions internally (through exception handling) and returns an aggregate status code (https://github.com/qualcomm/userspace-resource-manager/blob/main/tests/Utils/URMTests.cpp#L123), which will always be 0 or 1. I wanted to map the script behavior more closely with that of the URM Test Framework.

Also, since we are checking whether the executable exists (and whether it is executable) earlier in the script already, hence those issues won't factor in later.

In any case, i'll add a default (wildcard) to the shell script switching logic as a fallaback.

@smuppand
Copy link
Copy Markdown
Contributor

@kartnema Please include an appropriate commit message and sign off on the commit to resolve the CI failures.

- Add support to delete any stale lock files created in previous runs
- Remove log-based result mapping (parse_and_score_log) and solely rely on the return code (0 or 1).
  The URM Test Runner already handles the return code, which the script can rely on. Judging pass / fail
  based on the logs results in some false positives.
- Modify test skip logic, a suite should only be skipped if it is not in the approved list, of if the
  required configs are not found, or if the test executable itself is absent (or is not marked executable).
  Skipping of one or more tests part of the testsuite, should not result in an overall SKIP qualification,
  it should be either PASS OR FAIL.

Signed-off-by: Kartik Nema <kartnema@qti.qualcomm.com>
@kartnema kartnema force-pushed the improve-urm-test-runner branch from fc0863e to 61baee1 Compare May 15, 2026 11:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants