CI: enable NTFC tests for qemu-armv8a (arm64)#18481
Enable NTFC for qemu-armv8a (arm64). QEMU for the aarch64 architecture should already be in the Docker image, so it should work. Signed-off-by: p-szafonimateusz <p-szafonimateusz@xiaomi.com>
allow ntfc tests for cmake builds Signed-off-by: p-szafonimateusz <p-szafonimateusz@xiaomi.com>
all targets should be built if CI-related tools were changed Signed-off-by: p-szafonimateusz <p-szafonimateusz@xiaomi.com>
Thanks @szafonimateusz-mi! Wonder if we can preempt any possible spikes in usage of GitHub Runners...
(1) Do we have the Complete CI running in your repo? We can analyse the number of hours consumed by the GitHub Runners. Update: I'm testing the Complete CI Build here
(2) Actually I'm very keen to analyse: How many GitHub Runner Hours are consumed across all our QEMU CI Checks? This might need more work.
(3) During High Loads: Do you think we should disable QEMU CI Checks? Maybe through a configurable flag, in some .github/*.yml file?
(4) If we're implementing the flag: I think we should exclude QEMU CI Checks during NuttX Release Builds. CI Checks are not really helpful for NuttX Release. And we always see a Spike in GitHub Runner Usage during NuttX Release Periods.
(5) Update: I think we should disable QEMU CI Checks for NuttX Apps Repo, during peak periods. Thanks :-)
@simbit18 Would you have anything to add?
|
@lupyuen hi, this adds only 15 minutes to the total CI execution time. Each NTFC run returns a report of the test execution time, so it should be possible to extract this information from the CI logs for monitoring purposes. But I've never done it. The more tests we run, the longer it will take. The longest tests are the LPT tests, which are currently enabled only for the sim and risc-v 32bit targets. I think that automated testing with NTFC is much more valuable than building a lot of small configurations that basically test nothing and duplicate compilation coverage. We can get more benefit from disabling some of the builds than from disabling NTFC, e.g., by selecting specific configurations to compile instead of compiling all the configs in boards. Unfortunately, this is a lot of work. Regarding disabling NTFC for nuttx-apps, this is also problematic because we lose coverage for apps. Disabling NTFC for releases probably makes more sense. |
|
Disabling NTFC during heavy load seems like the worst solution to me. When we have a heavy CI load, it means a lot of changes are being pushed upstream, which is the most error-prone period. It's precisely at these moments that NTFC is most useful. Compilation errors are trivial to detect and fix, whereas the runtime OS errors that NTFC can catch are hard to find and then repair. So detecting the latter should be a priority. |
|
@szafonimateusz-mi Actually there's a simpler way to check the impact of any CI Change on our usage of GitHub Runners. In my repo: I applied your patches and did a complete build. Under the GitHub Build Log > Run Details > Usage, I see that the Arm64 build consumes 54 minutes of GitHub Runners... https://github.com/lupyuen9/nuttx/actions/runs/22611386877/usage
Let's compare this with the NuttX Mirror Repo, which reports that the Arm64 build usually consumes 57 minutes of GitHub Runners... https://github.com/NuttX/nuttx/actions/runs/22611366354/usage
So we actually reduced the usage of GitHub Runners by 3 minutes? (Assuming I applied your patches correctly) Or maybe the difference is negligible :-) |
Disabling NTFC during heavy load seems like the worst solution to me.
OK I can't be around here forever monitoring our GitHub Load. I just hope: If we see a spike in GitHub Usage again, we can all work together to solve it.
@simbit18 In case I'm not around, and the Infra Team is threatening to shut down GitHub CI, I suggest we seriously consider (temporarily) disabling / renaming all the citest*/defconfig files in NuttX Repo and NuttX Apps Repo (also ask everyone to Rebase To Master)...
$ ls boards/*/*/*/configs/citest*/defconfig
boards/arm/imx6/sabre-6quad/configs/citest/defconfig
boards/arm64/fvp-v8r/fvp-armv8r/configs/citest_smp/defconfig
boards/arm64/fvp-v8r/fvp-armv8r/configs/citest/defconfig
boards/arm64/qemu/qemu-armv8a/configs/citest_smp/defconfig
boards/arm64/qemu/qemu-armv8a/configs/citest/defconfig
boards/risc-v/qemu-rv/rv-virt/configs/citest/defconfig
boards/risc-v/qemu-rv/rv-virt/configs/citest64/defconfig
boards/sim/sim/sim/configs/citest/defconfig
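The renaming suggested above could be scripted roughly as follows. This is only a sketch: the glob pattern is copied from the `ls` command, while the `disable_citests` helper and the `defconfig.disabled` name are hypothetical choices for illustration, not an existing NuttX tool or convention. It defaults to a dry run so nothing is renamed by accident.

```python
from pathlib import Path

# Pattern taken from the `ls` command in the comment above.
CITEST_GLOB = "boards/*/*/*/configs/citest*/defconfig"


def disable_citests(repo_root, dry_run=True):
    """Find every citest defconfig under repo_root and return the paths.

    With dry_run=False each file is renamed to "defconfig.disabled"
    (a hypothetical name, chosen only for this sketch).
    """
    root = Path(repo_root)
    found = sorted(root.glob(CITEST_GLOB))
    for defconfig in found:
        if not dry_run:
            defconfig.rename(defconfig.with_name("defconfig.disabled"))
    return [p.relative_to(root).as_posix() for p in found]
```

Calling `disable_citests(".")` from the top of a NuttX checkout would list the files without touching them; passing `dry_run=False` performs the rename, which a plain `git revert` could later undo.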
To quickly disable NTFC, we can simply remove the "executable" flag for |
|
Ah I thought something looks sus :-) So if I change the repo name, it will work?
Let's wait for the results thanks :-) https://github.com/lupyuen9/nuttx/actions/runs/22618072154 Updated: I restarted the Arm64 job here: https://github.com/lupyuen9/nuttx/actions/runs/22621116534 |
|
@lupyuen yes, it should work this way :) |
|
OK the Arm64 Job now consumes 70 minutes of GitHub Runners... https://github.com/lupyuen9/nuttx/actions/runs/22621116534/usage
Compare this with the NuttX Mirror Repo, which reports that the Arm64 build usually consumes 57 minutes of GitHub Runners... https://github.com/NuttX/nuttx/actions/runs/22611366354/usage
So the new Arm64 Job takes 13 additional minutes (23% increase) of GitHub Runners. Let's remember this benchmark as we enable more NTFC tests. Thanks :-) |
Hi Lup, did you notice that macOS sim-03 is also taking a long time? Any idea why this is happening? |
Thank you @raiden00pl, but my question was more: why is macOS sim-03 specifically taking about 2h to build? |
Hi @lupyuen , as a temporary solution in this case, we can also temporarily disable the workflow. |
|
I agree with @raiden00pl that running an actual CI test suite is almost infinitely more valuable than just the build tests, but the concerns about runner usage are valid. We have insanely long workflows and they are really only understood/maintained by a few people (thankfully we have @lupyuen and @simbit18 !). We seem to revisit this discussion very frequently, but it's probably time that we nail down some concrete actions for reducing the CI resource consumption (ideally without compromising quality). I wonder if we can get some CI experts to help us out, since most people here are very embedded-focused. EDIT: dare I suggest we look at FreeRTOS/Zephyr's CI workflows for inspiration? |
|
@lupyuen , I agree with @raiden00pl that it is preferable to run NTFC tests and perhaps reduce the configurations by combining them in Jumbo and build with CMake. |
|
By the way @szafonimateusz-mi , could you mention this addition here: https://nuttx.apache.org/docs/latest/testing/citests.html |
|
@linguini1 it's already mentioned: webpage probably hasn't been refreshed yet |
|
@simbit18 FYI I ran a Complete Build with CI Tests all disabled... The Build Without CI Tests consumed 26 hours of GitHub Runners: https://github.com/lupyuen10/nuttx/actions/runs/22662992534/usage
Compare this with a Normal Build With CI Tests, which consumes 29 hours of GitHub Runners: https://github.com/lupyuen9/nuttx/actions/runs/22621116534/usage
Which means: Disabling all CI Tests might reduce our GitHub Runners by 10%. (Yeah not really significant) Someday we could possibly offload the CI Tests to an External Server / External GitHub Account. And the External Server will Post a Status to the PR, to indicate whether the CI Tests were successful / failed. |
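The percentages quoted in this thread can be double-checked with a few lines of arithmetic. The sketch below uses only the figures already reported here (57 vs 70 minutes for the Arm64 job, 29 vs 26 hours for the complete build); the `percent_change` helper is just an illustrative name.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Arm64 job: 57 minutes on the mirror repo, 70 minutes with NTFC enabled.
arm64_job = percent_change(57, 70)

# Complete build: 29 hours with CI tests, 26 hours with them disabled.
full_build = percent_change(29, 26)

print(f"Arm64 job:      {arm64_job:+.1f}%")   # +22.8%
print(f"Complete build: {full_build:+.1f}%")  # -10.3%
```

Both results match the rounded "23% increase" and "10%" figures given in the comments.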
|
Thank you very much, @lupyuen |
|
@simbit18 I wonder if we could run a script to identify the Overlapping Defconfigs? Then we skip the Smaller Defconfigs that are wholly included inside Bigger Defconfigs? Hmm... |
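A first pass at such a script might look like the sketch below. It treats each defconfig as a set of option lines and reports the ones that are a strict subset of another. Two assumptions to keep in mind: the `parse_defconfig` and `find_covered` names are made up for this example, and "every option line also appears in a bigger defconfig" is only a rough proxy for build coverage, since Kconfig defaults and dependencies can still make the builds differ.

```python
def parse_defconfig(text):
    """Return the set of option lines, ignoring blanks and plain comments."""
    options = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Keep "# CONFIG_X is not set": it is a real setting, not a comment.
        if line.startswith("#") and not line.endswith("is not set"):
            continue
        options.add(line)
    return options


def find_covered(configs):
    """Map each defconfig name to the bigger defconfigs that contain it.

    `configs` maps a name to its parsed option set. Only strict subsets
    are reported, so two identical defconfigs do not flag each other.
    """
    covered = {}
    for small, small_opts in configs.items():
        supersets = sorted(
            big for big, big_opts in configs.items()
            if big != small and small_opts < big_opts
        )
        if supersets:
            covered[small] = supersets
    return covered
```

In practice the option sets would be built by reading each `boards/*/*/*/configs/*/defconfig` file, and the comparison would probably only make sense between configs of the same board, because board-selection options differ across boards.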
|
@lupyuen , that's definitely a great idea. It could be useful for better understanding how many Defconfigs we can avoid. |
|
Hi @simbit18 @lupyuen I have been brainstorming about this as well. A solution I've thought of to help reduce some resources (just one step) is to compare path names of the modified files. This way, a modification to bcm2711 files only (for example) will only trigger builds for the RPi4B configs and not all of arm64. I have no working prototype yet as I've just been trying to get an understanding of the existing CI to figure out how to integrate that without breaking things. |
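The path comparison described above might be sketched like this. It is not a working integration with the existing CI: the `affected_chips` helper is hypothetical, and it assumes chip-specific files live under `boards/<arch>/<chip>/` or `arch/<arch>/src/<chip>/`. Any change outside those directories, or under a shared `common` directory, falls back to building everything.

```python
import re

# Assumed layout: boards/<arch>/<chip>/... and arch/<arch>/src/<chip>/...
_CHIP_PATH = re.compile(
    r"^(?:boards/([^/]+)/([^/]+)/|arch/([^/]+)/src/([^/]+)/)"
)


def affected_chips(changed_files):
    """Return the set of (arch, chip) pairs to build, or None for a full build."""
    chips = set()
    for path in changed_files:
        m = _CHIP_PATH.match(path)
        if m is None:
            return None  # change outside chip-specific directories
        if m.group(1):
            arch, chip = m.group(1), m.group(2)
        else:
            arch, chip = m.group(3), m.group(4)
        if chip == "common":
            return None  # shared arch code can affect every chip
        chips.add((arch, chip))
    return chips
```

For example, a change list containing only files under `arch/arm64/src/bcm2711/` would yield `{("arm64", "bcm2711")}`, while adding a file from `sched/` or `.github/` would return `None` and keep the full build, in line with the rule that all targets are built when CI-related tools change.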
Summary
all targets should be built if CI-related tools were changed
Impact
citest:
citest_smp:
in total, this adds ~15 min to the test execution time
Testing
CI