Skip to content

Ambient LED mode: some optimizations and bugfixes#645

Open
DrFlarp wants to merge 7 commits intoLoveRetro:mainfrom
DrFlarp:ambient-speedup
Open

Ambient LED mode: some optimizations and bugfixes#645
DrFlarp wants to merge 7 commits intoLoveRetro:mainfrom
DrFlarp:ambient-speedup

Conversation

@DrFlarp
Copy link

@DrFlarp DrFlarp commented Feb 9, 2026

I'm not really married to any of these ideas, but the thing about spending lots of CPU time calculating the color of LEDs seems worth a review.

  1. Remove the second for loop - sometimes this function does a whole second pass over every pixel which doubles our CPU time

  2. Don't use fminf() and fmaxf() because this casts to float internally - writing out min and max the tedious way is roughly 16% faster for free

  3. Don't sample every pixel; blend the resulting color with the previous frame to reduce noisiness.

The last one of these has by far the most impact on the runtime of the function, directly proportional to the number of pixels skipped. I can imagine a worst-case scenario for skipping pixels would be a scrolling grid pattern. With the way that retro tile-based graphics work I have a hunch that scanning every 7 lines instead of 8 would be a simple and effective way to mitigate this - I haven't tested this yet. I haven't looked into the possibility of SIMD here btw, what do you guys think?

on CPU Speed Normal here is what I profiled (MGBA, 1000 frame test sequence, take exact numbers with a grain of salt)
881us - base performance (depends on test sequence - this was not a worst-case scenario w.r.t. the extra for loop)
830us - remove inner for loop
696us - don't use fminf() and fmaxf()
16us - only check every 8 pixels in both x and y directions (?!?)

Open to feedback as usual :)

@frysee
Copy link
Member

frysee commented Feb 9, 2026

Thanks for picking something thats probably too far down on my list for the forseeable future, much appreciated! :)
I havent looked at the sampling implementation in forever, but all your points seem sensible.

A little NEON might go a long way as well, but you'll have to see if you can find a good way to combine that with your manual downsampling approach (without having to copy stuff around).

If you blend the result across multiple frames, you could also think about modulating the sample indices you end up using to make sure you catch all possible colors on static content.

@DrFlarp
Copy link
Author

DrFlarp commented Feb 10, 2026

https://github.com/DrFlarp/NextUI/blob/27c2d9499b5b08f75c91c66f6d473305e0669e02/workspace/all/common/api.c#L541

Are these values the same across all devices? Or will we have to eventually move some of this to platform.c?

@frysee
Copy link
Member

frysee commented Feb 10, 2026

I hate magic numbers. What is 1/2/5? I'm assuming its either mode or area?

@DrFlarp
Copy link
Author

DrFlarp commented Feb 10, 2026

I hate magic numbers. What is 1/2/5? I'm assuming its either mode or area?

mode appears to be the integer value of ambient mode setting itself, i.e. which LEDs to animate. (All, Top, FN, etc...). Should probably be an enum.

lightsambient[x] is nextui's index for the lights - is this consistent across all supported platforms or do we have to factor this part out to platform.c?

@frysee
Copy link
Member

frysee commented Feb 10, 2026

Hm, we could either do a generic approach and designate an enum that can be used as a bitmask, or keep it as simple as possible. The whole thing could be a lot more elegant, depends on how much time you want to spend.

@DrFlarp
Copy link
Author

DrFlarp commented Feb 13, 2026

Going to submit this as-is for now, maybe I'll take another look in the future :)

@DrFlarp DrFlarp changed the title Optimizations for ambient LED mode Ambient LED mode: some optimizations and bugfixes Feb 13, 2026
@DrFlarp DrFlarp marked this pull request as ready for review February 13, 2026 12:25
max_c = max_c > b ? max_c : b;

// min_c = min(min(r, g), b)
uint8_t min_c = r < g ? r : g;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can save one comparison here (due to the fact that we know max_c already) - speedup is probably not worth it though.

Copy link
Author

@DrFlarp DrFlarp Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting something like this?

TBH I prefer the extra comparison for the better readability but I can give this a profile to see how it goes on hardware.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uint8_t max_c = r;
if (g > max_c) max_c = g;
if (b > max_c) max_c = b;

uint8_t min_c;

if (max_c == r)
    min_c = g < b ? g : b;
else if (max_c == g)
    min_c = r < b ? r : b;
else
    min_c = r < g ? r : g;

Saves one comparison. Its probably either negligible or -02 is smart enough to optimize it that way. Not the most critical thing in the world 🤷‍♂️

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gonna still profile this for funsies :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants