Ambient LED mode: some optimizations and bugfixes #645
DrFlarp wants to merge 7 commits into LoveRetro:main
Thanks for picking up something that's probably too far down on my list for the foreseeable future, much appreciated! :) A little NEON might go a long way as well, but you'll have to see if you can find a good way to combine that with your manual downsampling approach (without having to copy stuff around). If you blend the result across multiple frames, you could also think about modulating the sample indices you end up using, to make sure you catch all possible colors on static content.
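One way to read the sample-index suggestion is to keep the stride fixed but shift the starting offset every frame, so that over `stride * stride` frames every pixel of a static image gets visited once. A minimal sketch; the `SampleOffset` type, the `sample_offset()` helper, and the frame counter are hypothetical names for illustration, not part of this PR:

```c
#include <stdint.h>

/* Hypothetical helper (not from the PR): x/y start offset for a given frame,
 * so that a fixed sampling stride lands on a different sub-grid each frame. */
typedef struct {
    uint32_t x;
    uint32_t y;
} SampleOffset;

static SampleOffset sample_offset(uint32_t frame, uint32_t stride)
{
    SampleOffset o;
    /* Cycle through all stride*stride possible start positions. */
    uint32_t phase = frame % (stride * stride);
    o.x = phase % stride;
    o.y = phase / stride;
    return o;
}
```

The sampling loop would then start at `o.y` / `o.x` instead of 0 and keep stepping by `stride`. Whether this plays well with a NEON path is a separate question that this sketch does not address.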
…n RGB565 and RGB888 cores
…ve effects of scrolling 2D tiles
Are these values the same across all devices? Or will we have to eventually move some of this to platform.c?
I hate magic numbers. What is 1/2/5? I'm assuming it's either mode or area?
mode appears to be the integer value of the ambient mode setting itself, i.e. which LEDs to animate (All, Top, FN, etc.). It should probably be an enum. lightsambient[x] is NextUI's index for the lights - is this consistent across all supported platforms, or do we have to factor this part out to platform.c?
Hm, we could either do a generic approach and designate an enum that can be used as a bitmask, or keep it as simple as possible. The whole thing could be a lot more elegant, depends on how much time you want to spend.
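For illustration, the enum-as-bitmask idea could look roughly like the sketch below. The zone names loosely follow the "All, Top, FN" settings mentioned above, but the exact set of zones, the bit assignments, and the `LedZoneMask` / `led_zone_enabled()` names are assumptions; the thread does not say what 1/2/5 actually mean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical zone flags. The real zones and their LED indices are not
 * spelled out in this thread; LED_ZONE_OTHER is a placeholder. */
typedef enum {
    LED_ZONE_NONE  = 0,
    LED_ZONE_TOP   = 1u << 0,
    LED_ZONE_FN    = 1u << 1,
    LED_ZONE_OTHER = 1u << 2,
    LED_ZONE_ALL   = LED_ZONE_TOP | LED_ZONE_FN | LED_ZONE_OTHER
} LedZoneMask;

/* True if `zone` is selected in `mask`. */
static bool led_zone_enabled(uint32_t mask, LedZoneMask zone)
{
    return (mask & (uint32_t)zone) != 0;
}
```

A call site would then read `if (led_zone_enabled(mode_mask, LED_ZONE_TOP))` instead of comparing against a bare integer. The mapping from zone to physical LED index could still live in platform.c if it turns out to differ per device.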
Going to submit this as-is for now, maybe I'll take another look in the future :)
    max_c = max_c > b ? max_c : b;

    // min_c = min(min(r, g), b)
    uint8_t min_c = r < g ? r : g;
We can save one comparison here (due to the fact that we know max_c already) - speedup is probably not worth it though.
Are you suggesting something like this?
TBH I prefer the extra comparison for the better readability but I can give this a profile to see how it goes on hardware.
    uint8_t max_c = r;
    if (g > max_c) max_c = g;
    if (b > max_c) max_c = b;
    uint8_t min_c;
    if (max_c == r)
        min_c = g < b ? g : b;
    else if (max_c == g)
        min_c = r < b ? r : b;
    else
        min_c = r < g ? r : g;

Saves one comparison. It's probably either negligible or -O2 is smart enough to optimize it that way. Not the most critical thing in the world 🤷‍♂️
Gonna still profile this for funsies :)
I'm not really married to any of these ideas, but the thing about spending lots of CPU time calculating the color of LEDs seems worth a review.
Remove the second for loop - sometimes this function does a whole second pass over every pixel which doubles our CPU time
Don't use fminf() and fmaxf() because this casts to float internally - writing out min and max the tedious way is roughly 16% faster for free
Don't sample every pixel; blend the resulting color with the previous frame to reduce noisiness.
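As a rough illustration of the second and third points, integer min/max and a frame-to-frame blend can both stay in integer arithmetic. This is a sketch under assumptions, not the code from this PR: `min_u8`, `max_u8`, `blend_u8`, and the 0..255 blend weight are hypothetical names and choices.

```c
#include <stdint.h>

/* Plain integer min/max, avoiding the int -> float conversion that
 * calling fminf()/fmaxf() on integer channels implies. */
static inline uint8_t min_u8(uint8_t a, uint8_t b) { return a < b ? a : b; }
static inline uint8_t max_u8(uint8_t a, uint8_t b) { return a > b ? a : b; }

/* Blend the current frame's channel value into the previous one.
 * weight is 0..255: 0 keeps the previous value, higher values follow
 * the new sample more closely (weight/256 of cur, the rest of prev). */
static inline uint8_t blend_u8(uint8_t prev, uint8_t cur, uint8_t weight)
{
    return (uint8_t)((prev * (256 - weight) + cur * weight) >> 8);
}
```

Applied per channel once per frame, the blend costs three multiplies regardless of how many pixels were sampled, which is why it pairs well with aggressive subsampling.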
The last one of these has by far the most impact on the runtime of the function: the cost scales with the number of pixels actually sampled, so it drops in proportion to the number skipped. I can imagine a worst-case scenario for skipping pixels would be a scrolling grid pattern. With the way that retro tile-based graphics work, I have a hunch that scanning every 7 lines instead of 8 would be a simple and effective way to mitigate this - I haven't tested this yet. I haven't looked into the possibility of SIMD here btw, what do you guys think?
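The arithmetic behind the 7-versus-8 hunch can be checked in isolation. Assuming 8-pixel tiles, a stride of 8 always lands on the same position within each tile, while a stride of 7 walks through all eight positions because 7 and 8 share no common factor. The helper below is a hypothetical sketch that only counts those positions; it says nothing about how the resulting LED color looks on real content.

```c
#include <stdint.h>

/* Hypothetical helper: counts how many distinct positions inside a tile of
 * `tile` pixels are hit when walking `length` pixels with the given stride.
 * Supports tile sizes up to 64; returns 0 for invalid arguments. */
static unsigned distinct_tile_phases(unsigned length, unsigned stride, unsigned tile)
{
    uint8_t seen[64] = {0};
    unsigned count = 0;

    if (stride == 0 || tile == 0 || tile > 64)
        return 0;

    for (unsigned p = 0; p < length; p += stride) {
        unsigned phase = p % tile;   /* position within the current tile */
        if (!seen[phase]) {
            seen[phase] = 1;
            count++;
        }
    }
    return count;
}
```

For a 240-pixel-wide frame and 8-pixel tiles, `distinct_tile_phases(240, 8, 8)` gives 1 and `distinct_tile_phases(240, 7, 8)` gives 8, at the cost of a handful of extra samples per row.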
On CPU Speed Normal, here is what I profiled (MGBA, 1000-frame test sequence; take exact numbers with a grain of salt):
881us - base performance (depends on test sequence - this was not a worst-case scenario w.r.t. the extra for loop)
830us - remove inner for loop
696us - don't use fminf() and fmaxf()
16us - only check every 8 pixels in both x and y directions (?!?)
Open to feedback as usual :)