
@ZERICO2005
Contributor

ZERICO2005 commented Jan 5, 2026

Makes memcpy 1F faster and memmove 2F faster in the common case.

The memmove timings (src != dst version) can be adjusted slightly, depending on which layout is faster on average:

    ; src >= dst | LDIR | 30F + 15R + 1 ; jr false
    ; src <  dst | LDDR | 34F + 12R + 2 ; jr true

    ; src >= dst | LDIR | 31F + 15R + 2 ; jr true
    ; src <  dst | LDDR | 33F + 12R + 1 ; jr false
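The table follows the usual memmove rule: when src >= dst the copy can run forward (LDIR), and when src < dst it has to run backward (LDDR) so overlapping source bytes are read before they are overwritten. A minimal Python sketch of that direction choice, using a bytearray as a stand-in for memory (the function name and signature are illustrative, not the toolchain's):

```python
def memmove(buf: bytearray, dst: int, src: int, n: int) -> None:
    """Overlap-safe copy of n bytes within buf from offset src to offset dst.

    Hypothetical model of the branch in the timing table: copy forward
    (like LDIR) when src >= dst, backward (like LDDR) when src < dst.
    """
    if n == 0:
        return  # nothing to copy
    if src >= dst:
        # Forward copy: each write lands at or before the byte just read,
        # so no unread source byte is clobbered.
        for i in range(n):
            buf[dst + i] = buf[src + i]
    else:
        # Backward copy: start from the last byte, as LDDR does,
        # so the overlapping tail of the source is read first.
        for i in range(n - 1, -1, -1):
            buf[dst + i] = buf[src + i]
```

For example, moving 5 bytes from offset 0 to offset 2 of `b"abcdefgh"` takes the backward branch and yields `b"ababcdeh"`, the same result as a slice assignment. When src == dst the forward branch still runs but leaves the buffer unchanged.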

@mateoconlechuga
Member

Looks good. Let's remove the PREFER_OS_LIBC option and just use these directly. I assume these optimizations are valid, @calc84maniac?

    _memcpy:
        ld iy, -1
        ; size > 0      : 25F + 15R + 1 + LDIR
        ; size >= 65536 : 32F + 16R + 3 + LDIR
Contributor

This comment is no longer applicable, right?

Contributor Author

Yeah, I will fix that

@calc84maniac
Contributor

If the two memmove implementations are now identical aside from a ret z, maybe put only that in the .if 0 to keep it more maintainable (or remove the unused one entirely).

@ZERICO2005
Contributor Author

ZERICO2005 commented Jan 7, 2026

They differ slightly. The src == dst version has to jr to LDDR and fall through to LDIR to keep the timings equal, whereas the src != dst version can make either the LDIR or the LDDR path the faster one (by 1F + 1).

The LDIR path has an extra 3R and the LDDR path has an extra 3F, so I wasn't sure which path should get the 1F + 1 savings.
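One way to weigh this trade-off is to treat each timing in the table as f*F + r*R + c cycles and average over how often each path runs. The sketch below assumes fixed per-F and per-R cycle costs (`f_cost`, `r_cost`) and a fraction `p_forward` of calls that take the LDIR path; all three are hypothetical inputs, not measured values, and "layout A"/"layout B" are just labels for the two rows pairs in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    """A timing in the thread's notation: f*F + r*R + c."""
    f: int
    r: int
    c: int

    def cycles(self, f_cost: float, r_cost: float) -> float:
        # Hypothetical additive model: each F and each R has a fixed cost.
        return self.f * f_cost + self.r * r_cost + self.c


# Layout A: LDIR path falls through (jr false), LDDR path jumps (jr true).
LAYOUT_A = {"ldir": Cost(30, 15, 1), "lddr": Cost(34, 12, 2)}
# Layout B: LDIR path jumps (jr true), LDDR path falls through (jr false).
LAYOUT_B = {"ldir": Cost(31, 15, 2), "lddr": Cost(33, 12, 1)}


def expected_cycles(layout: dict, p_forward: float,
                    f_cost: float, r_cost: float) -> float:
    """Average overhead when a fraction p_forward of calls take the LDIR path."""
    return (p_forward * layout["ldir"].cycles(f_cost, r_cost)
            + (1 - p_forward) * layout["lddr"].cycles(f_cost, r_cost))
```

Under this additive model, layout B minus layout A works out to (2*p_forward - 1)*(f_cost + 1), because both layouts have identical R counts on each path. So the extra 3R versus 3F does not change which layout wins; only the share of calls taking each path does, with layout A ahead whenever more than half the calls copy forward.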
