Contributor

yzewei commented Jan 19, 2026

Issue: #3369

This PR fixes the issue where numpy failed to load on LoongArch due to missing X86_V2 baseline features (specifically LAHF from leaf 0x80000001).

Even though tmp32u is uint32_t, on LoongArch it is passed in a 64-bit register with dirty high bits. The switch statement was compiling into a 64-bit comparison, causing the check for 0x80000001 to fail.
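
The failure mode described above can be modelled in plain C. The sketch below is hypothetical and is not box64 code: `reg` stands for the raw 64-bit register that carries the `uint32_t` argument, and `abi_extend` ASSUMES the callee expects 32-bit values to arrive sign-extended to 64 bits. That assumption about the LoongArch LP64 calling convention is not confirmed anywhere in this PR.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION (not confirmed in this PR): 32-bit arguments are expected
 * to be sign-extended to 64 bits in the register. */
static uint64_t abi_extend(uint32_t v)
{
    return (v & 0x80000000u) ? (0xFFFFFFFF00000000ULL | v) : (uint64_t)v;
}

/* Full-width comparison: only matches if the caller extended the value
 * exactly the way the callee expects. */
static int leaf_matches_wide(uint64_t reg, uint32_t leaf)
{
    return reg == abi_extend(leaf);
}

/* Truncate first, then compare: the upper 32 bits no longer matter. */
static int leaf_matches_narrow(uint64_t reg, uint32_t leaf)
{
    return (uint32_t)reg == leaf;
}

/* Small self-check of the model for leaf 0x80000001. */
static void model_self_check(void)
{
    /* Upper bits not in the expected form: the wide compare misses. */
    assert(!leaf_matches_wide(0x0000000080000001ULL, 0x80000001u));
    /* The narrow compare still recognises the leaf. */
    assert(leaf_matches_narrow(0x0000000080000001ULL, 0x80000001u));
}
```

In this model only leaves with bit 31 set (the 0x8000xxxx range) are affected, since for smaller values the expected upper bits are all zero.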

The tmp32u function interface is retained.
Currently, NumPy imports successfully without errors (although slowly).

yzewei marked this pull request as draft January 19, 2026 09:17
Contributor Author

yzewei commented Jan 19, 2026

Or is there a better way to write it? @ptitSeb @ksco @xiangzhai

Owner

ptitSeb commented Jan 19, 2026

If RAX is used, then the tmp32u interface should be removed, because it will not be used.

Also, this looks like an ABI issue, because the C code is normal and there is no undefined behaviour in this function as far as I can see. In how many other places (like function wrapping) is this same issue happening?!

Also, couldn't an alternative be something like tmp32u &= 0xffffffffULL?
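
Reading the leaf from the emulated RAX instead of the `tmp32u` parameter could look roughly like the sketch below. Everything in it is hypothetical: `fake_emu_t`, `leaf_kind` and `classify_leaf` are illustration-only names, and box64's real emulator state and `my_cpuid` are laid out differently. The point is only that the leaf is loaded as a 64-bit value and truncated explicitly, so no register-width convention is involved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the emulator state, NOT box64's x64emu_t. */
typedef struct {
    uint64_t rax;
} fake_emu_t;

enum leaf_kind { LEAF_UNKNOWN = 0, LEAF_BASIC, LEAF_EXT_FEATURES };

/* Classify the CPUID leaf from the emulated RAX. */
static enum leaf_kind classify_leaf(const fake_emu_t *emu)
{
    /* Explicit truncation of a 64-bit load: the upper half is discarded. */
    uint32_t leaf = (uint32_t)emu->rax;

    switch (leaf) {
        case 0x1:        return LEAF_BASIC;
        case 0x80000001: return LEAF_EXT_FEATURES;
        default:         return LEAF_UNKNOWN;
    }
}

/* Small self-check: garbage in the upper 32 bits must not matter. */
static void classify_self_check(void)
{
    fake_emu_t emu = { 0xDEADBEEF80000001ULL };
    assert(classify_leaf(&emu) == LEAF_EXT_FEATURES);
}
```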

Contributor Author

yzewei commented Jan 19, 2026

> If RAX is used, then the tmp32u interface should be removed, because it will not be used.
>
> Also, this looks like an ABI issue, because the C code is normal and there is no undefined behaviour in this function as far as I can see. In how many other places (like function wrapping) is this same issue happening?!
>
> Also, couldn't an alternative be something like tmp32u &= 0xffffffffULL?

[BOX64] Warning, CPUID command 80000001 unsupported (ECX=00000000)
4100952|0x100356b8c: Calling memset(0xFFF5F3ED60, 0x0, 0x38, ...) => return 0xFFF5F3ED60
4100952|0x100356a15: Calling strlen("NumPy was built with baseline optimizations:
(X86_V2) but your machine doesn't support:
(%s).") => return 0x5E

The mask does not clear the high-order bits: the check still fails, as the log above shows.
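
One possible reason the mask has no effect, which is not confirmed in this thread: for a variable of type `uint32_t`, `x &= 0xffffffffULL` cannot change the value at the C level, so a compiler is allowed to drop it entirely. A minimal sketch with hypothetical helper names, contrasting that with masking a value the compiler knows to be 64 bits wide:

```c
#include <assert.h>
#include <stdint.h>

/* Mask applied to a uint32_t: a no-op as far as C semantics go, so the
 * compiler may emit no instruction for it. */
static uint32_t mask_in_callee(uint32_t tmp32u)
{
    tmp32u &= 0xffffffffULL;
    return tmp32u;
}

/* Mask applied to a genuine 64-bit value: here the upper half really is
 * discarded, because the type says those bits exist. */
static uint32_t mask_wide(uint64_t raw)
{
    return (uint32_t)(raw & 0xffffffffULL);
}

/* Small self-check of both helpers. */
static void mask_self_check(void)
{
    assert(mask_in_callee(0x80000001u) == 0x80000001u);
    assert(mask_wide(0xFFFFFFFF80000001ULL) == 0x80000001u);
}
```

Under that reading, a fix has to act on a 64-bit quantity, which is what using RAX directly achieves.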

Contributor Author

yzewei commented Jan 19, 2026

It would be best to delete tmp32u!

Owner

ptitSeb commented Jan 19, 2026

Each dynarec backend would also need some cleanup, to avoid copying RAX to the first parameter, but I guess that can be done later.

Owner

ptitSeb commented Jan 19, 2026

More simply, I suspect the issue might be that the dynarec is not clearing the upper 32 bits of RAX, so maybe try just adding `ZEROUP2(xRAX, xRAX)` in `dynarec_la64_0f.c`, line 1428, before the `CALL_(...)`.

Contributor Author

yzewei commented Jan 19, 2026

> Each dynarec backend would also need some cleanup, to avoid copying RAX to the first parameter, but I guess that can be done later.

Right, follow-up task.

Contributor Author

yzewei commented Jan 19, 2026

> More simply, I suspect the issue might be that the dynarec is not clearing the upper 32 bits of RAX, so maybe try just adding `ZEROUP2(xRAX, xRAX)` in `dynarec_la64_0f.c`, line 1428, before the `CALL_(...)`.

Let me test it later.

Contributor Author

yzewei commented Jan 20, 2026

> More simply, I suspect the issue might be that the dynarec is not clearing the upper 32 bits of RAX, so maybe try just adding `ZEROUP2(xRAX, xRAX)` in `dynarec_la64_0f.c`, line 1428, before the `CALL_(...)`.

The dynarec does not have a problem with clearing the high-order bits, and the import still fails with this change (see below), but I think the change should be kept to avoid similar problems.

root@kubernetes-master-1:/home/yzw/python-trans/box64-up/build# git diff ../src/dynarec/la64/dynarec_la64_0f.c
diff --git a/src/dynarec/la64/dynarec_la64_0f.c b/src/dynarec/la64/dynarec_la64_0f.c
index e49a4414..7d897819 100644
--- a/src/dynarec/la64/dynarec_la64_0f.c
+++ b/src/dynarec/la64/dynarec_la64_0f.c
@@ -1425,6 +1425,7 @@ uintptr_t dynarec64_0F(dynarec_la64_t* dyn, uintptr_t addr, uintptr_t ip, int ni
         case 0xA2:
             INST_NAME("CPUID");
             NOTEST(x1);
+            ZEROUP2(xRAX, xRAX);
             CALL_(const_cpuid, -1, 0, xRAX, 0);
             // BX and DX are not synchronized durring the call, so need to force the update
             LD_D(xRDX, xEmu, offsetof(x64emu_t, regs[_DX]));
root@kubernetes-master-1:/home/yzw/python-trans/box64-up/build# BOX64_DYNAREC=1  BOX64_LOG=0 box64 ./../../python-standalone/bin/python3 -c "import numpy;"  
[BOX64] Box64 loongarch64 v0.4.1 8744f024 with Dynarec built on Jan 20 2026 09:26:05
OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/yzw/python-trans/python-standalone/lib/python3.11/site-packages/numpy/__init__.py", line 112, in <module>
    from numpy.__config__ import show_config
  File "/home/yzw/python-trans/python-standalone/lib/python3.11/site-packages/numpy/__config__.py", line 4, in <module>
    from numpy._core._multiarray_umath import (
  File "/home/yzw/python-trans/python-standalone/lib/python3.11/site-packages/numpy/_core/__init__.py", line 22, in <module>
    from . import multiarray
  File "/home/yzw/python-trans/python-standalone/lib/python3.11/site-packages/numpy/_core/multiarray.py", line 11, in <module>
    from . import _multiarray_umath, overrides
RuntimeError: NumPy was built with baseline optimizations: 
(X86_V2) but your machine doesn't support:
(X86_V2).

I will write some tests to check if other branches have similar issues.

Contributor Author

yzewei commented Jan 20, 2026

> Each dynarec backend would also need some cleanup, to avoid copying RAX to the first parameter, but I guess that can be done later.

I almost forgot, you previously suggested removing the RAX copy here. I can do all of this at once to avoid multiple PRs.

Collaborator

ksco commented Jan 20, 2026

I didn't follow this PR, but please remove LoongArch from the PR title and commit message.

Signed-off-by: Zewei Yang <yangzewei@loongson.cn>
yzewei changed the title from "[CPUID] Fix my_cpuid leaf detection failure on LoongArch (ABI issue)" to "[EMU] Fix ABI issue in my_cpuid leaf detection" Jan 20, 2026
yzewei marked this pull request as ready for review January 22, 2026 12:16