fix: display the real details for aliases when requested, even if the alias is an uncompressed instruction #2923
… alias is an uncompressed instruction
When did we add this? I need to look at it in more detail. How are aliases defined in the ISA? |
It's much older than that; it's probably as old as the noalias flag itself, so since the very beginning in November 2025 or so. The noalias flag reflected this before noaliascompressed was introduced.
The short answer is that they aren't. The ISA never defines such a thing as an alias; it defines two things: pseudo instructions and compressed equivalents.
1- Pseudo instructions are basically assembly-time macros that allow you to write
2- Compressed equivalents are actual instructions with actual encodings. The CPU decoder is aware of them, but they happen to correspond semantically to a restricted use of an equivalent non-compressed instruction (e.g. the compressed add corresponds to a
To my humble intuition, those two things look very much the same from a user perspective. They're both "this instruction is actually the same as this other one", with the meaning of "the same" being defined in a slightly different way each time.
Is there any precedent in other architectures that would push us one way or the other? I know for a fact ARM has Thumb mode, which is their compressed mode, but I don't know if they have their own notion of pseudo instructions.
(PS: note that this entire PR is about separating the details from the alias text. That is, we can still go with the decision to NOT consider compressed instructions as aliases, while still allowing the real details flag to populate their details with the non-compressed equivalent's details. This is very convenient for Rizin and any downstream consumers of Capstone, as it allows you to basically ignore all the compressed instructions; after all, every single one corresponds to a special case of a non-compressed instruction.) |
|
Hi @moste00, can you please give more precise examples of where LLVM returns an instruction that is/isn't an alias as expected? I'm a bit confused about what the desired result should be.
$ riscv64-linux-gnu-as -march=rv64gc -al - <<< 'add sp, sp, s0'
1 0000 2291 add sp,sp,s0
$ riscv64-linux-gnu-objdump -d -M no-aliases a.out
0: 9122 c.add sp,s0
$ cstool riscv64 2291
0 22 91 add sp, sp, s0
$ cstool riscv64+noalias 2291
0 22 91 c.add sp, s0
I don't see the inconsistency at first... |
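To make the equivalence in the transcript above concrete, here is a minimal Python sketch that expands the 16-bit encoding 22 91 (c.add sp, s0) into the 32-bit encoding of add sp, sp, s0. The helper name expand_c_add is made up for illustration and is not part of Capstone or LLVM; the bit layout follows the C.ADD and R-type ADD formats of the RISC-V spec as I understand them.

```python
def expand_c_add(half: int) -> int:
    """Expand a 16-bit C.ADD encoding into the 32-bit ADD rd, rd, rs2 encoding.

    Hypothetical helper for illustration only; it handles C.ADD and nothing else.
    """
    assert half & 0x3 == 0b10, "not in quadrant 2"
    assert (half >> 12) & 0xF == 0b1001, "funct4 does not match C.ADD"
    rd = (half >> 7) & 0x1F   # rd doubles as rs1 in the compressed form
    rs2 = (half >> 2) & 0x1F
    assert rd != 0 and rs2 != 0, "rs2 == 0 selects C.JALR/C.EBREAK, rd == 0 is a hint"
    # R-type ADD: funct7=0 | rs2 | rs1 | funct3=0 | rd | opcode=0x33
    return (rs2 << 20) | (rd << 15) | (rd << 7) | 0x33


# The two bytes from the transcript, read as a little-endian halfword.
half = int.from_bytes(bytes.fromhex("2291"), "little")
word = expand_c_add(half)
print(hex(half), "->", word.to_bytes(4, "little").hex())
```

The point of the sketch is only that the compressed form carries the same register fields as a restricted use of the full instruction, which is why both spellings can be printed for the same bytes.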
I just mean that none of the compressed instructions are understood by LLVM core as aliases. Maybe the CLI tools implement this on top of the core (as they should, IMO), but the core itself has a function called
There IS an equivalent of
Like you noticed, most CLI tools probably know intuitively that the user doesn't care about this pedantic distinction, and quietly redefine "alias" to mean both things. But LLVM doesn't think that decompressed instructions are aliases, so we will be departing from it there.
(There are some consequences if we do this. For example, we would have no alias ID for decompressed instructions: alias IDs are only assigned to the "fake" pseudo instructions that LLVM considers aliases, while compressed instructions are real from LLVM's POV, so they have a real instruction ID and no alias ID.) |
|
Yeah, we both understand that these aliases (pseudoinstructions) are just a programmer's convenience and, in a way, a relief from hard-coded decisions about which architecture will execute this. E.g., you just write
If you want to have an alias ID for compressed instructions, then we would have to add a table for it, right? Or, even better, just link them somehow to the existing table of aliases, because there is not really a compressed alias instruction. It's just an alias that is or is not compressed. As you said, from the user's perspective an alias represents a functionality, and nobody cares whether that functionality took 2 or 4 bytes of memory :) |
|
Also, I didn't reiterate that there is no difference between CLI tools and Capstone because CLI tools show the same string as cstool does |
This is my view, but another view is that we should do EXACTLY what LLVM core does, and LLVM core doesn't see compressed instructions as aliases. Maybe we can give them another flag, for example
Yes but this is its own deviation from LLVM too, we will define a manual table and maintain it with no auto-sync from LLVM. So whatever path you go, you will always have to face that you're going against LLVM convention. |
|
Let's backtrack a bit. I'm confused a lot 😅
cstool -d riscv64 67800000
0 67 80 00 00 ret
ID: 31 (jalr)
Is alias: 1698 (ret) with ALIAS operand set
Groups: jump
cstool -d riscv64 8280
0 82 80 ret
ID: 513 (c_jr)
Is alias: 1698 (ret) with ALIAS operand set
Groups: HasStdExtCOrZca jump
alias ID is ret (1698) for both |
|
Ah, so the problem is with instructions that have an alias only in their compressed form, while the real instruction counterpart doesn't have an alias... |
|
@slate5 good point, actually now I'm confused too :D I didn't test ret before, but I tested another instruction (
Anyway, let's wait for @Rot127 to make the final judgement call on this, preferably according to the precedent set by ARM. Then we will see the way forward. |
|
Hehe, sext.w (c.addiw t0,0) works well for me XD I think the only "issue" is when you have an "alias" that, in itself, is nothing but the same mnemonic of the real instruction. And then, it only makes sense to call it an "alias" (i.e., alternative name) if it represents a compressed instruction. For example, So, it kinda makes sense, after all, R in RISC-V means reduced, not simple :) |
ARM has aliases :D There it is easy.
Please don't introduce another table we need to maintain, unless it is easy to generate automatically. The purpose of Auto-Sync is to use the LLVM code as much as possible. Patching a line here and there is fine. So is extending our LLVM backends to generate it for us, of course.
That case is actually a bug (from our POV, not necessarily for LLVM). It usually means that the LLVM definitions have an alias and a real instruction defined with the same mnemonic. You can search for
Personally, I wouldn't want the compressed instructions to be counted as "alias". First of all, because this is what it usually means for all other archs, so we can have some consistency between them. Someone who implements a tool with Capstone may not care about the mnemonic.
IF the compressed instructions are semantically equivalent to the full versions of them, we could say that they are an alias. But since the encoding bytes differ, I would prefer to add an extra
So something like that:
Compressed and not-compressed
Alias
The topology is something like this:
Difference:
Bytes: 67800000
Alias ID: ret
Real ID: jalr
Detail: cs_insn.details.is_compressed == false
        cs_insn.size == 4
        if (get_alias_details)
            cs_insn.op_count == 0
        else
            cs_insn.op_count == 1
Bytes: 8280
Alias ID: ret
Real ID: c_jr
Detail: cs_insn.details.is_compressed == true
        cs_insn.size == 2
        if (get_alias_details)
            cs_insn.op_count == 0
        else
            cs_insn.op_count == 1
wdyt? |
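As a way to check my reading of the proposal above, here is a small Python model of it. Everything in it is hypothetical: InsnView, view and the alias_details parameter only mirror the names in the pseudocode (is_compressed, size, op_count, get_alias_details) and are not Capstone API. The compressed check uses the standard RISC-V length encoding, where the two lowest bits of the first byte are 11 only for 32-bit instructions.

```python
from dataclasses import dataclass


@dataclass
class InsnView:
    alias_id: str
    real_id: str
    is_compressed: bool
    size: int
    op_count: int


# Hypothetical lookup covering only the two encodings discussed in this thread.
IDS = {
    bytes.fromhex("67800000"): ("ret", "jalr"),
    bytes.fromhex("8280"): ("ret", "c_jr"),
}


def view(raw: bytes, alias_details: bool) -> InsnView:
    alias_id, real_id = IDS[raw]
    # Low two bits != 0b11 means a 16-bit (compressed) encoding.
    is_compressed = (raw[0] & 0x3) != 0x3
    # ret takes no operands; the real instruction is modeled with one, as in the proposal.
    op_count = 0 if alias_details else 1
    return InsnView(alias_id, real_id, is_compressed, len(raw), op_count)


for raw in IDS:
    print(raw.hex(), view(raw, alias_details=True), view(raw, alias_details=False))
```

In this model the alias ID is shared by both encodings, and the only differences between them are the real ID, the size and the is_compressed bit.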
xD very correct, indeed.
This is reasonable. The thing is, compressed instructions satisfy the second condition exactly. Unless I'm misreading the spec/programmer's manual, it really does seem to say that a compressed equivalent MUST have the same effect as the uncompressed instruction it is modelled on. That's the intention in the first place: to give a size shortcut for common idioms.
Very reasonable.
We can,
My original use case remains :( I need to be able to treat compressed instructions as basically their non-compressed equivalents, or else lifting would become very painful and repetitive. So one of 3 things:
1- The
2- There is a separate flag that does the same thing as (1) but is not
3- There is a separate operands array in RISC-V other than the usual one; the real details flag operates on the usual one, the other flag operates on the other one.
Basically, I'm just circling and circling around the idea that I need to be able to obtain the non-compressed details, and since Rizin is just a serious test-drive of Capstone, probably many other tools depending on Capstone will have the same need. |
Sorry, I lost this context while reading. Idea 2 seems good to me, but I would flip it around. By default
Because I think your lifting use case is way more common and should require only one flag instead of two. |
|
@Rot127 One final question: Does this mean we no longer treat
|
|
@Rot127 Also, one more note: it's never the case in LLVM that an alias has 2 parents. Each alias in LLVM's alias table maps to exactly 1 parent, and most of those parents are the non-compressed instructions. So this presents another difficulty (if we choose to handle it; ignoring it is always an option). Some instructions that "logically" should be aliases, for example a
We could handle this: uncompress the instruction, then, if the uncompressed instruction maps to an alias and the user hasn't requested alias suppression, print the alias. This way the
More work, and this whole topic is surprisingly fractal in complexity and edge cases. |
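The "uncompress first, then look up the alias" idea can be sketched roughly as follows. This is a toy Python model with made-up tables covering only the instructions mentioned in this thread; uncompress, ALIASES and alias_for are hypothetical names and do not correspond to LLVM's or Capstone's actual data structures. What gets printed when no alias is found is left to the caller, since that part has not been decided here.

```python
def uncompress(mnemonic, ops):
    """Map a compressed instruction to its non-compressed equivalent (toy subset)."""
    if mnemonic == "c.jr":
        (rs1,) = ops
        return "jalr", ("zero", rs1, 0)
    if mnemonic == "c.add":
        rd, rs2 = ops
        return "add", (rd, rd, rs2)
    return mnemonic, ops  # already non-compressed, or unknown to this sketch


# Each alias has exactly one (non-compressed) parent pattern, as in LLVM's table.
ALIASES = {("jalr", ("zero", "ra", 0)): "ret"}


def alias_for(mnemonic, ops, suppress_aliases=False):
    """Return the alias to print, or None if there is none or aliases are suppressed."""
    if suppress_aliases:
        return None
    return ALIASES.get(uncompress(mnemonic, tuple(ops)))


print(alias_for("c.jr", ("ra",)))            # compressed parent reaches the alias
print(alias_for("jalr", ("zero", "ra", 0)))  # non-compressed parent, same alias
print(alias_for("c.add", ("sp", "s0")))      # no alias for this one
```

With this shape the alias table keeps a single parent per alias, and the compressed forms reach it through the uncompression step instead of through extra table entries.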
Yes, I think this follows from it.
That is a tricky one indeed. Generally the assembly output should be as LLVM does it. Being comparable to it is one of the features we have. How is the uncompression done? Does it cost a lot of runtime? @slate5 Feel free to state your opinion as well btw. |
Your checklist for this pull request
Detailed description
Background:
We depart from LLVM in what we count as aliases. LLVM only counts so-called "Pseudo-Instructions", non-compressed specialized uses of normal instructions. For example, LLVM considers the 4-byte ret a pseudoinstruction that is just a specialized use of the instruction jalr.
Capstone expands the meaning of "alias" to also cover compressed-instruction equivalence. For example, Capstone considers c.add to be an alias of the appropriate add instruction, whereas LLVM does NOT consider those 2 instructions to be aliases in the ordinary sense.
The problem:
Previously we only populated the real details when an instruction was an alias, but this was checked via printAliasInstr, an LLVM-derived function that only considers the restricted LLVM sense of the word "alias". This has an implication: compressed equivalents don't have the details of the instruction they're equivalent to, even when CS_OPT_DETAILS_REAL is set.
This change refactors the real details logic to also include Capstone's wider usage of "alias", namely uncompressed instructions.
Test plan
...
Closing issues
...