fn ctz(&mut self, dst: WritableReg, src: Reg, size: OperandSize) -> Result<()> {
    if self.flags.has_bmi1() {
        self.asm.tzcnt(src, dst, size);
    } else {
        self.with_scratch::<IntScratch, _>(|masm, scratch| {
            // Use the following approach:
            // dst = bsf(src) + (is_zero * size.num_bits())
            //     = bsf(src) + (is_zero << size.log2()).
            // BSF outputs the correct value for every value except 0.
            // When the value is 0, BSF outputs 0, correct output for ctz is
            // the number of bits.
            masm.asm.bsf(src, dst, size);
            masm.asm.setcc(IntCmpKind::Eq, scratch.writable());
            masm.asm
                .shift_ir(size.log2(), scratch.writable(), ShiftKind::Shl, size);
            masm.asm.add_rr(scratch.inner(), dst, size);
        });
    }
    Ok(())
}
winch/codegen/isa/x64/masm.rs, line 1061+:
The fallback path taken when !flags.has_bmi1() assumes that BSF outputs 0 for a 0 input, which conflicts with the Intel manual: "If the content of the source operand is 0, the content of the destination operand is undefined."
(https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-2a-manual.pdf -- page 210 of the PDF).
A similar issue applies to bsr, used in the fallback path of clz() when !flags.has_lzcnt().
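
To make the concern concrete, here is a small software model in plain Rust. It is only a sketch: bsf_model, bsr_model, ctz_current, ctz_fallback and clz_fallback are hypothetical names, not Winch APIs, and the model is fixed to 32-bit operands. `None` stands in for "ZF set, destination undefined". ctz_current mirrors the arithmetic of the fallback quoted above and takes the undefined destination value as an explicit parameter; the two *_fallback functions show one possible way to avoid reading the destination when the source is zero (selecting the bit width based on ZF, for example with a conditional move), which is an assumption about a fix rather than the project's chosen one.

```rust
/// Model of BSF: index of the lowest set bit.
/// `None` means the source was 0: ZF is set and the destination is undefined.
fn bsf_model(src: u32) -> Option<u32> {
    if src == 0 { None } else { Some(src.trailing_zeros()) }
}

/// Model of BSR: index of the highest set bit, with the same zero-source caveat.
fn bsr_model(src: u32) -> Option<u32> {
    if src == 0 { None } else { Some(u32::BITS - 1 - src.leading_zeros()) }
}

/// Arithmetic of the quoted fallback: dst = bsf(src) + (is_zero << log2(bits)).
/// `undefined_dst` is whatever the destination register happens to hold
/// after BSF executes with a zero source.
fn ctz_current(src: u32, undefined_dst: u32) -> u32 {
    let dst = bsf_model(src).unwrap_or(undefined_dst);
    let is_zero = (src == 0) as u32;
    let log2_bits = u32::BITS.trailing_zeros(); // 5 for 32-bit operands
    dst.wrapping_add(is_zero << log2_bits)
}

/// Sketch of a fallback that never reads the BSF destination for a zero
/// source: pick the bit width whenever ZF is set.
fn ctz_fallback(src: u32) -> u32 {
    bsf_model(src).unwrap_or(u32::BITS)
}

/// Same idea for clz built on BSR: (bits - 1 - index) for non-zero input,
/// the bit width for zero input.
fn clz_fallback(src: u32) -> u32 {
    bsr_model(src).map_or(u32::BITS, |index| u32::BITS - 1 - index)
}
```

In this model, ctz_current(0, 0) yields 32 only because the destination happened to hold 0, while ctz_current(0, 7) yields 39; ctz_fallback(0) and clz_fallback(0) yield 32 regardless of what the destination contains.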