
avr: large variable shifts take up to 128 iterations #5136

@niaow


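When lowering a variable shift, the compiler emits a shift followed by a select that returns 0 when the shift amount is at least the bit width, as Go requires.
This normally works fine, but AVR has no multi-bit shift instruction, so the backend lowers the shift to a loop that uses the shift amount as its counter.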
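Go defines a shift by at least the operand's bit width as producing 0, while LLVM's shl produces poison in that case, which is why the select is needed. The minimal sketch below models the shift-plus-select shape in plain Go for a uint16 shifted by a uint8 amount. The function name lowerShl is hypothetical, and masking the amount with 15 is only a stand-in for LLVM's shl so that the Go model stays well defined; it is not what the compiler emits.

```go
package main

import "fmt"

// lowerShl (hypothetical) mirrors the shape of the lowering for `1 << sh` on a
// uint16: an unconditional shift followed by a select on an overflow flag.
func lowerShl(sh uint8) uint16 {
	overflow := sh > 15 // icmp ugt sh, 15
	// LLVM's shl yields poison when the amount is >= the bit width; masking
	// with 15 is only a stand-in so this Go model stays well defined.
	shifted := uint16(1) << (sh & 15)
	if overflow { // select overflow, 0, shifted
		return 0
	}
	return shifted
}

func main() {
	for i := 0; i < 256; i++ {
		sh := uint8(i)
		if got, want := lowerShl(sh), uint16(1)<<sh; got != want {
			panic(fmt.Sprintf("lowerShl(%d) = %d, want %d", sh, got, want))
		}
	}
	fmt.Println("shift + select matches Go semantics for all 256 shift amounts")
}
```

The select makes the result correct for every amount; the cost problem comes only from how the shift itself is lowered on AVR.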

For example:

```go
package main

func main() {
	for i := 0; i < 256; i++ {
		println("1 <<", i, "=", shl(uint8(i)))
	}
}

//go:noinline
func shl(sh uint8) uint16 {
	return 1 << sh
}
```

This is turned into the following LLVM IR:

```llvm
; Function Attrs: minsize mustprogress nofree noinline norecurse nosync nounwind optsize willreturn memory(none)
define internal fastcc range(i16 0, -32767) i16 @main.shl(i8 %sh) unnamed_addr addrspace(1) #11 !dbg !4130 {
entry:
    #dbg_value(i8 %sh, !4134, !DIExpression(), !4135)
    #dbg_value(i8 %sh, !4134, !DIExpression(), !4136)
  %shift.overflow = icmp ugt i8 %sh, 15, !dbg !4137
  %0 = zext nneg i8 %sh to i16, !dbg !4137
  %1 = shl nuw i16 1, %0, !dbg !4137
  %shift.result = select i1 %shift.overflow, i16 0, i16 %1, !dbg !4137
  ret i16 %shift.result, !dbg !4138
}
```

This compiles to the following AVR code:

```
00000ff4 <main.shl>:
     ff4: 28 2f        	mov	r18, r24
     ff6: 81 e0        	ldi	r24, 0x1
     ff8: 90 e0        	ldi	r25, 0x0
     ffa: 32 2f        	mov	r19, r18
     ffc: 3a 95        	dec	r19
     ffe: 1a f0        	brmi	.+6
    1000: 88 0f        	lsl	r24
    1002: 99 1f        	rol	r25
    1004: fb cf        	rjmp	.-10
    1006: 20 31        	cpi	r18, 0x10
    1008: 10 f0        	brlo	.+4
    100a: 80 e0        	ldi	r24, 0x0
    100c: 90 e0        	ldi	r25, 0x0
    100e: 08 95        	ret
```

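The lsl + rol loop runs sh times for sh from 0 to 128, so it can run up to 128 times; for sh of 129 or more, dec leaves bit 7 set and brmi exits immediately. Every amount from 16 to 128 still pays for the full loop even though the cpi/brlo check afterwards discards the result and returns 0. Since a uint16 shift never needs more than 15 iterations to produce a meaningful result, I do not think this is an expected performance penalty.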
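To check these counts, the following Go sketch models main.shl instruction by instruction; the instruction mapping in the comments is a reading of the disassembly above, not compiler output. It also includes guardedShl, a hypothetical lowering that tests the amount before entering the loop. It is only there for comparison and is not something the compiler currently emits.

```go
package main

import "fmt"

// avrShl models the AVR code emitted for main.shl. It returns the result and
// how many times the lsl/rol loop body ran.
func avrShl(sh uint8) (result uint16, iterations int) {
	result = 1    // ldi r24, 0x1; ldi r25, 0x0
	counter := sh // mov r19, r18
	for {
		counter-- // dec r19 (wraps from 0 to 255)
		if counter&0x80 != 0 {
			break // brmi: dec sets the N flag from bit 7 of its result
		}
		result <<= 1 // lsl r24; rol r25 (16-bit shift left by one)
		iterations++
	}
	if sh >= 0x10 { // cpi r18, 0x10; brlo .+4
		result = 0 // ldi r24, 0x0; ldi r25, 0x0
	}
	return result, iterations
}

// guardedShl is a hypothetical lowering (not what LLVM currently emits) that
// checks the shift amount before entering the loop.
func guardedShl(sh uint8) (result uint16, iterations int) {
	if sh > 15 {
		return 0, 0
	}
	result = 1
	for i := uint8(0); i < sh; i++ {
		result <<= 1
		iterations++
	}
	return result, iterations
}

func main() {
	maxIter, maxSh := 0, 0
	wasted := 0 // loop iterations whose result the final compare throws away
	for i := 0; i < 256; i++ {
		sh := uint8(i)
		got, n := avrShl(sh)
		if want := uint16(1) << sh; got != want {
			panic(fmt.Sprintf("avrShl(%d) = %d, want %d", sh, got, want))
		}
		if n > maxIter {
			maxIter, maxSh = n, i
		}
		if sh > 15 {
			wasted += n
		}
	}
	_, guardedMax := guardedShl(15)
	fmt.Println("worst case:", maxIter, "iterations at sh =", maxSh)
	fmt.Println("iterations spent on discarded results:", wasted)
	fmt.Println("guarded worst case:", guardedMax, "iterations")
}
```

Running it reports a worst case of 128 iterations at sh = 128, a total of 8136 iterations spent on amounts 16 through 128 whose results are discarded, and a worst case of 15 iterations for the guarded variant.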


Labels: avr [AVR (Arduino Uno, etc.)], core
