This optimization of the memset function cuts the number of instructions the RISC-V CPU executes for a memset by close to 75% on large buffers. The key is handling memory in units of 4 bytes. The bulk of the buffer is written with word-sized stores, so each loop iteration sets 4 bytes instead of 1. This cuts the number of loop iterations, and with them the loop instructions, by a factor of four. Only the leftover bytes (and very small buffers) are still set one byte at a time. This minimizes the overhead of checking and storing individual bytes, especially when the starting address is aligned to a 4-byte boundary, and results in faster and more efficient execution.
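The following is a minimal Python model of the word-at-a-time idea. It is not the author's RISC-V code. It assumes a head/body/tail structure: byte stores until the index is 4-byte aligned, word stores for the bulk, then byte stores for the remainder. Indices into a `bytearray` stand in for memory addresses, and `replicate_byte` and `word_memset` are hypothetical names.

```python
import struct


def replicate_byte(c: int) -> int:
    """Copy the low 8 bits of c into all four bytes of a 32-bit word."""
    b = c & 0xFF
    return b | (b << 8) | (b << 16) | (b << 24)


def word_memset(mem: bytearray, dst: int, c: int, n: int) -> int:
    """Set mem[dst:dst + n] to the byte c and return the number of stores issued.

    Hypothetical model: indices into `mem` stand in for memory addresses.
    """
    stores = 0
    i, end = dst, dst + n
    byte = c & 0xFF
    # Head: byte stores until i reaches a 4-byte boundary.
    while i < end and i % 4 != 0:
        mem[i] = byte
        i += 1
        stores += 1
    # Body: one 4-byte word store per iteration.
    word = struct.pack("<I", replicate_byte(c))
    while end - i >= 4:
        mem[i:i + 4] = word
        i += 4
        stores += 1
    # Tail: the remaining 0-3 bytes, one at a time.
    while i < end:
        mem[i] = byte
        i += 1
        stores += 1
    return stores
```

For 18 bytes starting at index 0, the model issues 4 word stores and 2 byte stores, which is 6 stores instead of 18.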
The unoptimized memset function performs the following operations:
- Before the loop:
  - `mv t1, a0`: 1 instruction
  - `beqz a2, 2f`: 1 instruction (conditional branch)
- Inside the loop (when n > 0):
  - `sb a1, 0(t1)`: 1 instruction
  - `add a2, a2, -1`: 1 instruction
  - `add t1, t1, 1`: 1 instruction
  - `bnez a2, 1b`: 1 instruction (branch if `a2 != 0`)
- After the loop:
  - `ret`: 1 instruction (return)
The total number of instructions executed depends on the value of n, the number of bytes to be set:
- When n > 0, the 2 setup instructions are executed, the loop runs for n iterations (4 instructions per iteration), and the return instruction is executed once. Thus, the total number of instructions is:
  $$f(n) = 3 + 4n, \quad n > 0$$
- When n = 0, only the 2 setup instructions and the return instruction are executed. Thus, the total number of instructions is:
  $$f(0) = 3$$

Since f(0) = 3 also equals 3 + 4·0, a single formula gives the number of instructions run by the unoptimized memset function for every n ≥ 0:
$$f(n) = 3 + 4n$$
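The count can be checked by replaying the listed instructions. The sketch below is a hypothetical Python model of the unoptimized routine. It follows the usual RISC-V argument registers (`a0` destination, `a1` fill value, `a2` byte count), performs the same stores on a `bytearray`, and tallies each executed instruction.

```python
def unoptimized_memset(mem: bytearray, a0: int, a1: int, a2: int) -> int:
    """Replay the unoptimized memset on `mem` and return the instruction count.

    a0 = destination index, a1 = fill value, a2 = byte count.
    """
    executed = 0
    t1 = a0                      # mv t1, a0
    executed += 1
    executed += 1                # beqz a2, 2f (skips the loop when a2 == 0)
    if a2 != 0:
        while True:
            mem[t1] = a1 & 0xFF  # sb a1, 0(t1): store the low byte of a1
            a2 -= 1              # add a2, a2, -1
            t1 += 1              # add t1, t1, 1
            executed += 4        # the three instructions above plus bnez
            if a2 == 0:          # bnez a2, 1b falls through when a2 == 0
                break
    executed += 1                # ret
    return executed
```

For any n from 0 upward, the returned count matches 3 + 4n.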
The optimized memset function can be analyzed case by case, based on the value of register a2, which holds the number of bytes to be set in memory. Below, we explain two cases, when a2 is zero and when a2 is nonzero but less than 7, before turning to a larger buffer with an aligned starting address.
In this case, the value of a2 is zero, meaning no bytes need to be set. The function executes as follows:
- The first branch, `beqz a2, end`, checks whether `a2` is zero. Since it is, the function immediately jumps to the `end` label.
- Only the return instruction `ret` is executed after the jump.
Hence, the total number of instructions executed is 2 (`beqz` and `ret`):
$$f(0) = 2$$
In this case, the number of bytes to be set (a2) is nonzero but less than 7, so the following sequence of instructions is executed:
- The first branching instruction, `blt t2, t3, set byte in memory`, is evaluated. Since `a2` is less than 7, it does not branch, and the code continues to the next section. This is the first instruction executed.
- The temporary registers are then set, which takes 4 instructions under the `set temporary registers` label.
- The branching instruction `blt t2, t3, set byte in memory` is executed again, this time branching to the `set byte in memory` label.
- The `set byte in memory` label runs a loop that executes 4 instructions for each byte set, so for n bytes it executes 4n instructions in total.
- Finally, the function executes a `ret` instruction to return.
Therefore, the total number of instructions executed in this case is:
$$f(n) = 1 + 4 + 1 + 4n + 1 = 7 + 4n, \quad 0 < n < 7$$
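Comparing the small-buffer count with the unoptimized count f(n) = 3 + 4n for the same n shows the fixed cost of the optimized routine's extra checks:

$$f_{\text{opt}}(n) - f_{\text{unopt}}(n) = (7 + 4n) - (3 + 4n) = 4, \quad 0 < n < 7$$

According to the two formulas, buffers shorter than 7 bytes take exactly 4 more instructions than with the unoptimized loop. The per-byte cost of 4 instructions is the same in both routines.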
We analyze the total number of instructions executed, f(n), for the provided assembly program in the following scenario:
- The starting address is aligned to a 4-byte memory boundary.
- The number of bytes to set is expressed as n = 4α + β, where 0 ≤ β < 4 (α full 4-byte words plus β leftover bytes).
Summing up all the instructions:
$$f(n) = 1 + 4 + 5 + 7 + (4\alpha + 1) + 4\beta + 1$$
Simplifying:
$$f(n) = 4\alpha + 4\beta + 19$$
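The sketch below evaluates both formulas in Python to show where the 75% figure comes from, computing α and β as `divmod(n, 4)`. It assumes the aligned-case formula applies for every n ≥ 7, since the small-buffer path handles n < 7. The function names are hypothetical. Writing the unoptimized count as 3 + 16α + 4β, the ratio (4α + 4β + 19) / (16α + 4β + 3) tends to 1/4 as α grows. The savings therefore approach 75% but never quite reach it.

```python
def unoptimized_count(n: int) -> int:
    """f(n) = 3 + 4n for the byte-at-a-time loop."""
    return 3 + 4 * n


def optimized_aligned_count(n: int) -> int:
    """f(n) = 4*alpha + 4*beta + 19 for an aligned start, n = 4*alpha + beta, beta < 4.

    Assumed to apply on the word path, i.e. for n >= 7.
    """
    alpha, beta = divmod(n, 4)
    return 4 * alpha + 4 * beta + 19


def reduction(n: int) -> float:
    """Fraction of instructions saved by the optimized routine for n bytes."""
    return 1 - optimized_aligned_count(n) / unoptimized_count(n)


for n in (7, 8, 64, 4096):
    print(n, unoptimized_count(n), optimized_aligned_count(n), f"{reduction(n):.1%}")
```

If the aligned formula is applied at n = 7, the word path is slightly more expensive there (35 vs 31 instructions). It pays off from n = 8 onward (27 vs 35), and at n = 4096 the reduction is about 74.9%.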